Report from Tucson: from characters to annotations with text mining

March 30, 2013

There is a wealth of phenotypic information in the evolutionary literature that comes in the form of semi-structured character state descriptions. To get that information into computable form is, right now, an awfully slow process. In Phenoscape I, we estimated that it took about five person-years in total to curate semantic phenotype annotations from 47 papers. If we are to get computable evolutionary phenotypes from a larger slice of the literature, we really need to figure out ways to speed this up.

One promising approach is to use text-mining.  This could contribute in a few different ways.  First, one could efficiently identify all the terms in the text that are not currently represented in ontologies and add them en masse, so that data curation does not have to stop and resume whenever such terms are encountered. Second, one could present a human curator with suggestions for what terms to use and what relations those terms have to one another, speeding the process of composing an annotation.
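
To make the first idea concrete, here is a minimal sketch in Python of flagging words in character descriptions that have no match in an ontology vocabulary. It is an illustration only, not how CharaParser works: the function name `find_missing_terms`, the stopword list, the toy descriptions, and the toy ontology labels are all assumptions made up for this example, and it matches single words only, so multi-word labels would need extra handling.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "with", "in", "on", "or", "to", "is", "not"}


def find_missing_terms(descriptions, ontology_labels):
    """Count words in the descriptions that match no known ontology label."""
    known = {label.lower() for label in ontology_labels}
    missing = Counter()
    for text in descriptions:
        # Naive tokenization: runs of ASCII letters, case-insensitive.
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS and word not in known:
                missing[word] += 1
    return missing


# Toy inputs (assumed for illustration, not taken from any real dataset).
descriptions = [
    "Dorsal fin present, with serrated spine",
    "Dorsal fin absent",
]
ontology_labels = ["dorsal", "fin", "present", "absent"]

missing = find_missing_terms(descriptions, ontology_labels)
print(missing.most_common())
```

Sorting the unmatched words by frequency would let an ontology editor review and add the most common ones in a single batch, before curation of the papers begins.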

CharaParser, developed by Hong Cui at the University of Arizona, is an expert-based system that decomposes character descriptions into recognizable grammatical components, and it is now being used in several different biodiversity informatics projects. Baseline evaluation results from BioCreative III showed that a naive workflow combining CharaParser and Phenex, the software curators use to compose ontological annotations and relate them to character states, was capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average) but had difficulty translating those into ontological annotations. This first-iteration workflow was also not yet reducing curation time.

In March, a small contingent from NESCent (Jim Balhoff, Hilmar Lapp and Todd Vision) visited Hong Cui’s group in Tucson. We talked through improvements to CharaParser and the curation workflow, brainstormed plans for a more thorough set of evaluation tests, began refactoring the code so that it can be more easily shared across projects, and gained a better understanding of what features make a character difficult to curate for humans vs. text-mining. We made substantial progress on all fronts, and are looking forward to seeing how much improvement in the accuracy and efficiency of curation will be achieved in the next round of testing.

We are also pleased to report that the CharaParser codebase will now be available from GitHub under an open source (MIT) license.


Phenoscape goes mobile

July 9, 2012

Previous layout of the KB faceted browsing page on the iPhone. Text is tiny and must be zoomed and panned.

The NESCent Informatics group periodically holds “hack days”, one-day mini-hackathons where we take a break from our usual schedule and push forward on a specific topic of interest. Most recently, the topic was support for the mobile web. I took a look at the Phenoscape Knowledgebase layout on the iPad and iPhone. In general the site did not adapt well to small screen sizes.

In order to avoid serving different layouts to specific devices, I applied techniques from the Responsive Web Design approach, which uses new functionality from CSS 3 to dynamically adjust the page layout based on the size of the browser window. In the new layout, when the window is small, controls move from the side to the top, allowing both the controls and the content table to use the full screen width.

Using responsive web design, the controls and content become stacked on small screens.

The new layout works across most of the pages on the Knowledgebase site. In general, it is a big improvement on mobile devices. However, there are a few remaining glitches to address, such as controls that appear on mouse hover, which are difficult to use on a touchscreen device, where there is no mouse.


Third beta release of Phenoscape Knowledgebase 2.0

February 7, 2011

Phenoscape Knowledgebase 2.0 beta release 3 is now available at http://kb.phenoscape.org/. This version includes an enhanced publication info interface [example] which displays the original character matrix, as well as a list of taxa including the taxonomic names and museum specimens used in the dataset. Other recent developments in the KB are global term info popups and hierarchical browsing of ontology terms on their info pages [example].

We have also improved our software and data release processes so that the public Knowledgebase can easily keep up with new developments and the latest data updates from our curators. Looking forward, the next major feature to be added to the Knowledgebase is a faceted browsing interface which is currently under development. This interface should help provide an overview of how the data are organized within the various ontologies used in the Knowledgebase.


Lapp gives NCBO webinar for Phenoscape

November 17, 2010

Hilmar Lapp gave a great overview today of the ongoing work in the Phenoscape project to 29 participants in the NCBO Webinar series. This series showcases new projects, technologies and ideas in biomedical ontology, many of which use ontologies for interoperability. Hilmar presented the biological context (evolution, conservation, development, etc.) into which our work fits, and the challenges involved in representing phenotype. A video recording of his talk will be posted in case you missed it.

Update: The slides are also posted on Slideshare


Revising the Knowledgebase interface

March 24, 2010

We have been developing mockup versions of new web interfaces for the Phenoscape Knowledgebase.  In order to design an updated interface which is both more powerful and easier to use than the existing one, in February I presented a series of mockups to faculty, post-docs, and graduate students at the University of Oregon, the home of ZFIN.  Following user-testing expertise at ZFIN, I met with the researchers in pairs and recorded their feedback on newly designed interfaces for viewing anatomical and taxonomic terms within the ontology hierarchy, configurable queries for phenotype annotations, and data visualization on phylogenetic trees.  The feedback proved to be extremely valuable and has led to several modifications to the planned interface revisions.


Phenex 1.0.3 released

February 23, 2010

Phenex 1.0.3 is now available.  This release fixes a serious bug which caused Phenex to append modified phenotype annotations within files, instead of replacing the previous data. Phenex will now read and write NeXML files correctly. It should also automatically recover the latest data from files saved with older versions of Phenex.

All Phenex users should replace their current copy of Phenex with the latest release. It can be downloaded from the Phenex homepage on the Phenoscape wiki.


Phenoscape internship experience

February 23, 2010

Hello all,

As an online student of Bioinformatics based in Nairobi, Kenya, I had a strong desire to undertake a project that would enhance my knowledge and skills in software development. Hence, after completing MSc coursework at the University of Manchester, UK, I was happy to be awarded an internship from the Phenoscape project for an 11-week traineeship beginning September 21st, 2009 at the National Evolutionary Synthesis Center (NESCent). This project seeks to establish the developmental and genetic basis of the astonishing morphological heterogeneity across diverse species. In addressing this, a rich and rigorous knowledge base, PhenoscapeKB, comprising evolutionarily variable characters across a clade of fishes connected to mutant phenotypes from ZFIN, has been developed. Core to the PhenoscapeKB is the modeling of the character entities using ontologies, thus facilitating knowledge synthesis via logical/mathematical reasoning.


Phenex 1.0.2 released

January 20, 2010

Phenex 1.0.2 is now available.  This is a minor update which fixes an interface problem caused by a recent Mac OS X Java update.  It also fixes a file loading bug which occurred on specific older versions of Mac OS X.  Phenex can be downloaded from its homepage on the Phenoscape wiki.


Phenoscape solicits feedback on new interfaces at AmphibAnat Kansas City meeting

December 4, 2009

In early November Wasila and I attended the AmphibAnat workshop in Kansas City, MO (Nov. 5-8) that was organized by Anne Maglia. As you may know, Phenoscape has a close relationship with this group, not only because they work on herps (ichthyologists and herpetologists have a long tradition of working together…), but because they are also developing ontologies to annotate the published comparative anatomical literature. I presented the status of our work in Phenoscape to the large group (~40) of amphibian development and anatomy experts who were present. As these folks added new terms, synonyms, and images to the amphibian ontologies over the course of the next few days, we solicited comments on the prototypes of three new interfaces for the Phenoscape Knowledgebase. Using both images and paper copies of these prototypes, we invited people to sit down with us one-on-one and describe in detail what worked and what was missing or unclear. The feedback was extremely useful, and we appreciated the time the AmphibAnat participants gave us. We have now gone over all the comments within Phenoscape and logged them individually to FogBugz, our internal tracking system. We’ll be generating new versions of these prototypes through early February, when we plan a formal round of usability testing.


Beta release of the Phenoscape Knowledgebase

October 12, 2009

We are pleased to announce the beta release of the Phenoscape Knowledgebase (KB) at http://kb.phenoscape.org/ and would like to solicit feedback.

Phenoscape KB integrates phenotypic data from genetic studies of zebrafish with evolutionarily variable phenotypes from the literature on fishes. It currently contains 333,987 phenotype statements about 2,310 taxa (mainly ostariophysan fishes), from 51 publications, and 11,267 phenotype statements about 2,953 genes retrieved from ZFIN (zfin.org). You can explore these data by searching for anatomical terms, taxa (by Latin name), or genes (by ZFIN gene symbol).