What are the challenges in building, visualizing and using the Tree of Life? How can we best utilize and build on existing phylogenetic knowledge and look ahead to address the challenges of data integration? Recently, fellow Phenoscaper Jim Balhoff and I attended the first FuturePhy workshop in Gainesville, Florida (February 20-22, 2016). The workshop brought together three taxonomically-defined working groups (catfish, beetles, barnacles) to build megatrees from existing phylogenetic studies, and to identify and begin applying diverse data layers for their respective groups. Open Tree and Arbor personnel were on hand to discuss and help solve issues in data integration.
There is a wealth of phenotypic information in the evolutionary literature that comes in the form of semi-structured character state descriptions. To get that information into computable form is, right now, an awfully slow process. In Phenoscape I, we estimated that it took about five person-years in total to curate semantic phenotype annotations from 47 papers. If we are to get computable evolutionary phenotypes from a larger slice of the literature, we really need to figure out ways to speed this up.
One promising approach is to use text-mining. This could contribute in a few different ways. First, one could efficiently identify all the terms in the text that are not currently represented in ontologies and add them en masse, so that data curation does not have to stop and resume whenever such terms are encountered. Second, one could present a human curator with suggestions for what terms to use and what relations those terms have to one another, speeding the process of composing an annotation.
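As a rough illustration of the first idea, here is a minimal Python sketch that counts words in character descriptions that do not appear in any ontology label. It is not how CharaParser works; the function name, the stopword list and the word-level matching are all assumptions made for the example, and a real pipeline would need proper phrase handling.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "with", "to", "is"}

WORD_PATTERN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def find_missing_terms(character_descriptions, ontology_labels):
    """Count words in the descriptions that occur in no ontology label.

    Matching is done word by word, which is a simplification: a
    multi-word label such as "dorsal fin" just contributes the words
    "dorsal" and "fin" to the known vocabulary.
    """
    known_words = set()
    for label in ontology_labels:
        known_words.update(WORD_PATTERN.findall(label.lower()))

    missing = Counter()
    for description in character_descriptions:
        for word in WORD_PATTERN.findall(description.lower()):
            if word not in known_words and word not in STOPWORDS:
                missing[word] += 1
    return missing


if __name__ == "__main__":
    descriptions = ["Dorsal fin spine serrated", "Spine absent"]
    labels = ["dorsal fin", "absent"]
    # Most frequent unknown words first, ready for review by a curator.
    for word, count in find_missing_terms(descriptions, labels).most_common():
        print(word, count)
```

The resulting list could be reviewed in one pass and the accepted terms added to the ontologies together, rather than one at a time during curation.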
CharaParser, developed by Hong Cui at the University of Arizona, is an expert-based system that decomposes character descriptions into recognizable grammatical components, and it is now being used in several different biodiversity informatics projects. Baseline evaluation results from BioCreative III showed that a naive workflow combining CharaParser and Phenex, the software curators use to compose ontological annotations and relate them to character states, was capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average) but had difficulty translating those into ontological annotations. This first-iteration workflow also did not yet reduce curation time.
In March, a small contingent from NESCent (Jim Balhoff, Hilmar Lapp and Todd Vision) visited Hong Cui’s group in Tucson. We talked through improvements to CharaParser and the curation workflow, brainstormed plans for a more thorough set of evaluation tests, began refactoring of the code so that it can be more easily shared across projects, and gained a better understanding of what features make a character difficult to curate for humans vs. text-mining. We made substantial progress on all fronts, and are looking forward to seeing how much improvement in the accuracy and efficiency of curation will be achieved in the next round of testing.
We are also pleased to report that the CharaParser codebase will now be available from GitHub under an open source (MIT) license.
A new bugfix release of Phenex is available. Phenex 1.4.2 addresses the following issues:
- Fixed missing “not” relationship in post-composition editor, https://github.com/phenoscape/Phenex/issues/14
- Fixed term filters to allow choosing provisional terms, https://github.com/phenoscape/Phenex/issues/13
- Fixed “freezing” panels display anomalies
- Fixed some ontology loading issues by updating internal OBO-Edit components to latest versions
We have recently released version 1.2.1 of our Phenex annotation software. This release adds some functionality for easier collaborative editing of data files. While our curators have used Subversion revision control software in the past, the new features make it more reliable to share Phenex data files with user-friendly file synchronization software such as Dropbox. While a NeXML document is open in Phenex, the application monitors for changes to the document file in the background. If the file is being shared via Dropbox and is simultaneously edited by someone else, Phenex will alert the user that the file has changed and offer to load the new version. If there are no unsaved edits then Phenex will reload the file automatically. Phenex 1.2 also provides an autosave feature which saves the document after every edit—this reduces the chance that the file might be edited elsewhere while one has unsaved changes, avoiding complicated file merges.
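The reload behavior described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not Phenex's actual implementation (Phenex is a Java application); the function names and the use of file modification times are assumptions made for the example.

```python
import os


def decide_action(last_seen_mtime, current_mtime, has_unsaved_edits):
    """Decide what to do when the open document's file is checked.

    Returns one of:
      "unchanged"   - the file on disk has not changed
      "reload"      - changed on disk and no unsaved edits, reload silently
      "prompt_user" - changed on disk while the user has unsaved edits
    """
    if current_mtime == last_seen_mtime:
        return "unchanged"
    if has_unsaved_edits:
        return "prompt_user"
    return "reload"


def check_document(path, last_seen_mtime, has_unsaved_edits):
    """Hypothetical polling step: compare the file's modification time
    on disk with the one recorded when the document was last loaded."""
    current_mtime = os.path.getmtime(path)
    action = decide_action(last_seen_mtime, current_mtime, has_unsaved_edits)
    return action, current_mtime
```

A background timer could call `check_document` every few seconds. Saving after every edit keeps `has_unsaved_edits` false almost all the time, so most external changes lead to a silent reload rather than a prompt.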
Phenoscape and colleagues meet with PATO on ontology and phenotype representation issues, Sept. 25-27, 2010
November 12, 2010
At the end of September, members of Phenoscape (Mabee, Balhoff), the Hymenoptera Anatomy Ontology (HAO) project (Yoder, Deans, Seltmann) and TAIR (Huala) met with developers of the Phenotype and Trait Ontology (PATO) (Gkoutos, Mungall, Westerfield, Lewis) at the University of Oregon. Our discussions focused on finding solutions to problems that have arisen as a result of the structure of the PATO ontology, and to problems in representing phenotypes in the EQ model, which have arisen in the course of annotating comparative phenotype data from the fish and Hymenoptera literature. We prepared for this meeting by developing a list of common issues and, importantly, specific examples, in a Google doc shared among participants. We all co-edited this document during the meeting with notes, decisions and examples, and we ‘published’ this Google doc for you all to see. A number of important changes to the PATO hierarchy were proposed and subsequently made. We also clarified best practices for modelling some common but tricky phenotypic features. One additional outcome was the participants' strong recommendation that a ‘shape jamboree’ be held to improve the usability of this branch of the PATO ontology.
We’re happy to report that a paper describing the Phenex curation tool has just recently been published in PLoS ONE:
Balhoff JP, Dahdul WM, Kothari CR, Lapp H, Lundberg JG, et al. (2010) Phenex: Ontological Annotation of Phenotypic Diversity. PLoS ONE 5(5): e10500. doi:10.1371/journal.pone.0010500.
Abstract: Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.