March 30, 2013
There is a wealth of phenotypic information in the evolutionary literature that comes in the form of semi-structured character state descriptions. Getting that information into computable form is, right now, an awfully slow process. In Phenoscape I, we estimated that it took about five person-years in total to curate semantic phenotype annotations from 47 papers. If we are to get computable evolutionary phenotypes from a larger slice of the literature, we really need to figure out ways to speed this up.
One promising approach is to use text-mining. This could contribute in a few different ways. First, one could efficiently identify all the terms in the text that are not currently represented in ontologies and add them en masse, so that data curation does not have to stop and resume whenever such terms are encountered. Second, one could present a human curator with suggestions for what terms to use and what relations those terms have to one another, speeding the process of composing an annotation.
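As a rough illustration of the first idea, here is a minimal Python sketch of flagging words in character descriptions that match no known ontology label. It is not how CharaParser works; the function name, the stopword list, and the sample descriptions and labels are all made up for the example, and it only handles single-word terms.

```python
import re

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "with", "in", "on", "to"}


def find_uncovered_terms(descriptions, ontology_labels):
    """Return words from the descriptions that match no known label.

    Results are ordered by descending frequency, then alphabetically,
    so the most common gaps can be reviewed first.
    """
    known = {label.lower() for label in ontology_labels}
    counts = {}
    for text in descriptions:
        # Lowercase words, keeping internal hyphens (e.g. "fin-ray").
        for token in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
            if token in known or token in STOPWORDS:
                continue
            counts[token] = counts.get(token, 0) + 1
    return sorted(counts, key=lambda t: (-counts[t], t))


# Made-up character state descriptions and ontology labels.
descriptions = [
    "Dorsal fin present",
    "Dorsal fin absent; adipose fin present",
    "Supraneural bones absent",
]
labels = ["dorsal", "fin", "present", "absent", "bones"]

missing = find_uncovered_terms(descriptions, labels)
```

A list like `missing` could be reviewed and added to the ontology in one batch before curation starts, rather than one term at a time as curators run into them.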
CharaParser, developed by Hong Cui at the University of Arizona, is an expert-based system that decomposes character descriptions into recognizable grammatical components, and it is now being used in several different biodiversity informatics projects. Baseline evaluation results from BioCreative III showed that a naive workflow combining CharaParser and Phenex, the software curators use to compose ontological annotations and relate them to character states, was capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average) but had difficulty translating those into ontological annotations. This first iteration workflow also was not yet reducing curation time.
In March, a small contingent from NESCent (Jim Balhoff, Hilmar Lapp and Todd Vision) visited Hong Cui’s group in Tucson. We talked through improvements to CharaParser and the curation workflow, brainstormed plans for a more thorough set of evaluation tests, began refactoring the code so that it can be more easily shared across projects, and gained a better understanding of which features make a character difficult to curate for humans versus text-mining software. We made substantial progress on all fronts, and are looking forward to seeing how much improvement in the accuracy and efficiency of curation will be achieved in the next round of testing.
We are also pleased to report that the CharaParser codebase will now be available from GitHub under an open source (MIT) license.
February 26, 2013
At the end of October 2012, the working groups of the Phenotype Research Coordination Network (RCN) all met at the Asilomar Conference Center, in Pacific Grove, CA. One of the groups, the Vertebrate working group, made it their goal to discuss methods of representing phylogenetic and serial homology in anatomy ontologies, an issue that is central to Phenoscape as well. Though common ancestry is implicit in the semantics of many classes and subclass relationships (see for example the ‘homology_notes’ for digit in Uberon), most multispecies anatomy ontologies, including Uberon, VSAO, and TAO, do not assert homology relationships between anatomical entities. Nonetheless, homology is central to comparative biology, and therefore to enriching computations across data types, species, and evolutionary change.
March 7, 2012
On 15–16 February 2012, I visited NESCent to work with Peter Midford, Jim Balhoff, and, especially, Wasila Dahdul. The focus of my trip was to push forward on the continued development of the Amphibian Anatomical Ontology and the integration of phenotypic data for amphibians into the larger Phenoscape project.
With Peter Midford, I worked to make a significant update to the Amphibian Taxonomy Ontology based largely on a recent revision to the higher-level taxonomy used on AmphibiaWeb (for which I am part of the steering committee). AmphibiaWeb provides an excellent resource for Phenoscape and other related projects because it provides a list of currently recognized species of living amphibians and is updated daily.
The majority of my visit was spent working with Wasila Dahdul on issues related to the Amphibian Anatomy Ontology (AAO) and on curating our first evolutionary dataset related to the fin–limb transition (Ruta et al., 2003). During this work, we plowed through a significant portion of AAO terms lacking parent terms (either adding parents or synonymizing the terms with others in either the Vertebrate Anatomy Ontology (VAO) or the AAO). We also evaluated whether to add terms to the AAO that are present in the Xenopus Anatomy Ontology (XAO; Xenopus is a genus of African frogs used as a model system) but absent in the AAO. In some cases, this led to recommending that those terms be removed from the XAO. As we started to curate morphological characters related to the limbs from the study by Ruta et al. (2003), we encountered many terms not present in existing anatomy ontologies, such as the AAO or the VAO. Some terms had been slated for inclusion in the Amniote Anatomy Ontology (AmAO) being developed by Nizar Ibrahim and Paul Sereno (University of Chicago). Because these terms are also present in non-amniotes, we are recommending that they be migrated from the AmAO to the higher-level VAO.
As we start to focus on curating phenotypes from the literature of vertebrate paleontology, a few issues are emerging. One important issue is that curation of data from paleontological studies will likely necessitate adding a field to our information for specimens to accommodate free text alongside museum abbreviations and catalog numbers. The reason for this is that paleontological studies can rely on a combination of materials, including both specimens and examination of literature. We will also need to add to and refine the collection of museum codes used to curate specimen data. These last points about accurately curating data related to specimens examined are important if we are to use the Phenoscape knowledgebase to point to records for those same specimens in on-line databases, or if databases (such as those for museum collections) want to point to records of specimens in the Phenoscape knowledgebase.
February 3, 2012
With the help of Phenoscape and DeepFin intern Ben Frable, I recently finished adding 117 French anatomical terms and synonyms from Chanet & Desoutter-Meniger’s (2008) glossary publication to the Teleost Anatomy Ontology (TAO). These authors spent many years defining and translating Paul Chabanaud’s anatomical analyses of flatfishes into modern French and English to help researchers understand his important publications. Adding these terms to the TAO takes their translation one step further, enabling computers to link Chabanaud’s unusual terms to an ontology ID for each anatomical ‘concept’, which in turn enables connections among all phenotypic and related data that reference this ID.
These synonyms can now be used in searches of the Phenoscape Knowledgebase. For example, you can see the French synonyms for ‘paired fin’. One can imagine ultimately being able to select a preferred language or term label when browsing the ontology in the Knowledgebase.
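To make the idea of linking a synonym to an ontology ID concrete, here is a small Python sketch of a label-and-synonym lookup. It is only an illustration under stated assumptions, not the Knowledgebase implementation: the `SynonymIndex` class, the `EX:` identifiers, and the French labels are invented for the example and are not taken from the TAO or the glossary. Labels are Unicode-normalized before comparison so that accented characters match however they were typed.

```python
import unicodedata


def normalize(label):
    """Normalize a label so case and accent encoding do not affect matching."""
    # NFC composes base letters and combining marks into single code points.
    return unicodedata.normalize("NFC", label).casefold().strip()


class SynonymIndex:
    """Hypothetical in-memory map from labels and synonyms to term IDs."""

    def __init__(self):
        self._by_label = {}

    def add_term(self, term_id, label, synonyms=()):
        # The primary label and every synonym resolve to the same ID.
        for name in (label, *synonyms):
            self._by_label[normalize(name)] = term_id

    def lookup(self, name):
        # Returns None when the name is not a known label or synonym.
        return self._by_label.get(normalize(name))


index = SynonymIndex()
# Placeholder IDs and illustrative French synonyms, not real TAO content.
index.add_term("EX:0000001", "paired fin", ["nageoire paire"])
index.add_term("EX:0000002", "spine", ["épine"])

paired_fin_id = index.lookup("Nageoire Paire")
# Same word typed with a combining acute accent instead of a precomposed "é".
spine_id = index.lookup("e\u0301pine")
```

Because every synonym resolves to the same ID, any data annotated with that ID can be found regardless of which language or label the searcher starts from.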
This was the first set of foreign-language terms to be added to the teleost ontology, and we had to tweak the Phenoscape Knowledgebase interface to display the diacritical marks correctly. We are ready to accept more! Please submit anything you’d like added or changed via the TAO term tracker.
Chanet, B., & Desoutter-Meniger, M. (2008). French-English glossary of terms found in Chabanaud’s published works on Pleuronectiformes. Cybium, Electronic Publication no. 1: 1–23.
November 22, 2011
Last month, Hilmar Lapp and I (Jim Balhoff) attended the Biodiversity Information Standards meeting (TDWG 2011) in New Orleans. As a representative of both Phenoscape and the Hymenoptera Anatomy Ontology (HAO) project, I presented a poster, with co-authors Matt Yoder and Andy Deans, detailing an OWL model showing the explicit semantics of linking an Entity–Quality (EQ) phenotype to evolutionary character matrix data and taxonomic specimens. While EQ can be thought of as simple ontological tags on descriptive data, modeling phenotypes within a more explicit logical framework allows us to make use of more powerful automated reasoning. It also provides a consistent interpretation for EQs across projects annotating phenotypes (for example, Phenoscape and HAO).
Of particular relevance to our poster was another presented by Cam Webb. Cam has created an OWL-compatible version of Darwin Core which can be used to describe specimen metadata in RDF. We made similar use of Darwin Core in our poster, but we are looking into adopting Cam’s Darwin-SW for this part of the model.
Overall there was a lot of interest in semantic technologies at TDWG, ranging from the initial meeting of an RDF/OWL working group to other projects that are not using semantic technologies but seem well suited for RDF.
November 22, 2011
While I was working to describe two species of lizardfish (Synodus) with Carole Baldwin at the Smithsonian National Museum of Natural History, she received an email from Paula Mabee asking if she knew of, or had, any students interested in working on the Phenoscape Project. I had realized that, with advances in technology and communication, evolutionary biology and science as a whole were headed towards a future of large-scale interdisciplinary collaborations to help address big questions and make tools and data readily available. Therefore, I immediately jumped on the opportunity to work on Phenoscape!
With the support of funding from DeepFin, I started my internship with Phenoscape at the National Evolutionary Synthesis Center (NESCent) in August 2011. My three months here at NESCent have flown by and even though it is my last day, I am just as excited about the project as the day I started! Working with Wasila Dahdul, Peter Midford and Jim Balhoff has enabled me to learn and understand a great deal about databases, collaboration and morphology. Phenoscape has completely changed the way I think about phenotypic characters. Breaking them down into logical statements in Phenex really allows you to understand a character as it fits in the bigger picture. I was able to work with Wasila in forging interdisciplinary ties by contributing to other ontologies and databases, such as PATO and PaleoDB. Additionally, working to assist in the expansion of Phenoscape to incorporate all vertebrates taught me a lot about the origins of vertebrates and the plethora of prehistoric life I did not realize existed, including my new personal favorite prehistoric fish, Jagorina!
NESCent is an amazing place. Being one of the few people here without a higher degree or a long list of publications under their belt, I was initially a little intimidated. However, the informatics group, post-docs and professors have been great and pushed me to participate in seminars and intellectual discussion. This is a stimulating environment that facilitates thinking outside the box and looking at bigger picture issues in evolutionary biology.
I am excited to continue my work on Phenoscape offsite back at the Smithsonian and I hope to contribute throughout my graduate career in Dr. Brian Sidlauskas’s (former NESCentian and Phenoscape tester and contributor) lab at Oregon State University.
Graduate Student, Oregon State University
Student Researcher, Smithsonian National Museum of Natural History
September 23, 2011
Last month I visited Xenbase and Aaron Zorn’s lab at the Cincinnati Children’s Hospital for a couple of days (August 21-23, 2011) to work with Xenbase curators in preparing the Xenopus Anatomy Ontology (XAO) for its next big release. Xenbase curators Christina James Zorn and VG Ponferrada have been leading the effort, and Erik Segerdell, the ontology development coordinator for the Phenotype RCN and former Xenbase curator, was also visiting for the week and helping with the update. Erik and I provided training in ontology editing and synchronization tools.