Adding Amphibians to Phenoscape

March 7, 2012

On 15–16 February 2012, I visited NESCent to work with Peter Midford, Jim Balhoff, and, especially, Wasila Dahdul. The focus of my trip was to push forward on the continued development of the Amphibian Anatomy Ontology and the integration of phenotypic data for amphibians into the larger Phenoscape project.

With Peter Midford, I worked to make a significant update to the Amphibian Taxonomy Ontology based largely on a recent revision to the higher-level taxonomy used on AmphibiaWeb (for which I am part of the steering committee). AmphibiaWeb provides an excellent resource for Phenoscape and other related projects because it provides a list of currently recognized species of living amphibians and is updated daily.

The majority of my visit was spent working with Wasila Dahdul on issues related to the Amphibian Anatomy Ontology (AAO) and on curating our first evolutionary dataset related to the fin–limb transition (Ruta et al., 2003). During this work, we plowed through a significant portion of the AAO terms lacking parent terms, either adding parents or synonymizing the terms with others in the Vertebrate Anatomy Ontology (VAO) or the AAO. We also evaluated whether to add terms to the AAO that are present in the Xenopus Anatomy Ontology (XAO; Xenopus is a genus of African frogs used as a model system) but absent in the AAO. In some cases, this led to recommending that those terms be removed from the XAO. As we started to curate morphological characters related to the limbs from the study by Ruta et al. (2003), we encountered many terms not present in existing anatomy ontologies, such as the AAO or the VAO. Some terms had been slated for inclusion in the Amniote Anatomy Ontology (AmAO) being developed by Nizar Ibrahim and Paul Sereno (University of Chicago). Because these terms are also present in non-amniotes, we are recommending that they be migrated from the AmAO to the higher-level VAO.

As we start to focus on curating phenotypes from the literature of vertebrate paleontology, a few issues are emerging. One important issue is that curation of data from paleontological studies will likely necessitate adding a field to our information for specimens to accommodate free text alongside museum abbreviations and catalog numbers. The reason for this is that paleontological studies can rely on a combination of materials, including both specimens and examination of literature. We will also need to add to and refine the collection of museum codes used to curate specimen data. These last points about accurately curating data related to specimens examined are important if we are to use the Phenoscape knowledgebase to point to records for those same specimens in on-line databases, or if databases (such as those for museum collections) want to point to records of specimens in the Phenoscape knowledgebase.


New summer course on anatomy ontologies

February 27, 2012

A summer course on anatomy ontologies is being offered for the first time through the NESCent Academy and the Phenotype Ontology Research Coordination Network. The intended audience is postgraduate researchers in evolutionary biology and informatics – including students, postdocs and faculty – who are relative newcomers to ontologies. It will be held from 30 July to 3 August 2012 in Durham, NC. Spread the word, and if you are interested be sure to apply before the deadline of 6 April. [Full disclosure: a number of long-time friends of Phenoscape are among the instructors.]

From the course website:

Evolutionary research has been revolutionized by the explosion of genetic information available, and anatomy ontologies must play a crucial role in relating this knowledge to observable diversity. Anatomy ontologies and vocabularies are widely used to index data and are critical for relating gene expression and phenotype data across taxa. Within a single species, anatomy ontologies provide scaffolding that interconnects many kinds of observations; across species, they provide evolutionary, developmental, and mechanistic insights. In order for anatomy ontologies to successfully serve all of these purposes, they must be constructed consistently so that they can be utilized and understood by researchers and software alike. This course aims to teach proper ontology design principles and practices such that anatomical interoperability across evolutionarily disparate taxa is achieved. It further seeks to promote community growth and adoption of ontology-based methods and tools. The subsequent benefit is in the form of shared access to the unique data store of each community (e.g. genetic, genomic, developmental, and evolutionary data).

The course covers a basic introduction to ontology design principles and usage, specific ontology considerations for anatomy, application of anatomy ontologies in the context of evolutionary phenotype comparison, and use of anatomy ontologies for image annotation in different taxa. There will be strong emphasis on hands-on exercises that will develop ontology skills and provide exposure to different software applications that are useful in a variety of areas of evolutionary biology.


Integrating the Paleobiology Database (PaleoDB) into our taxonomy workflow

February 14, 2012

In the original Phenoscape project, our focus was on asking comparative questions regarding living taxa. Although we added fossil taxa to the Teleost Taxonomy Ontology (TTO) when our publications included them, we had no general need to add fossil taxa to the contemporary groups provided by the Catalog of Fishes.   However, in our renewal, the focus has both expanded taxonomically (to all vertebrates) and narrowed to the evolution of fins and limbs.   The evolution of limbs from fins occurred over 300 million years ago, meaning the morphological data for this transition exists only in the fossil record.  Therefore, including fossil data and taxonomy has become essential.

These fossil taxa are not available in the major online sources of names, whether taxon-specific, such as Catalog of Fishes, or general, such as Catalog of Life or the NCBI taxonomy. Although NCBI includes some fossil taxa, a taxon is only included when a related molecular sequence is submitted, which will never be the case for the vast majority of fossil taxa. These latter taxa will only ever be represented as morphological remains.

This need for fossil data, along with the absence of names from recognized sources, requires us either to add names (and, we hope, plausible taxonomy) as curators encounter them in papers or to find an alternative source for names of fossil taxa. Although we have added, and will continue to add, fossil taxa to our taxonomy, we do not, and never did, intend to become a name or taxonomy authority in our own right. In light of the strengths and weaknesses of the Phenoscape team, allying with a recognized source of fossil taxonomy seems the best option.

The Paleobiology Database, also called PaleoDB or simply PBDB, is an online repository covering a wide range of paleontological data across all taxa represented in the fossil record. These data include names as well as taxonomic opinions appearing in paleontology publications. These data are available and queryable on the PBDB website and are also available for bulk download. As part of developing the Vertebrate Taxonomy Ontology (VTO), an expansion of the TTO to cover all vertebrates and several chordate groups of interest, I have implemented a tool that adds the content of these bulk downloads to a taxonomy ontology. The process of updating from PBDB was designed to minimize disruption to the existing taxonomy by only adding new taxa from PBDB along with whatever taxonomic lineage is required to link each new taxon to a taxon already known to the existing taxonomy. This way, updating from PBDB does not disrupt any existing taxonomic hierarchy, whether incorporated from other resources or built through prior curators’ efforts.
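The additive update described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual tool: the function name, the child-to-parent dictionaries, and the toy taxon names are all assumptions made for the example, and real PBDB downloads carry much more information (ranks, synonyms, taxonomic opinions) than is modeled here.

```python
def merge_new_taxa(existing_parent, source_parent):
    """Add taxa found only in the source, plus the lineage needed to attach them.

    Both arguments map a taxon name to the name of its parent (None for a
    root). The existing hierarchy is never modified: a taxon that is already
    known keeps its current parent, whatever the source says about it.
    """
    merged = dict(existing_parent)
    for taxon in source_parent:
        if taxon in merged:
            continue  # already known: leave the existing placement untouched
        # Walk up the source lineage until we reach a taxon we already know.
        lineage = []
        current = taxon
        while current is not None and current not in merged and current not in lineage:
            lineage.append(current)
            current = source_parent.get(current)
        if current is None or current not in merged:
            continue  # no link to the existing taxonomy: leave for a curator
        # Attach the missing ancestors first, then the new taxon itself.
        for name in reversed(lineage):
            merged[name] = source_parent[name]
    return merged


# Toy data with made-up names; not real PBDB or VTO content.
existing = {"Vertebrata": None, "Teleostei": "Vertebrata"}
source = {
    "FossilFamilyA": "Vertebrata",
    "FossilGenusA": "FossilFamilyA",
    "Teleostei": "OtherGroup",      # conflicting placement, ignored
    "OrphanGenus": "UnknownGroup",  # lineage never reaches a known taxon
}
merged = merge_new_taxa(existing, source)
```

In this sketch a new taxon whose lineage never reaches the existing taxonomy is simply skipped; whether the real tool skips, reports, or attaches such taxa elsewhere is not stated here.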

However, no taxonomic resource is ever complete. As our team of curators annotates publications, they are encountering fossil taxa unknown to PBDB, and have begun contributing the publication and taxonomy information back to the PBDB. John Alroy and the PBDB board have accepted several project members as authorizers and enterers of data into the PBDB. This allows us to give back to the PBDB as well as simplify the process of adding fossil taxa to our vertebrate taxonomy. We have developed a workflow where a curator can enter publications, names, and taxonomic opinions directly into the PBDB. This immediately makes our additions visible to a wider community and gives us the opportunity to engage expertise we may not have known existed. Subsequent PBDB bulk downloads will include these new names and reflect any changes to the taxonomic opinions entered during curation. These will then be added to the next update of the VTO.


Collaborative editing in Phenex 1.2

February 13, 2012

We have recently released version 1.2.1 of our Phenex annotation software. This release adds some functionality for easier collaborative editing of data files. While our curators have used Subversion revision control software in the past, the new features make it more reliable to share Phenex data files with user-friendly file synchronization software such as Dropbox. While a NeXML document is open in Phenex, the application monitors for changes to the document file in the background. If the file is being shared via Dropbox and is simultaneously edited by someone else, Phenex will alert the user that the file has changed and offer to load the new version. If there are no unsaved edits, Phenex will reload the file automatically. Phenex 1.2 also provides an autosave feature which saves the document after every edit; this reduces the chance that the file might be edited elsewhere while one has unsaved changes, avoiding complicated file merges.
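The reload behavior described above can be illustrated with a small sketch. This is not Phenex code (Phenex is a separate application, and its internals are not shown here); the class name, the polling approach based on file modification time, and the three return values are all assumptions chosen to make the decision logic concrete.

```python
import os


class WatchedDocument:
    """Hypothetical stand-in for an open document shared via a synced folder."""

    def __init__(self, path):
        self.path = path
        self.has_unsaved_edits = False
        self._load()

    def _load(self):
        with open(self.path, encoding="utf-8") as handle:
            self.text = handle.read()
        # Remember the modification time we last saw on disk.
        self._seen_mtime_ns = os.stat(self.path).st_mtime_ns

    def save(self):
        with open(self.path, "w", encoding="utf-8") as handle:
            handle.write(self.text)
        self._seen_mtime_ns = os.stat(self.path).st_mtime_ns
        self.has_unsaved_edits = False

    def edit(self, new_text, autosave=True):
        self.text = new_text
        self.has_unsaved_edits = True
        if autosave:
            self.save()  # saving after every edit keeps the unsaved window short

    def poll(self):
        """Return 'unchanged', 'reloaded', or 'ask_user'."""
        if os.stat(self.path).st_mtime_ns == self._seen_mtime_ns:
            return "unchanged"
        if self.has_unsaved_edits:
            return "ask_user"  # changed elsewhere while we hold unsaved edits
        self._load()           # nothing to lose, so reload silently
        return "reloaded"
```

A background timer would call `poll()` periodically; only the `ask_user` case needs to interrupt the person editing, which is why autosave makes conflicts rarer.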


Teleost Anatomy Ontology adds French terms and synonyms

February 3, 2012

With the help of Phenoscape and DeepFin intern Ben Frable, I recently finished adding 117 French anatomical terms and synonyms from Chanet & Desoutter’s glossary publication [1] to the Teleost Anatomy Ontology (TAO). These authors spent many years defining and translating Paul Chabanaud’s anatomical analyses of flatfishes into modern French and English to help researchers understand his important publications. Adding these terms to the TAO takes their translation one step further, enabling computers to link Chabanaud’s unusual terms to an ontology ID for each anatomical ‘concept’, which in turn enables connections among all phenotypic and related data that reference this ID.

These synonyms can now be used in searches of the Phenoscape Knowledgebase. For example, you can see the French synonyms for ‘paired fin’. One can imagine ultimately being able to select a preferred language or term label when browsing the ontology in the Knowledgebase.
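As a rough illustration of why attaching synonyms to a single ontology ID is useful, the sketch below resolves any label, in any language, to the same ID and then uses that ID to pull together annotations. Everything here is made up for the example: the `EX:` identifiers, the labels, and the annotation strings are placeholders rather than real TAO terms or Knowledgebase data, and the Unicode normalization step is my own assumption about how accented labels might be matched.

```python
import unicodedata
from collections import defaultdict


def _key(label):
    # Normalize so composed and decomposed accented characters compare equal,
    # and ignore case differences.
    return unicodedata.normalize("NFC", label).casefold()


class SynonymIndex:
    """Maps every label or synonym of a term to that term's ID."""

    def __init__(self):
        self._ids_by_label = defaultdict(set)

    def add(self, term_id, label):
        self._ids_by_label[_key(label)].add(term_id)

    def lookup(self, label):
        return set(self._ids_by_label.get(_key(label), set()))


# Placeholder terms: one concept with an English label and a French synonym.
index = SynonymIndex()
index.add("EX:0000001", "paired fin")
index.add("EX:0000001", "nageoire paire")
index.add("EX:0000002", "\u00e9pine")  # accented placeholder label

# Placeholder annotations keyed by term ID.
annotations = {
    "EX:0000001": ["phenotype statement A", "phenotype statement B"],
}


def annotations_for(label):
    found = []
    for term_id in sorted(index.lookup(label)):
        found.extend(annotations.get(term_id, []))
    return found
```

Searching for either the English label or the French synonym returns the same annotations, because both resolve to the same ID before any data is consulted.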

This was the first set of foreign-language terms to be added to the teleost ontology, and we had to tweak the Phenoscape Knowledgebase interface to display the diacritical marks correctly. We are ready to accept more! Please submit anything you’d like added or changed to the TAO term tracker.

[1] Chanet, B., & Desoutter-Meniger, M. (2008). French-English glossary of terms found in Chabanaud’s published works on Pleuronectiformes. Cybium, Electronic Publication no. 1: 1-23.


Notes from TDWG 2011

November 22, 2011

Last month, Hilmar Lapp and I (Jim Balhoff) attended the Biodiversity Information Standards meeting (TDWG 2011) in New Orleans. As a representative of both Phenoscape and the Hymenoptera Anatomy Ontology (HAO) project, I presented a poster, with co-authors Matt Yoder and Andy Deans, detailing an OWL model showing the explicit semantics of linking an Entity–Quality (EQ) phenotype to evolutionary character matrix data and taxonomic specimens. While EQ can be thought of as simple ontological tags on descriptive data, modeling phenotypes within a more explicit logical framework allows us to make use of more powerful automated reasoning. It also provides a consistent interpretation for EQs across projects annotating phenotypes (for example, Phenoscape and HAO).

Of particular relevance to our poster was another presented by Cam Webb. Cam has created an OWL-compatible version of Darwin Core which can be used to describe specimen metadata in RDF. We made similar use of Darwin Core in our poster, but we are looking into adopting Cam’s Darwin-SW for this part of the model.

Overall there was a lot of interest in semantic technologies at TDWG, ranging from the initial meeting of an RDF/OWL working group to other projects that are not using semantic technologies but seem well suited for RDF.


My Experience as a Phenoscape Training Fellow

November 22, 2011

While I was working to describe two species of lizardfish (Synodus) with Carole Baldwin at the Smithsonian National Museum of Natural History, she received an email from Paula Mabee asking if she knew of, or had, any students interested in working on the Phenoscape Project. I had realized that with advances in technology and communication, evolutionary biology, and science in general, was headed towards a future of large-scale interdisciplinary collaborations to help address big questions and make tools and data readily available. Therefore, I immediately jumped on the opportunity to work on Phenoscape!

With the support of funding from DeepFin, I started my internship with Phenoscape at the National Evolutionary Synthesis Center (NESCent) in August 2011. My three months here at NESCent have flown by and even though it is my last day, I am just as excited about the project as the day I started! Working with Wasila Dahdul, Peter Midford and Jim Balhoff has enabled me to learn and understand a great deal about databases, collaboration and morphology. Phenoscape has completely changed the way I think about phenotypic characters. Breaking them down into logical statements in Phenex really allows you to understand a character as it fits in the bigger picture. I was able to work with Wasila in forging interdisciplinary ties by contributing to other ontologies and databases, such as PATO and PaleoDB. Additionally, working to assist in the expansion of Phenoscape to incorporate all vertebrates taught me a lot about the origins of vertebrates and the plethora of prehistoric life I did not realize existed, including my new personal favorite prehistoric fish, Jagorina!

NESCent is an amazing place. Being one of the few people here without a higher degree or a long list of publications under their belt, I was initially a little intimidated. However, the informatics group, post-docs and professors have been great and pushed me to participate in seminars and intellectual discussion. This is a stimulating environment that facilitates thinking outside the box and looking at bigger picture issues in evolutionary biology.

I am excited to continue my work on Phenoscape offsite back at the Smithsonian and I hope to contribute throughout my graduate career in Dr. Brian Sidlauskas’s (former NESCentian and Phenoscape tester and contributor) lab at Oregon State University.

Ben Frable
Graduate Student, Oregon State University
Student Researcher, Smithsonian National Museum of Natural History


Notes from ISWC 2011

November 3, 2011

Last week, I attended the 10th International Semantic Web Conference (ISWC) in Bonn, Germany. A tremendous variety of sophisticated work is going on both in academia and industry to improve the technology for, and take advantage of, the ever-growing network of data and concepts published, through open standards, on the web.

You might say it is the best of times and the worst of times for semantic web enthusiasts, in that reasoning and query engines that can be used on large collections of RDF have in the last few years become a reality (one of the Challenge Tracks provided contestants with a *billion* triples to work with). But some see clouds on the horizon. The web search titans (Bing, Google and Yahoo!) are now pushing schema.org, a microformat and vocabulary standard for web content that some worry may threaten the development of richer semantic web technology. Still, most treated the news positively, happy to know that these organizations now seem to agree on the importance of semantics. In fact, Yahoo! described at the conference how they are trying to build a “Web of Objects” that takes advantage of schema.org, together with more extensive internal vocabularies, to regroup knowledge pieces that are scattered around the Web.

Conference chair Natasha Noy showed a revealing pair of tag clouds comparing the abstracts from the first year of the conference in 2001 to today: the terms “semantic” and “web” have shrunk in importance and “data” is now king!

Ivan Herman’s blog gives a good sampling of the flavor of talks presented at the meeting. I especially enjoyed the Industry Track, since these applications are less familiar to me than the academic/scientific ones, and I was particularly impressed by the importance of semantic technologies to the news media and other content industries. These technologies are being deployed by news organizations with great enthusiasm (e.g. the BBC). I also came away with a strong sense that semantic technologies are helping to create demand for, and drive a revolution in the use of, Open Government Data; there were a number of demonstrations of useful real-world applications, particularly to environmental monitoring.

With my Phenoscape hat on, I attended a Linked Open Data for Science (LISC) satellite workshop prior to the main conference. The event included both presentations and discussions from a variety of perspectives about the opportunities and challenges of this new technology. A diversity of fields were represented (social science, linguistics, geosciences, biomedicine, etc.). But it is clear that uptake of linked open data as an alternative means of publication is still in its infancy within the sciences. This is despite the fact that the bioinformatics data centers account for nearly a quarter of the real estate in the famous linked data cloud diagram. Some of the most exciting opportunities, in my opinion, come from the ability to allow radically decentralized data publication, and this is something that we might wish to pilot in a modestly distributed data curation environment like Phenoscape. Another observation: I was surprised to discover at the meeting how much the utility of the linked data cloud (and, by extension, the semantic web) depends on the social convention by which everyone provides links into a relatively small number of large ‘concept repositories’ like DBPedia (which was originally a Master’s project, BTW).

The breakout discussion sessions at LISC  highlighted how scientific practice will place difficult demands on linked data with respect to provenance, context, granularity, distributed authority, etc.  This resonated with the message of our own contribution to the workshop, which outlined some of the particular challenges in making context-dependent links between scientific objects, when the descriptions of those objects are scattered across different resources, and when the similarities between objects are spread weakly over many properties [1].  Another important question that hit home for a number of us coming from the bioinformatics and biodiversity informatics world is how scientists are going to be able to take advantage of the innovations now going on in the commercial sector (including some of the exhibitors at the main conference) within the constraints and DIY culture of small individual university-based research grants.

There is no denying the explosion in linked data resources out there (comparisons of the growth in the cloud diagram are about as common as graphs showing the growth in sequence data at a biology conference). But another recurrent theme of the meeting was that, unfortunately, much of that content is missing semantics (i.e. ontologies are unused or unavailable for many concepts, and links between content at different endpoints are lacking), and generating semantically annotated triples needs to be easier than it currently is (a message certainly relevant to those of us developing curation tools).

One of the keynotes, from Frank van Harmelen, generated quite a bit of buzz.  He looked back on 10 years of the semantic web, asking what theoretical principles we can learn from the experience so far, and his annotated slides are well worth a look.

The conference was a great mix of different formats. In addition to the keynotes and regular talks, there were a host of workshops and tutorials, challenges, panel discussions (including one billed as a ‘Death Match’), and even a special competition for the best “Outrageous Ideas”. The winner of that one was a proposal to bring linked data to the non-networked portion of humanity. A particularly nice feature of the meeting was the ‘Minute Madness’ preceding the poster session, in which each of the poster presenters gave a short timed pitch to all the attendees – it was a very entertaining and informative way to ‘see’ every poster and allowed everyone to quickly pick out which ones to hit during the session.

For more, see the excellent day-by-day summary of the meeting from Juan Sequeda, where there are links to all the winning presentations and challenge entries.  [Ironically, the conference website is down temporarily while it is being moved, so come back later if the links to the papers hang].  The next ISWC will be November 11-15, 2012 in Boston.

Reference:

[1] Vision T, Blake J, Lapp H, Mabee P, Westerfield M (2011) Similarity Between Semantic Description Sets: Addressing Needs Beyond Data Integration, in Proceedings of the First International Workshop on Linked Science, Bonn, Germany, October 24, 2011, Tomi Kauppinen, Line C. Pouchard, Carsten Kessler (eds), published in CEUR Workshop Proceedings, Volume 783.


Phenoscape visits Xenbase for Anatomy Ontology Update

September 23, 2011

Last month I visited Xenbase and Aaron Zorn’s lab at the Cincinnati Children’s Hospital for a couple of days (August 21-23, 2011) to work with Xenbase curators in preparing the Xenopus Anatomy Ontology (XAO) for its next big release. Xenbase curators Christina James Zorn and VG Ponferrada have been leading the effort, and Erik Segerdell, the ontology development coordinator for the Phenotype RCN and former Xenbase curator, was also visiting for the week and helping with the update. Erik and I provided training in ontology editing and synchronization tools.


ICBO 2011

August 11, 2011

Jim Balhoff and I recently attended the International Conference on Biomedical Ontology (ICBO) held 26-30 July in Buffalo, NY. The conference focused on the use and development of ontologies in the biological and biomedical domains. Of particular interest to Phenoscape were the workshops and tutorials held during the two days before the main conference. Topics included ontology integration, Common Logic, ontology development tools, and using OBO and OWL formats for ontology development and reasoning.

We presented talks at the Facilitating Anatomy Ontology Interoperability workshop. Jim’s talk was on representing taxa as individuals in OWL, an alternative to the common representation of taxa as classes, which facilitates annotation of phenotypic data involving polymorphism and evolutionary reversals.  I presented a lightning talk on the anatomy ontology synchronization requirements for linking evolutionary and model organism phenotypes.  Other presentations from the workshop are available here. We also presented a poster describing the reasoning used in the Phenoscape Knowledgebase.

The main conference included interesting talks on a broad range of topics including the application of ontologies to proteins, diseases, biological mechanisms, and electronic health records. Presentations can be downloaded here.