The Vertebrate Taxonomy Ontology

January 25, 2014

Our paper describing the Vertebrate Taxonomy Ontology (VTO) is published! See: http://www.jbiomedsem.com/content/4/1/34

One primary objective for Phenoscape and similar projects is to aggregate phenotypic data from multiple studies to named taxa, which in many phylogenetic studies are species but might also be higher taxonomic levels such as genera or families. While there are many taxonomies that include rich sampling of species and higher taxa, for example Bill Eschmeyer’s widely used Catalog of Fishes, there are few vetted “bridging” taxonomies that allow for aggregating data across, say, fishes, amphibians, and mammals. This problem becomes even more acute when you consider integrating data for extinct taxa as well. As a first step towards addressing this issue for vertebrates, we created the Vertebrate Taxonomy Ontology (VTO), which brings together taxonomies from NCBI, AmphibiaWeb, the Catalog of Fishes (via the previously existing Teleost Taxonomy Ontology), and the Paleobiology Database. The resulting curated taxonomy contains more than 106,000 terms, more than 104,000 additional synonyms, and extensive cross-referencing to these existing taxonomies. The Phenoscape Knowledgebase will leverage this taxonomic ontology by allowing phenotype statistics to be displayed by taxon, including coarse measures of the extent of annotation coverage and phenotypic variation. Though phenotypes may be annotated to a species, the use of an ontological framework for the taxonomic hierarchy facilitates aggregating phenotypes to higher levels, such as genera or families. In the future, we hope to be able to integrate other excellent and rich sources of taxon-specific taxonomies, such as the Reptile Database or the International Ornithologists’ Union Bird List. This is a work in progress, and the Phenoscape team is certainly interested in integrating new taxonomic sources as well as exploring different ways that such a resource can be used and developed by the larger community.
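To make the aggregation idea concrete, here is a minimal sketch of rolling species-level annotations up a taxonomic hierarchy. It is not the Phenoscape Knowledgebase implementation: the `PARENT` map, the function names, and the placeholder phenotype labels are all assumptions made for illustration, and a real hierarchy would be loaded from the VTO itself.

```python
from collections import defaultdict

# Hypothetical child -> parent links standing in for a taxonomy hierarchy.
PARENT = {
    "Danio rerio": "Danio",
    "Danio albolineatus": "Danio",
    "Danio": "Cyprinidae",
    "Cyprinidae": None,
}


def ancestors(taxon, parent=PARENT):
    """Yield the taxon itself, then each ancestor up to the root."""
    while taxon is not None:
        yield taxon
        taxon = parent.get(taxon)


def aggregate(annotations, parent=PARENT):
    """Map each taxon to the phenotypes annotated to it or to any descendant."""
    rolled_up = defaultdict(set)
    for taxon, phenotype in annotations:
        for node in ancestors(taxon, parent):
            rolled_up[node].add(phenotype)
    return dict(rolled_up)


# Placeholder annotations; the phenotype labels are invented for the example.
annotations = [
    ("Danio rerio", "phenotype A"),
    ("Danio albolineatus", "phenotype B"),
]

for taxon, phenotypes in sorted(aggregate(annotations).items()):
    print(taxon, len(phenotypes))
```

Counting the size of each set gives the kind of coarse per-taxon coverage measure mentioned above, at the genus or family level as well as for individual species.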

 


Integrating the Paleobiology Database (PaleoDB) into our taxonomy workflow

February 14, 2012

In the original Phenoscape project, our focus was on asking comparative questions regarding living taxa. Although we added fossil taxa to the Teleost Taxonomy Ontology (TTO) when our publications included them, we had no general need to add fossil taxa to the contemporary groups provided by the Catalog of Fishes.   However, in our renewal, the focus has both expanded taxonomically (to all vertebrates) and narrowed to the evolution of fins and limbs.   The evolution of limbs from fins occurred over 300 million years ago, meaning the morphological data for this transition exists only in the fossil record.  Therefore, including fossil data and taxonomy has become essential.

These fossil taxa are not available in the major online sources of names, whether taxon-specific, such as the Catalog of Fishes, or general, such as the Catalog of Life or the NCBI taxonomy. Although NCBI includes some fossil taxa, a taxon is only included when a related molecular sequence is submitted, which will never be the case for the vast majority of fossil taxa. These latter taxa will only ever be represented by morphological remains.

This need for fossil data, along with the absence of names from recognized sources, requires us either to add names (and, we hope, plausible taxonomy) as curators encounter them in papers, or to find an alternative source for names of fossil taxa. Although we have added and will continue to add fossil taxa to our taxonomy, we do not, and did not, intend to become a name or taxonomy authority in our own right. In light of the strengths and weaknesses of the Phenoscape team, allying with a recognized source of fossil taxonomy seems the best option.

The Paleobiology Database, also called PaleoDB or simply PBDB, is an online repository covering a wide range of paleontological data across all taxa represented in the fossil record. These data include names as well as taxonomic opinions appearing in paleontology publications. They can be queried on the PBDB website and are also available for bulk download. As part of developing the Vertebrate Taxonomy Ontology (VTO), an expansion of the TTO to cover all vertebrates and several chordate groups of interest, I have implemented a tool that adds the content of these bulk downloads to a taxonomy ontology. The process of updating from PBDB was designed to minimize disruption to the existing taxonomy: it adds only new taxa from PBDB, along with whatever taxonomic lineage is required to link each new taxon to a taxon already present in the existing taxonomy. This way, updating from PBDB does not disrupt any existing taxonomic hierarchy, whether it was incorporated from other resources or resulted from prior curators’ efforts.
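The update rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual tool: both taxonomies are reduced to hypothetical name-to-parent dictionaries, `merge_new_taxa` is an invented name, and the source lineage is assumed to contain no cycles.

```python
def merge_new_taxa(existing, source):
    """Return a copy of `existing` extended with taxa found only in `source`.

    Both arguments map a taxon name to its parent name (None for a root).
    """
    merged = dict(existing)
    for taxon in source:
        if taxon in merged:
            continue  # never alter a taxon the existing taxonomy already has
        # Walk up the source lineage until reaching a taxon already known.
        lineage = []
        node = taxon
        while node is not None and node not in merged:
            lineage.append(node)
            node = source.get(node)
        if node is None:
            continue  # no link to the existing taxonomy; skip rather than guess
        for name in lineage:
            merged[name] = source[name]
    return merged


# Placeholder names; none of these are real taxa.
existing = {"Root": None, "Family A": "Root", "Genus A": "Family A"}
source = {
    "Family A": "Some other root",  # conflicts with existing, so it is ignored
    "Subfamily X": "Family A",
    "Fossil genus": "Subfamily X",
    "Orphan": "Unknown clade",      # lineage never reaches a known taxon
}

merged = merge_new_taxa(existing, source)
print(merged)
```

In this sketch the existing placement of "Family A" is kept, "Subfamily X" and "Fossil genus" are attached beneath it, and "Orphan" is left out because nothing links it to a known taxon.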

However, no taxonomic resource is ever complete. As our team of curators annotates publications, they are encountering fossil taxa unknown to PBDB, and have begun contributing the publication and taxonomy information back to the PBDB. John Alroy and the PBDB board have accepted several project members as authorizers and enterers of data into the PBDB. This allows us to give back to the PBDB as well as simplify the process of adding fossil taxa to our vertebrate taxonomy. We have developed a workflow where a curator can enter publications, names, and taxonomic opinions directly into the PBDB. This immediately makes our additions visible to a wider community and gives us the opportunity to engage expertise we may not have known existed. Subsequent PBDB bulk downloads will include these new names and reflect any changes to the taxonomic opinions entered during curation. These will then be added to the next update of the VTO.


What’s new in TTO

July 19, 2010

In past months, the TTO (Teleost Taxonomy Ontology) has undergone some changes that will, we hope, make it more useful by connecting it with other taxonomic resources. Here, I will discuss three changes that have been made since last January, but check back, as more (and important) connections will be coming soon.

When the TTO was first built, we followed the pattern of the NCBI taxonomic ontology that was generated from the NCBI taxonomy database. One design feature of this ontology was the inclusion of terms for taxonomic ranks (e.g., family, genus, etc.) as a separate ‘tree’ of terms within the same ontology. The ontology file contained two root nodes, one for taxon terms, the other for taxonomic rank terms. We had long felt that ranks should exist in a separate ontology (more correctly, a vocabulary) that could be shared across ontologies for different taxonomic groups. After several rounds of discussion on the obo-discuss list, we were invited in January to add the taxonomic rank vocabulary to the OBO library of ontologies of interest.

This acceptance allowed us both to register the rank vocabulary and to finally strip out the tree of rank terms from TTO and replace the internal rank tags with ‘has_rank’ links to terms in the (external) rank vocabulary. However, the new rank vocabulary is more than just the set of ranks that we used in tagging taxa in TTO.  The rank vocabulary  incorporates rank terms from two additional sources: first the rank terms that appear in the NCBI taxonomy itself, and also terms from a rank vocabulary developed for TDWG.  We hope that other taxonomic ontologies will be able to make use of this vocabulary.
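As a rough picture of the resulting design, the sketch below keeps ranks in a separate, shared vocabulary and has each taxon term point to a rank by identifier. The identifiers, the dictionary layout, and the helper name are hypothetical and do not reproduce the actual TTO or rank vocabulary files.

```python
# Shared rank vocabulary, kept apart from any one taxonomy (hypothetical IDs).
RANKS = {
    "RANK:family": "family",
    "RANK:genus": "genus",
    "RANK:species": "species",
}

# Taxon terms carry a 'has_rank' link instead of an internal rank tag.
TERMS = [
    {"id": "EX:1", "name": "Example family", "has_rank": "RANK:family"},
    {"id": "EX:2", "name": "Example genus", "has_rank": "RANK:genus"},
    {"id": "EX:3", "name": "Unranked clade"},  # a term may have no rank at all
    {"id": "EX:4", "name": "Mislabelled term", "has_rank": "RANK:tribe"},
]


def dangling_rank_links(terms, ranks):
    """Return ids of terms whose has_rank target is missing from the vocabulary."""
    missing = []
    for term in terms:
        rank = term.get("has_rank")
        if rank is not None and rank not in ranks:
            missing.append(term["id"])
    return missing


print(dangling_rank_links(TERMS, RANKS))
```

Because the vocabulary lives outside the taxonomy, a second taxonomic ontology could reuse `RANKS` unchanged, and a check like this one flags links to ranks the shared vocabulary does not yet define.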

More recently, we have gone back to the NCBI taxonomy and added cross references between our terms and lexically identical names in NCBI. As TTO’s names are mostly drawn from the Catalog of Fishes, the exact relation between TTO terms and NCBI names is, in some cases, not clear, which led to the decision to leave the relationship at the level of a cross reference.
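Matching on lexically identical names amounts to an exact string lookup, roughly as sketched here. The identifiers and names are placeholders, and `lexical_xrefs` is an invented helper rather than the code used to build the TTO release.

```python
def lexical_xrefs(tto_names, ncbi_names):
    """Pair each TTO id with the NCBI ids whose name is exactly the same string."""
    ncbi_by_name = {}
    for ncbi_id, name in ncbi_names.items():
        ncbi_by_name.setdefault(name, []).append(ncbi_id)
    return {
        tto_id: ncbi_by_name[name]
        for tto_id, name in tto_names.items()
        if name in ncbi_by_name
    }


# Placeholder identifiers and names, not real TTO or NCBI records.
tto_names = {"TTO:ex1": "Genus alpha", "TTO:ex2": "Genus beta"}
ncbi_names = {"NCBI:ex10": "Genus alpha", "NCBI:ex11": "Genus gamma"}

print(lexical_xrefs(tto_names, ncbi_names))
```

The result records only that two names are spelled identically, which is why it suits a cross reference and not a stronger claim of equivalence between the two terms.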

In the same release (156), common names contributed by FishBase were added as synonyms. As of now, approximately 16,000 taxa have common names with cross references back to their source in FishBase. We hope to be able to add more common names and eventually attach appropriate language tags to these names.

I’ve already started work on our next integration target, but I’ll save that for a later post.


Adding Instances To Ontologies

August 24, 2008

As we proposed in an earlier post, we have been developing an alternative to the traditional approach of representing taxonomy in ontologies. This alternative represents species (and currently higher taxa) as individuals in the ontology.

There is another Phenoscape ontology that would benefit from the use of individuals: our ontology (ok, it’s really a vocabulary) of research collections of fish. Part of our process for curating anatomy papers involves constructing a list of all the specimens reported in the paper (generally the author includes this in the paper, but we enter it to facilitate annotation). The specimen lists consist of collection names and numbers. Although research collections are supposed to have standardized names and abbreviations, practice does not always reflect these standards: the same collection may be abbreviated in different ways by different authors, and collections occasionally merge, so that smaller collections may disappear into larger ones or be renamed.

So we constructed a vocabulary of fish collections with their 4-5 letter abbreviations, longer names, and possible synonyms. Now, a research collection contains individual specimens, but these are parts of a collection, not subtypes of a particular collection, and therefore the collection is best represented as an individual, not a class. So the collections vocabulary, represented as an OBO ontology, should ideally use OBO instances (which exist to represent individuals) rather than terms for the collections.
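The distinction can be mirrored in ordinary code: a collection is one object with labels and synonyms, and a specimen refers to the collection it is part of instead of subclassing it. The following sketch is only an analogy for the modelling choice, with an invented collection and invented class and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """One research collection, modelled as an individual, not a class."""
    abbreviation: str
    name: str
    synonyms: tuple = ()


@dataclass(frozen=True)
class Specimen:
    """A specimen is part of a collection; it is not a subtype of it."""
    collection: Collection
    catalog_number: str


def build_index(collections):
    """Map every known label (case-insensitively) to its collection."""
    index = {}
    for collection in collections:
        labels = (collection.abbreviation, collection.name, *collection.synonyms)
        for label in labels:
            index[label.casefold()] = collection
    return index


# A made-up collection with a variant abbreviation seen in the literature.
emnh = Collection("EMNH", "Example Museum of Natural History", ("ExMNH",))
index = build_index([emnh])

specimen = Specimen(index["exmnh"], "12345")
print(specimen.collection.abbreviation)
```

Looking up a variant abbreviation returns the same `Collection` object as the standard one, which is the behaviour the synonym lists are meant to provide when different authors abbreviate a collection differently.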

Unfortunately, the current version of OBO-Edit, although it will read and save ontologies containing individuals, provides no facility for either viewing or editing them. Furthermore, although there are several OWL editors that would allow us to work with individuals, there are no OBO<->OWL translators that understand what to do with OWL individuals, so they are simply omitted in the OBO translation, even though the OBO file format has supported Instance ‘stanzas’ since version 1.2.


Taxonomy as ontology: opening up the debate

May 15, 2008

We have created a new mailing list, obo-taxonomy, under the OBO (Open Biomedical Ontologies) umbrella. Our motivation for this new forum is to really open up the discussion surrounding the issues of what should be a proper ontological representation of taxonomy and phylogeny, for example proper semantics of the relationship between taxonomic groups, and between specimens and species. If you care about or have thoughts or opinions on these and related questions, we encourage you to subscribe to this new list.



The Teleost Taxonomy Ontology

May 14, 2008

One of the two main ontologies developed and used by the Phenoscape project is the Teleost Taxonomy Ontology (TTO). Although the Phenoscape project is focused on the Ostariophysi, the TTO covers not just teleosts, but all the species listed in Bill Eschmeyer’s Catalog of Fishes. This post will discuss how the current TTO was constructed and the workflow we use to update it. A later post will discuss the effort to update the ontology to better represent current thinking about the metaphysical status of species and other taxonomic terms.



Teleost Anatomy and Taxonomy Ontologies on-line at the NCBO BioPortal

January 31, 2008

The Teleost Anatomy Ontology (TAO) and the Teleost Taxonomy Ontology (TTO) are finally on-line and searchable on the NCBO BioPortal.

The ontologies had already been deposited into the OBO versioning system in November, but a database loading problem prevented them from working in the BioPortal browser until now.