One of the two main ontologies developed and used by the Phenoscape project is the Teleost Taxonomy Ontology (TTO). Although the Phenoscape project is focused on the Ostariophysi, the TTO covers not just teleosts, but all the species listed in Bill Eschmeyer’s Catalog of Fish. This post will discuss how the current TTO was constructed and the workflow we use to update it. A later posting will discuss the effort to update the ontology to better represent current thinking about the metaphysical status of species and other taxonomic terms.
What’s in the TTO?
The TTO contains slightly over 35,000 terms, of which 30,385 are species, 5,045 are genera, and 542 are families. The ontology is organized as a traditional taxonomy, which is also referred to as a class hierarchy – thus the species Danio rerio is a subclass of the genus Danio, which is a subclass of the family Cyprinidae. In addition to terms from the Catalog of Fish (CoF), we have added a small number of ‘higher level’ taxa, which allow the TTO to have Craniata as its root.
The TTO also contains a set of taxonomic rank terms – ‘genus’, ‘family’, ‘order’, etc. Most of the taxa named in the ontology are tagged with one of these rank terms, using a property called ‘has_rank.’ This structure is virtually identical to the NCBI taxonomy ontology that is periodically generated by Chris Mungall. The current structure is diagrammed in the figure below. I have constructed a separate ontology of rank terms, which will be submitted to the OBO community in the near future.
The TTO also contains over 38,000 taxonomic synonyms, mostly at the species level, but there are also some at the level of genera.
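The structure described above (a subclass hierarchy, a rank tag on each taxon, and synonyms) can be sketched in a few lines of Python. This is only an illustration, not the code behind the TTO: the `Taxon` class, the `EX:` identifiers, the rank values, and the exact stanza layout are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Taxon:
    term_id: str                      # hypothetical identifier, not a real TTO id
    name: str
    parent_id: Optional[str] = None   # target of the is_a (subclass) link
    rank: Optional[str] = None        # hypothetical rank term, e.g. "EX_RANK:species"
    synonyms: List[str] = field(default_factory=list)


def to_obo_stanza(taxon: Taxon) -> str:
    """Render one taxon as an OBO-style [Term] stanza (illustrative layout)."""
    lines = ["[Term]", f"id: {taxon.term_id}", f"name: {taxon.name}"]
    for syn in taxon.synonyms:
        lines.append(f'synonym: "{syn}" RELATED []')
    if taxon.parent_id is not None:
        lines.append(f"is_a: {taxon.parent_id}")
    if taxon.rank is not None:
        lines.append(f"property_value: has_rank {taxon.rank}")
    return "\n".join(lines)


def lineage(term_id: str, taxa: Dict[str, Taxon]) -> List[str]:
    """Follow is_a links upward and return the names from the term to the top."""
    names = []
    current = taxa.get(term_id)
    while current is not None:
        names.append(current.name)
        current = taxa.get(current.parent_id) if current.parent_id else None
    return names


# Toy data using the example from the text; identifiers are made up.
taxa = {
    "EX:1": Taxon("EX:1", "Cyprinidae", rank="EX_RANK:family"),
    "EX:2": Taxon("EX:2", "Danio", parent_id="EX:1", rank="EX_RANK:genus"),
    "EX:3": Taxon("EX:3", "Danio rerio", parent_id="EX:2",
                  rank="EX_RANK:species", synonyms=["Brachydanio rerio"]),
}

print(to_obo_stanza(taxa["EX:3"]))
print(lineage("EX:3", taxa))
```

Walking the is_a links from the species term recovers the genus and family in order, which is all that "class hierarchy" means here.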
We have also consulted with taxonomic experts for several Ostariophysian groups, including Siluriforms, Characiforms, and Cypriniforms. One benefit of the contributions from area experts is the addition of several fossil taxa, particularly for Siluriforms. Fossil taxa are not included in the CoF or in the NCBI taxonomy, which focuses on taxa that have molecular data. Including them is nonetheless consistent with the use of this ontology to support annotation of morphological characters. For taxa that have not entered the TTO as an entry in the CoF, we try to include a DOI or similar identifier as a database cross reference.
For taxa that are mentioned, but are not actually described in a publication (e.g. Abramites sp. appearing in Fink and Fink (1981)), we include the source publication as part of the name, for example: “Abramites sp. (Fink and Fink 1981).”
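As a small illustration of this naming convention, the label can be assembled from the genus and the citing publication. The function name and its parameters are hypothetical, chosen only for this sketch.

```python
def undescribed_taxon_name(genus: str, authors: str, year: int) -> str:
    """Build a label for a taxon that is mentioned but not described in a publication.

    Hypothetical helper: it follows the pattern "<genus> sp. (<authors> <year>)".
    """
    return f"{genus} sp. ({authors} {year})"


print(undescribed_taxon_name("Abramites", "Fink and Fink", 1981))
```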
There is a Wiki page that lists all the changes we have made relative to the last version directly generated from the CoF. That version currently dates from November 2007, though we will be generating an update from the January 2008 CoF in the near future.
Constructing the TTO
The TTO was constructed from a database dump of the Catalog of Fish that was very generously made available to us by Bill Eschmeyer and Stan Blum. The dump consisted of three tables: species, genera, and lineages. Each table consisted of rows of names, with the status of each name and, if appropriate, whether the name was currently valid or a synonym. Each row also indicated the higher level group that subsumes each term. I constructed a tool, ‘TTOUpdate’, to build the OBO format ontology from these three tables.
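To make the shape of that conversion concrete, here is a minimal sketch of how rows from three such tables could be folded into a parent map and a synonym map. It is not the TTOUpdate code, and the row layout (the keys `name`, `status`, `valid_name`, and `parent`) is an assumption; the actual columns of the CoF dump are not described here.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def build_taxonomy(
    species_rows: List[Row], genus_rows: List[Row], lineage_rows: List[Row]
) -> Tuple[Dict[str, Optional[str]], Dict[str, List[str]]]:
    """Collect valid names with their parent group, and synonyms under their valid name."""
    parents: Dict[str, Optional[str]] = {}
    synonyms: Dict[str, List[str]] = defaultdict(list)
    for rows in (lineage_rows, genus_rows, species_rows):
        for row in rows:
            if row["status"] == "valid":
                # A valid name becomes a term; "parent" is the subsuming group.
                parents[row["name"]] = row.get("parent")
            else:
                # A synonym row is attached to the valid name it points to.
                synonyms[row["valid_name"]].append(row["name"])
    return parents, dict(synonyms)


# Toy rows in the assumed layout.
species = [
    {"name": "Danio rerio", "status": "valid", "parent": "Danio"},
    {"name": "Brachydanio rerio", "status": "synonym", "valid_name": "Danio rerio"},
]
genera = [{"name": "Danio", "status": "valid", "parent": "Cyprinidae"}]
lineages = [{"name": "Cyprinidae", "status": "valid", "parent": "Cypriniformes"}]

parents, synonyms = build_taxonomy(species, genera, lineages)
print(parents)
print(synonyms)
```

Each entry in the parent map would become an is_a link in the OBO output, and each synonym list would become synonym lines on the corresponding term.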
The tool is currently capable of doing limited updates from tables in a special format. Otherwise, updates are currently done by hand, though the next version of the TTOUpdate tool should be more flexible. The next version will also support generation of ‘intermediate synonyms’ by extracting taxon names from the history comments. Currently, only names that appear as either current names (which might, however, be marked as synonyms) or original names are extracted from the CoF into the TTO (as either valid term names or synonyms).
Browsing the TTO
We do not have a tool specifically for browsing the TTO, but there are several options for doing so. The TTO is available on NCBO’s BioPortal, though we have been having some difficulty with that site. The TTO is also available as a text file (OBO format), can be visualized as a tree, and can be searched from here.
If you are familiar with OBO-Edit, you can download the text version from any of the usual OBO repositories (e.g., the SourceForge CVS). The CVS browser will also let you see another view of the changes to the TTO, in terms of the tracker items and other requests that each change addresses (I do try to write meaningful commit comments).