Ontology-based text markup tools

January 13, 2016

Efficiently extracting knowledge from the published literature is a challenge faced by many database projects in biology, and many of us are interested in tools that can assist and speed up the task of identifying concepts in free text. I’ve recently used two text markup tools that help in keeping up with the literature and in rapidly developing ontologies. As a participant in the Fifth BioCreative Challenge, in which biocurators test and evaluate text mining systems, I evaluated the EXTRACT bookmarklet tool. EXTRACT was developed for metagenomics data. It provides full-page tagging of terms mapped from environment, disease, taxonomy, and tissue ontologies, and it can also mark up shorter selections of text on an HTML page. The tool is immediately useful, particularly during the first stages of curation, when a curator is surveying the literature for relevant articles.

Annotating long, descriptive text has also been a challenge for Phenoscape. To assist curators in this task, we recently added a text annotator tool to the Phenoscape Knowledgebase that tags selected text passages copied in from a source with matched terms from anatomy (Uberon), taxon (VTO), and quality (PATO) ontologies. Viewing the annotated results, with color-coded text, has aided curators in the process of applying large, complex ontologies to equally complex text.
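The matching method behind the Phenoscape annotator isn't described here, but the basic idea of tagging a passage with ontology terms can be illustrated with a minimal dictionary-based sketch. It assumes a tiny hypothetical lexicon that maps term labels to their source ontology (a real tool would load labels and synonyms from the full Uberon, VTO, and PATO files) and uses greedy longest match, so a multi-word label such as "dorsal fin" is preferred over "fin". The function name `annotate` and the lexicon entries are illustrative only.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: lowercase term label -> source ontology.
# A real annotator would load labels and synonyms from the full ontologies.
LEXICON: Dict[str, str] = {
    "dorsal fin": "Uberon",
    "pectoral fin": "Uberon",
    "fin": "Uberon",
    "danio rerio": "VTO",
    "elongated": "PATO",
    "decreased size": "PATO",
}

# Longest label length in words, bounding the phrases we need to try.
MAX_WORDS = max(len(label.split()) for label in LEXICON)


def annotate(text: str) -> List[Tuple[str, str, int, int]]:
    """Return (matched text, ontology, start, end) spans via greedy longest match."""
    # Split into alphabetic tokens, remembering their character offsets.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"[A-Za-z]+", text)]
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so "dorsal fin" beats "fin".
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for tok, _, _ in tokens[i:i + n]).lower()
            if phrase in LEXICON:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                spans.append((text[start:end], LEXICON[phrase], start, end))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no term starts at this token
    return spans


if __name__ == "__main__":
    for span in annotate("The dorsal fin is elongated in Danio rerio."):
        print(span)
```

For the sample sentence, the sketch returns spans for "dorsal fin" (Uberon), "elongated" (PATO), and "Danio rerio" (VTO), with character offsets that a viewer could use to color-code the text by ontology.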

Half-duck, half-crocodile, and bigger than T. Rex: a giant semiaquatic predatory dinosaur

September 26, 2014

A team led by University of Chicago Phenoscapers Nizar Ibrahim and Paul Sereno has published new findings about the remarkable semiaquatic predatory dinosaur Spinosaurus aegyptiacus in the latest issue of Science. It has been receiving some nice coverage at NPR and other news outlets.

Workers at the National Geographic Museum in Washington grind the rough edges off a life-size replica of a Spinosaurus skeleton. Credit: Mike Hettwer/National Geographic.

From the abstract:

We describe adaptations for a semiaquatic lifestyle in the dinosaur Spinosaurus aegyptiacus. These adaptations include retraction of the fleshy nostrils to a position near the mid-region of the skull and an elongate neck and trunk that shift the center of body mass anterior to the knee joint. Unlike terrestrial theropods, the pelvic girdle is downsized, the hindlimbs are short, and all of the limb bones are solid without an open medullary cavity, for buoyancy control in water. The short, robust femur with hypertrophied flexor attachment and the low, flat-bottomed pedal claws are consistent with aquatic foot-propelled locomotion. Surface striations and bone microstructure suggest that the dorsal “sail” may have been enveloped in skin that functioned primarily for display on land and in water.

Citation: Ibrahim N, Sereno PC, Dal Sasso C, Maganuco S, Fabbri M, Martill DM, Zouhri S, Myhrvold N, Iurino DA (2014) Semiaquatic adaptations in a giant predatory dinosaur. Science. http://doi.org/10.1126/science.1258750.

Integrating the Paleobiology Database (PaleoDB) into our taxonomy workflow

February 14, 2012

In the original Phenoscape project, our focus was on asking comparative questions regarding living taxa. Although we added fossil taxa to the Teleost Taxonomy Ontology (TTO) when our publications included them, we had no general need to add fossil taxa to the contemporary groups provided by the Catalog of Fishes. However, in our renewal, the focus has both expanded taxonomically (to all vertebrates) and narrowed to the evolution of fins and limbs. The evolution of limbs from fins occurred over 300 million years ago, meaning the morphological data for this transition exist only in the fossil record. Therefore, including fossil data and taxonomy has become essential.

These fossil taxa are not available in the major online sources of names, whether taxon-specific, such as Catalog of Fishes, or general, such as Catalog of Life or the NCBI taxonomy. Although NCBI includes some fossil taxa, they are only included when a related molecular sequence is submitted, which will never be the case for the vast majority of fossil taxa. These taxa will only ever be represented as morphological remains.

This need for fossil data, along with the absence of names from recognized sources, requires us either to add names (and, hopefully, plausible taxonomy) as curators encounter them in papers, or to find an alternative source for names of fossil taxa. Although we have added and will continue to add fossil taxa to our taxonomy, we do not intend, and never intended, to become a name or taxonomy authority in our own right. In light of the Phenoscape team’s strengths and weaknesses, allying with a recognized source of fossil taxonomy seems the best option.

The Paleobiology Database (also called PaleoDB, or simply PBDB) is an online repository covering a wide range of paleontological data across all taxa represented in the fossil record. These data include names as well as taxonomic opinions appearing in paleontology publications, and they can be queried on the PBDB website or obtained as bulk downloads. As part of developing the Vertebrate Taxonomy Ontology (VTO), an expansion of the TTO to cover all vertebrates and several chordate groups of interest, I have implemented a tool that adds the content of these bulk downloads to a taxonomy ontology. The update process was designed to minimize disruption to the existing taxonomy: it adds only taxa that are new from PBDB, along with whatever taxonomic lineage is required to link each new taxon to a taxon the existing taxonomy already contains. This way, updating from PBDB does not disrupt any existing taxonomic hierarchy, whether it was incorporated from other resources or resulted from prior curators’ efforts.
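The update rule described above (add only new taxa, plus the lineage needed to attach them) can be sketched as follows. This is a minimal illustration, not the actual VTO build tool: it assumes both taxonomies are reduced to hypothetical name-to-parent dictionaries, matches taxa by name alone (ignoring homonyms and synonymy), and skips any PBDB taxon whose lineage never reaches a taxon already in the existing taxonomy. The function `merge_new_taxa` and the sample names are illustrative and not taken from a real PBDB download.

```python
from typing import Dict, List, Optional

# Hypothetical model: taxon name -> parent name (None for a root).
# Real VTO/PBDB records carry IDs, ranks, and synonyms; this sketch ignores them.
Taxonomy = Dict[str, Optional[str]]


def merge_new_taxa(existing: Taxonomy, pbdb: Taxonomy) -> Taxonomy:
    """Add PBDB taxa missing from `existing`, plus the PBDB lineage needed to
    attach each one to a taxon the existing taxonomy already contains.
    Parent links already in `existing` are never changed."""
    merged = dict(existing)  # copy so the input is not mutated
    for taxon in pbdb:
        if taxon in merged:
            continue  # known taxon: keep its current placement
        # Climb the PBDB lineage until we reach a taxon already in `merged`.
        path: List[str] = []
        node: Optional[str] = taxon
        while node is not None and node not in merged and node not in path:
            path.append(node)
            node = pbdb.get(node)
        if node is None or node not in merged:
            continue  # lineage is disconnected (or cyclic): leave it out
        # Every taxon on the path is a PBDB key, so its PBDB parent is defined.
        for t in path:
            merged[t] = pbdb[t]
    return merged


if __name__ == "__main__":
    existing = {"Vertebrata": None, "Sarcopterygii": "Vertebrata"}
    pbdb = {
        "Sarcopterygii": "Osteichthyes",  # placement differs from existing
        "Elpistostegalia": "Sarcopterygii",
        "Tiktaalik": "Elpistostegalia",
        "Taxon X": "Taxon Y",  # never connects to the existing taxonomy
    }
    print(merge_new_taxa(existing, pbdb))
```

Because existing parent links are copied unchanged and only missing taxa are written, a known taxon such as Sarcopterygii keeps its current placement even when PBDB places it differently, and "Osteichthyes" is never pulled in because no new taxon needs it as a link.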

However, no taxonomic resource is ever complete. As our team of curators annotates publications, they are encountering fossil taxa unknown to PBDB, and they have begun contributing the publication and taxonomy information back to the PBDB. John Alroy and the PBDB board have accepted several project members as authorizers and enterers of data into the PBDB. This allows us to give back to the PBDB as well as simplify the process of adding fossil taxa to our vertebrate taxonomy. We have developed a workflow in which a curator can enter publications, names, and taxonomic opinions directly into the PBDB. This immediately makes our additions visible to a wider community and gives us the opportunity to engage expertise we may not have known existed. Subsequent PBDB bulk downloads will include these new names and reflect any changes to the taxonomic opinions entered during curation. These will then be added to the next update of the VTO.

Phenoscape visits Xenbase for Anatomy Ontology Update

September 23, 2011

Last month I visited Xenbase and Aaron Zorn’s lab at the Cincinnati Children’s Hospital for a couple of days (August 21-23, 2011) to work with Xenbase curators in preparing the Xenopus Anatomy Ontology (XAO) for its next big release. Xenbase curators Christina James Zorn and VG Ponferrada have been leading the effort, and Erik Segerdell, the ontology development coordinator for the Phenotype RCN and former Xenbase curator, was also visiting for the week and helping with the update. Erik and I provided training in ontology editing and synchronization tools.

ICBO 2011

August 11, 2011

Jim Balhoff and I recently attended the International Conference on Biomedical Ontology (ICBO) held 26-30 July in Buffalo, NY. The conference focused on the use and development of ontologies in the biological and biomedical domains. Of particular interest to Phenoscape were the workshops and tutorials held during the two days before the main conference. Topics included ontology integration, Common Logic, ontology development tools, and using OBO and OWL formats for ontology development and reasoning.

We presented talks at the Facilitating Anatomy Ontology Interoperability workshop. Jim’s talk was on representing taxa as individuals in OWL rather than as classes (the more common representation), an approach that facilitates annotation of phenotypic data involving polymorphism and evolutionary reversals. I presented a lightning talk on the anatomy ontology synchronization requirements for linking evolutionary and model organism phenotypes. Other presentations from the workshop are available here. We also presented a poster describing the reasoning used in the Phenoscape Knowledgebase.

The main conference included interesting talks on a broad range of topics including the application of ontologies to proteins, diseases, biological mechanisms, and electronic health records. Presentations can be downloaded here.

Phenoscape outreach from Chicago

May 4, 2010

In mid-March, Phenoscape met at the Biodiversity Synthesis Center (BioSynC) at the Field Museum of Natural History to host an education and outreach workshop, gather feedback on new user interface mockups for the Phenoscape Knowledgebase, and hold a project meeting. We sent announcements to Chicago-area morphologists, systematists, and developmental biologists. Because we are designing tools to address the general needs of the systematics and evo-devo community, we were delighted that about 20 people attended our day-long meeting. Their input was very useful, and we have translated it into specifications for a better user interface (in the works right now). Big thanks to Mark Westneat and the BioSynC staff for their help in organizing this workshop, and to Lance Grande for the superb behind-the-scenes tour. Please see our wiki page about this workshop for the list of speakers and their slides.