Postdoctoral Opportunity: Semantic Reasoning for Biological Phenotypes

July 28, 2011

We seek a postdoctoral researcher in computational biology for Phenoscape.  This person will contribute to two important research strands within the project:

  1. Development of computational and statistical methodology for measuring semantic similarity between sets of phenotypes, in order to support searches within extremely large phenotype datasets.
  2. Development and testing of methods for automatically generating ontologically based phenotype expressions from structured excerpts of natural language.

The position is based in the informatics group at the National Evolutionary Synthesis Center (NESCent), and will be administered through the University of North Carolina at Chapel Hill (UNC-CH) under the supervision of Hilmar Lapp at NESCent and Dr. Todd Vision at UNC-CH. The research will be in collaboration with Dr. Chris Mungall at Lawrence Berkeley National Lab and Dr. Hong Cui at the University of Arizona. The project also includes biologists and bioinformaticists from the University of South Dakota, the University of Chicago, and the University of Kansas, in addition to the model organism databases for mouse (MGD), zebrafish (ZFIN), and Xenopus (Xenbase).

Applicants should have a PhD in bioinformatics, computational biology or a related field. Prior experience with machine reasoning using ontologies is strongly preferred. The position is for two years, pending satisfactory performance and availability of funds.  To apply, please provide a cover letter, CV, and contact information for three references.  Inquiries and applications may be sent to Hilmar Lapp at hlapp@nescent.org.  The post is open immediately and will remain open until filled.


CSHALS 2011

March 9, 2011

I recently attended the Conference on Semantics in Healthcare and Life Sciences (CSHALS), in Cambridge, MA. The CSHALS meeting was a change for me in that it’s much more healthcare-oriented than other venues in which I’ve presented work from Phenoscape. This was a great opportunity to see how far the healthcare community has pushed semantic web technologies, and also to become more familiar with some of the more commercial packages which are available for storing and querying very large knowledgebases based on RDF (for example, AllegroGraph and Gruff from Franz, Inc., and Sentient Knowledge Explorer from IO Informatics). A particularly interesting talk was the keynote by Toby Segaran, of Metaweb Technologies, advocating semantic techniques as a more agile approach to data. Slideshows from the conference presentations are available for download here, including my own.


Third beta release of Phenoscape Knowledgebase 2.0

February 7, 2011

Phenoscape Knowledgebase 2.0 beta release 3 is now available at http://kb.phenoscape.org/. This version includes an enhanced publication info interface [example] which displays the original character matrix, as well as a list of taxa including the taxonomic names and museum specimens used in the dataset. Other recent developments in the KB are global term info popups and hierarchical browsing of ontology terms on their info pages [example].

We have also improved our software and data release processes so that the public Knowledgebase can easily keep up with new developments and the latest data updates from our curators. Looking forward, the next major feature to be added to the Knowledgebase is a faceted browsing interface which is currently under development. This interface should help provide an overview of how the data are organized within the various ontologies used in the Knowledgebase.


Introducing the Vertebrate Anatomy Ontology

January 12, 2011

The Vertebrate Anatomy Ontology (VAO) was recently developed as a high-level, bridging ontology for existing and future single species (e.g., zebrafish, mouse, Xenopus) and multispecies (teleosts, amphibians) vertebrate ontologies. We initiated VAO at a Phenoscape workshop held at NESCent in April 2010. VAO was developed to accommodate the various ways that biologists classify bones and cartilages, as distinct elements and tissue types, and based on developmental and locational criteria. After substantial review by experts in comparative anatomy, paleontology, systematics, and anatomy ontologies, VAO was submitted to the Open Biological and Biomedical Ontologies (OBO) Foundry and committed in December 2010. The ontology currently contains 127 defined terms and 63 synonyms for cells, tissues, skeletal elements, skeletal system parts, and biological processes. Cross references to several existing ontologies (Cell Type Ontology, Common Anatomy Reference Ontology, GO Biological Process) are included, thus connecting vertebrate ‘sub’ ontologies to a wealth of additional data. A manuscript detailing the VAO and evaluating the benefits of its use is in preparation.


Matching Phenotypes

December 17, 2010

An important goal for the Phenoscape project is to be able to suggest candidate genes that may have contributed to evolutionary change.  The way that we have proposed to do this is to search for changes in phenotype that appear as the result of mutations in model organisms and also appear as phenotype changes on an evolutionary tree.  There are several challenges in designing this search, apart from simply recognizing similar phenotypes, that we have been working on during the past few months.

The first issue is that we are interested in changes in phenotype, not simply matching phenotypes. For phenotypes associated with model organism mutants, it is understood that they vary with respect to the wild type. For taxa, however, this means looking for taxonomic nodes where variation in a phenotype is observed among the children of the node. For example, there are nine species within the genus Aspidoras with annotations for the shape of the opercle bone. Of these, eight exhibit opercle bones with round shape, but the ninth (A. pauciradiatus) is annotated with a triangular opercle. In contrast, all three annotated species of the related Hoplosternum are annotated with a triangular opercle. Thus there is detectable variation in opercle shape within the children of Aspidoras, but not within Hoplosternum – suggesting that change in opercle shape has occurred somewhere among the descendants of Aspidoras. For our analysis, identifying variation among descendants is important.
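To make the idea concrete, here is a minimal Python sketch of the check described above. It is not the Knowledgebase's actual implementation: the tuple layout, the function name, and every species label other than A. pauciradiatus are placeholders invented for illustration. Only the counts and shapes come from the example above.

```python
from collections import defaultdict

# Hypothetical annotations: (parent taxon, species, opercle shape).
# Species labels other than A. pauciradiatus are placeholders.
annotations = (
    [("Aspidoras", f"Aspidoras sp. {i}", "round") for i in range(1, 9)]
    + [("Aspidoras", "A. pauciradiatus", "triangular")]
    + [("Hoplosternum", f"Hoplosternum sp. {i}", "triangular") for i in range(1, 4)]
)


def nodes_with_variation(rows):
    """Return {parent: set of states} for parents whose children differ."""
    states = defaultdict(set)
    for parent, _species, state in rows:
        states[parent].add(state)
    # Keep only the nodes where more than one state was observed.
    return {parent: s for parent, s in states.items() if len(s) > 1}


variable = nodes_with_variation(annotations)
```

With these rows, only Aspidoras is reported as a node showing variation. Hoplosternum is dropped because all of its annotated children share one state.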

Thus, our search for shared variation in phenotypes focuses on matching phenotypes associated with genes with phenotypes of taxa showing variation. However, we are looking for matches at a larger scale than single phenotypes; we are looking for matches across the set of phenotypes affected by a gene or the set of features that have changed among the descendants of a taxonomic node. We refer to these sets of phenotypes as the ‘phenotypic profile’ of a gene or taxon, following a seminal paper by Washington et al. 2009. Washington et al. propose four metrics (three based on ‘information content’) to score matches between the sets of phenotypes in a pair of profiles.
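For readers unfamiliar with ‘information content’, the following Python sketch shows the general idea on a toy hierarchy. It is an illustration under stated assumptions, not a reproduction of any of the four metrics in Washington et al.: the terms, the annotation corpus, and the function names are all hypothetical, and the score shown (the information content of the most informative shared ancestor) is just one common way of using IC to compare two terms.

```python
import math

# Toy is_a hierarchy (child -> parent); all terms are hypothetical.
PARENT = {
    "round": "shape",
    "triangular": "shape",
    "shape": "quality",
    "size": "quality",
    "quality": None,
}


def ancestors(term):
    """Return the term together with all of its is_a ancestors."""
    found = set()
    while term is not None:
        found.add(term)
        term = PARENT[term]
    return found


def information_content(term, annotations):
    """IC(term) = -log2(fraction of annotations to term or its descendants).

    Assumes the term (or a descendant) is annotated at least once.
    """
    hits = sum(1 for a in annotations if term in ancestors(a))
    return -math.log2(hits / len(annotations))


def max_ic_of_common_subsumer(a, b, annotations):
    """Score a pair of terms by their most informative shared ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(t, annotations) for t in common)


corpus = ["round", "round", "triangular", "size"]
score = max_ic_of_common_subsumer("round", "triangular", corpus)
```

Rarely used terms carry more information than terms that cover most annotations, so a match on a specific shared ancestor scores higher than a match that only meets at the root. Scoring whole profiles would require combining such pairwise scores across the two sets of phenotypes, which this sketch leaves out.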

In the course of developing the search, we have encountered several important differences in curation approach between ZFIN and Phenoscape. In some cases there are different uses of PATO to model the same phenotype, for example the absence of an entity. In other cases ZFIN uses a quality ‘abnormal’ that applies to mutants, but not in a taxonomic, comparative sense, which means these phenotypes will be inaccessible to us. Thus, implementing this search is helping us to better understand our data and our choices in modeling the data and how it interoperates with other ontology-based data. Such reflection would have been difficult or impossible without the use of ontologies to represent the phenotypes.


Phenotype RCN announced

November 24, 2010

NSF has recently funded a Research Coordination Network for researchers who are interested in searching and comparing phenotypes across species and in developing the tools and methods needed to make this possible (http://phenotypercn.org). The representation of morphology, behavior and other phenotypic features using computational methods such as ontologies and controlled vocabularies is in its infancy. Integrating phenotypes with data across all levels of the biological hierarchy, however, is possible if standards are co-developed and coordinated.

This RCN envisions building a broad base of community knowledge and resources so as to maximize the research potential of web-based data. Funding for participation in meetings, presentations and laboratory exchanges for students, postdocs and faculty from ontology and taxonomic domains (initially plants, arthropods, and vertebrates) is available through the RCN (see http://phenotypercn.org/opportunities/).

We are eager to have you join us!  Please sign up for our participant and mailing lists for further information (http://phenotypercn.org/participants/add/) and feel free to contact one of the PIs (Paula Mabee, pmabee@usd.edu; Andy Deans, andy_deans@ncsu.edu; Eva Huala, huala@acoma.stanford.edu; and Suzanna Lewis, selewis@lbl.gov).


Lapp gives NCBO webinar for Phenoscape

November 17, 2010

Hilmar Lapp gave a great overview today of the ongoing work in the Phenoscape project to 29 participants in the NCBO Webinar series.  This series showcases new projects, technologies and ideas in biomedical ontology, many of which use ontologies for interoperability.  Hilmar presented the biological context (evolution, conservation, development, etc.) into which our work fits, and the challenges involved in representing phenotype.  A videorecording of his talk will be posted in case you missed it.

Update: The slides are also posted on Slideshare


Phenoscape and colleagues meet with PATO on ontology and phenotype representation issues, Sept. 25-27, 2010

November 12, 2010

At the end of September, members of Phenoscape (Mabee, Balhoff), the Hymenoptera Anatomy Ontology (HAO) project (Yoder, Deans, Seltmann) and TAIR (Huala) met with developers of the Phenotype and Trait Ontology (PATO) (Gkoutos, Mungall, Westerfield, Lewis) at the University of Oregon. Our discussions focused on finding solutions to problems that have arisen as a result of PATO ontology structure, and problems for representing phenotypes in the EQ model, which have arisen in the course of annotating comparative phenotype data from the fish and hymenoptera literature. We prepared for this meeting by developing a list of common issues and, importantly, specific examples, on a Google doc shared among participants. We all co-edited this document during the meeting with notes, decisions and examples, and we ‘published’ this Google doc for you all to see. A number of important changes to the PATO hierarchy were proposed and subsequently made. We also clarified best practices for modelling some common but tricky phenotypic features. One additional outcome was the participants’ strong recommendation that a ‘shape jamboree’ be held to improve the usability of this branch of the PATO ontology.


What’s new in TTO

July 19, 2010

In past months, the TTO (Teleost Taxonomy Ontology) has undergone some changes that will, we hope, make it more useful by connecting it with other taxonomic resources. Here, I will discuss three changes that have been added since last January, but check back, as more (and important) connections will be coming soon.

When the TTO was first built, we followed the pattern of the NCBI taxonomic ontology that was generated from the NCBI taxonomy database. One design feature of this ontology was the inclusion of terms for taxonomic ranks (e.g., family, genus, etc.) as a separate ‘tree’ of terms within the same ontology. The ontology file contained two root nodes, one for taxon terms, the other for taxonomic rank terms. We had long felt that ranks should exist in a separate ontology (more correctly a vocabulary) that could be shared across ontologies for different taxonomic groups. After several rounds of discussion on the obo-discuss list, we were invited in January to add the taxonomic rank vocabulary to the OBO library of ontologies of interest.

This acceptance allowed us both to register the rank vocabulary and to finally strip out the tree of rank terms from TTO and replace the internal rank tags with ‘has_rank’ links to terms in the (external) rank vocabulary. However, the new rank vocabulary is more than just the set of ranks that we used in tagging taxa in TTO.  The rank vocabulary  incorporates rank terms from two additional sources: first the rank terms that appear in the NCBI taxonomy itself, and also terms from a rank vocabulary developed for TDWG.  We hope that other taxonomic ontologies will be able to make use of this vocabulary.

More recently, we have gone back to the NCBI taxonomy and added cross references between our terms and lexically identical names in NCBI. As TTO’s names are mostly drawn from the Catalog of Fishes, the exact relation between TTO terms and NCBI names is not, in some cases, clear, which led to the decision to leave the relationship at the level of a cross reference.
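As a rough illustration of what matching on lexically identical names means, here is a small Python sketch. It is not the script used to build the TTO release: the identifiers are placeholders rather than real TTO or NCBI IDs, and the function name is hypothetical.

```python
# Placeholder identifiers, for illustration only (not real TTO or NCBI IDs).
tto_terms = {
    "TTO:0000001": "Danio rerio",
    "TTO:0000002": "Aspidoras pauciradiatus",
}
ncbi_names = {
    "Danio rerio": "NCBITaxon:0000001",
}


def lexical_xrefs(tto, ncbi):
    """Map each TTO ID to an NCBI ID when the names are identical strings."""
    return {tid: ncbi[name] for tid, name in tto.items() if name in ncbi}


xrefs = lexical_xrefs(tto_terms, ncbi_names)
```

A match here says only that the two name strings are the same. It makes no claim that the two sources mean the same taxon concept, which is why the link is recorded as a cross reference rather than as a stronger relation.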

In the same release (156), common names, contributed by FishBase, were added as synonyms. As of now, approximately 16,000 taxa have common names with cross references back to their source in FishBase. We hope to be able to add more common names and eventually include appropriate language tags to these names.

I’ve already started work on our next integration target, but I’ll save that for a later post.


2010 Semantic Web Workshop

June 16, 2010

I recently attended the 2010 Semantic Web Workshop in Santa Fe, hosted by the SSWAP project and iPlant, at St. John’s College.  This was a two-day workshop, June 7-8, introducing semantic web technologies and applications to biological data and service integration.  The first day was scheduled to be a whirlwind overview of semantic web technologies, beginning with a lecture on the foundations of web logic and reasoning in classic formal logic and moving through RDF, RDFS, and OWL.  However, air travel problems led me to miss the entire first day of the workshop.  Fortunately Damian Gessler, the workshop organizer, provided me with all the slides for the first day upon my arrival, and I was able to somewhat catch up before day 2.  These slides are really a great overview of semantic web technologies and will be a useful resource.

The second day focused on applications to biological data and web services.  A discussion on “taxonomic intelligence” was particularly illuminating.  It provided an example of how different communities can share a set of identifiers for species, for example, yet provide their own set of statements about the taxonomy relating those species.  Each community can draw conclusions relevant to its preferred taxonomy using data associated with the same species.

The afternoon focused on the SSWAP project, led by Damian Gessler.  SSWAP is a protocol which uses OWL documents to describe the inputs and outputs relevant to a web service.  Interestingly, users of these web services would submit their input in the very same OWL model used for service descriptions.

In Phenoscape, we are using OBO ontologies rather than RDF and OWL and storing our ontological annotations in OBD, a datastore tailored for OBO technologies which provides its own very effective reasoner.  However, this workshop provided a great opportunity to stay up to date with semantic web standards and explore how to make our data compatible with and part of the global semantic web.  In addition, St. John’s College was a great meeting location – it is a small college with a wonderful natural landscape in the hills outside of Santa Fe.