Report from Tucson: from characters to annotations with text mining

March 30, 2013

There is a wealth of phenotypic information in the evolutionary literature that comes in the form of semi-structured character state descriptions. Getting that information into computable form is, right now, an awfully slow process. In Phenoscape I, we estimated that it took about five person-years in total to curate semantic phenotype annotations from 47 papers. If we are to get computable evolutionary phenotypes from a larger slice of the literature, we really need to figure out ways to speed this up.

One promising approach is to use text-mining.  This could contribute in a few different ways.  First, one could efficiently identify all the terms in the text that are not currently represented in ontologies and add them en masse, so that data curation does not have to stop and resume whenever such terms are encountered. Second, one could present a human curator with suggestions for what terms to use and what relations those terms have to one another, speeding the process of composing an annotation.
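To make the first idea concrete, here is a minimal sketch of flagging words in character descriptions that no ontology label covers. It is an illustration only, not how CharaParser or Phenex work: the function name, the tiny stopword list, and the sample data are all hypothetical, and it matches single words rather than full multi-word terms.

```python
import re

# Hypothetical stopword list; a real workflow would use a much fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "with", "or", "present", "absent"}


def find_unrepresented_terms(descriptions, ontology_labels):
    """Count words in the descriptions that appear in no ontology label.

    Assumption: a word counts as "represented" if it occurs anywhere in
    any label, so "fin" is covered by the label "dorsal fin".
    """
    known_words = set()
    for label in ontology_labels:
        known_words.update(re.findall(r"[a-z]+(?:-[a-z]+)*", label.lower()))

    counts = {}
    for text in descriptions:
        for word in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
            if word in STOPWORDS or word in known_words:
                continue
            counts[word] = counts.get(word, 0) + 1

    # Most frequent candidates first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up example data, for illustration only.
descriptions = [
    "Dorsal fin present",
    "Supraneural absent; dorsal fin spine serrated",
]
labels = ["dorsal fin", "fin spine"]
candidates = find_unrepresented_terms(descriptions, labels)
```

A ranked list like `candidates` is the sort of thing that could be reviewed and added to an ontology in one batch, rather than interrupting curation each time a missing term turns up.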

CharaParser, developed by Hong Cui at the University of Arizona, is an expert-based system that decomposes character descriptions into recognizable grammatical components, and it is now being used in several different biodiversity informatics projects. Baseline evaluation results from BioCreative III showed that a naive workflow combining CharaParser and Phenex, the software curators use to compose ontological annotations and relate them to character states, was capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average) but had difficulty translating those into ontological annotations. This first-iteration workflow was also not yet reducing curation time.

In March, a small contingent from NESCent (Jim Balhoff, Hilmar Lapp and Todd Vision) visited Hong Cui’s group in Tucson. We talked through improvements to CharaParser and the curation workflow, brainstormed plans for a more thorough set of evaluation tests, began refactoring the code so that it can be more easily shared across projects, and gained a better understanding of what features make a character difficult to curate for humans vs. text-mining. We made substantial progress on all fronts, and are looking forward to seeing how much improvement in the accuracy and efficiency of curation will be achieved in the next round of testing.

We are also pleased to report that the CharaParser codebase will now be available from GitHub under an open source (MIT) license.


California Dreaming

March 27, 2013

Winner of a competition among participants to illustrate the essence of Phenoscape, from Paul Sereno

It’s easy to get caught up in the details when developing infrastructure. You know it will be useful – because the grant application said so! But there’s so much engineering to do. And no matter how thoughtful and deliberate a process you follow to anticipate the needs of your future users, once they have a complicated thing in their hands, who knows how they will actually use it?

Enter the Phenoscape Knowledgebase.  After a heroic data collection push this winter, our next release of the Knowledgebase will contain millions of evolutionary phenotypes from throughout the vertebrates, linked to genetic phenotypes from human, mouse, Xenopus, and zebrafish, and a particularly rich set of annotations for skeletal features of fins and limbs.  The Knowledgebase is far from comprehensive, and annotations do not capture the full richness of the original characters in the evolutionary literature, but we think it’s a pretty useful resource.

So, it’s time to see what capabilities our users are excited by and what limitations frustrate them. To that end, we brought a small group of experts who look at phenotypes in a variety of different ways (e.g. genetics, systematics, evo-devo, clinical biomedicine, paleontology, even zooarchaeology) to the California Academy of Sciences in February, and we asked them what questions they’d most like to address using the KB as it exists today.

To help us tap into the assembled brainpower, we enlisted KnowInnovation, facilitation pioneers who specialize in helping researchers self-organize into teams to tackle creative research challenges. This they did with amazing resourcefulness, milking ideas out of us that we wouldn’t have imagined we even had. The workshop was no ordinary parade of PowerPoints. We did speed-dating to bounce research ideas off each other, generated a staggering number of post-it notes, sculpted creatures and skeletal parts out of clay, and engaged in a host of other seemingly contrived but strangely liberating activities. We watched in amazement as Karl Gude took visual minutes.

And we came up with some great collaborative ideas for research that leverage the Knowledgebase to ask questions that would have been difficult or impossible to answer without it, including questions about genetic convergence and parallelism, global comparisons of intra- and interspecific phenotypic variation, and the evolution of phenotypes affected by duplicated genes. These projects will now serve as driving applications for Phenoscape so that we know better what our users really need the Knowledgebase to do for them. We look forward to reporting on the outcome of those in due course.

A big thank you to David Blackburn and the Cal Academy for providing such an inspiring venue, being exquisite hosts, and conveniently having an open museum night during our workshop. Thanks also to a great group of participants and facilitators, and to NSF for a supplemental award that helped to make the workshop a success.


Homology in anatomy ontologies: Report from a Phenotype RCN meeting

February 26, 2013

At the end of October 2012, the working groups of the Phenotype Research Coordination Network (RCN) all met at the Asilomar Conference Center, in Pacific Grove, CA. One of the groups, the Vertebrate working group, made it their goal to discuss methods of representing phylogenetic and serial homology in anatomy ontologies, an issue that is central to Phenoscape as well. Though common ancestry is implicit in the semantics of many classes and subclass relationships (see for example the ‘homology_notes’ for digit in Uberon), most multispecies anatomy ontologies, including Uberon, VSAO, and TAO, do not assert homology relationships between anatomical entities.  Nonetheless, homology is central to comparative biology, and therefore to enriching computations across data types, species, and evolutionary change.


Society of Vertebrate Paleontology Annual Meeting 2012 (Raleigh, NC)

November 27, 2012

The Phenoscape project had a strong presence at the largest Vertebrate Paleontology/Comparative Anatomy conference in the world this year, the Society of Vertebrate Paleontology annual meeting. In one of the large conference halls, and in front of a packed audience, I gave a talk on the history, goals and background of the Phenoscape project (“Phenoscape: A New Anatomical Ontology of Vertebrates”). The authorship also included Paul Sereno, Paula Mabee, Todd Vision and Hilmar Lapp. The talk was well received, and several attendees expressed great interest in our work. The challenge now is to make sure this first spark of interest is maintained – this can be difficult when the community has not been exposed to ontologies before and the project appears so different from anything they have done before – but we’ll do our best to stay in contact with those who expressed strong interest.

Alex Dececchi presented a poster on Phenoscape at the same conference (“Phenoscape: bridging the gap between fossils and genes”; his co-authors were J. Balhoff, W. Dahdul, N. Ibrahim, H. Lapp, P. Midford, P. Sereno, T. Vision, M. Westerfield, P. Mabee and D. Blackburn), making sure that even those who could not attend the talk would get an opportunity to learn more about our exciting work.

Nizar Ibrahim, University of Chicago


Phenex 1.6 released

October 10, 2012

Phenex 1.6 has been released. Updates:

  • Support for entry of polymorphic values in matrix cells (documentation).
  • Improvements to the tab-delimited export format.

Download for Mac, Windows, or Unix.


Knowledgebase tutorial now available

October 4, 2012

The Phenoscape team has created a tutorial introduction to the Knowledgebase. The tutorial is designed to introduce users to exploring phenotypic data in the Phenoscape Knowledgebase, starting with searching for anatomical terms, browsing data using faceted browsing, and performing searches using the query panel. Let us know if you find it helpful in getting started with the KB.

Link: Phenoscape Knowledgebase tutorial


Junior Biocurator

September 11, 2012

This summer, with the help of the non-profit organization Project Exploration (http://www.projectexploration.org/), we ran the Junior Biocurator program for the first time. This is the ‘outreach’ part of the Phenoscape project. The program content and structure were designed by Paul Sereno, Lauren Conroy, Nicole Ridgwell and myself. The program includes several lectures, covering topics as diverse as biocuration, comparative anatomy and photography techniques. Lectures were given by Lauren Conroy, Erin Fitzgerald, Nicole Ridgwell and myself. Nicole also supervised the day-to-day activities and labs. In their ‘hands on’ time the students acquired a whole array of impressive new skills, outlined below.

I was a little worried that the Jr Biocurators would find the curriculum to be too demanding and difficult, but the five students – Haley, Kyle, Hope, Michael and Kamal – really enjoyed their time and impressed everybody with their curiosity, enthusiasm and overall performance in the many exercises and other tasks they had to complete. The students learned how to put together vertebrate skeletons, how to organize biological information in an ontology, and how to take high-quality photographs of vertebrate bones. They also learned how to manipulate these images in Photoshop, effectively creating publication-quality image files. As if that was not enough, towards the end of the program, they learned how to use a laser scanner and visualized the bones in a whole new way. The images they created (photography and scanning) will be used by us – first in Protege and ultimately in the Phenoscape user interface. Our Jr Biocurators were extremely proud when they heard that their work would make a real, tangible contribution to a major NSF-funded project.

While we were putting together the curriculum, I suggested we also offer the students opportunities to learn more about university life in general. Nicole took them to the University of Chicago admissions office, where they could ask all their questions, and they also went on a big campus tour and visited the Oriental Institute Museum (which is part of our university). This part of the curriculum was also extremely well received.

Running the program was a lot of work, but it was all worth it, considering that our Jr Biocurators all became good friends (and good young budding scientists!) and were genuinely sad when the program ended. I am looking forward to meeting the next Jr Biocurators in 2013 and am no longer worried about the degree of difficulty of the curriculum.

To find out more about the program and read some of the blog posts our Jr Biocurators wrote, visit:

http://www.projectexploration.org/blog/2012/07/18/two-day/

http://www.facebook.com/media/set/?set=a.10151007328856583.416729.91282556582&type=3

http://www.projectexploration.org/blog/2012/08/01/university-tour/

Nizar Ibrahim, University of Chicago