January 16, 2020
Just after the holidays, we held the second short course, as part of the SCATE project, on Phylogenetic Comparative Analysis of Integrated Anatomical Traits at the SSB 2020 meeting in Gainesville, FL. The course is an introduction to ontologies and their application in phylogenetic comparative analyses. We demonstrated how to use anatomical dependencies derived from ontologies to better model character evolution, and how semantic similarity of traits could be integrated in a comparative analysis.
The hands-on sessions included demos of RPhenoscape and the PARAMO pipeline. Using these tools, attendees learned how to query for phenotypes in the Phenoscape KB and build character matrices that take anatomical dependencies into account. They also learned to construct stochastic character maps for body regions and whole organismal anatomies on a phylogeny. The large number of attendees (37!) came with a range of abilities, from novice R users to experienced developers, and from a range of career stages, from undergraduate students to postdocs and faculty. Anticipating that running a workshop with a group this large would be challenging because of the technical issues that can arise, we built in “tech troubleshooting” time at the beginning of the course to help install the R package. Participants either followed the tutorials interactively using the RMarkdown files in RStudio or followed a rendered version of the course materials on the course website.
For those interested in trying out the course materials without needing to install the required software, we’ve published a Code Ocean capsule containing the code and tutorials. The capsule runs in a web browser, without needing to install R, R packages, or any dependencies.
March 12, 2019
Ontologies encode information about a domain of knowledge, such as how anatomical structures are related, which is crucial information for modeling character evolution. Phenoscape, in its current Semantic Comparative Analyses for Trait Evolution (SCATE) project, is developing tools that use the computable knowledge in ontologies to improve phenotypic character modeling and inform analyses of trait evolution. To train evolutionary biologists and developers of comparative analysis tools to adopt these new capabilities, the SCATE team will be holding a short course on using ontologies in comparative analyses of integrated anatomical traits, in conjunction with iEvoBio and the Evolution Meetings, on June 26, 2019 in Providence, Rhode Island.
Attendees will learn how to use R packages such as RPhenoscape to access a knowledgebase of ontology-linked phenotypes (kb.phenoscape.org), build character matrices that take anatomical dependencies into account, and use these to construct stochastic character maps on a phylogeny. The course will also include a practical introduction to community ontologies for biodiversity domain knowledge (anatomy, taxonomy, phenotypic attribute).
Graduate students, postdocs, faculty, and software developers with interests in comparative analyses, morphology, and phylogenetics are encouraged to apply.
Registration for this post-conference event is free. See the Call for Participation for registration and further information.
September 29, 2017
Call for Participation:
Computable evolutionary phenotype knowledge: a hands-on workshop
The Phenoscape project is hosting a hands-on workshop on Dec 11-14, 2017, at Duke University in Durham, North Carolina.
Evolutionary phenotype data that is amenable to computational data science, including computation-driven discovery, remains relatively new to science. Therefore, use cases and applications that effectively exploit these new capabilities are only beginning to emerge. If you are interested in discovering, linking to, recombining, or computing with machine-interpretable evolutionary phenotypes, this is the workshop for you!
The event will bring together a diverse group of people to collaboratively design and work hands-on on targets of their interest that take advantage of, and promote reuse of, Phenoscape’s online evolutionary data resources and services. The event is designed as a hands-on unconference-style workshop. Participants will break into subgroups to collaboratively tackle self-selected
The full Call for Participation, including motivation and scope, is posted here: https://hackmd.io/s/Sk6Xa7Eq-#
To apply to participate in the event, please fill out the application form by Oct 9, 2017. Travel sponsorship is available but limited, as is space.
April 6, 2016
What are the challenges in building, visualizing and using the Tree of Life? How can we best utilize and build on existing phylogenetic knowledge and look ahead to address the challenges of data integration? Recently, fellow Phenoscaper Jim Balhoff and I attended the first FuturePhy workshop in Gainesville, Florida (February 20-22, 2016). The workshop brought together three taxonomically-defined working groups (catfish, beetles, barnacles) to build megatrees from existing phylogenetic studies, and identify and begin applying diverse data layers for their respective groups. Open Tree and Arbor personnel were on hand to discuss and help solve issues in data integration.
January 13, 2016
Efficiently extracting knowledge from the published literature is a challenge faced by many database projects in biology, and many of us are interested in tools that can assist and speed up the task of identifying concepts in free text. I’ve recently used two text markup tools that are helpful in keeping up with the literature and rapidly developing ontologies. As a participant in the Fifth BioCreative Challenge, in which biocurators test and evaluate text mining systems, I evaluated the EXTRACT bookmarklet tool. EXTRACT was developed for metagenomics data and provides full-page tagging of mapped terms from environment, disease, taxonomy, and tissue ontologies, and can also mark up shorter selections of text on an HTML page. The tool is immediately useful, particularly during the first stages of the curation process, as a curator is surveying the literature for relevant articles.
Annotating long, descriptive text has also been a challenge for Phenoscape. To assist curators in this task, we recently added a text annotator tool to the Phenoscape Knowledgebase that tags selected text passages copied in from a source with matched terms from anatomy (Uberon), taxon (VTO), and quality (PATO) ontologies. Viewing the annotated results, with color-coded text, has aided curators in the process of applying large, complex ontologies to equally complex text.
September 26, 2014
A team led by University of Chicago Phenoscapers Nizar Ibrahim and Paul Sereno has published new findings about the remarkable semiaquatic predatory dinosaur Spinosaurus aegyptiacus in the latest issue of Science. It has been receiving some nice coverage at NPR and other news outlets.
Workers at the National Geographic Museum in Washington grind the rough edges off a life-size replica of a spinosaurus skeleton. Credit: Mike Hettwer/National Geographic.
From the abstract:
We describe adaptations for a semiaquatic lifestyle in the dinosaur Spinosaurus aegyptiacus. These adaptations include retraction of the fleshy nostrils to a position near the mid-region of the skull and an elongate neck and trunk that shift the center of body mass anterior to the knee joint. Unlike terrestrial theropods, the pelvic girdle is downsized, the hindlimbs are short, and all of the limb bones are solid without an open medullary cavity, for buoyancy control in water. The short, robust femur with hypertrophied flexor attachment and the low, flat-bottomed pedal claws are consistent with aquatic foot-propelled locomotion. Surface striations and bone microstructure suggest that the dorsal “sail” may have been enveloped in skin that functioned primarily for display on land and in water.
Citation: Ibrahim N, Sereno PC, Dal Sasso C, Maganuco S, Fabbri M, Martill DM, Zouhri S, Myhrvold N, Iurino DA (2014) Semiaquatic adaptations in a giant predatory dinosaur. Science. http://doi.org/10.1126/science.1258750.
August 27, 2014
I attended the Evolution 2014 meeting a few months ago in Raleigh, NC, and presented a poster on Phenoscape’s curation effort: “Moving the mountain: How to transform comparative anatomy into computable anatomy?”, with coauthors A. Dececchi, N. Ibrahim, H. Lapp, and P. Mabee. In this work, we assessed the efficiency of our workflow for the curation of evolutionary phenotypes from the matrix-based phylogenetic literature. We identified the bottlenecks and areas of improvement in data preparation, phenotype annotation, and ontology development. Gains in efficiency, such as through improved community data practices and development of text-mining tools, are critical if we are to translate evolutionary phenotypes from an ever-growing literature. The poster was well received and several researchers at the meeting were interested in learning more about open source tools for phenotype annotation.
January 25, 2014
Our paper describing the Vertebrate Taxonomy Ontology (VTO) is published! See: http://www.jbiomedsem.com/content/4/1/34 .
One primary objective for Phenoscape and similar projects is to aggregate phenotypic data from multiple studies to named taxa, which in many phylogenetic studies are species but also might be at higher taxonomic levels such as genera or families. While there are many widely used taxonomies that include rich sampling of species and higher taxa, for example Bill Eschmeyer’s Catalog of Fishes, there are few vetted “bridging” taxonomies that allow for aggregating data across, say, fishes, amphibians, and mammals. This problem becomes even more acute when you consider integrating data for extinct taxa as well. As a first step towards addressing this issue for vertebrates, we created the Vertebrate Taxonomy Ontology (VTO) that brings together taxonomies from NCBI, AmphibiaWeb, the Catalog of Fishes (via the previously existing Teleost Taxonomy Ontology), and the Paleobiology Database. The resulting curated taxonomy contains more than 106,000 terms, more than 104,000 additional synonyms, and extensive cross-referencing to these existing taxonomies. The Phenoscape Knowledgebase will leverage this taxonomic ontology by allowing for phenotype statistics to be displayed by taxon, including coarse measures of the extent of annotation coverage and phenotypic variation. Though phenotypes may be annotated to a species, the use of an ontological framework for the taxonomic hierarchy facilitates aggregating phenotypes to higher levels, such as genera or families. In the future, we hope to be able to integrate other excellent and rich sources of taxon-specific taxonomies, such as that in the Reptile Database or the International Ornithologists’ Union Bird List. This is a work in progress, and the Phenoscape team is certainly interested in integrating new taxonomic sources as well as exploring different ways that such a resource can be used and developed by the larger community.
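To make the idea of aggregating phenotypes up a taxonomic hierarchy concrete, here is a minimal Python sketch. It is only an illustration and not how the Phenoscape Knowledgebase is implemented: the tiny child-to-parent table, the phenotype strings, and the function names `ancestors` and `aggregate` are all made up for this example and do not come from the VTO.

```python
from collections import defaultdict

# Hypothetical toy taxonomy as child -> parent links (not real VTO identifiers).
PARENT = {
    "Danio rerio": "Danio",
    "Danio albolineatus": "Danio",
    "Danio": "Cyprinidae",
    "Cyprinus carpio": "Cyprinus",
    "Cyprinus": "Cyprinidae",
}

# Hypothetical species-level phenotype annotations (illustrative strings only).
ANNOTATIONS = {
    "Danio rerio": {"pectoral fin: present", "basihyal: elongated"},
    "Danio albolineatus": {"pectoral fin: present"},
    "Cyprinus carpio": {"barbel: present"},
}


def ancestors(taxon, parent):
    """Yield the taxon itself, then each ancestor up to the root."""
    while taxon is not None:
        yield taxon
        taxon = parent.get(taxon)


def aggregate(annotations, parent):
    """Roll species-level phenotypes up to every containing taxon."""
    rolled = defaultdict(set)
    for taxon, phenotypes in annotations.items():
        for t in ancestors(taxon, parent):
            rolled[t] |= phenotypes
    return dict(rolled)


rolled_up = aggregate(ANNOTATIONS, PARENT)
# Coarse coverage statistic: number of distinct phenotypes per taxon.
counts = {taxon: len(phenotypes) for taxon, phenotypes in rolled_up.items()}
```

Because the hierarchy is explicit, the same species-level annotations yield counts for genera and families without any re-annotation, which is the kind of coarse coverage measure described above.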
January 10, 2014
In an effort to expand the user community and to demonstrate what is possible using our infrastructure, members of the Phenoscape team gave multiple presentations across two continents on our recent developments. In late October Paula Mabee gave an invited presentation on mapping phenotypes across phylogenies at the Muséum national d’Histoire naturelle in Paris. This was followed by presentations at the 73rd annual meeting of the Society of Vertebrate Paleontology (SVP) in Los Angeles and the 2013 meeting of the Taxonomic Database Working Group (TDWG) in Florence, Italy. Phenoscape had a significant presence at SVP with both a poster presented by Alex Dececchi demonstrating our progress in generating supermatrices from our annotations as well as a talk given by collaborator Karen Sears, using EQ supermatrices from Phenoscape fin/limb data to examine integration patterns across the fin to limb transition. Karen’s talk marks the first of the collaborations coming out of our 2013 San Francisco workshop. It also showed how data from Phenoscape can drive independent projects and is easily integrated with existing phylogenetic and statistical tools such as Mesquite and various R modules. The talks and poster were well received, with numerous researchers inquiring about how they could incorporate Phenoscape or use ontology-based annotations.
March 30, 2013
There is a wealth of phenotypic information in the evolutionary literature that comes in the form of semi-structured character state descriptions. To get that information into computable form is, right now, an awfully slow process. In Phenoscape I, we estimated that it took about five person-years in total to curate semantic phenotype annotations from 47 papers. If we are to get computable evolutionary phenotypes from a larger slice of the literature, we really need to figure out ways to speed this up.
One promising approach is to use text-mining. This could contribute in a few different ways. First, one could efficiently identify all the terms in the text that are not currently represented in ontologies and add them en masse, so that data curation does not have to stop and resume whenever such terms are encountered. Second, one could present a human curator with suggestions for what terms to use and what relations those terms have to one another, speeding the process of composing an annotation.
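As a rough sketch of the first idea, the snippet below scans a few character descriptions and counts the words that have no match in a set of known ontology labels. Everything here is an assumption made for illustration: the label set, the stopword list, the example descriptions, and the function `unmatched_terms` are invented, and the naive single-word tokenization is far simpler than what CharaParser or any real text-mining system does.

```python
import re
from collections import Counter

# Hypothetical stand-in for labels already present in the ontologies.
KNOWN_TERMS = {"fin", "ray", "dorsal", "pectoral", "present", "absent", "elongated"}

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "of", "is", "and", "with", "a"}


def unmatched_terms(descriptions, known, stopwords):
    """Count lowercase word tokens that are neither known labels nor stopwords."""
    counts = Counter()
    for text in descriptions:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in known and token not in stopwords:
                counts[token] += 1
    return counts


# Made-up character state descriptions for the example.
descriptions = [
    "Dorsal fin ray elongated",
    "Pectoral fin with a serrated spine",
    "Spine of the dorsal fin absent",
]

missing = unmatched_terms(descriptions, KNOWN_TERMS, STOPWORDS)
# Most frequent candidates first, so they can be reviewed and added in bulk.
candidates = missing.most_common()
```

Sorting the candidates by frequency gives ontology editors a batch of terms to review before curation starts, rather than stopping each time a curator hits a missing term.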
CharaParser, developed by Hong Cui at the University of Arizona, is an expert-based system that decomposes character descriptions into recognizable grammatical components, and it is now being used in several different biodiversity informatics projects. Baseline evaluation results from BioCreative III showed that a naive workflow combining CharaParser and Phenex, the software curators use to compose ontological annotations and relate them to character states, was capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average) but had difficulty translating those into ontological annotations. This first-iteration workflow was also not yet reducing curation time.
In March, a small contingent from NESCent (Jim Balhoff, Hilmar Lapp and Todd Vision) visited Hong Cui’s group in Tucson. We talked through improvements to CharaParser and the curation workflow, brainstormed plans for a more thorough set of evaluation tests, began refactoring of the code so that it can be more easily shared across projects, and gained a better understanding of what features make a character difficult to curate for humans vs. text-mining. We made substantial progress on all fronts, and are looking forward to seeing how much improvement in the accuracy and efficiency of curation will be achieved in the next round of testing.
We are also pleased to report that the CharaParser codebase will now be available from GitHub under an open source (MIT) license.