An important goal for the Phenoscape project is to be able to suggest candidate genes that may have contributed to evolutionary change. Our proposed approach is to search for phenotype changes that result from mutations in model organisms and that also appear as phenotype changes on an evolutionary tree. Designing this search poses several challenges beyond simply recognizing similar phenotypes, and we have been working on them during the past few months.
The first issue is that we are interested in changes in phenotype, not simply matching phenotypes. For phenotypes associated with model organism mutants, it is understood that they vary with respect to the wild type. For taxa, however, this means looking for taxonomic nodes where variation in a phenotype is observed among the children of the node. For example, there are nine species within the genus Aspidoras with annotations for the shape of the opercle bone. Of these, eight exhibit opercle bones with a round shape, but the ninth (A. pauciradiatus) is annotated with a triangular opercle. In contrast, all three annotated species of the related genus Hoplosternum are annotated with a triangular opercle. Thus there is detectable variation in opercle shape among the children of Aspidoras, but not among those of Hoplosternum, suggesting that a change in opercle shape has occurred somewhere among the descendants of Aspidoras. Identifying this kind of variation among descendants is the taxon-side counterpart of a mutant phenotype, and it is what our analysis searches for.
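The check for variation among the children of a node can be sketched in a few lines of Python. This is a minimal illustration, not the Phenoscape implementation: the tuple layout, the function name `varying_characters`, and the placeholder species names (everything except A. pauciradiatus) are assumptions made for the example.

```python
from collections import defaultdict

def varying_characters(child_annotations):
    """Return the characters that take more than one value among the children.

    child_annotations: iterable of (child, entity, attribute, value) tuples.
    Result: {(entity, attribute): set of observed values}, limited to
    characters with at least two distinct values.
    """
    values = defaultdict(set)
    for _child, entity, attribute, value in child_annotations:
        values[(entity, attribute)].add(value)
    return {char: vals for char, vals in values.items() if len(vals) > 1}

# Hypothetical data mirroring the counts in the text; names such as
# "Aspidoras sp. 1" are placeholders, not real annotation records.
aspidoras = [(f"Aspidoras sp. {i}", "opercle", "shape", "round") for i in range(1, 9)]
aspidoras.append(("Aspidoras pauciradiatus", "opercle", "shape", "triangular"))
hoplosternum = [(f"Hoplosternum sp. {i}", "opercle", "shape", "triangular") for i in range(1, 4)]

print(varying_characters(aspidoras))     # opercle shape varies within Aspidoras
print(varying_characters(hoplosternum))  # no variation within Hoplosternum
```

The sketch compares exact values only; a real search would presumably also need to use the ontology to decide when two differently worded annotations describe the same quality.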
Thus, our search for shared variation in phenotypes focuses on matching phenotypes associated with genes against phenotypes of taxa that show variation. However, we are looking for matches at a larger scale than single phenotypes: we want matches across the whole set of phenotypes affected by a gene, or the whole set of features that have changed among the descendants of a taxonomic node. We refer to these sets of phenotypes as the ‘phenotypic profile’ of a gene or taxon, following a seminal paper by Washington et al. (2009). Washington et al. propose four metrics (three based on ‘information content’) to score matches between the sets of phenotypes in a pair of profiles.
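To make the idea of scoring a pair of profiles concrete, here is a rough sketch of two generic measures: a set-overlap (Jaccard) score, and a score based on information content, taken here in its common form IC = -log2(p), where p is the fraction of annotated items that carry a term. This does not reproduce the exact definitions in Washington et al.; in particular it matches identical terms only and ignores ontology reasoning over subsuming classes. The term labels, counts, and function names are all hypothetical.

```python
import math

def information_content(term, annotation_counts, total):
    """IC = -log2(p), where p is the fraction of items annotated with the term."""
    return -math.log2(annotation_counts[term] / total)

def jaccard(profile_a, profile_b):
    """Fraction of terms shared between two profiles (no IC weighting)."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def max_shared_ic(profile_a, profile_b, annotation_counts, total):
    """IC of the most informative (rarest) term present in both profiles."""
    shared = set(profile_a) & set(profile_b)
    return max(
        (information_content(t, annotation_counts, total) for t in shared),
        default=0.0,
    )

# Hypothetical corpus: how many of 16 annotated items carry each term.
counts = {"opercle shape": 4, "fin size": 8, "scale count": 2}
total = 16

gene_profile = {"opercle shape", "fin size"}
taxon_profile = {"opercle shape", "fin size", "scale count"}

print(jaccard(gene_profile, taxon_profile))
print(max_shared_ic(gene_profile, taxon_profile, counts, total))
```

Weighting by information content rewards matches on rare phenotypes over matches on phenotypes that are annotated almost everywhere, which is the usual motivation for preferring it to plain overlap.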
In the course of developing the search, we have encountered several important differences in curation approach between ZFIN and Phenoscape. In some cases there are different uses of PATO to model the same phenotype, for example, the absence of an entity. In other cases ZFIN uses the quality ‘abnormal’, which is meaningful for mutants but has no meaning in a taxonomic, comparative sense, so these phenotypes will be inaccessible to our search. Thus, implementing this search is helping us to better understand our data, our choices in modeling the data, and how it interoperates with other ontology-based data. Such reflection would have been difficult or impossible without the use of ontologies to represent the phenotypes.