Notes from ISWC 2011

November 3, 2011

Last week, I attended the 10th International Semantic Web Conference (ISWC) in Bonn, Germany. A tremendous variety of sophisticated work is going on both in academia and industry to improve the technology for, and take advantage of, the ever-growing network of data and concepts published, through open standards, on the web.

You might say it is the best of times and the worst of times for semantic web enthusiasts. Reasoning and query engines that can be used on large collections of RDF have become a reality in the last few years (one of the Challenge Tracks provided contestants with a *billion* triples to work with). But some see clouds on the horizon. The web search titans (Bing, Google and Yahoo!) are now pushing schema.org, a microformat and vocabulary standard for web content that some worry may threaten the development of richer semantic web technology. Still, most treated the news positively, happy to know that these organizations now seem to agree on the importance of semantics. In fact, Yahoo! described at the conference how they are trying to build a “Web of Objects” that takes advantage of schema.org, together with more extensive internal vocabularies, to regroup knowledge pieces that are scattered around the Web.

Conference chair Natasha Noy showed a revealing pair of tag clouds comparing the abstracts from the first year of the conference in 2001 to today: the terms “semantic” and “web” have shrunk in importance and “data” is now king!

[Image: ISWC 2011 tag cloud]

Ivan Herman’s blog gives a good sampling of the flavor of talks presented at the meeting. I especially enjoyed the Industry Track, since these applications are less familiar to me than the academic/scientific ones, and I was particularly impressed by the importance of semantic technologies to the news media and other content industries. These technologies are being deployed by news organizations (e.g. the BBC) with great enthusiasm. I also came away with a strong sense that semantic technologies are helping to create demand for, and drive a revolution in the use of, Open Government Data; there were a number of demonstrations of useful real-world applications, particularly in environmental monitoring.

With my Phenoscape hat on, I attended a Linked Open Data for Science (LISC) satellite workshop prior to the main conference. The event included both presentations and discussions from a variety of perspectives about the opportunities and challenges of this new technology. A diversity of fields was represented (social science, linguistics, geosciences, biomedicine, etc.). But it is clear that uptake of linked open data as an alternative means of publication is still in its infancy within the sciences, despite the fact that the bioinformatics data centers account for nearly a quarter of the real estate in the famous linked data cloud diagram. Some of the most exciting opportunities, in my opinion, come from the ability to allow radically decentralized data publication, and this is something that we might wish to pilot in a modestly distributed data curation environment like Phenoscape. Another observation: I was surprised to discover at the meeting how much the utility of the linked data cloud (and, by extension, the semantic web) depends on the social convention by which everyone provides links into a relatively small number of large ‘concept repositories’ like DBPedia (which was originally a Master’s project, BTW).

The breakout discussion sessions at LISC  highlighted how scientific practice will place difficult demands on linked data with respect to provenance, context, granularity, distributed authority, etc.  This resonated with the message of our own contribution to the workshop, which outlined some of the particular challenges in making context-dependent links between scientific objects, when the descriptions of those objects are scattered across different resources, and when the similarities between objects are spread weakly over many properties [1].  Another important question that hit home for a number of us coming from the bioinformatics and biodiversity informatics world is how scientists are going to be able to take advantage of the innovations now going on in the commercial sector (including some of the exhibitors at the main conference) within the constraints and DIY culture of small individual university-based research grants.

There is no denying the explosion in linked data resources out there (comparisons of the growth in the cloud diagram are about as common as graphs showing the growth in sequence data at a biology conference). But another recurrent theme of the meeting was that, unfortunately, much of that content is missing semantics (i.e. ontologies are unused or unavailable for many concepts, and links between content at different endpoints are lacking), and generating semantically annotated triples needs to be easier than it currently is (a message certainly relevant to those of us developing curation tools).

One of the keynotes, from Frank van Harmelen, generated quite a bit of buzz.  He looked back on 10 years of the semantic web, asking what theoretical principles we can learn from the experience so far, and his annotated slides are well worth a look.

The conference was a great mix of different formats. In addition to the keynotes and regular talks, there were a host of workshops and tutorials, challenges, panel discussions (including one billed as a ‘Death Match’), and even a special competition for the best “Outrageous Ideas”. The winner of that one was a proposal to bring linked data to the non-networked portion of humanity. A particularly nice feature of the meeting was the ‘Minute Madness’ preceding the poster session, in which each of the poster presenters gave a short timed pitch to all the attendees. It was a very entertaining and informative way to ‘see’ every poster and allowed everyone to quickly pick out which ones to hit during the session.

For more, see the excellent day-by-day summary of the meeting from Juan Sequeda, where there are links to all the winning presentations and challenge entries.  [Ironically, the conference website is down temporarily while it is being moved, so come back later if the links to the papers hang].  The next ISWC will be November 11-15, 2012 in Boston.

Reference:

[1] Vision T, Blake J, Lapp H, Mabee P, Westerfield M (2011) Similarity Between Semantic Description Sets: Addressing Needs Beyond Data Integration, in Proceedings of the First International Workshop on Linked Science, Bonn, Germany, October 24, 2011, Tomi Kauppinen, Line C. Pouchard, Carsten Kessler (eds), published in CEUR Workshop Proceedings, Volume 783.


Phenoscape visits Xenbase for Anatomy Ontology Update

September 23, 2011

Last month I visited Xenbase and Aaron Zorn’s lab at the Cincinnati Children’s Hospital for a couple of days (August 21-23, 2011) to work with Xenbase curators in preparing the Xenopus Anatomy Ontology (XAO) for its next big release.  Xenbase curators Christina James Zorn and VG Ponferrada have been leading the effort, and Erik Segerdell, the ontology development coordinator for the Phenotype RCN and former Xenbase curator, was also visiting for the week and helping with the update. Erik and I provided training in ontology editing and synchronization tools.


CSHALS 2011

March 9, 2011

I recently attended the Conference on Semantics in Healthcare and Life Sciences (CSHALS), in Cambridge, MA. The CSHALS meeting was a change for me in that it’s much more healthcare-oriented than other venues in which I’ve presented work from Phenoscape. This was a great opportunity to see how far the healthcare community has pushed semantic web technologies, and also to become more familiar with some of the more commercial packages which are available for storing and querying very large knowledgebases based on RDF (for example, AllegroGraph and Gruff from Franz, Inc., and Sentient Knowledge Explorer from IO Informatics). A particularly interesting talk was the keynote by Toby Segaran, of Metaweb Technologies, advocating semantic techniques as a more agile approach to data. Slideshows from the conference presentations are available for download here, including my own.


Third beta release of Phenoscape Knowledgebase 2.0

February 7, 2011

Phenoscape Knowledgebase 2.0 beta release 3 is now available at http://kb.phenoscape.org/. This version includes an enhanced publication info interface [example] which displays the original character matrix, as well as a list of taxa including the taxonomic names and museum specimens used in the dataset. Other recent developments in the KB are global term info popups and hierarchical browsing of ontology terms on their info pages [example].

We have also improved our software and data release processes so that the public Knowledgebase can easily keep up with new developments and the latest data updates from our curators. Looking forward, the next major feature to be added to the Knowledgebase is a faceted browsing interface which is currently under development. This interface should help provide an overview of how the data are organized within the various ontologies used in the Knowledgebase.


Matching Phenotypes

December 17, 2010

An important goal for the Phenoscape project is to be able to suggest candidate genes that may have contributed to evolutionary change.  The way that we have proposed to do this is to search for changes in phenotype that appear as the result of mutations in model organisms and also appear as phenotype changes on an evolutionary tree.  There are several challenges in designing this search, apart from simply recognizing similar phenotypes, that we have been working on during the past few months.

The first issue is that we are interested in changes in phenotype, not simply matching phenotypes. For phenotypes associated with model organism mutants, it is understood that they vary with respect to the wild type. For taxa, however, this means looking for taxonomic nodes where variation in a phenotype is observed among the children of the node. For example, there are nine species within the genus Aspidoras with annotations for the shape of the opercle bone. Of these, eight exhibit opercle bones with round shape, but the ninth (A. pauciradiatus) is annotated with a triangular opercle. In contrast, all three annotated species of the related Hoplosternum are annotated with a triangular opercle. Thus there is detectable variation in opercle shape within the children of Aspidoras, but not within Hoplosternum, suggesting that change in opercle shape has occurred somewhere among the descendants of Aspidoras. For our analysis, identifying variation among descendants is important.
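The node-level test in the Aspidoras example can be sketched in a few lines of Python. This is only an illustration, not the Phenoscape implementation: the data layout (a dict mapping each genus to the opercle shapes annotated for its species) and the function name `shows_variation` are assumptions made here, and only the species counts given above are used.

```python
# Hypothetical annotations: for each genus, the opercle shape recorded for
# each annotated species (counts taken from the example in the text).
opercle_shapes = {
    "Aspidoras": ["round"] * 8 + ["triangular"],  # ninth species: A. pauciradiatus
    "Hoplosternum": ["triangular"] * 3,
}

def shows_variation(child_states):
    """Return True if more than one distinct state occurs among the children."""
    return len(set(child_states)) > 1

# Keep only the taxonomic nodes whose children disagree on the phenotype.
variable_nodes = [taxon for taxon, states in opercle_shapes.items()
                  if shows_variation(states)]
print(variable_nodes)
```

A fuller version would presumably walk the whole taxonomy and collect states from all descendants of each node rather than from a flat list of species.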

Thus, our search for shared variation in phenotypes focuses on matching phenotypes associated with genes with phenotypes of taxa showing variation. However, we are looking for matches at a larger scale than single phenotypes; we are looking for matches across the set of phenotypes affected by a gene or the set of features that have changed among the descendants of a taxonomic node. We refer to these sets of phenotypes as the ‘phenotypic profile’ of a gene or taxon, following a seminal paper by Washington et al. 2009. Washington et al. propose four metrics (three based on ‘information content’) to score matches between the sets of phenotypes in a pair of profiles.
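To make the idea of scoring a pair of profiles concrete, here is a rough Python sketch of two generic measures: a set-overlap (Jaccard) score and a score based on information content, taken here as the negative log of how often a term is annotated. It does not reproduce the four metrics of Washington et al., which are not spelled out in this post; the term labels, frequencies and function names are all hypothetical.

```python
import math

# Hypothetical annotation frequencies: the fraction of all annotated genes and
# taxa that carry each phenotype term. Rarer terms are more informative.
term_frequency = {
    "opercle: round": 0.20,
    "opercle: triangular": 0.05,
    "dorsal fin: absent": 0.01,
}

def information_content(term):
    """IC of a term as the negative log2 of its annotation frequency."""
    return -math.log2(term_frequency[term])

def jaccard(profile_a, profile_b):
    """Fraction of terms shared between two profiles (sets of terms)."""
    union = profile_a | profile_b
    return len(profile_a & profile_b) / len(union) if union else 0.0

def max_shared_ic(profile_a, profile_b):
    """IC of the most informative term present in both profiles (0 if none)."""
    shared = profile_a & profile_b
    return max((information_content(t) for t in shared), default=0.0)

# Hypothetical profiles for one gene and one variable taxonomic node.
gene_profile = {"opercle: triangular", "dorsal fin: absent"}
taxon_profile = {"opercle: triangular", "opercle: round"}

print(jaccard(gene_profile, taxon_profile))
print(max_shared_ic(gene_profile, taxon_profile))
```

This sketch matches terms only when they are identical. An ontology-based comparison would instead also credit two different terms that share a common ancestor in the ontology, which is where a reasoner comes in.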

In the course of developing the search, we have encountered several important differences in curation approach between ZFIN and Phenoscape. In some cases there are different uses of PATO to model the same phenotype, for example the absence of an entity. In other cases ZFIN uses a quality ‘abnormal’ that applies to mutants, but not in a taxonomic, comparative sense, which means these phenotypes will be inaccessible to us. Thus, implementing this search is helping us to better understand our data, our choices in modeling the data, and how they interoperate with other ontology-based data. Such reflection would have been difficult or impossible without the use of ontologies to represent the phenotypes.


Phenoscape and colleagues meet with PATO on ontology and phenotype representation issues, Sept. 25-27, 2010

November 12, 2010

At the end of September, members of Phenoscape (Mabee, Balhoff), the Hymenoptera Anatomy Ontology (HAO) project (Yoder, Deans, Seltmann) and TAIR (Huala) met with developers of the Phenotype and Trait Ontology (PATO) (Gkoutos, Mungall, Westerfield, Lewis) at the University of Oregon. Our discussions focused on finding solutions to problems that have arisen from the structure of the PATO ontology, and to problems in representing phenotypes in the EQ model, both of which surfaced in the course of annotating comparative phenotype data from the fish and hymenoptera literature. We prepared for this meeting by developing a list of common issues and, importantly, specific examples, in a Google doc shared among participants. We all co-edited this document during the meeting with notes, decisions and examples, and we ‘published’ this Google doc for you all to see. A number of important changes to the PATO hierarchy were proposed and subsequently made. We also clarified best practices for modelling some common but tricky phenotypic features. One additional outcome was the participants’ strong recommendation that a ‘shape jamboree’ be held to improve the usability of this branch of the PATO ontology.


2010 Semantic Web Workshop

June 16, 2010

I recently attended the 2010 Semantic Web Workshop in Santa Fe, hosted by the SSWAP project and iPlant, at St. John’s College.  This was a two-day workshop, June 7-8, introducing semantic web technologies and applications to biological data and service integration.  The first day was scheduled to be a whirlwind overview of semantic web technologies, beginning with a lecture on the foundations of web logic and reasoning in classic formal logic and moving through RDF, RDFS, and OWL.  However, air travel problems led me to miss the entire first day of the workshop.  Fortunately Damian Gessler, the workshop organizer, provided me with all the slides for the first day upon my arrival, and I was able to somewhat catch up before day 2.  These slides are really a great overview of semantic web technologies and will be a useful resource.

The second day focused on applications to biological data and web services.  A discussion on “taxonomic intelligence” was particularly illuminating.  It provided an example of how different communities can share a set of identifiers for species, for example, yet provide their own set of statements about the taxonomy relating those species.  Each community can draw conclusions relevant to its preferred taxonomy using data associated with the same species.

The afternoon focused on the SSWAP project, led by Damian Gessler.  SSWAP is a protocol which uses OWL documents to describe the inputs and outputs relevant to a web service.  Interestingly, users of these web services would submit their input in the very same OWL model used for service descriptions.

In Phenoscape, we are using OBO ontologies rather than RDF and OWL and storing our ontological annotations in OBD, a datastore tailored for OBO technologies which provides its own very effective reasoner.  However, this workshop provided a great opportunity to stay up to date with semantic web standards and explore how to make our data compatible with and part of the global semantic web.  In addition, St. John’s College was a great meeting location – it is a small college with a wonderful natural landscape in the hills outside of Santa Fe.


Revising the Knowledgebase interface

March 24, 2010

We have been developing mockup versions of new web interfaces for the Phenoscape Knowledgebase. In order to design an updated interface which is both more powerful and easier to use than the existing one, in February I presented a series of mockups to faculty, post-docs, and graduate students at the University of Oregon, the home of ZFIN. Drawing on user-testing expertise at ZFIN, I met with the researchers in pairs and recorded their feedback on newly designed interfaces for viewing anatomical and taxonomic terms within the ontology hierarchy, configurable queries for phenotype annotations, and data visualization on phylogenetic trees. The feedback proved to be extremely valuable and has led to several modifications to the planned interface revisions.


Phenex 1.0.3 released

February 23, 2010

Phenex 1.0.3 is now available.  This release fixes a serious bug which caused Phenex to append modified phenotype annotations within files, instead of replacing the previous data. Phenex will now read and write NeXML files correctly. It should also automatically recover the latest data from files saved with older versions of Phenex.

All Phenex users should replace their current copy of Phenex with the latest release. It can be downloaded from the Phenex homepage on the Phenoscape wiki.


Phenoscape internship experience

February 23, 2010

Hello all,

As an online student of Bioinformatics based in Nairobi, Kenya, I had a strong desire to undertake a project that would enhance my knowledge and skills in software development. Hence, after completing MSc coursework at the University of Manchester, UK, I was happy to be awarded an internship from the Phenoscape project for an 11-week traineeship beginning September 21st, 2009 at the National Evolutionary Synthesis Center (NESCent). This project seeks to establish the developmental and genetic basis of the astonishing morphological heterogeneity across diverse species. To address this, a rich and rigorous knowledge base, PhenoscapeKB, has been developed, comprising evolutionarily variable characters across a clade of fishes connected to mutant phenotypes from ZFIN. Core to the PhenoscapeKB is the modeling of the character entities using ontologies, thus facilitating knowledge synthesis via logical/mathematical reasoning.

