Linked Open Data

I've been working with VIVO (www.vivoweb.org) for several months, and through that work I have been exposed to the Linked Open Data community, which is striving to create semantically linked repositories of data for use in research.  VIVO's primary goal is to create virtual CVs that are richly linked together to encourage and foster collaboration.

Below is an Outsell Insight article on this open data trend.  I think this has strong implications for ACS, especially the CAS division:

By Mark Ware
Vice President & Lead Analyst
London, UK

May 18, 2012

Nature Publishing Group's release of a linked data platform points the way to improved search, discovery and exploration of STM content.


Important details: Nature Publishing Group has announced the release of a new way to access its publication data via a linked data platform. At this stage the platform provides access to publication metadata. It includes more than 20 million Resource Description Framework (RDF) statements, including primary metadata for more than 450,000 articles published by NPG since 1869. In this first release, the datasets include basic citation information (title, author, publication date, etc.) as well as NPG-specific ontologies. These datasets are being released under an open metadata license, Creative Commons Zero (CC0), which permits maximum exploitation and re-use of the data.


NPG's platform is designed to allow for easy querying, exploration and extraction of data and relationships about articles, contributors, publications, and subject topics, using standard vocabularies such as Dublin Core. The data is integrated with existing public datasets including CrossRef and PubMed.
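To make "RDF statements" concrete, here is a minimal sketch in plain Python of what a handful of such statements might look like and how they could be queried. Each statement is a (subject, predicate, object) triple. The Dublin Core Terms and FOAF namespaces are real vocabularies, but the example.org URIs, the sample values, and the helper names match and titles_of are assumptions made up for illustration; this is not NPG's actual data or API.

```python
# Dublin Core Terms and FOAF are real, widely used vocabularies.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical URIs: the article does not show NPG's real URI scheme.
ARTICLE = "http://example.org/articles/a1"
AUTHOR = "http://example.org/people/p1"

# Each RDF statement is a (subject, predicate, object) triple.
triples = {
    (ARTICLE, DCTERMS + "title", "An example article"),
    (ARTICLE, DCTERMS + "creator", AUTHOR),
    (ARTICLE, DCTERMS + "date", "2012-05-18"),
    (AUTHOR, FOAF + "name", "A. Author"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def titles_of(graph):
    """Collect every value attached with the dcterms:title predicate."""
    return [o for (_, _, o) in match(graph, p=DCTERMS + "title")]
```

Calling match(triples, s=ARTICLE) returns every statement about the article, which is roughly what a pattern query against a real triple store does, only at a far larger scale.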


The NPG platform has been built in collaboration with information and publishing solutions specialist The Stationery Office (TSO).


Implications: Linked data is a way of publishing data on the web in a structured way that facilitates interlinking of the data to other resources, and thus makes the data more useful. Built using standard web technologies, linked data allows relationships between things to be expressed, which greatly facilitates navigation between, and integration of, multiple information sources.
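The integration point can be illustrated with a small sketch, again in plain Python. When two independent sources use the same URI for the same thing, their statements can simply be pooled, and a question that neither source could answer alone becomes a matter of following links. The DOI, the predicate names, the library URI, and the helpers merge and follow are all hypothetical, chosen only to show the idea.

```python
# Hypothetical shared identifier: both sources use the same URI for the article.
DOI = "https://doi.org/10.9999/example.1"
LIBRARY = "http://example.org/libraries/main"

# Statements from a (hypothetical) publisher dataset.
publisher_data = {
    (DOI, "title", "An example article"),
    (DOI, "journal", "Example Journal"),
}

# Statements from a (hypothetical) library dataset.
library_data = {
    (DOI, "heldBy", LIBRARY),
    (LIBRARY, "name", "Main Library"),
}


def merge(*graphs):
    """Pool triples from several sources; shared URIs link them automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def follow(graph, subject, *path):
    """Follow a chain of predicates from subject and return the nodes reached."""
    current = {subject}
    for predicate in path:
        current = {o for (s, p, o) in graph if s in current and p == predicate}
    return current


combined = merge(publisher_data, library_data)
holders = follow(combined, DOI, "heldBy", "name")
```

Here follow starts at the article, hops to the holding library, and then to that library's name. The same call against publisher_data alone returns an empty set, because the link only exists once the two sources are combined.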


Linked open data (often shortened to just "open data") is linked data for which an "open" licence is in place (as in the NPG platform) to permit sharing, extension and re-use.


NPG is by no means the only player in STM to explore linked data. Elsevier also sees it as a way to disaggregate its content into meaningful data which can then be re-aggregated into new information solutions. Thomson Reuters' free Open Calais web service, which allows users to add rich semantic metadata to unstructured content, also supports linked data (see Insights 10 May 2010, From "Text Mine!" to Text Mining: STM Text Analytics Come of Age).


Apart from STM publishers, organisations committing resources to linked data projects include the Library of Congress, the British Library and other national libraries such as those in France and Germany, and OCLC Online Computer Library Center. The technology has also been pioneered by organisations as diverse as the BBC, the New York Times, and supermarket giant Tesco.


Libraries as well as publishers see benefits in increasing the visibility of their collections through search engine optimisation. In current systems, library data is held in databases which, although they may have web search interfaces, are not integrated with other data resources on the web. Combining library linked data with that from publishers and from a variety of other sources has the potential to dramatically improve discovery and navigation in terms of speed, relevance, and options for refining searches. Multi-lingual search and discovery could become much more effective. Similarly, linked data also offers the promise of bridging information use across subject domains that may use different terminology to discuss related topics, benefiting translational and interdisciplinary research.


For the benefits of linked data to extend beyond specialist communities it will be important for the major search engines to support it. Microsoft's Academic Search utilises linked open data (for example to create graphical visualisations of search results), whereas Google has yet to commit. At present Google's "rich snippets" and similar approaches are the versions of structured data used to provide better search results. These are enabled by microdata, for example as codified by schema.org. This initiative is supported by all the major search engines, and has been seen by some as a response to linked open data, intended to maintain the pre-eminent position of the search engines in web discovery. Schema.org is, however, too limited for scientific publishing (and science), having a reduced vocabulary and not being extensible.

At present it is sometimes said that linked data lacks a "killer app" to drive widespread adoption. The real benefits will emerge as more organisations surface their data as linked open data, as NPG has done, so that other organisations and communities can make use of it to create new services. It is not an accident that NPG refers to its linked data service as a platform. Elsevier has similarly talked of delivering linked data to support user-driven innovation, and fostering a developer ecosystem using its services as a platform. By exposing their data in this way, publishers can improve the visibility, usage and value of their content.