Sunday, 1 December 2013

Final reporting for ANDS and technical reflections -- Part 2.

Continuing our technical reflections on this project, I will wrap up with some speculation on how we think the infrastructure will evolve over the next 12 months.

One of the main features of the project has been how it has extended the pre-existing Founders and Survivors (FAS) data model with genealogical relationships and with a hierarchical way of documenting sources. Yggdrasil's hierarchical sources model maps naturally onto XML representations, and we were able to leverage this in generating XML data-driven workflows for populating the "branch" levels of Yggdrasil source trees. However, Yggdrasil simply provides a blob of text to be used as required for each source. Our general experience with the research domains to which AP20 has been applied (Convicts, Diggers, Koori Health) strongly leads us to believe that we need to do everything possible to get away from raw "text", whether in web forms or spreadsheet cells, as a mode of data capture. This is of course an exceedingly difficult problem to solve without lots of custom programming.
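
To illustrate the idea of populating source tree branches from XML, here is a minimal sketch in Python. It is not our actual workflow: the XML layout, the sample source names and the simplified (id, parent_id, text) row shape are all assumptions made for the example, and Yggdrasil's real sources table has more columns than this.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical XML describing the upper ("branch") levels of a source tree.
# The element and attribute names are invented for this illustration.
SAMPLE_XML = """
<source text="Convict records">
  <source text="Conduct registers">
    <source text="Register volume 1"/>
  </source>
  <source text="Ship indents"/>
</source>
"""


def flatten_sources(xml_text):
    """Walk nested <source> elements and return (id, parent_id, text) rows."""
    ids = count(1)
    rows = []

    def visit(element, parent_id):
        source_id = next(ids)
        rows.append((source_id, parent_id, element.get("text")))
        # Only direct children, so each level keeps its own parent.
        for child in element.findall("source"):
            visit(child, source_id)

    visit(ET.fromstring(xml_text), None)
    return rows


rows = flatten_sources(SAMPLE_XML)
for row in rows:
    print(row)
```

Each row carries a pointer to its parent, so the nesting in the XML becomes the hierarchy in the table. The text attached to each source is still free text, which is exactly the limitation discussed above.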

Towards the end of the project we were able to attend the International Semantic Web Conference in Sydney:

   http://iswc2013.semanticweb.org/content/program-friday

A number of papers and posters at that conference offered practical ways forward on this and other problems. Of particular note for our needs were:
  • ActiveRaul which automatically generates a web-based editing interface from an ontology http://iswc2013.semanticweb.org/content/demos/30
  • PROV-O, an ontology for describing provenance: http://www.w3.org/TR/prov-o/ Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. See explanatory material here: https://wiki.duraspace.org/display/VIVO/Prov-O+Ontology
  • the collaborative redevelopment of ICD-11 (the International Classification of Diseases, 11th revision) using WebProtégé: http://link.springer.com/chapter/10.1007%2F978-3-642-16438-5_6#page-1
We are hopeful that ActiveRaul could provide a workable way to edit ontology-based data fragments, for example the specific research data capture needs currently met by the Drupal data entry form of Prof McCalman's ships research project.

At the conference we also encountered many successful domain-specific examples where semantic technologies had been used to interlink, search and build innovative services across disparate sources of data. We believe this approach is a fertile way forward for the specific problem of better sharing and exchanging data with our collaborators, such as the Tasmanian Archives and Heritage Office and the Female Convicts Research Collective in Hobart. It is entirely feasible that an overarching ontology for prosopography, customised for the convict system, would enable each group to publish RDF in accordance with that ontology and to run a SPARQL endpoint, enabling federated search across the multiple databases. We believe this approach can help us solve problems of collaborative matching and data exchange whilst enabling each party to continue with the data management practices which best suit their own needs.
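
As a rough sketch of what such a federated search might look like, the Python below builds a single SPARQL 1.1 query that uses the standard SERVICE keyword to ask several endpoints the same question. Everything specific in it is an assumption for illustration: the endpoint URLs, the "pro:" ontology namespace, and the class and property names (pro:Convict, pro:surname, pro:shipName) are hypothetical, since no such shared ontology exists yet.

```python
# Hypothetical endpoint URLs and ontology namespace; none of these exist.
ENDPOINTS = [
    "http://example.org/fas/sparql",
    "http://example.org/archives/sparql",
]
ONTOLOGY = "http://example.org/prosopography#"


def build_federated_query(surname, endpoints=ENDPOINTS):
    """Build one SPARQL 1.1 query asking every endpoint for matching people."""
    # Escape backslashes and double quotes so the surname is a safe literal.
    safe = surname.replace("\\", "\\\\").replace('"', '\\"')
    blocks = []
    for url in endpoints:
        blocks.append(
            "  {\n"
            f"    SERVICE <{url}> {{\n"
            "      ?person a pro:Convict ;\n"
            f"              pro:surname \"{safe}\" ;\n"
            "              pro:shipName ?ship .\n"
            "    }\n"
            # Record which endpoint each result came from.
            f"    BIND(<{url}> AS ?source)\n"
            "  }"
        )
    return (
        f"PREFIX pro: <{ONTOLOGY}>\n"
        "SELECT ?person ?ship ?source WHERE {\n"
        + "\n  UNION\n".join(blocks)
        + "\n}"
    )


query = build_federated_query("Smith")
print(query)
```

The point of the design is that each collaborator keeps their own database and only needs to expose data in the agreed vocabulary. The query, not the data, is what gets shared, and the ?source variable keeps track of which group each candidate match came from.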

The data integration and portal-like capabilities we have developed in Yggdrasil, and its existing deployment in Nectar/Amazon Web Services cloud environments, mean it is well placed to evolve as a user interface to support this kind of capability. As we proceed with the Convicts and Diggers domains we will try to evolve a suitable ontology to help us move in this direction.

Contributors