Talking data

Sustaining the richness of the humanities

At Digital Humanities 2010 I noticed quite a bit of focused attention to sustainability. My own contribution was a challenge to think differently about how sustainability is likely to be realised, using ideas from biological evolution and multi-agent games. In the digital world, every access involves the making of a copy. If we could tweak things in such a way that all these copies contributed positively to the preservation of the corresponding resource, we would wield tremendous power for the good of preservation. The abstract is online, and the presentation should appear there as well, shortly.

In several other lectures I heard two very different approaches to sustainability. In "The Specimen Case and the Garden: Preserving Complex Digital Objects, Sustaining Digital Projects", Lewis Ulman eloquently advocated an approach that I had always found right but have lately begun to doubt. In (very) short: a humanities project typically delivers a rich, interconnected set of materials in many media, accessible through a (web) interface that lets you explore everything in its connectedness. The web interface is likely to die off at some point, and the best you can do is to document all relationships and store the documentation with the data, so that posterity will not have much trouble redeveloping an interface for the web of materials.

What is wrong with this? Well, it is not really wrong; in fact it is infinitely better than taking no particular measures. But what I see happening in the distant future is this: a researcher issues a query and finds among his search results an XML document from the project's frozen result. He wants to explore the related materials of the project, but there is no interface. He sees that there is documentation on how to build a web interface, but the documentation refers to obsolete architectures and systems, and the researcher is not in a position to dedicate the kind of effort needed to rebuild it. On to the next search result ...

Can we do better?

At least two talks pointed to another way. In "The Open Annotation Collaboration: A Data Model to Support Sharing and Interoperability of Scholarly Annotations", Jane Hunter proposed a new way to handle annotations: OpenAnnotation, a web-based, collaborative, transparent, explicit, standardized paradigm for representing scholarly annotations on just about any resource that is representable on the web, be it pieces of text, sections of images, cuts of videos, or sets of other resources.
And in the same room, Peter Robinson and Federico Meschini presented "Works, Documents, Texts and Related Resources for Everyone" and made proposals that might help the paradigm of the Semantic Web and Linked Data come true in the library world.

Now reconsider the typical output of a humanities project. If we manage to do the annotations in the Open Annotation way, and if we manage to express the web of relations through the formalisms of Linked Data, then we do not need special-purpose interfaces for the output materials. These formalisms are built directly on top of mainstream web technology, but are more basic than the formalisms found in specific disciplines. As such, I expect that at this level we'll see a new layer of infrastructure emerging. Part of that infrastructure will also be the facility of temporal browsing, i.e. directing your browsing to points in the past, in order to surf the archived web at that point in time. See the Memento idea by Herbert Van de Sompel and others.
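
To make temporal browsing a little more concrete, here is a minimal sketch of how a client might ask for a past state of a resource. It assumes the Accept-Datetime request header under which the Memento idea was later standardised (RFC 7089); the TimeGate address and the function name are hypothetical, and the request is only built, not sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def temporal_request(timegate_url: str, when: datetime) -> Request:
    """Build (but do not send) a request for a resource as it was at `when`."""
    # Accept-Datetime carries an HTTP date expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_url, headers={"Accept-Datetime": stamp})


# Hypothetical TimeGate address, for illustration only.
req = temporal_request(
    "http://timegate.example.org/http://example.org/project/",
    datetime(2010, 7, 1, tzinfo=timezone.utc),
)
# urllib normalises header capitalisation, so look the header up case-insensitively.
accept_datetime = {k.lower(): v for k, v in req.header_items()}["accept-datetime"]
print(accept_datetime)
```

A server that understands the header can then redirect the client to the archived copy closest to the requested moment.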

There will be very generic interfaces based on Linked Data, and these interfaces will be flexible enough to absorb new relationships and data structures.
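
What such a generic interface might look like can be sketched in a few lines. The example below is only an illustration under stated assumptions: the project output is reduced to plain (subject, predicate, object) triples, and all identifiers with the `ex:` prefix are hypothetical, not terms from Open Annotation or any real vocabulary.

```python
from collections import defaultdict

# Hypothetical project output as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:letter-12", "ex:hasFacsimile", "ex:scan-12.jpg"),
    ("ex:letter-12", "ex:mentions", "ex:person-7"),
    ("ex:note-3", "ex:annotates", "ex:letter-12"),
]


def describe(resource, triples):
    """Group everything known about `resource` by relationship, in both directions."""
    view = defaultdict(list)
    for s, p, o in triples:
        if s == resource:
            view[p].append(o)
        elif o == resource:
            # Incoming links are shown under an inverted label.
            view["is " + p + " of"].append(s)
    return dict(view)


view = describe("ex:letter-12", TRIPLES)

# A relationship type that did not exist when `describe` was written
# shows up without any change to the interface code.
extended = TRIPLES + [("ex:letter-12", "ex:translatedAs", "ex:letter-12-en")]
extended_view = describe("ex:letter-12", extended)
```

The point of the design is that `describe` knows nothing about letters, scans, or notes; it is driven entirely by the data it is given.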

The data is now freed from the interface, the interfaces are (meta)data-driven, and our future researcher has an instant view of the richness of an old resource.

One challenge remains, in my opinion: how do we make sure that this new infrastructure is really sustainable? I admit that I have only a clouded vision of this: let there be a new kind of cloud, with workspaces in which users collect the works that interest them. Let the cloud be smart, in that it implements the new infrastructure. Let the cloud be efficient, in that it optimises the copies of works for storage economy, but also for access economy. And let the cloud be honest and creative in dividing the costs among all workspaces, charging users for storage, but also paying users for their contribution to the preservation of the works they copy.

This is truly revolutionary: to be paid for storing a copy of a preservation-worthy work.
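
How the cloud might divide costs can be shown with a toy calculation. Everything in it is an assumption made for illustration: the function name, the even split of storage cost among the workspaces holding a work, and the flat credit per copy held are hypothetical choices, not a worked-out economic model.

```python
def settle(workspaces, sizes, storage_rate, preservation_credit):
    """Return the net charge per user: shared storage cost minus preservation credit."""
    # Find out which users hold each work; the cloud stores the work only once.
    holders = {}
    for user, works in workspaces.items():
        for work in works:
            holders.setdefault(work, []).append(user)

    net = {user: 0.0 for user in workspaces}
    for work, users in holders.items():
        # Assumption: storage cost is split evenly among all holders of a work.
        share = sizes[work] * storage_rate / len(users)
        for user in users:
            # Assumption: each copy held earns a flat credit for preservation.
            net[user] += share - preservation_credit
    return net


net = settle(
    workspaces={"ann": ["w1", "w2"], "bob": ["w1"]},
    sizes={"w1": 10, "w2": 4},
    storage_rate=1.0,
    preservation_credit=0.5,
)
print(net)
```

In this toy setting, the more workspaces hold a work, the cheaper it becomes for each of them, while every holder is still credited for keeping it.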