Talking data

Sustaining the richness of the humanities

At Digital Humanities 2010 I noticed quite a bit of focused attention on sustainability. My own contribution was a challenge to think differently about how sustainability is likely to be realised, using ideas from biological evolution and multi-agent games. In the digital world, every access involves the making of a copy. If we could tweak things so that all these copies contributed positively to the preservation of the corresponding resource, we would wield tremendous power for the good of preservation. The abstract is online, and the presentation should appear there shortly.

In several other lectures I heard two very different approaches to sustainability. In
The Specimen Case and the Garden: Preserving Complex Digital Objects, Sustaining Digital Projects, Lewis Ulman eloquently advocated an approach that I have always found right, but which I have lately begun to doubt. In (very) short: a humanities project typically delivers a rich, interconnected set of materials in many media, accessible through a web interface that lets you explore everything in its connectedness. The web interface is likely to die off at some point, and the best you can do is document all relationships and store that documentation with the data, so that posterity will not have much trouble rebuilding an interface for the web of materials.

What is wrong with this? Well, it is not really wrong; in fact it is infinitely better than taking no particular measures. But here is what I see happening in the distant future: a researcher issues a query and finds among the search results an XML document from the project's frozen output. He wants to explore the related materials of the project, but there is no interface. He sees that there is documentation on how to build a web interface, but it refers to obsolete architectures and systems, and he is not in a position to invest the effort needed to follow it. On to the next search result ...

Can we do better?

At least two talks pointed to another way. In
The Open Annotation Collaboration: A Data Model to Support Sharing and Interoperability of Scholarly Annotations, Jane Hunter proposed a new way of handling annotations: Open Annotation, a web-based, collaborative, transparent, explicit, standardized paradigm for representing scholarly annotations of any resource representable on the web, be it pieces of text, sections of images, cuts of video, or sets of other resources.
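To make this concrete: in the Open Annotation model an annotation is itself a web resource that links a body (what is said) to a target (what it is about), expressed as RDF triples. Here is a minimal, library-free sketch of that shape; all example URIs are hypothetical:

```python
# A minimal sketch of the Open Annotation model: the annotation, its body,
# and its target are all addressable web resources, connected by triples.
# Every URI below is made up for illustration.

OA = "http://www.w3.org/ns/oa#"

anno   = "http://example.org/anno/1"
body   = "http://example.org/notes/42"
# Media-fragment syntax selects a rectangular region of an image.
target = "http://example.org/scans/folio-3r#xywh=120,80,400,300"

triples = [
    (anno, "rdf:type", OA + "Annotation"),
    (anno, OA + "hasBody", body),      # what the annotation says
    (anno, OA + "hasTarget", target),  # what it is about
]

for s, p, o in triples:
    print(s, p, o)
```

Because annotation, body, and target are all addressable on the web, the annotation can live anywhere and still point unambiguously at its subject, whether that is a text passage, an image region, or a video cut.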
And in the same room, Peter Robinson and Federico Meschini presented
Works, Documents, Texts and Related Resources for Everyone, with proposals that might help the paradigm of the Semantic Web and Linked Data come true in the library world.

Now reconsider the typical output of a humanities project. If we manage to do the annotations in the Open Annotation way, and if we manage to express the web of relations through the formalisms of Linked Data, then we do not need special-purpose interfaces for the output materials. These formalisms are built directly on top of mainstream web technology, but are more basic than the formalisms found in specific disciplines. As such, I expect that at this level we'll see a new layer of infrastructure emerging. Part of that infrastructure will also be the facility of temporal browsing, i.e. directing your browsing to points in the past in order to surf the archived web as it was at that moment. See
the Memento idea by Herbert van de Sompel and others.
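Memento (later standardized as RFC 7089) works through datetime negotiation: the client adds an Accept-Datetime header to an ordinary HTTP request, and a Memento-aware server redirects to the archived version closest to that moment. A sketch of composing such a header with the Python standard library; the request is only built here, not sent:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# The moment in the past we want to browse (a hypothetical example date).
when = datetime(2010, 7, 10, tzinfo=timezone.utc)

# RFC 7089 requires the RFC 1123 date format used elsewhere in HTTP.
headers = {"Accept-Datetime": format_datetime(when, usegmt=True)}

print(headers["Accept-Datetime"])  # e.g. "Sat, 10 Jul 2010 00:00:00 GMT"
```

Sending this header with any HTTP client against a Memento-aware archive would steer the browsing session to the archived web as it was around that date.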

There will be very generic interfaces based on Linked Data, flexible enough to absorb new relationships and data structures.
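This flexibility is easy to illustrate: a generic, data-driven viewer does not need to know a project's specific relationship types in advance; it simply enumerates whatever predicates a resource happens to have. A toy sketch with made-up triples:

```python
# A generic viewer needs no knowledge of a project's specific relationship
# types: it lists whatever predicates a resource has in the data.
# The graph below is a stand-in for real Linked Data (all names hypothetical).

graph = [
    ("ex:letter-17", "ex:transcribedAs", "ex:tei/letter-17.xml"),
    ("ex:letter-17", "ex:depictedBy", "ex:scans/letter-17.jpg"),
    ("ex:letter-17", "ex:annotatedBy", "ex:anno/33"),
]

def describe(resource, triples):
    """Return every (predicate, object) pair attached to a resource."""
    return [(p, o) for s, p, o in triples if s == resource]

for predicate, obj in describe("ex:letter-17", graph):
    print(predicate, "->", obj)
```

Adding a new kind of relationship to the data makes it appear in the view automatically, with no change to the interface code.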

The data is now freed from the interface, the interfaces are (meta)data driven, and our future researcher has an instant view of the richness of an old resource.

One challenge remains, in my opinion: how do we make sure that this new infrastructure is itself sustainable? I admit that I have only a clouded vision of this: let there be a new kind of cloud, with workspaces for users who collect works that interest them. Let the cloud be smart, in that it implements the new infrastructure. Let the cloud be efficient, in that it optimises the copies of works for storage economy, but also for access economy. And let the cloud be honest and creative in dividing the costs across all workspaces, charging users for storage, but also paying users for their contribution to the preservation of the works they copy.

This is truly revolutionary: to be paid for storing a copy of a preservation-worthy work.


Charles van den Heuvel said...

Hi Dirk
Interesting. I am not so sure yet whether it would be an advantage "to be freed from interfaces", as Peter Robinson stated. I am very much in favor of the proposed architecture of the Open Annotation Collaboration. Really great stuff, but in my view the seamless integration should be in the back end. I believe it is still relevant for the user to have an interface that allows him or her to select specific "filtered" information, to avoid an overload of material that is not relevant for the research in question.

Dirk Roorda @ DANS said...

Good point. At least the improved back-end frees us from the nightmare of preserving those interfaces indefinitely.

Lewis said...

Hi, Dirk. Thanks for tying so much of the work on sustainability together in your post! Note that Melanie and I proposed a two-pronged approach: the one you describe (the specimen case that preserves a particular historical instantiation of a project) and a more evolutionary approach that enlists institutional curators who manage inevitable change over time.

Underlying all of the choices you describe in your post, I suppose, are choices about what is being sustained or preserved. Literary scholars are interested in works, texts, editions, book arts, and literary culture, and different sorts of evidence are needed to construct interpretations of each. Some scholars may indeed want to know something not only about the interpretations recorded in a TEI transcription or Open Annotation but also about the physical systems by which those interpretations were presented to and received by readers. For that, we need the "specimen case" approach. If we want to sustain the normal development of DH projects, we will need to provide some mechanism by which they can continue to respond to changing conditions after their original developers are no longer involved -- and you have noted several of the approaches currently on the table.

Your use of the metaphor of evolution is instructive. Evolution is simply a mechanism that life on Earth -- in general -- developed that allows it to persist in the face of changing environmental conditions. But the analogy only serves us so far. In the long run, evolution hasn't sustained individuals or species; it has provided enough variation in the gene pool broadly conceived to allow life to diversify -- and therefore persist -- in response to changing conditions.
Evolution has yielded human behaviors that have allowed us to sustain ourselves so far, but in order to understand how we got here, we have had to turn to fossil records that reveal previous states -- and we have those records only because of some physical processes that by chance preserved some of them for us. Paleontologists would love to have a more complete record, even if it wouldn't allow them to reanimate early hominids :-).
