Talking data

Let the data speak

It seems like a reduction in complexity when someone says: let the facts speak. We are invited to move away from the complex world of opinions, interpretations, conjectures, and comments on conjectures to the plain and simple world of facts, outlined by numbers and yes/no statements.

There are many objections to this artificial division, as anybody knows who has tried to make sense of a new set of primary data. There is the statistical analysis, the modelling, the choice of representation, and the addition of metadata to explain the significance and value of the data. All these issues point from the realm of plain facts to the realm of complex human understanding.

So when the data speak, they speak a language, and we have to deal with that language. Sometimes quite literally: when the data in question are language data, recording how we, humans, speak. This gives rise to a whole set of new challenges: listening to the data. What do you do if you have sound recordings of a thousand different languages?

Yet, there is another layer to the picture, and that is where the data are meaningful representations of human culture. Texts are the prime example, texts that discuss the events of history, the works of art, the states of mind, the structure of knowledge. In order to research texts, three kinds of language must be dealt with: (i) the language of data, i.e. the problem of appreciating a text among the family of all its variations, with a critical reflection on the origins of that family; (ii) the language of language, i.e. the world of spelling variation, part-of-speech analysis, syntactical analysis, up to the first stages of semantics; (iii) the language of meaning, i.e. the world of concepts and relationships as studied by the many disciplines of the humanities and social sciences.

To me, the excitement is often in the middle. Things always happen at boundaries. Dealing with the language of language amounts to building bridges between the high-level world of human discourse and the low-level world of data and computing. The CLARIN project explicitly takes this position. With the preparatory stage almost finished, it has already generated a lot of activity. And I am excited that here, in the Netherlands, the funding is in place to carry on.

Here at DANS, we are involved in quite a number of projects with a CLARIN connection. We are not linguists ourselves; our natural language is the language of data. But we are aware that in order to give the data an audience, we have to facilitate the language of language. So we will work on the machinery required for language resources, taking care of the dreary details. And already we see some projects taking the next step: speaking the language of meaning after listening to the language of language.


Remco van Veenendaal said...

Thanks DANS and Dirk for starting this blog! It may prove to be a valuable resource for discussion about (language and text) data.

Making sure facts speak and (continue to) have a listening audience is one of the main tasks of the Institute for Dutch Lexicology (INL), and in particular of the Flemish-Dutch Human Language Technology Agency (TST-Centrale).

The mission of the TST-Centrale is to manage, maintain and make available Dutch digital Language Resources for research, education and commercial purposes. The Dutch language has to keep playing a key role in the information society.

We are also actively involved in the CLARIN project and are, for example, close colleagues of DANS, the Meertens Institute and the Max Planck Institute in Nijmegen, all “CLARIN Centers” in the (Dutch branch of the) CLARIN infrastructure.

We, and in particular our (computational) linguists, are mostly involved in Dirk’s “language of language”, but our work also touches the “language of data” and the “language of meaning”. It would be great to have more comments from these perspectives and to look at language and text from all sides.

The TST-Centrale is an initiative of, and is funded by, the Dutch Language Union (Nederlandse Taalunie), and is housed at the INL.

