Tim Berners-Lee gave a talk on Linked Data at TED this year.
Just as he wanted documents put on the Web and linked together 20 years ago, he now wants us to do the same thing with our data.
The video is 16 minutes long. If you don't have that much time, watching from 3:58 to about the 8-minute mark is enough to get an overview.
Linked data is a big, big, big, big problem. There are mountains of public data produced by national statistics offices, central banks and supranational organizations. There is even more produced by trade associations and research organizations in specific verticals, not to mention the private sector and individuals.
The sheer range of sources and types of data is hard to comprehend, as is what might become possible if it were all linked together on the Web, but it's an exciting prospect. As Tim points out, the value of data lies in its relationships with other data. So much is difficult now because we can't connect, or even discover, all of the data we need.
A lot of useful data is available on the Web, but it's buried in online databases, spreadsheets and text documents. This erodes its value by making it difficult, and therefore expensive, to find.
There are already several community-driven projects trying to seed a World Wide Web of data, including DBpedia, which Tim mentions in his talk. At least one company, Swivel, is also promoting data sharing. It has created a platform for users to upload and comment on data and has made some headway in enlisting 'official sources' such as the OECD as contributors.
Interestingly, Amazon Web Services has also created a public data resource that applications built on its cloud computing platform can access for free.
Linking data will obviously be a huge undertaking, and my guess is that some familiar copyright issues will arise as users move faster than organizations to create what they want with whatever data they can access.
However, it might not be as simple as users or organizations publishing linked data, at least for professional uses. Unlike Web pages, which can be read and understood in context, data depends on supporting definitions and methodology.
Data and metadata obtained from multiple sources need to be standardized for accurate and meaningful analysis, and standardization is not a one-off process, because each source's methodology can change independently over time.
It will be very interesting to see how ontologies, folksonomies and other elements of data standardization and discovery fit together in a World Wide Web of data.
Posted on March 16, 2009
My name is Phillip Baker and this is my personal blog about finding value in a world of free information.