The UBC Graduate School of Journalism has joined the DocumentCloud project designed to help share, find and deal with the mounds of documents that investigative reporters unearth.
It is the first Canadian journalism school to sign on to the non-profit project funded in 2009 by a two-year grant from the Knight News Challenge.
The project is designed to build an open-source platform to make original source documents easy to find, share, read and collaborate on, anywhere on the web.
“DocumentCloud will be a powerful new tool designed to bring document-based reporting into the computer age. Working with journalism schools has a couple of huge benefits for us,” said Aron Pilhofer, project co-founder and editor of interactive news technologies at The New York Times.
“First, these are students who, for the most part, have grown up in an era of ubiquitous computing and are the most willing to adopt new technological solutions to reporting problems. They are the early adopters we want to reach, and we believe they will be able to provide valuable insight about ways we can improve the tool and make it more useful.
“But more than that, journalism schools are one of the few remaining places where investigative journalism is flourishing. As traditional news sources cut and cut, we are seeing journalism schools across the country starting to pick up the slack, and produce some really outstanding work that deserves wider distribution. We believe DocumentCloud can help do just that.”
The software tools under development analyze documents and parse the information to identify names, places, dates and other words and phrases that establish the meaning of a body of text, a process described as entity extraction.
This is the process of automatically extracting document metadata from unstructured text documents “to not only improve keyword search but also open the door to semantic search, faceted search and document repurposing.”
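To give a rough sense of what entity extraction involves, the sketch below pulls dates and multi-word proper names out of a sentence using simple pattern matching in Python. It is only an illustration: the function name `extract_entities` and both patterns are hypothetical, and nothing here reflects how DocumentCloud's own tools are built. Production systems typically rely on trained language models rather than hand-written patterns.

```python
import re

# Hypothetical patterns for illustration only; real entity extraction
# relies on far more robust techniques than regular expressions.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Matches "October 2009" or "March 5, 2010".
DATE_RE = re.compile(r"\b(?:" + MONTHS + r")(?: \d{1,2},)? \d{4}\b")

# Matches two or more consecutive capitalized words, e.g. "Aron Pilhofer".
PROPER_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def extract_entities(text):
    """Return a dict of naive 'dates' and 'proper_nouns' found in text."""
    dates = [m.group() for m in DATE_RE.finditer(text)]
    # Blank out the dates first so month names are not counted as names.
    remainder = DATE_RE.sub(" ", text)
    proper_nouns = [m.group() for m in PROPER_RE.finditer(remainder)]
    return {"dates": dates, "proper_nouns": proper_nouns}


sample = ("Aron Pilhofer showed the software in October 2009. "
          "A beta is due in March 2010.")
print(extract_entities(sample))
# {'dates': ['October 2009', 'March 2010'], 'proper_nouns': ['Aron Pilhofer']}
```

Once names and dates are stored as metadata alongside each document, a search can filter on them directly rather than relying on keyword matches alone.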
The platform becomes more powerful as more partners sign on, because it acts as a shared store of documents from diverse news organizations.
More than 40 organizations are involved in the project to contribute documents and test the first iteration of the software. They include The New York Times, ProPublica, The Washington Post, The New Yorker, MSNBC, National Public Radio, PBS NewsHour and the Vancouver Sun.
Among the other academic units involved are The Centre for Investigative Journalism at City University, London, and The Investigative Reporting Workshop at American University.
Pilhofer demonstrated an alpha version of the software in October at the Online News Association annual conference. DocumentCloud expects to release a public beta in March 2010.
(Photos of Aron Pilhofer courtesy of lite)