The Data Liberation Project recently launched with the aim of making government data more available and useful for the general public.
Jeremy Singer-Vine, a data journalist who spent eight years as a data editor at BuzzFeed News, rolled out the new website earlier this fall in response to his own frustrations in the field.
After several years of working on data-based investigative stories, Singer-Vine said he was looking for “the next challenge.”
“I’m slowly looking for funding for the project, and in a dream world, it would be enough funding to have a team of people working on this,” he said, “because especially once the FOIA pipeline really gets going and certain scraping projects get spun up, it should and would be more than a one-person, full-time job.”
This project, he said, is a version of something he’s thought about pursuing for several years because of the gaps he saw in the accessibility of data.
“In the process of our work, we do put in a lot of effort to obtain datasets, for example, through FOIA from the government,” Singer-Vine said.
But he said some of the datasets journalists receive through the Freedom of Information Act are “not useful at all.” Or the data is helpful, but not relevant to a story in progress.
His three goals are to:
- Make high-quality government data available to journalists, researchers and the general public
- Develop an understanding of what works and what doesn’t in this particular field of records and data publishing
- Be a resource for those who want to pursue public-records requests
So far, Singer-Vine has requested five datasets per month since launching the project.
In September, for example, he requested data on public housing inspections, disaster housing from the Federal Emergency Management Agency, radiation exposure compensation claims, student loan data and theft and loss of controlled substances.
He provides frequent and detailed updates on the website, keeping readers informed on every step of the request process. For the request on theft and loss of controlled substances, which he sent to the Drug Enforcement Administration, he provides readers with all of the information he already has about that dataset.
He then describes how the public would benefit from the disclosure of these records. In this case, the disclosure of such records would “allow the public to identify important trends in controlled substance theft/loss that could help identify problems in their communities, as well as to hold the DEA accountable,” Singer-Vine said.
And for readers who want to know exactly how the dataset was requested, Singer-Vine provides links to his original FOIA requests.
Such requests are related to federal data and records, but every state has a public-records request system as well.
Singer-Vine said he started with five specific requests because he wanted to begin with federal data that could be localized or put in a local context.
He noted that there were “obvious things” he was looking for in the data he requested, including relevance.
“I started trying to think about data sets that are federal in scope, but very localizable,” he said, “so that lots of different people in any part of the country, more or less, would find something relevant to them in their communities.”
In October, Singer-Vine made five more requests, relating to the Department of Veterans Affairs’ Supportive Services for Veteran Families program, federal inmate complaints from the Department of Justice, the risk management program database from the Environmental Protection Agency, Office of Refugee Resettlement records relating to the care of unaccompanied children and a U.S. Customs and Border Protection database.
While the records requests are in progress, Singer-Vine plans to begin web scraping and document parsing.
The two methods are useful because they let him take information that is technically public but effectively inaccessible, because of “either how widely strewn it is or that it’s formatted as PDFs or that there are general technological barriers,” he said, and make it usable.
Singer-Vine works as a consultant and freelance data editor based in New York City while devoting as much time as possible to the Data Liberation Project on the side.
He got his start in journalism at Slate magazine before moving to The Wall Street Journal, where he worked as a reporter/programmer.
He moved to BuzzFeed News in 2014, where he served as the founding data editor, running a team of reporters that combined traditional reporting skills with data analysis.
Singer-Vine left BuzzFeed News in early 2022 to create the Data Liberation Project.
Since then, he’s continued publishing his popular weekly newsletter, Data Is Plural, in which he compiles useful, unique or interesting datasets. The newsletter now has just over 29,000 subscribers.
The most recent newsletter featured datasets on work stoppages, avian flu detections, social media suppression, literature prize-winners and video games.
He was a 2014 Pulitzer Prize finalist for his work with colleague John Emshwiller at The Wall Street Journal on the nation’s often overlooked factories and research centers that once produced nuclear weapons and now pose contamination risks.
Singer-Vine said Big Local News at Stanford University, which aims to make it easier for local journalists to access public records about policing, public health, government and other vital topics, is among the many similar projects that have sprung up in recent years.