Shop Notes
Learning how to make the most of our data at NICAR
Posted: March 1, 2014 | Tags: campaign finance, Federal Election Commission, Investigative Reporters and Editors, journalism, National Institute for Computer-Assisted Reporting, NICAR, reporting, Workshop news
Reporters are increasingly using data to find stories. At NICAR's annual conference, where 1,000 journalists are gathering in Baltimore this year, reporters, editors, programmers and data visualization experts are sharing best practices for acquiring databases or building their own.
A session on Federal Election Commission (FEC) data, led by Aaron Bycoffe, Jack Gillum, and Chris Schnaars, focused on how to get at the publicly available but highly obfuscated records that show who donated what to political campaigns. The speakers said just 150 people accounted for $810 billion worth of donations to Super PACs in the 2012 presidential race. These contributions are legally required to be public, but they are not required to be easily accessible. Senate campaigns, for example, are exempt from the FEC's mandatory electronic-filing rules, so they can print their filings and turn in paper copies. The FEC then pays an outside contractor to digitize those filings, which wastes money and significantly delays the public release of contributions to Senate campaigns.
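Once contribution records are in machine-readable form, a concentration figure like the one the speakers cited comes down to two steps: total each donor's contributions, then compare the largest donors' combined total with all the money raised. The sketch below assumes a hypothetical, simplified layout of one contribution per line ("donor name|amount in dollars"); real FEC files carry many more fields, and the sample rows are invented purely to illustrate the arithmetic.

```python
from collections import defaultdict

# Hypothetical, simplified records: "donor name|amount in dollars".
# These rows are made up for illustration; they are not real FEC data.
SAMPLE_ROWS = """\
DOE, JANE|500000
ROE, JOHN|250000
DOE, JANE|1000000
SMITH, ALEX|2500
LEE, SAM|1000
"""


def donor_totals(text):
    """Sum every contribution per donor name (exact string match only)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last "|" so the amount is always the final field.
        name, amount = line.rsplit("|", 1)
        totals[name.strip()] += int(amount)
    return dict(totals)


def top_donor_share(totals, n):
    """Return the top-n donors' combined total and their share of all money."""
    ranked = sorted(totals.values(), reverse=True)
    if not ranked:
        return 0, 0.0  # no contributions, so no share to compute
    top = sum(ranked[:n])
    return top, top / sum(ranked)


totals = donor_totals(SAMPLE_ROWS)
top, share = top_donor_share(totals, 2)
print(f"Top 2 donors gave ${top:,} ({share:.1%} of the total)")
```

Grouping by the exact name string is the weak point in practice: the same donor can appear under several spellings or addresses, so real analyses usually need a cleanup step before totals like these can be trusted.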
But getting access to the data is only half the battle. What do you do with hundreds of records spread across dozens of web pages? You could have an intern type it all into an Excel spreadsheet by hand. Or, as Tommy Kaas and Nils Mulvad demonstrated in Friday's class, you can use a web-scraping program such as Helium Scraper. The program recognizes "kinds" of information that belong in the same spreadsheet column. Defining them takes only a few minutes, and you can then select the "kinds" you want extracted into a single, comprehensive spreadsheet. In 50 minutes, Kaas and Mulvad showed the class how to scrape this and this into single spreadsheets.
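The idea behind "kinds" can be illustrated without Helium Scraper itself. The sketch below is not how that program works internally; it is a minimal stand-in using Python's standard library, assuming a hypothetical page where each record repeats the same markup and each CSS class (here the made-up `name` and `city`) marks one kind of information. Every element of a chosen kind becomes one entry in that kind's column, and the columns are then lined up into CSV rows.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page: every record repeats the same structure,
# and each CSS class plays the role of one "kind" of information.
SAMPLE_HTML = """
<div class="record"><span class="name">Jane Doe</span><span class="city">Baltimore</span></div>
<div class="record"><span class="name">John Roe</span><span class="city">Annapolis</span></div>
"""


class KindCollector(HTMLParser):
    """Collect the text of elements whose class attribute is a chosen kind."""

    def __init__(self, kinds):
        super().__init__()
        self.kinds = kinds
        self.columns = {kind: [] for kind in kinds}
        self._open_tag = None  # tag that opened the current kind element
        self._current = None   # kind we are collecting text for, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if self._current is None and cls in self.kinds:
            self._open_tag, self._current = tag, cls
            self.columns[cls].append("")  # start a new cell for this kind

    def handle_endtag(self, tag):
        # Assumes kind elements are not nested inside tags of the same name.
        if self._current is not None and tag == self._open_tag:
            self._open_tag, self._current = None, None

    def handle_data(self, data):
        if self._current is not None:
            self.columns[self._current][-1] += data


def scrape_to_csv(html, kinds):
    """Return CSV text with one column per kind, one row per record."""
    parser = KindCollector(kinds)
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(kinds)  # header row: the kind names
    # zip pairs the n-th item of each kind; it stops at the shortest column.
    for row in zip(*(parser.columns[k] for k in kinds)):
        writer.writerow(value.strip() for value in row)
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML, ["name", "city"]))
```

Lining columns up by position only works when every record contains every kind; a page with missing fields would need per-record grouping instead, which is the kind of detail a dedicated scraping tool handles for you.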
If you're interested in data-driven journalism, check out some of the tipsheets and handouts that are being passed around on the IRE website and on Twitter (#NICAR14).