Shop Notes

Learning how to make the most of our data at NICAR

Posted: March 1, 2014 | Tags: campaign finance, Federal Election Commission, Investigative Reporters and Editors, journalism, National Institute for Computer-Assisted Reporting, NICAR, reporting, Workshop news

Reporters are increasingly using data to find stories. At NICAR's annual conference, where 1,000 journalists are gathering in Baltimore this year, reporters, editors, programmers and data visualization experts are sharing best practices for acquiring databases or building their own.

A session on Federal Election Commission (FEC) data, led by Aaron Bycoffe, Jack Gillum and Chris Schnaars, focused on how to gain access to the publicly available but highly obfuscated records that show who donated what to political campaigns. The speakers said just 150 people accounted for $810 billion worth of donations to Super PACs in the 2012 presidential race. While these contributions are legally required to be public, they are not required to be easily accessible. Senate campaigns are exempt from the FEC's mandatory electronic-filing rules, so they can print their filings and turn in paper copies. The FEC then pays an outside contractor to digitize them, which wastes money and significantly delays the public release of Senate campaign contributions.
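For readers who want to check a concentration claim like the one above against their own data, here is a minimal Python sketch of the arithmetic. The records are made-up sample values, not real FEC data, and the function name `top_donor_share` is hypothetical.

```python
from collections import defaultdict

# Made-up sample records (donor name, amount in dollars); not real FEC data.
contributions = [
    ("Donor A", 5_000_000),
    ("Donor B", 2_000_000),
    ("Donor A", 1_000_000),
    ("Donor C", 1_500_000),
    ("Donor D", 500_000),
]


def top_donor_share(records, n):
    """Return the fraction of all money given by the n largest donors."""
    totals = defaultdict(int)
    for donor, amount in records:
        totals[donor] += amount  # combine repeat gifts from the same donor

    ranked = sorted(totals.values(), reverse=True)
    grand_total = sum(ranked)
    if grand_total == 0:
        return 0.0
    return sum(ranked[:n]) / grand_total


share = top_donor_share(contributions, n=2)
print(f"Top 2 donors gave {share:.0%} of the total")
```

In practice, the hard part is usually deciding when two slightly different spellings refer to the same donor, which this sketch does not attempt.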

But getting access to the data is only half the battle. What do you do with hundreds of pages' worth of information spread across dozens of web pages? You could make an intern manually enter it all into an Excel spreadsheet. Or, as Tommy Kaas and Nils Mulvad demonstrated in Friday's class, you can use a web-scraping program such as Helium Scraper. The program recognizes "kinds" of information that belong in the same column of a spreadsheet. Setting this up takes only a few minutes, and you can then select the "kinds" you want mined into a comprehensive spreadsheet. In 50 minutes, Kaas and Mulvad showed the class how to scrape two example sites into single spreadsheets.
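If you would rather script the same idea, the sketch below shows one way it might look using only Python's standard library. This is not how Helium Scraper works internally. It assumes a hypothetical page layout in which every record is a table row and each cell's `class` attribute names its "kind"; real pages will need their own rules.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for downloaded pages; real sites will differ.
SAMPLE_PAGES = [
    '<table><tr><td class="donor">A. Smith</td><td class="amount">500</td></tr></table>',
    '<table><tr><td class="donor">B. Jones</td><td class="amount">1200</td></tr></table>',
]

KINDS = ["donor", "amount"]  # the "kinds" we want as spreadsheet columns


class KindParser(HTMLParser):
    """Collect the text of cells whose class is a wanted kind, one dict per row."""

    def __init__(self, kinds):
        super().__init__()
        self.kinds = set(kinds)
        self.rows = []
        self._row = None   # the row being built, or None outside a <tr>
        self._kind = None  # the kind of the current cell, if it is wanted

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            cls = dict(attrs).get("class")
            self._kind = cls if cls in self.kinds else None

    def handle_data(self, data):
        if self._kind is not None and self._row is not None:
            self._row[self._kind] = self._row.get(self._kind, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._kind = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no wanted cells
                self.rows.append(self._row)
            self._row = None


def scrape_pages(pages, kinds=KINDS):
    """Parse every page and return all matching rows in one list."""
    rows = []
    for html in pages:
        parser = KindParser(kinds)
        parser.feed(html)
        parser.close()
        rows.extend(parser.rows)
    return rows


def to_csv(rows, kinds=KINDS):
    """Write the rows as CSV text with one column per kind."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=kinds, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = scrape_pages(SAMPLE_PAGES)
csv_text = to_csv(rows)
print(csv_text)
```

The resulting text can be saved with a `.csv` extension and opened in Excel. Fetching the pages themselves is left out here so the example runs without a network connection.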

If you're interested in data-driven journalism, check out some of the tipsheets and handouts that are being passed around on the IRE website and on Twitter (#NICAR14). 
