Sunlight Foundation

DoD correspondence log converted from pix to spreadsheets

I'm posting, in an Excel spreadsheet, the congressional correspondence logs covering the first three months of 2007 that we got a while back, in a less-than-user-friendly format, from the Office of the Secretary of Defense. Here's a sample of what we got in response to our FOIA request -- a .tif, or tagged image file format, file. I picked one at random, but we have a CD-ROM with 189 files just like it.

Anu turned the files over to Scott Wells, our multi-talented office administrator, who used a program called ocrad (it runs on Linux) to convert them to a text file, which Anu posted here. Here's a sample of what the converted .tif files looked like:

OSD CONTROL NUMBER: OSD 03257-07 DOCUMENTTYPE: INCOMING DOC: 2128/2007 DOR: 31212007

FROM: uss LEVIN, c TO: SECDEF

SUBJECT: REauEsT FOR NUMBER OF IRAal INDIVIDUALS WHO HAVE HELPED THE u.s. SUSTAIN AND MANAGE ITS PRESENCE IN IRAa

AGENCY: JCS TASK: PRS SUSPENSE: 3/1312007 ACD:

FILE NUMBER: IRAa

OSD CONTROL NUMBER: OSD 03288-07 DOCUMENT TYPE: INCOMING DOC: 212812007 DOR: 3/2/2007

FROM: uss VOINOVICH, G TO: SECDEF

___'_BJEIT_CLAIM AGAINSTTHE FEDERAL GOVERNMENT FOR COSTS INCURRED AS A RESULT OF A TERMINATION OF CONTRACT _

AGENCY: SA - TASK: RD SUSPENSE: 3113/2007 ACD: 3/13/2007 _

FILE NUMBER: 160

OSD CONTROL NUMBER: OSD 03443-07 DOCUMENT TYPE: INCOMING DOC: 212812007 DOR: 3/6/2007

FROM: uss CANTWELL, M TO: LA

SUBJECT: REauEsT YOUR SUPPORT IN EXPEDITING MY INVESTIGATION _

AGENCY:SA TASK:RD SUSPENSE: ACD:

FILE NUMBER: T-

Not perfect, but at least digitized and searchable. Still, somehow unsatisfying. I fooled around with the text file and was able to convert it to a tab-delimited form (all those years coding agate at the Philadelphia Inquirer really came in handy).
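For anyone who wants to try the same conversion, here is a minimal sketch in Python of one way to go from the labeled text to tab-delimited rows. It is not the process I actually used. The field list is taken from the labels visible in the sample above, and the names `FIELDS`, `parse_records` and `to_tsv` are made up for illustration. It assumes every record starts with an OSD CONTROL NUMBER label and that labels survive OCR mostly intact (it tolerates a dropped space, as in DOCUMENTTYPE).

```python
import csv
import io
import re

# Field labels as they appear in the OCR sample (an assumption: the full
# log may contain labels not shown in that sample).
FIELDS = [
    "OSD CONTROL NUMBER", "DOCUMENT TYPE", "DOC", "DOR", "FROM", "TO",
    "SUBJECT", "AGENCY", "TASK", "SUSPENSE", "ACD", "FILE NUMBER",
]


def _label_regex(label):
    # Allow missing or extra whitespace between words of a label.
    return r"\s*".join(re.escape(word) for word in label.split())


LABEL_RE = re.compile(
    r"\b("
    + "|".join(_label_regex(f) for f in sorted(FIELDS, key=len, reverse=True))
    + r")\s*:"
)
# Map a label with its spaces squeezed out back to the canonical name.
CANONICAL = {f.replace(" ", ""): f for f in FIELDS}


def parse_records(text):
    """Split OCR text into one dict per record, keyed by field label."""
    records = []
    matches = list(LABEL_RE.finditer(text))
    for i, match in enumerate(matches):
        field = CANONICAL["".join(match.group(1).split())]
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Everything up to the next label is this field's value.
        value = " ".join(text[match.end():end].split())
        if field == FIELDS[0] or not records:
            records.append(dict.fromkeys(FIELDS, ""))
        records[-1][field] = value
    return records


def to_tsv(records):
    """Render the records as tab-delimited text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

One limitation worth knowing about: when a label itself is garbled beyond recognition, the text that followed it gets appended to the previous field's value instead of landing in its own column, so those rows still need a human look.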

Now, a few explanations. There are three sheets in the spreadsheet. Sheet one is the cleanest version of the data, with some value-added fields; sheet two has every field--the enhanced ones and the original ones; and sheet three has only the original fields from the text file. I ended up ignoring some fields (in part because DoD stopped sending them to us in response to subsequent requests, and in part because we've been unable to learn from DoD what those fields mean--the ones I didn't really touch were Agency, Task, Suspense and File Number). There's also a pair of columns called "Extra One" and "Extra Two": some of the data got bumped further to the right, and it was hard to tell which columns the extras belonged in.

There's also some very messy data. For example, there's a lot of garble like this: WOULD LIKE7a REIOhR_REPID_R POSITION IN DOD,

and this: OSD "j2o5-07. The latter is from the OSD control number field, which one could cite in a Freedom of Information request to more easily get a copy of the actual letter to which it refers. Those numbers are supposed to look like this: OSD 00136-07.
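Because the control numbers follow a fixed pattern, the garbled ones are easy to flag automatically. Here is a rough Python sketch, assuming the pattern is always "OSD", a space, five digits, a hyphen and two digits, which is inferred from the examples in this post rather than from any DoD documentation. The function names are made up for illustration.

```python
import re

# Assumed shape of a clean control number, e.g. "OSD 00136-07".
CONTROL_NUMBER_RE = re.compile(r"OSD \d{5}-\d{2}")


def is_well_formed(control_number):
    """Return True if the value matches the assumed control number shape."""
    return CONTROL_NUMBER_RE.fullmatch(control_number.strip()) is not None


def flag_suspect(control_numbers):
    """Return (row index, value) pairs for values that need a manual check."""
    return [
        (index, value)
        for index, value in enumerate(control_numbers)
        if not is_well_formed(value)
    ]
```

A value that passes this check is only well formed, not necessarily right: if the OCR read one digit as another, the number still looks fine, so this narrows the manual checking rather than replacing it.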

Now, to get this data into better shape, I need to go back to those .tif files, print them all out (there are 189 of them) and painstakingly go through them, comparing each page to the corresponding line in the spreadsheet, fixing garble and double-checking numbers, names and dates.

Now, the really ridiculous thing about all this is that the Office of the Secretary of Defense keeps its records in a form not dissimilar to the one that we've managed to put together here. To respond to our FOIA request, someone at DoD (probably a contractor) printed pages from a database, which they then turned into .tif files, which they then copied onto a CD-ROM, which they then sent to us. I'll have more to say about that aspect of this later.
