A Project Lobster progress report!

So I completely forgot I needed to register for OKFN’s Open Interests Europe hackathon last weekend, which even had a lobbying track, and just round the corner from the office, too.

I decided to have my own lobbying hackathon instead, by eating pizza and caffeine pills and being misogynistic, or rather by spending my weekend finishing the Lobster Project’s analytics scrapers for ministers and lobbies respectively. I abandoned the plan of generating NetworkX objects and storing them in the database for later use; instead, the scrapers generate the graphs directly and read out the metrics, and I dealt with the performance hit by writing slightly less horrible code.

Specifically, I decided to optimise for fewer calls to the database API. Memoising the rankings function cuts its usage from two calls per meeting to 82 in total for the first month, plus any future changes, and storing the cache itself means that only new combinations of ministers and titles generate a query in future runs. Getting all the lobbies for the month in one query, and then processing them in Python using itertools, replaces one query per meeting with one admittedly complex query per month and a small function.
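For the curious, here is a minimal sketch of the two tricks, not the actual scraper code. The names `make_cached_ranking`, `fake_fetch_rank` and `lobbies_by_meeting` are made up for illustration, the ranking lookup is a stand-in for the real database API call, and the rows are assumed to come back from the monthly query as (meeting id, lobby) pairs.

```python
import itertools
import json


def make_cached_ranking(fetch_rank, cache):
    """Wrap fetch_rank so each (minister, title) pair hits the API only once."""
    def ranking(minister, title):
        key = f"{minister}|{title}"  # string key, so the cache survives JSON
        if key not in cache:
            cache[key] = fetch_rank(minister, title)
        return cache[key]
    return ranking


def lobbies_by_meeting(rows):
    """Group (meeting_id, lobby) rows from one monthly query by meeting."""
    # groupby only merges adjacent items, so sort on the same key first.
    ordered = sorted(rows, key=lambda r: r[0])
    return {
        meeting: [lobby for _, lobby in group]
        for meeting, group in itertools.groupby(ordered, key=lambda r: r[0])
    }


# Hypothetical stand-in for the real API call; it just records each lookup.
calls = []

def fake_fetch_rank(minister, title):
    calls.append((minister, title))
    return len(title)


cache = {}
ranking = make_cached_ranking(fake_fetch_rank, cache)
ranking("A. Minister", "Secretary of State")
ranking("A. Minister", "Secretary of State")  # served from the cache

# Persist the cache between runs (assumption: a JSON blob is good enough).
saved = json.dumps(cache)
restored = json.loads(saved)

rows = [(1, "Lobby X"), (2, "Lobby Y"), (1, "Lobby Z")]
grouped = lobbies_by_meeting(rows)
```

The point of keying the cache on minister and title is that a repeat meeting costs nothing, and reloading the saved cache at the start of the next run means only genuinely new combinations go anywhere near the database.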

This still took far longer than I expected to run, but then I realised there was more data.

Anyway, they work and they are generating results by month, so we will be able to draw nice time series charts, up to September 2011. Unfortunately, the ScraperWiki datastore is doing something quite weird – replacing float values with nulls or zeroes – and although I thought I might have fucked up type declarations, pragma tells me that the column types are what they ought to be. So I’ve got a query outstanding with the ScraperWiki folk.
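For reference, this is roughly the check I mean, sketched against a local in-memory SQLite database rather than the ScraperWiki datastore itself. The table and column names are invented, and it assumes the datastore behaves like plain SQLite, which is exactly what is in doubt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the real scraper's schema will differ.
conn.execute("CREATE TABLE metrics (minister TEXT, month TEXT, score REAL)")
conn.execute(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    ("A. Minister", "2011-09", 0.125),
)

# PRAGMA table_info returns one row per column: (cid, name, type, ...).
columns = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(metrics)")}

# typeof() reports the storage class of the value actually stored.
stored = conn.execute("SELECT score, typeof(score) FROM metrics").fetchone()
```

In plain SQLite the declared type comes back as REAL and the float survives the round trip, which is why nulls and zeroes turning up in the datastore look like something happening on the other side.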
