So, those Oystercard outages. I wrote a sizable post on this immediately before going on holiday, but something odd happened with WordPress’s clever ajaxy bits and it vanished. Computers…anyway, we can work out various things about the problem from the few details supplied.
In the first incident, around 1% of the cards somehow became nonfunctional. We don’t know how; we do know, however, that it was indeed the cards, because the fix was to bring them in and issue new ones. This raises an interesting question: why did new physical cards have to be issued? The process of issuing a card involves writing the data TfL holds on you to the blank card, and there isn’t much difference between this and overwriting whatever is on the card with the details held in the database. This suggests either that the affected cards suffered actual physical damage – unlikely, unless someone’s running about with a really powerful RF source and a bad sense of humour – or else that TfL can’t trust the information on file, and therefore needs to erase the affected records and set up new user accounts.
So, how could it happen? Card systems can work in various ways. You can do a pure online authorisation system, like debit or credit cards, where information on the card is read off and presented to a remote computer, which matches it against a look-up table and sends back a response. Or you can do a pure card system, where your credit balance is recorded on the card, debited when you use it, and credited when you pay up. Or you can have a hybrid of the two, and Oyster is such a hybrid. TfL obviously maintains a database of Oyster user accounts, because it’s possible to restore lost cards from backup, to top up through their Web site without needing a card reader, and to top up automatically. But it’s also clear that the card is more than just a token; you can top up at shops off-line, and the transaction between the card and the ticket barrier is quick enough that you don’t need to break stride (consider how long it takes to interact with a Web site or use a bank card terminal).
Clearly, the actual authorisation is local (the barrier talks to the card), as is offline top-up, but the state of the card is backed up to the database asynchronously, and changes to your record in the database are reflected on the card, presumably as soon as it passes through a card reader. To achieve this without stopping the flow of passengers, I assume that when a card is read, the barrier also keeps the information from it in a cache and periodically updates the database. Similarly, in order to get online top-ups credited to the cards, the stations probably receive and cache recent updates from the database; if the card number is in the list, it gets an “increment £x” command.
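To make that guess concrete, here is a minimal Python sketch of the hybrid design as I imagine it. Everything in it is an assumption for illustration: the names Card, Barrier, pending_topups and upload_queue are mine, the fare is a flat charge at a single touch, and the "database" is just a dictionary. It is not TfL's or the contractor's actual design, only the shape of the idea: decide locally, apply cached top-ups on sight, and report back in batches.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    card_id: str
    balance_pence: int  # credit stored on the card itself


@dataclass
class Barrier:
    # Hypothetical table pushed down from the central database:
    # card ID -> amount to add the next time that card is seen.
    pending_topups: dict[str, int] = field(default_factory=dict)
    # Card states read since the last upload, waiting to go upstream.
    upload_queue: list[tuple[str, int]] = field(default_factory=list)

    def touch(self, card: Card, fare_pence: int) -> bool:
        """Authorise locally, with no network round trip while the passenger waits."""
        # Apply any cached online top-up first (the "increment £x" command).
        card.balance_pence += self.pending_topups.pop(card.card_id, 0)
        allowed = card.balance_pence >= fare_pence
        if allowed:
            card.balance_pence -= fare_pence
        # Remember what the card now says, for the next batch upload.
        self.upload_queue.append((card.card_id, card.balance_pence))
        return allowed

    def flush(self, database: dict[str, int]) -> None:
        """Periodic batch upload of cached card states to the central records."""
        for card_id, balance in self.upload_queue:
            database[card_id] = balance
        self.upload_queue.clear()


# A card holding £1.00 with a £5.00 online top-up waiting at the station.
database = {"card-1": 100}
barrier = Barrier(pending_topups={"card-1": 500})
card = Card("card-1", 100)
barrier.touch(card, 150)   # decided entirely at the barrier
barrier.flush(database)    # the central record catches up later
```

The point to notice is that the barrier trusts whatever table it was last sent. In this sketch the barrier does no checking of its own, so a bad entry in pending_topups would be written to the card.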
We can probably rule out, then, that 1% of the Oyster card fleet were somehow dodgy when they started to flow through the gatelines that morning, and that the uploaded data from them caused the matching records to become untrustworthy. It’s possible – just – that some shops somehow sporked them. It’s also vaguely possible that bad data from some subgroup of cards propagated to the others. But I think these are unlikely. It’s more likely that the batch process that primes the station system with the last lot of online and automatic top-ups went wrong, and the barriers dutifully wrote the dodgy data to the cards.
This is also what TfL says:
We believe that this problem, like the last one resulted from incorrect data tables being sent out by our contractor, Transys.
People of course think this was somehow connected with the NXP MiFare class break, but that connection isn’t necessary to explain what happened.
In the scenario where there is a connection, some sort of check incorporated in the database was intended to detect people using the MiFare exploit (probably looking for multiple instances of the same card, cards that didn’t appear in the database, or an excess of credit over the cash coming in), but a catastrophic false positive occurred. This is a serious lesson about the MiFare hack, and about this sort of public-space system in general: the effects of the security response may well be worse than those of the attack. Someone using a cloned, or fraudulently refilled, card could at most steal a few pounds in free rides. But the security response, if that was what it was, first threatened a massive denial-of-service attack on the whole public transport system, and then caused TfL to lose a whole day’s revenue.
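For illustration, the three checks guessed at above might look something like this in Python. This is purely a sketch under my own assumptions: the Sighting record, the suspicious_cards function, the min_travel_minutes threshold and the expected_balances table are all hypothetical, and nothing here is known about what TfL or its contractor actually runs.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    card_id: str
    station: str
    minute: int          # minutes since the start of the day
    balance_pence: int   # credit read off the card


def suspicious_cards(sightings, expected_balances, min_travel_minutes=2):
    """Return the IDs of cards that trip any of the three guessed checks."""
    flagged = set()
    last_seen = {}
    for s in sorted(sightings, key=lambda s: s.minute):
        if s.card_id not in expected_balances:
            flagged.add(s.card_id)  # card that doesn't appear in the database
            continue
        if s.balance_pence > expected_balances[s.card_id]:
            flagged.add(s.card_id)  # more credit than the cash that came in
        prev = last_seen.get(s.card_id)
        if (prev is not None and prev.station != s.station
                and s.minute - prev.minute < min_travel_minutes):
            flagged.add(s.card_id)  # same card in two places at once
        last_seen[s.card_id] = s
    return flagged


# One honest passenger who topped up by £5 the night before.
sightings = [Sighting("A", "Brixton", 480, 1200)]
fresh_table = {"A": 1200}  # database knows about the top-up
stale_table = {"A": 700}   # incorrect table that is missing it

flagged_with_fresh = suspicious_cards(sightings, fresh_table)
flagged_with_stale = suspicious_cards(sightings, stale_table)
```

With the fresh table nobody is flagged; with the stale one the honest card "A" is flagged for carrying more credit than the records say it should. Feed a check like this one bad table and every recently topped-up card looks like a fraud, which is the false positive in miniature.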