Spam is a political issue

Josh Marshall’s Talking Points Memo and this blog go way back. So I’m disappointed in this. Specifically, I am concerned about the exact service the drug pushers are paying Marshall for.

OK so. Basically everyone searches the Web. A lot of people who consume news get it from search-driven sources like Google News. Typically something like 99 per cent plus of search traffic goes to the first results page, and 90 per cent of that to the top three hits. So placing your message there is valuable. A whole spam industry exists to this end.

Search engines in general, and Google in particular, war endlessly against this practice. The most recent example of this was when Google basically hit Demand Media, the folk who brought you “How to cough up mucus” and “one weird trick”, with the Internet death sentence.

This is important because Demand Media’s business model was basically to fill the Web with crap that matched a lot of keywords, so as to occupy that space on the results page and sell ads. (Hilariously, they discovered that relevant content was actually bad for business, because people read it rather than clicking on ads. It’s like the opposite of this blog.)

I think a lot of astroturfing is basically the same idea – trying to inject your propaganda into search results, and specifically news search results. This is where TPM comes in. Getting your crap onto a page in the domain immediately gets it googlejuice, and also defeats the filtering. If Marshall means any of his excuses, he’ll set a robots.txt line to exclude the advertorial from search.
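For anyone wanting to check whether such an exclusion has actually been set, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content, the example.com URLs, and the /idealab-impact/ path are all assumptions made up for illustration; they are not TPM's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides an advertorial section from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /idealab-impact/
"""


def is_excluded(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# The advertorial path is blocked, ordinary news pages are not.
print(is_excluded(ROBOTS_TXT, "https://example.com/idealab-impact/some-post"))
print(is_excluded(ROBOTS_TXT, "https://example.com/news/some-post"))
```

To check a live site you would feed in the real file's text rather than the hard-coded string.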

As this would basically render it worthless, I’m not holding my breath, although I will be checking in on it now and then. And I do think Google should treat advertorial as content-farming for the purposes of search integrity if it wants to be at all consistent. Yes, this is a political decision, but then zapping Demand Media and ignoring the Chamber of Commerce is also a political decision.

Ho hum. “I saw the finest minds of my generation trying to make you click on ads. That sucks”, indeed. How’s that Color app doing?

Update: If you want to monitor this: curl | zgrep idealab-impact

Get involved.

So, the Sun is on the move. This was a bit of a surprise to me, what with News reorganising to separate the UK papers from Sky. You might think the papers look like a cost centre without the TV assets, or that the division is intended for sale as a package. But they’re up to something interesting.

The headline is that the Sun website is going behind a paywall, but this is really beside the point. The core of the offering is a mobile app that provides football highlights and daily deals – the details are at the link – but the really interesting bit is that you have to buy the paper to activate the app. The website is here or there; it’s about the mobile app, the paper, and (as always with Murdoch) the crosspromotion.

The other really interesting thing here is that News knows in some detail who sells copies of the paper, how many, and where. In the UK, newspapers are distributed on a sale-or-return basis. The retailer orders as many papers as they think they need, and bundles up the unsold copies for collection with the morning’s delivery. These are counted, and deducted from the retailer’s bill. If you’ve ever been in a corner shop late at night, you’ll have seen the shopkeeper bundle up, count, and label the returns with the preprinted barcode supplied by the wholesaler.

The point of the exercise is that the newspaper, not the newsagent, takes the risk. This is important because, obviously, the paper cannot sell if it is not on the shelves. If the newsagent gets stuck with unsold copies, they will order fewer papers and take the chance of running out. Alone among British newspapers, The Guardian is distributed on cash terms, which is why it’s so often sold out. I think they consider it unsportsmanlike or something.

So, to shorterise: the sale-or-return distribution model requires the publisher to know all the points of sale. Another feature is that it is a great way to measure the newspaper’s effective circulation in detail, and as the papers are being accounted for, this is subject to the paper’s auditors. You can see why advertisers might like this.

Now, the Sun is going to be getting its readers to type or scan something in the paper into the app. If, as I pointed out on Twitter, this something is specific to the individual papers, it’s possible to identify where the user buys their newspaper. I had some doubts about practicalities in the printing process, and wondered if they intended to do something clever with the PayPoint API or hand out separate inserts to be added by the newsagents.

But today, the question is answered – I was able to examine a pile of the things, and they indeed carry a 12-character unique identifier, printed as part of the newspaper. Keen and agile minds will observe that this provides enough entropy to identify the whole print run uniquely, indeed, a print run substantially bigger than even the Daily Mirror’s 1960s five-million plus. This might mean that it encodes more information than just a serial number, or alternatively that they’ve left space to do so should they want to in the future. Also, it’s not obviously sequential, so you can’t trivially work out the daily print run, although I haven’t made any serious study of this.

(Update: It’s actually 12 characters, in three alphanumeric groups of 4, and all the issues I’ve seen started SS… today. That might be part of a geographic identifier, but it might also be Sunday Sun. Anyway, that gives quite a bit of scope.)
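To put a rough number on "enough entropy", here is a back-of-envelope sketch. It assumes the code uses a 36-symbol alphabet (A-Z plus 0-9), which I have not verified, and takes the five-million print run mentioned above as the yardstick.

```python
import math

ALPHABET_SIZE = 36       # assumption: uppercase letters plus digits
CODE_LENGTH = 12         # three groups of four characters
PRINT_RUN = 5_000_000    # the Daily Mirror's 1960s figure, used as an upper bound

# Number of distinct codes, and the equivalent number of bits.
code_space = ALPHABET_SIZE ** CODE_LENGTH
bits_available = CODE_LENGTH * math.log2(ALPHABET_SIZE)

# Bits needed just to give every copy in the print run its own serial number.
bits_needed = math.log2(PRINT_RUN)

print(f"distinct codes: {code_space:.3e}")
print(f"bits available: {bits_available:.1f}")
print(f"bits needed for a serial number: {bits_needed:.1f}")
```

On those assumptions the code has roughly 62 bits to play with, against the 22 or so a plain serial number would need, which is why there is room for it to carry more than a copy number.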

Why is this useful? Well, this gives them insight into close-up neighbourhood geography across the UK. People are likely to buy their newspaper in the same place as they buy plenty of other things, for one. But it’s also a look into the UK’s cash economy. Newspapers have always been sold mostly for cash, anonymously. And the heavy dose of football in the experience points at other ambitions. After all, they can probably work out which is your local pub, and a lot of them have a WiFi hotspot provided by the Sky (ISP) subsidiary The Cloud.

It is probably telling that this comes just after their new “casual” Sky TV offering, something which seems to exist to satisfy demand generated by advertising in the Sun app. Also, Sky TV has always been very good at last-minute ad insertion.

But there’s also a political element here, especially for those of us who fundamentally wish ill to all Murdoch’s business ventures. When the News of the World was rolled into the sea like a sack of waste, as Hunter Thompson said of Richard Nixon’s mortal remains, I was very impressed by the public response, which was either rejoicing, or else, absolute apathy. None of its millions of readers was moved to protest or even to complain. Surprisingly few of them even bother to buy the Sun on Sunday. As a result, I asked on this blog if Sun readers actually exist, in the sense of people who self-identify as such, rather than being labelled by others, in the way that readers of the Guardian, Telegraph, or Mail do in Jamie Kenny’s sense.

In this sense, I think this project is an effort to make Sun readers out of people who happen to read the Sun in the same way as some people happened to consume the News of the World as a weekly kitschburger, a newspaper-style product. The polite way of saying this is “deepening user relationships”.

The really depressing element of this is probably how much of the ad revenue sounds like it’s going to come from gambling. No, it’s not even that, even if the goal of the week starts coming with an editorial soundbite like page 3 did in the Rebekah Wade years. (That said, the business model may well end up being all about cross-media advertising – the Springer papers in Germany seem to be trying to collect as many classified ad outlets as possible.)

The really, really depressing element of this is the increasing degree to which your local pub is being converted into an integrated Murdoch experience. I already resent this (how could I not?) but it’s only going to get worse. Note that the TV ad strapline for all this is “Get Involved”.

(Note: Le Monde’s South Kensington correspondent Marc Roche argued a few weeks ago that the Sun’s problems were down to the “disappearance of the blue collars and their replacement by immigrants who can’t speak English”. The Sun does not appear to be basing its strategy on this, to say the least.)


At Fistful of Euros: so what did Nicolas Sarkozy know about DSK, why did he only leak the bits he leaked, and what are the voters going to do about it?, how I was wrong about the euro, NATO for dictators, floating without a strategy, although if we had one it would probably be wrong.

At Stable & Principled: my entire post on Dr. Tim Morgan, how we were right as far back as July about the coalition economic strategy, the Kübler-Ross model of grieving, and the coming Sad Donkey Economics movement has vanished into thin air. This is a disturbingly common event with S&P, although nowhere near enough to account for its general lack of content. You know you want it, though. Update: Up!

I thought I was following you…

For George Osborne read Bernie Madoff: he’ll take your money and take your job, but don’t worry – if you wait long enough, he promises you’ll get it all back from someone else.

(Ed Balls, here.)

Note he didn’t say “Of course, we accept the necessity of cuts, but George Osborne is really like Bernie Madoff”. This was rather the point of this post. In the end, the people who thought themselves masters of rhetoric, campaigning, and media management were stuck with a hopelessly confused message that they would have rightly mocked had it come from anyone else. On the other hand, funny old Gordon (is he going mad?) had a clear and immediately comprehensible message.

Of course, neither I, Ed Balls, nor probably anyone else in Britain actually thinks we’ll never need to do anything about the budget deficit. The question is what, how much, from whom, and when. The official answer from Labour on the campaign trail and since was “about half as much, or as much over twice as long”.

Even keeping the plan target from the Pre-Budget report, to reduce the deficit by half over the next parliament, there’s still significant room to do a better job. You could look at the distributional impact and call attention to the fact that poor families with children lose out the most. You could look at the breakdown between growth, inflation, taxation, and cuts, and perhaps dust off the file from the late 90s. During the 90s fiscal stabilisation, Ken Clarke and Gordon Brown pursued a policy of splitting the adjustment burden equally between spending restraint and higher taxes. You could even use “the Clarke-Brown plan” as a talking point, seeing as Ken Clarke is back in the government. Beyond that, we could look at aiming for stabilisation first, and reduction only once the risk of a second recession is past.

But none of this is likely to help if it comes with an initial disclaimer that it should not be believed too hard. Rhetorical commitments are not much, as Nick Clegg would point out, but they do have more self-binding power than saying nothing. If you preface everything with a statement that you don’t really believe it very hard, you risk convincing everyone that the other half of the statement is the bit they shouldn’t take seriously.

How did we get here? I think it’s worth looking at the idea of “the centre” carefully. In my opinion, the idea that the centre ground plays a special role in politics is a sort of wrapper round a package of interesting assumptions. One of these is something like the median voter theorem – the idea that if voter preferences are roughly normally distributed along a scale between two camps, then the preferences of the voter halfway along the scale determine the outcome. Another is related, but different – that the political scale translates into practical priorities for government. Even if the spectrum is primarily made up of statements about identity, ethics, and emotion, it can be transcribed perfectly from political DNA into practical proteins. (Perhaps institutions are the RNA in this metaphor.)

This set-up is quite robust. There’s the classic version of centre ground politics, where preferences along the classic left-right scale are normally distributed and therefore most people are somewhere in the centre. As a result, democracy equals moderation, and campaigning is basically all about assembling a policy package that pushes the party line over the median. There are a couple of others. One is a version in which preferences have a bimodal distribution, one peak for Labour, one for Tories, and there are a few swing voters in the middle. As the two peaks are roughly equal in size, though, this doesn’t change much – the decisive factor is still which way the centre goes. This is probably closer to the standard operating procedures of big political parties, although the theoretical legitimacy still comes from the first, moderate majority model. This still works in a world like 1970s Germany, where the swing voters are represented by a third party, and the primary form of political competition is trying to be the bigger of the two big parties and therefore the swing party’s preferred coalition partner. Here’s a fine example of living in either world, as is this.
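A toy version of the two-camps-plus-swing-voters world makes the point about the centre concrete. Everything in this sketch is an assumption for illustration: the one-dimensional scale, the electorate of two equal camps with three swing voters, and the rule that each voter backs whichever party sits nearer (and stays home if they are equidistant).

```python
def winner(voters, left_party, right_party):
    """Each voter backs the nearer party; equidistant voters abstain."""
    left = sum(1 for v in voters if abs(v - left_party) < abs(v - right_party))
    right = sum(1 for v in voters if abs(v - right_party) < abs(v - left_party))
    if left == right:
        return "tie"
    return "left" if left > right else "right"


# Two equal camps at -5 and +5, with three swing voters around the middle.
ELECTORATE = [-5] * 40 + [-1, 0, 1] + [5] * 40

# Both parties sitting on their own peak: the camps cancel out.
print(winner(ELECTORATE, -5, 5))
# Whichever party moves towards the middle picks up the swing voters.
print(winner(ELECTORATE, -1, 5))
print(winner(ELECTORATE, -5, 1))
```

The camps never move; the three voters in the middle decide it, which is the sense in which the second model changes little from the first.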

Another one is a pathological variant – the 51% model beloved of Karl Rove. This accepts the two camps, but denies that there is a significant zone of potential agreement in the middle. Instead, it argues, the biggest source of potential voters for either side is the reserve army of the nonvoters; in a low-turnout polity, on the assumption that nonvoters break the same way as the general population, there are so many nonvoters with some prior party affiliation that they outweigh the swingers. The policy recommendation from this is that a party must do all it can to achieve asymmetric mobilisation, to rile up its own base while trying to damp down the others. Rather than trying to adapt to whatever the real preferences of the people are, as in the first model, or micro-targeting the relatively small group of swing voters, as in the second, the point is to wind up a bigger gang for whatever makes up a minimal consensus in your party.

Interestingly, this should have as a consequence the incremental radicalisation of the party that starts it. As anyone who wanted higher wages would be replaced by a member of the original reserve army of the unemployed, so anyone who deserts it is likely to be replaced by someone more extreme.

There’s a limiting case for this. If the political rhetoric that marks the scale is really meant to be transcribed into action, at some point the initiating party will get so extreme that its position is intolerable to a large majority of the public. But here’s a serious problem. In the well-behaved, school politics lesson world of scenario one, politicians are thought to set their positions on the political scale by reference to the practical policies they will command. They are flying by reference to the horizon, or at least to the artificial horizon of the polls. But I think it’s fair to say, at least going by the fact that they frequently say this is precisely what they’re doing, that politicians also set their positions by reference to other politicians. Rather than watching the horizon, they are watching the other guy’s wingtip and flying in formation.

Now in some cases this might actually work. If they are all working from a common view of reality, it’s entirely valid to reckon that the central axis of the Labour Party is however many degrees to the left of the leftmost Tory. There will be drift over time, but nothing too drastic. They can adjust their relative positions without colliding. What matters is that the constraints in their calculations are mutually consistent. They are linked because they have the same republic in their heads. This is, I think, what underlies the whole concept. It assumes a common public sphere and no bad behaviour. This is why operationalising postmodernism was important.

Of course, basic cognitive biases suggest that people who work together and share an institutional culture will think this even if they are competitors in practice.

The problem here is that the whole thing relies for stability on nobody adopting the counter-game strategy and either trying to change the rules, or just to drag the opposition so far off their home ground that they lose all credibility and fail first. I’m pretty confident that, to borrow a phrase from Nick Clegg, 1931 plus a pound does not equal “progressive”, and neither does it equal a winning strategy for the opposition. I’m much less confident that there is any way to correct the course except for adopting the 51% strategy or something very like it and trying to drag the lot bodily leftwards. I don’t particularly like the look of 102% World (two hypermobilised and completely mutually intolerable camps) either, but there you go.

Update: I have just read this post of Steve Randy Waldman’s which is more than relevant.

the House of Lords is not just stranger than you think..

This has me thinking one thing – TheyWorkForYou needs to integrate the text-mining tool researchers used to estimate the point at which Agatha Christie’s Alzheimer’s disease set in by analysing her books. We could call it WhatHaveTheyForgotten? Or perhaps HowDrunkIsYourMP? Jakob Whitfield pointed me to the original paper, here. It doesn’t seem that complicated, although I have a couple of methodological questions – for a start, are there enough politicians with a track record in Hansard long enough to provide a good baseline for time-series analysis?

Instead, we could do a synchronic comparison and look at which politicians seem to be diverging from the average. Of course, some might object that this would be a comparison against a highly unusual and self-selected sample. Another objection might be that the whole idea is simply too cruel. Yet a further objection might be the classic one that there are some things man should not know.

Update: Implemented!

fast zombies shoot!

So the England Zombies are looking more like Fast Zombies again. If I’ve bored you by talking up James Milner, I’d like to take this opportunity to claim my bragging rights. Here’s something interesting; back at the weekend, in the depths of self-loathing, the Obscurer published a table showing the teams with various statistics, including shots on goal. It struck me that England were looking rather good on that, and that the top four looked mostly like a plausible semi-final line up. So I’ve put together a spreadsheet ranking the teams by shots on target/matches played.

Data source here. Having in my browser history makes me feel dirty for some reason.

That puts England 5th in the world – quarter finals again – but ahead of all the three possible opponents in the second round, Germany (7 on target/game vs. 7.333 – Google Spreadsheets is lax about sig figs), Ghana, and Serbia, and well ahead of the Netherlands and Italy. Further, out of the top four, Spain aren’t looking a cert to qualify out of their group, and they have an even worse tradition of World Cup choking than we do. This may be daft, sunshine and beer optimism; but it’s daft, sunshine and beer optimism with data.

Update: Well, would you look at that.

Introducing WhoseKidAreYou

I’d like to introduce you to a new project. The other day, I was reading an imbecilic union-bashing editorial by one “Hugo Rifkind”, and I wondered….whose kid are you? Wikipedia informed me that diary columnist (it’s like a journalist but not quite) Rifkind is indeed the former Defence and Foreign Secretary’s son, and he’s “written” a “book” about “the London media world” called Overexposure, which kicks the bottom out of the rotting barrel of satire.

And there, I had it – we need a Web site to monitor nepotism, and backscratching influence-peddling more generally. WhoseKidAreYou! There’s been quite a lot of work on designing machine-readable ways of expressing relationships between people, but to start with, I reckon we need a decent wiki server or else perhaps a Django install, and the British journalists section of Wikipedia as a start. We can crowdsource the rest; we’ve got bitterness and resentment on our side, plus a powerful kicker of personal loathing!

We’ll need to hold basic biographical data, plus job and publication history, a link to corresponding Wikipedia data, and of course, the crucial affiliations. Not just WhoseKidAreYou, but also WhoseThinktankDoYou”Work”For. Once we’ve got a reasonable amount of data, we can think about social-graph visualisations and other fancy twirls; we could also do a browser extension that picks out bylines, searches the DB in background, and shows a notification. “Did you know this was written by Christopher Hitchens’ illegitimate son, working for a thinktank founded by Douglas Murray?”

I am deadly serious about this, and I would like your comments. The project isn’t really suited to – it’s far from neutral and it’s explicitly partisan and generally vicious – so it’ll have to be unilateral. I’ve set up a Google group (aka a mailing list/usenet group) over here.

UPDATE: More is here, including how to take part.

UPDATE UPDATE: Hugo Rifkind has been in touch, to point out that I misspelled the title of his book.

Dr. Benway strikes again, with Venture Capital

OK. So we looked into voice stress analysis and the world telecoms infrastructure. And we concluded that proper VSA – the sort with the peer-reviewed scientific papers and stuff – was technically impossible. Recap; the original VSA research is based on a change in a signal in your voice between 8 and 12Hz, but even the highest-quality voice codecs used for public telephony filter out everything below 50Hz, so a VSA system based on – well – science couldn’t possibly work.

But there was always the possibility that “Nemesysco” had hit on some kind of roaring king-hell breakthrough. Minitrue couldn’t find a copy of the patent that covers their product; you might wonder why there wasn’t a US patent if it’s so great, or why every call-centre workflow system and high-end mobile phone in the world doesn’t have it as a much-valued standard feature, or why Amir Liberman, the CEO of Nemesysco, isn’t incredibly rich.

After all, he’s been hawking it since at least 1998. His company was formed in early 2000, just a tad late for the joy of the .com boom; at the time they were marketing towards consumers and businesses. But, as the venture capital dried up, the stock exchange cursed everything to do with computers, and it looked like a whole world of vaguely technical young shysters would have to get a job…something happened, and suddenly his product became “Israeli intelligence service technology” that would save you from terrorists.

There is no evidence that Tsahal or the intelligence services ever made use of it, but as reader Chris “Chris” Williams points out, there is a certain mana attached to the Israeli military – link your product to them, and it gets just that bit badder. I tell you, it’s the sunglasses.

So, let’s cut to the chase. The patent is here, thanks to the Canadian government. The “claims” section describes how it is meant to work – there’s even an example implementation in Microsoft Visual Basic (you bastards). Here’s how: it takes samples of speech and identifies “plateaus” – flat bits – and “thorns”. Thorns are defined as:

A thorn is a notch-shaped feature. For example the term thorn may be defined as:
a) a sequence of 3 adjacent samples in which the first and third samples are both higher than the middle samples
b) a sequence of 3 adjacent samples in which the first and third are both lower than the middle samples

Now, all speech is roughly speaking a succession of sine waves; by definition it’s going to fit this. Anyway, they take a control sample of speech, count the plateaus and thorns and compute the standard errors, then they ask the questions they want to test, and do the same thing. They then look at the difference between the values and compare them to reference values to tell if you’re lying.
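To see why any waveform will fit the definition, here is a minimal sketch of a thorn counter, written from the two clauses quoted above and nothing else. The function name, the 8kHz sample rate, and the 100Hz test tone are my own choices for illustration; this is not the patent's Visual Basic, and it leaves plateaus out because the quoted text does not define them precisely.

```python
import math


def count_thorns(samples):
    """Count 3-sample windows that match either clause of the quoted definition."""
    count = 0
    for first, middle, third in zip(samples, samples[1:], samples[2:]):
        notch = first > middle and third > middle   # clause (a)
        spike = first < middle and third < middle   # clause (b)
        if notch or spike:
            count += 1
    return count


# A tenth of a second of a pure 100Hz tone sampled at 8kHz: no speech, no stress, no lies.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(800)]

print(count_thorns(tone))
```

Every peak and every trough of the sine wave registers as a thorn, so the count on its own says nothing about the speaker; it all comes down to the reference values it is compared against.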

Where do these reference values come from? “It is appreciated that all of the numerical values are merely examples and are typically application-dependent.” So basically, the all-crucial message on the screen depends entirely on the sensitivity values you punch in to the thing; perhaps great if you’re trying to bully some random Palestinian, but not so good if you need real information.

Hey, if they only knew Visual Basic and were willing to commit Software Crime, Harrow council could crank the reference values down to zero and deny EVERYBODY their housing benefit.

From this, he reckons he can determine:

Excitement Level: Each of us becomes excited (or depressed) from time to time. SENSE compares the presence of the Micro-High-frequencies of each sample to the basic profile to measure the excitement level in each vocal segment.

Confusion Level: Is your subject sure about what he or she is saying? SENSE technology measures and compares the tiny delays in your subject’s voice to assess how certain he or she is.

Stress Level: Stress is physiologically defined as the body’s reaction to a threat, either by fighting the threat, or by fleeing. However, during a spoken conversation neither option may be available. The conflict caused by this dissonance affects the micro-low-frequencies in the voice during speech.

Thinking Level: How much is your subject trying to find answers? Might he or she be “inventing” stories?

S.O.S: (Say Or Stop) – Is your subject hesitating to tell you something?

Concentration Level: Extreme concentration might indicate deception.

Anticipation Level: Is your subject anticipating your responses according to what he or she is telling you?

Embarrassment Level: Is your subject feeling comfortable, or does he feel some level of embarrassment regarding what he or she is saying?

Arousal Level: What triggers arousal in the subject? Is he or she interested in you? Aroused by certain visuals? This new detection can be used both for personal use for issues of romance, or professionally for therapy relating to sex-offenders.

Deep Emotions: What long-standing emotions does your subject experience? Is he or she “excited” or “uncertain” in general?

SENSE’s “Deep” Technology: Is your subject thinking about a single topic when speaking, or are there several layers (i.e., background issues, something that may be bothering him or her, planning, etc.) SENSE technology can detect brain activity operating at a pre-conscious level.

He can apparently detect all that from a total of two measurements. Note also that there is no mention of Micro-High Frequencies in his patent claims; if they were particularly high, they would probably vanish in the band-pass filters above 3.4kHz….

I have collected these claims across his Web site; I wonder if Harrow council is aware that exactly the same technology is being marketed as a “Love Detector”? Or that another company has ripped off the patent, and he warns buyers that theirs won’t produce the advertised 85% accuracy, even though it’s the same patent? This is scienciness, not science. But then, the point is to scare the poor.

Update: See here

G.729 and the welfare state

I was going to fisk the Government’s depressing sudden love affair with the discredited nonsense of “lie detectors”, but I see the Ministry has already done it. Go and read; it’s an instant classic. And as a bonus, there’s a great comment from the Great Simpleton, who you occasionally find in comments here, about some effects of telecoms infrastructure on the welfare state.

It’s certainly all nonsense – the 3G voice codec, AMR Narrowband, includes a band-pass filter between 200Hz and 3.4kHz, as do G.711 and G.729, so the markers VSA relies on, which are to be found between 8 and 12Hz, will be undetectable on any current mobile or fixed phone. Even the AMR Wideband high-quality voice standard will pass nothing – the band-pass for that one is 50Hz-7kHz. Any sound that does turn up at the VSA, therefore, is an artefact of some kind – a stray cosmic ray, or the acoustic echo cancellation at the local exchange going out of kilter when it produces the synthetic network noise to reassure you the line isn’t dead. (You might be advised not to Skype the benefits office – they’re considerably wider band when they are comprehensible at all.)
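The argument is just a band-overlap check, and can be sketched in a few lines. The passband figures are the ones quoted in the paragraph above, and the filters are treated as ideal brick walls, which is a simplifying assumption; real filters roll off gradually.

```python
# Passbands in Hz as quoted in the text, treated as ideal brick-wall filters.
PASSBANDS_HZ = {
    "AMR Narrowband": (200, 3400),
    "G.711": (200, 3400),
    "G.729": (200, 3400),
    "AMR Wideband": (50, 7000),
}

# The band where the VSA markers are supposed to live.
VSA_BAND_HZ = (8, 12)


def overlaps(band_a, band_b):
    """True if two (low, high) frequency ranges share any frequencies."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]


# Codecs that would let any part of the 8-12Hz band through.
surviving = [name for name, band in PASSBANDS_HZ.items()
             if overlaps(band, VSA_BAND_HZ)]

print(surviving)  # → []
```

None of the four codecs passes anything in the marker band, so whatever the analyser is measuring at the far end, it is not that.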

To expand on my comment over there, though, someone already markets a voice-stress analyser application for Windows Mobile smartphones. It’s probably mostly witchcraft and social engineering, but it’s very likely easier to do the opposite; either filter out the frequency band that is meant to be the marker, which could maybe sound weird or be too obvious if you could hear it at all, inject noise into that channel, or create a synthetic signal. That would be the hardest of the three to implement, but it would provide some interesting affordances – you could choose to sound more untrustworthy. If you could hear it, that is.

The only thing this achieves, then, is to deny some people their bennies entirely at random. Which is, of course, a highly political act.

Update: See here.