Finding Evil in a Haystack


Foreign Policy has a reasonably good article about the systems the NSA uses to monitor communications.

I say it is reasonably good in that they get some details right on the tools and algorithms used to analyse communications, including (even though they don't use the term) link analysis, emergent grouping and other statistical analysis methodologies that allow systems and analysts to isolate the abnormal from the billions of normal transactions in the data.

Over the last week, critics and defenders of the National Security Agency have heatedly debated the merits of metadata — information about the phone activity of millions of Americans that was given to the government via a secret court order.

The information collected includes records of every call placed on the Verizon communications network (and, it appears, every other U.S. phone carrier) including times, dates, lengths of calls, and the phone numbers of the participants, but not the names associated with the accounts.

For some, the collection of these data represent a grave violation of the privacy of American citizens. For others, the privacy issue is negligible, as long as it helps keep us safe from terrorism.

There are indeed privacy issues at play here, but they aren’t necessarily the obvious ones. In order to put the most important questions into context, consider the following illustration of a metadata analysis using sample data derived from a real social network. The sample data isn’t derived from telephone records, but it’s close enough to give a sense of the analysis challenges and privacy issues in play.  

There is plenty of data out there…phone records, Twitter traffic, Facebook, and so on…all you need are the right tools and methodology.

While this example is relevant to what happens behind the NSA’s closed doors, it is not in any way intended to be a literal or accurate portrayal. While every effort was made to keep this example close to reality, a wide number of hypotheticals and classified procedures ensure the reality is somewhat different.

We start with a classic scenario. U.S. intelligence officials have captured an al Qaeda operative and obtained the phone number of an al Qaeda fundraiser in Yemen.

You are an analyst for a fictionalized version of the NSA, and you have been authorized to search through metadata in order to expose the fundraiser’s network, armed with only a single phone number as a starting point.

The first step is refreshingly simple: You type the fundraiser’s phone number into the metadata analysis software and click OK.

In our example data, the result is a list of 79 phone numbers that were involved in an incoming or outgoing call with the fundraiser’s phone within the last 30 days. The fundraiser is a covert operator and this phone is dedicated to covert activities, so almost anyone who calls the number is a high-value target right out of the gate.
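That first query can be sketched in a few lines of Python. This is only an illustration: the record layout, the phone labels and the `one_hop_contacts` helper are all invented here, and real call records would sit in a database rather than a list.

```python
from datetime import datetime, timedelta

# Hypothetical call records: (caller, callee, start_time, duration_seconds).
# "F" stands in for the fundraiser's number.
CALLS = [
    ("F", "A", datetime(2013, 6, 1, 9, 0), 120),
    ("B", "F", datetime(2013, 6, 3, 22, 30), 45),
    ("A", "B", datetime(2013, 6, 4, 10, 0), 300),
    ("C", "F", datetime(2013, 4, 1, 8, 0), 60),   # older than 30 days
    ("D", "E", datetime(2013, 6, 5, 12, 0), 30),  # does not involve the seed
]

def one_hop_contacts(calls, seed, as_of, days=30):
    """Return every number that called, or was called by, `seed` in the window."""
    cutoff = as_of - timedelta(days=days)
    contacts = set()
    for caller, callee, start, _duration in calls:
        if not (cutoff <= start <= as_of):
            continue  # outside the look-back window
        if caller == seed:
            contacts.add(callee)
        elif callee == seed:
            contacts.add(caller)
    return contacts

contacts = one_hop_contacts(CALLS, "F", as_of=datetime(2013, 6, 10))
print(sorted(contacts))
```

In the article's sample data the equivalent query returns 79 numbers; here it returns only the two toy contacts inside the window.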

Using the metadata, we can weight each phone number according to the number of calls it was involved in, the lengths of the calls, the location of the other participant, and the time of day the call was placed. Your NSA training manual claims these qualities help indicate the threat level of each participant. Your workstation renders these data as a graph. Each dot represents a phone number, and the size of the dot is bigger when the number scores higher on the “threat” calculus.
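A weighting like this could look roughly like the sketch below. The article does not publish the actual formula, so the weights, the "night" hours and the watched-country list are all made up for illustration; only the four inputs (call count, call length, location, time of day) come from the text.

```python
from collections import defaultdict

# Hypothetical calls with the seed number:
# (other_number, duration_seconds, hour_of_day, other_party_country)
SEED_CALLS = [
    ("A", 120, 9, "YE"),
    ("A", 600, 23, "YE"),
    ("B", 45, 14, "US"),
    ("C", 30, 2, "PK"),
]

# Invented weights and watch list, purely for illustration.
W_CALLS, W_MINUTES, W_NIGHT, W_LOCATION = 1.0, 0.5, 2.0, 3.0
WATCHED_COUNTRIES = {"YE", "PK"}

def score_contacts(seed_calls):
    """Sum a simple weighted score per contact over all of its calls."""
    scores = defaultdict(float)
    for number, duration, hour, country in seed_calls:
        scores[number] += W_CALLS                    # one point per call
        scores[number] += W_MINUTES * duration / 60  # longer calls score more
        if hour >= 22 or hour < 5:                   # late-night call
            scores[number] += W_NIGHT
        if country in WATCHED_COUNTRIES:             # location of other party
            scores[number] += W_LOCATION
    return dict(scores)

scores = score_contacts(SEED_CALLS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The score is what would drive the dot size on the graph: the higher the total, the bigger the dot.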

This is classic link analysis…mostly it is automated and highlights anomalous transactions (calls) based on previous analysis. Insurance companies and banks do this every day, and in the case of companies like American Express it is real-time and involves velocity algorithms as well…the higher the frequency of transactions, the higher the chance it is suspicious…same goes for communications networks.
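A velocity check in its simplest form is just a sliding-window count. The sketch below is a toy version, not how any particular bank does it: the function names, the one-hour window and the threshold of five are assumptions chosen for the example.

```python
from collections import deque

def max_velocity(timestamps, window_seconds):
    """Largest number of events that fall inside any one sliding window."""
    window = deque()
    best = 0
    for t in sorted(timestamps):
        window.append(t)
        # Drop events that are a full window or more older than the newest one.
        while t - window[0] >= window_seconds:
            window.popleft()
        best = max(best, len(window))
    return best

def is_suspicious(timestamps, window_seconds=3600, threshold=5):
    """Flag a number or card whose activity bursts past the threshold."""
    return max_velocity(timestamps, window_seconds) >= threshold

# Event times in seconds: one steady pattern, one sudden burst.
steady = [0, 7200, 14400, 21600]
burst = [0, 60, 120, 180, 240, 9000]
print(is_suspicious(steady), is_suspicious(burst))
```

The same count works whether the events are card transactions or phone calls, which is why the technique carries over to communications networks.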

This is already a significant intelligence windfall, and you’ve barely been at this for five minutes. But you can go back to the metadata and query which of these 79 people have been talking to each other in addition to talking to the fundraiser.

Using a common mathematical calculation, each phone number can be weighted based on how it provides links to other numbers in the network (the math is similar to the formula Google uses to rank pages). High-scoring accounts are almost always extremely significant to a social network (although that’s not the same thing as being an important terrorist).
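The article doesn't name the calculation, but the Google comparison points at something like PageRank. Below is a minimal power-iteration sketch over an undirected call graph; the damping factor of 0.85, the iteration count and the toy edges are assumptions, and a real system would likely use a graph library rather than hand-rolled loops.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Basic PageRank by power iteration, treating each call link as two-way."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nodes = list(neighbours)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline, then shares from its neighbours.
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            share = damping * rank[node] / len(neighbours[node])
            for other in neighbours[node]:
                new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical links: "H" talks to everyone, A and B also talk to each other.
EDGES = [("H", "A"), ("H", "B"), ("H", "C"), ("A", "B")]
rank = pagerank(EDGES)
by_rank = sorted(rank, key=rank.get, reverse=True)
print(by_rank[0])
```

The number that connects the most of the network floats to the top, which says it matters to the network, not that it belongs to an important terrorist.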

Your search reveals that many of these people are talking to each other and not just to the fundraiser. That suggests they may be coordinating their activities. It might suggest that their conversations pertain to al Qaeda business, especially if you factor the criteria from the first graph (the graph above only reflects network position).
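Finding who among the contacts is talking to whom is a filter over the same call records: keep only calls where both ends are already on the list. A rough sketch, with invented labels and a hypothetical `links_among` helper:

```python
def links_among(calls, contacts):
    """Unordered pairs of contacts that have called each other directly."""
    pairs = set()
    for caller, callee in calls:
        if caller in contacts and callee in contacts and caller != callee:
            pairs.add(frozenset((caller, callee)))
    return pairs

# Hypothetical (caller, callee) records; "F" is the fundraiser.
CALL_PAIRS = [
    ("F", "A"), ("B", "F"), ("C", "F"),  # calls with the fundraiser
    ("A", "B"), ("B", "A"),              # two contacts talking to each other
    ("C", "X"),                          # a contact calling an outsider
]
FUNDRAISER_CONTACTS = {"A", "B", "C"}

internal_links = links_among(CALL_PAIRS, FUNDRAISER_CONTACTS)
print(len(internal_links))
```

Calls in both directions between the same two numbers collapse into one link, so the count reflects relationships rather than call volume.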

What if you looked at all of the incoming and outgoing calls for all 79 of the phone numbers you have examined so far? This is where the rubber hits the Big Data road.
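To see why this step is where the data gets big: if each of the 79 numbers had about as many contacts as the fundraiser (an assumption, not a figure from the article), the second hop could reach up to 79 × 79 = 6,241 numbers before duplicates are removed. The expansion itself is one more pass over the records, sketched here with made-up data and a hypothetical `expand` helper:

```python
def expand(calls, frontier):
    """Return the frontier plus every number one call away from it."""
    reached = set(frontier)
    for caller, callee in calls:
        if caller in frontier:
            reached.add(callee)
        if callee in frontier:
            reached.add(caller)
    return reached

# Hypothetical (caller, callee) records; "F" is the fundraiser.
HOP_CALLS = [
    ("F", "A"), ("F", "B"),
    ("A", "C"), ("A", "D"), ("B", "E"),
    ("E", "G"),
]

first_hop = expand(HOP_CALLS, {"F"})
second_hop = expand(HOP_CALLS, first_hop)
print(len(first_hop), len(second_hop))
```

Each extra hop repeats the same cheap lookup, but the set it returns keeps growing, so the hard part shifts from finding numbers to deciding which ones matter.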

As I said, this is routine, run-of-the-mill analysis that large government and financial organisations do all the time for fraud detection, anti-money laundering operations and criminal network analysis. It is pretty cool to watch it operate.

Go read the rest to get some understanding…and know this…many of our own government agencies already do this…ACC, IRD, WINZ, Police, most major banks…in Australia this is also done with the ACCC, the ATO and the Federal Police.



  • Patrick

    The UK have been monitoring phone & bank usage for many years, way before 9/11. I knew a copper in New Scotland Yard, his job was to trawl known villains mobile & bankcard usage. Criminals would use pay as you go Sims but inevitably would top them up with their bankcards & bingo the cops had a match. They would then track the phones geographical locations via the telecoms companies – at least one murder conviction was obtained that way, a guy shot in his Range Rover in Essex. The spread of SQL databases (of which ever flavour) has helped immensely.

    • onelaw4all

      “his job was to trawl known villains mobile & bankcard usage.”

      Which is a targeted monitoring, as opposed to the untargeted/blanket monitoring of a constitutionally protected populace.

      • Patrick

        He told me it was known villains but what is the bet it was all & sundry. After all give them an inch etc.
        Also it was prior to 9/11, many countries changed their laws after 9/11 & after the London bombings I bet the UK are monitoring a lot more than they did back then.

  • Mr_Blobby

    “They who can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety.” Benjamin Franklin.

    The question is did we give it up or was it taken.

    Before we give up our privacy for safety, let's review what the risk level is. Perceived risk as opposed to actual risk.

    How many people have been killed by cars, alcohol, cigarettes, cancer, plane crashes, natural events etc.

    How many from terrorism.

    • Rex Widerstrom

      I might be tempted to add that those who justify the erosion of liberty on the basis of “it’s already happening” or “but x already does it, why shouldn’t y” are Quislings.

      The banks do it, yes, but when you sign up for a credit card you give them express or implied permission to do so. You don’t give Bank A, of whom you’re not a customer, permission to trawl your transaction record at Bank B. If you don’t want your bank knowing your complete financial history it’s as simple as opening accounts with two or more banks (and not running up any defaults).

      And you don’t authorise your bank to cross-match your transactions with the calls you make and receive using Telco C.

      Any way you look at it Prism and similar instruments are an unwarranted intrusion into the lives of citizens, treating us all as guilty till proven innocent – worse in fact, as we’re constantly monitored to make sure we don’t become guilty tomorrow, or the next day…

      If records need to be matched it should be done on the basis of reasonable suspicion, and with a warrant under oversight of an independent judicial officer. At a minimum.

      That method works – just ask Australian “ex” Labor MP Craig Thomson how embarrassing it can be when your union credit card receipts to a “restaurant” are matched with the business’s number, which turns out to be that of a brothel.

  • cows4me

    I mock the idiots that claim they have nothing to hide so why should they fear the arseholes that spy upon us. I argue I have done nothing wrong either so why should someone spy on me. Look what the Democratic Party and their band of arse lickers in the IRS did to political movements on the right. No fucking government can be trusted, imagine if She Beast had had the technology offered by Prism, we would now be residing in a totalitarian state. Our war isn’t a war on terror, our real war will be against a one world government aided by institutions like the NSA and those holding the technological power.

    • Mr_Blobby

      Yes, the if you have nothing to hide you have nothing to fear argument, is a bit like a threat, from a robber telling you that you have nothing to fear, if you sit down and not cause any problems, whilst being robbed.

  • Rodger T

    So the real issue with KDC is the yanks didn`t like the competition?