Finding Evil in a Haystack


Foreign Policy has a reasonably good article about the systems the NSA uses to monitor communications.

I say it is reasonably good in that they get some details right about the tools and algorithms used to analyse communications, including (even though they don't use the term) link analysis, emergent grouping and other statistical methodologies that allow systems and analysts to isolate the abnormal from the billions of normal transactions in the data.

Over the last week, critics and defenders of the National Security Agency have heatedly debated the merits of metadata — information about the phone activity of millions of Americans that was given to the government via a secret court order.

The information collected includes records of every call placed on the Verizon communications network (and, it appears, every other U.S. phone carrier) including times, dates, lengths of calls, and the phone numbers of the participants, but not the names associated with the accounts.

For some, the collection of these data represent a grave violation of the privacy of American citizens. For others, the privacy issue is negligible, as long as it helps keep us safe from terrorism.

There are indeed privacy issues at play here, but they aren’t necessarily the obvious ones. In order to put the most important questions into context, consider the following illustration of a metadata analysis using sample data derived from a real social network. The sample data isn’t derived from telephone records, but it’s close enough to give a sense of the analysis challenges and privacy issues in play.  

There is plenty of data out there…phone records, Twitter traffic, Facebook and so on…all you need is the right tools and methodology.

While this example is relevant to what happens behind the NSA’s closed doors, it is not in any way intended to be a literal or accurate portrayal. While every effort was made to keep this example close to reality, a wide number of hypotheticals and classified procedures ensure the reality is somewhat different.

We start with a classic scenario. U.S. intelligence officials have captured an al Qaeda operative and obtained the phone number of an al Qaeda fundraiser in Yemen.

You are an analyst for a fictionalized version of the NSA, and you have been authorized to search through metadata in order to expose the fundraiser’s network, armed with only a single phone number as a starting point.

The first step is refreshingly simple: You type the fundraiser’s phone number into the metadata analysis software and click OK.

In our example data, the result is a list of 79 phone numbers that were involved in an incoming or outgoing call with the fundraiser’s phone within the last 30 days. The fundraiser is a covert operator and this phone is dedicated to covert activities, so almost anyone who calls the number is a high-value target right out of the gate.
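That first "type in the number and click OK" step is just a one-hop query over call records. Here is a minimal sketch of it, assuming each record carries only caller, callee, start time and duration (no names, as in the Verizon metadata). The `CallRecord` type, the `one_hop_contacts` function and the 555 numbers are all hypothetical, made up for illustration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class CallRecord(NamedTuple):
    # Hypothetical metadata record: numbers and timing only, no subscriber names.
    caller: str
    callee: str
    start: datetime
    duration_s: int


def one_hop_contacts(records, seed, as_of, days=30):
    """Return every number that called, or was called by, `seed` in the last `days` days."""
    cutoff = as_of - timedelta(days=days)
    contacts = set()
    for r in records:
        if r.start < cutoff or r.start > as_of:
            continue  # outside the look-back window
        if r.caller == seed:
            contacts.add(r.callee)
        elif r.callee == seed:
            contacts.add(r.caller)
    return contacts


now = datetime(2013, 6, 14)
records = [
    CallRecord("555-0100", "555-0101", now - timedelta(days=2), 120),
    CallRecord("555-0102", "555-0100", now - timedelta(days=10), 45),
    CallRecord("555-0100", "555-0103", now - timedelta(days=45), 300),  # too old
    CallRecord("555-0104", "555-0105", now - timedelta(days=1), 60),    # unrelated
]
print(sorted(one_hop_contacts(records, "555-0100", now)))  # ['555-0101', '555-0102']
```

Both incoming and outgoing calls count, which is why 555-0102 (who rang the seed) shows up alongside 555-0101 (who the seed rang).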

Using the metadata, we can weight each phone number according to the number of calls it was involved in, the lengths of the calls, the location of the other participant, and the time of day the call was placed. Your NSA training manual claims these qualities help indicate the threat level of each participant. Your workstation renders these data as a graph. Each dot represents a phone number, and the size of the dot is bigger when the number scores higher on the “threat” calculus.
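The weighting step can be sketched as a simple additive score per number. The article doesn't say what the real "threat" calculus is, so every weight, the flagged-country set and the night-time window below are invented assumptions for illustration only; the point is just how call count, call length, location and time of day can be rolled into one number per participant.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    other: str          # the participant who is not the seed number
    duration_s: int
    other_country: str  # assumed ISO country code of the other participant
    hour: int           # assumed local hour of day, 0-23


# Hypothetical weights: illustrative only, not any agency's real calculus.
W_CALLS = 1.0            # every call adds a base amount
W_MINUTES = 0.1          # longer calls add more
W_FLAGGED_COUNTRY = 5.0  # bonus if the other party is in a flagged location
W_NIGHT = 2.0            # bonus for calls between 22:00 and 05:59
FLAGGED = {"YE"}


def threat_scores(calls):
    """Sum a per-call score for each number; bigger totals mean a bigger dot on the graph."""
    scores = defaultdict(float)
    for c in calls:
        s = W_CALLS + W_MINUTES * (c.duration_s / 60)
        if c.other_country in FLAGGED:
            s += W_FLAGGED_COUNTRY
        if c.hour < 6 or c.hour >= 22:
            s += W_NIGHT
        scores[c.other] += s
    return dict(scores)


calls = [
    Call("A", 600, "US", 14),
    Call("A", 60, "US", 23),
    Call("B", 120, "YE", 3),
]
scores = threat_scores(calls)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['B', 'A']: B made fewer calls but from a flagged country at night
```

An additive score keeps each factor's contribution easy to explain to an analyst, at the cost of ignoring interactions between factors.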

This is classic link analysis…mostly it is automated and highlights anomalous transactions (calls) based on previous analysis. Insurance companies and banks do this every day, and in the case of companies like American Express it is real-time and involves velocity algorithms as well…the higher the frequency of transactions, the higher the chance they are suspicious…the same goes for communications networks.
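A velocity check of the sort the card companies run can be sketched with a sliding time window: keep the recent events in a queue, drop anything older than the window, and raise an alert whenever the count goes over a threshold. The one-hour window and threshold of five below are arbitrary assumptions, and `velocity_alerts` is a hypothetical name, not any bank's actual rule.

```python
from collections import deque
from datetime import datetime, timedelta


def velocity_alerts(timestamps, window=timedelta(hours=1), threshold=5):
    """Return the timestamps at which more than `threshold` events fall in the trailing window.

    `timestamps` must be sorted in ascending order.
    """
    recent = deque()
    alerts = []
    for t in timestamps:
        recent.append(t)
        # Evict events that are older than the window relative to the current one.
        while recent and t - recent[0] > window:
            recent.popleft()
        if len(recent) > threshold:
            alerts.append(t)
    return alerts


base = datetime(2013, 6, 14, 9, 0)
burst = [base + timedelta(minutes=5 * i) for i in range(8)]  # a call every 5 minutes
slow = [base + timedelta(hours=2 * i) for i in range(8)]     # a call every 2 hours

print([t.strftime("%H:%M") for t in velocity_alerts(burst)])  # ['09:25', '09:30', '09:35']
print(velocity_alerts(slow))                                  # []
```

The same eight events trip the alarm when bunched together and pass quietly when spread out, which is the whole idea behind velocity rules.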

This is already a significant intelligence windfall, and you’ve barely been at this for five minutes. But you can go back to the metadata and query which of these 79 people have been talking to each other in addition to talking to the fundraiser.
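Asking "which of these 79 are talking to each other" is a query for the edges of the induced subgraph: keep only the calls where both ends are already on the contact list. A minimal sketch, assuming records are plain `(caller, callee)` pairs; `edges_among` and the single-letter numbers are hypothetical.

```python
def edges_among(records, contacts):
    """Return undirected edges between numbers that are BOTH in `contacts`."""
    edges = set()
    for caller, callee in records:
        if caller in contacts and callee in contacts and caller != callee:
            # frozenset makes A->B and B->A collapse into one undirected link.
            edges.add(frozenset((caller, callee)))
    return edges


contacts = {"A", "B", "C", "D"}  # stand-ins for the fundraiser's contacts
records = [
    ("A", "B"), ("B", "A"),  # A and B talk both ways: one link
    ("C", "D"),
    ("A", "X"),              # X is not on the list, so ignored
    ("S", "A"),              # S (the seed) is not in `contacts`
]
links = edges_among(records, contacts)
print(sorted(tuple(sorted(e)) for e in links))  # [('A', 'B'), ('C', 'D')]
```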

Using a common mathematical calculation, each phone number can be weighted based on how it provides links to other numbers in the network (the math is similar to the formula Google uses to rank pages). High-scoring accounts are almost always extremely significant to a social network (although that’s not the same thing as being an important terrorist).
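Since the article compares the weighting to the formula Google uses, here is a minimal PageRank sketch by power iteration, treating each call link as running both ways. This is standard PageRank, swapped in as an assumption; the article doesn't say exactly which centrality measure is used. The 0.85 damping factor is the conventional textbook value, and `pagerank` is our own function, not a library call.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over an undirected graph given as (a, b) pairs.

    Each undirected edge is treated as two directed links, so every node that
    appears in an edge has at least one outgoing link (no dangling nodes).
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nodes = list(neighbours)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Random-jump share, then each node splits its rank among its neighbours.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(neighbours[v])
            for w in neighbours[v]:
                new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# A hub H linked to four others, plus one side link between L1 and L2.
edges = [("H", "L1"), ("H", "L2"), ("H", "L3"), ("H", "L4"), ("L1", "L2")]
r = pagerank(edges)
print(max(r, key=r.get))  # H
```

As the article warns, a high score says a number is central to the network, not that its owner is an important terrorist.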

Your search reveals that many of these people are talking to each other and not just to the fundraiser. That suggests they may be coordinating their activities. It might suggest that their conversations pertain to al Qaeda business, especially if you factor in the criteria from the first graph (the graph above only reflects network position).

What if you looked at all of the incoming and outgoing calls for all 79 of the phone numbers you have examined so far? This is where the rubber hits the Big Data road.
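Pulling in every call for all 79 numbers is a second hop of a breadth-first expansion, and that is where the numbers blow out, because each new layer is built from the calls of everyone in the previous one. A minimal sketch, assuming `(caller, callee)` pairs as before; `k_hop_layers` and the sample graph are hypothetical.

```python
def k_hop_layers(records, seed, hops=2):
    """Breadth-first expansion from `seed`; returns the set of NEW numbers found at each hop."""
    adj = {}
    for caller, callee in records:
        adj.setdefault(caller, set()).add(callee)
        adj.setdefault(callee, set()).add(caller)
    seen = {seed}
    frontier = {seed}
    layers = []
    for _ in range(hops):
        nxt = set()
        for v in frontier:
            nxt |= adj.get(v, set())
        nxt -= seen  # only numbers not already reached at an earlier hop
        layers.append(nxt)
        seen |= nxt
        frontier = nxt
    return layers


records = [("S", "A"), ("B", "S"), ("A", "C"), ("A", "D"), ("B", "E"), ("C", "F")]
print([sorted(layer) for layer in k_hop_layers(records, "S")])
# [['A', 'B'], ['C', 'D', 'E']]  (F is three hops out, so not included)
```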

As I said, this is routine, run-of-the-mill analysis that large government and financial organisations do all the time for fraud detection, anti-money-laundering operations and criminal network analysis. It is pretty cool to watch it operate.

Go read the rest to get some understanding…and know this: many of our own government agencies already do this (ACC, IRD, WINZ, the Police), as do most major banks. In Australia it is also done by the ACCC, the ATO and the Federal Police.


As much at home writing editorials as being the subject of them, Cam has won awards, including the Canon Media Award for his work on the Len Brown/Bevan Chuang story. And when he's not creating the news, he tends to be in it, with protagonists using the courts, media and social media to deliver financial as well as death threats.

They say that news is something that someone, somewhere, wants kept quiet. Cam Slater doesn't do quiet, and as a result he is a polarising, controversial but highly effective journalist who takes no prisoners.

He is fearless in his pursuit of a story.

Love him or loathe him. But you can't ignore him.