Cables - SECRET Confidentiality analysis

What?

We were interested in the cables classified SECRET and wondered what kind of information we could derive from them.

We decided to look at them from a historical point of view, trying to relate the number of secret cables at a given point in time to a major event happening at that same time.

How?

Once again, we used the GraphX framework for Spark. Here, however, the aim was not to build a graph for visualization but to extract numbers to display on a time axis.

Thus, once the skeletal graphs were built, we extracted the information from them and displayed it as charts.

All of the code for this analysis can be found in the "secret-documents" directory, in the graph-exploration part of the plusd group.

The results?

We first looked at all the data, grouped by year, to see whether any year has more secret documents than "usual".

We got the following results:

The first graph shows the percentage of secret cables per year. The second graph shows how many documents there are overall. It tells us not to read too much into the years 1985 to 1999, whose low document counts make the percentages unreliable.
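The yearly comparison described above can be sketched in a few lines of plain Python. This is only an illustration, not the GraphX code from the repository: the function name `secret_share_by_year`, the `min_total` threshold, and the sample records are all hypothetical, and it assumes each cable is reduced to a `(year, classification)` pair where the classification string is exactly "SECRET" for secret cables.

```python
from collections import Counter

# Hypothetical sample records (year, classification); not real PlusD data.
cables = [
    (1990, "SECRET"),
    (2002, "SECRET"), (2002, "UNCLASSIFIED"),
    (2002, "CONFIDENTIAL"), (2002, "UNCLASSIFIED"),
    (2003, "SECRET"), (2003, "SECRET"),
    (2003, "UNCLASSIFIED"), (2003, "CONFIDENTIAL"),
]

def secret_share_by_year(records, min_total=1):
    """Return {year: (total, percent_secret)}.

    Years with fewer than `min_total` cables are dropped, because a
    percentage computed on a handful of documents says very little.
    """
    totals = Counter()
    secrets = Counter()
    for year, classification in records:
        totals[year] += 1
        # Assumption: secret cables carry exactly the label "SECRET".
        if classification == "SECRET":
            secrets[year] += 1
    return {
        year: (total, 100.0 * secrets[year] / total)
        for year, total in sorted(totals.items())
        if total >= min_total
    }

# With the threshold, the single 1990 cable (100% secret) is ignored.
shares = secret_share_by_year(cables, min_total=3)
for year, (total, percent) in shares.items():
    print(f"{year}: {total} cables, {percent:.1f}% secret")
```

Keeping the total next to the percentage plays the same role as the second graph: it shows which years have enough documents for the percentage to be meaningful.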

From these results, 2003 stands out: it breaks the progression visible from 2000 to 2010.

We then focused on 2003 and split the documents by month:

January clearly has the highest share of secret documents (~10%). It is interesting to note that January 2003 is two months before the beginning of the Iraq War, which started with the US-led invasion of Iraq in March 2003.
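The monthly breakdown can be sketched the same way. Again, this is a hedged illustration rather than the project's GraphX code: `monthly_secret_share` and the sample records are hypothetical, and it assumes each cable is available as a `(date, classification)` pair.

```python
from collections import defaultdict
from datetime import date

def monthly_secret_share(records, year):
    """Return {month: percent_secret} for the cables dated in `year`."""
    counts = defaultdict(lambda: [0, 0])  # month -> [secret, total]
    for sent_on, classification in records:
        if sent_on.year != year:
            continue
        counts[sent_on.month][1] += 1
        # Assumption: secret cables carry exactly the label "SECRET".
        if classification == "SECRET":
            counts[sent_on.month][0] += 1
    return {
        month: 100.0 * secret / total
        for month, (secret, total) in sorted(counts.items())
    }

# Hypothetical sample records; not real PlusD data.
sample = [
    (date(2003, 1, 5), "SECRET"),
    (date(2003, 1, 20), "CONFIDENTIAL"),
    (date(2003, 2, 3), "UNCLASSIFIED"),
    (date(2003, 2, 14), "UNCLASSIFIED"),
    (date(2004, 1, 9), "SECRET"),
]

shares = monthly_secret_share(sample, 2003)
# Month with the highest share of secret cables in the chosen year.
peak_month = max(shares, key=shares.get)
print(shares, peak_month)
```

Picking the month with the highest share, rather than the highest raw count, keeps busy months from dominating only because they contain more cables.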

For example, with a few quick searches on the WikiLeaks data, we found the following interesting cables:

In the end, we see that the cable data can easily be explored to find interesting points in time.