War Diary group

Iraq and Afghan War Diaries:

People responsible for these tasks/milestones:
Ketevani Zaridze
Kassir Hussein
 
Abstract:
 
The dataset consists of about 91,000 secret military reports from Afghanistan and 391,832 secret reports from Iraq, covering the wars from 2004 to 2010. The reports describe the majority of actions involving the United States military. They include the number of people internally reported as killed, wounded, or detained in each action, together with the precise geographical location of each event, the military units involved, and the major weapon systems used. These archives show a vast range of tragedies that are almost never reported by the press.
 

Project Goals:

  • Straightforward aggregation of war outcomes: number of casualties (civilian or military), number of attacks (successful or unsuccessful), war costs, equipment quantities, and the size and number of military units
  • Categorize the activities of different military units and find connections between them
  • Identify patterns and strategies used by the US and its opponents in the two wars
  • Draw conclusions about where and how conflict intensity increased, decreased, or shifted over time

 

Methods to achieve project goals:

  • Start with simple data aggregation (much of the information can be analyzed without advanced computational methods)
  • Follow with a nonparametric analysis of the data, binning it into weekly and monthly intervals
  • Cluster event locations (since events are spatiotemporally correlated, we can try statistical methods such as spatial point processes, which are commonly used in fields such as epidemiology)
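For the clustering step, one standard point-process summary is Ripley's K function: under complete spatial randomness K(r) ≈ πr², and values well above that suggest clustering at distance r. The sketch below is a naive estimator with no edge correction and O(n²) cost, assuming event coordinates have already been projected to a planar system (latitude/longitude would first need projecting); it is purely spatial, so the temporal correlation mentioned above would need a space-time extension. Dedicated tools such as R's spatstat provide edge-corrected versions.

```python
import math
from itertools import combinations

def ripley_k(points, r, area):
    """Naive Ripley's K estimate (no edge correction) for planar (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if math.hypot(x1 - x2, y1 - y2) <= r
    )
    # K is defined over ordered pairs, so each unordered pair counts twice.
    return area * 2 * close_pairs / (n * (n - 1))

def l_function(points, r, area):
    """Besag's L(r) = sqrt(K(r) / pi); L(r) - r > 0 suggests clustering at scale r."""
    return math.sqrt(ripley_k(points, r, area) / math.pi)

# Two tight, made-up clusters inside a unit square (area = 1).
clustered = [(0.10, 0.10), (0.12, 0.11), (0.11, 0.13),
             (0.90, 0.90), (0.91, 0.88), (0.89, 0.92)]
k = ripley_k(clustered, 0.05, 1.0)  # 0.4, far above pi * 0.05**2 ≈ 0.0079
```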


Resources/Dataset:

Afghan War Files

Iraq War Files

US Military Units and Equipment in Afghanistan

US Military Units and Equipment in Iraq

US Military Abbreviations

We will need to crawl and clean the military unit, equipment, and abbreviation data, since not all of it is available in a database-friendly format.
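As an illustration of the cleaning step, the sketch below turns scraped abbreviation text into a lookup table. It assumes, hypothetically, that the crawled source has one entry per line in the form `ABBR - Expansion` or `ABBR: Expansion`; the real pages may need a different pattern. Since one abbreviation can have several meanings, each maps to a list.

```python
import re

# Assumed line format: "IED - Improvised Explosive Device" or "KIA: Killed in Action".
LINE_RE = re.compile(r"^\s*([A-Z][A-Z0-9/&-]*)\s*(?:-|:)\s+(.+?)\s*$")

def parse_abbreviations(raw_text):
    """Map each abbreviation to the list of distinct expansions found for it."""
    table = {}
    for line in raw_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip headers, blank lines, and lines not in the assumed format
        abbr, expansion = match.groups()
        expansions = table.setdefault(abbr, [])
        if expansion not in expansions:
            expansions.append(expansion)
    return table

sample = """US Military Abbreviations
IED - Improvised Explosive Device
KIA: Killed in Action
C-IED - Counter Improvised Explosive Device
KIA - Killed in Action
"""
abbrevs = parse_abbreviations(sample)
```

The resulting dict can then be written to its own database table and used to expand abbreviations inside report text.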

 

Milestones:

31 March: Finish looking for new datasets and get familiar with the data itself. Upload the data to the cluster and import it into a database with meaningful attributes.
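The import step could look roughly like the sketch below, which uses SQLite from the Python standard library as a stand-in for whatever database runs on the cluster. The table and column names are a hypothetical schema, not the actual headers of the released files, and the CSV content is made up.

```python
import csv
import io
import sqlite3

# Hypothetical schema; real column names would be mapped from the source files.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reports (
    report_key    TEXT PRIMARY KEY,
    date_occurred TEXT NOT NULL,   -- ISO 8601, e.g. '2008-07-14 09:30'
    war           TEXT NOT NULL CHECK (war IN ('afghanistan', 'iraq')),
    category      TEXT,
    unit_name     TEXT,
    latitude      REAL,
    longitude     REAL,
    civilian_kia  INTEGER,
    friendly_kia  INTEGER,
    enemy_kia     INTEGER
);
"""

def _num(value, cast):
    """Convert a CSV cell to a number, mapping empty cells to NULL."""
    value = (value or "").strip()
    return cast(value) if value else None

def load_csv(conn, csv_file, war):
    """Insert every row of an (assumed) reports CSV; return the number of rows loaded."""
    rows = [
        (
            rec["report_key"], rec["date_occurred"], war, rec["category"], rec["unit_name"],
            _num(rec["latitude"], float), _num(rec["longitude"], float),
            _num(rec["civilian_kia"], int), _num(rec["friendly_kia"], int),
            _num(rec["enemy_kia"], int),
        )
        for rec in csv.DictReader(csv_file)
    ]
    with conn:  # commit the whole file as one transaction
        conn.executemany("INSERT INTO reports VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)

sample = io.StringIO(
    "report_key,date_occurred,category,unit_name,latitude,longitude,"
    "civilian_kia,friendly_kia,enemy_kia\n"
    "A1,2008-07-14 09:30,IED Explosion,Unit X,34.5,69.2,1,0,0\n"
    "A2,2008-07-20 14:00,Direct Fire,Unit Y,,,0,1,2\n"
)
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
loaded = load_csv(conn, sample, "afghanistan")
```

Storing empty cells as NULL rather than 0 keeps "not reported" distinguishable from "none".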
14 April: Have a simple back end for querying the data and performing "simple" aggregations
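A "simple" back-end aggregation, matching the weekly and monthly binning described in the methods, might be a single grouped SQL query. This sketch again uses SQLite and a hypothetical `reports` table with ISO 8601 `date_occurred` strings; `%Y-%W` buckets weeks starting on Monday.

```python
import sqlite3

# Bucket formats understood by SQLite's strftime().
PERIOD_FORMATS = {"week": "%Y-%W", "month": "%Y-%m"}

def casualties_by_period(conn, war, period="month"):
    """Return (bucket, n_reports, civilian_kia) rows for one war, in time order."""
    fmt = PERIOD_FORMATS[period]  # KeyError for unsupported periods
    return conn.execute(
        """
        SELECT bucket, COUNT(*), SUM(civilian_kia)
        FROM (SELECT strftime(?, date_occurred) AS bucket, civilian_kia
              FROM reports WHERE war = ?)
        GROUP BY bucket
        ORDER BY bucket
        """,
        (fmt, war),
    ).fetchall()

# Minimal made-up table so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (date_occurred TEXT, war TEXT, civilian_kia INTEGER)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?)",
    [
        ("2008-07-14 09:30", "afghanistan", 1),
        ("2008-07-20 14:00", "afghanistan", 0),
        ("2008-08-02 08:15", "afghanistan", 3),
        ("2008-08-03 10:00", "iraq", 5),
    ],
)
monthly = casualties_by_period(conn, "afghanistan", "month")
weekly = casualties_by_period(conn, "afghanistan", "week")
```

Pushing the binning into SQL keeps the back end thin: the same query serves both granularities and the visualization layer only receives small result sets.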
28 April: Experiment with more advanced analysis tools/NLP
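One possible first NLP experiment is TF-IDF over report summaries, which highlights terms that are frequent in one report but rare across the archive. This is a minimal sketch with a tiny hypothetical stopword list and made-up summaries; the smoothed IDF, log((1 + N) / (1 + df)) + 1, is one common variant, not the only choice.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+")
# Illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "at", "to", "from", "was", "were", "by", "with"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS]

def tfidf(documents):
    """Return one {term: weight} dict per document (term frequency x smoothed IDF)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights

summaries = [
    "IED strike on patrol near the checkpoint",
    "Patrol found weapons cache",
    "Direct fire on patrol from enemy position",
]
w = tfidf(summaries)
```

Here "patrol" occurs in every summary and gets the lowest weight, while "ied" is specific to the first one.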
12 May: Complete the visualization and the report