The UCDP GED & the Power of GIScience

Sundberg & Melander (2013) introduce the Uppsala Conflict Data Program's (UCDP) new Georeferenced Event Dataset (GED) in a piece published in the Journal of Peace Research. They present the details of the dataset concisely, but I had to dig a bit deeper to find out how lethal events are geocoded. I found a very interesting article by Kristine Eck (Department of Peace and Conflict Research, Uppsala University) that describes the geocoding procedure of the UCDP GED (see excerpt below):

The creators of the dataset appear to have thought hard about the importance of communicating uncertainty to end users of the database. Their three-step process combines manual input from “coders”, a review of the entries by the UCDP project leader (who also maps them in Spatial Key to spot miscoded coordinates), and a final consistency check with automated Python and PHP scripts. I applaud the creators for verifying entries this rigorously. I was also pleasantly surprised to read how carefully they deal with uncertainty in the geospatial data (e.g. a fatal event that occurs “somewhere near place X”, or “in province Y”). Assigning each event an integer code based on the precision of its geospatial information is not particularly new: the Armed Conflict Location and Event Data Project (ACLED) uses a similar 1 to 3 scale, whereas the UCDP GED uses a 1 to 7 scale. What is noteworthy is that the UCDP GED uses centroid points, rather than provincial capitals, as pseudo-locations for events that are known only by an area (such as a province) rather than an exact place; it falls back on the provincial capital only when no centroid point is available.
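The centroid-first strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the UCDP's actual code: the `Province` record, the function names and the precision value given to province-level events (`HYPOTHETICAL_PROVINCE_PRECISION`) are all hypothetical. The only rules taken from the sources are that a code of 1 marks an exact location, that higher codes mark broader and more uncertain areas, and that the centroid is preferred over the provincial capital. The centroid here is the area-weighted centroid of the province outline computed with the shoelace formula, treating longitude/latitude as flat x/y coordinates, which is only an approximation for small areas.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y

# Hypothetical precision values for illustration only. The sources state only
# that 1 marks an exact location and that higher codes mark broader areas.
EXACT_PRECISION = 1
HYPOTHETICAL_PROVINCE_PRECISION = 3  # placeholder, not an actual UCDP GED code


def polygon_centroid(vertices: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Vertices are listed in order (either direction) without repeating the
    first vertex at the end.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0  # twice the signed area
    cx = cy = 0.0
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # C = (1 / 6A) * sum(...), and 6A == 3 * twice_area
    return (cx / (3 * twice_area), cy / (3 * twice_area))


@dataclass
class Province:
    """Hypothetical record for an area-level location such as a province."""
    name: str
    capital: Point
    boundary: Optional[List[Point]] = None  # None when no outline is available


def assign_location(exact: Optional[Point], province: Province) -> Tuple[Point, int, str]:
    """Return (coordinates, precision code, method) for one event."""
    if exact is not None:
        return exact, EXACT_PRECISION, "exact"
    if province.boundary is not None:
        # Prefer the centroid, so vague events are not pulled toward the capital.
        return polygon_centroid(province.boundary), HYPOTHETICAL_PROVINCE_PRECISION, "centroid"
    # Fall back to the provincial capital when no centroid can be computed.
    return province.capital, HYPOTHETICAL_PROVINCE_PRECISION, "capital"


if __name__ == "__main__":
    # Hypothetical square "province" whose capital sits near one corner.
    square = Province("Example", capital=(0.1, 0.1),
                      boundary=[(0, 0), (2, 0), (2, 2), (0, 2)])
    print(assign_location(None, square))  # ((1.0, 1.0), 3, 'centroid')
```

The precision code travels with the coordinates, so a user can still tell that a centroid is a pseudo-location rather than the place where the event actually happened.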

While the sociopolitical ramifications of a database of this sort are important and should be debated, I think the authors and creators of the dataset have done a thorough job of thinking through how geospatial information is used in their data. They try to minimize bias toward densely populated areas. Rather than “improving” vague locations (making them look more detailed while actually introducing MORE error), they preserve the uncertainty in the spatial information: they record it on a precision scale, and they default events with vague locations to something other than, for example, a country's capital city.

I believe this dataset is a great step forward for GIScience, which has proven useful to, and arguably essential for, the success of the UCDP's GED. As for the Sundberg & Melander piece, I really wish they had gone into more detail about the decisions behind the georeferencing of these events. That's probably just the (albeit reluctant) GIScience side of me coming out, though.

– ClaireM

Eck, K. (2012). In data we trust? A comparison of UCDP GED and ACLED conflict events datasets. Cooperation and Conflict, 47(1), 124-141. http://dx.doi.org/10.1177/0010836711434463

“UCDP GED avoids a great deal of these problems through a triple-checking process. The first manual check is done by the coder, and the second by the UCDP project leader, who manually checks the data and uses Spatial Key, a visualization software for geographic data, to map the data and locate possible miscoded coordinates. In the third stage, automated scripts in Python and PHP are run to check for internal consistency in dates, actors, dyads, conflicts, and fatality counts. The automated scripts pick up problems like the same city being given different coordinates. The scripts normally pick up dozens of errors per country, suggesting that they are invaluable in the data-cleaning process.

The second recurring geocoding problem in the ACLED data is the misuse of the geoprecision codes. In ACLED and UCDP GED, a geoprecision code of 1 indicates that the coordinates marking the exact location that the event took place, usually a inhabited area. When a specific location is not provided, i.e. “Helmand province,” ACLED and UCDP GED employ different strategies for managing this issue. ACLED selects the provincial capital while UCDP GED selects the centroid point when available and the provincial capital when a centroid point is not available. One can debate which is the best practice, but what is crucial is that the data provider convey uncertainty about the location to the user. This is done through geoprecision codes; higher numbers on the geoprecision code indicate broader geographic spans and thus greater uncertainty about where the event occurred (the range for ACLED is 1-3, for UCDP GED it is 1-7).”
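The kind of automated check Eck mentions, catching “the same city being given different coordinates”, can be sketched in Python. It is not the UCDP's actual Python or PHP script; the field names (`country`, `place`, `lat`, `lon`) and the rounding to four decimal places (roughly 11 m of latitude) are assumptions chosen for the example. Places are keyed by country as well as name, because different towns can share a name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def conflicting_coordinates(
    events: Iterable[Mapping], decimals: int = 4
) -> Dict[Tuple[str, str], List[Tuple[float, float]]]:
    """Map (country, place) to its distinct coordinates, for every place
    that appears with more than one coordinate pair."""
    seen = defaultdict(set)
    for event in events:
        key = (event["country"], event["place"])
        # Round so that most tiny floating-point differences are not flagged
        # (values straddling a rounding boundary can still differ).
        seen[key].add((round(event["lat"], decimals), round(event["lon"], decimals)))
    return {key: sorted(coords) for key, coords in seen.items() if len(coords) > 1}


if __name__ == "__main__":
    # Hypothetical events; "Town A" has been coded with two different points.
    events = [
        {"country": "X", "place": "Town A", "lat": 10.0, "lon": 20.0},
        {"country": "X", "place": "Town A", "lat": 10.0, "lon": 20.0},
        {"country": "X", "place": "Town A", "lat": 10.5, "lon": 20.0},
        {"country": "X", "place": "Town B", "lat": 11.0, "lon": 21.0},
    ]
    print(conflicting_coordinates(events))
    # {('X', 'Town A'): [(10.0, 20.0), (10.5, 20.0)]}
```

A flagged place is not necessarily wrong in every entry, so output like this would still go to a person for review, in line with the manual checks in the first two stages.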

 
