Lost in the Data

In their introduction to a special journal issue, Guo and Mennis outline the emerging field of spatial data mining.  Work in this field has been prompted by the growing availability of ever finer-grained data from a rapidly increasing number of sensors, ranging from satellites to cell phones to surveillance cameras.  There is great interest in tapping into and making sense of these streams of “big data”, but to do so we must develop new ways of exploring, processing and analyzing them.  This is essentially what was alluded to in relation to geostatistics last week: our data and technology have surged well ahead of our methodological toolbox.

One of the biggest issues with these new data sources is that they are largely unstructured: a data point may be nothing more than a string of text, such as a tweet, with some locational metadata attached.  To analyze a large number of unstructured data points, it is necessary to impose a structure via classification.  This is no easy task!  Although software such as qualitative coding packages can group phrases by theme, most existing classification algorithms require a training process in which the user manually tunes the classifier’s parameters on a subset of the data.  Developing reliable unsupervised classifiers that not only sort unstructured data effectively but also produce output that is useful to researchers is a major challenge in this domain.
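
To make the training step concrete, here is a minimal sketch of a supervised classifier for short geotagged messages. It is not a method from Guo and Mennis: the themes, the messages, the coordinates and the function names (`tokenize`, `train`, `classify`) are all invented for illustration, and the word-overlap scoring is deliberately crude.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and keep only purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]


def train(labeled):
    """Count how often each word appears under each hand-assigned theme."""
    counts = defaultdict(Counter)
    for text, theme in labeled:
        counts[theme].update(tokenize(text))
    return counts


def classify(text, counts):
    """Pick the theme whose training words overlap most with the message."""
    words = tokenize(text)
    scores = {theme: sum(c[w] for w in words) for theme, c in counts.items()}
    best = max(scores, key=scores.get)
    # A message sharing no words with any theme is left unclassified.
    return best if scores[best] > 0 else "unclassified"


# Hypothetical hand-labeled subset of the data: (text, theme).
training = [
    ("stuck in traffic on the bridge again", "transport"),
    ("the bus is late this morning", "transport"),
    ("heavy rain and wind downtown", "weather"),
    ("snow storm closing in tonight", "weather"),
]

# Hypothetical unlabeled messages with (lat, lon) metadata.
messages = [
    ("traffic is terrible near the bridge", (45.50, -73.57)),
    ("rain all day", (45.52, -73.58)),
    ("great coffee here", (45.51, -73.56)),
]

model = train(training)
# Each point now carries a theme alongside its location.
labeled_points = [(loc, classify(text, model)) for text, loc in messages]
```

Even in this toy case the difficulty shows: the result depends entirely on the hand-labeled subset, and anything outside the themes we thought of in advance simply falls through as unclassified.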

An idea closely related to the advent of big data is the long-standing trade-off between resolution and extent in spatial scale.  Though big data presents us with both extent and resolution of unprecedented magnitude, the limits imposed by humans’ own cognitive abilities remain.  Computer programs developed to make sense of big data must classify and generalize the raw input in a way that allows geographers to navigate this sea of data effectively, rather than simply leaving us lost.
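
One simple form of generalization is to aggregate raw points into grid cells. The sketch below counts the same points at two cell sizes. The coordinates, the cell sizes and the `aggregate` function are assumptions made up for illustration, and a square grid in degrees is only one of many possible choices.

```python
import math
from collections import Counter


def aggregate(points, cell_size):
    """Count points per square grid cell of the given size (in degrees)."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


# Hypothetical point locations as (lat, lon).
points = [
    (45.501, -73.567),
    (45.502, -73.568),
    (45.509, -73.561),
    (45.531, -73.512),
    (45.585, -73.625),
]

fine = aggregate(points, 0.01)   # many small cells: more detail to read
coarse = aggregate(points, 0.1)  # few large cells: easier to take in
```

The coarser grid is easier to read but hides the local cluster that the finer grid reveals, which is the resolution side of the trade-off in miniature.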

-FischbobGeo
