Thoughts on Spatial data mining and geographic knowledge discovery – An introduction (Mennis & Guo, 2009)

Mennis and Guo’s work summarizes the trends, progress, and achievements in spatial data mining, processing, and interpretation up to 2009. It is a very helpful review for those who are not familiar with most of the techniques and approaches in the field. Their work focuses especially on spatial classification and prediction, spatial association rule mining, spatial clustering, regionalization, and point pattern analysis. The article makes the reader excited about the boom in geospatial data and mining techniques, which feeds into research, the private sector, and sometimes government operations, but these opportunities come with a cost.

I am not saying that more available geospatial data is bad; however, there are certain challenges the authors fail to discuss in detail. The first is selection bias when mining spatial data. People who are aware of GPS tracking devices and do not want to share their geospatial data, and those who do not yet have access to such devices, are excluded from some of the hottest areas of geospatial data mining, such as social media spatial data mining. This selection bias in the data may lead to the unintended exclusion of parts of the population from the interpretation of the data. Although it can be mitigated if data from various sources are joined together in the processing and interpretation stages, it is definitely something spatial data miners should be aware of.

The second is the privacy issue. More geospatial data does not actually make everyone happier; it has a cost. Although more and more geospatial data is masked to protect privacy, the huge volume of data flows inevitably exposes some, or even most, of the population to privacy risks. Data miners therefore need to be aware of their responsibility to protect the privacy of study subjects and data contributors, and proper oversight of the field needs to be addressed to prevent malicious mining of geospatial data.
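
To make "masked" a bit more concrete, here is a minimal sketch of one common idea, often called donut geomasking: each point is moved a random distance in a random direction, far enough to hide the true location but not so far that the overall pattern is lost. This is not taken from Mennis and Guo's article. The function name `donut_mask`, the radii, and the flat-earth conversion from meters to degrees are all my own assumptions for illustration.

```python
import math
import random

# Assumption: rough length of one degree of latitude in meters.
METERS_PER_DEGREE = 111_320.0


def donut_mask(lat, lon, r_min=100.0, r_max=500.0, rng=random):
    """Displace a point by a random distance in [r_min, r_max] meters.

    Hypothetical helper for illustration only. It uses a flat-earth
    approximation, which is reasonable for small offsets away from the poles.
    """
    distance = rng.uniform(r_min, r_max)        # how far to move the point
    bearing = rng.uniform(0.0, 2.0 * math.pi)   # direction of the move
    d_north = distance * math.cos(bearing)
    d_east = distance * math.sin(bearing)
    new_lat = lat + d_north / METERS_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    new_lon = lon + d_east / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return new_lat, new_lon


# Example: mask a single (made-up) coordinate pair.
masked_lat, masked_lon = donut_mask(45.5, -73.6)
```

Drawing the distance uniformly is a simplification, and a real masking scheme would also have to consider local population density, since a 100 m shift hides much less in a rural area than in a city.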

Finally, the availability of seemingly infinite geospatial data is certainly thrilling for people who work in this field. However, it also raises the difficulty and the skill requirements for data miners. It is not only computational skills that allow data miners to mine the data. More important are the skills to discover, to observe, and to formulate the right question, which to this day still depend heavily on a human to make the call. The ability to look at geospatial data critically is also necessary. Unless the data fit our questions perfectly, there are uncertainties that need to be addressed rather than blindly trusting the data because of its size.
