Spatial Data Mining: Shekhar, Zhang, Huang and Vatsavai (2003)

The article by Shekhar, Zhang, Huang and Vatsavai (2003) begins with a clear explanation of the differences between spatial and non-spatial data mining, illustrated with some interesting examples. It would have been useful to include, for context, some of the information from last week’s geoviz article about the prevalence of spatial information in digital data (~80%), especially given the link between geoviz and data mining made at the end of the article. The article then goes on to list different statistical phenomena and methods, with clear examples that were helpful for context and kept the text engaging.

The section I found most interesting, and which I think Allen will focus on in his research, is clustering. One thing that was not mentioned in the article, and which I wonder about, is the role of scale in spatial clustering, especially with large data sets. If you’re looking for spatial clusters, won’t scale play a big role in determining them? That is, something might seem like a small cluster, but at a smaller scale (one covering a larger area) it is part of an even larger cluster. Using Allen’s research project on taxi ridership in NYC as an example, I would imagine that certain areas of Manhattan will have high instances of taxi ridership, but at a smaller scale, Manhattan as a whole would be an area of taxi ridership clustering. I wonder how the choices of scale and data granularity in analysis lead to different results, and whether it is useful to run the analysis at several spatial scales.
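
To make the scale question a bit more concrete, here is a minimal sketch in Python. It is not a method from the article: it assumes a handful of made-up points (not real taxi data) and a very simple rule, where two points belong to the same cluster if a chain of neighbours, each within a chosen `radius`, connects them. The function name `threshold_clusters` and the coordinates are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import dist


def threshold_clusters(points, radius):
    """Group points into connected components: two points share a cluster
    if a chain of neighbours, each within `radius`, links them."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair of points closer than `radius`.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= radius:
            parent[find(i)] = find(j)

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return list(groups.values())


# Made-up pickup locations in arbitrary units (an assumption, not real data).
pickups = [
    (0.0, 0.0), (0.1, 0.1), (0.2, 0.0),  # hotspot A
    (1.0, 0.0), (1.1, 0.1), (1.0, 0.2),  # hotspot B
    (0.5, 1.0), (0.6, 1.1), (0.5, 1.2),  # hotspot C
    (10.0, 10.0),                        # an isolated pickup far away
]

fine = threshold_clusters(pickups, radius=0.3)    # neighbourhood-level view
coarse = threshold_clusters(pickups, radius=1.5)  # borough-level view

print(len(fine), len(coarse))  # prints "4 2"
```

With the small radius, the three hotspots show up as separate clusters (plus the isolated point); with the larger radius, they merge into a single cluster. The data are the same in both runs, and only the choice of scale changes the answer, which is why running the analysis at more than one spatial scale seems worth considering.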

