I am late to post, and I think everyone else has already shared lots of excellent ideas about these topics! I found the spatial data mining article very interesting. I think statistical modeling and machine learning are two disciplines that have a lot in common, and in some cases they may even be redundant versions of each other. When I read papers by computer scientists applying machine learning to data, the goal (in this case, mostly through unsupervised data mining) seems to be improving predictive ability, often measured by, for example, the area under an ROC curve. In statistics, the goal of a model is often to estimate (causal) effects, which requires a different conceptual framework for model building and selection, for example to avoid controlling for a variable on the causal pathway.
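To make the "predictive ability" yardstick concrete, here is a minimal sketch of computing the area under an ROC curve for a binary classifier. It uses the pairwise (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative case, with ties counted as one half. Note that this needs known class labels, so it applies to supervised prediction. The function name `roc_auc` and the toy labels and scores are hypothetical, made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) formulation: the fraction of
    (positive, negative) pairs in which the positive case scores higher,
    counting tied scores as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event present, 0 = absent.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 0.75 (3 of the 4 positive/negative pairs are ranked correctly)
```

This quadratic loop is fine for a small illustration. A score like this says nothing about whether any single coefficient in the model is an unbiased causal effect, which is the contrast drawn above.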
Additionally, many of the same issues arise in both spatial data mining and spatial statistics. Correlation and dependence across space and time create problems both for traditional parameter estimators in statistics and for traditional classification, prediction, and clustering algorithms in machine learning. It's not enough to consider spatial dependence alone; it's also important to consider nuances of spatial data that may change the goals themselves. For example, the authors note below Figure 3.2 that spatial accuracy should not be measured in a binary (correct/incorrect) sense but should account for how close (spatially) the classification was to the truth. I would like to understand more thoroughly where statistical and machine learning approaches align and where they differ. It's clear this is a highly interdisciplinary field: we need people trained in GIS, computer science, and statistics!
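As a rough illustration of why binary accuracy can miss the spatial point, the sketch below scores two sets of predicted sites on a 10 x 10 grid. One set is off by a single cell and the other is far away. The distance-aware score (the average distance from each actual site to its nearest predicted site) is an assumption chosen for illustration and is not necessarily the exact measure the authors define. The grid, the site coordinates, and both function names are hypothetical.

```python
import math


def binary_accuracy(actual, predicted, cells):
    """Cell-by-cell accuracy: a cell counts as correct only if predicted
    presence matches actual presence exactly."""
    actual, predicted = set(actual), set(predicted)
    correct = sum((c in actual) == (c in predicted) for c in cells)
    return correct / len(cells)


def mean_nearest_prediction_distance(actual, predicted):
    """Hypothetical distance-aware score: average Euclidean distance from
    each actual site to its nearest predicted site (lower is better)."""
    return sum(min(math.dist(a, p) for p in predicted) for a in actual) / len(actual)


cells = [(x, y) for x in range(10) for y in range(10)]
actual = [(2, 2), (7, 7)]
near_miss = [(2, 3), (7, 8)]  # each prediction is one cell away
far_miss = [(2, 9), (9, 0)]   # predictions far from the true sites

print(binary_accuracy(actual, near_miss, cells))            # 0.96
print(binary_accuracy(actual, far_miss, cells))             # 0.96
print(mean_nearest_prediction_distance(actual, near_miss))  # 1.0
print(mean_nearest_prediction_distance(actual, far_miss))   # about 6.19
```

Both predictions get the same binary accuracy, because each misses both true cells. Only the distance-aware score shows that the first prediction is nearly right and the second is not.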
Tags: GEOG 506