Archive for the ‘General’ Category

VGI and Crowdsourcing Disaster Relief (Zook et al., 2010)

Sunday, November 12th, 2017

This paper mainly reviews the applications of four online mapping platforms during the 2010 Haiti earthquake. It cannot be denied that the four platforms (i.e., CrisisCamp Haiti, OpenStreetMap, Ushahidi, GeoCommons) contributed to disaster relief after the earthquake. However, these technologies also bring problems that remain to be discussed and solved.

Primarily, in the opening sections of this paper, the authors emphasize the importance of information technologies (ITs) in disaster response and then note how volunteered mapping helps. However, they focus on Haiti, where IT infrastructure is quite limited and geo-referenced data are lacking. I agree that volunteered mapping can efficiently and effectively provide these data for disaster rescue and tremendously facilitate relief efforts. However, this may not be the case in well-mapped countries with good infrastructure. In that case, I wonder what the strength of volunteered mapping is compared with traditional mapping databases, and whether we need it at all.

Besides, since the platforms use volunteered geographic information (VGI), the fundamental problem is how to ensure the quality of these data. In terms of disaster response, I think we should consider the two general types of errors proposed by Goodchild (2007): a false positive (i.e., a false rumor of an incident) and a false negative (i.e., the absence of information about an incident that did occur). The former leads to inefficiency in disaster rescue, and the latter can result in low effectiveness; both can affect human lives, even if only one individual’s. Moreover, I doubt that a place with denser information is necessarily a place more in need. Information density can result from many different causes, but human lives are not worth less in some areas than in others. According to the authors, only 11% of the population could access the Internet and one third had mobile phones. This means that around two thirds of people could not send out distress calls through Ushahidi, and resources are taken first by those who have access. The authors argue that the originally insufficient infrastructure is to blame; in other words, they think the discrimination would happen even without VGI. This is a tricky argument and defends nothing. Of course, I agree that social inequality always exists. However, VGI is not value-neutral, and it may worsen existing inequality. Critics do not blame VGI for creating the inequality but for worsening it, and currently there is no efficient way to solve these issues.

In conclusion, this paper provides a comprehensive review of the benefits brought by volunteered mapping in disaster response in Haiti, but it is not critical enough when discussing the drawbacks of volunteered mapping. Through reading this paper, we can identify many questions that remain to be answered, including questions about the inherent characteristics of VGI and its applications.

Zook et al. Haiti Relief & VGI

Sunday, November 12th, 2017

Volunteered geographic information (VGI) is a tool used to consolidate knowledge where it is needed by those willing to offer their data and expertise. I feel comfortable arguing that it is strictly a tool; it creates no new process or analysis of information but simply refers to the consolidation of knowledge to complete various projects. The project in which the information is being used could potentially be considered science depending on its nature, but VGI itself is a tool. In relation to the topic of privacy, VGI can be either enhanced or impeded depending on levels of privacy. In other words, if personal data are openly available for collection and use, certain tasks may be easier to complete as a result of readily available pools of knowledge. In contrast, if information is kept private, then certain tasks may lack critical knowledge, resulting in inaccuracy or bias in final products.

I think the article does a good job of framing VGI as a tool that facilitates transactions of knowledge and data, allowing projects to be completed more efficiently than by individuals working alone. I was skeptical of the utility and quality of the work completed, but the article makes a good point that more users means more people to catch errors and mistakes throughout the process.

One particular concern I have regards potential failures to provide a comprehensive amount of information comparable to what could be collected through local knowledge and expertise: is everything doable through VGI, or are there certain limitations that mean some projects need to be completed outside of it?

Dunn and Newton (1992)

Sunday, November 12th, 2017

This paper discusses two prominent forms of network analysis: Dijkstra’s shortest path algorithm and the out-of-kilter algorithm.
Dijkstra’s algorithm presents a very simple form of network analysis, treating the path from point A to point B as a series of nodes and arcs that are accounted for simply by length. Indeed, the authors make the point that early network analyses were shaped by computer scientists and did not account for the inherently geographical nature of transportation and movement; namely, they do not account for directionality or geographic coordinates. The out-of-kilter approach addresses these issues by accounting for external factors and partitioning flows of movement according to the maximum capacity allocated to particular roads. The authors illustrate this need for speed and efficiency with disaster scenarios, where precarious roads and a mass of traffic must be accounted for quickly and dealt with efficiently.
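To make the contrast concrete, here is a minimal sketch of Dijkstra’s algorithm over a small road network, assuming a simple adjacency-dictionary representation where arcs are weighted only by length (the node names and distances are hypothetical, not taken from the paper):

```python
import heapq

def dijkstra(graph, source):
    """Shortest path lengths from source, where graph[u] = {v: arc_length}."""
    dist = {source: 0.0}
    visited = set()
    heap = [(0.0, source)]  # (distance travelled so far, node)
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        for v, length in graph[u].items():
            nd = d + length  # cost is length only: no directionality, no capacity
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical network: four junctions with arc lengths in km.
roads = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 1.5, "D": 4.0},
    "C": {"A": 5.0, "B": 1.5, "D": 1.0},
    "D": {"B": 4.0, "C": 1.0},
}
print(dijkstra(roads, "A"))  # {'A': 0.0, 'B': 2.0, 'C': 3.5, 'D': 4.5}
```

Everything that makes the problem geographical (one-way streets, capacities, changing conditions) has to be bolted onto this bare structure, which is exactly the gap the out-of-kilter approach tries to fill.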
This was written in 1992, at the cusp of a widespread informatics revolution in the home market. As it stands right now, Dijkstra’s algorithm is still highly relevant; I believe it is used in Google Maps for directions, and I have the sense that flow-based approaches like the out-of-kilter algorithm have become viable for many people as well. With services such as Waze collecting cell-phone traffic information, paired with basic information about current infrastructure, it has become possible for GPS services to account for dynamic changes in traffic for users of infrastructure.
I can’t help but feel that this is nonetheless a rudimentary form of network analysis; there are still many more factors that could potentially be quantified and added into the algorithm. What about greenhouse gas emissions, or the level of scenery? I wonder how easily those things could be accounted for, and I am still wondering about how qualitative data could play a part in network analysis. Perhaps in the future our GPS devices could account for personal preferences and tailor their network analysis to the individual. This would raise questions over privacy, perhaps, though with growing levels of information being tracked anyway, it’s almost something to be expected. I would be interested in knowing more about the evolution of network analysis, and I am looking forward to the presentation on Monday.

Network Analysis (Curtin 2007)

Sunday, November 12th, 2017

I found this article quite interesting, both in its recap of traditional network analyses (i.e., Dijkstra’s algorithm) and in its account of how the network features of GIS are some of GIScience’s earliest and most popular uses. I find compelling the point that graph theory is ultimately the idea holding this immense functionality together. On this train of thought, I was very surprised to hear that ‘non-topological’ networks exist and are still used to some degree. How a network can be formed without information linking its nodes to one another makes no sense to me, and seems to defeat the point of creating a network.

I like how the author states that network GIS is a sub-discipline of GIScience, and goes so far as to claim it is the only one to have developed linear referencing. Since many GIS functions rely on network analysis, I assume that anything using a network ultimately incorporates this, which makes it seem not that out of the ordinary.

Lastly, I found the use of network analysis in multi-disciplinary fields like microbiology and neurology very interesting, and I would definitely use this as an argument that network analysis is purely a tool. As a tool it’s extremely powerful, in that it’s a simple-to-use and easy-to-understand data structure to which many algorithms can be applied for interesting analyses.

-MercatorGator

Thoughts on “Network Analysis in Geographic Information Science…” (Curtin 2007)

Sunday, November 12th, 2017

I came into this paper not knowing too much about network analysis, but having some general notion of it through its ubiquity in geographic and neuroscience literature (network distance, social networks, neural networks). I thought the paper did a good job of outlining the fundamentals of the field before progressing to geographic specificities and future challenges. I learned that the basis of describing networks is in their topological qualities, namely connectivity, adjacency, and incidence, which is what makes them applicable to such a diverse range of phenomena.
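As a small illustration of those topological qualities, the sketch below builds an adjacency list and a node–edge incidence table for a toy graph; the nodes and edges are invented for the example rather than drawn from Curtin’s paper:

```python
# Toy undirected network: edges are pairs of (hypothetical) node labels.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

# Adjacency: which nodes are directly connected to one another.
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

# Incidence: which edges touch which node.
incidence = {}
for i, (u, v) in enumerate(edges):
    incidence.setdefault(u, []).append(i)
    incidence.setdefault(v, []).append(i)

print(adjacency)  # e.g. {'A': {'B', 'C'}, 'B': {'A', 'C'}, 'C': {'A', 'B', 'D'}, 'D': {'C'}}
print(incidence)  # node -> indices of the edges incident to it
```

Connectivity then falls out of these structures (for instance, by traversing the adjacency list), which is why the same few properties carry over to road networks, river systems, and social ties alike.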

Curtin states that “In some cases these network structures can be classified into idealized network types (e.g., tree networks, hub-and-spoke networks, Manhattan networks).” Are idealized network types simplifications of the input data, performed to fit a certain standardized model?

On page 104, Curtin mentions that “The choice of network data structure chosen can profoundly impact the analysis performed”, just as scale can influence whether or not clusters are observed at a certain resolution, and the choice of some variables over others can influence classification algorithms in SDM. Again, we see that the products of any geographic modeling or network analysis are not objective, but dependent on subjective choices which require justification.

I assume that the “rapid rendering” discussed in reference to non-topological data structures is a function of quicker run time. Why are the data in non-topological networks processed more quickly than in topological ones? Is it because, without having to assess relationships between points, each point only has to be accounted for once, without regard for its connectivity to other points?

It was interesting to note that one of the biggest challenges, or paths forward, for geographical network analysis is in applying existing algorithms from other fields to geographic data. Usually the challenge lies in adapting current methods to new data types or resolving gaps in domain knowledge, but this is a different kind of challenge, probably born out of the substantial developments made in network analysis in other fields.

-FutureSpock


Thoughts on “Assuring the quality of…” (Goodchild and Li 2012)

Sunday, November 12th, 2017

In discussing methods to assure the quality of VGI, Goodchild and Li state that “The degree to which such triage can be automated varies; in some cases it might be fully automatic, but in other cases it might require significant human intervention.” In VGI, the source of the data is human (as opposed to a scraping algorithm in SDM, for example), but the verification of data quality would definitely benefit from automation to deal with the large volume of geographic data produced every day. They go on to say that “Some degree of generalization is inevitable, of course, since it is impractical to check every item of data”, but by using the data analysis tools that have been developed to deal with large datasets, researchers can strive for a more complete assessment of accuracy.
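As a rough illustration of what partially automated triage could look like (my own sketch, not a method from the paper), the function below auto-accepts a contributed point that falls within a tolerance of a trusted reference location, auto-rejects gross outliers, and routes everything in between to human review:

```python
import math

def triage(contribution, reference, accept_m=20.0, reject_m=500.0):
    """Classify a contributed (x, y) point against a trusted reference point (metres)."""
    error = math.hypot(contribution[0] - reference[0], contribution[1] - reference[1])
    if error <= accept_m:
        return "accept"        # close enough to the reference: fully automatic
    if error >= reject_m:
        return "reject"        # wildly off: also automatic
    return "human review"      # ambiguous cases need significant human intervention

print(triage((100.0, 205.0), (100.0, 200.0)))  # 'accept'
print(triage((100.0, 450.0), (100.0, 200.0)))  # 'human review'
```

The thresholds here are arbitrary placeholders; the point is simply that a large share of contributions could be dispatched automatically, leaving human attention for the ambiguous middle.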

To reintroduce the concept of positivism in GIS, the authors state that “Our use of the terms truth and fact suggest an orientation towards VGI that is objective and replicable, and for which quality can be addressed using the language of accuracy. Thus our approach is less likely to be applicable for VGI that consists of opinion… or properties that are vaguely defined.” This position seems to indicate that only quantitative or objectively measured geographic phenomena can be tested for accuracy and uncertainty. I find this a flawed position because of the strong explanatory power of qualitative GIS and alternative ways of measuring attribute data. In suggesting it is not possible to apply the same rigorous standards of accuracy to these methods, the implication is that they are less scientific and less worthy of merit. Even if this is not the intention, I would have appreciated some suggestions or potential methods by which to ascertain the accuracy of VGI when applied to qualitative GIS data.

The three definitions of crowd-sourcing provided by the authors describe its different applications, from “solving a problem”, to “catching errors made by an individual”, to “approaching a truth”. This progression traces the familiar framing of GIS as a tool, tool-making, or science. It is interesting to note that the third definition does not converge on a truth as observations approach infinity; rather, after about 13 contributors there is no observable increase in accuracy for a position contributed to OpenStreetMap. This suggests that, unlike a mathematical proof or principle which will always hold given the correct assumptions, the VGI phenomenon is messier and has to account for human factors like “tagging wars” born out of disagreement about geographic principles, or the level of “trust” which may discourage someone from correcting a contribution from a reputed contributor.

The social approach tries to minimize the human errors mentioned above by quantifying variables like “commitment” and “reliability” and by allowing social relations among contributors to act as correction mechanisms.

-FutureSpock


Goodchild and Li (2012) – Quality VGI

Saturday, November 11th, 2017

Goodchild and Li (2012) outline crowd-sourcing, social and geographic approaches to quality assurance for volunteered geographic information (VGI). As VGI represents an increasingly important resource for data acquisition, there is a need to create and interrogate the frameworks used to accept, query or reject instances of VGI on the basis of their accuracy, consistency and completeness.

The authors argue that VGI presents a distinct set of challenges and considerations compared with other types of volunteered information. For example, Linus’s Law—that in software development, “given enough eyeballs, all bugs are shallow”—may not apply as readily to geographic facts as it does to other types of information. Evaluators’ “eyes” scan geographic content highly selectively, with the exposure of geographic facts varying from the very prominent to the very obscure.

To me, it is unclear why this disparity is unique to geographic information. The direct comparison between Wikimapia and Wikipedia may be inappropriate for contrasting geographic and non-geographic volunteered information, since their user and contributor bases differ so markedly. I might actually advance the opposite case: the fact that geographic information is all connected by location on the surface of the earth makes it more ‘visible’ than, for instance, an obscure Wikipedia page on an isolated topic.

The authors call for further research to be directed towards formalising and expanding geographic approaches to quality assurance. These approaches seek to verify VGI using external information about location and by applying geographic ‘laws’. In my opinion, this provides an interesting strategy that is relatively unique to geographic information. Through geolocation, any instance of VGI could be linked to other geospatial databases, and could potentially be accepted or flagged on the basis of its relationships to other nearby features or variables. Elements of this process could be automated through formalisation. This approach will of course come with its own set of challenges, such as potential feedback loops generated by multiple incorrect sources reaffirming inaccurate information.
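A formalised version of this could be as simple as checking each contributed feature against an external database and flagging it when no corroborating feature of the same type lies nearby. The sketch below is my own illustration of that idea, with made-up tags, coordinates, and a fixed distance threshold:

```python
import math

def flag_unsupported(vgi_points, reference_points, max_dist=100.0):
    """Flag VGI points with no reference feature of the same type within max_dist (metres)."""
    flagged = []
    for tag, x, y in vgi_points:
        supported = any(
            rtag == tag and math.hypot(x - rx, y - ry) <= max_dist
            for rtag, rx, ry in reference_points
        )
        if not supported:
            flagged.append((tag, x, y))
    return flagged

# Hypothetical contributions checked against an external reference layer.
vgi = [("school", 10.0, 10.0), ("hospital", 500.0, 500.0)]
reference = [("school", 15.0, 12.0)]
print(flag_unsupported(vgi, reference))  # the 'hospital' point has no nearby corroboration
```

Of course, if the external layer itself contains errors, the same feedback problem reappears, so flags are best treated as prompts for review rather than verdicts.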
-slumley

Curtin 2007: Network Analysis in GIS

Saturday, November 11th, 2017

Network analysis is very useful for showing relationships between objects, agents, or people, and does not require some of the more formal geographic foundations. The result is the formation and growth of informal and natural linkages that create complex systems which can model how things are connected to each other. It essentially provides an alternative to a geographic datum for locating points in space, through their relationships to other points. A good example is social media networks: the connections that individuals make online form a global network of information about people and their relationships with each other.

An interesting topic highlighted in this article is the contrast between topological and non-topological data models. This distinction is interesting to me as a geography student, since it seems ridiculous to exclude topology when thinking about networks. The paper makes a similar statement, explaining that these models were effectively useless because they are simply points and lines with no substantial information available for analysis. I would have appreciated a bit more explanation of non-topological data models, such as an example of how one might be used and why it might be advantageous over topological models in some cases.

The article makes one particularly large claim: network GIS is the only sub-discipline to have redefined the spatial reference system on which locations are specified. I’m not going to agree or disagree with this statement, but I think the paper could have done a better job of supporting this argument and contrasting it against other potential sub-disciplines.

Thoughts on Goodchild (2012)

Saturday, November 11th, 2017

Goodchild does a thorough job assessing the benefits and hindrances of his three methods for quality assurance of VGI. The first two, the crowd-sourcing approach and the social approach, he evaluates in comparison to Wikipedia contribution. However, Goodchild fails to specify a few important details of the social approach. Ideally, Wikipedia contributions are made by users who have specific knowledge of a subject. User profiles on Wikipedia list a user’s contributions and edits, as well as an optional description of the user’s background and interests (and accolades if they are a frequent or well-regarded contributor). An OSM user profile could similarly denote their [physical] area of expertise, and also register the regions where the user has made the most contributions and edits, giving them more “credibility” for other related contributions.

An important aspect that Goodchild fails to mention regarding the crowd-sourcing approach is the barrier to editing OSM features. While Linus’s Law can certainly apply to geographic data, someone who sees an error in OSM needs to be a registered and knowledgeable user to fix it. In Wikipedia, an “Edit” button is constantly visible and one need not register to make an edit. Legitimate Wikipedia contributions must also be accompanied by a citation of an outside source, an important facet that geographic information often lacks.

The geographic approach to VGI quality assurance requires a set of “rules.” Goodchild is concerned with the ability of these rules to distinguish between a real and an imagined landscape, giving an example based on the characteristics of physical features such as coastlines, river systems, and settlement locations. Satellite imagery has provided the basis for much of OSM’s physical geographic features, so quality assurance is more often concerned with the names and locations of man-made features. A set of rules for man-made features could be determined more easily through a large-scale analysis of similarly tagged features and their relationships to their surroundings; e.g., a restaurant located in a park away from a street might be flagged as “suspicious” since its surroundings do not match the surroundings of other “restaurant” features.
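A toy version of such a rule might compare the land-use context of a newly tagged feature against the contexts in which that tag usually occurs, flagging mismatches as suspicious. The sketch below is purely illustrative, with invented tags and contexts rather than actual OSM rules:

```python
# Surroundings in which each feature tag is (hypothetically) usually found.
typical_context = {
    "restaurant": {"street", "commercial"},
    "bench": {"park", "street"},
}

def is_suspicious(tag, surroundings):
    """Flag a feature whose surroundings share nothing with the tag's usual context."""
    expected = typical_context.get(tag)
    if expected is None:
        return True                       # unknown tag: send it to review
    return not (expected & set(surroundings))

print(is_suspicious("restaurant", ["park"]))             # True  -> flag as suspicious
print(is_suspicious("restaurant", ["street", "shops"]))  # False -> looks plausible
```

In practice the “typical context” table would itself be learned from the large-scale analysis of existing tagged features described above.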

Volunteered Geographic Information and Crowdsourcing Disaster Relief: A Case Study of the Haitian Earthquake, Zook et al. (2010)

Saturday, November 11th, 2017

This article by Zook et al. (2010) discusses VGI specifically in the context of the 2010 earthquake in Haiti, but more broadly covers many of the issues presented in Goodchild and Li (2012) regarding the accuracy and validity of VGI. I think Zook et al. (2010) do a good job of considering many aspects of VGI, including issues in data licensing and compatibility, the exclusive nature of VGI, which in many cases is restricted to people with the technical skills to participate, and the fact that “there will always be people and communities that are left off the map” (29). While reading that line I wondered: even though VGI is not necessarily accurate, and even though some people will be completely excluded from it for a myriad of reasons (no access to the internet or mobile platforms, illiteracy, distance from centres of help, etc.), is it not worth trying? There is a level of error and inaccuracy in any projected geographic information, but that does not stop us from using GISystems.

Moreover, while reading this I thought back to the Johnson, Ricker and Harrison (2017) article I shared with the class, where many of the same issues of accuracy, licensing and intention are presented. I wondered whether, despite these unresolved issues, UAVs present an opportunity to collect objective, real-time data for disaster mitigation and relief. Because UAVs were used in recent instances of disaster relief, I wonder how the discussion has shifted to include some of the particular issues that arise from their use.

Network analysis in GIS (Curtin, 2007)

Friday, November 10th, 2017

I found it very interesting how Curtin (2007) points out that network analysis is the only subfield of GISciences that has redefined a spatial reference system. Linear referencing, or using the network itself as a reference, is so intuitive that I had never thought of it as an alternative method of spatial referencing. I realize that standardized spatial referencing is something that I take for granted and alternative methods may be an interesting direction for future research.
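As a small example of what linear referencing means in practice, the sketch below locates an event by its measure along a route (e.g., 3.5 km along route “R1”) and converts that back to coordinates by interpolating along the route’s polyline; the route geometry is invented for illustration:

```python
import math

def locate_on_route(polyline, measure):
    """Return the (x, y) point at 'measure' units along a polyline of (x, y) vertices."""
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        seg = math.hypot(x2 - x1, y2 - y1)
        if travelled + seg >= measure:
            t = (measure - travelled) / seg  # fraction of the way along this segment
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += seg
    return polyline[-1]  # measure beyond the route end: clamp to the last vertex

# Hypothetical route "R1" with vertices in km.
route_r1 = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0)]
print(locate_on_route(route_r1, 3.5))  # 1.5 km up the second segment -> (2.0, 1.5)
```

The pair (route, measure) is a perfectly good address on its own, which is what makes the network itself usable as a reference system.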

This statement can readily be debated, but in my mind, network analysis is the field within GIScience that perhaps has the most tangible impact on our daily lives, and it can be applied to the most diverse types of phenomena. The author highlights routing as one of the most fundamental operations in network analysis, and I couldn’t imagine our society functioning without it. Routing is particularly relevant in urban areas, where efficient movement from point A to point B across complex road systems is essential for the transportation of people and goods.

Shortest path routing may be the most basic implementation, but I am curious to understand how other factors can be incorporated into routing algorithms to enhance efficiency. The author indicates that “many parameters can be set in order to define more complex versions of shortest path problems”. In urban areas, for example, how are factors such as traffic, road speed limits, and road condition integrated to provide better routing options?
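One simple way such factors enter routing is by redefining the arc weight from raw length to an estimated travel time that folds in the speed limit and a congestion factor. The cost function below is an assumed illustration (not something specified by Curtin) of a weight that could feed any shortest-path algorithm:

```python
def travel_time_minutes(length_km, speed_limit_kmh, congestion=1.0, closed=False):
    """Estimated traversal time for one road segment.

    congestion: 1.0 = free flow, 2.0 = takes twice as long, and so on.
    closed:     impassable segments get an infinite cost so routes avoid them.
    """
    if closed or speed_limit_kmh <= 0:
        return float("inf")
    free_flow_minutes = (length_km / speed_limit_kmh) * 60.0
    return free_flow_minutes * congestion

# A 1.2 km arterial at 50 km/h in heavy traffic vs. a clear 2 km stretch at 100 km/h.
print(travel_time_minutes(1.2, 50, congestion=2.5))  # 3.6 minutes
print(travel_time_minutes(2.0, 100))                 # 1.2 minutes
```

Road condition could be handled the same way, as another multiplier on the free-flow time.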

In reading this article, I was reminded of a previous article that we read on spatial social networks (Radil et al., 2009). Both of these articles highlight the interesting role of space in network analysis. Networks are fundamentally spatial due to their graphical basis, but they can also be used to represent explicitly spatial geographic networks.

Curtin (2007)

Friday, November 10th, 2017

As with many of the topics covered in class, though I have used network analysis, I never read much background on the subject, because I mostly used it as a tool in various GISystems applications. For instance, I had never thought about the origin of the shapefile, or about its positive and negative attributes beyond the fact that I use shapefiles for some things and not for others. Once again, this shows the shortcomings of using GIS strictly as a tool, and some of the important background and concepts that are lost when it is used in this way.

One thing that particularly stood out in this article by Curtin (2007) was the discussion of the Travelling Salesman Problem (TSP): solutions are heuristic, and the degree to which they deviate from “true” optimal solutions is not properly or completely understood. To me, this links back to what I feel I am getting out of this course, which is a deeper understanding of the background, importance, and shortcomings of various GIScience concepts that is truly lacking in other GIS courses I have taken. As Curtin (2007) mentions, network analysis is now mostly encountered through route-mapping services like MapQuest (once upon a time) and Google Maps, without most people having any background knowledge of how those routes are computed or the algorithms used. This is something the author touches on briefly but doesn’t explore fully, and something I feel is very important given the broadening use of GIScience in everyday life.
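For a sense of what “heuristic” means here, the sketch below implements the simple nearest-neighbour heuristic for the TSP: it always visits the closest unvisited stop next, which is fast but can land noticeably far from the true optimum (the stops and coordinates are made up for illustration):

```python
import math

def nearest_neighbour_tour(stops, start="depot"):
    """Greedy TSP heuristic: repeatedly hop to the closest unvisited stop."""
    unvisited = dict(stops)                 # name -> (x, y)
    here = unvisited.pop(start)
    tour = [start]
    while unvisited:
        name = min(unvisited, key=lambda n: math.dist(here, unvisited[n]))
        here = unvisited.pop(name)
        tour.append(name)
    return tour

stops = {"depot": (0, 0), "a": (1, 4), "b": (2, 1), "c": (6, 2)}
print(nearest_neighbour_tour(stops))  # ['depot', 'b', 'a', 'c'] -- plausible, not necessarily optimal
```

Production routing engines use far more sophisticated heuristics, but the basic trade-off (a quick, good-enough answer instead of a provably optimal one) is the same.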

On Goodchild & Li (2012) and Validation in VGI

Thursday, November 9th, 2017

I thought that this article, “Assuring the quality of volunteered geographic information”, was super interesting. The overview of the evolution of “uncertainty” in GIScience was interesting, and a welcome addition as a segue into the three approaches to quality assurance (crowd-sourcing, social, and geographic).

Exploring the social approach further, it stipulates that there will always be a hierarchy, even within a seemingly broad and open structure. Goodchild & Li discuss briefly that there is often a small number of users who input information and an even smaller number of people who verify that information, in addition to the large number of regular users.

For future additions to OSM or other crowd-sourced sites, it would be super interesting to show who is actually editing and adding, and to make that information easily available and present on the screen. Currently in OSM, one can see the usernames of the most recent editors of an area; with some more digging, one can find all the users that have edited in an area; and with even more digging, one can look at these editors’ bios or frequently mapped places and try to piece together information about them that way. I guess it would be more a question of privacy (especially in areas where open data isn’t really encouraged, or where there aren’t many editors other than bots, or both), but hopefully this sort of post-positivist change comes. I recently learned that most of OSM’s most active users and validators (worldwide) are white North American males between the ages of 18 and 40, which unfortunately is not unbelievable, and begs further questions about what information is being mapped and what is being left out. Some information isn’t mapped simply because the mappers are not interested in it (for example, what a 25-year-old guy would want to see on a map may not even overlap with what a 65-year-old woman would want to see on a map; this gets even more tangled when also considering gender, geographic, or ethnic/“race” dimensions). Showing this information, or at least making it less difficult to find and access without lots of time and adequate sleuthing skills, might compel layman users to be more interested in where exactly their information is coming from.

Thoughts on assuring the quality of VGI (Goodchild and Li, 2012)

Thursday, November 9th, 2017

I think that the most important thing to note from Goodchild and Li’s article on assuring the quality of VGI is that their proposed approaches are only applicable to VGI that is “objective and replicable.” That is, they are discussing VGI which attempts to capture the truth of a particular geographic phenomenon (such as contributions to OpenStreetMap), rather than VGI which reflects an individual’s particular experience of geographic space (such as a volunteered review of a tourist location). I don’t intend for this post to devolve into a discussion of the nature of scientific “truth” and “fact”, but it is definitely interesting to think about the extent to which any type of VGI (and any type of geographic fact, I suppose) can truly be objective. All volunteered information is subject to the bias of its contributor.

I would have liked for this article to also address the challenges in defining “accuracy” for VGI that is purely subjective rather than fact-based. When we are talking about things like a restaurant review on Yelp or a woman reporting the location of an incident of sexual assault, what does “accuracy” mean? A restaurant review might be inaccurate in the sense that it could be fabricated by a reviewer who never actually went there, but this is nearly impossible to identify. Perhaps it is the intent of the contributor that matters most in examples like this (i.e., does the reviewer have malicious intent against the particular restaurant?), but underlying intent is still incredibly opaque. Perhaps this is a topic for further class discussion…

On Ester et al (1997)’s Spatial Data Mining in Databases

Sunday, November 5th, 2017

In their article “Spatial Data Mining: A Database Approach” (1997), Ester et al. outline the possibility of knowledge discovery in databases (KDD) using spatial databases, presenting four classes of algorithms (spatial association, clustering, trends, and classification). Unfortunately, the algorithms are not entirely connected to how one actually mines spatial information from databases, and the algorithms introduced don’t seem incredibly groundbreaking 20 years later. This paper seemed very dated, particularly because most of these algorithms are now tools in ESRI’s ArcGIS and part of the frameworks behind GeoDa, and because the processing issues that plagued the researchers in the late 1990s are not issues (on the same scale) today.

Also, I found it strange that the paper adopts an incredibly positivist approach and does not mention how these tools could be applied in real life. The authors acknowledge this as a point of further research in the conclusion, but weight it less heavily than the importance of speeding up processing times on 1990s computers. In their introduction, the authors discuss their rationale for using nodes, edges, and quantified relationships drawn from Central Place Theory (CPT). However, they do not mention that CPT, and theorizing the world as nodes and edges more generally, is an incredibly detached idea that 1) cannot describe all places, 2) does not recognize that human behaviour is inherently messy and not easily predicted by mathematical models, and 3) only identifies trends and cannot be used to actually explain things, just to identify deviations from the mathematical model. Therefore, not everything can be identified by a relationship that a researcher specifies in order to scrape data using an inherently flawed model, and there will be inaccuracies. It will be interesting to learn if and how spatial data miners have adapted to this and (hopefully) humanized these processes since 1997.

Database approach to spatial data mining (Ester et al.)

Sunday, November 5th, 2017

Spatial data mining consists of the use of database information, manipulated through algorithms, to process spatial information as effectively as possible. It is able to use available information to infer other pieces of information through dependency between variables. Thus, it relates to aspects of spatial privacy: personal information voluntarily provided can be used to determine additional information about people (or, for the purposes of this paper, areas) that might otherwise not be divulged.

To be upfront, spatial data mining is a topic that I was rather intimidated to look into, since I have only a basic understanding of computer science and was confused by the majority of the more technical information presented in the paper. However, I thought the paper did a good job of conveying how the concepts are used and why they are applied; I understood the logic behind the algorithms and how information is mined. Effectively, I believe that the paper caters to a wide audience thanks to its combination of technical and conceptual information.

The article explicitly covers the basics of spatial data mining: the basic operations and concepts used in the area of study. This raises the question, “what are the complex and advanced methods of spatial data mining?” Since this paper was written in 1997, the field has probably made considerable advances since then, and new methods may be on the horizon. For the purposes of this article, however, the basics were very well introduced, allowing a range of readers to learn about the field of spatial data mining through knowledge discovery in databases.

Shekhar et al – Spatial Data Mining

Sunday, November 5th, 2017

This paper presented the primary tools with which to carry out data mining on a set of data. The tangible results found through data mining were not new to me; I believe this is something that many budding GIScientists engage with at the beginning of their education. I remember working with learning and training data in other classes, typically in the form of geolocating.
I found that the hidden patterns emerging from these analyses pose a very interesting insight into our epistemology of data sets. With learning and training data, it seems that we’re engaging with a very basic form of machine learning. I am intrigued by the opportunities this presents with more open forms of data. I can imagine that with more open data sources, machine learning approaches could learn from other data sets and gain more insight into hidden data. I wonder if our treatment of data and rights will come into discussion in the future; I’d be interested to know in what forums these conversations are taking place.
As a whole, all of these techniques seem to provide a very valuable toolkit. Extracting meaning from disparate forms of data, such as by clustering, determining outliers, and figuring out co-location rules, can be extremely insightful for many disciplines in the social and physical sciences. Taking a rudimentary psychological lens, I find it interesting how many of these techniques assume a behaviouralist understanding of spatial processes, in which entities interact in rational ways with each other as part of a greater whole. The fact that they take an interest in outliers seems to factor in the irrationality of some processes. I would also be interested in knowing where the research on that is headed.

Spatial Data Mining (Shekhar et al)

Saturday, November 4th, 2017

I found this paper particularly tough to get into, as spatial data mining veers more towards a tool used in GIS than any of the topics we have covered thus far, in my opinion. Although the tweaking of methods like SAR and MRF models to address the issues regular data mining runs into (i.e., ignoring spatial autocorrelation and spatial heterogeneity) is a sign of tool building, I still find this topic in GIScience to be very technical and definitely in the tool realm of GIS. Furthermore, many of the clustering techniques mentioned (e.g., K-means) have been around for years and have been accepted as the standard in most regular GIS projects, making me ask the question, “what makes spatial data mining so special?” Is it simply the size of the data being mined, and the unsupervised aspect of it? As this paper cites papers from 1999 and 2000 on spatial data mining’s ability to work with large amounts of data back then, I wonder how well spatial data mining works with big data, and how the validation process and statistical analysis would work today.
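Since K-means comes up as the “standard” clustering example, here is a bare-bones version run on a handful of made-up point coordinates; a real project would of course reach for an existing implementation rather than this sketch:

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain K-means on (x, y) points: assign each point to its nearest centroid, then update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two obvious groups of hypothetical point locations.
pts = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(pts, k=2)
print(centroids)  # roughly (1.5, 1.3) and (8.5, 8.3)
```

This is the kind of routine bundled into many GIS packages today, which is partly why the technique can feel more like a tool than a research frontier.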

Although this paper focuses on the uses of spatial data mining with raster datasets, I wondered whether using this technique on vector data possibly including personal information (e.g., age or phone number), tied to space in order to look for ‘hidden patterns’, would constitute a violation of privacy.

All in all, although this field seems quite complex, it also seems very simple in that it embodies all of the basic algorithms used in traditional GIS projects, though on a larger scale.

-MercatorGator


Thoughts on Shekhar et al. (2003)

Saturday, November 4th, 2017

Shekhar et al. (2003) outline various techniques in spatial data mining which can be used to extract patterns from spatial datasets. In discussing techniques for modeling spatial dependency, detecting spatial outliers, identifying spatial co-location, and determining spatial clustering, Shekhar et al. effectively demonstrate the relevant challenges and considerations when working with a spatial dataset. Due to factors such as spatial dependency and spatial heterogeneity, “general purpose” data mining techniques perform poorly on spatial datasets, and new algorithms must be considered (Shekhar et al., 2003).

Shekhar et al. define a spatial outlier as a “spatially referenced object whose non-spatial attribute values differ significantly from those of other spatially referenced objects in its spatial neighbourhood” (p. 8). I have not previously been exposed to research on spatial outliers, and I was surprised to read a definition in which an outlier is determined by its non-spatial attributes. I am left wondering whether it is possible to invert Shekhar’s definition and define spatial outliers in terms of differences in spatial attribute values among objects with consistent non-spatial attribute values. For example, when talking about the locations of bird nests, could we define a spatial outlier as a nest which is significantly far from a cluster of other nests?
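To make the definition concrete, here is a tiny sketch in the spirit of the neighbourhood-difference tests the authors describe: it compares each point’s non-spatial attribute with the average of its nearest spatial neighbours and flags large standardized differences. The points, neighbourhood size, and threshold are all invented for illustration, not taken from the paper:

```python
import math
import statistics

def spatial_outliers(points, k=4, threshold=2.0):
    """Flag points whose attribute differs strongly from the mean of their k nearest neighbours.

    points: list of (x, y, attribute_value) tuples.
    """
    diffs = []
    for i, (x, y, value) in enumerate(points):
        neighbours = sorted(
            (p for j, p in enumerate(points) if j != i),
            key=lambda p: math.hypot(p[0] - x, p[1] - y),
        )[:k]
        diffs.append(value - sum(p[2] for p in neighbours) / len(neighbours))
    mu, sigma = statistics.mean(diffs), statistics.stdev(diffs)
    return [points[i] for i, d in enumerate(diffs) if sigma and abs(d - mu) / sigma > threshold]

# Hypothetical attribute values (say, house prices) on a 3x3 grid; the centre is anomalous.
data = [(x, y, 900 if (x, y) == (1, 1) else 200 + x + y) for x in range(3) for y in range(3)]
print(spatial_outliers(data))  # only the (1, 1, 900) point is flagged
```

The inverted question in the post, a nest far from every other nest, is closer to what density-based clustering methods treat as “noise” points, so the two notions of outlier really are different.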

As this article broadly addresses knowledge discovery from spatial datasets, I was reminded of last week’s lecture on geovisualization. While the objective approach of spatial data mining contrasts with the exploratory geovisualization process, I am curious how the two approaches can effectively be combined to drive a more holistic process of knowledge discovery from spatial data.

Spatial Data Mining – Ester, Kriegel, Sander (1997)

Friday, November 3rd, 2017

Tobler’s first law of geography is central to spatial data mining. The purpose of knowledge discovery in databases (KDD) is to identify clusters of similar attributes and find links with the distribution of other attributes in the same areas. Using decision tree algorithms, spatial database systems and their associated neighborhood graphs can be classified, and rules can be derived from the results. The four generic tasks introduced at the beginning of the article are not addressed later on. Identifying deviation from an expected pattern is presented as central to KDD as well, but an algorithm for this does not appear to be discussed.
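As a rough illustration of the neighborhood-graph idea (my own toy example, not the authors’ implementation), the sketch below links every pair of objects that lie within a distance threshold of one another, producing the kind of structure that a decision tree or rule-extraction step could then work over:

```python
import math

def neighborhood_graph(objects, max_dist):
    """Link every pair of objects closer than max_dist; objects is {name: (x, y)}."""
    graph = {name: set() for name in objects}
    names = list(objects)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(objects[a], objects[b]) <= max_dist:
                graph[a].add(b)
                graph[b].add(a)
    return graph

# Hypothetical settlements with coordinates in km.
places = {"city": (0, 0), "suburb": (3, 1), "village": (20, 5), "hamlet": (21, 6)}
print(neighborhood_graph(places, max_dist=5.0))
# {'city': {'suburb'}, 'suburb': {'city'}, 'village': {'hamlet'}, 'hamlet': {'village'}}
```

Attributes of each object and of its neighbors (the essence of Tobler’s law) can then be fed into a classifier together.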

The article remains strictly concentrated on the implications of KDD algorithms on spatial database systems and computer systems. Little relation is made to non-spatial database systems, even though many of the algorithms presented are based on non-spatial decision-tree algorithms.

I’m sure that patterns can be detected in human attributes of nodes in a social network. Since distance along an edge is so crucial to spatial classification, do non-physical edges quantified in other ways perform similarly in the creation of human “neighborhoods”? When patterns are deviated from, can conclusions be drawn as easily about social networks?

“Neighborhood indices” are important sources of knowledge that can drastically reduce the time of a database query. Creating spatial indices requires some knowledge of a spatial hierarchy. Spatial hierarchies are clear-cut in political representations of geography, but as the article points out, the influence of centers (i.e., cities) is often not restricted to political demarcations. These algorithmically created neighborhood indices may present interesting results to urban planners and geographers, who often have difficulty delineating the extent of a city’s influence beyond its municipal borders.
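A very simple way to see why an index speeds up queries is a grid (bucket) index: points are binned into cells, and a window query only inspects the handful of cells the window overlaps instead of every row in the table. The sketch below is a toy version of that idea, not the paper’s neighborhood index:

```python
from collections import defaultdict

CELL = 10.0  # cell size, in the same units as the coordinates

def build_grid_index(points):
    """Bucket each (name, x, y) point by the grid cell it falls in."""
    index = defaultdict(list)
    for name, x, y in points:
        index[(int(x // CELL), int(y // CELL))].append((name, x, y))
    return index

def query_window(index, xmin, ymin, xmax, ymax):
    """Return points inside the window, touching only the cells the window overlaps."""
    hits = []
    for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
        for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
            for name, x, y in index.get((cx, cy), []):
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append(name)
    return hits

# Hypothetical points; the cell holding "c" is never scanned by the query below.
pts = [("a", 2, 3), ("b", 14, 8), ("c", 55, 61), ("d", 12, 11)]
index = build_grid_index(pts)
print(query_window(index, 0, 0, 15, 12))  # ['a', 'b', 'd']
```

A hierarchy enters when cells are nested at several sizes, which is roughly where the spatial-hierarchy knowledge mentioned above becomes necessary.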