Thoughts on “Approaches to Uncertainty in Spatial Data”

November 3rd, 2019

This chapter deals with a number of aspects of uncertainty in spatial data. The one that caught my eye in particular is definition: how well or poorly defined a geographical object is. Well-defined objects tend to be human-made (like census tracts), while poorly defined objects tend to be natural (like a patch of woodland). This raised a question for me that the chapter does not address: how should, or can, someone deal with objects that may be overlapping or related if one is well defined and the other is not? For example, would performing an intersect on two such objects be appropriate, considering the gap in the quality of definition? Does a large difference in definition make two objects incomparable? Maybe not, and perhaps that is why the chapter does not address this particular issue. Still, I would say there could be problems of incompatibility between a well-defined and a poorly defined object. If a well-defined census tract is overlaid on a poorly defined patch of woods, how well defined can the intersection between the two be? This feeds into other issues of uncertainty mentioned in the chapter, like vagueness and error, but fundamentally I would argue that such a notable difference in definition makes these objects incompatible. Perhaps one data type could be converted: the wood patch could be given “hard” borders under the assumption that these are clearly defined, even if they are not. Even so, this overlooks the central properties of the object and may not bridge the gap in the level of definition between the two.
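
To make the “hard borders” idea concrete, here is a minimal sketch of my own (not from the chapter), in which a hypothetical fuzzy membership grid for the woodland is thresholded into crisp cells before intersecting with the tract; it assumes the shapely and numpy packages. The point it illustrates is that the resulting intersection depends entirely on the arbitrary threshold chosen.

```python
# A minimal sketch (not from the chapter): intersecting a crisp census tract
# with a woodland whose boundary is only known as fuzzy membership values.
# The membership grid and the threshold values are hypothetical stand-ins.
import numpy as np
from shapely.geometry import box, Polygon
from shapely.ops import unary_union

# Crisp, well-defined object: a census tract with "hard" borders.
tract = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

# Poorly defined object: woodland membership (0..1) on a 20 x 20 grid of 1-unit cells.
rng = np.random.default_rng(0)
membership = rng.random((20, 20))          # stand-in for real fuzzy membership values

def crisp_woodland(alpha):
    """Give the woodland 'hard' borders by keeping cells with membership >= alpha."""
    cells = [box(col, row, col + 1, row + 1)
             for row in range(20) for col in range(20)
             if membership[row, col] >= alpha]
    return unary_union(cells)

# The 'intersect' result changes with the arbitrary threshold we pick.
for alpha in (0.3, 0.5, 0.7):
    woods = crisp_woodland(alpha)
    print(alpha, round(tract.intersection(woods).area, 1))
```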

Spatial Data Uncertainty (Devillers & Jeansoulin, 2006)

November 3rd, 2019

This chapter provides basic concepts of quality, definitions, sources of quality problems, and the distinction between “internal quality” and “external quality”.

The first question that comes up for me concerns crowdsourced geospatial data quality. When it comes to crowdsourced geographic data, it is very common to hear that the data are not good enough and that contributors cannot collect data of sufficient quality because, unlike trained researchers, they lack experience and expertise with geospatial data. We should therefore pay particular attention to the issues stemming from the quality of crowdsourced geospatial data. Note also that any crowdsourced dataset is biased in one or more ways: contributors differ in their judgment and decision making, and their decisions and preferences can significantly influence the data they produce. I am curious how we can identify and estimate biases in crowdsourced data.
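
One simple starting point, sketched below with entirely hypothetical contributors and coordinates, is to compare contributed locations against a trusted reference and summarize the offsets per contributor: a systematic mean offset suggests bias, while the spread around it suggests random error.

```python
# A minimal sketch (my own, not from the chapter) of estimating positional bias
# in crowdsourced points by comparing them to a trusted reference dataset.
# All coordinates and contributor IDs below are hypothetical.
import pandas as pd

crowd = pd.DataFrame({
    "contributor": ["a", "a", "b", "b", "c"],
    "x": [10.2, 11.1, 9.7, 10.4, 10.0],
    "y": [5.1, 4.8, 5.3, 5.0, 5.2],
})
reference = pd.DataFrame({"x": [10.0, 11.0, 10.0, 10.5, 10.0],
                          "y": [5.0, 5.0, 5.0, 5.0, 5.0]})

# Offset of each contributed point from its reference point (rows are assumed pre-matched).
offsets = crowd[["x", "y"]].values - reference[["x", "y"]].values
crowd["dx"], crowd["dy"] = offsets[:, 0], offsets[:, 1]

# Mean offset per contributor hints at systematic bias; the spread hints at random error.
print(crowd.groupby("contributor")[["dx", "dy"]].agg(["mean", "std"]))
```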

Furthermore, the authors mention that users can evaluate external quality based on internal quality. However, geographical resources (both data and applications) are nowadays mostly accessible via web services, and data producers do not always provide the internal quality of their data. In this situation, how can users evaluate the external quality of resources? Last, while internal and external quality measures apply to data that are factual in nature, how do we assess the quality of information about opinions or vague concepts?

Spatial data quality: Concepts

November 3rd, 2019

This chapter in the book “Fundamentals of Spatial Data Quality” introduces the basic concepts of spatial data quality, pointing out that spatial data quality issues often deal with the divergence between reality and its representation. Errors can arise at several points in the data production process, such as during data manipulation and during the human-driven data creation steps. Spatial data quality, the chapter argues, can be assessed from both internal and external perspectives. The chapter explains well what data quality is and what the errors could be, and it is very easy to understand.

It is interesting that the introduction starts with the quote, “All models are wrong, but some are useful”. Does this mean that all spatial data, or any data we create, can be interpreted as the product of a model or filter? The authors argue that a representation of reality may not be fully detailed and accurate but can still be partially useful. How to determine whether data with such uncertainty or errors should be accepted, however, is a much more urgent problem. Also, since the topic is “spatial data uncertainty” while the chapter discusses spatial data quality, does uncertainty here simply mean the different sources of error assessed in spatial data quality?

The chapter defines internal quality as the level of similarity between the data produced and perfect data, while external quality means the level of concordance between the data product and user needs. My thought is: if users participate in the data production process (which concerns internal quality), will external quality be improved efficiently and effectively? Could we just replace “as requested by the manager” with “what the user wanted” in Figure 2.4, so that there would be no external quality worries?

Thoughts on “Scaling Behavior of Human Mobility Distributions”

November 3rd, 2019

This paper presents an empirical study of how temporal and spatial scale impact the distribution of mobility data. The main finding is not surprising: a different spatial and temporal scale of analysis leads to a different distribution of the data. Once again we see the importance of scale in the analysis of spatial datasets.

What interests me are findings 3 and 5. Finding 3 states that the ordering between metrics over datasets is generally preserved under resampling, which implies that comparisons across datasets can be made regardless of the spatial and temporal resolution. This reminds me of the reading on spatial data quality: though it is important to be critical about the effects of scale, it is also important to bear the “use” in mind. In the case of comparing human mobility across different datasets, the scale does not seem to matter anymore.

Finding 5 concludes that the sensitivity to resampling can itself be a metric. I think this is a good point, but I had some difficulty grasping what the authors want to express in the subsequent argument that the “difference in sensitivity indicates that information about population mobility is encoded in the scaling behavior”. I think they could have explained this better. To my understanding, the difference in sensitivity to resampling is nothing more than a difference in the heterogeneity of the datasets.

Another point I want to make is that although the analysis is performed on mobility datasets, it seems to me that most of the conclusions can be generalized to all kinds of datasets. I am not sure what is special about mobility data in their analysis.
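
To make the resampling idea concrete for myself, here is a toy sketch (my own, not the authors' code) that downsamples a simulated GPS trajectory in time and recomputes a simple displacement metric; the point is just that the distribution shifts as the sampling period coarsens.

```python
# A toy sketch (not the authors' code) of how temporal resampling changes a
# mobility distribution: simulate one GPS trajectory, resample it at coarser
# intervals, and compare the resulting displacement distributions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
times = pd.date_range("2019-11-03", periods=24 * 60, freq="min")   # one day at 1-minute resolution
steps = rng.normal(scale=10.0, size=(len(times), 2))                # random-walk steps in metres
traj = pd.DataFrame(np.cumsum(steps, axis=0), index=times, columns=["x", "y"])

def displacement_stats(df):
    """Median and 95th-percentile step length between consecutive fixes."""
    d = np.hypot(df["x"].diff(), df["y"].diff()).dropna()
    return d.median(), d.quantile(0.95)

# The same trajectory gives a different distribution at each sampling period.
for period in ("1min", "10min", "60min"):
    resampled = traj.resample(period).first()
    print(period, displacement_stats(resampled))
```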

Thoughts on “Miller et al. – Towards an integrated science of movement”

November 3rd, 2019

“Towards an integrated science of movement” by Miller et al. lays out the advances that have been made in the understanding of mobility and movement as a whole, given the growth of location-aware technologies that have made data acquisition much more accessible. The authors are interested in synergizing the components of animal movement ecology and human mobility science to promote an integrated science of movement.

With regard to mobile entities, defined as “individually identifiable things that can change their location frequently with respect to time”, are there specific definitions that make clear what “frequently in time” means? Examples are given with birds or humans, but would trees or continental masses be considered mobile entities as well?

It would be interesting to assess the impact of tracking on the observations themselves, in other words whether tracking can affect the decisions made by whoever or whatever is being tracked. For example, a human who knows they are being tracked might change their trajectory solely because they do not want to compromise sensitive areas or locations they visit, while an animal could behave differently if the technology used to track its movement makes it more visible to predators. There is an ethical dilemma in tracking a human being without their consent, but it must also be acknowledged that tracking comes with consequences in terms of results differing from reality.

Reflecting on “Scaling Behavior of Human Mobility Distributions”

November 3rd, 2019

Analyzing big data is a challenge across GIS, and movement data are no exception. Cutting out potentially unnecessary components of the data in order to reduce the dataset is one way of addressing this challenge. In their piece, Paul et al. look at how much cutting down on a dataset's time window may affect the resulting distribution.

Specifically, they examine the effects of changing the spatio-temporal scale of five different movement datasets, revealing which metrics are best suited to comparing human movement behaviour across datasets. The findings of the study, which examines GPS data from undergraduate students, graduate students, schoolchildren, and working people, reveal that changing the temporal sampling period does affect the distributions across datasets, but that the extent of this change depends on the dataset.

After reading this piece, I would like to understand more about how researchers studying movement address privacy. Having enormous datasets of anonymized data surely addresses part of this issue; however, the different government agencies, organizations, corporations, etc. collecting this data likely have different standards regarding the importance of privacy. How strictly enforced are data privacy laws, looking at movement data specifically?

Thoughts on “Fisher et al. – Approaches to Uncertainty in Spatial Data”

November 3rd, 2019

This article by Fisher et al. clearly lays out the components and concepts that make up spatial data uncertainty and explains solutions that have been proposed to counteract their potential consequences for data analysis and interpretation. A better understanding of what uncertainty really is helped me realize that an overwhelming majority of geographical concepts are poorly defined objects, being either vague or ambiguous.

One solution for reducing the effects of discord ambiguity, perhaps not realistic but certainly practical, would be to create a global lexicon that stipulates how certain statistics must be calculated and defines concepts on a global scale. This would allow easier comparisons between regions that currently use different approaches and would standardize the process. However, it is important to note that this could not be applied to every statistical measurement, definition, or observation, given that there could be biases against certain regions. An example would be that a road is conceptualized differently in one part of the world than in another.

On the topic of data quality, the advent of geolocational technologies has propelled geospatial data to the forefront for organizations and businesses aiming to profit from its use. Without trying to be too cynical, wouldn't private organizations have an incentive to manipulate data quality to the detriment of others in order to benefit themselves? This is where Volunteered Geographic Information (VGI), with OpenStreetMap as an example, comes into play to level the playing field, the counterweight in this case being Google Maps.

Thoughts on “Spatial Data Quality”

November 3rd, 2019

The authors did a good job summarizing the concepts related to spatial data quality in terms of definitions and the types and sources of error. Although I do not completely agree with the opening statement that “geospatial data are a model of reality”, I do agree that all geospatial data are imprecise, inaccurate, out of date, and incomplete at different levels. The question for researchers is to what degree such impreciseness, inaccuracy, outdatedness, and incompleteness should be accepted or rejected, and how we assess data quality.

The authors present the concepts of internal and external quality, where internal quality refers to the similarity between the data produced and the perfect data that should have been produced, and external quality refers to “fitness for use” or “fitness for purpose”. I would argue that external quality should be the metric to look at. However, as the authors state, there are very few evaluation methods for external quality. I think this is because of the non-absolute and relative nature of external quality: a case-by-case assessment seems to be needed depending on what the “use” is. I am curious to know whether there is a generalized way of doing this. Moreover, with geospatial data coming from different sources such as VGI, crowdsourcing, and sensors, the uncertainties are intensified even as these sources provide more opportunities “for use”. I think coming up with ways to assess external quality is of vital importance.

Thoughts on “Spatial Data Quality: Concepts”

November 2nd, 2019

This chapter begins with the quote “All models are wrong but some are useful”, which I believe sums up the article fairly succinctly, as it addresses the inherently imprecise, inaccurate, incomplete, and outdated nature of GIS data. This reminds me of our discussion of non-ideal data in last week's Openshaw (1992) paper; however, this piece explains it in much more detail than Openshaw, relating it back to the difference between external and internal data quality and to data representation challenges.

Since the rise in popularity of the internet and user-generated content, there has been much more concern about assessing data quality and accuracy. I have been conducting a bit of research on VGI, as that is my research topic, and data accountability and accuracy are huge concerns in that field. Much like the differing definitions of quality given here, there is no single correct way to assess accuracy. It all depends on the type of data being collected and researched, and the motives for collecting it. For instance, if a project collects user-generated data about users' perceptions of a place, then positional accuracy matters little, whereas in OpenStreetMap, for example, there is a team of moderators carefully watching and reviewing users' inputs, because accuracy is a top priority. Thus, I think the motives for the research, specifically whether the researcher is looking for more accurate data, more precise data, or both, are a very important component to address when examining spatial data quality.

This topic also reminds me of our discussion of open government data and how the data are often not consistent across departments; the formatting, the original scale, and so on do not usually match, which compromises the quality of the end result. I worked on a GIS project last semester analyzing water quality levels and ran into quite a few hiccups when I realized there were many months and years missing from the datasets I was trying to analyze. In hindsight, I should have examined the quality of the data I was planning to use more carefully before starting my research; even a quick completeness check, like the sketch below, would have caught the gaps.
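
Something as simple as the following, a minimal sketch with hypothetical file and column names, is enough to list the months with no observations before any analysis starts.

```python
# A minimal sketch (hypothetical file and column names) of checking temporal
# completeness before analysis: list the months with no observations at all.
import pandas as pd

water = pd.read_csv("water_quality.csv", parse_dates=["sample_date"])

observed = set(water["sample_date"].dt.to_period("M"))
expected = pd.period_range(water["sample_date"].min(),
                           water["sample_date"].max(), freq="M")

missing = [m for m in expected if m not in observed]
print(f"{len(missing)} missing months, e.g. {missing[:5]}")
```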

Overall, I think this chapter does a good job of explaining the complexity of spatial data quality and the errors inherent to geospatial research.

Thoughts on Research Challenges in Geovisualization

October 28th, 2019

This article gives us a detailed introduction to geovisualization. The authors start by giving reasons why we should care about geovisualization. In short, I think the explanation is that people can gain knowledge from it: geospatially referenced data are transformed into information, and that information is turned into knowledge through analysis. One example is the shift from raw data to paper maps, and then from paper maps to web-based maps. Visualization is advancing over time, so researchers can get more out of a geospatial dataset; how much we can get from a dataset largely depends on the visualization techniques.

The authors then introduce some issues that still remain in the geovisualization field: representation, visualization-computation integration, interface design, and cognition-usability. One thing I noticed about representation is that, to take full advantage of the information in a geospatial dataset, we want to personalize the representation as much as possible. One example I can think of is Google Maps. When I try to find a particular store in a big shopping mall, I often find it hard to locate the right floor and direction, since there is sometimes no detailed information about this represented on the map. Other map applications, however, provide 3D navigation inside shopping malls, so users find it very easy to locate a particular store. Clearly, the latter applications get more out of the mall data and make the navigation process more personalized.

Thoughts on GeoAI (Openshaw, 1992)

October 28th, 2019

This article essentially introduces the emergence of GeoAI. The author gives detailed reasons why we should use GeoAI, and he also briefly reviews expert-systems approaches, the use of heuristic search procedures, and the utility of neurocomputing-based tools. Finally, he predicts future trends for GeoAI as an emerging technology.

GeoAI is actually a new topic to me, since I have never done a project using this technique. I think it would be very useful and convenient when we are facing a huge dataset and trying to analyse or model it. As far as I understand it, people essentially transfer their reasoning to the computer and let the computer decide and calculate the result. There would then be a point where GeoAI connects to spatial data uncertainty. Is it possible to train on the data and let the model decide the level of uncertainty in a dataset? Or is there any way to eliminate or reduce some of the uncertainty in a dataset?

Another aspect of the uncertainty problem in GeoAI is human supervision. What I take from the article is that people can supervise the computer while it does analytical work using the algorithms researchers put in. Would this supervision process bring more uncertainty into the dataset, or would it help to reduce the error? These are the thoughts that came to me while reading the article.

Suggestions concerning development of AI in GIS

October 28th, 2019

This paper, written by Stan Openshaw in 1992, introduces the concept of applying artificial intelligence in GIS by explaining how AI emerged, how it is being applied in geographic information system development, and why AI is needed and matters a great deal in spatial modelling and analysis. AI brings a lot to GIS development in terms of large spatial database management, spatial data pattern recognition, and improved spatial statistical analysis. Neurocomputing makes the analysis and modelling of large datasets possible: both supervised and unsupervised classification reduce the difficulties and uncertainty of manual analysis and computation in pattern studies (a toy example of unsupervised classification is sketched below). AI is also unavoidable when applying spatial data mining to large spatial datasets. The paper has a clear structure and explains the complicated concept of AI application in GIS well, with a strong background on how and why AI should be used in GIS, though I do not fully understand specific methods like expert systems and ANNs.
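
As a concrete, if toy, illustration of the unsupervised side, the sketch below clusters the pixels of a simulated multi-band raster with k-means; the array, the class count, and the use of scikit-learn are my own assumptions, not anything from Openshaw's paper.

```python
# A toy sketch of unsupervised classification: cluster the pixels of a
# simulated 3-band raster into land-cover-like classes with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
bands = rng.random((3, 100, 100))            # stand-in for a real 3-band image

pixels = bands.reshape(3, -1).T              # one row per pixel, one column per band
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(pixels)
classified = labels.reshape(100, 100)        # back to image shape

print(np.bincount(labels))                   # pixel count per class
```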

As we all know, spatial data in GIS are quite different from general data, with characteristics such as spatial dependence, space-time dependence, and non-stationarity, and AI technologies offer more ways to deal with these complex properties. However, I wonder whether these characteristics of spatial datasets, and the special ways of treating them with AI, help develop AI technologies themselves (method structure, algorithm development). Will GIS bring opportunities and development for AI? What has GIS brought to AI?

Research Challenges in Geo-visualization

October 28th, 2019

This article gives us an overview of the importance of research on geovisualization, discusses major themes and related issues, raises the main discussion of current challenges that have emerged in geovisualization, summarizes crosscutting research challenges, and ends with recommended actions for these emerging challenges. Generally speaking, the paper goes through most of the research challenges for geovisualization from various aspects, listing them one by one for each theme, but the structure still confuses me a little. Many claims, such as that visualization methods should be developed for better data mining, or that new tools and methods should be improved alongside increasingly powerful representation technologies, are not clearly explained, and some points overlap across the different challenges. Why representation, integration of visualization and computation, interface design, and cognition-usability are the four major themes, and what makes them distinct from yet related to each other, is also not well explained. Moreover, the challenges this paper raises about geovisualization are not limited to visualization technologies; they could equally be challenges for many other concepts in GIS, touching on data formats, data volume, AI applications, human-centered issues, and so on, which should also be discussed for topics like the development of spatial statistical analysis methods.

I am wondering how to balance information accuracy (value) against interface friendliness. Also, is geovisualization always the final step of data analysis, making results more understandable for further use? Will geovisualization technologies become more important for dealing with the data and information itself, or will they stay focused on representing results better?

Research Challenges in Geovisualization (MacEachren & Kraak, 2001)

October 27th, 2019

In this paper, MacEachren and Kraak (2001) summarize the research challenges in geovisualization. The first thing that catches my eye is the cartography cube, which defines visualization in terms of map use. The authors argue that visualization is not the same thing as cartography: visualization, like communication, is not just about making maps but also about using them.

While the authors highlight the importance of the scale issue, integrating heterogeneous data also presents a challenge for geovisualization because of the different categorization schemes and complex semantics applied in data creation. Similar conditions or entities are often represented with different attributes or measured with varying measurement systems. This heterogeneity raises questions when we use data from different producers: How do we assess heterogeneity? How do we decide whether data may be combined? How do we integrate multiple datasets if the same semantics are used differently?

Further, the emergence of geospatial big data, such as millions of conversations via location-enabled social media, stretches the limits of what and how we map. Using geospatial big data as a source for geovisualization will require developing appropriate methodologies. And while this paper mainly discusses the geovisualization of quantitative data, I am also curious about how to visualize qualitative spatial data.

Thoughts on “Koua et al. – Evaluating the usability of visualization methods in an exploratory geovisualization environment”

October 27th, 2019

This article by Koua et al. argues that the choices made and the techniques used when designing a geovisualization are crucial for conveying all the necessary information to the interpreter. Depending on the objective, certain visualizations were more effective at conveying the necessary information and were more usable than others, something that was tested with scientists in the field.

An interesting addition to the research would have been to test the geovisualizations with non-scientists, given that such visualizations are becoming increasingly present in interactive newspaper articles and on websites in general: what is easily conveyed to scientists may not be as easily conveyed to the general public. This research reinforces the notion that these visualizations are only used by professionals in the field, which is no longer the case. In an era where misinformation is rampant on social media and online, understanding how certain geovisualizations are interpreted by the general public could certainly help in designing more intuitive geovisualization techniques.

Technological advancements in the coming years will potentially open the door for new visualization techniques, which, for example, could make use of augmented reality and other emerging technologies. This could make it easier to visually represent certain situations and aid in the transfer of information.

Thoughts on VoPham et al. “Emerging trends in geospatial artificial intelligence (geoAI)”

October 27th, 2019

The article by VoPham et al., “Emerging trends in geospatial artificial intelligence (geoAI): Potential applications for environmental epidemiology”, provides us with a general understanding of what geoAI is and how it is utilized.

The interdisciplinary nature of geoAI is highlighted not only by the scientific fields that develop and utilize it, but also by its wide spectrum of applications “to address real-world problems”, varying from predictive modeling of traffic to environmental exposure modeling. A focus on machine learning, data mining, big data, and volunteered geographic information has helped geoAI expand. The main topic of this paper, however, is how this scientific discipline can be applied to advance environmental epidemiology.

I find the future possibilities and applications of geoAI particularly exciting. As explained in the article, progress in geoAI has allowed for more accurate, high-resolution data, which has the potential to revolutionize the use of remote sensing. As with most of the evolving GIScience technologies, we have yet to uncover its full potential and applications.

Thoughts on Koua et al. “Evaluating the usability of visualization methods in an exploratory geovisualization environment”

October 27th, 2019

The article “Evaluating the usability of visualization methods in an exploratory geovisualization environment” by Koua et al. reports their findings regarding visualization methods and geovisualization. The study aimed to evaluate how the use of different visualization tools affects the usability and understanding of geospatial data.

I found the results of the study quite interesting: out of six different ways of visualizing the same data, the map was found to be the best tool for tasks such as locating, ranking, and distinguishing attributes, while the self-organizing map (SOM) component plane was better for the visual analysis of relationships and patterns in the data. This raises a question in my mind about the type of users interacting with the product.

In the study, the participants were 20 individuals with a background in GIS and data analysis. This means they had experience with GIS tools and their own preferred tools for analysis; they knew what to expect and (generally) how to use the tools. I wonder how the results would change if the participants varied more in their GIS background. How would someone with no particular experience with GIS tools interact with and understand the same data? I find this particularly interesting because when creating a Geoweb product for public use that supports analysis, the user's interaction with and understanding of the product is crucial.
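
To get a feel for what a SOM component plane actually shows, here is a minimal sketch; it assumes the third-party minisom package and uses random stand-in attributes rather than the study's data.

```python
# A minimal sketch of a self-organizing map and its component planes,
# assuming the third-party `minisom` package; the attribute data are random
# stand-ins, not the data from the study.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
data = rng.random((500, 4))                  # 500 areas, 4 attributes

som = MiniSom(8, 8, input_len=4, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(data, 1000)

weights = som.get_weights()                  # shape (8, 8, 4)
# Each slice weights[:, :, i] is the component plane for attribute i: a small
# grid showing how that attribute varies across the map of neurons.
for i in range(4):
    print(f"attribute {i}: plane mean = {weights[:, :, i].mean():.2f}")
```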

Reflection on “Research Challenges in Geovisualization”

October 27th, 2019

This piece gives a very thorough background on geovisualization and its problems, especially the problems it faces across disciplines.

A part of the piece that caught my attention was when MacEachren and Kraak say that “Cartographers cannot address the problem alone.” Through all the papers we have read in this class, there is a recurring theme that there needs to be more cross-disciplinary communication in GIS to solve crosscutting problems. This article is better than other articles that just mention that more communication needs to happen; it actually lists ways to do cross-disciplinary research better, in addition to listing short-, medium-, and long-term goals.

That said, I also feel that this article was written in a very generalized and vague way, which made it a bit difficult to follow. This also gives its recommendations less clout, because it is always easier to propose vague solutions than specific ones. Some specific GIS examples would have been very helpful!

The potential of AI methods in GIS (Openshaw, 1992)

October 27th, 2019

In this older paper, Openshaw (1992) calls attention to the potential of artificial intelligence (AI) methods for spatial modeling and analysis in GIS. He argues that a GIS with a low level of intelligence has little chance of providing efficient solutions to spatial decision-making problems, and that applying AI principles and techniques may provide opportunities to meet the challenges of developing intelligent GIS. One thing that draws my attention is the author's point that it is important to “discover how best to model and analyse what are essentially non-ideal data”. But I did not see a definition or explanation of non-ideal data in this paper. Does non-ideal data refer to less structured data or to unreliable data? AI can use less structured data such as raster data, video, voice, and text to generate insights and predictions. However, every AI system needs reliable and diverse data to learn from: very similar data can lead to overfitting the model, with no new insights.

Further, Openshaw demonstrates the usefulness of artificial neural networks (ANNs) in modeling spatial interaction and classifying spatial data, but he does not mention how to transfer data from the GIS to the ANN and back. The most widely used ANNs require data in raster form, whereas the spatial data used to produce an interpretive result in GIS is most efficiently managed in vector form. I am therefore wondering whether there is an efficient methodology for transferring information between the GIS and the ANN.
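
On the vector-to-raster question, one common bridge, sketched below as my own rough example rather than anything from Openshaw's paper, is to rasterize the vector features onto a grid before feeding them to the network; it assumes the rasterio and shapely packages and a hypothetical parcel layer.

```python
# A rough sketch (not from Openshaw's paper) of moving vector GIS data onto a
# grid that a raster-oriented ANN can consume, assuming rasterio and shapely.
import numpy as np
from rasterio import features
from rasterio.transform import from_origin
from shapely.geometry import Polygon

# Hypothetical vector features with a numeric attribute (e.g. a land-use code).
parcels = [(Polygon([(0, 0), (40, 0), (40, 30), (0, 30)]), 1),
           (Polygon([(40, 30), (100, 30), (100, 100), (40, 100)]), 2)]

# Burn the features into a 100 x 100 grid of 1-unit cells.
transform = from_origin(0, 100, 1, 1)        # upper-left x, upper-left y, cell width, cell height
grid = features.rasterize(parcels, out_shape=(100, 100),
                          transform=transform, fill=0, dtype="int32")

# `grid` is now an ordinary NumPy array that can be flattened into ANN inputs;
# per-cell ANN outputs can be written back to a raster the same way.
print(grid.shape, np.unique(grid))
```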

As of now, GIScience is not new to AI. For example, the best-known application of AI is probably image classification, as implemented in many commercial and open tools, and many classification algorithms, from clustering to neural networks, have been introduced. Recent increases in computing power have also made it possible for AI systems to deal efficiently with large amounts of input data. I am looking forward to learning more about the current uses of AI in GIS.

Thoughts on “VoPham et al. – Emerging trends in geospatial artificial intelligence (geoAI)”

October 27th, 2019

In “Emerging trends in geospatial artificial intelligence (geoAI)”, VoPham et al. explain the emergence of geoAI as a new research field combining concepts, methods, and innovations from various fields, such as spatial science, artificial intelligence (AI), data mining, and high-performance computing, and give examples of recent applications to real-life situations. The fusion of AI and GIS helps us obtain more accurate representations than traditional methods, given the ability to make use of spatial big data.

As mentioned in the article, geoAI has the ability to revolutionize remote sensing, with the potential to recognize earth features more accurately. Slight differences in the spectral response of a pixel can be detected by an algorithm trained to pick up these ever-so-small differences, which could, for example, help detect and respond to forest fires more rapidly. A research project I worked on last year aimed to assess the extent of the Fort McMurray forest fire of 2016, and although the results were extremely similar to those obtained by official government sources, the use of geoAI could have overcome the limitations of the NDVI and NBRI indices we used.
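
For context, the indices mentioned are simple band ratios, and the sketch below (with simulated reflectance arrays standing in for real bands, and an arbitrary threshold) shows the kind of rule-based burned-area mapping a trained geoAI model could improve on.

```python
# A minimal sketch of the NDVI and NBR band ratios mentioned above, using
# simulated reflectance arrays in place of real Landsat bands; the burned-area
# threshold is an arbitrary stand-in.
import numpy as np

rng = np.random.default_rng(0)
red, nir, swir = (rng.random((100, 100)) for _ in range(3))

ndvi = (nir - red) / (nir + red + 1e-9)      # vegetation greenness
nbr = (nir - swir) / (nir + swir + 1e-9)     # burn ratio (dNBR = pre-fire NBR - post-fire NBR)

burned = nbr < -0.1                          # crude threshold a trained model could replace
print(f"flagged {burned.mean():.1%} of pixels as burned")
```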

As with any emerging scientific field, it will be interesting to see how and to what geoAI will be applied next. One example is spatial agent-based modelling (ABM), which aims to simulate the actions of specifically defined agents in space and could benefit greatly from geoAI and input from spatial big data. Geographical ontologies could also be redefined by deep learning, which might conceptualize things differently from the way we currently do.