Archive for October, 2019

On Optimal Routes in GIS and Emergency Planning Applications (Dunn & Newton, 1992)

Sunday, October 20th, 2019

Dunn & Newton offer a pretty decent summary of the most basic form of optimal/shortest path algorithms, and what they say still holds true (at least to my understanding …) some 27 years later. They describe how Dijkstra’s and the “out-of-kilter” algorithms determine routes, the different uses for each of them in emergency planning, the computational requirements, and finally some potential developments in the topic. Although a verbal description of the contents of a matrix isn’t the most intuitive or interesting to me, I found the article easy enough to follow, and I’m glad they included the few visuals they did.
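
For anyone who, like me, only knew Dijkstra’s algorithm by name, here is a minimal sketch of the idea the authors describe verbally: repeatedly settle the cheapest unreached node and relax its outgoing edges. The toy network and costs below are my own invention for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path costs from source to every reachable node.
    graph: dict mapping node -> list of (neighbour, edge_cost) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, node already settled more cheaply
        for neighbour, cost in graph.get(node, []):
            new_d = d + cost
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                heapq.heappush(heap, (new_d, neighbour))
    return dist

# toy road network; the weights could just as well be travel times
roads = {"A": [("B", 4), ("C", 2)],
         "B": [("D", 5)],
         "C": [("B", 1), ("D", 8)],
         "D": []}
print(dijkstra(roads, "A"))  # D is reached at total cost 8, via A -> C -> B -> D
```

On a real road network the edge costs would of course come from the GIS (distances, travel times) rather than being typed by hand.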

It is interesting to see what they list as future development opportunities, as most of these things are now incorporated into every current mapping site, public transit app, and dashboard GPS. Even their “most difficult challenge”, temporal change, has been met, and I can map myself to Saskatoon in a few seconds without having to worry about getting stuck in heavy traffic or road closures. It is honestly very difficult to think of something else that I would want to incorporate into shortest path algorithms or an application like Google Maps. My first two thoughts are pedestrian traffic and incorporating greater local knowledge for more complex decisions. I feel like pedestrian traffic could be handled quite easily through traffic projection, just like street traffic, and I’m sure someone has at least tried to do it. On local knowledge, I mean being able to ask something like “find me the most scenic drive”, “avoid routes where there are frequent accidents”, or “only take me down streets with ample night lighting”. Some of these could lead to problematic discussions surrounding the ideas of aesthetics (who decides?) and safety (for whom?).

Reflection on Spatial Data Mining – A Brief Review (Perumal et al., 2015)

Sunday, October 20th, 2019

Before reading this paper, I did not know anything about spatial data mining. This article does a good job of introducing GIS and then explaining how spatial data mining relates to it, but it was lacking in its explanations of challenges within the field itself.

Concerning how the article was written, I wish they had expanded on the issue of big data. The issues and challenges section seemed rather short for the scope of challenges which arise with spatial data mining. Additionally, they did not explain their diagram (Fig. 4), which was their big solution to big data issues, at all. More examples in general would also have been helpful.

Concerning the topic of spatial data mining itself, a few topics stood out to me. One is ontologies and how there are no clear-cut definitions for computers to understand and follow across all spatial data mining projects. It reminded me of the lecture we had on ontologies a few weeks back, and how tricky it was to pinpoint specific definitions for such general topics. This issue doesn’t mean that spatial data mining is inherently flawed; it just recognizes that the human perspective is still important. It shows that there’s a limit to what we can program computers to do for us.

The other topic that stood out to me was the sheer scope of this process. These algorithms must have so much data running through them, and there are so many factors in each of the techniques that the computer must understand and weigh. This leaves a lot of potential for error, and I wonder how spatial data mining researchers minimize it.

Thoughts on Spatial Data Mining Approaches for GIS – A Brief Review

Sunday, October 20th, 2019

This review article outlines the challenges in the use of geospatial data and the challenges in spatial data mining, and it summarizes the tasks and tools in spatial data mining. It also proposes an architecture to address the challenge of huge data volumes.

This article did a good job of summarizing the tasks and tools in spatial data mining, enough for me to gain a basic understanding of the topic. However, I was a bit confused by this article, mainly because it sometimes uses several terms interchangeably, such as GIS, GIS data, spatial data mining, and big data, and I had a hard time grasping the main idea the authors want to explain. It seems to argue that the challenge of spatial data mining can be viewed as merely the challenge of big data volume, and that this challenge can be solved with a “big data” approach to integrating the data. An important dimension is missing: big data is not just about volume, but also about velocity. Some spatial data, such as social media data with spatial attributes, are generated continuously and in a variety of forms. Given this, the proposed framework doesn’t seem very useful to me because it doesn’t address the velocity challenge. Even apart from this, the proposed framework is also not very well explained. In summary, I don’t really like this article.

Thoughts on “Network Analysis in Geographic Information Science: Review, Assessment, and Projections”

Sunday, October 20th, 2019

In this article, Curtin gives a detailed introduction to how network GIS has evolved over time, and to its potential challenges and opportunities. He also mentions that GIS-related network analysis can be applied in many disciplines, such as medicine, psychology, urban planning, and computer science.

This reminds me of a website that provides people with walkability scores for tens of thousands of locations. The walkability score is calculated based on the walking time to nearby bus stops, subway stations, restaurants, grocery stores, parks, and so on. The score is higher if one can walk to those locations in just a few minutes. The website aims at providing walkability information to people who want to find an apartment in a great location. The calculation of the walkability score is largely based on network analysis, approximating the average walk time from one place to another and integrating those results.
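
To make that concrete, here is a rough sketch of how such a score might be assembled from network shortest-path results. The walking speed, the 15-minute cutoff, and the linear decay are my own assumptions for illustration; they are not the website’s actual formula.

```python
# Assumed constants for illustration only.
WALK_SPEED_M_PER_MIN = 80   # rough pedestrian walking speed
CUTOFF_MIN = 15             # beyond this walk time, an amenity earns no credit

def walk_minutes(network_dist_m):
    """Convert a network distance (metres) into an approximate walk time."""
    return network_dist_m / WALK_SPEED_M_PER_MIN

def walkability(dists_to_nearest_amenity_m):
    """dists_to_nearest_amenity_m: network distance (m) to the nearest
    bus stop, grocery store, park, etc., one entry per amenity type."""
    score = 0.0
    for d in dists_to_nearest_amenity_m:
        t = walk_minutes(d)
        # full credit if reachable quickly, decaying to zero at the cutoff
        score += max(0.0, 1.0 - t / CUTOFF_MIN)
    return 100 * score / len(dists_to_nearest_amenity_m)

# e.g. a nearby stop, a mid-range grocery store, and a park that is too far
print(walkability([200, 650, 1300]))  # roughly 43 out of 100
```

The shortest-path distances fed into this would come from exactly the kind of network analysis Curtin describes.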

Another example I can think of is a project where I used network analysis to find a potential location for a new fire station that would help the existing stations cover more area and operate more efficiently. This was also calculated by finding optimal routes and approximating the travel time.

The accuracy of the first result doesn’t matter as much as the fire station result, since the response time of a fire station can directly influence people’s lives. So it seems that accuracy is especially important in network GIS when it comes to certain disciplines. One issue that strikes me is to what extent we can trust these results. Or, in other words, how can we measure their accuracy and credibility?

Thoughts on Optimal routes in GIS and emergency planning applications (Dunn & Newton, 1992)

Sunday, October 20th, 2019

The authors of this article explain, and use experiments to illustrate, how the shortest path algorithm (Dijkstra’s algorithm) and the more complicated “out-of-kilter” algorithm fit into the field of network analysis. Although the article was written in 1992, it still provides the basic knowledge for those who are not familiar with network analysis. It also reflects the most basic level of algorithm behind today’s more advanced network analysis algorithms.

One thing that caught my notice throughout the whole article is that, although the “out-of-kilter” algorithm the article mainly tested focuses on the shortest path, in realistic applications such as emergency evacuation I would argue that time efficiency is far more important than the distance between two points. As the authors discuss, however, more factors are indeed needed to perform a network analysis that considers time efficiency, such as peak hours, route capacity, means of transportation, slope, landscape, etc.
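
A tiny, invented example of the difference this makes, using networkx: the route that is shortest by distance is not the fastest once congestion-adjusted travel times are used as the edge weight.

```python
import networkx as nx

G = nx.DiGraph()
# hypothetical network: a short but congested route vs. a longer free-flowing bypass
G.add_edge("station", "junction", length=2.0, time=12)
G.add_edge("junction", "incident", length=2.0, time=12)
G.add_edge("station", "bypass", length=3.5, time=5)
G.add_edge("bypass", "incident", length=3.5, time=5)

print(nx.shortest_path(G, "station", "incident", weight="length"))  # via junction
print(nx.shortest_path(G, "station", "incident", weight="time"))    # via bypass
```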

Another important issue with this 1992 article about network analysis is the limitation of computing power and technological foresight. Due to the computing power limit of the time, only simple shortest paths between nodes could be computed, often in Euclidean distance, which in most cases hits obstacles in real-life practice because of multiple physical, social, and cultural barriers. The issue with technological foresight is that network analysis itself does not necessarily require geographic coordinates to perform; in other words, network analysis itself is not limited by a geographic coordinate system or projection. Nowadays, however, when we apply network analysis to more advanced uses, like GPS tracking, navigation, and even real-time traffic monitoring and dispatch, the geographic side of network analysis cannot be totally ignored, and is sometimes rather necessary.

Reflection on Network Analysis in GIScience (2007)

Sunday, October 20th, 2019

Curtin’s piece gives a solid introduction to network analysis and the problems that come with it; however, I wish some of the examples, such as the traveling salesman problem, had been explained a bit more in depth, instead of him mentioning it and then moving on.

A part of the article that I found intriguing was how Curtin listed all of the disciplines which use social network analysis, but also stated how there was not much, if any, communication between those disciplines:

“Research advances in this area could transform the sub-discipline of facility location science, but in order to do so the GIScience community must look beyond the current capabilities of existing software and recognize that related disciplines such as operations research and management science have a set of complementary techniques that can be readily applied to spatial problems.”

This made me wonder how much network analysis is taken and applied between disciplines. Is it common practice for a GIScientist who uses network analysis to look at other disciplines, such as medicinal or criminal examples of social network analysis? How would they know where and what to look for? Is there a better way for communication between disciplines? The article mentions how GIS research in spatial networks is helpful to other disciplines and I understand why that is true – but what about the other disciplines that have published research relevant to spatial networks? Or, in the years since this article’s publication, has there been more interdisciplinary research between all of these disciplines? Has open data made it more likely for researchers to collaborate or research outside of their own discipline?

Thoughts on “Spatial data mining and geographic knowledge discovery—An introduction”

Sunday, October 20th, 2019

In this paper, Guo and Mennis explore common spatial data-mining tasks and their development. They point out towards the end that “we often claim to ‘let the data speak for themselves,’” but “data cannot tell stories unless we formulate appropriate questions to ask and use appropriate methods to solicit the answers from the data” (Guo & Mennis 407). They go on to claim that “data mining is data-driven but also, more importantly, human-centered… the abundance of spatial data provides exciting opportunities for new research directions but also demands caution in using these data” (Guo & Mennis 407). These contentions get to the heart of why GIS is a science and not merely a tool; if it were, then the sorts of questions asked and how they are answered would not matter nearly as much, since as a tool GIS would be primarily concerned with input and output instead of the process and context involved.

These quotes also brought to mind the incredible potential of open-sourced data from social media sites, as well as their potential limitations. While such great quantities of data have been and continue to be quite useful in GIS research, it will always be important to understand the conclusions that can be taken from studies conducted with such data.

One example of this comes from my father, who’s a geography professor. He had a student who wanted to use data from a social media platform (I think it was Instagram or some similar image-based site) to map where a certain plant was distributed in a national park. She planned to do this by taking the locations of all pictures mentioning this plant to get a sense of where it predominantly grows in the park. When she had conducted her analysis, she discovered something incredible: the plants grow in straight, long, narrow lines within the park. However, my father immediately saw the flaw in her conclusion: these geotagged pictures were taken along paths, and that’s what she was seeing! Therefore, as spatial data become larger and more accessible, we must be increasingly cautious with how we use them and draw conclusions from them, just as Guo and Mennis point out.

Thoughts on “Optimal Routes in GIS and Emergency Planning Applications”

Sunday, October 20th, 2019

Reading this article is quite timely for me, as my final project in GEOG 506 will involve a bit of network analysis and I plan to use Dijkstra’s algorithm. It’s really interesting to read Dunn and Newton’s paper and the “out-of-kilter” algorithm they present, considering it is from 1992; you can tell from what they say that network analysis was nowhere near where it is today. For example, they discuss how “one of the most difficult concepts to incorporate realistically into a system… is that of temporal change… roadworks have daily, weekly, or monthly lifetimes, for instance, while congestion varies hourly” (Dunn & Newton 265). Today, there are programs like Google Maps and Waze that have, to varying degrees of success, taken on this issue of temporality in network analysis. Dunn and Newton pose an additional problem in network analysis that has been increasingly solved in modern studies: “it is perhaps pertinent to ask whether even highly sophisticated network algorithms may ever replace local knowledge and intuition, not least in emergency situations where judgement is of the essence” (Dunn & Newton 265). When I took GEOG 307 last semester, we read “Flaming to the Scene,” a paper in which the authors use location-allocation and network analyses to determine the best routes for firetrucks to take to a fire. They did this successfully, as their algorithm was implemented by the Regina fire department and decreased the average time trucks took to arrive at the scene of a fire. Therefore, it appears that many of the challenges in weighted network analyses brought up in this paper have been resolved.
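
One common way temporality gets folded in nowadays is to make each edge cost a function of the clock time at which the edge is entered. The sketch below is only a toy (the rush-hour multiplier and the network are invented), but it shows how little the basic Dijkstra machinery has to change.

```python
import heapq

def congested(base_min, hour):
    """Travel time on an edge, inflated during an invented rush hour."""
    return base_min * (2.0 if 7 <= hour < 9 or 16 <= hour < 18 else 1.0)

def time_dependent_dijkstra(graph, source, start_hour):
    """graph: node -> list of (neighbour, base_minutes); returns the earliest
    arrival time (minutes since midnight) at each reachable node."""
    best = {source: start_hour * 60.0}
    heap = [(best[source], source)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best.get(node, float("inf")):
            continue  # stale entry
        for nbr, base in graph.get(node, []):
            arrive = t + congested(base, (t / 60.0) % 24)
            if arrive < best.get(nbr, float("inf")):
                best[nbr] = arrive
                heapq.heappush(heap, (arrive, nbr))
    return best

roads = {"home": [("bridge", 10)], "bridge": [("campus", 15)], "campus": []}
print(time_dependent_dijkstra(roads, "home", 8))   # departing in rush hour
print(time_dependent_dijkstra(roads, "home", 11))  # departing mid-morning
```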

Effectively communicating spatial arguments

Sunday, October 20th, 2019

After reading through the 1992 article “Optimal Routes in GIS and Emergency Planning Applications”, I notice myself reflecting on how the information in the article is presented to me. Perhaps as a result of working on my upcoming Geovisualization seminar, I am seeing everything through a lens of how users interpret visual stimuli. After I noticed this, I followed my own processing of the article’s information as it was presented to me. The strange thing about many articles in the geographic field is that they often don’t include very good maps. Geography as a subject is inherently visual; after all, spatial data is all about how things relate to each other in space, something that we tend to perceive visually. It’s difficult to make sense of spatial concepts without visual aids to explain them.

In the case of this article, its grade in terms of its visual communicative abilities is worthy of debate. It would be essentially impossible to understand much of what is being talked about without Figures 1 and 2, which add much-needed context to the network analysis concept being explored. However, this is all that is communicated: purely the spatial data exploring the toolsets in question. The tools are framed as being used in an emergency management context, and the maps provided do little to add a place to the space. I would personally say that this detracts from the communicative abilities of the article, but I also understand the perspective that GIScience articles are intended to solely explore the tool/topic on its own merit rather than in the context where it is being explored. In reviewing more recent literature, I have seen writers err on both sides of this line, indicating that this is a debate that has not been settled since 1992.

Thoughts on “Spatial data mining and geographic knowledge discovery – An introduction”

Saturday, October 19th, 2019

In “Spatial data mining and geographic knowledge discovery – An introduction”, Mennis and Guo articulate the four main methods used in spatial data mining, namely spatial classification, spatial association rule mining, spatial clustering, and geovisualization, while also explaining the challenges linked with the spatial data mining process.

Although it is true that spatial data mining technologies have greatly evolved over the last few decades, the law is always trailing technological advances, which may allow unethical uses that could compromise the privacy of certain service users, especially in the private sector. While the methods presented in this article seem to be appropriate for many different cases, it could be raised that a partitioning spatial clustering method, which is non-overlapping, might assign a data item to cluster ‘x‘ even though it could equally have been assigned to cluster ‘y‘, something that could change from one iteration to another.
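
As a small toy illustration of that point (my own example, not from the article), a partitioning method like k-means can leave a borderline observation sitting almost exactly between two centroids, so a different initialization, or a slightly different dataset, could flip which cluster it lands in:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# two overlapping point clouds, plus one point placed right between them
pts = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
                 rng.normal([3, 0], 1.0, (50, 2)),
                 [[1.5, 0.0]]])

for seed in (0, 1, 2):
    km = KMeans(n_clusters=2, n_init=1, random_state=seed).fit(pts)
    # distance from the borderline point to each of the two cluster centres
    d = np.linalg.norm(km.cluster_centers_ - pts[-1], axis=1)
    print(f"seed {seed}: borderline point's distances to the two centroids = {d.round(3)}")
```

The near-tie is the point: a hard, non-overlapping assignment hides how uncertain that particular label really is.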

Interestingly, the conclusion supposes that “the data cannot tell stories unless we formulate appropriate questions to ask and use appropriate methods to solicit the answers from the data”, a notion that could be challenged with the rapid growth of several fields, such as machine learning and artificial intelligence. Although it is hard to conceptualize right now, it wouldn’t be too far-fetched to imagine a near future where machines could essentially determine by themselves the best algorithms to use in order to classify spatial data from a vast database.

Thoughts on “Optimal routes in GIS and emergency planning applications”

Saturday, October 19th, 2019

In “Optimal routes in GIS and emergency planning applications”, Dunn and Newton present the importance of GIS in the context of optimizing flow in emergency management situations. Two algorithms, namely Dijkstra’s algorithm and the out-of-kilter algorithm, are presented as ways to determine the shortest path from a starting node to an end node. Where Dijkstra’s algorithm is suited to pathfinding in simpler networks, the out-of-kilter algorithm is more efficient in complex networks with arcs having limited flow.
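
Modern libraries solve the same minimum-cost flow problem that the out-of-kilter algorithm targets, just with newer algorithms. A hedged sketch with networkx (the node names, capacities, and costs are invented) shows the kind of answer such a formulation gives on a capacity-limited network:

```python
import networkx as nx

G = nx.DiGraph()
G.add_node("origin", demand=-40)       # 40 units of flow start here (supply)
G.add_node("destination", demand=40)   # ...and must all arrive here
G.add_edge("origin", "main_road", capacity=25, weight=3)
G.add_edge("main_road", "destination", capacity=25, weight=3)
G.add_edge("origin", "back_road", capacity=30, weight=6)
G.add_edge("back_road", "destination", capacity=30, weight=6)

flow = nx.min_cost_flow(G)
print(flow)  # 25 units take the cheaper main road, the remaining 15 the back road
```

The capacities are what distinguish this from a plain shortest-path problem: once the cheap arc is full, the remaining flow is forced onto the more expensive one.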

Network analysis is definitely key to better emergency response and evacuation situations that require optimized knowledge of all evacuation path networks, which may include more than just roads. However, analyzing emergency evacuation through these two algorithms does not leave space for unaccounted-for human decisions. An example could be painted for Sainte-Marthe-sur-le-Lac, a northern suburb of Montreal that experienced a dike breach earlier this year, which forced the immediate evacuation of more than 8,000 residents. Following the evacuation order, the road network was completely overflowing, with people stuck in traffic for more than an hour, which prompted people to start driving across fields and properties to flee the scene. This gives an example of how the magnitude of a catastrophe could force people to use paths outside the road network to get to their desired destination. Using the out-of-kilter algorithm to analyze networks in emergency situations is thus limited by not accounting for out-of-network transit.

Another interesting point concerns the computational time necessary to update the preferred path during emergency evacuations. Since this article was published in 1992, have there been significant improvements in computational times? Has another algorithm emerged as more efficient at determining the most efficient path? Emergency evacuations requiring frequent updates, such as during a flooding event or a hurricane, could be severely affected if computational time isn’t kept under certain thresholds.

The pleasure(s) of GOD – Geospatial Open Data

Monday, October 7th, 2019

I know the acronym is OGD in the paper… but I wanted this to be my title, so please deal –

The ‘costs’ paper goes over what may be some underrated or undiscovered costs associated with open geospatial data. The four ‘big’ ones the authors point out are: 1) citizen participation smoke and mirrors; 2) uneven geographies; 3) open data being a public subsidy for private companies; and 4) corporate influence. In my opinion: if the government wants to get into the business of opening data, whether because it’s fashionable now or because we’ve decided it’s necessary for a civic society that’s equipped to participate, it must do so with even weighting on social good and long-term viability. We should solve this problem, as a nation, the same way we’ve done whenever some social output was necessary but not necessarily financially feasible: Crown-Corps. I’m sure we’re all fans.

Johnson and colleagues describe how open governmental data would enable faux-participation, which is what I think is meant by smoke-and-mirrors; I will hopefully be able to follow up with the cited Robinson paper. The note on North America’s public libraries reminds me of an excellent 99% Invisible episode on how these critical pieces of socio-cultural infrastructure needed imaginative re-building. And they obviously do. We need places for people to think about how this world intersects with the digital one. One argument made, “that placing data in an open catalogue [was] no guarantee of its use”, felt odd to me. Of course, I could guarantee that not placing data in an open catalogue would guarantee no use whatsoever. I’m not sure I understand how people not using open data when it’s made available is a cost associated with opening data.

Uneven geographies I felt was self-explanatory: based on scale, access, and representation in data, various places may be left out, while others are emphasized.

I lean on my Crown-Corp idea for dealing with issues #3 and #4: open data ending up becoming a pseudo-subsidy, and open data as an entry-point for corporate influence in government. I don’t think this is inherently a bad or necessary thing. The authors suggest that there is an indirect cost when opening data, as companies take data to build products that they can sell back to the consumer. If some company follows these steps and provides their product for free, then there is no indirect cost; otherwise it’s purely built into the downstream direct costs to the consumer. My one-stop solution, the mighty Crown-Corp, could simply regulate data as a special commodity. If you are sufficiently likely to have formed part of some product’s training data, you are exempt from paying the product-making company. If a private tech giant is equipped to influence and standardize data formats, we can offer direct subsidies for creating platforms that are socially inclusive. Since datasets of benefit to civic society are likely to be different from those of corporate interests, again, offer subsidies for helping open civic-priority data. All this starts with the establishment of a data-oriented CBC: a data-journalism-focused, open governmental geospatial data behemoth tasked with coalescing data for safe use by Canadians. Should entrepreneurial Canadians be interested in this data, simply charge them for it – this century’s oil, right?

I’ve written too much already. Sorry. Last thing: Bates’s comments are 100% spot-on. Any open data will be used to reinforce trust in our government. If we’ve learned anything from 2016, it’s how quickly faith in once-esteemed institutions can be lost. How can we ensure data is opened in a transparent way, without having to rely on self-certified transparency?

I think a repeating pattern I’m struggling with in GI-Science is this belief that we as GI-Scientists are optimally placed to be considering how to deal with data and real-world representations of data, likely informed by our history of modelling geographic realities. Sometimes it feels like a stretch to me – many fields have spatial scientists, some of whom are talking about the same topics we pride ourselves on. And when the others aren’t speaking the same language – why not? We are either critical to almost everything that is going to be happening this century, or we are in an echo-chamber.

Government Data and the Invisible Hand (Robinson et al., 2009)

Monday, October 7th, 2019

In this 2009 article, Robinson et al. discuss the downfalls of the United States federal government in providing reusable data to citizens and the role that they envision for private parties in providing this data. They suggest downscaling the government’s focus on data-providing websites and portals in favour of a bare-bones feed that third parties could access and reformat to give back to the citizen end-user.

I don’t disagree with the issues that Robinson et al. bring up (all government data should be publicly and easily available, governments are slow to update and adapt because bureaucracy gets in the way), but I think that this is a fundamentally bad take on the solution. I don’t think that inserting private entities into the process is a good idea; government data should be provided directly to citizens by the government itself. I can’t say I know the state of data accessibility in the pre-Obama United States and I’m sure that there was a huge gap for the government to bridge, but their proposal really missed the mark for me. The people should not rely on private parties to communicate with their government, and vice versa.

On the topic of reliance on private parties, the authors also draw what I see as a huge false equivalency in saying “the government already relies heavily on private parties for facilitating aspects of core civic activities – travelling to Washington, calling one’s representatives on the phone, or even going to the library to retrieve a paper public record” (174). All these referenced industries are heavily regulated by federal law and oversight, and often subsidized by federal funds. They are all subject to the same kind of bureaucracy that the authors decry in the government’s provisioning of data.

My main two issues with creating this new role for private data providers are security and standardization. The authors touched on security concerns by saying they expect there to be at least one trustworthy private source for everyone, which is in my opinion not a solution. I also think that standardization of data (at least, whatever can be standardized, like metadata) is important. It is unfortunate that some agencies were required to shut down their more advanced systems in favour of a single, standard system, but I think that this is a failure of a single piece of legislation, not of the government’s handling of data.

The problem was identified successfully, but not the solution. I’m not against third parties processing and aggregating public data into more intuitive/interactive/visually pleasing/organized formats, but I think that that should happen in parallel to the government providing its own data and documents in raw/aggregate formats and in formats that are already easy to use and access. Private entities should not be the primary point of access for public data. Providing data to its citizens is not where the United States government should be making cuts and penny-pinching.

Geospatial Open Data Costs

Sunday, October 6th, 2019

The paper discusses the direct and indirect costs of geospatial open data. As defined in the article, open data are government data typically provided for free with minimal reuse restrictions. Open data are referred to as open government data when government plays an important role in regulating the open data system for collecting, processing, managing, and sharing free data, with certain value and cost to the public. It has been pointed out that open data do carry high costs, from the data collection and maintenance process and from other anticipated challenges, because the data involved are free for customers to use, meaning little return on developing open data.

From my perspective, the first question is why open data are mostly managed and released by governments, even though there is a view that government data have already been funded by the taxpayers. Is there any possibility for more companies or institutions to run the open data system with more advanced geospatial data processing technologies, while funding it through advertising on the website? The second question is to what extent open data should be open. Is it necessary for open data to be accessible to everyone? What is the definition of the “general public”? Should governments make the data understandable and easy enough for public consumers to use, or offer partially open data that can be shared with those who have some professional GIS knowledge? People with professional skills (for academic use) could process those much rawer open data on their own for their unique purposes, which would reduce the cost of the government’s original data processing. Moreover, if open data are put to commercial use, a cost should be charged in those cases.

The last point is that open data give rise to new sources of data, like self-published data, and the differences and variance in data forms push the development of GIScience. Ontologies in GIScience should first be more widely developed, and new technologies for filtering valuable data and formatting data are indispensable. This paper helps me a lot in understanding the open data concept, and it emphasizes the indirect costs (challenges) that policy makers should consider in deciding how an open data system could be better developed, with a clear structure.

Thoughts on “Government Data and the Invisible Hand”

Sunday, October 6th, 2019

It was quite interesting to read this ten-year-old paper in 2019, when so much has changed in our technological and digital worlds. The authors focus on increasing citizens’ access to governmental data, an issue which I personally do not hear much about in today’s conversations about governmental data. Whether this is because the Obama administration did work to make data more accessible to citizens and succeeded, or because most citizens do not care about the sort of governmental data that could be but is not publicly available, I do not know. Regardless, this paper’s focus on greater access to governmental data does not address one particularly significant concern in the present day, which is whether or not our government has too much data on us. In that sense, frankly, it has not aged well in my mind because it does not discuss a well-known and well-defined 2019 problem. However, as mentioned above, this may be due to the authors’ concerns having been addressed in following years. I may be in a privileged position, having the ability to discount the problem of too little governmental data being available to the common user because it is a problem already solved. Regardless, the focus of the paper – or rather, the lack of focus on government surveillance and how much personal data the government has on private citizens – raises several questions for me, someone not familiar with debates about surveillance and data. When this paper was written in 2009, were the authors’ concerns about governmental data not being available to users reflected outside of academia at all, in the general population? Was this concern greater than that about governmental surveillance? In addition, how have socio-technological changes between 2009 and 2019 (one that immediately comes to mind is the widespread usage of smartphones) impacted not only how the government can use our data but also our concerns about the government and data? Once again, such questions are beyond the scope of this blog post, but I’ll be sure to bring them up in class if they are relevant to our discussion.

Thoughts on “Government Data and the Invisible Hand” (Robinson et al. 2009)

Sunday, October 6th, 2019

This article’s main argument is that government public data should be available in an organized, easy to use and find manner for the general public and third-party organizations. I agree with the article’s general argument; the government should have the appropriate infrastructure to provide public data in a reusable and universal format.

The article points out that oftentimes the government does not keep up with the fast evolution of technology and the web capabilities that emerge. This article was published in 2009; now, in 2019, similar issues are still at play. In my own personal experience, this is still the case in the Canadian Government, although there have been big steps taken within the Canadian Government to modernize and make use of the wide variety of tools available for data analysis and visualization for internal and external use.

An important point to highlight is that despite data being accessible, third-party organizations and citizens interested in creating web products to analyze and better understand the data used to inform policy and regulation decisions do not have all of the data required to see the full picture. In the case of the Government of Canada, data is split into three different categories: public and open data; protected data (Protected A, Protected B, and Protected C); and classified data (Confidential, Secret, and Top Secret). All of this data is used at different levels of government to make decisions – data that, due to security and privacy, is not accessible to the public.

I believe that along with making data easily accessible to the public, it is also the responsibility of the government to deliver a quality web product for the public to view the data in the way the government used it. This still allows for third-party organizations to create different ways to view the same data.

Thoughts on “Government Data and the Invisible Hand”

Sunday, October 6th, 2019

In “Government Data and the Invisible Hand”, Robinson et al. outline the process and advantages for the American government of granting open online access to its data, which would provide the ability for third-party organizations to broaden data accessibility and to contribute themselves by making use of the data. Furthermore, it is argued that the private sector is “better suited to deliver government information to citizens” if the data is easy to parse through, given its ability to quickly change tools based on public needs as well as its position as an outsider.

If we’re thinking about geospatial data in this context, an important question remains after reading this article, which specified that public data should be provided by the government “in the highest detail available”: wouldn’t there be privacy concerns in that regard? There could be occurrences where the highest detail available for a dataset compromises the identity of individuals or groups if the spatial scale is fine enough. There would still be a privacy concern with non-geospatial data, as some sensitive information about individuals would have to be withheld from the public, meaning that some censorship would have to be done in order to preserve every citizen’s privacy. Alternatively, different political administrations could differ in what they deem acceptable and unacceptable for public access based on their own political positions. Finding a perfect balance between data accessibility and minimizing security concerns for the population is an extremely difficult challenge, as each and every one of us could have a different view. These differing subjective views could drastically affect the ability of private actors to make use of the data, especially if the administration has a veto over what should or should not be publicly accessible.

All in all, I personally think that it is the government’s responsibility to first determine what constitutes sensitive data, as preserving privacy is of utmost importance. Following that, making all of its non-sensitive data easily available online and promoting its use would go a long way toward furthering our understanding of the studied phenomena, but also toward improving society’s trust in government given a higher level of transparency.

Thoughts on “The Cost(s) of Geospatial Open Data”

Sunday, October 6th, 2019

This article introduces the potential costs brought on by open data. Two categories of costs are identified: direct costs and indirect costs. The direct costs of open data can be understood as collecting raw data, preparing data, and maintaining the uploaded data. The authors then list four indirect costs of open data: 1) issues arising from citizen participation and engagement; 2) difficulty in reaching a single standard due to unevenness of data provision; 3) tension between government expense and private-sector use; and 4) the privatization of open data by the private sector.

I am very interested in the privacy issue in open data due to previous experience with crime datasets. In one previous project, I worked with Toronto crime datasets from the city’s open data portal, and I found out in the data acknowledgement that the point data have all been processed by a certain algorithm so that they won’t necessarily represent the true location of each event. Since this data is available to everyone who has access to the internet, I understand that this processing is for privacy protection. This little change did not impact my project much. However, what if some research really needs this kind of information? Should the government give out the raw data despite the privacy issue? What rationale should it use when considering giving out sensitive datasets? To me, this is a dilemma of whether open data should be open or not, and the answer might also differ between different areas or territories.
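
For anyone curious, that kind of pre-release processing is often called geomasking. A toy sketch of one common variant, random perturbation, is below; the 100–300 m displacement range is my own assumption, not Toronto’s actual method.

```python
import math
import random

def geomask(lat, lon, min_m=100, max_m=300):
    """Displace a point by a random distance and bearing before release,
    so the published coordinate no longer pinpoints an address."""
    r = random.uniform(min_m, max_m)
    theta = random.uniform(0, 2 * math.pi)
    # rough metres-to-degrees conversion, adequate for small offsets
    dlat = (r * math.cos(theta)) / 111_320
    dlon = (r * math.sin(theta)) / (111_320 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

print(geomask(43.6532, -79.3832))  # a displaced version of a downtown Toronto point
```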

Open data and bureaucratic thrift

Sunday, October 6th, 2019

After reading through both of the articles this week, I’m reflecting on previous conversations and experiences I have had with open data and government access. I was especially impressed by the thoroughness of the “5 Myths” section of Janssen et al, which did an excellent job of unpacking some of the rhetoric and misinformation surrounding the current trend of open government data.

In reading both, I did feel that one aspect of open data was especially under-addressed and could be explored further: the cost-saving factor motivating governments’ decisions to release open data to the public. As the size of the data sets that local and national government actors manage has grown, the burden of managing them has increased. Keeping this data private and making careful decisions about who has access, what requests to authorize, and how to manage it quickly becomes a bureaucratic leviathan as the data sets exponentially increase. By making these data sets public, the labor and infrastructural costs of managing information access requests are massively reduced, making the government’s work far easier. Many governments have adopted a policy that data is by default “open”: unless policy makers and data managers specifically believe a certain data set should be private, any new information generated is immediately available for public dispersal.

This dynamic has been explained to me multiple times by policy-makers at the city level, and I have personally seen its efficiency. In many ways this cost-saving motivation provides more support for the argument at the center of Robinson et al., which is that data is better left in the hands of outside actors, whereas it is the government’s responsibility to ensure that what data is accessible is easily managed. The previous comment stated that “Public officials tend to focus on the number of datasets they release rather than on the effect of releasing high-quality sets of data.” I believe that the best explanation for this decision is the cost-saving factor I outlined above.

The costs of open geospatial data (Johnson et al. 2017)

Sunday, October 6th, 2019

Open data has become a big movement in local governments. This article raises concerns over the costs incurred in the process of government data provision: the idea that making government data freely accessible – especially when geospatial data is involved – creates direct and indirect costs.

The authors suggest that direct costs must include considerations of individual privacy and confidentiality. Indeed, privacy protection may create direct costs, but government officials must ensure that all open data respects privacy and only discloses information that cannot be attached to individuals. For instance, journey data is being used in a variety of ways to create and improve geospatial data products and to deliver services to users. The journeys people take can be used to infer where they live, where they work, and where they shop. If individuals become unwilling to share movement data, then this will impact the ability for that data to be used in ways that create economic and social benefits.

Besides direct costs, Johnson et al. (2017) identify four areas where the provision of open geospatial data can generate unforeseen expenses. They indicate that the private sector pushes for the release of “high value” datasets to develop its commercial products or services. This could divert governments’ attention from “low value” data. However, note that high-value data can also have a significant impact on citizens. People are taking advantage of applications that make use of open data. Transit commuters know how long they’ll be waiting for a ride. Drivers easily find parking close to where they want to travel. Renters get detailed information about crime and schools for any address. The information that developers access to build these applications comes directly from high-value datasets.

One way to reduce costs is to limit what data sets are published. Public officials tend to focus on the number of datasets they release rather than on the effect of releasing high-quality sets of data. Cities should do a careful analysis of which datasets have the most impact, both in terms of social and economic benefits, so as to avoid hidden costs.