Archive for October, 2019

The pleasure(s) of GOD – Geospatial Open Data

Monday, October 7th, 2019

I know the acronym is OGD in the paper… but I wanted this to be my title, so please deal –

The ‘costs’ paper goes over what may be some underrated or undiscovered costs associated with open geospatial data. The four ‘big’ ones the authors point out are: 1) citizen participation smoke and mirrors; 2) uneven geographies; 3) open data being a public subsidy for private companies; and 4) corporate influence. In my opinion: if the government wants to get into the business of opening data (because it’s fashionable now, or because we’ve decided it’s necessary for a civic society equipped to participate), it must do so with even weighting on social good and long-term viability. We should solve this problem, as a nation, the same way we have whenever some social output was necessary but not necessarily financially feasible: Crown-Corps. I’m sure we’re all fans.

Johnson and colleagues describe how open governmental data can enable faux-participation, which is what I think is meant by smoke and mirrors; I will hopefully be able to follow up with the cited Robinson paper. The note on North America’s public libraries reminds me of an excellent 99% Invisible episode on how these critical pieces of socio-cultural infrastructure needed imaginative re-building. And they obviously do. We need places for people to think about how this world intersects with the digital one. One argument made – “that placing data in an open catalogue [was] no guarantee of its use” – felt odd to me. Of course, I could guarantee that not placing data in an open catalogue would guarantee no use whatsoever. I’m not sure I understand how people not using open data when it’s made available is a cost associated with opening data.

‘Uneven geographies’ I felt was self-explanatory. Based on scale, access, and representation in data, various places may be left out, while others are emphasized.

I lean on my Crown-Corp idea for dealing with issues #3 and #4: open data ending up becoming a pseudo-subsidy, and open data as an entry point for corporate influence in government. I don’t think either is inherently a bad or inevitable thing. The authors suggest that there is an indirect cost in opening data as companies take the data to build products that they can sell back to the consumer. If some company follows these steps and provides its product for free, then there is no indirect cost; if it charges, the cost is simply built into the downstream direct costs to the consumer. My one-stop solution, the mighty Crown-Corp, could simply regulate data as a special commodity. If you are sufficiently likely to have formed part of some product’s training data, you are exempt from paying the product-making company. If a private tech giant is equipped to influence and standardize data formats, we can offer direct subsidies for creating platforms that are socially inclusive. Since datasets of benefit to civic society are likely to be different from those of corporate interest, again offer subsidies for helping open civic-priority data. All of this starts with the establishment of a data-oriented CBC: a data-journalism-focused, open governmental geospatial data behemoth tasked with coalescing data for safe use by Canadians. Should entrepreneurial Canadians be interested in this data, simply charge them for it – this century’s oil, right?

I’ve written too much already. Sorry. Last thing: Bates’ comments are 100% spot-on. Any open data will be used to reinforce trust in our government. If we’ve learned anything from 2016, it’s how quickly faith in once-esteemed institutions can be lost. How can we ensure data is opened in a transparent way, without having to rely on self-certified transparency?

I think a repeating pattern I’m struggling with in GIScience is this belief that we as GIScientists are optimally placed to consider how to deal with data and real-world representations of data, likely informed by our history of modelling geographic realities. Sometimes it feels like a stretch to me – many fields have spatial scientists, some of whom are talking about the same topics we pride ourselves on. And when the others aren’t speaking the same language – why not? We are either critical to almost everything that is going to happen this century, or we are in an echo chamber.

Government Data and the Invisible Hand (Robinson et al., 2009)

Monday, October 7th, 2019

In this 2009 article, Robinson et al. discuss the failings of the United States federal government in providing reusable data to citizens and the role that they envision for private parties in providing this data. They suggest scaling back the government’s focus on data-providing websites and portals in favour of a bare-bones feed that third parties could access, reformatting the data to give back to the citizen end-user.
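
As a rough sketch of the division of labour they have in mind (the feed format and field names below are my own assumptions, not anything specified in the paper), the government side could expose raw records in a machine-readable form and leave all presentation to third parties:

```python
import json

# Hypothetical raw records, as a government agency might expose them:
# plain, machine-readable, with no presentation layer attached.
RAW_FEED = [
    {"agency": "EPA", "record_id": 101, "title": "Air quality report", "year": 2009},
    {"agency": "DOT", "record_id": 102, "title": "Bridge inspections", "year": 2009},
]

def publish_feed(records):
    """Government side: serialize raw records as JSON and stop there."""
    return json.dumps(records)

def third_party_view(feed_json):
    """Third-party side: parse the raw feed and reformat it for citizens."""
    records = json.loads(feed_json)
    return "\n".join(f"[{r['agency']}] {r['title']} ({r['year']})" for r in records)

print(third_party_view(publish_feed(RAW_FEED)))
```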

I don’t disagree with the issues that Robinson et al. bring up (all government data should be publicly and easily available, governments are slow to update and adapt because bureaucracy gets in the way), but I think that this is a fundamentally bad take on the solution. I don’t think that inserting private entities into the process is a good idea; government data should be provided directly to citizens by the government itself. I can’t say I know the state of data accessibility in the pre-Obama United States and I’m sure that there was a huge gap for the government to bridge, but their proposal really missed the mark for me. The people should not rely on private parties to communicate with their government, and vice versa.

On the topic of reliance on private parties, the authors also draw what I see as a huge false equivalency in saying “the government already relies heavily on private parties for facilitating aspects of core civic activities – travelling to Washington, calling one’s representatives on the phone, or even going to the library to retrieve a paper public record” (174). All these referenced industries are heavily regulated by federal law and oversight, and often subsidized by federal funds. They are all subject to the same kind of bureaucracy that the authors decry in the government’s provisioning of data.

My two main issues with creating this new role for private data providers are security and standardization. The authors touched on security concerns by saying they expect there to be at least one trustworthy private source for everyone, which is, in my opinion, not a solution. I also think that standardization of data (at least, whatever can be standardized, like metadata) is important. It is unfortunate that some agencies were required to shut down their more advanced systems in favour of a single, standard system, but I think that this is a failure in a single piece of legislation, not in the government’s handling of data.

The problem was identified successfully, but not the solution. I’m not against third parties processing and aggregating public data into more intuitive/interactive/visually pleasing/organized formats, but I think that should happen in parallel with the government providing its own data and documents, both in raw/aggregate formats and in formats that are already easy to use and access. Private entities should not be the primary point of access for public data. Providing data to its citizens is not where the United States government should be making cuts and penny-pinching.

Geospatial Open Data Cost

Sunday, October 6th, 2019

The paper discusses the direct and indirect costs of geospatial open data. As defined in the article, open data are government data typically provided for free with minimal reuse restrictions. Open data are often referred to as open government data, since government plays an important role in regulating the open data system: collecting, processing, managing, and sharing free data with certain value and cost to the public. The authors point out that open data do carry high costs, from the data collection and maintenance process to other anticipated challenges, because the data involved are free for consumers to use, meaning little financial return on developing open data.

From my perspective, the first question is why open data are mostly managed and released by governments, even though there is a view that government data were already funded by the taxpayers. Is there any possibility for more companies or institutions to run the open data system with more advanced geospatial data processing technologies, while raising funds through advertising on the website? The second question is to what extent open data should be open. Is it necessary for open data to be accessible to everyone? What is the definition of the “general public”? Should governments make the data understandable and easy enough for public consumers to use, or should some partially open data be shared only with those who have been educated with professional GIS knowledge? People with professional skills (academic users) would process those open data (much rawer) on their own for unique purposes, which would reduce the government’s original data-processing costs. Moreover, if open data are put to commercial use, a cost should be placed on those cases.

The last point is that open data give rise to new sources of data, like self-published data, and that differences and variance in data forms force the development of GIScience. Ontologies in GIScience should first be more widely developed, and new technologies for filtering valuable data and formatting data are indispensable. This paper helped me a lot in understanding the open data concept, and it emphasizes the indirect costs (challenges) for policy makers to consider in how an open data system should be better developed, with a clearly stated structure.

Thoughts on “Government Data and the Invisible Hand”

Sunday, October 6th, 2019

It was quite interesting to read this ten-year-old paper in 2019, when so much has changed in our technological and digital worlds. The authors focus on increasing citizens’ access to governmental data, an issue which I personally do not hear much about in today’s conversations about governmental data. Whether this is because the Obama administration worked to make data more accessible to citizens and succeeded, or because most citizens do not care about the sort of governmental data that could be but is not publicly available, I do not know. Regardless, this paper’s focus on greater access to governmental data does not address one particularly significant concern in the present day, which is whether or not our government has too much data on us. In that sense, frankly, it has not aged well in my mind, because it does not discuss a well-known and well-defined 2019 problem. However, as mentioned above, this may be because the authors’ concerns were addressed in the following years. I may be in a privileged position, having the ability to discount the problem of too little governmental data being available to the common user because it is a problem already solved.

Regardless, the focus of the paper – or rather, its lack of focus on government surveillance and how much personal data the government has on private citizens – raises several questions for me, someone not familiar with debates about surveillance and data. When this paper was written in 2009, were the authors’ concerns about governmental data not being available to users reflected outside of academia at all, in the general population? Was this concern greater than that about governmental surveillance? In addition, how have socio-technological changes between 2009 and 2019 (one that immediately comes to mind is the widespread usage of smartphones) impacted not only how the government can use our data but also our concerns about the government and data? Once again, such questions are beyond the scope of this blog post, but I’ll be sure to bring them up in class if they are relevant to our discussion.

Thoughts on “Government Data and the Invisible Hand” (Robinson et al. 2009)

Sunday, October 6th, 2019

This article’s main argument is that government public data should be available to the general public and third-party organizations in an organized manner that is easy to use and navigate. I agree with the article’s general argument; the government should have the appropriate infrastructure to provide public data in a reusable and universal format.

The article points out that the government often does not keep up with the fast evolution of technology and web capabilities. This article was published in 2009; now, in 2019, similar issues are still at play. In my own personal experience, this is still the case in the Canadian Government, although big steps have been taken within it to modernize and make use of the wide variety of tools available for data analysis and visualization, for internal and external use.

An important point to highlight is that even when data are accessible, third-party organizations and citizens interested in creating web products to analyze and better understand the data used to inform policy and regulation decisions do not have all of the data required to see the full picture. In the case of the Government of Canada, data is split into three different categories: public and open data; protected data (Protected A, Protected B, and Protected C); and classified data (Confidential, Secret, and Top Secret). All of this data is used at different levels of government to make decisions – data that, due to security and privacy concerns, is not accessible to the public.
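
As a toy sketch of how these categories shape what reaches an open data portal (the category names follow the post; the release rule is my own simplification, not official policy), one might model them as ordered sensitivity levels:

```python
from enum import Enum

class Sensitivity(Enum):
    """Government of Canada data categories, as described above."""
    OPEN = 0          # public and open data
    PROTECTED_A = 1   # protected data
    PROTECTED_B = 2
    PROTECTED_C = 3
    CONFIDENTIAL = 4  # classified data
    SECRET = 5
    TOP_SECRET = 6

def publicly_releasable(level: Sensitivity) -> bool:
    # Simplified rule: only fully open data reaches the public portal;
    # everything Protected or Classified stays internal.
    return level is Sensitivity.OPEN

print(publicly_releasable(Sensitivity.OPEN))         # True
print(publicly_releasable(Sensitivity.PROTECTED_B))  # False
```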

I believe that along with making data easily accessible to the public, it is also the responsibility of the government to deliver a quality web product for the public to view the data in the way the government used it. This still allows for third-party organizations to create different ways to view the same data.

Thoughts on “Government Data and the Invisible Hand”

Sunday, October 6th, 2019

In “Government Data and the Invisible Hand”, Robinson et al. outline the process and advantages of the American government granting open online access to its data, which would enable third-party organizations to broaden data accessibility and contribute themselves by making use of it. Furthermore, it is argued that the private sector is “better suited to deliver government information to citizens” if the data is easy to parse, given its ability to quickly change tools based on public needs as well as its position as an outsider.

If we’re thinking about geospatial data in this context, an important question remains after reading this article, which specifies that public data should be provided by the government “in the highest detail available”: wouldn’t there be privacy concerns in that regard? There could be occurrences where the highest detail available for a dataset compromises the identity of individuals or groups if the spatial scale is fine enough. There would still be a privacy concern with non-geospatial data, as some sensitive information about individuals would have to be withheld from the public, meaning that some censorship would have to be applied in order to preserve every citizen’s privacy. Alternatively, different political administrations could differ in what they deem acceptable and unacceptable for public access based on their own political positions. Finding a perfect balance between data accessibility and minimizing security concerns for the population is an extremely difficult challenge, as each and every one of us could have a different view. These differing subjective views could drastically affect the ability of private actors to make use of the data, especially if the administration has a veto over what should or should not be publicly accessible.
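
To make the spatial-scale concern concrete, here is a small sketch (my own illustration, not anything proposed in the article) of one common mitigation: snapping precise coordinates to a coarser grid before release, trading detail for privacy.

```python
def coarsen(lat: float, lon: float, cell_deg: float = 0.01):
    """Snap a point to a grid of cell_deg-degree cells (roughly 1 km
    north-south at 0.01 degrees), hiding the exact location while
    keeping the general neighbourhood."""
    return round(lat / cell_deg) * cell_deg, round(lon / cell_deg) * cell_deg

# A hypothetical address-level point becomes indistinguishable from
# its grid-cell neighbours once coarsened.
print(coarsen(45.50417, -73.57689))  # approximately (45.5, -73.58)
```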

All in all, I personally think that it is the government’s responsibility to first determine what constitutes sensitive data, as preserving privacy is of the utmost importance. Following that, making all of its non-sensitive data easily available online and promoting its use would go a long way toward furthering our understanding of the phenomena being studied, but also toward improving society’s trust in government, given the higher level of transparency.

Thoughts on “The Cost(s) of Geospatial Open Data”

Sunday, October 6th, 2019

This article introduced the potential costs brought by open data. Two categories of costs are identified: direct costs and indirect costs. The direct costs of open data can be understood as collecting raw data, preparing data, and maintaining the uploaded data. The authors then list four indirect costs of open data: 1) issues arising from citizen participation and engagement; 2) difficulty in reaching a single standard due to unevenness of data provision; 3) tension between government expense and private sector use; and 4) the privatization of open data by the private sector.

I am very interested in the privacy issues in open data due to previous experience with crime datasets. In one previous project, I worked with Toronto crime datasets from the city’s open data portal, and I found out in the data acknowledgement that the point data had all been adjusted by a certain algorithm so that they would not necessarily represent the true location of each event. Since this data is available to everyone who has access to the internet, I understand that this adjustment is for privacy protection. The change did not impact my project much. However, what if some research really needs this kind of information? Should the government give out the raw data despite the privacy issue? What rationale should they use when considering giving out sensitive datasets? To me, this is a dilemma of whether open data should be open or not, and the answer might also differ between different areas or territories.
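
For context, one common family of techniques behind this kind of adjustment is geomasking, e.g. “donut” masking, where each point is displaced by a random bearing and a random distance between a minimum and maximum radius. A minimal sketch of the general idea (my own illustration, not Toronto’s actual algorithm):

```python
import math
import random

def donut_geomask(lat, lon, min_m=50, max_m=500):
    """Displace a point by a random bearing and a random distance between
    min_m and max_m metres: the true location stays hidden, but the point
    remains in roughly the right neighbourhood."""
    bearing = random.uniform(0, 2 * math.pi)
    distance = random.uniform(min_m, max_m)
    dlat = distance * math.cos(bearing) / 111_320  # metres per degree of latitude
    dlon = distance * math.sin(bearing) / (111_320 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

print(donut_geomask(43.6532, -79.3832))  # a masked point near downtown Toronto
```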

Open data and bureaucratic thrift

Sunday, October 6th, 2019

After reading through both of the articles this week, I’m reflecting on previous conversations and experiences I have had with open data and government access. I was especially impressed by the thoroughness of the “5 Myths” section of Janssen et al., which did an excellent job of unpacking some of the rhetoric and misinformation surrounding the current trend of open government data.

In reading both, I did feel that one aspect of open data was especially under-addressed and could be explored further: the cost-saving factor motivating governments’ decisions to release open data to the public. As the size of the data sets that local and national government actors manage has grown, the burden of managing them has increased. Keeping this data private and making careful decisions about who has access, what requests to authorize, and how to manage it quickly becomes a bureaucratic leviathan as the data sets exponentially increase. By making these data sets public, the labor and infrastructural costs of managing information access requests are massively reduced, making the government’s work far easier. Many governments have adopted a policy that data is by default “open”: unless policy makers and data managers specifically believe a certain data set should be private, any new information generated is immediately available for public dispersal.

This dynamic has been explained to me multiple times by policy-makers at the city level, and I have personally seen its efficiency. In many ways this cost-saving motivation provides more support for the argument at the center of Robinson et al., which is that data is better left in the hands of outside actors, whereas it is the government’s responsibility to ensure that what data is accessible is easily managed. The previous comment stated that “Public officials tend to focus on the number of datasets they release rather than on the effect of releasing high-quality sets of data.” I believe that the best explanation for this decision is the cost-saving factor I outlined above.

The costs of open geospatial data (Johnson et al. 2017)

Sunday, October 6th, 2019

Open data has become a big movement in local governments. This article raises concerns over the costs incurred in the process of government data provision. The idea is that making government data freely accessible – especially when geospatial data is involved – creates direct and indirect costs.

The authors suggest that direct costs must include considerations of individual privacy and confidentiality. Indeed, privacy protection may create direct costs, but government officials must ensure that all open data respects privacy and only discloses information that cannot be attached to individuals. For instance, journey data is being used in a variety of ways to create and improve geospatial data products and to deliver services to users. The journeys people take can be used to infer where they live, where they work, and where they shop. If individuals become unwilling to share movement data, then this will impact the ability for that data to be used in ways that create economic and social benefits.
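
To illustrate how easily journey data leaks this kind of information, here is a toy sketch with made-up data (my own example, not one from the paper): simply taking the most frequent overnight location in a trace is often enough to guess where someone lives.

```python
from collections import Counter

# Hypothetical journey trace: (hour_of_day, location) pairs.
trace = [
    (8, "office"), (12, "cafe"), (18, "gym"), (23, "apartment_A"),
    (2, "apartment_A"), (9, "office"), (22, "apartment_A"), (13, "mall"),
]

def guess_home(trace):
    """Guess 'home' as the most frequent location seen late at night."""
    overnight = [loc for hour, loc in trace if hour >= 22 or hour < 6]
    return Counter(overnight).most_common(1)[0][0]

print(guess_home(trace))  # apartment_A
```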

Besides direct costs, Johnson et al. (2017) identify four areas where the provision of open geospatial data can generate unforeseen expenses. They indicate that the private sector pushes for the release of “high value” datasets to develop its commercial products or services. This could divert governments’ attention from “low value” data. However, note that high-value data can also have a significant impact on citizens. People are taking advantage of applications that make use of open data: transit commuters know how long they’ll be waiting for a ride; drivers easily find parking close to where they want to travel; renters get detailed information about crime and schools for any address. The information that developers access to inform these applications comes directly from high-value datasets.

One way to reduce costs is to limit what data sets are published. Public officials tend to focus on the number of datasets they release rather than on the effect of releasing high-quality sets of data. Cities should do a careful analysis of which datasets have the most impact, both in terms of social and economic benefits, so as to avoid hidden costs.

Reflections on Government Data and the Invisible Hand

Sunday, October 6th, 2019

The core proposal of Robinson et al.’s work is to promote operational change in how government should share its public data. They point out that the reason U.S. government agencies tend to have out-of-date websites and unusable data is regulation, and too much effort spent on improving each agency’s own website. Thus, they propose handing the interactive part of public data over to third-party innovators, who have far superior technology and experience in creating better user interfaces, innovative reusable data, and collections of user feedback.

Under the current trend of U.S. regulation and laws on sharing public data, it may well be true that the distribution of public data would be better operated by third-party innovators, for better distribution and surplus value creation. I would argue, however, that their work is missing some perspectives on the U.S.’s current public data.

The first is standardization: it is more urgent for a public data standard to come out of the government, to ensure data quality and usability, than it is to fix distribution. The top complaint about public data is that even data from the same realm (e.g., economic data) can end up looking very different depending on which agency published it. This creates more severe issues for the usability and accountability of the data than distribution does. So, in order for government agencies to become good public data “publishers” under Robinson et al.’s proposal, all government agencies would have to come up with a universally understandable and usable data standard, rather than each agency using its own standard, or leaving the most basic part of data handling to the private sector.
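
As a hedged sketch of what enforcing one shared standard might look like in practice (the field names below are my own assumptions, not any actual federal schema), each agency’s records could be validated against a common metadata schema before publication:

```python
# Hypothetical minimal metadata standard shared by all agencies.
REQUIRED_FIELDS = {
    "agency": str,    # publishing agency
    "title": str,     # human-readable dataset title
    "released": str,  # ISO 8601 date, e.g. "2009-03-01"
    "format": str,    # machine-readable format, e.g. "csv"
}

def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            problems.append(f"wrong type for field: {field}")
    return problems

record = {"agency": "BLS", "title": "Unemployment figures", "format": "csv"}
print(validate(record))  # ['missing field: released']
```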

The second issue with their proposal is the credibility of the data. If all public data is handed over to the public by third-party innovators, then, to increase their own competitiveness, they will modify the original data to match what the public wants, instead of delivering the original unmodified data. This creates a credibility issue, since there is far less legislation and regulation on what third-party distributors can and cannot do to the originally published government data. And this modification is inevitable for third-party distributors, since at the very least they need to modify the original public data to fit it into their databases.

In the end, I do think commercializing public data distribution can promote effective use and reuse of public data, while also creating the problems found in all business: privacy issues, a “rat race”, deliberate steering of exposure toward certain products of public interest, and so on. It will have its pros and cons, but until government agencies can solve their data standardization issue and regulations are built to supervise third-party distribution of public data, whether Robinson et al.’s proposal will have more pros than cons remains questionable.

Reflecting on The Cost(s) of Geospatial Open Data (Johnson et al., 2017)

Saturday, October 5th, 2019

This paper examines the rise of geospatial open data, particularly at the federal level. It looks at very concrete, monetary costs, such as resource costs and staff time costs; it also looks at the less concrete and maybe less obvious, indirect costs of open data, such as when expectations are not met, and the potential for more corporate influence in the government.

In an economics class that I am currently taking, we discussed the seven methodological sins of economic research, and I believe some of these points can transcend disciplines. For instance, one of the sins is reliance on a single metric, such as a price or index. I think it’s important to note that when the authors of this paper were discussing costs, they did not just include monetary costs in their analysis. I believe the addition of the indirect costs is an important component to their argument and that these indirect costs present even more pressing issues than the direct costs do. I think it is very important to acknowledge the far-reaching and even harder-to-solve problems of the effects and influences of citizen engagement, the uneven access to information across regions, the influence of the private sector on government open data services, and the risks of public-private collusion through software and service licensing. 

A critique I have of the paper is that I believe the title to be a bit misleading in its simplicity. The title implies that the paper addresses geospatial open data cost across disciplines, whereas the paper addresses the costs only at the government level, and not any other level (for instance, perhaps looking at OSM or Zooniverse, if crowdsourcing/VGI falls under the same category as open data). The abstract, however, makes it very clear that the paper is only addressing issues caused by government-provided open data.

Thoughts on “The Cost(s) of Geospatial Open Data”

Friday, October 4th, 2019

This article framed the direct and indirect costs of geospatial open data provision, with the main focus on the four types of indirect costs. I found this article very thought-provoking, because we often think of the benefits provided by open data while neglecting the pitfalls that it brings.

One point that particularly interests me is the data literacy issue. The article points out that there exist a number of barriers for users, so that even though the data is open, there is no guarantee of its use. Similarly, Janssen et al.’s (2012) article argues that these barriers pose the risk that open data is publicized data in name only but still private in practice. There are two points I want to make here. First, while I understand the advocacy for better data quality and standardized data formats, what I want to hear more about is why it matters for both researchers and the public to be able to use the data. One could argue that not many people would actually care, and that researchers are the group those data are meant for. Is public engagement in using and interpreting open data intrinsically good, or does it provide greater returns for the public? I think this could be better clarified here. Second, I’m curious whether VGI or crowdsourced data belongs to the category of open data. Do the costs discussed in the article still apply to VGI and crowdsourced data? It’s clear that some direct costs, such as the cost of data collection, could be avoided, but it seems to me that some other issues, such as privacy and data quality, could be intensified. I think this is a question worth discussing.