This article introduced the potential costs of open data. It identifies two categories of costs: direct costs and indirect costs. The direct costs of open data include collecting raw data, preparing the data, and maintaining the uploaded data. The author then listed four indirect costs of open data: 1) issues arising from citizen participation and engagement, 2) difficulty in reaching a single standard because data provision is uneven, 3) tension between government expense and private-sector use, and 4) the privatization of open data by the private sector.

I am very interested in the privacy issue in open data because of my previous experience with crime datasets. In one project, I worked with the Toronto crime datasets from the government open data portal, and the data acknowledgement noted that the point locations had all been processed by an algorithm, so they do not necessarily represent the true location of each event. Since these data are available to anyone with internet access, I understand that this processing is meant to protect privacy. This small change did not affect my project much. However, what if some research really needs this kind of information? Should the government release the raw data despite the privacy issue? What rationale should it use when deciding whether to release sensitive datasets? To me, this is a dilemma over whether open data should be fully open, and the answer may differ between regions or jurisdictions.
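The acknowledgement does not spell out the portal's algorithm, so the following is only a sketch of one common family of such techniques, geographic masking, in its "donut" form: each point is moved a random distance between a minimum and a maximum radius in a random direction, so it never sits at its true spot but stays close enough for neighbourhood-level analysis. The function name `donut_mask`, the radii, the coordinates, and the assumption that points are in a projected coordinate system measured in metres are all hypothetical, not what Toronto actually uses.

```python
import math
import random


def donut_mask(x, y, r_min, r_max, rng=None):
    """Hypothetical donut mask: move a projected point (assumed to be in metres)
    a random distance in [r_min, r_max] in a uniformly random direction."""
    if not 0 <= r_min <= r_max:
        raise ValueError("require 0 <= r_min <= r_max")
    if rng is None:
        rng = random.Random()
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Square-root sampling spreads displaced points uniformly over the ring's
    # area instead of clustering them near the inner radius.
    r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + r * math.cos(angle), y + r * math.sin(angle)


# Illustrative usage with made-up coordinates and radii.
rng = random.Random(0)
true_point = (630000.0, 4835000.0)
masked = donut_mask(*true_point, r_min=50.0, r_max=250.0, rng=rng)
offset = math.hypot(masked[0] - true_point[0], masked[1] - true_point[1])
print(f"displaced by {offset:.1f} m")
```

The dilemma above shows up directly in `r_max`: a larger radius protects privacy better but makes the points less useful for research that needs precise locations.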

  site admin

    Crime data always has accuracy problems (see the critiques of the FBI’s crime database), mostly in attribute accuracy. Geographic masking is used to distort the location while keeping the data usable for research. Still, you’re right: this is one of the tensions around privacy in open data.

