Jones, et al.: Modelling vague places with knowledge from the web

In “Modelling vague places,” Jones, et al. introduce a novel method for delineating vague places from text found on the web. They run Google searches for vague toponyms such as “the Cotswolds,” an area straddling six counties in southern England, and use open-source named-entity recognition (NER) tools to extract the place names associated with each term in the results. Then, using a gazetteer, they assign coordinates to the extracted names, turning the text into geolocated points. Finally, they apply density estimation techniques to these points to draw the boundaries of the vaguely defined region.
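To make the last two steps concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes the place names have already been extracted by an NER tool, and the gazetteer entries, coordinates, bandwidth, and threshold are all hypothetical values chosen for illustration. It geocodes the mentions, builds a weighted Gaussian kernel density surface on a grid, and keeps the cells whose density is at least a chosen fraction of the peak.

```python
import math
from collections import Counter

# Hypothetical toy gazetteer: place name -> (x_km, y_km) on a local plane.
# Names and coordinates are invented for illustration only.
GAZETTEER = {
    "town_a": (0.0, 0.0),
    "town_b": (4.0, 1.0),
    "town_c": (2.0, 5.0),
    "town_d": (40.0, 40.0),  # an outlier mentioned only once
}


def geocode(mentions, gazetteer):
    """Turn place-name mentions into (point, weight) pairs.

    Names missing from the gazetteer are skipped; the weight is the
    number of times a name was mentioned.
    """
    counts = Counter(m for m in mentions if m in gazetteer)
    return [(gazetteer[name], n) for name, n in counts.items()]


def density(x, y, points, bandwidth_km):
    """Weighted Gaussian kernel density estimate at (x, y)."""
    total_weight = sum(w for _, w in points)
    norm = 2.0 * math.pi * bandwidth_km ** 2 * total_weight
    acc = 0.0
    for (px, py), w in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        acc += w * math.exp(-d2 / (2.0 * bandwidth_km ** 2))
    return acc / norm


def region_cells(points, bandwidth_km, grid, fraction):
    """Return grid cells whose density is at least `fraction` of the peak."""
    surface = {cell: density(cell[0], cell[1], points, bandwidth_km)
               for cell in grid}
    peak = max(surface.values())
    return {cell for cell, d in surface.items() if d >= fraction * peak}


# Mentions as they might come out of an NER pass over search results.
mentions = (["town_a"] * 5 + ["town_b"] * 4 + ["town_c"] * 3
            + ["town_d"] + ["unknown_place"])
points = geocode(mentions, GAZETTEER)

# A 2 km grid covering all gazetteer points.
grid = [(float(x), float(y))
        for x in range(-10, 51, 2) for y in range(-10, 51, 2)]
region = region_cells(points, bandwidth_km=3.0, grid=grid, fraction=0.2)
```

With these toy inputs the cluster of frequently mentioned towns falls inside the estimated region, while the single stray mention of the distant town does not. The bandwidth and the threshold fraction both shape the resulting boundary, so in practice they would need to be tuned or justified rather than picked by hand as they are here.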

The process is representative of the general move toward big-data research. In the past, researchers working on this topic would interview a necessarily limited number of people, asking them to sketch their notions of the boundaries or centres of vague areas. Meanwhile, GIS software relies on administrative definitions, which are not always suited to the needs of, say, a Google Maps user who wants to know the boundaries of a neighbourhood such as Mile End, which has no official representation on a map or spatial data layer. Ask ten Montrealers where the southern boundary of the neighbourhood lies, and you will probably get several different answers. If an ontologically precise boundary definition were the goal, we might prefer the very large sample behind this sort of textual analysis to the anecdotal reports of a handful of people.

While the researchers use a gazetteer to assign geographic coordinates to place names, we can imagine that geolocation metadata attached to Facebook posts or tweets could offer an alternative, especially when dealing with small, densely populated areas of cities rather than large regions like the Cotswolds or the Scottish Highlands.

I imagine that big-data approaches have a lot to offer the development of natural language processing, the ability of machines to process language as humans do. In some areas of NLP, such as named-entity recognition, machines can almost match humans’ ability to determine which words signify a person, an organization, or a place. As computers become better at thinking like us, they may begin to teach us the “truest” meaning of our own concepts.

