Congratulations, Yuan

Congratulations to Yuan Jin on his Master’s thesis in Computer Science.

Bridging the Ontological Gap between Semantic Web and the RESTful Web Services

Abstract: Data are produced in large quantities and in various forms around the globe every day. Researchers depend on finding and obtaining the data they need to advance their work. As the demand to manage data grows, however, three problems hinder attempts to leverage the data effectively. The first is the semantic heterogeneity encountered when linking different data sources. Database designers create data with different semantics; even data within the same domain may differ in meaning. Users who want to acquire all the obtainable information have to write a different query for each set of semantics. One solution to this problem is the use of an ontology. An ontology is defined as a specification of the concepts of an agent (or a community of agents) and the relationships between them (Gruber 1995). Concepts and the relationships between them are extracted from the data to form a knowledge network. Other parties wishing to connect their data to the knowledge network can share, enrich and distribute the vocabulary of the ontology. Users can also query the ontology with any RDF query language (Brickley 2004). The use of ontologies is part of Web 3.0’s effort to provide a semantics-aware global knowledge network.

A second problem concerns new ways to access data resources with ontology information. People used to build application-specific user interfaces to databases, which were offline. Now many choose to expose data through Web Services. Web Services are systems that provide HTTP-based remote request services described in a machine-readable format (Haas and Brown 2004). They usually provide application (or web) programming interfaces to manage data. The difficulty is that Web Services are arriving in a world where applications rely on conventional ways to connect to data sources. For example, D2RQ (Bizer and Seaborne 2004) translates queries against an ontology into SQL queries and depends on JDBC to read from relational databases. Now the interfaces to these data sources are changing, and the Semantic Web world risks losing data sources as a result. If Web Services were to spread across the Internet, this lack of connection would keep me from applying the ontology to connect heterogeneous data sources.
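The query translation described above can be illustrated with a minimal Python sketch. It is not D2RQ's actual implementation or mapping language: the `MAPPING` table, the `ex:` prefixes, the `person` table and its rows are all hypothetical, and Python's built-in `sqlite3` stands in for a JDBC connection.

```python
import sqlite3

# Hypothetical mapping from an ontology property to (table, key column, value column).
# A real mapping language is far richer; this only shows the idea.
MAPPING = {
    "ex:name": ("person", "id", "name"),
    "ex:birthYear": ("person", "id", "birth_year"),
}


def pattern_to_sql(prop):
    """Translate the triple pattern (?s, prop, ?o) into a SQL query string."""
    table, key_col, val_col = MAPPING[prop]
    return f"SELECT {key_col}, {val_col} FROM {table} WHERE {val_col} IS NOT NULL"


def query_triples(conn, prop):
    """Run the translated query and present each row as an RDF-style triple."""
    rows = conn.execute(pattern_to_sql(prop)).fetchall()
    return [(f"ex:person/{key}", prop, value) for key, value in rows]


def demo():
    """Build a tiny in-memory database with made-up rows and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, "Person A", 1000), (2, "Person B", None)],
    )
    return query_triples(conn, "ex:birthYear")
```

The sketch only works because the data sit behind a SQL interface; once the same data are exposed solely through a Web Service, there is no table for the translated query to run against.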

A third problem (or constraint) is working within the specific project domain. I embed this work within a humanities cyberinfrastructure that integrates Chinese biographical, historical and geographical data. The data sources come in various forms: local and remote relational databases and RESTful Web Services. Working with both legacy databases and the new web application interfaces narrowed my choice of solutions. Commercial products provide ways to “ontologicalize” Web Services, but I argue that they are heavyweight (e.g. unnecessary components bundled with the product) and cost-prohibitive for small-scale projects like ours. Several mature open source solutions designed for relational databases provide no access, or very limited access, to Web Services. For example, D2RQ offers no apparent way to bring Web Services into its system, while OpenLink Virtuoso answers SOAP calls but cannot manage data from RESTful Web Services.

I propose to build a connection between ontologies and Web Services. I devise metadata to represent non-RDF Web Services in an ontology, and I revise the D2RQ code and create new data structures in it to support ontology queries over data from RESTful Web Services.
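As a rough illustration of the kind of metadata this proposal implies, the Python sketch below describes a RESTful service and uses that description to turn a JSON response into triples. It is an assumption-laden sketch, not the thesis's actual metadata format or D2RQ extension: the `SERVICE` structure, the `example.org` URL, the `ex:` properties and the sample payload are all hypothetical, and no network call is made.

```python
import json

# Hypothetical service description: which endpoint to call, how to mint a
# subject identifier, and which JSON field maps to which ontology property.
SERVICE = {
    "endpoint": "http://example.org/api/places/{id}",  # placeholder URL, never requested here
    "subject_template": "ex:place/{id}",
    "fields": {
        "name": "ex:name",
        "lat": "ex:latitude",
        "lon": "ex:longitude",
    },
}


def json_to_triples(service, payload):
    """Convert one JSON record from the service into RDF-style triples."""
    record = json.loads(payload)
    subject = service["subject_template"].format(id=record["id"])
    # Emit a triple only for mapped fields that are present in the response.
    return [
        (subject, prop, record[field])
        for field, prop in service["fields"].items()
        if field in record
    ]


# Made-up response body standing in for what a RESTful service might return.
SAMPLE_PAYLOAD = '{"id": 7, "name": "Sample Place", "lat": 30.0}'
```

Calling `json_to_triples(SERVICE, SAMPLE_PAYLOAD)` yields triples for the name and latitude only, since the sample record has no `lon` field. Keeping the field-to-property mapping as data rather than code mirrors how a mapping file separates the ontology from the access mechanism.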