EDXML Ontologies

Ontologies are machine-readable descriptions of what the data landscape looks like.

Components

Every EDXML data stream starts with an ontology. The ontology is like the introduction of a novel, describing the characters that appear in the story that follows. It consists of the following components:

Concepts

Types of entities that occur in the data set. Depending on what the data is about, these may be persons, computers, bank accounts or anything else you wish to define.

Object Types

Types of data, like dates, bank account numbers or email addresses. Again, you can define anything you need to describe the data.

Event Types

Types of events that are in the data set. An event type defines one or more properties and the object type of their values, relations between properties, attachment types, and so on. Event types can be seen as templates for paragraphs in a novel, expressed in terms of object types and concepts.

Event Sources

Each event is tagged with the data source that it originated from. Data sources form a virtual tree structure and are identified by URIs such as:

/company/offices/stuttgart/clientrecords/2009/
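To make the tree structure concrete, here is a minimal Python sketch that treats a source URI as a path of segments. It assumes that URIs start and end with a slash, as in the example above. The helper names source_path, is_descendant and parent are hypothetical and are not part of the EDXML SDK.

```python
from typing import List


def source_path(uri: str) -> List[str]:
    """Split a source URI such as /company/offices/ into ['company', 'offices']."""
    # Assumption: source URIs begin and end with a slash, like the example above.
    if not (uri.startswith("/") and uri.endswith("/")):
        raise ValueError(f"not a source URI: {uri!r}")
    return [segment for segment in uri.split("/") if segment]


def is_descendant(uri: str, ancestor: str) -> bool:
    """True if uri lies in the subtree rooted at ancestor (or is ancestor itself)."""
    path, root = source_path(uri), source_path(ancestor)
    return path[: len(root)] == root


def parent(uri: str) -> str:
    """Return the URI of the parent node in the source tree."""
    path = source_path(uri)
    if not path:
        raise ValueError("the root source / has no parent")
    return "/" + "".join(segment + "/" for segment in path[:-1])


uri = "/company/offices/stuttgart/clientrecords/2009/"
print(is_descendant(uri, "/company/offices/"))  # True
print(is_descendant(uri, "/company/archive/"))  # False
print(parent(uri))  # /company/offices/stuttgart/clientrecords/
```

The sketch compares whole segments rather than raw string prefixes, so /company/off/ does not accidentally match /company/offices/.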

Ontology components define the structure of events at various levels of detail. At the top level, there are the event types. For example, an event type might represent an e-mail message. An event type can define one or more properties, such as a sender or recipient. Each property is associated with an object type and may also be associated with a concept. Finally, at the lowest level, an object type defines a data type.

These detail levels are illustrated in the following diagram.

The hierarchical structure of EDXML event types, properties, object types and data types.
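The same hierarchy can be sketched as a few Python dataclasses. This is only an illustration of how the levels nest. The class names are hypothetical, and the data type is a placeholder string rather than EDXML's actual data type syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Concept:
    name: str


@dataclass(frozen=True)
class ObjectType:
    name: str
    data_type: str  # lowest level of detail; a placeholder string in this sketch


@dataclass(frozen=True)
class Property:
    name: str
    object_type: ObjectType            # every property has an object type
    concept: Optional[Concept] = None  # the link to a concept is optional


@dataclass
class EventType:
    name: str
    properties: Dict[str, Property] = field(default_factory=dict)

    def add_property(self, prop: Property) -> None:
        self.properties[prop.name] = prop

    def data_type_of(self, property_name: str) -> str:
        # Walk down the hierarchy: event type -> property -> object type -> data type.
        return self.properties[property_name].object_type.data_type


email_address = ObjectType("email-address", "string")
person = Concept("person")

message = EventType("email-message")
message.add_property(Property("sender", email_address, person))
message.add_property(Property("recipient", email_address, person))
message.add_property(Property("subject", ObjectType("text", "string")))

print(message.data_type_of("sender"))  # string
```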

Modular

Most ontologies are monolithic, containing a complete description of all types of data. EDXML uses modular ontologies instead. Each data source only needs to output the definitions that are relevant for the data that it produces, which is its domain knowledge. This yields small, simple domain ontologies that are easy to create and maintain.

EDXML data streams from multiple data sources can be merged. This requires that the ontologies contained in both streams are merged as well. The EDXML specification outlines how machines can merge two EDXML ontologies into one. This technique is implemented in the EDXML SDK.
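The EDXML specification defines the actual merge procedure, and it is not reproduced here. As a rough illustration only, this Python sketch treats an ontology as nested dictionaries (kind, then name, then definition). It merges two ontologies by taking the union of their definitions and refuses to merge when the same name is defined differently. It ignores versioning, and the names merge and IncompatibleOntologies are hypothetical, not SDK functions.

```python
from typing import Any, Dict

# kind ("object_types", "concepts", "event_types", ...) -> name -> definition
Ontology = Dict[str, Dict[str, Dict[str, Any]]]


class IncompatibleOntologies(Exception):
    """Raised when both ontologies define the same name differently."""


def merge(a: Ontology, b: Ontology) -> Ontology:
    # Start from a copy of a so that neither input is modified.
    merged: Ontology = {kind: dict(definitions) for kind, definitions in a.items()}
    for kind, definitions in b.items():
        target = merged.setdefault(kind, {})
        for name, definition in definitions.items():
            existing = target.get(name)
            if existing is None:
                target[name] = definition  # only b knows it: add it
            elif existing != definition:
                raise IncompatibleOntologies(f"{kind} {name!r} is defined differently")
            # identical definitions in both: nothing to do
    return merged


mail_server = {
    "object_types": {"email-address": {"data_type": "string"}},
    "event_types": {"mail-sent": {"sender": "email-address"}},
}
crm = {
    "object_types": {"email-address": {"data_type": "string"}},
    "event_types": {"customer-created": {"email": "email-address"}},
}
combined = merge(mail_server, crm)
print(sorted(combined["event_types"]))  # ['customer-created', 'mail-sent']
```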

The final result is composable knowledge. By combining EDXML data sources, one can compose a body of knowledge that correlates objects and concepts from various domains.

Merging distributed ontologies allows semantic data to be integrated like the teeth of a zipper.

Ontology Compatibility

It might happen that two ontologies define the same thing in different ways. For example, both ontologies might describe an object type for storing e-mail addresses. Now, the following situations can occur:

Two distinct object types for an e-mail address
The same object type, defined differently by each ontology

The former situation may occur when one data source outputs an object type named email while the other defines email-address, even though both have the same meaning. Because these are two distinct object types, machines will never notice e-mail addresses that occur in both data sources. In other words, a computer will not recognize when two sources talk about the same thing, so the data will not be linked and correlated.
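A small Python sketch makes the consequence visible. It assumes, for illustration, that a machine identifies an object by the pair of its object type and its value. The event layout and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An event maps each property to an (object type, value) pair.
Event = Dict[str, Tuple[str, str]]


def objects(events: List[Event]) -> Set[Tuple[str, str]]:
    """All objects in a stream, identified by object type *and* value."""
    return {obj for event in events for obj in event.values()}


def shared_objects(a: List[Event], b: List[Event]) -> Set[Tuple[str, str]]:
    """Objects that occur in both streams, i.e. points where the data can be linked."""
    return objects(a) & objects(b)


mail_server = [{"sender": ("email", "alice@example.com")}]
crm = [{"contact": ("email-address", "alice@example.com")}]
print(shared_objects(mail_server, crm))  # set(): same address, different object types

crm_shared = [{"contact": ("email", "alice@example.com")}]
print(shared_objects(mail_server, crm_shared))  # {('email', 'alice@example.com')}
```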

The latter situation can occur when two data sources define an object type with the same name, but with differing definitions. For example, the two definitions might specify different descriptions of the object type. In this case, the two ontologies are said to be incompatible and cannot be merged automatically.

EDXML provides solutions for both problems by means of versioning and ontology bricks.

Ontology Bricks

Definitions of object types and concepts can be shared between data sources by keeping them in an ontology brick. Ontology bricks are building blocks for constructing modular EDXML ontologies that are logically interconnected and mutually compatible. The EDXML Foundation maintains a public collection of shared bricks on GitHub.

Note that ontology bricks only define object types and concepts. Event types and sources are specific to a single data source and must not be shared. Each data source can define its own domain-specific event types in terms of object types and concepts that it shares with other data sources. This yields modular domain ontologies that can be merged on demand.
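Here is a minimal Python sketch of the idea, assuming a hypothetical brick held in a dictionary. Real bricks maintained by the EDXML Foundation are not modeled here. Each data source builds its domain ontology by copying object types and concepts from the brick and adding its own event types. Because the shared part is identical everywhere, the resulting domain ontologies stay compatible.

```python
from typing import Dict

# Hypothetical shared brick: it holds only object types and concepts.
BRICK = {
    "object_types": {"email-address": "string", "datetime": "datetime"},
    "concepts": {"person": "A human being"},
}


def domain_ontology(event_types: Dict[str, Dict[str, str]]) -> dict:
    """Combine the shared brick with event types specific to one data source.

    event_types maps an event type name to {property name: object type name}.
    """
    for event_type, properties in event_types.items():
        for prop, object_type in properties.items():
            if object_type not in BRICK["object_types"]:
                raise KeyError(f"{event_type}.{prop}: unknown object type {object_type!r}")
    return {
        "object_types": dict(BRICK["object_types"]),  # shared, identical everywhere
        "concepts": dict(BRICK["concepts"]),          # shared, identical everywhere
        "event_types": event_types,                   # private to this data source
    }


mail = domain_ontology({"mail-sent": {"sender": "email-address", "time": "datetime"}})
crm = domain_ontology({"customer-created": {"email": "email-address"}})

# Both domain ontologies carry identical shared definitions, so they are compatible,
# while their event types stay separate.
print(mail["object_types"] == crm["object_types"])  # True
```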

This principle of shared versus domain-specific ontology components is similar to the principle of double articulation in ontology engineering, where ontologies are composed of domain axiomatizations and application axiomatizations.

Ontology Evolution

Upgrading a data model that is shared by multiple systems can be a true nightmare: upgrading one system exposes all the others to data they cannot handle.

EDXML ontologies are versioned and the EDXML specification defines a procedure to perform automatic ontology upgrades. In fact, data sources can output upgrades in the middle of an EDXML data stream. These upgrades flow along with the data to all system components that consume the data stream. These system components process the upgrades as part of their regular operation.
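The sketch below illustrates how versioning lets a consumer process an upgrade as part of its regular operation. It assumes each definition carries an integer version and that a higher version supersedes a lower one. The EDXML specification also restricts which changes an upgrade may make, and this sketch does not check those rules. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Definition:
    version: int
    description: str


class IncompatibleDefinitions(Exception):
    """Raised when two definitions share a name and version but differ."""


def apply_upgrade(current: Dict[str, Definition],
                  incoming: Dict[str, Definition]) -> Dict[str, Definition]:
    result = dict(current)
    for name, new in incoming.items():
        old = result.get(name)
        if old is None or new.version > old.version:
            result[name] = new  # unknown or newer definition: adopt it
        elif new.version == old.version and new != old:
            raise IncompatibleDefinitions(f"{name!r} version {new.version} differs")
        # an older incoming version is ignored; the consumer already knows a newer one
    return result


consumer = {"email-address": Definition(1, "e-mail address")}
# An upgrade arriving in the middle of the data stream:
consumer = apply_upgrade(consumer, {"email-address": Definition(2, "an e-mail address")})
print(consumer["email-address"].version)  # 2
```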

Knowledge Representation

When an ontology is combined with EDXML events, knowledge is generated. As this knowledge only emerges as a result of processing the actual data, we say that the knowledge in EDXML is emergent.
