Story Telling with EDXML

Thinking about data as stories is tightly related to actual EDXML features like event types and concepts.

[Figure: a book as an analogy for semantic EDXML data, which is like a story with characters and story lines]

Events as Stories

EDXML data contains a sequence of data records called events. Each of these events represents a little story, quite literally: EDXML events can be translated into plain English. A data source generating a sequence of events is like a novel being written.
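To make the idea of translating an event into plain English concrete, here is a minimal sketch in plain Python. It does not use the EDXML SDK: the event record, the `PAYMENT_STORY` template and the `tell_story` helper are all hypothetical, and the placeholder syntax is Python's own `str.format`, not the template syntax defined by EDXML.

```python
# Hypothetical event: a plain mapping of property names to values.
event = {
    "sender": "alice@example.com",
    "recipient": "bob@example.com",
    "amount": "25.00",
}

# Hypothetical story template for a made-up "payment" event type.
# Placeholders use Python's str.format syntax, not EDXML template syntax.
PAYMENT_STORY = "{sender} transferred {amount} euros to {recipient}."


def tell_story(template: str, event: dict) -> str:
    """Render one event as an English sentence by filling in its properties."""
    return template.format(**event)


print(tell_story(PAYMENT_STORY, event))
```

The point of the sketch is only that the sentence is derived mechanically from the event's properties plus a definition that belongs to the event type, not to the individual event.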

Two sequences of events of different types, such as those produced by two different transcoders, are like story lines. Each of them is developing in its own way and at its own pace. Many event types can be defined to represent all sorts of things. Think of a financial transaction, an appointment in an agenda, a PDF document, a Twitter message, virtually anything.

Machines can read these stories and understand what the story is about. This is achieved by including a kind of grammar with the data.

Ontology

This grammar is contained in the EDXML ontology. Before a data source outputs any EDXML events, it will first output an ontology containing the definitions of all types of events that will appear in the data that follows. These definitions explain the structure and context of the story that is about to unfold. Context helps to understand the meaning of all the happenings (events) that make up the story line.
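The ordering described here, ontology first and events afterwards, can be sketched as a stream that a reader consumes in sequence. This is an illustration in plain Python under simplifying assumptions: the `data_source` and `read` functions, the tuple-based stream and the dictionary layout are hypothetical and do not reflect the actual EDXML serialization or SDK.

```python
from typing import Dict, Iterator, List, Tuple


def data_source() -> Iterator[Tuple[str, Dict]]:
    """Yield the ontology first, then the events that it describes."""
    # Hypothetical ontology: event type name -> list of property names.
    yield "ontology", {"payment": ["sender", "recipient", "amount"]}
    yield "event", {"type": "payment", "sender": "alice", "recipient": "bob", "amount": "25.00"}
    yield "event", {"type": "payment", "sender": "bob", "recipient": "carol", "amount": "10.00"}


def read(stream: Iterator[Tuple[str, Dict]]) -> List[Dict]:
    """Collect events, rejecting any event whose type was not defined beforehand."""
    known_types: Dict[str, list] = {}
    events: List[Dict] = []
    for kind, payload in stream:
        if kind == "ontology":
            known_types.update(payload)
        elif payload["type"] not in known_types:
            raise ValueError(f"event type {payload['type']!r} was never defined")
        else:
            events.append(payload)
    return events


print(len(read(data_source())))
```

Because the definitions arrive before the data, the reader never has to guess what an event means when it encounters one.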

Grammar

The event type definitions are similar to the grammars used in natural languages. Grammar gives structure to a sentence, enabling us to understand its meaning. The idea of using grammar to give meaning translates nicely into the way EDXML works. We have all done grammar in school, so we will use that to illustrate. Consider this sentence:

[Figure: an example sentence showing marked concepts]

In the above sentence, we marked the nouns and associated them with two different concepts. An EDXML ontology does the exact same thing by associating event properties with concepts. The introduction shows what these associations actually look like in some detail. In actual data, the concepts are more likely to be dull things like products, users or files rather than princesses and castles. The mechanism is the same.

The EDXML 'grammar' also relates different components of a sentence. For example, we can define a relationship between the person and the building:

[Figure: an example sentence showing an inter-concept relation]

In EDXML we do this by defining an inter-concept relation between two properties of an event type.

We can also mark the adjectives in the sentence and relate them to the concept that they refer to:

[Figure: an example sentence showing an intra-concept relation]

This is done by defining an intra-concept relation between two properties.

Combining definitions of event types, properties, concepts and relations, machines can read the event data much like how humans read a book.
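As a rough sketch of how these pieces fit together, the following plain Python models one event type with properties, concept associations and both kinds of relations, then reads an event with it. Everything here is hypothetical: the `Relation` class, the property names and the relation wording are invented for illustration (loosely following the princess and castle example) and are not how the EDXML SDK represents an ontology.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Relation:
    kind: str         # "inter": relates two concepts; "intra": both properties describe one concept
    source: str       # name of the source property
    target: str       # name of the target property
    description: str  # str.format template describing the relation


# Hypothetical association of event properties with concepts.
CONCEPTS: Dict[str, str] = {"name": "person", "hair": "person", "residence": "building"}

# Hypothetical relations between properties of the same event type.
RELATIONS: List[Relation] = [
    Relation("inter", "name", "residence", "{name} lives in {residence}"),
    Relation("intra", "name", "hair", "{name} has {hair} hair"),
]


def read_event(event: Dict[str, str]) -> List[str]:
    """Describe every relation for which the event holds both properties."""
    return [
        relation.description.format(**event)
        for relation in RELATIONS
        if relation.source in event and relation.target in event
    ]


event = {"name": "the princess", "hair": "golden", "residence": "the castle"}
for sentence in read_event(event):
    print(sentence)
```

An event that lacks one of the related properties simply yields fewer statements, so the same definitions can serve events that are only partially filled in.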

Concepts

We already mentioned concepts. Concepts are a key feature in EDXML. Their role is similar to that of characters in a story. The ontology introduces the various characters that feature in the data. They are merely listed up front; the ontology does not contain any spoilers about how they will develop in the story.

[Figure: Egyptian fresco depicting various characters used to tell a story]

Like the characters in a real novel, EDXML concepts have an emergent nature: only by reading the full novel and following its story lines does the reader eventually get to know the characters. In EDXML, concepts are definitions of items about which the information lies scattered across multiple events, which may even be generated by multiple data sources. By reading the full data set, all knowledge about a particular concept can be correlated automatically. This process is called concept mining in EDXML.

So the stories that EDXML events represent may be told by multiple data sources in concert. Each source introduces its own event types and concepts and independently outputs its own event sequence. Interesting things may happen when the stories told by the various sources are somehow related. For example, the same character may appear in multiple sources and each source might tell about different aspects of the same character.
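The way a character emerges from events told by different sources can be sketched with a small merging routine. This is a deliberately simplified illustration in plain Python, not the concept mining algorithm used by EDXML implementations: it assumes that each record lists property values which one event attributes to a single person, and that two records sharing an identical property value describe the same person. The `records` data and the `mine_concepts` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Each record: property values that one event attributes to a single person.
records = [
    {"email": "alice@example.com", "phone": "555-0100"},  # from source A
    {"phone": "555-0100", "account": "NL01-1234"},        # from source B
    {"email": "bob@example.com"},                         # from source A
]


def mine_concepts(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge records that share a property value into one concept instance."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # (property, value) -> index of the first record holding it
    for i, record in enumerate(records):
        for item in record.items():
            if item in first_seen:
                parent[find(i)] = find(first_seen[item])
            else:
                first_seen[item] = i

    merged = defaultdict(dict)
    for i, record in enumerate(records):
        merged[find(i)].update(record)
    return list(merged.values())


for instance in mine_concepts(records):
    print(instance)
```

In this sketch neither source knows that the e-mail address and the account belong together; that fact only appears once both event sequences are read side by side.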

Stories: More is Better

Incorporating more data sources into data analysis operations can be challenging. The number of characters and story lines becomes overwhelming and database queries get complicated. Solving the puzzle becomes increasingly hard.

[Figure: a pile of jigsaw puzzle pieces as an analogy for a mixed data set]

EDXML turns this upside down. The more diverse the data set, the more opportunity for machines to correlate and produce high-grade knowledge. Each new property relation unlocks new reasoning paths that machines can use to discover new facts. Where cognitive strain forces humans to surrender, machines excel and come to their rescue.

Now the computer can guide the analyst instead of the other way around.
