Story Telling with EDXML
Events as Stories
EDXML data contains a sequence of data records called events. Each event represents a little story, quite literally: EDXML events can be translated into plain English. A data source generating a sequence of events is like a novel being written.
Two sequences of events of different types, such as those produced by two different transcoders, are like story lines. Each of them is developing in its own way and at its own pace. Many event types can be defined to represent all sorts of things. Think of a financial transaction, an appointment in an agenda, a PDF document, a Twitter message, virtually anything.
Machines can read these stories and understand what the story is about. This is achieved by including a kind of grammar with the data.
Ontology
This grammar is contained in the EDXML ontology. Before a data source outputs any EDXML events, it will first output an ontology containing the definitions of all types of events that will appear in the data that follows. These definitions explain the structure and context of the story that is about to unfold. Context helps to understand the meaning of all the happenings (events) that make up the story line.
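The ordering described above, definitions first and events second, can be sketched in a few lines of Python. This is a minimal illustration and not the EDXML SDK: the `EventType` and `StreamReader` classes, the `transaction` event type and its property names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EventType:
    """Hypothetical event type definition: a name plus the properties it allows."""
    name: str
    properties: frozenset


@dataclass
class StreamReader:
    """Accepts ontology definitions first, then events that refer to them."""
    event_types: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def read_ontology(self, definitions):
        # Remember every event type that the source announces up front.
        for definition in definitions:
            self.event_types[definition.name] = definition

    def read_event(self, type_name, values):
        # An event can only be interpreted if its type was defined beforehand.
        if type_name not in self.event_types:
            raise ValueError(f"event type {type_name!r} has not been defined")
        unknown = set(values) - self.event_types[type_name].properties
        if unknown:
            raise ValueError(f"undefined properties: {sorted(unknown)}")
        self.events.append((type_name, values))


reader = StreamReader()
reader.read_ontology([EventType("transaction", frozenset({"payer", "payee", "amount"}))])
reader.read_event("transaction", {"payer": "alice", "payee": "bob", "amount": "12.50"})
```

An event of a type that was never announced is rejected, which mirrors the idea that the story cannot be understood without its context.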
Grammar
The event type definitions are similar to grammars used in natural languages. Grammar gives structure to a sentence, enabling us to understand the meaning of it. The idea of using grammar to give meaning translates nicely into the way EDXML works. We have all done grammar in school, so we will use that to illustrate. Consider this sentence:
In the above sentence, we marked the nouns and associated them with two different concepts. An EDXML ontology does the exact same thing by associating event properties with concepts. The introduction shows what these associations actually look like in some detail. In actual data, the concepts are more likely to be dull things like products, users or files rather than princesses and castles. The mechanism is the same.
The EDXML 'grammar' also relates different components of a sentence. For example, we can define a relationship between the person and the building:
In EDXML we do this by defining an inter-concept relation between two properties of an event type.
We can also mark the adjectives in the sentence and relate them to the concept that they refer to:
This is done by defining an intra-concept relation between two properties.
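As a rough sketch of these two kinds of relations, the Python below models them as plain data and uses them to phrase an event in English. It is a simplification under stated assumptions: the `Relation` class, the property names, the concept names and the example event are invented for illustration and are not taken from the EDXML specification or SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    kind: str         # "inter" relates two concepts, "intra" stays within one
    source: str       # name of the source property
    target: str       # name of the target property
    description: str  # how the relation reads in plain English


# Hypothetical event type: every property is associated with a concept.
property_concepts = {
    "person-name": "person",
    "person-mood": "person",
    "building-name": "building",
}

relations = [
    # Relates the person to the building.
    Relation("inter", "person-name", "building-name", "lives in"),
    # Relates an adjective to the concept it refers to.
    Relation("intra", "person-name", "person-mood", "is"),
]


def describe(event, relation):
    """Phrase one relation of an event as a short English sentence."""
    return f"{event[relation.source]} {relation.description} {event[relation.target]}"


# Illustrative event, not the example sentence from the original text.
event = {
    "person-name": "the princess",
    "person-mood": "happy",
    "building-name": "the castle",
}
sentences = [describe(event, relation) for relation in relations]
```

With these assumed definitions, `sentences` holds one phrase per relation, which is the sense in which relations let a machine read an event as a small story.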
Combining definitions of event types, properties, concepts and relations, machines can read the event data much like how humans read a book.
Concepts
We already mentioned concepts. Concepts are a key feature in EDXML. Their role is similar to that of characters in a story. The ontology introduces the various characters that feature in the data. They are merely mentioned and listed up front; the ontology does not contain any spoilers regarding how they will develop in the story.
Like the characters in a real novel, EDXML concepts have an emergent nature: only by reading the full novel and following its story lines does the reader eventually get to know the characters. In EDXML, concepts are definitions of items whose information lies scattered across multiple events, which may even be generated by multiple data sources. By reading the full data set, all knowledge about a particular concept can be correlated automatically. This process is called concept mining in EDXML.
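The idea behind correlating scattered information can be sketched with a small union-find over facts. This is not the algorithm used by any EDXML tool, only a toy illustration. It assumes that every event lists facts about a single item and that two events sharing a fact describe the same item. The source names, fact names and values are made up.

```python
from collections import defaultdict

# Each hypothetical event lists facts that describe one and the same person.
events = [
    {"source": "hr-system", "facts": {("email", "ann@example.com"), ("name", "Ann Lee")}},
    {"source": "chat-logs", "facts": {("email", "ann@example.com"), ("nickname", "annie")}},
    {"source": "door-logs", "facts": {("badge", "B-17"), ("name", "Bob Roe")}},
]


def mine_concepts(events):
    """Merge events that share at least one fact into concept instances."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    def union(a, b):
        parent[find(a)] = find(b)

    for event in events:
        facts = sorted(event["facts"])
        for fact in facts:
            find(fact)  # register every fact, even in single-fact events
        for fact in facts[1:]:
            union(facts[0], fact)  # all facts of one event describe one item

    groups = defaultdict(set)
    for fact in parent:
        groups[find(fact)].add(fact)
    return sorted(sorted(group) for group in groups.values())


concepts = mine_concepts(events)
```

Here the two events that mention the same e-mail address end up in one group, even though they come from different sources. Real data needs more care than this sketch takes, since a shared value such as a common name does not always identify the same item.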
So the stories that EDXML events represent may be told by multiple data sources in concert. Each source introduces its own event types and concepts and independently outputs its own event sequence. Interesting things may happen when the stories told by the various sources are somehow related. For example, the same character may appear in multiple sources and each source might tell about different aspects of the same character.
Stories: More is Better
Incorporating more data sources into data analysis operations can be challenging. The number of characters and story lines becomes overwhelming and database queries get complicated. Solving the puzzle becomes increasingly hard.
EDXML turns this upside down. The more diverse the data set, the more opportunity machines have to correlate data and produce high-grade knowledge. Each new property relation unlocks new reasoning paths that machines can use to discover new facts. Where cognitive strain forces humans to surrender, machines excel and come to their rescue.
Now the computer can guide the analyst instead of the other way around.
Copyright © The EDXML Foundation