Who created EDXML?
EDXML was created in 2009 by Dik Takken in a quest to find a data representation that is extensible, true to the original data, domain independent and not too complicated.
Since 2012, EDXML has been a formal specification, published under a permissive Creative Commons license.
Where does the name EDXML come from?
E is for Event, D is for Dataset. And XML is... well, you know what that is.
How is EDXML different from other versatile data serialization formats, like Apache Thrift, Protocol Buffers or Smile?
The biggest difference is the semantic information included in the data stream. It allows machines to 'understand' what the data means and how to process it, which makes it possible to build simple, reusable data processing components that automatically do 'the right thing' with your data.
Existing serialization formats require every system component to know in advance what the data means, so each component has to be programmed beforehand to interpret it. With EDXML, you could say that the basic application logic needed to process the data is packaged with the data itself, generated by the data source. Only the source needs to know how the data works; other systems 'learn' from the source.
EDXML is event based. Does every event need a timestamp?
No. The name 'event' suggests that it does, but a timestamp is not mandatory. Events can contain zero, one or multiple timestamps.
Can EDXML streams contain binary data, like PDF documents and photos?
Not directly. EDXML can refer to externally stored binary files by means of hashlinks. A hashlink is an EDXML data primitive that contains the SHA-1 hash of the file it refers to.
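To make the hashlink idea concrete, here is a minimal sketch of how a data source might compute the SHA-1 hash of an external file before referring to it. It assumes the hash is written as a lowercase hexadecimal string; the exact encoding a hashlink uses is defined by the EDXML specification, and the function name sha1_of_file is hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-1 digest of a file as a lowercase hex string.

    Hypothetical helper: assumes a hashlink stores the hex form of the hash.
    The file is read in chunks so large binaries (PDFs, photos) never
    need to fit in memory at once.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A source could then emit, for example, sha1_of_file(Path("report.pdf")) as the hashlink value and store the PDF itself elsewhere. Any consumer holding the file can recompute the hash to check that it is the file the event refers to.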
What limitations does EDXML have, representation-wise?
First of all, context is everything in EDXML. Without context, data has no meaning. Without meaning, computers have no clue what to do with it. So, you cannot just take an arbitrary bunch of information and call it an event. You need to define an event type first, which provides context for your data.
Second, information must be at least semi-structured. Events are made up of properties, some optional and some mandatory. Unstructured data, such as human-written text, can be stored as event content.
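The relationship between an event type, its mandatory and optional properties, and free-text content can be illustrated with a small model. This is a hypothetical sketch written for this FAQ, not the EDXML SDK or the specification's data model. It only shows why an event needs a type: without one there is nothing to check its properties against. Properties are modelled as lists because a property can hold several values, just as an event can hold several timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class EventType:
    # Hypothetical, simplified event type: a name plus the sets of
    # mandatory and optional property names. Not the EDXML SDK API.
    name: str
    mandatory: Set[str]
    optional: Set[str] = field(default_factory=set)

    def validate(self, properties: Dict[str, List[str]]) -> List[str]:
        """Return a list of problems; an empty list means the event fits."""
        problems = []
        for prop in sorted(self.mandatory):
            if not properties.get(prop):  # absent or empty value list
                problems.append(f"missing mandatory property '{prop}'")
        allowed = self.mandatory | self.optional
        for prop in sorted(properties):
            if prop not in allowed:
                problems.append(f"unknown property '{prop}'")
        return problems


@dataclass
class Event:
    # Structured, multi-valued properties plus optional unstructured content.
    event_type: EventType
    properties: Dict[str, List[str]]
    content: str = ""  # free text, e.g. the body of a human-written message

    def problems(self) -> List[str]:
        return self.event_type.validate(self.properties)
```

For example, with email = EventType("email", mandatory={"sender", "time"}, optional={"subject"}), an event that only has a subject is reported as missing both 'sender' and 'time'. The body of the message can still be carried as content.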
Can EDXML represent nested, hierarchical data structures?
Yes, but not within a single event. You can define parent-child relationships between events, creating tree structures. These relationships are implicit, encoded in the semantics. The event data itself does not show any hierarchical structure.
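Because the hierarchy lives in the semantics rather than in the event data, a consumer rebuilds the tree by combining flat events with a relationship definition. The sketch below assumes a hypothetical rule saying that an "attachment" event is a child of the "email" event sharing the same message_id value. The event types, property names and the PARENT_OF table are illustrative only and do not reflect how the EDXML specification actually expresses parent-child relationships.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flat event: (event_id, event_type, properties).
# Nothing in the event data itself points to a parent.
FlatEvent = Tuple[str, str, Dict[str, str]]

# Hypothetical relationship definition, standing in for the semantics an
# event type would carry: child type -> (parent type, shared key property).
PARENT_OF = {"attachment": ("email", "message_id")}


def build_tree(events: List[FlatEvent]) -> Dict[str, List[str]]:
    """Map each parent event id to the ids of its child events."""
    # Index every potential parent by (event type, key value).
    index: Dict[Tuple[str, str], str] = {}
    for event_id, etype, props in events:
        for parent_type, key in PARENT_OF.values():
            if etype == parent_type and key in props:
                index[(etype, props[key])] = event_id

    # Attach each child to the parent whose key value matches its own.
    children: Dict[str, List[str]] = defaultdict(list)
    for event_id, etype, props in events:
        if etype not in PARENT_OF:
            continue
        parent_type, key = PARENT_OF[etype]
        if key not in props:
            continue
        parent_id = index.get((parent_type, props[key]))
        if parent_id is not None:
            children[parent_id].append(event_id)
    return dict(children)
```

Children whose parent is not (yet) in the data set are simply left unattached. This fits the idea that the events themselves remain flat and independent.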
How is EDXML different from standards like DFXML and STIX?
Scope, mainly. A single EDXML stream can mix DFXML data and STIX data, while neither DFXML nor STIX can, in general, represent EDXML data.
Is EDXML a schemaless format?
No. The structure of events must be defined first, in the form of event types. However, introducing new event types is easy and can usually be done without software updates or downtime.
Does EDXML include a transport protocol?
No, it does not. You are free to use HTTP, message queueing systems or a pile of paper as a means of transport.