The internet has made undreamt-of dreams come true, and it has left us with a few challenges to cope with...

The internet has given us a wonderful means of communication: e-mail. Fast, uncomplicated, cheap. It travels as easily to the next-door neighbour as to the other end of the world. E-mail has not replaced other means of communication; it has added to them. It has made it easier to communicate with faraway family and friends, to inform larger groups of people at little cost and to sound out opinions quickly through a mailing list. E-mail has broadened our scope and has made the pace of life a little faster.

The world wide web has made tremendous amounts of information available through one single window, the browser: easily, and mostly for free. It has made life easier when checking a train timetable, buying a book or planning a holiday. It is an encyclopaedia for any imaginable question. The overload of information is not always a blessing, however: in the process of searching, we lose quite some time surfing.

The fascinating world of the internet is accessible from every corner of the world and connects every corner of the world. It has made our world small and big at the same time.

The internet is a network for digital communication: sending, receiving, exchanging, pushing, pulling, searching and sorting out information. The communication all happens between computers, digitally, until it reaches our eyes, our ears or, for the visually impaired, our fingers. For us it is information; for computers it is data, digital data: 0100011100010100100010100011. We thank the computer for the increase in communication, and we curse it for the information overload. We wish the computer would sort out the information better before presenting it to us.

The more meaningful data are to a computer, the better it can process, store and re-use the data received. The more a computer knows about the data, the better it can position them within the data already available. Meaningful data allow for better comparison.

Structure lends meaning to data. A computer recognises data by the position a piece of data takes within a structure. Here we are only concerned with data and structures that in the end are also of use to us as users. We are not concerned with data and structures that exist exclusively for the benefit of the computer, like protocols, error messages, port numbers, internal addresses and other technical data. So in fact we are looking at data and structures that are meaningful both to the computer and to us as users.

Information contained in a database or a spreadsheet qualifies as structured data, both for the computer and for us. A text document also has a structure with meaning to us. This meaning is, however, not conveyed to the computer. The structure is expressed in the layout, which has a meaning to us but no meaning, or at least no comparable meaning, to a computer. Such data we call "unstructured data". For data in a text document to gain meaning for a computer, they need to be accompanied by an additional structure.

What happens if a computer receives data that are neatly structured, but with a structure that is unknown to that computer? The computer will discard the structure information to the extent that it cannot interpret it. As a result, valuable knowledge about the data is thrown away, and the data become meaningless for the receiving computer.
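A minimal Python sketch of this loss, assuming a receiving system that understands only a hypothetical vocabulary of element names (`KNOWN_ELEMENTS`). The text "Landgericht Berlin" still arrives, but the receiver can no longer tell that it was the name of a court: that part of the structure information has been discarded.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical vocabulary: the only element names this receiving system understands.
KNOWN_ELEMENTS = {"judgment", "date"}

def receive(xml_text: str) -> List[Tuple[Optional[str], str]]:
    """Return (element name, text) pairs; names the receiver cannot interpret become None."""
    received = []
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if not text:
            continue  # skip elements with no text of their own, such as <judgment>
        name = elem.tag if elem.tag in KNOWN_ELEMENTS else None
        received.append((name, text))
    return received

incoming = ("<judgment><date>03042003</date>"
            "<court>Landgericht Berlin</court></judgment>")
print(receive(incoming))
# → [('date', '03042003'), (None, 'Landgericht Berlin')]
```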

In conclusion: to enable the computer to sort out the information better for us, we need to come to grips with two challenges: creating common structures for unstructured data, and ensuring that computers understand each other's data structures. The obvious answer to both challenges seems to be: create a platform of standard structures through which computers can exchange data. A platform where computers speak the same language.

HTML and XML

Traditionally, information is exchanged in the form of a document. The world wide web has continued this tradition. It takes advantage of a simple but effective layout tool: HTML. HTML helps to present digital information in a nice and clear form: a document. To a computer, however, HTML is of little help in understanding the data. So XML was created to fill this gap and inform the computer about the data it receives. With XML the computer can better process and store data, and it sends data that are better understood by other computers. XML is a means to describe the structure of data. It does so by encapsulating data between an opening tag, <element>, and a closing tag, </element>. The name of the element describes the data it encapsulates: <date> 03042003 </date>.
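A minimal sketch of what the element name gives a computer, using Python's standard `xml.etree.ElementTree`: the program asks for the date by name instead of guessing it from the layout. The enclosing <filing> and <party> elements are hypothetical, not part of any standard structure. Note that the name only says that 03042003 is a date; whether it reads day-month-year or month-day-year has to be fixed by the definition of the structure.

```python
import xml.etree.ElementTree as ET

# A small document; <filing> and <party> are hypothetical element names.
document = "<filing><party>Jansen</party><date> 03042003 </date></filing>"

root = ET.fromstring(document)

# The element name tells the program which piece of data it is looking at,
# so it can retrieve the date directly rather than inferring it from layout.
date_text = root.findtext("date").strip()
print(date_text)  # → 03042003
```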

It would be counterproductive if a separate structure had to be created for each set of data. So in practice standard structures are created. If one wishes to encapsulate information in a structure ("marking it up") so as to turn it into data that can be interpreted by a computer, one just needs to choose one of the standard structures. The same applies to data which are already structured, but whose structure is described in a format particular to the application that handles the data, for instance the SQL schema of a database. That structure needs to be translated, transformed into a generally readable form: XML.
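The transformation from a database to XML can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. This is a naive illustration under stated assumptions: the table `judgment` and its columns are hypothetical, and the column names are copied verbatim into element names. In practice one would map the columns onto a chosen standard structure instead.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE judgment (id INTEGER, court TEXT, decided TEXT)")
conn.execute("INSERT INTO judgment VALUES (1, 'Landgericht Berlin', '2003-04-03')")

def rows_to_xml(cursor: sqlite3.Cursor, root_tag: str, row_tag: str) -> ET.Element:
    """Wrap each result row in <row_tag>, using the SQL column names as element names."""
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return root

cursor = conn.execute("SELECT id, court, decided FROM judgment")
xml_root = rows_to_xml(cursor, "judgments", "judgment")
print(ET.tostring(xml_root, encoding="unicode"))
# → <judgments><judgment><id>1</id><court>Landgericht Berlin</court><decided>2003-04-03</decided></judgment></judgments>
```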

Increasingly, HTML documents are (partly) generated from data contained in a database at the time they are downloaded. HTML is not capable of communicating the context a piece of data had within the database it originates from. The big win is that XML is capable of communicating this knowledge without preventing the data from being presented as an HTML document. So the data are meaningful to the computer and, at the same time, meaningful to us as information.

The use of computers is older than the internet. Networks turned standalone computers into standalone computer systems. The internet offers the possibility to turn these standalone computer systems into communicating systems. The interesting thing about XML is that it does not require existing systems to switch over to XML. It only expects a system to be able to receive data in XML, translate them into its own format, store and process them, and then spit out the data in XML again.
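That round trip can be sketched in Python, assuming a hypothetical system whose own internal format is a plain dictionary; the element names and the `status` processing step are invented for the example. Other systems never see the internal format, only the XML at the edges.

```python
import xml.etree.ElementTree as ET

def import_case(xml_text: str) -> dict:
    """Translate an incoming XML record into the system's own (dict-based) format."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

def export_case(case: dict, tag: str = "case") -> str:
    """Spit the internal record out as XML again."""
    root = ET.Element(tag)
    for name, value in case.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

incoming = "<case><court>Rechtbank Amsterdam</court><date>2003-04-03</date></case>"
case = import_case(incoming)      # stored in the system's own format
case["status"] = "registered"     # hypothetical internal processing step
print(export_case(case))
# → <case><court>Rechtbank Amsterdam</court><date>2003-04-03</date><status>registered</status></case>
```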

Now what?

After this quick review of XML, the challenges referred to at the beginning seem to have been surmounted: every computer should just spit out its data in XML, based on a standard structure. That way any computer can understand all of the data floating around in the digital world. Nothing stands in the way of a free flow of meaningful data...

Wrong! In fact we have only passed on to the next level of incompatibility. The expectation that a few standard structures will do the job amounts to the expectation that we are able to describe our world, our reality, in an all-encompassing way. One may object that we are only looking for structures to describe digital information. This digital information, however, is vast in quantity and has inherited the complexity of our world. For many centuries mankind has been working on describing our world as a whole, and it will take a few more. As long as this goal has not been reached, it is also out of reach for XML.


Cutting up the world

So what is the solution to this complexity? We should just cut up our reality into separate fields and make structures for each of these fields. Putting aside for the moment the dilemma of how to move data between those fields, we will now take a closer look at efforts in this direction in one field: the legal field. Is it possible to make all-encompassing standard structures for the legal field? In other words: can we organise all past, present and future legal data in one or more structures, so as to ensure their free flow around the world?

Standardisation and legal structures

In the USA, LegalXML is in the process of trying to create one structure for all legal documents of a particular kind, e.g. contracts, court filings, statutes and transcripts. Whether this is possible remains to be proven. At best, structures will be created that serve the legal domain of the USA. That is not a solution for international legal data exchange.

In view of the great difficulties encountered in the USA in achieving general standard structures, it has been concluded that in Europe it would be unrealistic to strive for one structure per document type. By virtue of its number of jurisdictions, its legal systems (Roman, Napoleonic, common law), its number of languages and its (legal) culture, the European legal landscape is too diverse to undertake a similar attempt with any chance of success. Rather, the view is that it is better to allow and encourage a greater number of standard structures to be created: structures created for very specific, actual and practical purposes. This is more of a bottom-up approach. On the one hand, each of these structures may cover a smaller amount of legal data; on the other hand, each structure is likely to be of more direct practical use. From an organisational point of view, an added advantage is that smaller communities with similar interests can agree on a standard structure more quickly. This approach is promoted by LEXML, the European Forum for XML in the legal domain. Other parts of the world will take their own policy decisions concerning the standardisation process in the legal domain. Beyond doubt, we will be faced with a wide variety of structures for legal data.

Interface needed

"Cutting up the world" offers only temporary relief. In the end, one is faced with many, many different structures. The data stemming from these different structures cannot be compared, are incompatible and cannot be exchanged. Unless we have an interface... An interface which translates data to data and structure to structure, to make the data comparable. To take the easiest example: the interface should tell that <date> in the first structure is the same as <datum> in the second, the same as <DATUM> in the third and the same as <fecha> in the fourth structure. This is an easy example, as it shows a one-to-one translation ("mapping"). More often than not, one will be faced with one-to-many, many-to-one or, worse, many-to-many mappings. More about that below, under "Archetypes".

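The one-to-one case can be sketched in a few lines of Python. This is a minimal illustration, not the Legal RDF Dictionary itself: `MAPPING` is a hypothetical lookup table, and the enclosing elements <vonnis> and <sentencia> are invented for the example. XML names are case-sensitive, so to a parser <date> and <DATUM> really are two different elements, which is why an explicit mapping is needed at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-to-one mapping from element names in four structures
# to one shared name. A real interface would also record which structure
# (namespace) each name belongs to.
MAPPING = {"date": "date", "datum": "date", "DATUM": "date", "fecha": "date"}

def normalise(xml_text: str) -> str:
    """Rename every element that has a one-to-one mapping; leave the others as they are."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = MAPPING.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

print(normalise("<vonnis><datum>03-04-2003</datum></vonnis>"))
# → <vonnis><date>03-04-2003</date></vonnis>
print(normalise("<sentencia><fecha>03-04-2003</fecha></sentencia>"))
# → <sentencia><date>03-04-2003</date></sentencia>
```

A plain lookup table like this breaks down as soon as one name corresponds to several others, or several names to one.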

Legal RDF Dictionary

Description of the Legal RDF Dictionary

The interface which does the mapping goes by the name of Legal RDF Dictionary. It is a dictionary, as it translates data from one structure to another. It is called RDF, as its core is written in the XML language "Resource Description Framework" ("RDF"). In principle, any computer language whose syntax is able to describe "multiple inheritance" is suited to describe the type of relationships needed for such an interface. The reasons why RDF was chosen are:

  • it is XML
  • it is a W3C standard
  • it is the cornerstone of the semantic web
  • it is simple and straightforward
  • it carries no overhead of unneeded features, but nevertheless
  • it has the benefit of extensions such as RDF Schema and DAML+OIL
  • it openly shows the mappings
  • it allows for multiple inheritance
  • it can be made publicly available on the web
  • it can handle XML structures just as well as other structures
  • it is namespace addressable
  • it is XPath addressable
  • it allows each dictionary interface to use parts of other dictionary interfaces
  • it allows for the building of a network of dictionary interfaces, the RDF Dictionary network
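To give an impression of what "it is XML", "it is namespace addressable" and "it allows for multiple inheritance" amount to in practice, here is a small Python sketch that writes an RDF/XML statement using only the standard library. The namespaces under http://example.org/ and the class names Decision and InWriting are hypothetical, and expressing the mapping with rdfs:subClassOf is an assumption; the actual vocabulary of the Legal RDF Dictionary is not reproduced here.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)

# Hypothetical namespaces: one for a German court-document structure,
# one for a dictionary's shared vocabulary.
STRUCT_DE = "http://example.org/structures/de#"
DICT = "http://example.org/dictionary#"

def describe(term: str, parents: list) -> ET.Element:
    """Build an rdf:Description stating that `term` is a subclass of each parent."""
    desc = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": term})
    for parent in parents:
        ET.SubElement(desc, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": parent})
    return desc

graph = ET.Element(f"{{{RDF}}}RDF")
# Multiple inheritance: one term with two parent classes.
graph.append(describe(STRUCT_DE + "Urteil", [DICT + "Decision", DICT + "InWriting"]))
print(ET.tostring(graph, encoding="unicode"))
```

Because every name is a full URI, another dictionary could refer to http://example.org/dictionary#Decision without copying its definition.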

The term "RDF Dictionary" was coined by John McClure. His RDF Dictionary differs somewhat from the RDF Dictionary described here. The basis and the goal of the two RDF Dictionary concepts are, however, the same. As laid down in the joint Statement of John McClure and Murk Muller of May 2001, the compatibility of the two approaches is ensured. The Legal RDF Dictionary is open source and licensed to the public (see the IPR notice in the preliminary draft). Any contribution to the Legal RDF Dictionary is made on this basis. Likewise, any mapping made with the help of a Legal RDF Dictionary presumes the right of anyone to make use of that mapping under the public license.

The RDF Dictionary concept is applicable at many levels: from one small particular domain or a small geographic area, to a national or bilateral level, and on to an international, supranational and finally global level. It is possible, even desirable, that RDF Dictionaries come into existence at all of these levels. These Dictionaries complement and reinforce one another by forming a network and sharing the same architecture. Each RDF Dictionary can take advantage of the work done for other RDF Dictionaries through the simple but very effective namespace mechanism provided by XML/RDF. The architecture allows for organic growth of each RDF Dictionary individually, as well as of the network of RDF Dictionaries. For the world wide web, the network of RDF Dictionaries will provide a network of structures on top of the network of hyperlinks.
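A minimal Python sketch of why the namespace mechanism lets independently built dictionaries be combined, assuming a dictionary can be modelled as a plain lookup from qualified element names to shared terms. All namespace URIs and term names are hypothetical. Two structures both use the local element name date, but because each name is qualified by its namespace, the two entries do not collide when the dictionaries are joined into a network.

```python
from typing import Dict

Dictionary = Dict[str, str]

def qname(namespace: str, local: str) -> str:
    """Qualified name in Clark notation, as used by xml.etree.ElementTree: {namespace}local."""
    return f"{{{namespace}}}{local}"

COURT = "http://example.org/structures/court#"     # hypothetical structure
FILING = "http://example.org/structures/filing#"   # hypothetical structure
DICT = "http://example.org/dictionary#"            # hypothetical shared vocabulary

# Two dictionaries maintained independently; both map an element called "date".
court_dictionary: Dictionary = {qname(COURT, "date"): DICT + "DateOfDecision"}
filing_dictionary: Dictionary = {qname(FILING, "date"): DICT + "DateOfFiling"}

def merge(*dictionaries: Dictionary) -> Dictionary:
    """Combine dictionaries into one network view, refusing conflicting entries."""
    network: Dictionary = {}
    for dictionary in dictionaries:
        for name, term in dictionary.items():
            if network.get(name, term) != term:
                raise ValueError(f"conflicting mappings for {name}")
            network[name] = term
    return network

network = merge(court_dictionary, filing_dictionary)
print(network[qname(FILING, "date")])  # → http://example.org/dictionary#DateOfFiling
```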

Archetypes - we need fuzziness to achieve precision

A one-to-one translation of terms originating from different jurisdictions often lacks the desired precision. The Legal RDF Dictionary therefore uses the concept of "Archetypes" to achieve a more precise translation, although from a computer's perspective this introduces some degree of "fuzziness". Take, for instance, the German term "Urteil", the similar Dutch term "vonnis" and the English terms "judgement" and "verdict". A set of Archetypes defines aspects of these terms, like "decision", "in writing", "public", "appealable", "enforceable" and "preceded by proceedings". A particular term, for instance "Urteil", is mapped to those Archetypes which apply to it. Suppose one finds the term "Urteil" in an XML instance document that has been marked up according to a data structure mapped to the network of RDF Dictionaries. One is then able to establish as precise a meaning as possible for that term in the context of one's own legal system. How? The mechanism that establishes the meaning relies on the fact that data structures stemming from one's own legal system are also linked to the RDF Dictionary network. The mechanism is illustrated with the example of Urteil/vonnis/judgement/verdict in a first draft of a Legal RDF Dictionary, to be found at http://www.lexdata.org/dictionary/ .
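The text does not spell out how the closest term is computed, so the following Python sketch swaps in one plausible technique, Jaccard similarity between sets of Archetypes, purely as an illustration. The assignment of Archetypes to each term is an assumption made for the example and is not taken from the draft Legal RDF Dictionary; the "de:", "nl:" and "en:" prefixes are likewise hypothetical.

```python
from typing import Dict, FrozenSet

# Illustrative assumptions only: NOT the mappings of the draft Legal RDF Dictionary.
ARCHETYPES: Dict[str, FrozenSet[str]] = {
    "de:Urteil": frozenset({"decision", "in writing", "public", "appealable",
                            "enforceable", "preceded by proceedings"}),
    "nl:vonnis": frozenset({"decision", "in writing", "public", "appealable",
                            "enforceable", "preceded by proceedings"}),
    "en:judgement": frozenset({"decision", "in writing", "public",
                               "enforceable", "preceded by proceedings"}),
    "en:verdict": frozenset({"decision", "preceded by proceedings"}),
}

def similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity: shared Archetypes divided by all Archetypes involved."""
    return len(a & b) / len(a | b)

def closest_term(foreign: str, own_prefix: str) -> str:
    """Pick the term from one's own legal system whose Archetypes best match the foreign term."""
    candidates = [t for t in ARCHETYPES if t.startswith(own_prefix + ":")]
    return max(candidates, key=lambda t: similarity(ARCHETYPES[foreign], ARCHETYPES[t]))

print(closest_term("de:Urteil", "en"))  # → en:judgement
```

With these assumed assignments, "judgement" shares five of the six aspects of "Urteil" and "verdict" only two, so "judgement" is proposed. Other scoring schemes are equally conceivable.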

Beyond RDF

At this point in time, the Legal RDF Dictionary does not describe the legal system in any way. It is concerned with nothing other than ensuring that data of an existing structure can automatically be translated into data of other existing structures. It therefore stays, as it were, within the digital world. It does not make any declaration about the real world or about the legal system as a whole. The Archetypes are the binding link between the structures. At this moment, the Archetypes are a mere list: there is no order, hierarchy or thesaurus-like structure to it. Should it, at a later stage, be felt desirable to extend the structure of the Archetypes and to build a bridge to the legal system as a whole, perhaps by using existing legal classification schemes, a good candidate to describe the structure of the Archetypes would be topic maps, which also have an XML syntax: XTM.

Opinions on the Legal RDF Dictionary

For one and a half years now, the Legal RDF Dictionary has been available as a method to make public the mapping between legal data structures, a method which at the same time offers the possibility to process data originating from a multitude of different data structures in the legal domain. Written publications and oral presentations throughout Europe have made lay and expert audiences aware of the Legal RDF Dictionary. The Legal RDF Dictionary has met with strong interest and support in the commercial, governmental and academic world, from both the technical and the user side. It is generally seen as an

  • effective
  • simple
  • flexible
  • transparent and
  • public

method to link and map data structures. It is recognised that the RDF Dictionary will allow computers to receive, handle, store, compare and redeliver XML data stemming from different structures accurately. It is also seen as an advantage that the RDF Dictionary architecture allows for organic growth, both of each RDF Dictionary individually and of the network of RDF Dictionaries as a whole. During this period, no competing concept dealing with the issues addressed by the Legal RDF Dictionary has appeared.


LEXDATA.org

Goal - FREE FLOW OF LEGAL DATA

To channel this interest and support and to encourage widespread use of the RDF Dictionary method in the exchange of legal data, it has been decided to found LEXDATA.org. The goal of LEXDATA.org is the FREE FLOW OF LEGAL DATA. It pursues this goal both by standardising the Legal RDF Dictionary and by applying the Legal RDF Dictionary to a "Schema Repository" containing legal data structures.

Stakeholders

LEXDATA.org is an organisation controlled by a limited number of "Stakeholders". The highest possible degree of objectivity is achieved through a balanced spread of the interests represented by the Stakeholders. The number of Stakeholders is kept small enough to ensure quick decisions, but large enough to have influence on the widespread acceptance of the Legal RDF Dictionary. Stakeholders are expected from the public sector (in particular Ministries of Justice and e-Government) and the private sector (in particular legal publishers and legal technology providers). Becoming a Stakeholder is a privilege that carries corresponding responsibilities. In return, a Stakeholder takes part in a leading development within a community of front-runners, resulting in a global standard for the exchange of legal data. Long-term investment decisions can be made with greater certainty. Stakeholders are either funding or non-funding. Non-funding Stakeholders can be parties which are not able to contribute funds, but are able to pledge manpower and/or other contributions comparable to the monetary contribution of a funding Stakeholder. Funding Stakeholders become co-owners of the Repository and of the first Legal RDF Dictionary. A Stakeholder co-decides on the Archetypes, which will set the example for others. The Repository with the Legal RDF Dictionary will be the first major hub for legal data. The hub opens up a myriad of possibilities for data enhancement, both for the public good and for commercial exploitation. A Stakeholder will have a competitive advantage by laying the cornerstone of the network of legal structures in the digital world.

Standardisation Participants

Any entity or person may apply for participation in the Standardisation Process. Acceptance will be based on qualification and expected contribution to the Specification. A Standardisation Participant is expected to be aware of the practical nature of the Legal RDF Dictionary and therefore to avoid academic discussions. A Standardisation Participant may apply for patronage by a Stakeholder to contribute to the participation fee. Help with partnering is offered. Whether, and under what conditions, a Stakeholder takes on patronage is at the Stakeholder's discretion. It is expected that Standardisation Participants will in principle be largely from the academic world.

The Standard

Standardisation is effected by writing a Specification (version 1.0), which describes the method in detail, according to the accepted requirements and practices for drafting a "Technical Report" and its progress to the status of "Recommendation". The Specification will be the result of discussions conducted primarily through mailing lists, but accompanied by face-to-face meetings. The standardisation proceedings will adhere to a strict scheme. Once the Specification has been agreed upon, it may be submitted to one or more long-standing standardisation bodies for a short review, acceptance, further publication and public support. At present the following bodies have been suggested: OASIS, W3C, CEN, CENELEC and ISO. This issue will, however, be decided and negotiated at the time the Specification has been finalised.

The Repository and first Legal RDF Dictionary

The Repository is, for a fee, in principle open to any structure which contains legal elements. A main requirement will be that the submitter maps his structure to the other structures through the Legal RDF Dictionary of the Repository, according to the method described in the latest available draft Specification. The list of Archetypes, the vocabulary of the Legal RDF Dictionary, will be created on an ad hoc basis, according to the needs of the structures submitted.
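The submission requirement and the ad hoc growth of the Archetype list can be sketched in Python. The actual procedure is left to the Specification; this sketch simply assumes that a mapping is a lookup from element names to sets of Archetype names, and the Dutch element names and the Archetype "date" are hypothetical.

```python
from typing import Dict, List, Set

def unmapped_elements(structure: Set[str], mapping: Dict[str, Set[str]]) -> List[str]:
    """Element names of a submitted structure that are not mapped to any Archetype."""
    return sorted(name for name in structure if not mapping.get(name))

def extend_archetypes(archetypes: Set[str], mapping: Dict[str, Set[str]]) -> Set[str]:
    """Add, ad hoc, every Archetype the submitted mapping needs but the list still lacks."""
    return archetypes.union(*mapping.values())

archetypes = {"decision", "in writing"}
submitted = {"vonnis", "datum", "rolnummer"}    # hypothetical submitted structure
mapping = {"vonnis": {"decision", "in writing", "enforceable"},
           "datum": {"date"}}

print(unmapped_elements(submitted, mapping))    # → ['rolnummer']
print(sorted(extend_archetypes(archetypes, mapping)))
# → ['date', 'decision', 'enforceable', 'in writing']
```

A submission would be accepted only once the first list is empty; the second function shows the vocabulary growing with the needs of the structures submitted.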

LEXDATA.org will not be involved in developing tools, other than a few simple open source tools for presentation, navigation and basic processing of data. LEXDATA.org puts the structure in place; the development of tools is left to others.

Murk Muller

LEXDATA.org is set up and will be led by Murk Muller. A Dutchman by birth, he practises as an attorney-at-law in Berlin, Germany. For twenty years he has dedicated himself to the efficient use of the computer in the legal domain. Fluent in a number of languages, he has practised as an attorney in various parts of the world. Murk Muller is often invited as a speaker and author on matters of XML and the law. LEXML, the forum for XML standardisation in the legal domain in Europe, was initiated by him. Murk Muller created the concept of the Legal RDF Dictionary. He has led a consortium of leading European governments and universities in the field, bidding for EU funding for an RDF Dictionary project. His independence has enabled Murk Muller to bring together parties from all across the legal domain to work together on harvesting the fruits offered by XML.

Taking part in LEXDATA.org

Requests to become a Stakeholder or Standardisation Participant may be submitted to info@lexdata.org. Please include relevant information on the background of your interest, so that your eligibility can be evaluated.

mm version 1.1.2003