Standards development roadmap: American judicial opinion metadata


Markup standards are cumbersome to develop, agree on, and implement.  They are also divisive: the expense and difficulty of elaborate standards argue for simplicity, but accuracy and depth of representation argue otherwise.  Like all complicated things, they benefit from a one-step-at-a-time, layered approach.  In the XML world, a really fine example of this approach is the markup standard for scholarly materials from the Text Encoding Initiative.  This page lays out a layered approach to judicial opinion metadata.

Originally, the idea was to produce a multilayered standard that could be used in conjunction with the OAI Protocol for Metadata Harvesting (as a series of progressively more elaborate metadata formats, in OAI terminology).  But the underlying thinking is useful beyond OAI: it can form the basis for metadata implementations that are (e.g.) carried around in the document itself as RDF, as a series of <META> tags, or what have you.  This document was originally written with OAI-PMH in mind, and that probably still shows in spots.
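
To make the "series of <META> tags" option a little more concrete, here is a minimal Python sketch that renders a flat Dublin Core record as HTML meta elements, using the common convention of a `schema.DC` link plus `DC.`-prefixed element names. The case, its values, and the choice of which Dublin Core element carries which piece of case information are hypothetical illustrations, not the first-layer mapping itself.

```python
from html import escape

# Hypothetical sample record; the values and element choices are illustrative only.
case = {
    "title": "Smith v. Jones",
    "creator": "United States Court of Appeals for the Second Circuit",
    "date": "1999-03-15",
    "type": "Text",
    "identifier": "http://example.org/opinions/99-1234",
}

def dc_meta_tags(record):
    """Render {dc_element: value or [values]} as HTML <meta> tags with the DC. prefix."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        # Dublin Core elements are repeatable, so accept either one value or a list.
        values = value if isinstance(value, list) else [value]
        for v in values:
            lines.append(f'<meta name="DC.{element}" content="{escape(v, quote=True)}">')
    return "\n".join(lines)

print(dc_meta_tags(case))
```

The same flat record could just as easily be serialized as RDF; the point is only that the metadata, not the transport, is what the layers define.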

The Big Idea

The idea, from the beginning, has been to build something simple that can then be made as elaborate as is needed. As it turns out, the OAI-PMH standard helps with this. It requires a very basic mapping of your metadata into unqualified Dublin Core; beyond that, you can use OAI to promulgate any assortment of metadata you want, as long as you provide an XML schema for its validation. This suggests a plan:

  • First, develop the very simple mapping of caselaw metadata into unqualified Dublin Core that OAI-PMH requires. That's done, and is described separately. This is the so-called Layer 1 schema. It does a pretty good job of providing basic metadata, but it's ruthless about what it leaves out. Its priorities are also cheerfully and unashamedly biased in favor of the common-law system in general and the American federal court system in particular.
  • Second, develop an XML schema that will represent caselaw metadata without the compromises that were necessary in the first schema, but still stay within reasonable bounds. By "reasonable bounds", we mean "using only data that most courts and publishers can easily cross-walk from their existing publishing and case-management systems". That means, on the one hand, that we should work on (e.g.) more sophisticated representations of the participants in a case, but (on the other hand) stop short of representations that require a lot of work in standardization, such as representing the procedural posture of a case. This Layer 2 schema can be thought of as a "workhorse" schema -- easy to make work with existing data collections, not imposing huge burdens, and avoiding lengthy standards-wrangling in the interest of getting something out there that people can use. Layer 2 is intended to do a better job than Layer 1 could at:
    • representing all the actors in a case -- judges, parties and their representatives
    • pulling in other sorts of case-related literature, such as briefs
    • answering any systemic objections to Layer 1's cheerful bias in favor of the common-law system in general and American courts in particular.
  • Third, develop a series of idiosyncratic schemas that represent whatever people want. Good targets for these Layer 3 schemas would be:
    • representation of "nationalisms" having to do with procedural matters or with the structure of the court system
    • more sophisticated representation of procedural posture in general
    • more sophisticated representation of citation and other cross-referencing systems
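
As a rough illustration of what the first layer involves, the Python sketch below wraps a case record in the `oai_dc` container that OAI-PMH uses for unqualified Dublin Core, and refuses any field that is not one of the fifteen Dublin Core elements. That refusal is exactly where the first layer is "ruthless": information such as attorneys or procedural posture has nowhere to go and must be dropped or folded into a generic element. The sample case and the element assignments are hypothetical, not the published mapping.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# The fifteen elements of unqualified Dublin Core (DCMES 1.1).
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xsi", XSI_NS)

def to_oai_dc(record):
    """Build an oai_dc:dc element from {dc_element: value or [values]}."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        # Anything outside the fifteen elements cannot be expressed at this layer.
        raise ValueError(f"not unqualified Dublin Core elements: {sorted(unknown)}")
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    root.set(f"{{{XSI_NS}}}schemaLocation",
             f"{OAI_DC_NS} http://www.openarchives.org/OAI/2.0/oai_dc.xsd")
    for element, value in record.items():
        for v in (value if isinstance(value, list) else [value]):
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = v
    return root

# Hypothetical case; the element choices are illustrative, not the actual mapping.
case = {
    "title": "Smith v. Jones",
    "creator": "United States Court of Appeals for the Second Circuit",
    "date": "1999-03-15",
    "type": "Text",
    "identifier": "http://example.org/opinions/99-1234",
}
print(ET.tostring(to_oai_dc(case), encoding="unicode"))
```

A Layer 2 schema would replace the fixed element set with its own XML schema, which OAI-PMH permits as long as the schema is published for validation.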

Some implications

Also part of the development roadmap, but not addressed here, are all those things that start making sense once we have a fair number of participants and harvesters building large-scale applications. Examples would include:

  • a system for name authority control (or at least consistent external identification) for legal professionals such as judges and attorneys in particular, but perhaps also for parties
  • a registry for courts and court names, to be made part of the identifier scheme.
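
One way a court registry could feed an identifier scheme is sketched below in Python: every known name variant is normalized and mapped to a single registry code, and that code becomes part of each opinion identifier. The registry entries, the codes, and the `urn:example:opinion` prefix are all hypothetical, invented for this sketch; no such registry or identifier syntax has been agreed on.

```python
import re

# Hypothetical registry: codes, names, variants, and the URN prefix are illustrative only.
COURT_REGISTRY = {
    "us-ca2": {
        "name": "United States Court of Appeals for the Second Circuit",
        "variants": ["2d Cir.", "Second Circuit"],
    },
    "us-scotus": {
        "name": "Supreme Court of the United States",
        "variants": ["U.S. Supreme Court", "SCOTUS"],
    },
}

def normalize(name):
    """Lowercase and collapse runs of whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", name.strip().lower())

def build_index(registry):
    """Map every normalized official name or variant to its registry code."""
    index = {}
    for code, entry in registry.items():
        for variant in [entry["name"], *entry["variants"]]:
            key = normalize(variant)
            if index.get(key, code) != code:
                raise ValueError(f"name {variant!r} is claimed by two courts")
            index[key] = code
    return index

def make_identifier(index, court_name, docket):
    """Build an opinion identifier from a registered court code and a docket number."""
    code = index.get(normalize(court_name))
    if code is None:
        raise KeyError(f"court not in registry: {court_name!r}")
    return f"urn:example:opinion:{code}:{docket}"

index = build_index(COURT_REGISTRY)
print(make_identifier(index, "Second  Circuit", "99-1234"))
# prints: urn:example:opinion:us-ca2:99-1234
```

Keeping the name variants in the registry rather than in the identifiers means a court can pick up new variants without disturbing any identifier already issued.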