Citations, AALL citations and primary keys

Citation formats are an odd sort of thing: people have very different expectations as to what they really are and what they are supposed to do. Debate about this is more common than it used to be because, as we adopt electronic sources as our main resource, the nature of what we read has changed. So far, we have dealt with the situation mainly by ignoring it and sticking to old rules. Rarely do we get to the thought of what a citation can do, if given the chance. This is something which warrants exploring.

First, as I said, a great deal gets said about what a citation actually is and what it is supposed to do. For the purpose of understanding the actual role of citations in the legal world, however, much of this discussion is irrelevant. We need to address how citations are currently used in practice if we are to use them in constructing a useful online resource. We need to work from a functional analysis.

As currently used, a typical citation to an American court case consists of the caption of a case and a reference to the volume and page number of a print report series, for example: State v. Jones, 127 N.J. 411. To this is often added a year and perhaps a clarification of the court involved, but the citation itself is this initial string.

As a functional matter, a court decision citation shows us where to look for that decision, should we have that print reporter. Since those volumes are no longer used very often, the citation now serves more significantly as a well-recognized basis from which to cross-reference this material across diverse sources. Finally, a citation typically imparts a vague but somewhat useful bit of information about the document (i.e. jurisdiction, timeliness (by the value of the volume number), and level of authority).

It is important to note that since a page in a reporter volume may contain multiple documents (in the instance of table cases, etc.), the caption is necessary to uniquely identify the document. Logically, caption and page number together do not guarantee uniqueness. As a practical matter, however, identically named cases on the same page of a reporter occur so rarely as to be insignificant to human readers of print. So, it has worked.

When designing a digital repository for court decisions, this old citation scheme still works to the extent that it can be used as a common reference point, but it cannot be used as the type of unique identifier that is needed to construct a useable primary key for a metadata system. This is counterintuitive, but it is a fact that needs to be addressed.

Keep in mind that the citation's primary function was never to act as a unique identifier as such, but as a page reference. In addition, the specification of the reporter also gave some indication of authenticity of the text. Over the years, however, the authenticity aspect of citations has fallen by the wayside. Even before electronic legal research, the standard Bluebook citation evolved early on to refer primarily to an official or common source. Even where multiple sources were specified, there is little indication of exactly what text the author had accessed. Publishers facilitated this breakdown by including page references to the “official text,” with the result that it was impossible to say which print edition a lawyer accessed. So, although a citation may have been a reference to an “official” source, in reality, the text actually quoted was not guaranteed to be.

With the advent of electronic research, this problem has been exacerbated. Westlaw and Lexis are not, and have never been, considered an “official source” for any court. However, they are the de facto source for much, if not most, legal research done today. Nevertheless, the standard citation for a reference to material read from Lexis or Westlaw is the reference to the official print source.

So much is well known and accepted. It is therefore dishonest to insist on reference to an official print source as some sort of guarantee of authenticity. No one is looking at the print.

What remains is nevertheless useful. The official print citation is still a well-recognized common starting point for researchers to cross-reference the various versions of a court decision that are available. In addition, the titles of official reporters typically indicate other useful information, such as the jurisdiction and competence of the court.

The key idea, however, is the fact that the court decision citation, as currently used, is only a point of cross-reference. No more than that. Many will argue about authenticity and the convenience of a definite place reference, but in today's environment, those arguments are specious. No one is looking at the definite place reference aside from law review cite checkers. And even they will stop in the near future.

There is a down side to the print citation as it stands. It is expensive for new entrants to the publishing field to reproduce. It is anachronistic. It feeds the monopolistic trends that ought to be fought. Something new would be nice. Maybe something better.

Electronic Publishing and the Computer Heads that do it

For computer people and to some extent in the eyes of the lawyers who use them, citations ought to be and are thought of as a unique identifier of a court decision. A presumptive primary key for a metadata table. Most programmers I know have been perplexed and disappointed when they discovered it is not. When they find that not all decisions even have a citation (at least not in the sense that I am using the term here), it is disappointing indeed.

So, without a citation as a primary key, we cast about for alternatives. It quickly becomes apparent, however, that there is no one piece of data in a court decision that is necessarily unique. In fact, even combinations of data elements are subject to duplication.1

The above statement is important. Think of any combination of metadata that would be common to any court decision. Even a combination of date, docket and document type (e.g. an order, memorandum, opinion, etc.) is subject to duplication in the normal course of judicial business. The fact that it is very rare for a court to issue two opinions in the same case on the same day makes current schemes workable. But, particularly on the trial level, such things do happen. So, no matter what, an additional data element needs to be added to the native metadata in order to ensure uniqueness.

The unavoidable bottom line is that there is simply no existing unique identifier for court decisions. And, in order to construct a workable online system for storing and retrieving them, a unique identifier is essential. We have to make one.

In the world of libraries as well as database programming, this situation is actually a regular occurrence. The simple solution is to make something new that is unique. In library work, we would call this an accession number. Database programmers usually handle this with something like an autoincrement field. They are the same thing. An arbitrary identifier that will differentiate each instance.
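As a minimal sketch of the autoincrement approach, the following uses Python's built-in sqlite3 module. The table name, column names, and sample docket are my own assumptions for illustration; they are not drawn from any real court or vendor system. Two records with identical native metadata are stored, and the arbitrary accession number is the only thing that tells them apart.

```python
import sqlite3

# In-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE decision (
        accession_no INTEGER PRIMARY KEY AUTOINCREMENT,  -- arbitrary unique identifier
        court        TEXT NOT NULL,
        docket       TEXT NOT NULL,
        decided      TEXT NOT NULL,
        doc_type     TEXT NOT NULL
    )
    """
)


def add_decision(court, docket, decided, doc_type):
    """Store one decision and return the accession number the database assigned."""
    cur = conn.execute(
        "INSERT INTO decision (court, docket, decided, doc_type) VALUES (?, ?, ?, ?)",
        (court, docket, decided, doc_type),
    )
    conn.commit()
    return cur.lastrowid


# Two opinions in the same case on the same day: the native metadata collides,
# but each row still receives its own accession number.
first = add_decision("PA Super", "123 EDA 2008", "2008-02-14", "opinion")
second = add_decision("PA Super", "123 EDA 2008", "2008-02-14", "opinion")
```

Nothing about the decision itself is encoded in the number; it simply differentiates each instance, which is all an accession number is asked to do.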

Given that this is what needs to happen, what kind of identifier should we adopt? At the simplest level, a simple autoincrementing accession number would work. At that point, users could have all the traditional metadata information that has always been used to locate a decision, which, albeit imperfect, will work just as it always has. The storage and retrieval systems will also have what they need to differentiate documents. And, if users were to see the accession number, it would just be an odd piece of information that need not be considered by them. In fact, if one examines the databases of court decisions that are available online right now (including Westlaw, Lexis, etc.), everyone, in one way or another, is deriving and assigning the equivalent of an accession number to their documents. In the case of the courts and most free websites, one will note filenames that consist of an apparently random string either with or without other data that may be interpretable.

In the case of Lexis and Westlaw, the use of an accession system is particularly significant because those companies have advocated the adoption of their accession numbers as an alternate citation when citing their material. That these schemes have actually been adopted as such by the legal community makes this even more significant, particularly since these citations impart only the year of accession and the vendor. They indicate nothing about the court or jurisdiction, but they are still accepted as a de facto standard (see The Bluebook, Rule 18.1.1).

So, we have a situation where we have the need and the opportunity to establish a common system for assigning unique identifiers for our court decisions. If it is of a proper quality, it can act as an alternative citation to existing citations.

What we need is something that contains enough information, and is also short and simple enough for the citation to be easily memorable. In theory, it could be a compilation of all the available information about the decision, but that would be too long. Neither is this necessary. Since metadata will be stored about the decision, it is not necessary to repeat that information in order to construct a unique identifier. A long and complex citation will also be confusing, subject to error in both construction and subsequent use, and will hinder readability when used in legal writing. For all those reasons, a long and complex citation method would probably not see wide adoption. Simple is needed. Of course, simple will require compromises with the amount of information imparted, but that is par for the course. Something that tells us something, and which is unique to the document, will be better than anything yet. And, if it is simple, short, and will not interrupt the text of a legal document, it may well be used. Finally, as an open standard, such a citation should be useable as the basis of a cross-reference mechanism between citations in a document and the documents which are cited. In the context of an online system, this means it should operate as the basis of a link resolver.

AALL Citation Format

The AALL citation format was designed and intended to fit all of the requirements of a reformed citation and a primary key. AALL citations make use of abbreviations that are already common in legal writing, so their adoption would be fairly easy for the legal community. Their format (year, standardized abbreviation, accession number, e.g. 2008 PA Super 42) is easy to read, and imparts more information about a decision than a classic print citation (i.e. the year of decision as well as the court). The accession numbers are also a huge advantage because of their own simplicity. The intent of the AALL drafters is that the courts adopting media neutral citation make a practice of assigning these numbers to decisions as they are issued. The numbers themselves are a simple sequence, and so assigning them is a simple matter. Even should a court decide not to do this, however, an individual publisher or group can easily do it as well, assigning numbers to decisions as they are received.

Note that such a practice is very fault-tolerant. Since the accession number is a mere sequence, it does not, in itself, have special significance beyond being unique. So, for example, if a publisher were to discover additional documents they had not initially gathered, they can merely be assigned the next available numbers in the sequence. One can imagine a court clerk doing the same on occasion.
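The assignment practice described above can be sketched in a few lines. This is only an illustration under my own assumptions: the class name AccessionRegister is hypothetical, and I assume the sequence restarts for each combination of year and court, as the "2008 PA Super 42" form suggests. A real system would also need to persist the counters.

```python
from collections import defaultdict


class AccessionRegister:
    """Hands out the next sequence number for each (year, court) pair.

    Hypothetical helper for illustration; counters live only in memory.
    """

    def __init__(self):
        # Last number issued per (year, court); unseen pairs start at 0.
        self._last = defaultdict(int)

    def assign(self, year, court):
        """Return the next citation string for the given year and court."""
        key = (year, court)
        self._last[key] += 1
        return f"{year} {court} {self._last[key]}"


register = AccessionRegister()
a = register.assign(2008, "PA Super")
b = register.assign(2008, "PA Super")
c = register.assign(2008, "ND")

# A decision discovered late simply takes the next available number;
# nothing already assigned has to change.
late = register.assign(2008, "PA Super")
```

Because the number carries no meaning beyond uniqueness, a late arrival costs nothing more than one more step in the sequence.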

A more complex primary key would be wasteful in that it would repeat information already present in the rest of a metadata record. A certain amount of such information is useful to provide readability and memorability, but brevity is equally important. The practical solution is to keep the new data as short and simple as possible, and limit the repeated data to a bare minimum of key data. I suggest that the year of the decision and a standardized abbreviation for the court fit the requirements of brevity and essentiality very well. The sequential accession number is as simple and understandable as such a thing can be. It is simple, yet both flexible and unbounded.
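To show how little machinery a three-part key of this kind needs, here is a rough sketch of parsing and formatting it. The names NeutralCitation, parse_citation and format_citation are my own, and the pattern assumes a court abbreviation made only of letters, periods and spaces; abbreviations containing digits would need a different pattern or a lookup table of known courts.

```python
import re
from typing import NamedTuple


class NeutralCitation(NamedTuple):
    """The three parts of the key: year, court abbreviation, sequence number."""
    year: int
    court: str
    number: int


# Assumed shape: four-digit year, abbreviation, sequence number,
# separated by single spaces (e.g. "2008 PA Super 42").
CITE_RE = re.compile(r"^(\d{4}) ([A-Za-z][A-Za-z. ]*?) (\d+)$")


def parse_citation(text):
    """Split a citation string into its parts, or raise ValueError."""
    match = CITE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable citation: {text!r}")
    return NeutralCitation(int(match.group(1)), match.group(2), int(match.group(3)))


def format_citation(cite):
    """Render the parts back into the readable citation form."""
    return f"{cite.year} {cite.court} {cite.number}"


cite = parse_citation("2008 PA Super 42")
```

The same short string serves both as the readable citation and as the record's key, so there is nothing to keep in sync between the two.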

Finally, as with any such plan, external support from the user community is key. In the case of AALL citations, the law library community and the American Bar Association have endorsed the scheme. In addition, there are 16 American jurisdictions that have already implemented it. In a form only slightly different, the same basic citation style has also been adopted in BAILII (Britain and Ireland), AustLII (Australia), CanLII (Canada), HKLII (Hong Kong), NZLII (New Zealand), and SAFLII (Southern Africa).2 It is a proven and workable scheme.

For the purposes of online law publishing, there are very good reasons why the primary key of a metadata system should also be a workable citation. The most obvious is the most compelling: it makes link resolving and automated hypertext linking predictable and easy. Given that it is possible, we ought to insist on it.
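A rough sketch of what such a link resolver might look like follows. Everything specific in it is an assumption for illustration: the base URL, the URL layout, and the short list of court abbreviations are hypothetical, and a working resolver would draw its abbreviations from the adopted standard rather than a hard-coded list.

```python
import re

# Hypothetical values, for illustration only.
BASE_URL = "https://example.org/decisions"
KNOWN_COURTS = ["PA Super", "ND", "OK"]

# Longest abbreviations first, so a longer name wins over any shorter prefix.
_courts = "|".join(re.escape(c) for c in sorted(KNOWN_COURTS, key=len, reverse=True))
CITE_RE = re.compile(rf"\b(\d{{4}}) ({_courts}) (\d+)\b")


def resolve(year, court, number):
    """Turn the parts of a citation into a predictable URL."""
    slug = court.lower().replace(".", "").replace(" ", "-")
    return f"{BASE_URL}/{year}/{slug}/{number}"


def link_citations(text):
    """Wrap every recognized citation in the text in an HTML link."""
    def as_link(match):
        url = resolve(match.group(1), match.group(2), match.group(3))
        return f'<a href="{url}">{match.group(0)}</a>'

    return CITE_RE.sub(as_link, text)


linked = link_citations("See 2008 PA Super 42 for the rule.")
```

Because the citation is the key, the resolver needs no lookup table of page numbers or vendor identifiers; the address follows directly from the text of the citation.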

1 Date of issue and docket are close, but still not necessarily unique.

2 The only significant difference is that these jurisdictions place brackets around the year. They have the appearance: “[2008] CanSup. 85”.