Citation metadata for court decisions presents special challenges for reference managers like MLZ. The persuasive force of statutory law depends in part on consistent and logical presentation, and as a result it is often (relatively) easy to derive minimal but accurate descriptive metadata from statutes published to the Web.
Court judgments are a very different story. Here, the authoritative force of the document derives more immediately from the police power that stands behind it. Court orders obviously need to be readily identifiable on their face: but where court administration is decentralized (as in the US), courts as a class have no special incentive to standardise the presentation of descriptive metadata—the construction of systems for referencing precedent can be left as a task for the legal profession and surrounding community to sort out. The result is a good deal of variety in the arrangement of these details on the page. People cope with this just fine: but extracting uniform structured metadata from the text of court judgments can be hard work for a computer.
The pages supplied by Fastcase do not contain structured metadata per se, but the service does impose a degree of organization on court-supplied document details: essential identifying information is centered (in HTML) and it is arranged in a consistent order. We leverage this predictability with a few simple heuristics to identify and select the metadata elements noted in the illustration to the right. These have the following characteristics:
- Case name
- A simplified case name is taken from the “citation” line displayed above the text proper.
- Court hint
- The “citation” line also provides an abbreviated indicator of the court deciding the case. For Federal cases only, this is mapped to the MLZ jurisdiction variable.
- Decision date
- The “floor” of the header is assumed to be the decision date. Multiple dates (such as the date of argument) may be listed, and the decision date may or may not be preceded by a descriptive label. The “best” date is identified by heuristics coded into the translator: scraped items should be checked for accuracy against the text.
- Reporter cites
- Reporter cites are typically listed first in the header information, in the customary volume-reporter-page format. Certain reporters in fact represent vendor-neutral citations, and these are handled specially by the translator (see below). Separate items are created in the MLZ database for each listed reporter, mutually linked via the Related tab in the MLZ item info panel.
- Court name
- To identify the court name we scan up from the dates, stopping where a line containing “court” or “panel” is encountered. For state court decisions only, the state is also determined from the court name. The collected lines are then combined and stripped down to a minimal description: this field too should be checked for accuracy against the text.
- Docket no.
- The docket number is the least regular of the metadata elements, and the last thing that we hunt for, after the dates and the court name have been set aside. Where an order applies to multiple cases, the docket numbers of all will be pulled in by the translator. Heuristics are again involved, and this field should be checked for completeness and accuracy. Note that vendor-neutral cites for New Mexico cases are listed in the docket number area of the header information, and are exceptionally interpreted as such by the translator.
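To make the order of operations concrete, here is a rough Python sketch of the header pass described in the list above. It is an illustration only, not the translator's actual code: the function names, the regular expressions, the "prefer a line labelled decided or filed" rule, and the sample header are all assumptions made for the example.

```python
import re
from datetime import datetime

# Hypothetical patterns; the translator's real heuristics are not shown here.
CITE_RE = re.compile(r"^(\d+)\s+([A-Za-z][A-Za-z0-9. ]*?)\s+(\d+)$")
DATE_RE = re.compile(r"([A-Z][a-z]+\.? \d{1,2}, \d{4})")


def parse_date(text):
    """Return the first date found in a header line, or None."""
    m = DATE_RE.search(text)
    if not m:
        return None
    raw = m.group(1).replace(".", "")
    # Assumes English month names; odd abbreviations ("Sept") are not handled.
    for fmt in ("%B %d, %Y", "%b %d, %Y"):
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            pass
    return None


def parse_header(lines):
    """Split the centered header lines into cites, court, dockets and date."""
    lines = [ln.strip() for ln in lines if ln.strip()]

    # 1. Reporter cites come first, in volume-reporter-page form.
    cites = []
    while lines and CITE_RE.match(lines[0]):
        vol, reporter, page = CITE_RE.match(lines.pop(0)).groups()
        cites.append({"volume": vol, "reporter": reporter, "page": page})

    # 2. Dated lines form the "floor" of the header.
    dated = []
    while lines and parse_date(lines[-1]):
        dated.insert(0, lines.pop())
    # Assumed tie-breaker: prefer a labelled line, else take the last date.
    labelled = [ln for ln in dated if re.search(r"decided|filed", ln, re.I)]
    best = parse_date((labelled or dated)[-1]) if dated else None

    # 3. Scan up from the dates until a line mentions "court" or "panel".
    court_lines = []
    for k in range(len(lines) - 1, -1, -1):
        court_lines.insert(0, lines[k])
        if re.search(r"court|panel", lines[k], re.I):
            del lines[k:]  # set the court lines aside
            break
    else:
        court_lines = []  # nothing matched; leave the remaining lines alone

    # 4. Whatever looks like a docket number is collected last.
    dockets = [ln for ln in lines if re.match(r"(Case\s+)?Nos?\.", ln)]

    return {
        "cites": cites,
        "date": best,
        "court": " ".join(court_lines).rstrip("."),
        "dockets": dockets,
    }


# Invented header, for illustration only.
sample = [
    "123 F.3d 456",
    "JOHN DOE v. RICHARD ROE",
    "No. 96-1234",
    "United States Court of Appeals,",
    "Ninth Circuit.",
    "Argued January 5, 1997",
    "Decided March 3, 1997",
]
result = parse_header(sample)
```

Each step removes the lines it has consumed, which is why the docket hunt can afford to be loose: by then only the party names and docket lines remain.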
The end result of all this jiggery-pokery is a rather clean parse of the case metadata into MLZ fields.  The jurisdiction is identified down to the state or Federal district level, as shown in the illustration to the right. The metadata generated by the translator is sufficiently precise to fashion correct citations in documents in each of the MLZ legal styles, when supplemented by style-specific adjustments applied via the Abbreviation Filter.
MLZ case items fall into one of three categories, depending on the metadata they contain. An item representing a nominate report of a US judgment has values for Reporter, Reporter Volume and First Page. A vendor-neutral item has no reporter name, but has values for Year as Vol. and First Page (the latter holding the serial number or other identifier used in the state’s vendor-neutral citation scheme). An unreported case has none of these fields, but should contain a Docket No.
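The three categories can be expressed as a small decision function. The sketch below is illustrative Python, and the dictionary keys (`reporter`, `reporterVolume`, `yearAsVolume`, `firstPage`, `docketNumber`) are hypothetical stand-ins for the MLZ fields named above, not confirmed internal field names.

```python
def classify_case(item):
    """Classify a case item (a dict of field values) by the fields it carries."""
    has_page = bool(item.get("firstPage"))
    if item.get("reporter") and item.get("reporterVolume") and has_page:
        return "reported"
    if not item.get("reporter") and item.get("yearAsVolume") and has_page:
        return "vendor-neutral"
    if item.get("docketNumber"):
        return "unreported"
    # None of the three patterns fits: the scrape probably needs a manual check.
    return "incomplete"


# Invented sample items, for illustration only.
reported = {"reporter": "F.3d", "reporterVolume": "123", "firstPage": "456"}
neutral = {"yearAsVolume": "2010", "firstPage": "12"}
unreported = {"docketNumber": "No. 96-1234"}
```

A check along these lines is a quick way to spot scraped items that fall outside all three patterns and deserve a second look.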
In addition to individual cases, the translator can be used against search listings, such as the one shown to the right. The logic applied is the same as for individual cases—each case page is read and parsed individually behind the scenes. The response time of the translator in this mode may be somewhat slow, but the end result is the same, and the browser will not freeze while a multiple scrape is pending.
A text attachment is generated for each case handled by the translator. When multiple items are generated for a single case, the attachment is placed on the vendor-neutral item if one is available, and otherwise on the item representing the first-listed report. Following the practice in other translators, the attachment identifies the text as obtained from Fastcase, with an embedded link in the page header and footer to the original copy online.
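The placement rule for the attachment is simple enough to state in a few lines. This is a Python sketch under assumed names: items are plain dictionaries, and a vendor-neutral item is recognised by having a hypothetical `yearAsVolume` value and no `reporter`.

```python
def attachment_target(items):
    """Pick the item that should carry the text attachment.

    `items` holds the items generated for one case, in listed order.
    """
    for item in items:
        if not item.get("reporter") and item.get("yearAsVolume"):
            return item  # vendor-neutral item takes priority
    return items[0]      # otherwise the first-listed report


# Invented sample items, for illustration only.
first = {"reporter": "P.3d", "reporterVolume": "250", "firstPage": "100"}
neutral = {"yearAsVolume": "2010", "firstPage": "12"}
```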
The MLZ citation styles do not yet provide complete support for all jurisdictions and courts covered by this translator, but the metadata it returns is of sufficient quality to permit rapid extension of the styles in response to user demand. As a further side note, the contents of the Court field may need adjusting in the Abbreviation Filter the first time an item is used with a particular style, so that the court is rendered correctly in formatted citations. For style adjustments or advice on use of the Abbreviation Filter, feel free to contact me on Twitter as @fgbjr.
Finally, I should point out that I have no financial interest in or affiliation with Fastcase itself: I am offering up this code, like the other translators (and MLZ itself) out of a desire, as a researcher, to improve the tools available to researchers. If you would like to strengthen that sentiment, buying a copy of the MLZ book “Citations, Out of the Box” is one way of achieving that objective.