Modular legal styles: an MLZ how-to

As outlined in the last post, MLZ now supports modular code for legal styles. This is a major development that promises to dramatically simplify support for law across all CSL styles over time. This post lays out the basics of the new architecture.

The text below is a technical description that assumes a basic familiarity with Citation Style Language (CSL). If this is your first encounter with CSL, the learning curve is not steep. General guidance notes for CSL style authors are available on zotero.org. With that, the CSL Specification, the CSL-m Specification Supplement, these notes, and a copy of MLZ (the unofficial Zotero variant with multilingual and legal extensions), you have everything you need to set about implementing a legal style for your own jurisdiction.

Overview

To get started thinking in modular terms, imagine that we want to define a set of citation macros to be copy-pasted across a number of different styles. Our copy-paste code should provide the basic arrangement of our citations, but we want them to adapt to the “look and feel” of the style into which we insert them.

This is the problem of legal referencing exactly; legal citation styles are typically incorporated by reference into a “main” style. The Chicago Manual of Style (CMS), the style of the American Psychological Association (APA), and many others provide basic guidance on citing the law, but refer authors to specialist style guides such as The Bluebook: A Uniform System of Citation [paywall] for the details. Each legal citation guide focuses on a particular jurisdiction, in turn referring the reader to other guides for the specifics of “foreign” citation conventions. [1]

To adapt eclectic legal citations to the context of a “main” style, editors may apply back-referencing conventions that differ from those dictated by the legal style itself. Concerning the use of id. versus ibid., the use or non-use of the so-called “five footnote rule,” the method of back-referencing to earlier cites, and other details, the editor may (and should) apply a set of “localized” conventions to legal citations embedded in a given style.

To return to our copy-paste idea for CSL style code, let’s start with the assumption that any reasonable citation can be broken into four elements: (1) a title; (2) a main element; (3) a locator; and (4) a tail element. If we can generate these elements in the correct form for our legal style, we can compose a finished cite. [2] Macros for each of them are the building blocks from which we will construct our “portable” CSL code.

[1] In an odd twist, the Bluebook itself is an outlier that offers its own short summaries of foreign citation conventions, without reference to the foreign guides on which they are based. Although in some sense “uniform,” this material is best passed over in preference to direct application of more accurate and up-to-date stylesheets for individual jurisdictions.
[2] This is a tiny white lie, actually. Several variants of these macros are needed to make things completely portable. More on that below …

Designing templates for a main style

Since we plan to paste our law macros into arbitrary styles, we will want to give them distinctive names to avoid namespace clashes. Legal style modules in MLZ do this with a juris- prefix, so we will use that in our initial examples.

Full-form citations

For our first look at the “localisation” problem, let’s compare the respective forms for citing an English case in the style of the Oxford Standard for the Citation of Legal Authorities (OSCOLA) and the Bluebook. The two are nearly identical, but the latter is more fond of punctuation marks, and wants a comma to follow the case name:

OSCOLA
Donoghue v Stevenson [1932] AC 562 (HL) 564
Bluebook
Donoghue v. Stevenson, [1932] A.C. 562 (H.L.) 564

CSL-m can strip periods (or not) when rendering a macro; but to control that pesky comma, we need to compose the elements with different delimiters. We can solve that problem by composing our standard macro elements with slightly different templates, like this:

OSCOLA
<group delimiter=" ">
  <text macro="juris-title" strip-periods="true"/>
  <group delimiter=", ">
    <text macro="juris-main" strip-periods="true"/>
    <text macro="juris-comma-locator" strip-periods="true"/>
  </group>
  <text macro="juris-space-locator" strip-periods="true"/>
  <text macro="juris-tail" strip-periods="true"/>
</group>
Bluebook
<group delimiter=", ">
  <text macro="juris-title"/>
  <group delimiter=" ">
      <group delimiter=", ">
        <text macro="juris-main"/>
        <text macro="juris-comma-locator"/>
      </group>
      <text macro="juris-space-locator"/>
      <text macro="juris-tail"/>
  </group>
</group>

In the examples above, note the macros juris-comma-locator and juris-space-locator. The sole purpose of these macros is to render the CSL locator variable with appropriate labels and other decorations: if the variable is not available, they render nothing. The citeproc-js processor limits the locator variable to a single use within each cite, so if the first macro (juris-comma-locator) is composed to render only when a comma should join it to juris-main, our template will produce correctly formatted cites. Always.
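
The way the two templates differ only in their delimiters can be mimicked in a few lines of JavaScript. This is a rough sketch, not processor code: group() and stripPeriods() are hypothetical stand-ins for a CSL group delimiter and the strip-periods attribute, and the way the example cite is divided among the elements is an assumption made purely for illustration.

```javascript
// Hypothetical stand-in for a CSL group: join the parts that rendered something.
function group(delimiter, parts) {
  return parts.filter(function (part) { return part !== ""; }).join(delimiter);
}

// Hypothetical stand-in for strip-periods="true".
function stripPeriods(text) {
  return text.replace(/\./g, "");
}

// Assumed division of the example cite among the standard elements.
var cite = {
  title: "Donoghue v. Stevenson",
  main: "[1932] A.C. 562 (H.L.)",
  commaLocator: "",   // renders only when a comma should join it to main
  spaceLocator: "564",
  tail: ""
};

function oscolaFull(c) {
  return group(" ", [
    stripPeriods(c.title),
    group(", ", [stripPeriods(c.main), stripPeriods(c.commaLocator)]),
    stripPeriods(c.spaceLocator),
    stripPeriods(c.tail)
  ]);
}

function bluebookFull(c) {
  return group(", ", [
    c.title,
    group(" ", [
      group(", ", [c.main, c.commaLocator]),
      c.spaceLocator,
      c.tail
    ])
  ]);
}

console.log(oscolaFull(cite));   // Donoghue v Stevenson [1932] AC 562 (HL) 564
console.log(bluebookFull(cite)); // Donoghue v. Stevenson, [1932] A.C. 562 (H.L.) 564
```

The element values are identical in both calls; only the template around them changes, which is the point of keeping the module macros free of main-style punctuation.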

The code above will work just fine for full-form citations, but as every Bluebook-trained American law student well knows, that is the easy part. For subsequent references, we must adapt the elements to fit the back-reference conventions of our main style (which might be Bluebook, but might be something else).

Immediate back-references

The simplest back-reference in most styles is a reference to the immediately preceding source. The specific form of the reference should follow the rules of the main style, and again practices vary. Back-references citing a numbered paragraph (here, paragraph 12 of an English case cited in vendor-neutral form) and a section (here, section 345) would appear as follows under OSCOLA and Bluebook rules:

OSCOLA
ibid [12]
ibid § 345
Bluebook
Id. at [12]
Id. § 345

In this example, the label (Ibid. or Id.) is supplied by the main style. The locators together with their styling ([12] and 345) are to be supplied by our copy-paste English law macros. The interesting bit is the connector “at,” which is added in the Bluebook style only if the locator has no other label. To accomplish this effect, we define a pair of “bare” locator macros in our copy-paste code, as follows:

<macro name="juris-locator">
  <choose>
    <if locator="paragraph">
      <text variable="locator" prefix="[" suffix="]"/>
    </if>
    <else>
      <text variable="locator"/>
    </else>
  </choose>
</macro>

<macro name="juris-locator-label">
  <choose>
    <if locator="paragraph page" match="none">
      <label variable="locator" form="symbol"/>
    </if>
  </choose>
</macro>

In our main styles, we can again use slightly different templates to render the locators appropriately. In the Bluebook style only, we will need a small macro to insert the “at” term:

Bluebook
<macro name="at-mac">
  <text value="at"/>
</macro>

Once that is in place, we can use the following templates to render our standard macro elements appropriately in each of the main styles:

OSCOLA
<group delimiter=" ">
  <text term="ibid"/>
  <text macro="juris-locator"/>
</group>
Bluebook
<group delimiter=" ">
  <text term="ibid"/>
  <text macro="juris-locator-label" alternative-macro="at-mac"/>
  <text macro="juris-locator"/>
</group>

The alternative-macro attribute is a CSL-m extension to the official language, which calls the macro named in its argument when the primary module macro produces no output. Using the constructs above, our copy-paste macros can be used to produce correctly formatted back-references in both styles. Always.
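
The fallback behaviour of alternative-macro can be illustrated outside the processor. The JavaScript below is only a sketch of the logic, under stated assumptions: the functions mirror the juris-locator, juris-locator-label and at-mac macros shown earlier, withAlternative() is a hypothetical stand-in for the attribute, and the SYMBOLS table is an invented substitute for the locale terms a real style would use.

```javascript
// Hypothetical label symbols; a real style takes these from its locale.
var SYMBOLS = { section: "§" };

// Mirrors juris-locator: paragraph locators get square brackets.
function jurisLocator(cite) {
  if (!cite.locator) return "";
  return cite.label === "paragraph" ? "[" + cite.locator + "]" : cite.locator;
}

// Mirrors juris-locator-label: nothing for paragraph or page locators.
function jurisLocatorLabel(cite) {
  if (!cite.locator || cite.label === "paragraph" || cite.label === "page") return "";
  return SYMBOLS[cite.label] || "";
}

// Mirrors the Bluebook-only at-mac macro.
function atMac() {
  return "at";
}

// Stand-in for alternative-macro: use the fallback when the primary renders nothing.
function withAlternative(primary, alternative, cite) {
  var output = primary(cite);
  return output !== "" ? output : alternative(cite);
}

// Mirrors the Bluebook template for a back-reference with a locator.
function bluebookIbid(cite) {
  return ["Id.", withAlternative(jurisLocatorLabel, atMac, cite), jurisLocator(cite)]
    .filter(function (part) { return part !== ""; })
    .join(" ");
}

console.log(bluebookIbid({ locator: "12", label: "paragraph" })); // Id. at [12]
console.log(bluebookIbid({ locator: "345", label: "section" }));  // Id. § 345
```

The module never needs to know about “at”: it reports that it has no label to offer, and the main style decides what to put in the gap.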

Other back-reference forms can be fashioned in a similar way, using our fixed set of “portable” copy-paste macros as building-blocks in simple templates.

Standard macros

The full set of macros needed to build a legal citation for any context works out to the following (applying the same juris- prefix that we used in the examples above):

juris-title
juris-title-short
juris-main
juris-main-short
juris-comma-locator
juris-space-locator
juris-locator
juris-locator-label
juris-tail
juris-tail-short

Jurisdiction modules

Now comes the fun part. If we prepare a legal style composed exclusively using the macros listed above, the citeproc-js CSL processor used by MLZ can load it into a main style when it encounters items from that jurisdiction. Because we have separated the formatting requirements of the legal style from those of the main style, cites will render correctly across all styles that call on the legal style module. Always.

A jurisdiction module is an ordinary CSL or CSL-m style that defines all of the macros above, and no others. It must be a valid CSL (or CSL-m) style, and so must contain a citation node, and may contain a bibliography node. It may be run directly as a style in its own right for testing purposes, but as jurisdiction modules only provide formatting for legal references, it will not normally be used directly in production. In MLZ, legal style modules are loaded on demand when the processor determines that their code is required.
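
Because a module must define exactly the standard macros, a quick sanity check is easy to write. The following JavaScript is a minimal sketch: checkModuleMacros() is a hypothetical helper, not part of MLZ or citeproc-js, and it assumes the macro names have already been extracted from the module file by some other means.

```javascript
// The standard macro set listed above.
var REQUIRED_MACROS = [
  "juris-title", "juris-title-short",
  "juris-main", "juris-main-short",
  "juris-comma-locator", "juris-space-locator",
  "juris-locator", "juris-locator-label",
  "juris-tail", "juris-tail-short"
];

// Hypothetical helper: compare the macro names defined in a module
// against the required set, reporting anything missing or surplus.
function checkModuleMacros(definedNames) {
  var missing = REQUIRED_MACROS.filter(function (name) {
    return definedNames.indexOf(name) === -1;
  });
  var extra = definedNames.filter(function (name) {
    return REQUIRED_MACROS.indexOf(name) === -1;
  });
  return {
    ok: missing.length === 0 && extra.length === 0,
    missing: missing,
    extra: extra
  };
}

// A module that defines one standard macro plus a stray helper fails on both counts.
var report = checkModuleMacros(["juris-title", "at-mac"]);
console.log(report.ok);    // false
console.log(report.extra); // [ 'at-mac' ]
```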

Module loading

In MLZ, legal items have a mandatory Jurisdiction field, populated from a controlled list of identifiers built from the Legal Resource Registry (LRR), a companion project to MLZ. An LRR jurisdiction identifier is a colon-delimited string. It may be followed by a court or institution identifier, separated by a semi-colon, but only the jurisdiction portion is used for style resolution purposes. As an example, the following identifier specifies the (District Court for the) Middle District of Tennessee in the United States:

us:c6:tn.md;district.court
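
As a rough illustration of that structure, the identifier can be taken apart with two string splits. This is only a sketch: parseJurisdictionField() and the property names it returns are hypothetical, chosen for readability rather than taken from MLZ.

```javascript
// Hypothetical helper: split an LRR identifier into its jurisdiction
// portion (colon-delimited) and optional court portion (after the semi-colon).
function parseJurisdictionField(value) {
  var parts = value.split(";");
  var jurisdiction = parts[0];
  return {
    jurisdiction: jurisdiction,                 // used for style resolution
    court: parts.length > 1 ? parts[1] : null,  // ignored for style resolution
    elements: jurisdiction.split(":")
  };
}

var parsed = parseJurisdictionField("us:c6:tn.md;district.court");
console.log(parsed.jurisdiction); // us:c6:tn.md
console.log(parsed.court);        // district.court
console.log(parsed.elements);     // [ 'us', 'c6', 'tn.md' ]
```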

To associate a law module with a particular jurisdiction, its ID and filename must adhere to a fixed schema. The filename must be composed as follows:

juris-<LRR-jurisdiction-id>[-variant].csl

The LRR jurisdiction identifier is mandatory; an arbitrary descriptive variant name is optional, and need not be used for unique styles that serve the target jurisdiction. A standard module for the District of Columbia (not the Federal Circuit) and its Bluebook variant would be written as follows:

juris-us:dc.csl

juris-us:dc-bluebook.csl

The ID of a jurisdiction module (in the info node of the module style) must follow the schema, using the root URL of the CitationStylist project, and dropping the .csl extension:

<id>http://citationstylist.org/modules/juris-us:dc</id>
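
The naming rules are mechanical enough to express as code. The JavaScript below is a sketch under assumptions: the function names are hypothetical, and the ID for a variant module is assumed to follow the same pattern as the filename with the extension dropped, which the example above shows only for the variant-free case.

```javascript
var MODULE_ROOT = "http://citationstylist.org/modules/";

// Hypothetical helper: base name shared by the filename and the ID.
// Any court portion after a semi-colon is discarded.
function moduleName(jurisdictionField, variant) {
  var jurisdiction = jurisdictionField.split(";")[0];
  return "juris-" + jurisdiction + (variant ? "-" + variant : "");
}

function moduleFilename(jurisdictionField, variant) {
  return moduleName(jurisdictionField, variant) + ".csl";
}

function moduleId(jurisdictionField, variant) {
  return MODULE_ROOT + moduleName(jurisdictionField, variant);
}

console.log(moduleFilename("us:dc"));             // juris-us:dc.csl
console.log(moduleFilename("us:dc", "bluebook")); // juris-us:dc-bluebook.csl
console.log(moduleId("us:dc"));                   // http://citationstylist.org/modules/juris-us:dc
```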

Apart from these naming and metadata requirements, and the restriction of macro definitions to the list above (all of which must be defined), a jurisdiction module is just a standard CSL or CSL-m style, and can be validated and run in the usual way. If installed as a style in MLZ, it will be called upon automatically to format legal references from the target jurisdiction.

Jurisdiction resolution

When the processor encounters a macro with the juris- prefix, it will search for a suitable module based on the jurisdiction ID of the current item, and the preferred module variant, if any. Preferred variants can be set (optionally) as a comma-delimited list in the main style, via a style-options locale node:

<style-options jurisdiction-preference="babyblue,bluebook"/>

Modules are searched for among the installed MLZ styles in the following priority order:

  1. For each jurisdiction preference …
    • … a match attempt is made using the item’s jurisdiction ID …
    • … elements are dropped one by one from the end for each successive attempt …
    • … until a final attempt using the single top-level jurisdiction element.
  2. If a match is found, the module code is loaded and used to render the item.
  3. If a match is not found, the next preference is attempted.
  4. If all preferences (including the empty preference) fail, the style macro code is executed.
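
The search order can be sketched in JavaScript as a pair of nested loops. This is an illustration of the steps above, not the citeproc-js implementation: resolveModule() is a hypothetical function, the installed modules are represented as a plain array of names, and placing the empty preference last is an assumption.

```javascript
// Hypothetical sketch of module resolution.
// Returns the name of the matching module, or null when the main
// style's own macro code should be executed instead.
function resolveModule(jurisdictionField, preferences, installed) {
  // Only the jurisdiction portion (before any semi-colon) is used.
  var elements = jurisdictionField.split(";")[0].split(":");
  // Assumption: the empty preference is tried after the named ones.
  var candidates = preferences.concat([""]);
  for (var i = 0; i < candidates.length; i++) {
    var suffix = candidates[i] ? "-" + candidates[i] : "";
    // Drop elements one by one from the end, down to the top-level element.
    for (var n = elements.length; n > 0; n--) {
      var name = "juris-" + elements.slice(0, n).join(":") + suffix;
      if (installed.indexOf(name) !== -1) {
        return name;
      }
    }
  }
  return null;
}

var installed = ["juris-us", "juris-us:dc-bluebook"];
var prefs = ["babyblue", "bluebook"];
console.log(resolveModule("us:dc", prefs, installed));                      // juris-us:dc-bluebook
console.log(resolveModule("us:c6:tn.md;district.court", prefs, installed)); // juris-us
console.log(resolveModule("jp", prefs, installed));                         // null
```

In the second call no preferred variant exists at any level, so the search falls through to the empty preference and settles on the top-level module for the United States.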

In lieu of conclusion

If you have reached this point in the post, you will understand the excitement that I feel over the implementation of modular CSL support for legal referencing. Styles prepared for the requirements of a specific jurisdiction can focus on getting the details right within their limited scope. If relied upon locally, there will be strong incentives to maintain quality. With modularity, styles from multiple jurisdictions can be combined in a single document, transparently and without conflicts, and legal support can be added to any of the 1,180 unique citation styles in the CSL repository with minimal effort. It’s a big win all around.

In terms of concrete benefits, CSL support for law opens the prospect of pushing reference managers and other third-party support tools into the legal research mainstream. While that is a very good thing if you happen to be involved in such a project, we can also anticipate public benefits from several vectors of innovation that have heretofore been in a state of stagnation:

Drafting efficiency
There is a reason why professionals in fields other than law have gravitated toward automated referencing systems over the past 30-odd years. Given quality metadata, generating citations automatically frees the researcher to concentrate more deeply on content. When automated referencing is tied to a well-designed research support platform, the gains are greater still.
Collaboration
MLZ is the first research tool to bridge the technological divide separating the law from other disciplines. It inherits from its pater familias Zotero a rich capacity for collaboration, which with the arrival of legal referencing support will reduce the barriers to research collaboration between lawyers and members of other disciplines.
Dissemination
The parsing of citation strings out of text documents has for years been a rite of passage for anyone engaged in legal archive development. In the U.S. jurisdiction, there have been important advances in recent years thanks to efforts by Eric Mill, Mike Lissner, Alan deLevie, and others. Robust parsers are essential for retrofitting legacy documents for electronic linking; but there are better ways for born-digital content. There is an attraction in principle to enriching official documents with embedded metadata at source; and modular style support brings that possibility one step closer.

Finally, this new offering puts paid to the silly notion that there is any sort of proprietary interest in citation styles. Referencing systems are just vehicles for enabling discourse. Like any other form of language, they do not belong to anyone—and the sooner we get past the misguided assumption that they might do, the better off we will all be.

Posted in Announcements | Leave a comment

Recent changes to Multilingual Zotero and friends

Six months have passed since the last post here. Work on MLZ has been moving forward apace in the meantime, and it’s time for an update on recent changes, of which there have been quite a few. Some of the recent developments are definitely worth writing home about, so we’ll start with those:

Info Pane Reimplementation
The original MLZ user interface for the rightmost column was written before I fully understood how the Zotero code worked—and it showed. The behaviour of the creator fields was irregular, tabbing failed in some places, and switching away from a field or the panel would sometimes drop edits in progress. Apart from the annoyance of little bugs, the code itself was difficult to follow, and so hard to maintain. I finally bit the bullet and rewrote it from scratch. It is now more stable, more intuitive, and less of a pain to deal with at the code level.

Both of the field language menus are now presented in a unified right-click popup that behaves the same for creators and ordinary fields (formerly “Change language” for the headline field was a left-click menu on ordinary field labels, and a left-click submenu on creator field labels—if you didn’t notice them there, you are probably not alone). The naming and operation of the menus has also been refined considerably. Here is an overview:

Hide/Reveal
When MLZ is first installed, the list of variants in the Languages preference panel is empty. In this state, the multilingual menus are suppressed: right-click on a field label does nothing. When languages are added, the menus wake up immediately. This was the original intention, but I’m not positive that it was working (and working smoothly) in recent versions. It is now.
Field Removal
Saving a variant field with no content will remove it. This works both for ordinary fields and creators. As in the previous version of the interface, attempts to delete the content of a headline field will be blocked until its variants have all been removed. (Well, you can remove the first or the last name of a two-field headline name, but not both).
Set Field Language
“Set Field Language” is the new name for the “Change Language” menu on headline fields. As you would expect, it, um, sets the field language, which shows up as a rollover tooltip. When languages are active, this menu is always available on multilingualize-able fields, even if they are empty (so it is possible to set the language of a field before typing into it).

The menu shows only those choices that make logical sense in the current context. With an empty field language setting, the menu lists only those choices that have not yet been claimed by a variant field. After a field language is set, the menu shows all languages. If an existing variant is chosen, it will be swapped with the headline field together with the field content. To signal the difference in behaviour, variants in use are shown in roman type, and unused choices in italic.

Change Language
A “Change Language” menu is available on variant fields. If no language has yet been set on the headline field, the menu will only display if it offers at least two choices (this is meant to prevent gridlock, but I am not sure it’s the right behaviour, and I will be happy to receive complaints and suggestions). Only first-class languages and variants derived from the language of the headline field are shown—so for example, romanized Japanese will not be offered as an option if the headline field is set to English. Variants already claimed by another field are, of course, excluded.
Add Variant
The “Add Variant” menu is available only on headline fields. It shows all variants that have not yet been claimed by a subordinate field.
Jurisdiction Identifiers
The Jurisdiction field was added to MLZ quite some time ago, loosely based on the draft URN:LEX specification. I have had the good fortune to be supported for membership in the OASIS LegalCiteM working group during the past year, and discussions there drove home to me the importance of a stable, well-ordered set of machine-readable identifiers for courts, in particular—and forced me to realize the flaws in the identifiers I had assigned ad hoc in the existing MLZ implementation. The end result was a push to collect an initial body of information for a workable set of worldwide identifiers in a “Legal Resource Registry” (LRR) now housed on GitHub. The initial data in the LRR draws heavily on work by Michael Lissner of CourtListener at the Free Law Project, and Harry Moers of the World Law Guide at Lexadin.

The identifiers listed in the LRR can be exported in machine-readable form for use in MLZ. This is the new fuel for search-as-you-type menus in the Jurisdiction and (on Case items) the Court field. Apart from providing a more complete and accurate set of identifiers, the user interface performs validation of the Court field against the selected jurisdiction, has a helpful dropdown menu of conforming court names, and allows free-text input for novel courts and edge cases.

The user interface shows only human-readable descriptors in both fields, but the citation processor receives the underlying identifiers in the “jurisdiction” and “authority” fields for use in condition statements. When rendered directly in a citation, the identifiers are converted to human-readable form, and appear as such in the Abbreviation Filter. This gives us the best of both worlds: structured data for the machine, and more accessible descriptions for those of us who are not machines.

Deployment of the new identifier suite was accompanied by a remapping of existing identifiers in user databases. There will have been a few glitches, and many records may show previously recorded court names as “invalid” (with a yellow background), but for the most part things should just work.

Modular Legal Styles
Stable jurisdiction identifiers have made it possible to realize a long-standing plan to introduce modular jurisdiction support in the citation processor. Anyone familiar with legal styles generally is painfully aware of the excruciating level of detail required for any single jurisdiction. A glance through the Bluebook shows the degree to which citation conventions vary in ways little and large across jurisdictions. It has been clear for some time that cramming a universal set of legal citation rules into a single CSL style is not viable. The new modular jurisdiction support will make it possible to delegate style development to contributors who are familiar with the requirements of individual jurisdictions, to allow them to work independently, and to draw on their style work across the full suite of CSL styles. This is an exciting prospect that will be the subject of a separate post.
The Abbreviation Filter Rides Again
I boasted here a few months ago about the latest improvements to the Abbreviation Filter (AFZ). The plugin required a few adjustments to cope with the new jurisdiction identifiers, and in the course of digging into the code, I ended up refactoring significant parts of it. The AFZ code was written from scratch, at a time when I had only a weak sense of how JavaScript hangs together as a language. It was messy, and the user experience was rough. It’s still messy, and it’s still pretty rough, but it’s better than it was. Small steps.

As with jurisdiction identifiers in MLZ itself, jurisdiction identifiers already recorded in AFZ databases should migrate more or less smoothly to the new human-readable strings. There may be glitches, but most mappings should be accommodated by the upgrade and just continue to work.

AFZ is now accessible, without odd glitches, from the CSL Editor display of MLZ (we can get it working there in official Zotero as well, with a small patch that I hope will be accepted at some point).

Miscellaneous Small Fixes
A few things have been fixed or added since the last announcement. Among the highlights of this important if slightly less glamorous list are:
  • The addition of “originalDate” and “dateAmended” fields in MLZ, with appropriate mappings to CSL. This change was just banged in because the fields proved necessary for items I was working with.
  • A small speedup when dragging items, done by arranging to call the processor just once for the citation and bibliography texts that it stores on the clipboard, rather than twice (a pull request for the fix was accepted for official Zotero as well). Thanks are due to Bruce Rusk of the Department of Asian Studies at University of British Columbia for calling attention to this item.
  • An option to suppress trailing punctuation on note styles that include it by default was added to MLZ back in September. This allows the mixture of MLZ-generated footnotes and references inserted directly into discursive footnotes created by the word processor, without modifications to the style—a long-standing need for users of note styles. Thanks for this change are due to feedback from Rónán Kennedy of the National University of Ireland, Galway.

That’s the news. I’m very pleased with the latest round of changes. They navigate us past some very difficult milestones, and open a clear path for future development. With modular legal style support, in particular, I have high hopes for community contributions of local style modules. We’ll see how things develop, but the prospects look pretty bright.

MLZ: a fond farewell to the “New” tag

Some users may have noticed that items added to MLZ from the Web automatically acquire a New tag. This was implemented by a change registered on 20 February 2013, a little over a year ago.

The tag is meant to make it easier to find library items that have not yet been curated. With a little more reflection, I would have realized that there are better ways of doing that. To serve its intended purpose, the New tag must be deleted on each newly added item, as it is curated. When it is not deleted, the tag loses its intended meaning throughout the user’s database. This is not a good thing.

Further, I have recently learned that the New tag is causing a bit of a nuisance for the maintainers of Zotero’s archive of page translators. Translators prepared with MLZ will work fine in official Zotero; but for translator tests to pass in official Zotero, the gratuitous New tag added by MLZ must be removed from the output record.

Since the tag isn’t particularly necessary, and is now known to be causing headaches, I’m removing it from MLZ. Existing New tags will remain; but the tag will not be added automatically to new items.

If you have been relying on the tag to maintain your library, I think you’ll find that sorting items by Date Added is a simpler and more efficient way of spotting items that may need attention. If the removal of this feature is an inconvenience in other ways, give me a shout, and we’ll see what we can work out.

Frank Bennett

CSL: Dynamic linking in citeproc-js

This post will be of interest primarily to developers using citeproc-js to format citations for delivery in dynamic Web pages or an API.

It is also kind of a big deal.

CSL provides a set of common tools for the formatting of human-readable citations. Open linked data provides a mechanism for enriching published material with supplementary information. Citations are, in effect, a pre-Web implementation of linked data; and developers have quite naturally expressed a desire to join these two facilities. The benefits are fairly obvious: by providing both human-readable citations and machine-readable reference links through a single tool and data source, we can save ourselves a great deal of unnecessary work.

It has taken a while to get to this point, but the latest release of citeproc-js supports dynamic linking to individual item fields in rendered citations.

Links can be applied to field content by adding a variableWrapper() method to the sys object (the first argument to CSL.Engine when instantiating a processor instance):

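// Wrap the rendered title in a link to the item URL, in bibliography entries only.
sys.variableWrapper = function (params, prePunct, str, postPunct) {
  if (params.variableNames[0] === 'title'
      && params.itemData.URL
      && params.context === 'bibliography') {
    return prePunct
      + '<a href="' + params.itemData.URL + '">'
      + str
      + '</a>'
      + postPunct;
  } else {
    // All other fields pass through unchanged.
    return prePunct + str + postPunct;
  }
};
// sys is passed as the first argument when the processor is instantiated.
var citeproc = new CSL.Engine(sys, CSL_STYLE_IN_XML);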

prePunct contains any space or leading punctuation that precedes the rendered field content. This will include any literal text added via the prefix or suffix attributes of the CSL rendering node.

str contains the field content itself.

postPunct contains any punctuation that follows the rendered field content.

params is an object carrying context-specific information about the reference being rendered. It has nine keys:

itemData
The CSL JSON input data for the citation. Ordinary numeric and text variables have simple string or number values. Name variables yield an array of name objects. Date variables yield an object in the CSL JSON input format. Dump the content with JavaScript JSON.stringify() to explore the detail of the input object.
variableNames
An array containing the names of the variables rendered in str. For ordinary text fields, variableNames will always have a length of exactly one. Name nodes (cs:names) may render multiple variables.
context
The context within which the item is being rendered: citation or bibliography.

xclass
The class attribute of the style being rendered: in-text or note.
position
The position of the citation being rendered: one of first, subsequent, ibid, or ibid-with-locator.
note-number
In note styles, the number of the note containing this reference. Default: 0.
first-reference-note-number
In note styles, the number of the first note containing this reference. Default: 0.
citation-number
In numeric styles, the number of this reference in the bibliography. Default: 0.
mode
The output mode of the processor: either html or rtf (wrapper handling is currently disabled in plain text mode—but if you would like to see it hooked up, just let me know).

With suitable input, the variableWrapper() method can be used for various things: decorating citations with cross-links within the document; adding external links to ORCID profiles or author home pages; links to a bibliographic database or publisher service; or what have you.
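
A wrapper can be exercised on its own, without a processor, by calling it with a hand-built params object. The JavaScript below is a sketch along those lines: linkTitles() and escapeAttribute() are hypothetical names, the sample item is invented, and the mock params object carries only the keys this wrapper reads. It varies the earlier example by checking params.mode, so that HTML markup is emitted only in html mode.

```javascript
// Hypothetical helper: make a value safe inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Link titles in bibliography entries, but only when producing HTML.
function linkTitles(params, prePunct, str, postPunct) {
  var linkable = params.mode === "html"
    && params.context === "bibliography"
    && params.variableNames[0] === "title"
    && params.itemData.URL;
  if (!linkable) {
    return prePunct + str + postPunct;
  }
  return prePunct
    + '<a href="' + escapeAttribute(params.itemData.URL) + '">'
    + str
    + '</a>'
    + postPunct;
}

// Invented sample data, with only the keys that linkTitles() reads.
var sampleParams = {
  itemData: { title: "Example", URL: "http://example.com/?a=1&b=2" },
  variableNames: ["title"],
  context: "bibliography",
  mode: "html"
};

console.log(linkTitles(sampleParams, " ", "Example", "."));
// prints:  <a href="http://example.com/?a=1&amp;b=2">Example</a>.
```

To use a function like this in earnest, it would be assigned to sys.variableWrapper before the processor is instantiated, as in the example earlier in this post.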

Note that cs:names nodes are wrapped as a single element containing all names rendered by the node. This is a design choice: some names may not be shown in a printed citation, so author linking from printed citations should be done in a pulldown menu, which can be built from params.itemData using params.variableNames.

Note also that this method covers only variables rendered in the citation. This doesn’t quite go the distance for full RDFa support (which is definitely an objective). For that, we will need a separate method to provide the outer wrapper, a tracking object to identify elements of the citation that are not rendered, and maybe some other bits and pieces—input from people who know the ins and outs of RDF and RDFa would be most welcome on that score.

MLZ and the Abbreviation Filter: updates

As I wrote in the last post, a typo in the code of the last Abbreviation Filter release prevented the lists from being recognized against the MLZ styles. This has now been fixed, and with the latest release of MLZ and the Abbreviation Filter, we introduce some further improvements. Here is a brief run-down of the changes:

Document compatibility with Zotero and Mendeley
Sebastian Karcher recently pointed out that documents drafted with MLZ would likely break when run against official Zotero (he was absolutely right; this was an issue that I had missed). Given that one of the core aims of the MLZ project is to provide a smooth path for collaboration between scholars in different disciplines, this was a serious shortcoming. It has now been remedied: documents written in MLZ should be editable in both Zotero and (I believe) Mendeley. When the document is returned to MLZ, the embedded multilingual data it contains should be intact, and should extract correctly to the database if the item has gone missing in the interim. The compatibility code is new, so document sharing among the three platforms should be approached with caution; but in my initial testing it seems to work quite well.
Bundled abbreviation lists
The lists bundled with the Abbreviation Filter (AFZ) have been broken out into rough categories, which can now be selected for import (or reimport) via the AFZ popup. The available lists include empties for abbreviations and abbreviation word-and-phrase hints, which can be used to disable abbreviations for the currently selected style. We also offer a list of abbreviations for scientific journals, which was supplied to the project months ago by Alberto Battistel. To import an external list, click on the label next to the selector to toggle the old-style file picker.
Missing items
MLZ inherits the ability to edit documents containing unavailable references from official Zotero. I recently discovered that the Abbreviation Filter plugin broke this functionality, and threw up a large and cryptic error message when missing items were encountered. Although the error could be avoided by disabling AFZ, it certainly looked intimidating, and may have led users to assume that the document itself was broken. This bug has been fixed in the latest release: documents with missing items should work normally with AFZ installed and enabled.

Assuming things remain quiet on the bug front, I’m quite happy with the latest round of changes. The Abbreviation Filter is now a much more flexible and friendlier tool, and document compatibility opens up possibilities that hadn’t even crossed our minds a couple of years ago. Small but important steps to enable research communities to achieve a higher level of discourse, on their own terms.

Posted in Announcements | Leave a comment

Abbreviation Filter issues

The latest version of the Abbreviation Filter has two significant bugs:

Style Assignments
A typo in the code causes the pre-installed lists to be assigned to non-existent styles. So although the new jurisdiction-suppression feature makes the legal styles much more convenient to work with, the abbreviations themselves have gone AWOL.
Missing Items
While testing some upcoming changes to MLZ document integration, I discovered that the Abbreviation Filter will crash any document containing abbreviations not found in the user’s own database. This is a long-standing bug that had gone unnoticed and unreported, but it is obviously a serious one.

I am working to address both of these issues in the next release, which will also offer support for post-installation configuration of individual styles.

My apologies for these glitches. The fixes should be out within a couple of days.


The Bluebook: A Plot Summary

I here entreat those who have any tincture of this absurd vice, that they will not presume to come in my sight.
Lemuel Gulliver

This post closes a chapter in the tale of legal style support in Multilingual Zotero, so that the next can begin. The story began a little over five years ago, with a purpose that was quite simple, if crazily ambitious: to build a platform with automated citation support to serve researchers in any field, from any jurisdiction, handling resources in any language. Zotero and the Citation Style Language (CSL) provided a solid keel on which to build, but there was a lot of building to be done: no one had ever launched a multilingual reference manager, or explored the full complexities of legal referencing in a citation formatter. I would have to make my own mistakes, and if the project were not to have an unhappy ending, to learn from them.

There have been many transient missteps in software along the way, but the mistake that has stuck to the project like a barnacle occurred two years in, on 19 July 2011, when I opened the Terms of Use to the online version of the leading American legal citation manual, and found this:

  • You may not modify, publish, transmit, reproduce, create derivative works from, distribute, perform, display, incorporate into another website, or in any other way exploit the information contained on the Site, in whole or in part.
  • You agree not to use or display the trademarks BLUEBOOK ONLINE, THE BLUEBOOK or THE BLUEBOOK: A UNIFORM SYSTEM OF CITATION or any confusingly similar or dilutive words without our prior written consent.
  • You may not restrict or inhibit the functionality or use of this Site or the use of this Site by any other visitor or subscriber, including, without limitation, by means of “hacking” or defacing any portion of the Site.
  • You may not use the Site for any unlawful purpose or for any commercial purpose other than your own law practice.
  • As a subscriber to the Site, you may add Bookmarks and Annotations as needed for your own personal use and share these items with other subscribers to the Site who are members of your Group but you may not give, store or forward to others, in any form, any substantial portion of the information contained on the Site. Except as expressly provided by this Agreement, any use of the Site and its content is strictly prohibited without our written consent.
  • While using the Site, you agree to comply with all applicable laws, rules and regulations.

As Professor Lessig writes in his Foreword to “Citations, Out of the Box”, we ingest this kind of verbal razor-wire every day. A person ordinarily clicks I Accept and moves on; but when I came upon this passage for the first time, that person was not me. I was inclined to read with respect and deference, for two reasons that seemed compelling at the time.

First, for the previous two years, I had been working intensively with a highly motivated team to design and implement revisions to the CSL language. Specification discussions are a special flavour of discourse, with a very low tolerance for ambiguity and contradiction. A specification is intended to be a full and final expression of agreed requirements, and as such it must be clear what each statement it contains does and does not mean. Sustained exposure to that discipline had, I suppose, altered my reading instincts for the worse.

Second, I was in the process of circulating a book proposal, for the work that eventually became “Citations, Out of the Box”. Reference manager technology is unfamiliar ground for most lawyers, and a book-length publication would be an opportunity to lay out the basics to this new audience. It would be a major step for the project, and I was keen to avoid surprises for potential publishers.

From either perspective, the passage quoted above is an alarming nest of broadly cast restrictions. It can be read to prohibit a variety of actions that a reader might perform without thinking twice. Writing software to implement the rules of the style. Reading particular copies of the Bluebook while writing a book of one’s own. Referring to “The Bluebook: A Uniform System of Citation” by its own title. Granted, such readings conflict pretty sharply with common sense; but I thought it best to seek clarification (and truth be known, I was a little annoyed). So I sent off an email to the editors, explaining what I planned to do, and asking for confirmation that this would not violate the Terms of Use displayed on their website.

While waiting for a response, I received an offer of contract from a major U.S. legal publisher. This was a cause of great celebration in our household, until the following email arrived, from the Bluebook Chair at the Harvard Law Review:

Thank you for your inquiry. I apologize it has taken so long to reply to your message, but this mailbox is not checked as regularly (especially during the summer) as you expect.

As for your substantive questions. We do not accept your suggested Terms of Use or interpretation of the Terms of Use. If you have already purchased a key to The Bluebook Online and no longer wish to maintain your subscription, we would be happy to cancel it and refund your money.

Well, I had asked, hadn’t I.

It made no sense to me that the copyright associated with a volume standing on the library shelf could be extended so dramatically by the happenstance of signing up to read the same text online. Assuming that there had been a local misunderstanding, I pointed out this contradiction, in a followup note sent by registered mail to the four law reviews that operate the Bluebook. In due course, I received a response from my original correspondent:

Our position remains that the terms of use for the Bluebook Online are as they are stated. That position is shared by the other schools who are joint copyright holders in The Bluebook and on whose behalf I am speaking.

So now the annoyance was mutual.

When I wrote to Professor Lessig with news of this conundrum, he was equally puzzled, and kindly agreed to test the waters in a direct conversation with the editors. That did not go well: in addition to reconfirming that I was not welcome to subscribe to the online version, the editors apparently doubted whether readers of any copy of the Bluebook were free to cast its rules in software. I will confess that, although I am the author of a program that is used daily, to automate hundreds of citation styles, for an audience numbering in the millions, without anyone’s prior written consent, this latter possibility had never occurred to me.

As appropriate under the terms of the book contract awaiting signature on my desk, I shared this chain of correspondence with the publishers. Their response was cautious. Thinking that the ALWD Manual might serve as an alternative, I appealed to its editors, only to be informed that the Bluebook’s position sounded pretty attractive to them as well. Other possible adjustments were explored, but we were unable to find a satisfactory way of insulating the project from destruction (at least in the unhappy event that it proved successful). The deal fell through.

And so, with publishers leery of ominous signals emanating from Gannett House, the world’s first book on legal and multilingual reference management was relegated to DIY status. I dusted off my Python and LaTeX skills. I built my own typesetting platform. I learned to work with the node.js programming environment. I built a comprehensive style testing framework, and integrated it with the typesetter. I got very little sleep, but eventually, with the support and encouragement of a large and diverse circle of friends, I completed the book.

Some time after, Carl Malamud for Public.Resource.Org approached the Harvard Law Faculty, and through it the Harvard Law Review Association, concerning the stifling impact that the Bluebook’s restrictive stance was having on innovation. The approach was supplemented by a bundle of materials, including a tiny fragment of the Bluebook itself, a copy of the MLZ American Law Style, the MLZ abbreviation mapping files (reflecting various Bluebook tables), as well as items from other sources. Through the law offices of Ropes & Gray, the Association expressed a strong concern over the posting of the Bluebook text. Concerning the portions of MLZ code (which had been explicitly referenced to this website), counsel for the Association indicated that:

[T]he law reviews reserve their rights with respect to these particular files as they consider the issues posed by all such works.

And there the correspondence ends.

Taking stock today, five years since the commencement of citeproc-js development, three years since the initial query to the Bluebook editors, and two weeks plus a period of gestation since the statement quoted immediately above, I can say a few things with certainty. From time to time and in various contexts, the Bluebook editors have suggested that actions related in some way to their citation manual require their prior written consent; invitations to clarify the scope of those suggestions and their legal foundation have been rebuffed; the editors do consider that reading the text of The Bluebook Online with intent to write CSL style code would be a breach of its Terms of Use; the MLZ software and its styles have been publicly available for quite some time; and the editors have never demanded that support for their style be removed from MLZ.

The long period of constipated fretting over the series of Bluebook pronouncements described here has led me to conclude that it resembles nothing so much as that famous Tale of a Tub, a finely constructed work of literature flung out “by way of amusement” for persons of a serious disposition, but also known on good authority to have been “hollow, and dry, and empty, and noisy, and wooden, and given to rotation.” While this thoroughly gripping semantic adventure has had its moments, sanity requires that we return at long last to the foundations of the language, as it is normally used for communication between members of our species.

The software module once referred to as “the MLZ American Law Style” will now be known as “the MLZ Bluebook Style”; it will be described as “an unauthorized implementation of ‘The Bluebook: A Uniform System of Citation’” because that is what it is; and it will be adapted to cover the 20th edition of that citation manual, when said citation manual eventually reaches the bookshelves.


Improvements to the Abbreviation Filter for Multilingual Zotero

The Abbreviation Filter (AFZ) is an essential companion to Multilingual Zotero (MLZ), particularly for legal writing. In addition to precise, style-specific control over journal abbreviations, the AFZ plugin supplies the human-readable form for a variety of jurisdiction codes, and applies the abbreviations specified in the so-called “Bluebook” to individual words and phrases within case names. The AFZ plugin must be installed alongside MLZ in order to generate accurate legal citations.

While the Abbreviation Filter provides powerful facilities for fine-tuning citation output, previous versions of the plugin were not particularly user-friendly. The abbreviations popup was accessible only via the Classic View of the Zotero word processor integration plugin. One of the most common demands—suppressing one or more particular jurisdiction names across all citations—could only be met via obscure computery incantations in the abbreviation fields. Finally, abbreviation lists were shipped separately from the plugin itself, could only be installed via the word processor plugin, and via an interface that was clunky and difficult to control. All in all, it is understandable that many MLZ users have missed out on the “convenience” of this extension.

Many of the plugin’s infelicities have been addressed in the latest release, and I highly recommend giving the tool a spin, if you haven’t done so already. Here is a run-down of the changes:

Embedded abbreviation lists
Abbreviation lists for the MLZ family of citation styles are included in the plugin, and import automatically when it is installed or upgraded. The import process takes a few minutes to complete; wait for the progress bar to finish, and you will be ready to go.
Jurisdiction suppression
When writing about a particular jurisdiction (such as the United States), the name of that jurisdiction should be omitted from cites to its primary legal sources (statutes, regulations, law cases). This can now be controlled on a per-style basis from within the abbreviation popup.
Improved import interface
To import an abbreviation list in previous versions, we had to select the import mode before selecting the file, which was inconvenient and confusing. In the new version, we select the file, then choose the mode, then run the import, as in a normal computer program.
Abbrevs. button
The Abbrevs. button is now accessible from the “csledit” pane (opened via the advanced pane of MLZ preferences); from the Quick Format word processor integration popup (i.e. the popup that is the default insertion method in Zotero); and from the Classic View (the default insertion method in MLZ).
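
To make the jurisdiction suppression item above a little more concrete, here is a minimal sketch of the idea. It is not AFZ code: the `SUPPRESSED` table, the style identifiers, the jurisdiction codes and the `render_jurisdiction` helper are all hypothetical, and serve only to show a per-style setting deciding whether a jurisdiction's human-readable name appears in a cite.

```python
# Hypothetical per-style suppression settings: style id -> jurisdiction codes
# whose names should be left out of citations. This is NOT the AFZ data format.
SUPPRESSED = {
    "example-legal-style": {"us"},
}

# Hypothetical mapping from jurisdiction codes to human-readable names.
JURISDICTION_NAMES = {
    "us": "United States",
    "jp": "Japan",
}


def render_jurisdiction(style_id: str, code: str) -> str:
    """Return the jurisdiction name for a cite, or "" if it is suppressed
    for the given style."""
    if code in SUPPRESSED.get(style_id, set()):
        return ""
    # Fall back to the raw code when no human-readable name is known.
    return JURISDICTION_NAMES.get(code, code)


if __name__ == "__main__":
    # Suppressed in the example style, but still shown in any other style.
    print(repr(render_jurisdiction("example-legal-style", "us")))
    print(repr(render_jurisdiction("example-legal-style", "jp")))
    print(repr(render_jurisdiction("another-style", "us")))
```

Keying the setting by style is what allows the same library to produce cites with the jurisdiction name in one document and without it in another.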

As a teaser, here is a view of the new popup.

[Image: Abbreviations popup]

Give it a try!


Multilingual Zotero: extracting embedded references

The concept

This week, I sat down and implemented a feature in Multilingual Zotero (MLZ) that I have been hankering after for quite a while. This post contains some geekish background discussion, but bear with me; if you wade through it, I think you’ll find that the feature introduced at the end scores high enough on the Scale of Nifty to justify the effort.

Zotero groups are a powerful and increasingly important channel for collaboration. Smooth collaboration is of particular value to MLZ users, who frequently have occasion to share resources with colleagues from other language domains. Zotero and Mendeley, competitors though they be, performed a great service to us all with the introduction of document-embedded metadata, which went live in Zotero with the release of version 3.0 on January 30, 2012. With embedded metadata, shared documents do not break when they contain references to which one collaborator has no library access. Things do not break—but they do stop working in some ways, and that is the problem I have tried to address.

If Zotero encounters an inaccessible item when refreshing citations in a document, it looks for an in-document copy of the item metadata. If a copy is found, Zotero uses it to create a surrogate Zotero item in memory, which enables dynamic reformatting of the reference in citations and the bibliography. However, the surrogate reference is not saved to the user’s local database, and it does not appear in Zotero itself. If metadata is missing from the reference, it cannot be added. If there are typographical errors in the reference, they cannot be fixed. If the reference is edited directly in the document, this freezes its form, so dynamic updates no longer work; and if the document is then sent back to the original author, the altered reference must be sought out and restored, and the original item edited to reflect the desired change. These steps are not needed if all references in the document are in a shared library to which all group members have access, but private library items can easily creep into a document during editing.
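
The lookup described above can be sketched roughly as follows. This is an illustration in Python rather than Zotero's actual code: `resolve_item`, the `ResolvedItem` class, the `library` and `embedded` dictionaries, and the use of a URI string as the key are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResolvedItem:
    metadata: dict
    surrogate: bool  # True when built from the in-document copy only


def resolve_item(uri: str, library: dict, embedded: dict) -> Optional[ResolvedItem]:
    """Prefer the item in the local library; otherwise fall back to the
    metadata embedded in the document, without saving it anywhere."""
    if uri in library:
        return ResolvedItem(metadata=library[uri], surrogate=False)
    if uri in embedded:
        # In-memory stand-in: good enough for formatting, but it is never
        # written to the library, so it cannot be corrected there.
        return ResolvedItem(metadata=dict(embedded[uri]), surrogate=True)
    return None  # no local item and no embedded copy


if __name__ == "__main__":
    library = {"uri:a": {"title": "Local item"}}
    embedded = {"uri:a": {"title": "Embedded A"}, "uri:b": {"title": "Embedded B"}}
    print(resolve_item("uri:a", library, embedded))
    print(resolve_item("uri:b", library, embedded))
    print("uri:b" in library)  # the surrogate was not saved
```

The limitation discussed in this post lives in the second branch: the surrogate can be formatted, but there is nowhere to fix it.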

With embedded metadata in place, the obvious next step is to supply first-class Zotero references out of the document when an item is not available locally; and that has been part of the plan from the beginning. This seems like it would be simple to do—just run the save() method on the surrogate Zotero object. This can be done, but raises a wee logical problem that defeats it as a solution.

Zotero applies a local identifier to items in its database. Actually, an item has several identifiers, but all of them are pegged in one way or another to a particular library. The “best” one is a URI that points to the item within a given library or group, so let’s work with that. Here is what would happen with the “simple” approach of just saving the surrogate item in the normal way:

  1. User A creates a document, adds a reference from their personal library, and sends the document to User B.
  2. User B refreshes the document, which automatically creates a new item in her own personal library.
  3. User B corrects a misspelling in the metadata of the reference, and sends the document back to User A.
  4. User A refreshes the document. If things are set up properly, Zotero intelligently identifies the original item in User A’s database. (If things are not set up properly, it will create a new item, which User A must merge by hand with the original.)
  5. When User A prints the document, the reference will contain the spelling error.
  6. When User B prints the document, the reference will be correct.

With MLZ, the situation is more serious, since User A may be relying on User B to supply correct transliterations and translations on multilingual references (or vice-versa), and if these don’t turn up as expected, confusion will ensue.
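
The six steps above can be played out in a toy model. Everything here is hypothetical (the dictionaries standing in for personal libraries, the URI strings, the sample title). The point is only that saving the surrogate into User B's personal library creates a second, independent copy, so B's correction never reaches A's original.

```python
# Toy model of the "simple" approach: each personal library is a dict keyed
# by a library-specific URI. All names and values are illustrative only.
library_a = {"users/A/items/1": {"title": "Teh Title"}}  # original, with typo
library_b = {}

# Step 1: A cites the item; the document embeds a copy of the metadata.
document = {
    "cited_uri": "users/A/items/1",
    "embedded": dict(library_a["users/A/items/1"]),
}

# Step 2: B refreshes; the surrogate is saved as a NEW item in B's library.
library_b["users/B/items/1"] = dict(document["embedded"])

# Step 3: B fixes the typo in her own copy and returns the document.
library_b["users/B/items/1"]["title"] = "The Title"

# Step 4: A refreshes; the cite resolves to A's original item.
# Steps 5 and 6: each user renders from their own, now divergent, copy.
printed_by_a = library_a[document["cited_uri"]]["title"]
printed_by_b = library_b["users/B/items/1"]["title"]

print(printed_by_a)  # A's copy never received the correction
print(printed_by_b)  # B's copy has it
```

Because every identifier is pegged to one library, nothing in this arrangement links the two copies once they have been edited separately.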

A better approach is to remap embedded references for both collaborating users to a shared group library if they are not otherwise available. When all users in a team have access to the items from which a shared document is built, the scope of the collaboration naturally extends to the curation of metadata. The references that they produce together for use in their project become a resource that they can recycle in separate projects of their own. Everyone wins.

Shared library preference

To set this up, I have added a shared library selection widget to the new “Project Name” pane of MLZ Document Preferences, as shown to the right. The pane is specific to individual documents, and is accessible only through the word processor plugin. Clicking on the “No group selected” button will open a list of group libraries to which the current user has access (the user’s own personal library is not included in the list, for the reasons outlined above). If the document has been set to use a group to which the user does not have write access, or of which the user is not a member, the widget provides a suggestion to contact the group owner and arrange for access. Once a group has been selected, the “release for editing” box must be ticked before the setting can be changed. This minor nuisance encourages users to work against a common library, since that makes the sharing of reference data transparent and hassle-free. With a group selected, inaccessible references contained in the document will be written to the group as required, including any MLZ field variants set in the item.
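
As a rough sketch of the behaviour described above (and not MLZ's implementation), the function below writes inaccessible embedded references to the selected group, and refuses when the user lacks write access. `Group`, `remap_to_group` and the dictionary shapes are hypothetical. Keying group items by the URI recorded in the document is a simplification made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    writable: bool
    # Simplification: items are keyed by the URI recorded in the document.
    items: dict = field(default_factory=dict)


def remap_to_group(embedded: dict, accessible: set, group: Group) -> list:
    """Copy embedded references that are not otherwise accessible into the
    shared group, and return the URIs that were written."""
    if not group.writable:
        raise PermissionError(
            f"No write access to '{group.name}': contact the group owner.")
    written = []
    for uri, metadata in embedded.items():
        if uri in accessible or uri in group.items:
            continue  # already available; nothing to do
        group.items[uri] = dict(metadata)  # copy carries any field variants
        written.append(uri)
    return written


if __name__ == "__main__":
    team = Group(name="Team library", writable=True)
    embedded = {"u1": {"title": "One"}, "u2": {"title": "Two"}}
    print(remap_to_group(embedded, accessible={"u1"}, group=team))
    print(team.items)
```

Running the function a second time writes nothing, which matches the "as required" wording: only references that are still unavailable are added to the group.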

That’s all there is to it. Our students here in the Nagoya University Faculty of Law will be using this new facility to build shared research libraries in the coming months, which will put the concept to the test. If you try it out and have any questions or run into difficulties, get in touch; we have a strong local interest in getting it right.


Multilingual Zotero: tagging cited items

[Image: Project name tagger in action]

Since the public release of Zotero in 2006, there has been a steady trickle of requests for a means of identifying the items cited in a document or project within Zotero itself. Discussion of the issue has tended to focus on “document collections” as a potential solution. The idea would be to offer a special collection in the user’s personal library that contains all of the items cited in a given document. It is a testament to the Zotero user experience that this seems like it would be an easy thing to implement: under the hood, it is anything but.

The main difficulty is that Zotero (and so MLZ) permits items from multiple libraries to be cited in a single document. In contrast to the existing Duplicate Items and Trash folders, therefore, a “document collection” folder might contain items that are outside the current library. In a Zotero database, items contained in different libraries are separate copies, even if they contain identical metadata. This being the case, if an item is cloned from one library to another in order to form a unified collection of references for a particular document, the collection will not contain the original item, but a copy; and edits to the copy will not be reflected in the document when it is refreshed. It would surely be possible to implement a “live” connection between the item shown in a document collection and an original source item located elsewhere; but it would not be simple to do.

Pressure to address this use case has been growing in the home of MLZ, here in the Faculty of Law at Nagoya University. The majority of students in our postgraduate programs hail from jurisdictions in East, South East and Central Asia, and their work is inherently comparative in scope. We have a plan to deploy student-curated research libraries for specific subject areas, both as a study aid and as a publication of the Faculty. Since MLZ is widely used among our students, the obvious way to proceed is to fuel these new library projects with materials cited in student theses; and that workflow will run much more smoothly if cited items can be easily identified in MLZ itself.

[Image: Project name in DocPrefs]

In order to get a working solution in place with minimum effort, I opted to just mark items in place with a special-purpose tag. With the latest version of MLZ, the Document Preferences popup has a new tab for “Project Name”, containing a single field. When a value is entered in the field, a special-purpose tag is applied to all cited items whenever the citations are refreshed or updated in the document.

The tagging operation is cumulative: if a project is composed of several documents (chapters, say), setting the same project name in each document will result in a single tag that calls up all references cited in any of the documents. As a side effect, references removed from a document will not lose their project tag (to produce a “clean” set of current references, you can delete the tag, then open and refresh each of the target documents). Project tags cannot be colorized, and they cannot be renamed from within MLZ. Where an ordinary tag of the same name exists, a renaming or colorizing operation on the ordinary tag will not affect the project tag.
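
The cumulative behaviour, and the cleanup procedure mentioned in passing, can be sketched as below. This is an assumption-laden toy and not MLZ code: `project_tags`, `refresh_document`, the project name and the item keys are hypothetical.

```python
from collections import defaultdict

# Hypothetical store: project name -> set of item keys carrying the project tag.
project_tags = defaultdict(set)


def refresh_document(project_name: str, cited_items: list) -> None:
    """Apply the project tag to every item currently cited in a document.
    Tags are only ever added here, never removed."""
    if project_name:
        project_tags[project_name].update(cited_items)


# Two chapters share one project name, so their cites accumulate in one tag.
refresh_document("Thesis", ["item1", "item2"])  # chapter 1
refresh_document("Thesis", ["item3", "item4"])  # chapter 2
refresh_document("Thesis", ["item1"])           # chapter 1 after dropping item2

before_cleanup = sorted(project_tags["Thesis"])
print(before_cleanup)  # item2 is still tagged although no longer cited

# To get a clean set: delete the tag, then refresh each document again.
del project_tags["Thesis"]
refresh_document("Thesis", ["item1"])
refresh_document("Thesis", ["item3", "item4"])

after_cleanup = sorted(project_tags["Thesis"])
print(after_cleanup)
```

An add-only update keeps the operation cheap and lets several documents feed a single tag, at the cost of the stale entries described above.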

The one small glitch with this setup is that an ordinary tag and a project tag of the same name cannot be selected separately: selecting one will also select the other. After investigation, I concluded that this small anomaly is unlikely to cause serious confusion, that it is easily avoided by renaming the ordinary tag, and that providing for separate selection of the two tag types would require too much code complexity to be worth the candle.

So there you have it. I’m looking forward to getting our subject-area teams started with their libraries, and I hope that other MLZ users will also find this facility useful. In due course, if all goes well, perhaps this approach will find a place in mainstream Zotero.
