Multilingual Zotero: fixes for language support

Quite some time ago, I received reports from Mac users that the language variant pulldowns in MLZ either failed to open or opened erratically.

At the time, I traced the failure to a change in the Firefox internals that caught out Mac users before taking hold in my own environment (Linux). The essential change was deprecation of the document.popupNode property that was used to grab the field on which to open the menu, in favour of a new popup.triggerNode property. I picked this much up from the Mozilla bug tracker, without paying sufficient attention to the details of the new method. Such is the stuff of which confessions are made.

In initial trials of the triggerNode property on my own system, I couldn’t get it to work, and concluded (wrongly, almost to a certainty) that it had not yet been implemented in the version of Firefox that I was using under Linux. So I “fixed” MLZ with code like the following:

var triggerNode = document.triggerNode ? document.triggerNode : document.popupNode

Right. I assumed that the developers had introduced a new property, set on the same object as the old property, and with the same content as the old property, changing only its name. I did not stop to ask, “Why have they done this?” I should have, because of course they had not. The triggerNode is not a property of document, it is a property of the popup itself — the target node of the event that opened it. Instead, I should have tried something like this:

var triggerNode = event.target.triggerNode

This works under Linux today, as it surely would have done a year ago. Ouch.
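As a minimal sketch of the corrected lookup, assuming the code runs in a handler for the popup's popupshowing event: the node is read from the popup itself, and the deprecated document.popupNode is kept only as a fallback for older Firefox versions. The helper name getTriggerNode is hypothetical and is not the name used in MLZ.

```javascript
// Hypothetical helper, not the actual MLZ code.
// `event` is the popupshowing event, so event.target is the popup element.
// `doc` is the owning document, consulted only as a legacy fallback.
function getTriggerNode(event, doc) {
  var popup = event.target;
  if (popup && popup.triggerNode) {
    return popup.triggerNode;
  }
  // Older Firefox versions exposed the node on the document instead.
  return doc && doc.popupNode ? doc.popupNode : null;
}
```

Checking the popup first means the fallback is only reached on versions where triggerNode is genuinely absent, which is the case the original one-liner was meant to cover.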

The latest MLZ release (ver. 4.0.18m437) includes new code for the language pulldowns. I haven’t yet tested the revision on Mac OS X directly, but MLZ should now be working correctly on all platforms. If you try it and have difficulties, please let me know.

My attention was brought to the persistence of this problem by a private fault report from an MLZ user, to whom I owe a word of thanks. There are Mac users among our own students, and I will be very happy to give a progress report at the next Multilingual Zotero workshop here at Nagoya.

Posted in Announcements | Leave a comment

Multilingual Zotero: Fixes of the Fortnight

Official Zotero added syncing of full text index tables this week. Today, MLZ follows suit: we are now level with Zotero version 4.0.14. That’s the main news, but there have been a few other bugfixes to MLZ in the past two weeks, which are summarised below. Most are related to multilingual functionality, which is starting to attract the attention of style developers.

Multilingual Zotero

Zotero 4.0.14 release
Merge recent Zotero changes through the version 4.0.14 release. The displayed version of MLZ has been amended to reflect the current official Zotero release on which it is based.
Sync mappings
Two small changes catch a potential sync error and a sync failure. A missing field mask for websiteTitle for the videoRecording type has been added; and a sync mapping for testimonyBy on the hearing type has been added.
Language preferences
Document language preferences were failing for publisher and place. This has been fixed.

citeproc-js

Terms in multilingual sort keys
Sort keys were always being generated with the default locale of the style. To get a correct sort in multi-layout styles, terms from the layout’s master locale (instead of the global default locale) must be used. Thanks to Ming-Li Wang for diagnosing the problem.
Automatic CSL-m mode
Force CSL-m mode on valid CSL-m styles (i.e. those with version 1.1mlz1).
Fix title-short outside of MLZ
In MLZ, the content of the Short Title field is ignored when <text variable="title" form="short"/> is used on the legal_case type. This is what we want in MLZ, but it violates the official CSL schema. The behaviour has been rectified for official CSL styles run outside of MLZ.
Test of genre field content
To resolve an MLZ-specific issue relating to localisation of the email, interview, podcast, radio-broadcast, and television-broadcast types, a test for content automatically inserted into the genre field by these types has been added to the CSL-m schema, and implemented in the processor.
Labels on nested name elements
The processor was quashing the labels on names elements nested within a macro. This has been fixed.
Redundant calls to substituteStart() and substituteEnd()
The processor was adding redundant calls to substituteStart() and substituteEnd() on macro blocks. There were no known side-effects, but the unnecessary calls have been removed.
Multilingual supplement affixes
A bug that erroneously trimmed the suffix on the style element when a multilingual element with affixes is added has been fixed.
Add default-locale-sort attribute (CSL-m only)
CSL-m styles with multiple locale-specific layouts require a global setting for the sort locale. The new attribute supplies this value and the processor makes use of it.
Sort fixes
There were some infelicities in the processor’s sort function. These have been fixed.
Capitalisation
Recognize full stops inside quotes for term capitalisation purposes, and leading full stops on a prefix for delimiter purposes.
Greek entries
Extend the pattern match used for converting names to initials to cover Greek names. The family of recognised Greek characters has also been extended.

Multilingual Zotero: Our Week in Code

MLZ users may have noticed a flurry of updates to the client during the past week, with fixes
for a number of long-standing bugs, and extensions to allow smoother operation with several
styles and workflows. Here’s a run-down of the adjustments.

citeproc-js

Fix name-as-sort-order="first"
With institutional names, name-as-sort-order="first" was being applied independently to each set of personal names before each institution. The processor will now apply the attribute only to the first-occurring name. Thanks to user mlwang for reporting this fault.
Localise symbolic form of “and” in names
User and style developer mlwang uncovered the need for a local form of symbolic “and” (ampersand) in styles using Asian scripts. A localised term has been added to CSL-m locales, and the processor now calls it in MLZ.
Control for double-spaces before leading apostrophe
The processor was throwing a double-space before a leading apostrophe, reported by user johndd. Now fixed.
Abbreviation hints
Abbreviation hints are appropriate to the titles of legal item types, but were being applied to the titles of ordinary items as well, where they are not desired. Hints now apply only to the legal types. Fault reported by user mlwang.
Support Japanese imperial dates
For a Japanese legal citation style in development, it proved necessary to render (modern) imperial dates from a Gregorian date source. An extension to CSL-m has been introduced for this purpose.
Parallel merge constraints
The merging of parallel cites was over-aggressive. Merge is now avoided where Short Title or Jurisdiction differ between the items. Fault reported by user jack.orford.
Greek name handling
Conversion of Greek given names to initialised form was failing. Fault reported by user zoyiap.
Title case
The code behind text-case="title" had become excessively complex. It’s been simplified, and a list of “stop-words” prepared by adamsmith for the CSL development group has been adapted to the function. Capitalisation is now performed consistently following colon, exclamation point and question mark.
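To illustrate the imperial date item in the list above, here is a rough sketch of converting a Gregorian date to a modern Japanese era and year. It is not the CSL-m extension itself: the function name toImperialDate and the shape of the era table are assumptions made for the example, and dates before 1873 (when Japan still used a lunisolar calendar) are treated only approximately.

```javascript
// Era start dates in the Gregorian calendar (ISO format), newest first.
// Reiwa postdates this post and is included only for completeness.
const ERAS = [
  { name: "Reiwa", start: "2019-05-01" },
  { name: "Heisei", start: "1989-01-08" },
  { name: "Showa", start: "1926-12-25" },
  { name: "Taisho", start: "1912-07-30" },
  { name: "Meiji", start: "1868-10-23" }
];

// Hypothetical helper: takes "YYYY-MM-DD", returns { era, year } or null.
function toImperialDate(isoDate) {
  for (const era of ERAS) {
    // ISO date strings of equal length compare correctly as plain strings.
    if (isoDate >= era.start) {
      const year =
        Number(isoDate.slice(0, 4)) - Number(era.start.slice(0, 4)) + 1;
      return { era: era.name, year: year };
    }
  }
  return null; // before the Meiji era: out of scope for this sketch
}
```

The year of an era counts from 1 in the calendar year in which the era begins, so a date early in 1989 still falls in the final year of Showa, while a date from 8 January onward falls in Heisei 1.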

Multilingual Zotero

Items with more than 10 authors
Yesterday I discovered that MLZ would crash if ten or more authors were set on an item. This has been fixed.
Creator move up/down
The move up/move down selections in the left-click author label menu had no effect. This has now been fixed, and moves are working again.
Avoid possible tag conflict on save
MLZ user duncdrum reported a sync failure caused by an attempt to save an already-existing tag. The steps to reproduce the error are not known, but code has been introduced into MLZ to render it harmless.
Sync error on Hearing type
The Testimony By creator field was not being stored correctly for sync operations. Error and its cause identified by user mlwang.
Add US bankruptcy courts to the Jurisdiction pool
The machine-readable identifiers underlying the Jurisdiction field are a more certain way to generate court and jurisdiction hints in formatted citations. As the need arises, this “pool” of identifiers is being extended. US bankruptcy courts and several other jurisdictions (including the former USSR) were added to the pool.
Interview and Letter in UI
The interviewer name or the sender was favoured in the centre panel listing. Preference is now given to the interviewee and the recipient.
Jurisdiction for Journal Article
In civil law jurisdictions, published comments on court judgments are a well-recognised category of content, with discrete citation formatting requirements. Adding the Jurisdiction field to this item type allows us to handle this type of material correctly in citation styles.
Recipient for Book and Book Section
Adding a Recipient creator to these types allows proper formatting of festschrift volumes and chapter contributions.
Volume Title for Book and Book Section
This field has been proposed in CSL discussions. A clear need for it emerged in casting a Japanese legal style, where kouza and commentary volumes carry the “set” name as their primary title, supplemented with volume numbers and separate titles for individual volumes.
Merge latest changes from Zotero 4.0 official branch to MLZ
This is a routine thing, to keep us level with Zotero 4.0.

Multilingual Zotero: update for Firefox 24 + American Law Style

This is just a brief community announcement.

Early this week, I started getting an error when attempting to set the American Law style in a word processor document. The “cause” turned out to be an upgrade to Firefox 24: in version 23 the style was loading just fine. The log files showed the failure as a recursion error: too many nested operations for the JavaScript interpreter to handle.

After some poking around in the style, I concluded that there was no easy solution there, and I began doing background reading in preparation for filing a bug report to the Firefox team. That turned up a telling comment (or two) to the effect that any code that throws a recursion error against a production browser unquestionably has “serious problems”. The JavaScript interpreter was giving up the game in a code loop that builds the “running” version of the style, and it does rely on recursion. I hadn’t touched this code in nearly four years. I struggled with it in the early work on citeproc-js, and it would be fair to say that I didn’t particularly know what I was doing in there.

That’s the embarrassing bit. The brighter news is that I spent today rewriting that bit of code, and it’s turned out rather well. I’ve bundled up the revised CSL processor and installed it in a fresh release of MLZ, so if you are an American Law style (aka Bluebook) user, you should be able to avoid any issues by updating the client.
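For readers curious about the general technique, one common way to escape a recursion limit is to walk the nested structure with an explicit stack instead of nested function calls. The sketch below is only an illustration of that idea and is not the code in citeproc-js: the node shape ({ name, children }) and the function name flattenStyle are assumptions made for the example.

```javascript
// Hypothetical sketch: flatten a nested style tree into a list of
// open/close tokens without recursion, using an explicit stack.
function flattenStyle(root) {
  const tokens = [];
  const stack = [{ node: root, closing: false }];
  while (stack.length > 0) {
    const frame = stack.pop();
    if (frame.closing) {
      tokens.push("/" + frame.node.name);
      continue;
    }
    tokens.push(frame.node.name);
    // Schedule the closing token, then the children in reverse order
    // so that they are popped (and emitted) in document order.
    stack.push({ node: frame.node, closing: true });
    const children = frame.node.children || [];
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push({ node: children[i], closing: false });
    }
  }
  return tokens;
}
```

Because the pending work lives in an ordinary array rather than on the interpreter's call stack, the depth of nesting in the style no longer affects how deep the calls go.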

Apologies for any inconvenience — we will now return to our regularly scheduled programming.


Getting Started with the Free Law Ferret

Photo by ChrisP

The Free Law Ferret plugin for Firefox is a legal research companion to the Multilingual Zotero (aka MLZ) reference manager.

The Ferret reads US case law citations from any page viewed in the Firefox browser. Citations can be saved directly to MLZ; the corresponding court judgements are opened automatically in separate browser tabs, and can be appended to their MLZ items with a couple of clicks apiece. Items can be tagged locally for quick targeted retrieval, annotated for future reference, and accurately cited into new documents with a single click.

The Ferret is the first tool of its kind: if you work with US legal materials, it may save you valuable time in the collection of cited cases and the organisation of arguments.

Setup and Usage

Note: To use the Ferret, you must first have a recent version of Firefox installed. Unlike its cousin the official Zotero, MLZ is available only as a plugin for the Firefox browser: there is no standalone version. The Ferret depends on MLZ, so installing Firefox is a must. If you don’t already have it, click on the logo to install.

Once you have Firefox in place, install the Multilingual Zotero and Free Law Ferret plugins, in that order. Clicking on the logos to the right should do the trick.

Firefox will ask for permission to proceed with the installation and offer to restart the browser for each plugin. Just say “yes”: no further configuration is required to set things up. Once both plugins have been successfully installed you’re ready to go.

The Ferret reads citations from ordinary web pages (not PDF files, at least as of this writing). The sample to the right is from CourtListener, but the text of judgements obtained from Google Scholar, Justia, Lexis or WestLaw will suit just as well. The Ferret will process pages from online legal guides, and presumably The Bluebook Online itself (I am unable to confirm the last item, as the editors have warned me off opening an account there, for reasons that they have yet to fully explain).

Before using the Ferret, wake up the MLZ library by opening it at least once. You can either use the Shift-Ctrl-Z hotkey combination (Shift-Cmd-Z on a Mac), or click on the MLZ icon in the lower-right corner of your browser. Repeating the operation will close the library view. (The illustration to the right shows a sliver of my own library: when MLZ is run for the first time, it will contain only the Zotero Quick Start Guide.)

Right-click the mouse to open the context menu. A “Free Law Ferret” menu option should appear at the bottom (highlighted in the illustration to the right). Click on the menu option to run the Ferret across the page.

After scanning the page, the Ferret will display a list of citations it has found, as shown to the right. Select the cases that you would like to open for viewing. To create MLZ items when cases are opened, tick the “Save selected items to MLZ” option above the “OK” button (this is generally desirable, so tick the box now).

A search for each selected case will be opened in a fresh browser tab. The illustration to the right shows a search of the CourtListener database. This is the Ferret’s preferred source for cases: if no results are found there, the Ferret will fall back to Google Scholar. In both services, the citation of the case being searched for is displayed at the top of the page (item [1] in the illustration). If the CourtListener search is overbroad (unlikely, but possible), you can adjust the search terms by editing the case name words used (item [2]) or the jurisdiction searched for (item [3]). In the illustration there is only one search hit and the citation matches, so we can just click through to obtain the text of the case.

When the case itself is opened in its browser tab, the Ferret will check whether a record of the case exists in the MLZ database (if the “Save selected items to MLZ” option was ticked, this will always be true). If a record is found, a green icon will appear in the address bar as shown in the illustration to the right. Clicking on the icon will automatically attach the text of the case to the local MLZ item. The icon will then disappear.

Once the case text has been saved with the green icon, repeating the steps above will open the local copy, rather than performing a search on CourtListener (or Google Scholar). Local copies can be opened without access to the Internet.

That’s pretty much the Ferret in a nutshell. MLZ has powerful facilities for organising, sharing and citing materials (which it inherits from the parent project at Zotero proper) and these are well worth exploring. Guidance on the use of Zotero is available from the Zotero website. For information on the extended features of MLZ, please refer to the book Citations, Out of the Box, available in print form on Amazon, or as a downloadable PDF. Before you dive down the rabbit ferret-hole, there are a couple of points to note:

  1. A set of MLZ styles with legal citation support are available here on the CitationStylist website. You should use these for legal writing, rather than styles from the official CSL style repository. The selection is much smaller, but the MLZ styles are able to do a better job of formatting legal references.
  2. When seeking support on the Zotero forums, be sure to put [mlz] in the subject line of your post, so that volunteers and core developers who follow forum traffic will know that your question relates to MLZ rather than the official version of Zotero.

I hope you enjoy the Ferret. I’m rather fond of the little critter myself; it’s nearly housebroken, and I’m looking forward to seeing what tricks it will be put up to by those who take it out for a walk.

Frank Bennett
Nagoya, Japan


MLZ style for Japanese tech writing: SIST-02

Japan Science and Technology Agency, Standards for Information Science and Technology

Before the arrival of Multilingual Zotero, reference managers uniformly assumed that a single citation style should be applied to all references. This assumption breaks down when the items to be cited are in multiple languages that differ significantly in their typographic conventions. Mixed-language citation is common in most non-English jurisdictions, and mixed-format citation support should be considered a threshold requirement for tools intended for use in Asian (and Middle-Eastern) publishing.

Multilingual Zotero is designed to accommodate language-dependent citation formats in a single style, through its extended CSL-m version of the Citation Style Language. Until today, this capability had been exercised only by a few intrepid users, and in the processor test suite: I hadn’t gotten to the point of drafting such a style myself. That’s changed now, and it gives me great pleasure to present the MLZ SIST-02 style, an implementation of the SIST-02 citation style maintained by the quasi-non-governmental Japan Science and Technology Agency (Kagaku gijutsu shinkō kikō: 科学技術振興機構).

Minami-Sanriku Disaster Prevention Headquarters

As if to remind me of the importance of getting out more, I learned of the SIST-02 style at the first full meeting of Code4Lib Japan held August 31 to September 1 of this year in Minami-Sanriku, Miyagi Prefecture (宮城県南三陸 — an area still struggling to recover from the devastating earthquake and tsunami of March 11, 2011). The meeting was memorable and inspiring in a variety of ways, not least of which was the decision by the organisers to hold the event at this location. C4Ljp is an active community, and core members have been closely involved in helping local libraries restore services following the disaster. Their dedication, and the perseverance of local staff, are a model to us all. I was also fortunate to meet Dan Chudnov, Director of Scholarly Technology at George Washington University, who was guest of honour at the event. The SIST-02 style presented here is based on work by Kyushu University librarian Eriko Amano, which came to light during an excursion to local library facilities following the meeting. The short lesson from this experience can be summarised as follows:

Means of Communication       Score
Internet Borg-knowledge      0
Face-to-face conversation    1

The original version by Amano-san was cast in official CSL, and covered a subset of the SIST-02 style for Japanese references. I have taken the liberty of recasting the code in CSL-m for use with Multilingual Zotero, testing revisions against the copious examples provided in the SIST-02 style guide. The result is available for reference as a set of MLZ SIST-02 sample references, with links to the relevant section of the style guide (to confirm operation of the style) and corresponding Zotero library entries (to illustrate correct input conventions) for each tested reference.

The new style is available from the CitationStylist site, for use with the latest versions of Multilingual Zotero and the Abbreviation Filter (the latter is needed for correct formatting of patent references). It does not currently have general support for legal referencing, but as our first fully tested multilingual style implementation, it will provide a sound foundation for further style development in the Japanese jurisdiction.


Free Law Ferret: first-phase enhancements

Update: The latest release offers the option to save references directly to MLZ. To use the option, you may need to have the latest (very new) MLZ release installed, so it is worth updating that as well. Mainstream Zotero may not work: give it a try and let me know how you get on. The Ferret will avoid throwing duplicates when items are created from scraped cites, and parallel references in scraped data will be reflected in the Related tab of the parallel items in MLZ. Linked Data from unstructured plain text!

It’s been a busy few days since the Free Law Ferret plugin peeked out of its burrow. Short release cycles (sometimes several versions per day) became a habit with me while coding the citeproc-js citation processor: when thousands of users are affected by a known bug, it’s hard to avoid the temptation to squash it. I suspect that the Free Law Ferret has so far a much smaller audience, but hope springs eternal, and old habits die hard.

The flurry of ideas in the conclusion of the last post has continued to nag at my imagination, and I’ve racked up a string of plans in the project tracker. The most interesting bits in there so far are the direct item save option for MLZ, and proper resolver support. Feel free to add to the list.

Enough has changed in the past few days to justify a short list of improvements. If you took the Ferret out for a walk and found it less thrilling than expected, here are some morsels that might tempt you to install again and have another go:

Display citation above search listing
One of the things that most bothered me during early testing was the need to dig back through the original document to check whether cases shown in a search listing (from CourtListener) corresponded to an actual citation in the document. This is no longer necessary: citations applying to the case are listed in a fixed header above the display window of the search listing for immediate reference.
Recognition of year-before-cite styles
Digging round in Google Scholar, I found a surprising number of Federal judgments with citations in the form “Smith v. Jones, 9th Cir., 1978, 123 F.2d 456”. The Ferret now does a tolerably good job of picking up both the year and the court name from cites in this form, with correct identification of the party names.
Citation samples wiki
The repository on BitBucket now has an open wiki page with a short list of sample citations known to work, and another of citations known to fail in some way. If you run across cites that the parser handles poorly, feel free to add to the page.
Improved citation parsing
There have been quite a few small tweaks to the parser, resulting in significant improvements in overall recognition and accuracy. The parser will continue improving gradually over time.
Please-wait widget
Since large documents can take a while to process, I’ve added a “progress meter” to show that the Ferret is actually doing something. The widget does not show actual progress, but if the parser breaks in some way, it will throw an error: if it says it’s working, you can trust that it is working on the document, and will eventually return.
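To give a feel for the year-before-cite form mentioned in the list above, here is a deliberately simplified sketch that picks such a citation apart with a single regular expression. The real parser works differently (it starts from the reporter abbreviation and explores the surrounding text), so the pattern and the function name parseYearFirstCite are assumptions made purely for illustration.

```javascript
// Hypothetical pattern for cites of the form
// "Smith v. Jones, 9th Cir., 1978, 123 F.2d 456".
const YEAR_FIRST_CITE =
  /^(.+?) v\. (.+?), (.+?), (\d{4}), (\d+) ([A-Za-z0-9. ]+?) (\d+)$/;

// Returns the parts of the citation, or null if the form does not match.
function parseYearFirstCite(text) {
  const m = YEAR_FIRST_CITE.exec(text.trim());
  if (!m) {
    return null;
  }
  return {
    plaintiff: m[1],
    defendant: m[2],
    court: m[3],
    year: Number(m[4]),
    volume: Number(m[5]),
    reporter: m[6],
    page: Number(m[7])
  };
}
```

A citation in the more familiar order, with the court and year in trailing parentheses, does not match this pattern and yields null, which is why the two forms need separate handling.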

That’s the story so far. If you try the Ferret and like it (or think you might, if this-or-that were fixed in it), send me a postcard. Seriously. My address is:

Frank Bennett
Faculty of Law
Nagoya University
Furo-cho, Chikusa-ku
Nagoya 464-8601
JAPAN

I’ve put in quite a bit of time over the past several years on Multilingual Zotero, citeproc-js and now the Free Law Ferret, and it’s nice to have a bit of physical evidence that the work is connecting with fellow tillers of the legal field. If you are moved to buy a copy of the MLZ book, Citations, Out of the Box (2013), I will of course be particularly chuffed!


Free Law Ferret: document-to-cited-cases in a click

Photo by John Mauremootoo

In the last post, I speculated on the possibilities of a free-text citation parser by Mike Lissner at the CourtListener project, and tentatively introduced a rough-and-ready port to JavaScript. I threw a label on the ported code, expecting to change it soon.

As luck would have it, soon has arrived already: “Juriscraper”, it seems, is the CourtListener library for scraping court websites, and doesn’t actually have anything to do with extracting citations embedded in individual cases. Oops. So I put on my copywriter’s hat, and chased up a brand new name. So let’s try that again….


Say hello to the Free Law Ferret: a Firefox plugin that has emerged from the CitationStylist skunkworks with a ferocious curiosity and a full set of tiny adorable bibliographic teeth.

The tool depends on some code supplied by Zotero or MLZ so you need to have one of those installed in Firefox for starters. Then install Free Law Ferret. Next, visit a page in your browser. Any page that contains citations to US case law will do, including court judgments from the service of your choice (the case shown in the sample to the right is from Google Scholar, but the source really doesn’t matter). Right-clicking in the page will bring up the context menu with “Free Law Ferret” at the bottom as shown to the right. Click on it and see what happens (it may take several seconds for the parse to complete on a large document). Bear in mind that the code for this is only a couple of hours old as of this writing: if you get an error, let me know and I’ll check into it.

The Ferret will scan the document in the browser window (be it law case, legal brief, blog post or whatever), and present a list of citations in a dialog box like that shown to the right. Note that the parser presently supports US case law only: cites to the courts of other countries, to regulations, to statutory law and to international instruments and tribunals will not be recognized. Select cites in the dialog and click OK to search for each case in the CourtListener repository and open it in a separate browser tab. If the search for a case fails, you can either broaden the search terms in the CourtListener page, or search for it (manually) elsewhere.

You may notice that the cites in the list are more cleanly formatted than those in the original document. In a 2007 interview, Dan Chudnov (citing Dan Hillis) mentions that one of the tasks of library service is to restore information to metadata that has been corrupted by “noise”. [1] This happens even with electronic records: data gets mistyped, things get entered in the wrong field, data can be corrupted in transfer or when migrating across platforms. Handwritten citations, on which the US national legal infrastructure largely depends, offer an exceptionally rich variety of opportunities for variation and error. The solution, as Dan indicates, is to leverage the pieces we can trust, and refresh bad metadata from reliable records that can be inferred from what we have to hand.

This is exactly what the CourtListener citator does, and it explains the uniform appearance of the listed references. The tool contains a large pool of variants of the standard reporter abbreviations, derived from painstaking corpus analysis. These are mapped back to their canonical forms by the parser. The parser then explores the text before and after the match in search of data for associated case name, volume, page, year and (optionally) court name. When sufficient details can be identified, the citation details are finalised in canonical form, and a clean citation can be reproduced from the cleansed record. The citation details can also be used to compose a search query for arbitrary search engines, such as CourtListener itself: and voilà, case text on screen.
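The first two steps described above, matching a variant reporter abbreviation and mapping it back to its canonical form, can be sketched roughly as follows. The variant table here is a tiny made-up sample, whereas the real tool carries a large pool derived from corpus analysis, and the names REPORTER_VARIANTS and findCitations are hypothetical.

```javascript
// Hypothetical sample: variant spellings mapped to canonical reporters.
const REPORTER_VARIANTS = {
  "F.2d": "F.2d",
  "F. 2d": "F.2d",
  "F.2nd": "F.2d",
  "U.S.": "U.S.",
  "U. S.": "U.S."
};

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find "volume reporter page" triples and normalise the reporter name.
function findCitations(text) {
  const alternatives = Object.keys(REPORTER_VARIANTS)
    .sort(function (a, b) { return b.length - a.length; }) // longest first
    .map(escapeRegExp)
    .join("|");
  const pattern = new RegExp(
    "(\\d+)\\s+(" + alternatives + ")\\s+(\\d+)", "g");
  const found = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    found.push({
      volume: Number(m[1]),
      reporter: REPORTER_VARIANTS[m[2]],
      page: Number(m[3])
    });
  }
  return found;
}
```

Calling findCitations on a sentence containing "123 F. 2d 456" yields a record with the reporter normalised to "F.2d". The search for case name, year and court in the surrounding text is left out of this sketch.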

The specific query composed by the initial version of Free Law Ferret actually ignores the core citation details themselves. The details used instead are: the case name with any abbreviated terms removed; a date range beginning on January 1st of the year given in the citation and ending on December 31st of the following year; and the courts of the specific jurisdiction, if known. The date scope is extended in this way because a case may be published in the year following the date of decision, and a citation might contain either date—there is no way to be sure, so we look for the case under both.
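As a rough sketch of how those three search terms might be assembled from a parsed citation: the function name buildSearchTerms and the field names below are assumptions for illustration, not the Ferret's actual code or CourtListener's query parameters, and "abbreviated terms" is approximated here as any word ending in a full stop.

```javascript
// Hypothetical sketch: derive search terms from a parsed citation.
// `cite` is assumed to look like { caseName, year, court }.
function buildSearchTerms(cite) {
  // Drop abbreviated words such as "v.", "Mfg." or "Co.".
  const nameWords = cite.caseName.split(/\s+/).filter(function (word) {
    return word !== "" && !word.endsWith(".");
  });
  return {
    caseNameWords: nameWords,
    // The range runs from 1 January of the cited year to 31 December of
    // the following year, since either date may appear in a citation.
    filedAfter: cite.year + "-01-01",
    filedBefore: (cite.year + 1) + "-12-31",
    court: cite.court || null // null when the jurisdiction is unknown
  };
}
```

For a cite to "Smith v. Jones Mfg. Co." dated 1978, this keeps only the words "Smith" and "Jones" and searches from 1978-01-01 to 1979-12-31.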

These search terms are obviously a heuristic. The core citation metadata would seem to be more precise, but in a strange twist, official citations are not useful for retrieving cases supplied directly by courts themselves, because judgments are finalised before their volume and page location in the relevant commercial reporter (published primarily by Thomson Reuters) are known. Things are moving along in the free law sector, and public mapping tables to connect official citations to official judgments will eventually become a reality. In the meantime, however, you must take appropriate steps to confirm that a judgment retrieved via Free Law Ferret is in fact the judgment intended by the underlying citation.

Judgments of some courts are not yet covered by the CourtListener search query interface, and these are excluded from the results returned by Free Law Ferret. At the state level, the missing jurisdictions are: Alabama; Colorado; Connecticut; Delaware; Florida; Georgia; Iowa; Illinois; Kansas; Kentucky; Louisiana; Massachusetts; Maryland; Maine; Minnesota; Missouri; North Carolina; New Hampshire; New York; Ohio; Oklahoma; Pennsylvania; Rhode Island; South Carolina; Tennessee; Virginia; and Vermont. Quite a mouthful, but a dramatic expansion of CourtListener coverage is in the works, and will be released in the rather near future. When it comes out, cite recognition in Free Law Ferret can be expanded accordingly.

The initial version of Free Law Ferret is sufficiently functional to be useful, but there is plenty of scope for improvement. The query mechanism that has been hard-coded to CourtListener is a primitive implementation of openURL. Properly extended, it will be possible to support fallback to additional free-access services (such as Google Scholar and court sites themselves), as well as to commercial providers such as Westlaw, Lexis and Fastcase. With modest cooperation from archive maintainers, adding new services can be made instantly configurable, and the priority of multiple services can be made controllable by the user. There is also plenty of scope for improving the interface, and interoperation with Zotero or MLZ (which is all but non-existent in the initial version).

Many possibilities—so many, in fact, that I would like to close by inviting anyone with an interest to fork the Free Law Ferret code repository on BitBucket and submit pull requests back to the project. There is more work here than I can imagine at the moment, and contributions will be most welcome.

Update: Minor rewrite of introduction (2013-08-21).


[1] W. Daniel Hillis, The Pattern on the Stone: The Simple Ideas That Make Computers Work (London: Weidenfeld and Nicolson, 1998), referenced in Dan Chudnov, interview by Jon Udell, February 16, 2007, 26:20–28:20, http://blog.jonudell.net/2007/02/16/a-conversation-with-dan-chudnov-about-openurl-context-sensitive-linking-and-digital-archiving/ (accessed August 20, 2013).

Juriscraper meets JavaScript

JuriscraperJS repo page

As I mentioned in the last post, Mike Lissner at the CourtListener project has written a really splendid parser for case citations. I’ve taken the liberty of recasting the code of the current version from the original Python to JavaScript, and put it up on BitBucket under the tentative name “JuriscraperJS”. To visit the repository, click on the illustration to the right. The JavaScript code covers only the text scraping element of the full CourtListener Juriscraper system. I haven’t spoken with Mike yet; if there is a risk of confusion we might end up calling the refactored code something else.

The JavaScript object weighs in at 3,762 lines, most of which is taken up by detailed structured descriptions of individual reporters. To take the code out for a trial run, you can install the ExecuteJS plugin in Firefox, open an ExecuteJS dialog, and paste the code from juriscraper.js into the “JS-Code to execute” box. You can uncomment the block of code at the bottom of the file for a quick demo: it assumes that you have Zotero or Multilingual Zotero installed, and that you have a means of reading the console log.

The output of the sample code is shown to the right. As you can see, the scraper produces a nice fine-grained record of basic metadata. Given an openSearch query channel such as that demonstrated in the previous post, this data can be used to retrieve a copy of the case. With a relatively small amount of work to add a supporting user interface to Multilingual Zotero, one click can call up a list of cases cited in a judgment, and a second can call up the case text itself.
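For readers who want a feel for what client-side citation parsing involves, the following is a deliberately tiny sketch. It is not the JuriscraperJS code or its API: the three-entry reporter table, the function names and the shape of the returned records are assumptions made for illustration, and the real parser handles far more reporters and edge cases.

```javascript
// Hypothetical, tiny reporter table. The real JuriscraperJS data describes
// individual reporters in much greater detail.
const REPORTERS = {
  "U.S.": "United States Reports",
  "F.2d": "Federal Reporter, Second Series",
  "F.3d": "Federal Reporter, Third Series"
};

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find "volume reporter page" patterns for the reporters listed above.
function findCitations(text) {
  const names = Object.keys(REPORTERS).map(escapeRegExp).join("|");
  const pattern = new RegExp("(\\d+)\\s+(" + names + ")\\s+(\\d+)", "g");
  const found = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    found.push({
      volume: Number(m[1]),
      reporter: m[2],
      reporterName: REPORTERS[m[2]],
      page: Number(m[3])
    });
  }
  return found;
}

const sample = "See Roe v. Wade, 410 U.S. 113 (1973); cf. 347 U.S. 483.";
console.log(findCitations(sample));
```

Each record carries enough structure (volume, reporter, page) to be handed to a query channel that looks up the case.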

Some further work will be needed to tie this together, but just as site translators provide a uniform interface to disparate content on the Web, client-side citation parsing can provide a uniform link layer on case law drawn from scattered sources. This could get interesting.

Posted in Announcements

MLZ: citation-to-case in two clicks

Zotero Opensearch page

In preparation for a short talk at the upcoming Code4Lib Japan meeting, I have been easing back from several years of (perhaps excusable) obsession with citation patterns and screen scraping, and reflecting on how to pitch MLZ to a local audience of information professionals. It’s a real shift of gears. The immediate objective is to avoid coming across as a hobby programmer with a bee in his bonnet, but I will very soon be back at the research chalkface myself, so there is also some more mature self-interest in the mix. The Code4Lib event will be attended by trained library programmers,[1] and I will want to learn as much as I can from their experience. Getting a loose grip on what the world of published resources looks like to a librarian seems like a prudent place to start with that plan.

These ruminations drew me back to an idea that I had several years ago, to explore ways of supporting legal research through Zotero’s Locate Manager. This is essentially just a small amount of clever glue code that allows MLZ or Zotero to import a predefined openSearch template, and use that as the basis of automated queries based on the metadata of a specific item in the local database. The query is fired off, and the browser takes care of the rest. I hadn’t taken the time to figure this out, and it struck me at the time that harnessing it for legal resources would be one of those Big Projects—something calling for Fundraising, Team Building Efforts, Management Skills and Serious Knowledge of XML—something to be addressed at an ill-defined point in the future, when nature blesses me with ample free time and a skill set well beyond anything I am likely to muster in this lifetime.
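The core of the mechanism is template substitution, which can be sketched in a few lines. This is not the Locate Manager code itself: the template URL, the `fillTemplate` function and the `item` object are hypothetical stand-ins. The only thing borrowed from the openSearch standard is the `{searchTerms}` placeholder and the `?` suffix that marks a parameter as optional.

```javascript
// Illustrative only: a hypothetical template, not one shipped with MLZ or Zotero.
const template = "https://example.org/search?q={searchTerms}&type=case";

// Replace each {name} (or optional {name?}) placeholder with a URL-encoded value.
function fillTemplate(tmpl, params) {
  return tmpl.replace(/\{([A-Za-z:]+)(\?)?\}/g, (match, name, optional) => {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      return encodeURIComponent(params[name]);
    }
    if (optional) {
      return ""; // optional parameters may simply be left empty
    }
    throw new Error("No value for required parameter: " + name);
  });
}

// A stand-in for the metadata of a Case item in the local database.
const item = { title: "Marbury v. Madison" };

const url = fillTemplate(template, { searchTerms: item.title });
console.log(url);
```

Once the URL is built, handing it to the browser is all that remains, which is why the glue code can stay so small.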

Zotero search engine menu

To my vast and utter surprise, it was actually quite simple to set up a basic openSearch resolver bridge in MLZ. You are looking at one now, in fact: if you have Firefox with Zotero or MLZ installed, open the client browser pane over this page using Shift-Ctrl-Z [2] and click the Locate Manager button as shown in the illustration to the right. The Add “Google Scholar (court judgments)” menu item will extend the Locate Manager with a simple resolver that searches for cases on Google Scholar based on their title. A bit of a letdown after the buildup in the first two paragraphs of this post, I know, but that really is all there is to it.

Zotero search engine use

Using the newly added resolver is another dose of anti-climax. Click on the Locate button after selecting a Case item in MLZ and choose the Google Scholar (Law) option from the pulldown menu. If there is a case by that name in Google Scholar, it will open in the browser. The resolver is defined according to the openSearch standard, and it’s not particularly an MLZ thing: it can be installed in mainstream Zotero as well, following exactly the same steps.

Since the existing MLZ site translators for case collections (Google Scholar, Fastcase, BAILII) already fetch a copy of scraped judgments, you might be wondering what there is to be excited about here (other than the fact that it is easy and, you know, that it works). Taken by itself, it’s not much, but combined with developments afoot in other projects, it could get interesting quite quickly.

Juriscraper (by Mike Lissner), and the CourtListener service for which it was built, seem natural companions to MLZ with openSearch resolver support. Juriscraper is able to reliably extract citations from the plain text of court judgments. The code is in Python, but if recast in JavaScript to run in the browser, MLZ could apply it locally to build a menu of cited cases from any text, using a resolver pointed at CourtListener (and elsewhere) to obtain the text behind the cites. Some extension work in MLZ would be needed to make that happen, but it is definitely doable. This would effectively convert any US court judgment from any source into a document with live links to the text of cited cases, and that, I think, might be useful.

Building out MLZ resolver support along the lines described above may help to forward the purposes of the UniversalCitation.Org discussions of 2011. Even the very simple title-only openSearch bridge provided by this page has immediate utility to users. A practical desktop tool that leverages available public resolvers shows their value in a way that most users can readily understand. There are powerful countervailing interests, of course, but a trouble-free demo or two may help renew the push toward vendor-neutral citation and direct-from-government publication of judgments. In the meantime, each openSearch bridge that covers a particular service is a building block toward the construction of a more general resolver service down the road.

At this point, the core features of MLZ are getting pretty stable, and I’m starting to feel (myself, privately, here in my cubicle) that there may be a real prospect of contributions by others, particularly on site translators, resolver support, and style development. We’ll see how it goes, but for today, I feel that we’ve taken a little step forward.


[1] Including Dan Chudnov, whom I have long followed, and who I now notice is a member of the Zotero Advisory Board.
[2] Ctrl-Alt-Z if not using the very latest version of Zotero or MLZ.
Posted in Announcements