Multilingual Zotero: extracting embedded references

The concept

This week, I sat down and implemented a feature in Multilingual Zotero (MLZ) that I have been hankering after for quite a while. This post contains some geekish background discussion, but bear with me; if you wade through it, I think you’ll find that the feature introduced at the end scores high enough on the Scale of Nifty to justify the effort.

Zotero groups are a powerful and increasingly important channel for collaboration. Smooth collaboration is of particular value to MLZ users, who frequently have occasion to share resources with colleagues from other language domains. Zotero and Mendeley, competitors though they be, performed a great service to us all with the introduction of document-embedded metadata, which went live in Zotero with the release of version 3.0 on January 30, 2012. With embedded metadata, shared documents do not break when they contain references to which one collaborator has no library access. Things do not break—but they do stop working in some ways, and that is the problem I have tried to address.

If Zotero encounters an inaccessible item when refreshing citations in a document, it looks for an in-document copy of the item metadata. If a copy is found, Zotero uses it to create a surrogate Zotero item in memory, which enables dynamic reformatting of the reference in citations and the bibliography. However, the surrogate reference is not saved to the user’s local database, and it does not appear in Zotero itself. If metadata is missing from the reference, it cannot be added. If there are typographical errors in the reference, they cannot be fixed. If the reference is edited directly in the document, this freezes its form, so dynamic updates no longer work; and if the document is then sent back to the original author, the altered reference must be sought out and restored, and the original item edited to reflect the desired change. These steps are not needed if all references in the document are in a shared library to which all group members have access, but private library items can easily creep into a document during editing.
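The fallback described above can be sketched roughly as follows. This is an illustration in Python, not Zotero's actual code (Zotero itself is written in JavaScript); the names `Item` and `resolve_item`, the dictionary shapes, and the example URI are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    uri: str
    metadata: dict
    surrogate: bool = False  # True when built from in-document metadata only


def resolve_item(uri: str, local_library: dict, embedded: dict) -> Optional[Item]:
    """Return the local item if accessible, else a surrogate held in memory."""
    if uri in local_library:
        return Item(uri, local_library[uri])
    if uri in embedded:
        # The surrogate can be formatted, but is never written to local_library,
        # so its metadata cannot be corrected through the library.
        return Item(uri, dict(embedded[uri]), surrogate=True)
    return None


# Hypothetical data: User B has no access to the item cited by User A.
cited_uri = "users/1/items/AAAA"
local_library = {}
embedded = {cited_uri: {"title": "Comparitive Law"}}

item = resolve_item(cited_uri, local_library, embedded)
```

After the call, `item` is a surrogate and `local_library` is still empty, which is why the misspelled title cannot be fixed on User B's side.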

With embedded metadata in place, the obvious next step is to supply first-class Zotero references out of the document when an item is not available locally; and that has been part of the plan from the beginning. This seems like it would be simple to do—just run the save() method on the surrogate Zotero object. This can be done, but raises a wee logical problem that defeats it as a solution.

Zotero applies a local identifier to items in its database. Actually, an item has several identifiers, but all of them are pegged in one way or another to a particular library. The “best” one is a URI that points to the item within a given library or group, so let’s work with that. Here is what would happen with the “simple” approach of just saving the surrogate item in the normal way:

  1. User A creates a document, adds a reference from their personal library, and sends the document to User B.
  2. User B refreshes the document, which automatically creates a new item in her own personal library.
  3. User B corrects a misspelling in the metadata of the reference, and sends the document back to User A.
  4. User A refreshes the document. If things are set up properly, Zotero intelligently identifies the original item in User A’s database. (If things are not set up properly, it will create a new item, which User A must merge by hand with the original.)
  5. When User A prints the document, the reference will contain the spelling error.
  6. When User B prints the document, the reference will be correct.
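The six steps above can be played through in a toy model. This is only a sketch under simplifying assumptions: libraries are plain dictionaries, the hypothetical key `item1` stands in for whatever identifier matching Zotero performs in step 4, and `refresh` and `render` are invented names.

```python
def refresh(document: dict, library: dict) -> None:
    """Naive approach: save any embedded item that is missing from the library."""
    for key, metadata in document["embedded"].items():
        # An item that already exists locally is left untouched.
        library.setdefault(key, dict(metadata))


def render(document: dict, library: dict) -> list:
    """Format citations from the user's own library copy."""
    return [library[key]["title"] for key in document["cited"]]


# Step 1: User A cites an item (with a misspelling) from a personal library.
library_a = {"item1": {"title": "Comparitive Law"}}
library_b = {}
document = {"cited": ["item1"], "embedded": {"item1": dict(library_a["item1"])}}

refresh(document, library_b)                      # step 2: B gets a private copy
library_b["item1"]["title"] = "Comparative Law"   # step 3: B fixes the spelling
refresh(document, library_a)                      # step 4: A's original is found

printed_by_a = render(document, library_a)        # step 5: ['Comparitive Law']
printed_by_b = render(document, library_b)        # step 6: ['Comparative Law']
```

The correction lives only in User B's copy, so the two users print different references from the same document.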

With MLZ, the situation is more serious, since User A may be relying on User B to supply correct transliterations and translations on multilingual references (or vice-versa), and if these don’t turn up as expected, confusion will ensue.

A better approach is to remap embedded references for both collaborating users to a shared group library if they are not otherwise available. When all users in a team have access to the items from which a shared document is built, the scope of the collaboration naturally extends to the curation of metadata. The references that they produce together for use in their project become a resource that they can recycle in separate projects of their own. Everyone wins.
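A minimal sketch of the remapping idea, again in Python with hypothetical names (`remap_to_group`, the `users/…` and `groups/…` identifier strings, and the reuse of the item key in the group are all assumptions made for illustration, not a description of the MLZ code):

```python
def remap_to_group(document: dict, reachable: set, group_library: dict,
                   group_prefix: str) -> None:
    """Write inaccessible embedded items to the group and repoint citations."""
    mapping = {}
    for uri in document["cited"]:
        if uri in reachable or uri in group_library or uri in mapping:
            continue  # already available to this user, or already handled
        group_uri = group_prefix + uri.rsplit("/", 1)[-1]
        group_library.setdefault(group_uri, dict(document["embedded"][uri]))
        mapping[uri] = group_uri
    # Repoint the citations and the embedded copies at the group items.
    document["cited"] = [mapping.get(uri, uri) for uri in document["cited"]]
    for old_uri, new_uri in mapping.items():
        document["embedded"][new_uri] = document["embedded"].pop(old_uri)


group = {}
doc = {
    "cited": ["users/1/items/AAAA"],
    "embedded": {"users/1/items/AAAA": {"title": "Comparitive Law"}},
}

# User B cannot reach User A's personal item, so it lands in the group.
remap_to_group(doc, reachable=set(), group_library=group,
               group_prefix="groups/42/items/")

# User B corrects the spelling once, in the shared group item.
group[doc["cited"][0]]["title"] = "Comparative Law"

# Any member rendering the document now reads the same corrected item.
titles = [group[uri]["title"] for uri in doc["cited"]]
```

Because the document now points at the group item, a correction made by either collaborator is seen by both.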

Shared library preference

To set this up, I have added a shared library selection widget to the new “Project Name” pane of MLZ Document Preferences, as shown to the right. The pane is specific to individual documents, and is accessible only through the word processor plugin. Clicking on the “No group selected” button will open a list of group libraries to which the current user has access (the user’s own personal library is not included in the list, for the reasons outlined above). If the document has been set to use a group to which the user does not have write access, or of which the user is not a member, the widget provides a suggestion to contact the group owner and arrange for access. Once a group has been selected, the “release for editing” box must be ticked before the setting can be changed. This minor nuisance encourages users to work against a common library, since that makes the sharing of reference data transparent and hassle-free. With a group selected, inaccessible references contained in the document will be written to the group as required, including any MLZ field variants set in the item.
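The behaviour of the widget can be modelled roughly like this. The class and method names are hypothetical, the message strings are paraphrases, and the sketch assumes that the list offers every group of which the user is a member; the real pane may differ in its details.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    name: str
    member: bool    # is the current user a member of the group?
    writable: bool  # does the current user have write access?


class SharedLibraryPreference:
    """Per-document setting; the personal library is never offered."""

    def __init__(self, selected: Optional[Group] = None) -> None:
        self.selected = selected
        self.released = False  # state of the "release for editing" box

    def candidates(self, groups: List[Group]) -> List[Group]:
        return [g for g in groups if g.member]

    def status(self) -> str:
        if self.selected is None:
            return "No group selected"
        if not (self.selected.member and self.selected.writable):
            return "Contact the group owner to arrange access"
        return self.selected.name

    def select(self, group: Group) -> None:
        # Once a group is set, it can only be changed after release.
        if self.selected is not None and not self.released:
            raise PermissionError("tick 'release for editing' first")
        self.selected = group
        self.released = False


pref = SharedLibraryPreference()
seminar = Group("Seminar", member=True, writable=True)
archive = Group("Archive", member=True, writable=False)

pref.select(seminar)     # first selection needs no release
pref.released = True     # the user ticks "release for editing"
pref.select(archive)     # now the setting may be changed
```

Requiring the release step before a change is the "minor nuisance" described above: switching libraries is possible, but never accidental.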

That’s all there is to it. Our students here in the Nagoya University Faculty of Law will be using this new facility to build shared research libraries in the coming months, which will put the concept to the test. If you try it out and have any questions or run into difficulties, get in touch; we have a strong local interest in getting it right.

This entry was posted in Announcements.

4 Responses to Multilingual Zotero: extracting embedded references

  1. Benjamin says:

    Does this still work in juris-m?

    • admin says:

      I think it does not work currently. It served us well in the last round of Masters paper submissions, to build an archive of materials cited in a selection of submitted papers; if we decide to attempt extracting cites in the next round, I may look at the code for it again to see if I can get it going.

      Preserving this functionality in the Zotero-5.0-based version of Juris-M will be quite a challenge, though, since there will be big changes in the way tags are handled. Zotero proper has a plan for dynamic document collections, so it’s likely that I’ll just wait for that to arrive, after the Zotero-5.0 migration.

  2. Benjamin says:

    It’s just that the Word plugin does not work with MLZ anymore (it reports Zotero as out of date).

    Would you be so kind as to explain to me the exact steps to extract the references used in a document? Somehow this is not working properly for me. I tried everything I can imagine. Here is my problem:

    I have a very large (500 pages) Word document with loads of Zotero citations. This was done by someone else. I now want to lay this out in LaTeX, and for that I need only the references used in the document, to create a bib file and then insert a bibliography in the LaTeX document. I have access to the Zotero database of the original author, if that is important.

    Here is what I tried:
    In Word (using an up-to-date Zotero Standalone), I switched from fields to the other option and saved the document as ODT (I also tried opening the DOCX in LibreOffice). I then switched to MLZ (converted the database etc.).
    Then I opened the document (I also tried with smaller parts of the original document) in LibreOffice and set the document preferences back to references. I created a group in MLZ. When I now choose the group and click “release for editing”, nothing gets inserted in that group (I made sure that my library is completely empty). Items only get added when I insert a new citation (using the database of the original author).

    I hope this is understandable. My guess is that somehow the fields in LibreOffice get broken. Is there an older Word plugin that works in MLZ, so that my goal can be reached in Word?

    Any help would be appreciated!

    • admin says:

      Ben: MLZ has moved to a new home, and has a new name, at

      If you install Juris-M, you should be able to work with the word processor plugins.

      Some of our students here have had problems running Juris-M Standalone with Word for Windows, but I have not had outside error reports, so it may be an issue with their Word or Windows installations.

      From your description, I am not sure whether getting the WP plugin working will solve your problems. If you need specifically to extract and tag references, there may be a problem with the functionality currently; the last time I tried it, on a large document with only a few Zotero references, it did not seem to work, so there is a good (or bad) possibility that it needs some attention. I’m pretty full up with other work at the moment; a couple of weeks is probably the horizon for me to do some debugging on it.

