Bibliographic Polymorphism
Last time I talked a little about the difficulties MARC presents for modern applications. It’s too sophisticated to be considered just structured (or even marked-up) data, yet not semantically contained enough to be a language. But sometimes I wonder: if the MARC effort were undertaken by computer scientists today, would it turn out to be an actual programming language?
There’s neat computational stuff in MARC that prompts these daydreams. One example is the 880 Alternate Graphic Representation field, which you can see in action in records like this one from the Library of Congress:
The title and publication information are presented in the original Urdu script as well as in romanized transliteration. In the MARC record, the original-script data is recorded separately from the title and publication fields, via 880 fields:
In the above:
The 3-digit tags at the start of each line identify the field whose data that line contains. The Title Statement field has tag 245, and 260 is for publication-related information.
The two characters following the tag are called indicators (and are not important for today's post).
The string that makes up most of the field is the tag data. It is broken up into subfields by a separator character (in this case, |) followed by a single character identifying the subfield, called a subtag. For example, the 245 field above contains 4 subtags, |6, |a, |b, and |c.
The meaning of a subfield is field-dependent; the |a in the 245 means something very different from the |a in the 260.
So, how do the 880s work? Both 880s in the example above contain |a, but in the first it's interpreted as the main title, while in the second it's the place of publication.
The 880 spec has only one subfield defined, |6, whose contents determine the meaning of the field. In the first 880 above, the |6 contains "245", meaning that this 880 takes on the shape of a 245, and hence the |a it contains is interpreted as 245's |a.
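To make that dispatch concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the tiny lookup table (which covers only a few subfields of 245 and 260), and the sample lines, which are invented placeholders rather than data from the record above. It also assumes the display format described earlier (tag, indicators, then |-separated subfields) and that no data value contains a literal | character. Real records typically append an occurrence number and script code after the tag in |6, so the sketch reads only the first three characters.

```python
# Hypothetical, heavily abbreviated table of subfield meanings.
# A real application would cover far more of the MARC spec.
SUBFIELD_MEANINGS = {
    "245": {"a": "title", "b": "remainder of title", "c": "statement of responsibility"},
    "260": {"a": "place of publication", "b": "name of publisher", "c": "date of publication"},
}


def parse_field(line):
    """Split a display line into (tag, indicators, [(subtag, data), ...])."""
    head, *chunks = line.split("|")
    tag = head[:3]
    indicators = head[4:6]
    subfields = [(chunk[0], chunk[1:].strip()) for chunk in chunks if chunk]
    return tag, indicators, subfields


def effective_tag(tag, subfields):
    """For an 880, the tag named in |6 decides how the other subfields are read."""
    if tag != "880":
        return tag
    linkage = dict(subfields).get("6", "")
    return linkage[:3]


def interpret(line):
    """Map each known subfield to its meaning under the effective tag."""
    tag, _, subfields = parse_field(line)
    meanings = SUBFIELD_MEANINGS.get(effective_tag(tag, subfields), {})
    return {meanings[code]: data for code, data in subfields if code in meanings}


# Invented sample lines: two 880s that both carry |a.
first = interpret("880 10 |6 245 |a Placeholder title |c Placeholder author")
second = interpret("880    |6 260 |a Placeholder city |b Placeholder press")
```

With these inputs, the |a in `first` lands under "title" and the |a in `second` under "place of publication", even though both lines carry the same 880 tag.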
Why not just record this data in a 245, then?
The 880 tag itself provides an important context -- namely, that the data it contains is in some other language/script. Treating it as a separate type thus confers the ability to work with this representation -- whether customizing display logic, including or excluding it from search indexes, etc. -- independently from the regular bibliographic data.
It's useful then to think of 880 not as a datafield itself but as a generic container, parameterized over datafields. That is, an 880 field in the MARC record corresponds to a value of type 880<A>, which provides an A datafield along with some information about the alternative representation contained within that datafield (such as the script used). This approach is both conceptually and practically much cleaner than defining an 880 counterpart for each datafield in the MARC spec.
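As a rough illustration of the 880<A> idea, here is how such a container might look in Python. This is a sketch under stated assumptions, not anything MARC defines: the class names (`Title245`, `Publication260`, `AlternateGraphic`), the `script` attribute, and the helper `searchable_titles` are all hypothetical, and the sample values are placeholders. Python identifiers can't start with a digit, so `AlternateGraphic[A]` stands in for 880<A>.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar


@dataclass
class Title245:
    """Hypothetical, trimmed-down model of a 245 field."""
    title: str
    responsibility: str = ""


@dataclass
class Publication260:
    """Hypothetical, trimmed-down model of a 260 field."""
    place: str
    publisher: str = ""


A = TypeVar("A")


@dataclass
class AlternateGraphic(Generic[A]):
    """Stand-in for 880<A>: an A datafield plus facts about its representation."""
    field: A
    script: str  # assumed attribute describing the script of the wrapped data


def searchable_titles(fields: list, include_alternates: bool = True) -> List[str]:
    """Collect title strings, optionally skipping alternate-script wrappers."""
    titles = []
    for f in fields:
        if isinstance(f, AlternateGraphic):
            if not include_alternates:
                continue
            f = f.field  # unwrap: the payload is an ordinary datafield
        if isinstance(f, Title245):
            titles.append(f.title)
    return titles


# Placeholder record: a romanized title, its original-script twin, and a 260.
record = [
    Title245("Romanized title", "Author"),
    AlternateGraphic(Title245("Original-script title", "Author"), script="Arabic"),
    Publication260("City"),
]
all_titles = searchable_titles(record)
romanized_only = searchable_titles(record, include_alternates=False)
```

One wrapper type covers every datafield, and code that cares about the alternate representation (like the index filter here) can test for the wrapper without knowing which field is inside.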
(For completeness: there are other aspects to the 880. For instance, it not only corresponds to another datafield type but is usually also linked to a specific instance of that datafield within the MARC record. These are irrelevant for this discussion.)











