Wednesday, 11 April 2012

ISBN, ISTC, and ontology

There are a lot of books. You just won’t believe how many hundreds of thousands of millions of unique books there are. According to Google, as of 2010, there were 129,864,880 unique books in the world. But even counting the books in the world becomes difficult when we try to establish a precise definition of ‘book’. In everyday language, the word ‘book’ is used for a couple of distinct concepts: an entire textual work or a single manifestation of that textual work. In the sentence “Michael Crichton wrote a book about dinosaurs”, the word ‘book’ refers to the textual work itself. The definition of ‘work’ is discussed later on, but for now this sense of ‘book’ can be considered to be a set containing every copy of the words Crichton set down. In the sentence “Where did Simon leave his Jurassic Park book?”, by contrast, the word ‘book’ refers to one well-read paperback manifestation. Ambiguity of language – the scourge of modern philosophy – makes it difficult to pin down what someone means when they talk about ‘a book’: we rely on context and on the fact that the human brain doesn’t usually notice these distinctions.

To help with the Sisyphean task of organising books, humans (or librarians anyway) use numbering systems. There are unique international standard identifiers which are agreed upon by those who work with books – publishers, booksellers, librarians, etc. As a librarian working with e-resources – e-journals and ebooks – I do a lot of work involving these various standard identifiers: ISBN-10s, ISBN-13s, ISSNs, eISBNs, eISSNs. Our LMS and our OPAC use the numbers for all sorts of functions: they’re one of the main identifiers that the link resolver software uses; they connect the holdings in the coverage database to bibliographic records; they provide a search parameter for library users.

The humble ISBN: a philosophical minefield.
But like different uses of language, different standards refer to different things and imply different philosophical positions. Standards which appear similar can imply quite different answers to the question ‘What is a book?’ This is highlighted by the distinction between the ubiquitous ISBN – ISO 2108 – and the relatively new standard-on-the-block, ISTC – ISO 21047.

ISBNs refer to single manifestations of a textual work. One ISBN belongs to only one version of a work – 9780099282914, for example, refers to one paperback edition of a book published by Arrow Books in 1991. A novel like The Brothers Karamazov has dozens of ISBNs, each referring to a different manifestation of Dostoyevsky’s work. Cloth cover, hardback, paperback, electronic, first edition, second edition, reissue: all these manifestations of the same work will receive different ISBNs. The ISBN standard therefore carries an implicit definition of the word ‘book’ as a set composed of identical printings. According to ISBN, ‘book’ doesn’t refer to an individual copy or to the work as a whole but rather to a single version, printed or electronic. In philosophy of art, this is a species of nominalism: on this view, a book is a collection of concrete particulars.
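For the practically minded: the last digit of an ISBN-13 is a check digit, so a mistyped number can often be caught before it goes anywhere near a link resolver. Here is a minimal sketch in Python, assuming the standard ISBN-13 weighting of alternating 1s and 3s. The function name is my own invention, not something from any LMS.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Return True if `isbn` is a 13-digit ISBN whose check digit is correct.

    Hyphens and spaces are ignored, so '978-0-09-928291-4' is accepted.
    """
    cleaned = isbn.replace("-", "").replace(" ", "")
    if len(cleaned) != 13 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    digits = [int(ch) for ch in cleaned]
    # Weights alternate 1, 3, 1, 3, ... across all thirteen digits,
    # and the weighted sum must be a multiple of 10.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


# The Arrow Books paperback mentioned above passes the check;
# changing its final digit makes it fail.
print(is_valid_isbn13("9780099282914"))
print(is_valid_isbn13("9780099282915"))
```

Note that the check only says the number is well formed. It says nothing about which manifestation the number points to, or whether it has been assigned at all.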

The ISBN (and ISSN for journals) and its implicit nominalism usually work well enough. But occasionally there are practical grey areas that can be vague and confusing: when the same journal has separate ISSNs for its printed and electronic versions and publishers fail to distinguish between the two; when a journal changes its title every few years and receives a brand new ISSN for each regeneration; when a work gets reprinted so often in different versions that it ends up with dozens of ISBNs. Gaze ye upon this record that I catalogued the other day and behold the horror of multiple undifferentiated MARC 020 (ISBN) fields. As a user-focused librarian, I include all the identifiers in order to help library users find the ebook even if they search for the print ISBN. But as an aesthete and lover of minimalism, I find this record to be cluttered and aesthetically unpleasing.

Hence ISTC. ISTC (International Standard Text Code) is a new standard identifier for publications. It assigns one number to all manifestations of a textual work. A single ISTC number – A02-2009-00000A3D-1, for example – will refer to all the versions of, in this case, Little Brother by Cory Doctorow: the electronic versions (in various formats), the printed versions (hardback and paperback), and any other versions that may emerge. The ISTC should therefore make it easier to link together and subsequently find different versions of the same work. Instead of knowing which particular ISBN refers to which particular version of a particular book, the user only needs one number to find multiple versions of the same work.
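To make the one-to-many relationship concrete, here is a rough sketch in Python of how a catalogue might hang several manifestations off one ISTC. The class names, the regular expression for the ISTC layout (agency, year, work element, check digit) and the placeholder ISBN strings are all my own assumptions for illustration, not anything published by the ISTC Agency, and the check digit is not verified.

```python
import re
from dataclasses import dataclass, field
from typing import Dict

# Assumed layout: 3 hex digits - 4-digit year - 8 hex digits - 1 hex check digit.
# This only tests the shape of the string; it does NOT verify the check digit.
ISTC_PATTERN = re.compile(r"^[0-9A-F]{3}-\d{4}-[0-9A-F]{8}-[0-9A-F]$")


@dataclass
class Work:
    """A textual work (hypothetical model): one ISTC, many manifestations."""
    istc: str
    title: str
    manifestations: Dict[str, str] = field(default_factory=dict)  # ISBN -> format


class Catalogue:
    """Hypothetical index that lets any ISBN lead to its sibling versions."""

    def __init__(self) -> None:
        self._works: Dict[str, Work] = {}         # ISTC -> Work
        self._isbn_to_istc: Dict[str, str] = {}   # ISBN -> ISTC

    def add_work(self, work: Work) -> None:
        if not ISTC_PATTERN.match(work.istc):
            raise ValueError(f"not shaped like an ISTC: {work.istc!r}")
        self._works[work.istc] = work
        for isbn in work.manifestations:
            self._isbn_to_istc[isbn] = work.istc

    def other_versions(self, isbn: str) -> Dict[str, str]:
        """Return every other manifestation of the work this ISBN belongs to."""
        work = self._works[self._isbn_to_istc[isbn]]
        return {i: fmt for i, fmt in work.manifestations.items() if i != isbn}


catalogue = Catalogue()
catalogue.add_work(Work(
    istc="A02-2009-00000A3D-1",  # the ISTC quoted above
    title="Little Brother",
    manifestations={             # placeholder strings, not real ISBNs
        "ISBN-HARDBACK-PLACEHOLDER": "hardback",
        "ISBN-PAPERBACK-PLACEHOLDER": "paperback",
        "ISBN-EPUB-PLACEHOLDER": "epub",
    },
))
print(catalogue.other_versions("ISBN-PAPERBACK-PLACEHOLDER"))
```

A user who only knows the paperback number can then be led to the hardback and the ebook without any of the three records having to list the others' ISBNs.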

From the ISTC Agency website:

An ISTC does not “belong” to a single author/publisher; rather, it “belongs” to the work it identifies. This means that the same ISTC number should be used to identify the same content even when it is being published by a different publisher and/or in a different publication format.

By including the ISTC of a textual work in the list of attributes of each actual product (e.g. each book) that it is published in, it is then possible to search for, and find, only that specific textual work among many products. This is the case even though some products with different content might have very similar or even identical names, and even though some products containing the desired content have entirely different names.

The ISTC is not intended for identifying manifestations of a textual work, including any physical products (e.g. a printed article) or electronic formats (e.g. an electronic book). Manifestations of textual works are the subject of separate identification systems.

Rather than the nominalism of ISBN, the ISTC standard adopts a form of idealism. In philosophy of art, idealism argues that creative works are mental entities: maybe a certain set of experiences or memories in the mind of the author or reader; maybe some sort of non-physical entity in a Platonic Heaven. As defined by ISTC, a ‘book’ – or, more accurately, a ‘work’ – is divorced from its physical manifestations and is a set of all the objects – print or electronic or other – in which it is manifested. ISBN may refer to something intangible but at least that set is based on firm physical characteristics: the ISTC goes one step further and makes the bold philosophical move of identifying a ‘work’ with something entirely intangible.

A Platonic Heaven for books?

…And in doing so invites all the accompanying philosophical problems. Without diving down into the murky metaphysical depths, we can skim the surface and pick off some problems which are of practical concern to librarians:  

  • What is a ‘work’? Is it purely textual? Or does it include audio and/or visual material? Is the screenplay for a movie the same ‘work’ as the movie itself? Probably not, but there is a case to be made that they are sufficiently similar, particularly in the mind of the person or persons who created them.
  • How does one define and distinguish separate ‘works’ or, as the ISTC Agency refer to it, “the same content”? Is the second edition of a book a completely separate ‘work’ or not? How divergent from the original would a second edition have to be to be considered distinct? What about an old work with new content, like a new edition of The Brothers Karamazov with a foreword comparing it to Jurassic Park? In the original version of The Hobbit, Gollum amicably allows Bilbo to take the ring: does this significantly different plot development make the original Hobbit a different work to subsequent versions?
  • What about serialised works? Would the ISTC for Great Expectations apply to a set containing the original serialised instalments spread across various issues of All the Year Round?
  • To what extent does authorship determine the status of a ‘work’? In Borges’ short story ‘Pierre Menard, Author of the Quixote’, a writer named Pierre Menard rewrites Don Quixote using the exact same words as the Cervantes original. In a satire of literary criticism, the narrator argues that, because of the context and historical situation of the author, Menard’s version is significantly different despite having exactly the same content.

These problems are off the top of my head and maybe I’m overthinking the whole issue. Maybe the people in the ISTC Agency have considered all these issues and have decided that they are devout Platonists. Maybe their dedicated team of crack philosophers has determined the answers to all the above questions. But it seems that, despite its flaws, ISBN has a far more secure Aristotelian foundation. I daresay that if I were in the ISTC Agency, these philosophical conundrums would be haunting my night-time imaginings.