January 31, 2014 by Darren
On January 14th, I invited you to share a journey with me: an adventure deep into the heart of metadata. I told you a bit about the history of metadata and how it’s used for books, that time the printing press changed everything and, after that, when the internet changed everything again. And then I made a promise that the following week I’d be back to share more metadata history with you. I broke that promise.
I gave you up, I let you down, I ran around and deserted you.
To make up for it, I’m going to share (almost) everything I’ve learned about metadata over the past two weeks.
Metadata: the data we use to describe other data, like books, films, photographs, and websites
Metadata has been around ever since we started collecting large volumes of books in one place, beginning with the Library of Alexandria in the 3rd Century BCE. Librarians used the Pinakes system to keep track of things like the author’s name, his educational background, where he was from, the title of the work, what it was about, etc. Whether all you’ve got is a list of titles on your bookshelf or a comprehensive catalogue covering everything from the writer’s birthplace to the number of words in the manuscript, you’ve got metadata.
The Elements of Metadata (Most Commonly) in Use Today:
In the tradition of the librarians of Alexandria, we still collect data like title and contributor, and we sort our titles according to subject (usually). But since so many books are now published each year, we’ve created industry standards that help keep everyone involved (publishers, vendors, retailers, libraries, etc.) on the same page. The most integral element of modern metadata is the ISBN, the International Standard Book Number. See, books are products like any other object manufactured and sold, and they need an ISBN to be identifiable among the multitudes of other products that make up the book marketplace. An ISBN is assigned by a national registration agency, and it allows publishers and retailers to monitor inventory and efficiently identify specific products during business transactions. A similar but lesser-known standard is the ISTC, the International Standard Text Code. The ISTC identifies the work behind a book. For example, take a look at one of our backlist titles, Onward to the Olympics. The work (the manuscript itself and its actual contents) would have one ISTC, while each edition of the book (hardcover and paperback) has its own ISBN, because each edition is a separate product.
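To make the work/product distinction concrete, here’s a minimal sketch in Python. The ISTC and ISBNs below are made-up placeholders for illustration, not the real identifiers for this title.

```python
# Hypothetical identifiers throughout: the ISTC and ISBNs below are
# placeholders, not the real codes assigned to this title.
work = {
    "istc": "0A9-2014-00000001-X",  # one ISTC per work (placeholder)
    "title": "Onward to the Olympics",
    "editions": [
        # Each sellable edition is a separate product with its own ISBN.
        {"format": "Hardcover", "isbn": "978-0-000-00000-2"},
        {"format": "Paperback", "isbn": "978-0-000-00001-9"},
    ],
}

# One work, one ISTC; every edition carries a distinct ISBN.
isbns = {edition["isbn"] for edition in work["editions"]}
```

The design point is simply that the work is one record while each format hangs off it as its own product, which is exactly the hardcover/paperback split described above.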
(An aside on ISBN:
This 13-digit code need not frustrate or intimidate you. There’s actually a method to this madness, you just need to get to know it a little better. The 13-digit ISBN is made up of five separate elements:
- Prefix Element: The first three digits place the ISBN within the global product ID system. Right now there are two prefixes in use, 978 and 979.
- Registration Group Element: This element identifies the country, geographic region, or language area of the publisher.
- Registrant Element: This element, which varies in length, identifies a particular publisher or imprint. Some publishers may have more than one registrant element.
- Publication Element: This element identifies a specific edition or format of a title.
- Check Digit: The check digit completes the ISBN. It is calculated from the twelve digits that precede it, and it lets systems using ISBNs catch transcription errors.
[Worth noting: ISBNs are assigned by national agencies: in Canada by Library and Archives Canada, in the US by R.R. Bowker, and in the UK and Ireland by the Nielsen ISBN Agency.])
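That check-digit arithmetic is simple enough to sketch in a few lines of Python (an informal illustration, not an official tool): each of the first twelve digits is weighted alternately 1 and 3, and the check digit tops the weighted sum up to the next multiple of 10.

```python
# An informal sketch of the ISBN-13 check-digit rule.

def isbn13_check_digit(first_twelve: str) -> str:
    """Return the check digit for the first 12 digits of an ISBN-13.

    Digits are weighted alternately 1 and 3; the check digit brings
    the weighted sum up to the next multiple of 10.
    """
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first_twelve))
    return str((10 - total % 10) % 10)

def is_valid_isbn13(isbn: str) -> bool:
    """Validate a full 13-digit ISBN, ignoring hyphens."""
    digits = isbn.replace("-", "")
    return (len(digits) == 13 and digits.isdigit()
            and digits[-1] == isbn13_check_digit(digits[:12]))
```

For example, feeding in the first twelve digits 978030640615 yields a check digit of 7, so 978-0-306-40615-7 validates while any other final digit fails.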
Still with me?
Other metadata points typical on our books’ pages include:
- Contributors’ roles
- Page length
- Contributors’ biographies
- Cover art
…You get the idea.
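Pulled together, a single book’s metadata record might look something like this sketch. Every field name and value here is hypothetical; real systems use formal schemas, not a bare dictionary.

```python
# Everything here is hypothetical: field names and values are
# placeholders, not a real industry schema or a real book record.
book_record = {
    "isbn": "978-0-000-00000-2",   # placeholder ISBN
    "title": "An Example Title",
    "subjects": ["History"],
    "page_count": 320,             # placeholder value
    "contributors": [
        {
            "name": "Jane Placeholder",
            "role": "Author",
            "biography": "A short contributor bio would go here.",
        },
    ],
    "cover_art": "https://example.com/covers/example.jpg",
}
```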
Electronic communication in the book business created a need not only for rich metadata about books, including the abovementioned ISBN, but also for a standard way of communicating that metadata from the publisher all the way to the end user (you, you amazing book reader).
Say hello to ONIX.
ONIX is not a Pokémon. What it is, is an XML-based international standard for representing and communicating book metadata electronically, usually via FTP (file transfer protocol). ONIX allows book product info to be communicated globally, across languages and borders, using the ISBN as a match point for reference. It enables various systems to interact and engage with each other using a shared language with common grammar, definitions and structure.
ONIX is a product of EDitEUR, an international organization started in 1991 to coordinate the development of infrastructure standards for selling books, serials and e-books online. Thanks to ONIX, manual file processing is reduced, accuracy of data transmission and interpretation is improved, and processing speed quickens. We could go pretty deep down the ONIX rabbit hole, but we won’t. Let’s just say that thanks to ONIX, a huge amount of metadata pertaining to thousands of books finds its way to the screens of many a consumer.
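To give a feel for the shape of it, here’s a heavily simplified, hypothetical ONIX-style record parsed with Python’s standard library. Real ONIX 3.0 messages are far richer, with strict header and namespace requirements, so treat the element names and identifier values below as illustrative only.

```python
import xml.etree.ElementTree as ET

# A heavily simplified sketch of an ONIX-style product record.
# Element names follow ONIX conventions, but this is not a complete
# or valid ONIX message; the ISBN is a placeholder value.
onix_snippet = """
<Product>
  <RecordReference>com.example.0001</RecordReference>
  <ProductIdentifier>
    <ProductIDType>15</ProductIDType>
    <IDValue>9780000000002</IDValue>
  </ProductIdentifier>
  <DescriptiveDetail>
    <TitleDetail>
      <TitleElement>
        <TitleText>An Example Title</TitleText>
      </TitleElement>
    </TitleDetail>
  </DescriptiveDetail>
</Product>
"""

product = ET.fromstring(onix_snippet)
# The ISBN serves as the match point between sender and receiver.
isbn = product.find("./ProductIdentifier/IDValue").text
title = product.find(".//TitleText").text
```

The point of the standard is exactly this: because sender and receiver agree on the element names and structure in advance, a retailer’s system can pull the ISBN and title out of any publisher’s feed with the same few lines of code.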
The driving force behind the creation of all these standards and practices was the rise of internet bookselling. Before Amazon, Barnes and Noble, Kobo, or Apple, book data transmitted electronically was mostly used for business transactions, inventory management, etc. The really meaty data, which you find on our books’ pages here on our website, was still largely being communicated by print.
When retailers started selling books online, they needed rich metadata from publishers in order to create their “digital shelves.” And to stay competitive in a disruptive market, retailers began displaying metadata directly to consumers. For the first time, metadata was a direct component of the consumption process, essentially becoming the browsing and discovery experience for anyone who shops online, and changing the way publishers had to craft their metadata.
Today, when you find a book online it’s because of the metadata. Retailers like Amazon use sophisticated algorithms that rely on metadata to decide which books to show to whom, so publishers have to be more conscious of how they construct it than ever before. It has to have the right keywords, it has to appeal to a consumer, it has to have the correct structure for that particular retailer, and it has to be accurate.
So that’s where we’re at with metadata right now. It really is every publisher’s favourite thing to love hating, but without it you’d never find our awesome books.
What will metadata look like in the future? It’s tough to say—the thing is you never know what you need until you need it, so the rule of thumb is to collect as much as possible. That being said, it isn’t inconceivable to imagine a future where entire works are mined for data, including elements like style, voice, tone, keyword density, etc. And it isn’t that hard to imagine an evolved metadata environment, taking the “upstream/downstream” model (metadata predominantly flows from the publisher, “upstream” to the consumer, “downstream”) and turning it on its head, allowing publishers to use retail/vendor metadata to inform future business decisions. We might even get real-time metadata analysis, with publishers split-testing different elements to see what works best online.
But that’s just speculation, and this has been a very long blog post about metadata.