Metadata

Alan Gilchrist (Cura Consortium, Brighton, UK)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 24 July 2009

Citation

Gilchrist, A. (2009), "Metadata", Journal of Documentation, Vol. 65 No. 4, pp. 708-711. https://doi.org/10.1108/00220410910970339

Publisher: Emerald Group Publishing Limited

Copyright © 2009, Emerald Group Publishing Limited


According to David Evans (2008), a futurologist with Cisco:

By 2010 all the information on the internet will double every 11 hours. Ten years from now, it will double every 11 seconds. That will fundamentally change how we search for information.

Whatever the accuracy of this prediction, there is no doubt whatsoever that the exponential growth of the World Wide Web has had a most profound effect on how information professionals conceive of their work. Metaphors such as “The Web is an enormous library” and “Metadata is just another name for cataloguing” now ring hollow. Yet, as the authors of this excellent book point out, “the first metadata scheme targeted for Internet resources [was] the Dublin Core Metadata Element Set...proposed in 1995”, a fact that may have given rise to these metaphors. More recently, the extraordinary phenomenon of social networking has not only greatly increased the volume of web content but is also creating ways of communicating and sharing that are perceived to be undermining the traditional facilities for browsing and retrieving information. Weinberger (2007) argues that there are “three orders of order”: the first, in which physical objects are ordered (e.g. shelf arrangement); the second, in which surrogates are ordered (e.g. the card catalogue); and the third, in which we must now order the bits into which content has been digitised. Weinberger then says “The power of the miscellaneous comes directly from the fact that in the third order, everything is connected and therefore everything is metadata.” In other words, he claims that we are now moving away from hierarchical order towards the limitless linking seen on the Internet, where resources are linked under whatever labels anyone chooses to attach to them.

The authors of Metadata reflect an aspect of this resource variety in Part I of their book, which surveys the metadata standards that now exist for general‐purpose use (Dublin Core, the Metadata Object Description Schema (MODS) and the MARC family), as well as a range of specialist standards for cultural objects and visual resources, educational resources, archives and preservation, rights management, scientific material and multimedia objects, and finally for metadata describing agents, that is, people and groups of people.

It is no insult to the authors to say that reviewing this book is rather like reviewing a railway timetable. It is a solid piece of work, designed as “both a textbook and an instructional guide for practitioners”, that must have required much effort to compile. It is to be hoped that the authors are prepared to work on later editions in this fast‐changing world. In its 365 pages, the book contains no fewer than 111 illustrations, 25 exhibits (taken directly from web sites), ten tables, a seven‐page Glossary, a Bibliography of nearly 400 references, and a seven‐page Index.

Part II contains the meat of the book, covering the definitions and discussion of Metadata elements, Element sets, Value spaces, Application profiles and Crosswalks. Here, we enter the often subjective world of the information architect, and the authors are helpful and honest in their appreciation of the difficulties faced by those working in this area. The section opens with these words: “Intelligent [metadata] decisions are integral to successful implementation of the project. In the previous chapter we introduced some standards utilized by different communities; however, in most cases, the development of a digital collection does not start (or end) by adopting an existing standard without some adjustment or adaptation”. This potential divergence from proffered standards raises important issues of quality, effectiveness and interoperability, which the authors deal with in later chapters. Their Glossary defines “Element” as “a formally defined term used to describe one of the properties of a resource of a particular type or for a particular purpose. For example, the ‘publisher’ of a book, the ‘format’ of an electronic file, or a ‘restoration date’ of a building”. The Element set is then the complete list of defined elements. Each Element may in turn have a “Value space”, supported by (usually external) encoding schemes; these may be “syntax encoding schemes” (e.g. the convention for the format of a date, notably different in the UK and the USA) or “vocabulary encoding schemes”, such as the Dewey Decimal Classification (DDC) or RFC 4646 (Tags for Identifying Languages). Encoding schemes are, of course, a huge area in their own right, and the authors sensibly confine themselves to giving examples and a few references.
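
To make the relationship between elements, element sets and value spaces more concrete, the following Python sketch models a three‐element set in which an element may carry a syntax encoding scheme (assumed here to be ISO 8601 calendar dates written YYYY-MM-DD) or a vocabulary encoding scheme (assumed here to be a small subset of the DCMI Type Vocabulary). The class and function names are hypothetical and the definitions are paraphrased from Dublin Core; this is an illustration under those assumptions, not the book's own model.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass
class Element:
    """One element of an element set, optionally constrained by a value space."""
    name: str
    definition: str
    # Syntax encoding scheme: constrains the *form* of a value (assumed here)
    syntax_check: Optional[Callable[[str], bool]] = None
    # Vocabulary encoding scheme: constrains the *allowed terms* (assumed here)
    vocabulary: Optional[FrozenSet[str]] = None

    def accepts(self, value: str) -> bool:
        if self.syntax_check is not None and not self.syntax_check(value):
            return False
        if self.vocabulary is not None and value not in self.vocabulary:
            return False
        return True


def is_iso_date(value: str) -> bool:
    """Assumed syntax scheme: ISO 8601 calendar date written YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2009-02-30
    except ValueError:
        return False
    return True


# Assumed vocabulary scheme: a small subset of the DCMI Type Vocabulary
DCMI_TYPES = frozenset({"Text", "Image", "Sound", "Dataset", "Software"})

# The element set: the complete list of defined elements (definitions paraphrased)
ELEMENT_SET: Dict[str, Element] = {
    "publisher": Element("publisher", "An entity responsible for making the resource available"),
    "date": Element("date", "A point or period of time associated with the resource",
                    syntax_check=is_iso_date),
    "type": Element("type", "The nature or genre of the resource", vocabulary=DCMI_TYPES),
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, value in record.items():
        element = ELEMENT_SET.get(name)
        if element is None:
            problems.append(f"{name}: not in the element set")
        elif not element.accepts(value):
            problems.append(f"{name}: {value!r} is outside the value space")
    return problems


print(validate({"date": "2009-07-24", "type": "Text"}))  # []
print(validate({"date": "04/07/2009", "type": "Text"}))  # ["date: '04/07/2009' is outside the value space"]
```

A date such as 04/07/2009 is rejected here because it reads as 4 July in the UK but 7 April in the USA; agreeing on a single syntax encoding scheme removes that ambiguity.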

It follows that the larger the element set, with its accompanying value spaces, the more complex the system becomes and the greater the danger of ambiguity, particularly when elements are elaborated with refinements or sub‐elements. For example, the Dublin Core Metadata Element Set (version 1.1) contains just 15 elements, but in the UK the Office of the e‐Envoy produced an ambitious and mandatory metadata standard, based on the Dublin Core, to be used by all public bodies on their public‐facing web sites. In version 3.1 this standard contained 25 elements, many of which were subdivided into sub‐elements, numbering over 80 in all. One element, ‘Type’, carried an encoding scheme containing over 80 descriptors, such as Article, Report and Policy. The element ‘Subject’ was divided into several refinements, including ‘Classification’, which used as its mandatory encoding scheme the Government Category List (GCL), prepared specially to support the standard; a second refinement, ‘Keywords’, allowed users to encode their resources with terms from their own thesauri. Users were then also invited to map such thesauri onto the GCL. This brief case study is included to underline the authors’ remarks on the usual need to determine purpose, understand user needs and design carefully.

This section closes with three important chapters: the first dealing with Application profiles, noting that an enterprise‐wide metadata standard may well need to adapt its implementation to satisfy the requirements of different applications, such as a web site directory taxonomy and a business classification supporting a file plan; the second with Crosswalks, defined as “a mapping of the elements, semantics, and syntax from one metadata scheme to those of another”; and the third with metadata encoding schemes, most notably XML in some of its manifestations.
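
The crosswalk definition covers elements, semantics and syntax, and a minimal sketch can show at least the first and last of these. In the hypothetical Python example below, each source element is mapped to a target path loosely modelled on MODS element names, together with a function that converts the value's syntax (here, US‐style MM/DD/YYYY dates into ISO 8601). The mapping table, the sample record and the function names are invented for illustration; they are not the Library of Congress's published Dublin Core to MODS mapping.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Transform = Callable[[str], str]


def keep(value: str) -> str:
    """No syntax change needed."""
    return value


def us_date_to_iso(value: str) -> str:
    """Assumes the source scheme writes dates as MM/DD/YYYY."""
    return datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")


# Hypothetical crosswalk: source element -> (target path, value transform).
# Target paths are loosely modelled on MODS element names.
CROSSWALK: Dict[str, Tuple[str, Transform]] = {
    "title": ("titleInfo/title", keep),
    "creator": ("name/namePart", keep),
    "publisher": ("originInfo/publisher", keep),
    "date": ("originInfo/dateIssued", us_date_to_iso),
    "description": ("abstract", keep),
}


def apply_crosswalk(record: Dict[str, List[str]],
                    crosswalk: Dict[str, Tuple[str, Transform]]
                    ) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map a record's elements and values; return (mapped record, unmapped elements)."""
    mapped: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for element, values in record.items():
        if element not in crosswalk:
            unmapped.append(element)  # no entry in this crosswalk table
            continue
        target, transform = crosswalk[element]
        mapped.setdefault(target, []).extend(transform(v) for v in values)
    return mapped, unmapped


# Invented sample record; elements may repeat, so values are lists
source = {
    "title": ["Sample report"],
    "creator": ["Smith, J.", "Jones, K."],
    "date": ["03/15/2008"],
    "coverage": ["UK"],
}
mapped, unmapped = apply_crosswalk(source, CROSSWALK)
print(mapped)
print(unmapped)  # ['coverage']
```

Elements without an entry in the table are reported rather than silently dropped. Deciding what to do with them, and checking that each matched pair really shares the same meaning, remains conceptual work that a lookup table cannot do on its own.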

So far, the book has concentrated on applications within the enterprise, but Part III enters the open environment, an area that is surely set to become a growth industry. Mention was made above of that misleadingly simple word ‘mapping’, which computer specialists often seem to regard as simple string matching, whereas the information architect must pay attention to the underlying concepts. Zeng and Qin recognize the growing importance of interoperability when they say “It is becoming generally accepted in the information community that interoperability is one of the most important principles in metadata implementation”. One can go further and say that the Semantic Web is most unlikely to succeed unless the whole range of tools is effectively deployed, including metadata sets, encoding schemes, XML and/or RDF, and exchange formats such as SKOS. Clearly, though, interoperability is now far too complex a problem for most individual agencies to tackle, and so a number of metadata services are evolving. The authors present valuable descriptions of metadata repositories and metadata registries, the former being used to support federated searching and the latter providing a central facility through which information architects can access metadata tools. An important example of a metadata repository given in the book is the National Science Digital Library (NSDL), established by the National Science Foundation in the USA. The NSDL web site describes this facility as providing “...organized access to high level quality resources and tools that support innovation and teaching at all levels of science, technology, engineering and mathematical education”. Zeng and Qin add that “In addition to the thousands of item‐level metadata records for individual items in the repository, it also holds collection‐level metadata for each collection included in the NSDL”. An enterprise of this size and complexity clearly requires governmental funding.

The metadata registry, on the other hand, is defined by the authors as “A formal system for the documentation of registered element sets, schemas, application profiles, encoding schemes, element usage information, and element crosswalks.” A good example of such a facility is provided by the CORES Project, a multinational European consortium that laid some of the foundations for a metadata registry intended to support the Semantic Web. The Registry lists contributing agencies and their downloadable element sets, elements (a large number of these, which again emphasises the complexity of metadata work), encoding schemes, application profiles and element usages, this last giving details of how each agency is using the elements in its particular applications.
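
The registry functions just described can be sketched as a small in‐memory data structure. The Python example below is hypothetical throughout: the class names, agencies and profile names are invented for illustration and do not reflect the actual design of the CORES Registry. It records registered element sets and, for each element, which agency uses it, in which application profile and with which encoding scheme.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class ElementUsage:
    """How one agency uses one registered element (hypothetical schema)."""
    agency: str
    element: str
    application_profile: str
    encoding_scheme: str = ""  # empty if values are free text


class MetadataRegistry:
    """Minimal in-memory registry of element sets and element usages."""

    def __init__(self) -> None:
        self._element_sets: Dict[str, FrozenSet[str]] = {}
        self._usages: Dict[str, List[ElementUsage]] = defaultdict(list)

    def register_element_set(self, name: str, elements: Iterable[str]) -> None:
        self._element_sets[name] = frozenset(elements)

    def record_usage(self, usage: ElementUsage) -> None:
        # Only elements from a registered element set may be described
        if not any(usage.element in s for s in self._element_sets.values()):
            raise KeyError(f"{usage.element!r} is not in any registered element set")
        self._usages[usage.element].append(usage)

    def agencies_using(self, element: str) -> List[str]:
        return sorted({u.agency for u in self._usages.get(element, [])})

    def schemes_for(self, element: str) -> Dict[str, str]:
        # Map each agency to the encoding scheme it applies to this element
        return {u.agency: u.encoding_scheme for u in self._usages.get(element, [])}


registry = MetadataRegistry()
# A subset of the Dublin Core elements, for brevity
registry.register_element_set("Dublin Core 1.1", ["title", "creator", "subject", "date"])
registry.record_usage(ElementUsage("Agency A", "subject", "Library profile", "DDC"))
registry.record_usage(ElementUsage("Agency B", "subject", "Museum profile", "local thesaurus"))

print(registry.agencies_using("subject"))  # ['Agency A', 'Agency B']
print(registry.schemes_for("subject"))     # {'Agency A': 'DDC', 'Agency B': 'local thesaurus'}
```

Even in this toy form, the usage records show why a registry is useful: two agencies can share the element ‘subject’ while applying quite different encoding schemes to it.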

In conclusion, it is clear that Weinberger on the one hand, and Zeng and Qin on the other, agree on the growing importance of metadata, but their enthusiasms appear to differ. While Weinberger promotes social networking and the free use of tagging and linking, Zeng and Qin argue for the importance of standards. While Weinberger criticises the Dewey Decimal Classification (a de facto standard in large areas of the Western world) for its hierarchical approach, Zeng and Qin describe the growing availability of a vast number of encoding schemes. These two positions may well not be incompatible, for social networking involves communities, and if a community wants to promote some degree of interoperability among its members, then metadata standards become useful, whether a combination of Dublin Core and DDC for the public library community or an oenological taxonomy for wine lovers. It can be expected that specialist metadata registries will grow and evolve to meet the needs of all sorts of communities, and that there will be a constructive union of the freedom of social networking and the discipline supplied by standards.

This book is to be recommended without hesitation, and congratulations are due to Facet for re‐publishing the American edition; it deserves a wide audience.

References

Evans, D. (2008), “Good times, bad times”, Financial Times, 17 September.

Weinberger, D. (2007), Everything Is Miscellaneous, Holt Paperbacks, New York, NY.

Related articles