The Oxford-Google mass-digitisation programme


University of Oxford
Director of University Library Services and Bodley's Librarian

The Oxford-Google mass-digitisation programme:

contribution to the Opening Plenary Panel session of the CNI Spring 2005 Task Force Meeting, Washington DC, 4 April 2005

When Sir Thomas Bodley founded the Bodleian Library in Oxford over four hundred years ago, in 1602, he laid down the specific condition that this new library should not just serve his alma mater, but that it should also be a library for the worldwide 'republic of letters'. Ever since those far-off days, as the Library's earliest visitors books quite clearly testify, the Bodleian has honoured its Founder's wishes by opening its doors to all-comers, and by making its collections as accessible as possible to external readers of every kind. From its very foundation, in fact, the Bodleian was referred to as 'the publique library of the University of Oxford' - a fact which set it apart from almost every other library in 17th-century England, where most of the libraries then in existence were essentially private institutions, which existed to serve only their local communities of scholars, and whose collections were effectively closed off to the wider world of the intellectually curious. And today, the Bodleian continues to function as a 'library for the world', with more than 60% of its registered users having no direct affiliation with the University.

For most of the Bodleian's long history, of course, the accessibility of the Library's collections has been almost entirely dependent on the ability of its users to come physically to Oxford. But the emergence of the Internet, and the scope for creating digital surrogates of library materials for networked availability, have radically altered the paradigm for access to the Library, opening up a whole new meaning for the Bodleian as a 'library for the world' in the 21st century.

And it is into this historic context that Oxford's mass-digitisation programme with Google fits perfectly as a key modern element of what has been a strategic aim for the Bodleian for the whole of its existence: to bring its great collections to the wider world.

Five years ago, with the help of the Mellon Foundation, we established the Oxford Digital Library initiative, to exploit the Web in opening a public window onto a digital subset of the Bodleian's collections. And it was consistent with this that, about 2½ years ago, we first began to talk with Google about the possibility of a joint programme of digitising large quantities of the Library's materials that were unreachable via the search engines.

We knew, of course, that Google's founders (and Larry Page in particular) nurtured a long-held ambition to 'bring the world's information to the world'. And we knew only too well that the Internet, for all its amazing achievements, was still desperately poor in terms of information of real quality. We were also well aware that many students were increasingly relying on Internet search services as the first (and sometimes the only) port of call for their information needs. For us, therefore, it seemed a natural and logical step to begin a conversation with Google, to see if we could harness their technological expertise, and their huge financial potential, to find a win-win situation for both parties, in which Google could satisfy its ambition to extend its reach, by bringing large quantities of our materials into the electronic arena, where so many people were actually looking for information, instead of us merely expecting our local systems to be the centre of their information-searching universe.

As long ago as January 2003, therefore, we reached an understanding with Google's senior executives that we would work together to find a mutually beneficial way of making Oxford material available electronically both to Google and to Bodleian Library users. It remained only for us to work out what we would offer up to Google for digitisation, and how, and when, and under what arrangements and conditions. All of which is now delineated in the 15-page cooperative agreement between Oxford and Google which was announced publicly on 14 December.

Regarding what would be digitised, we came very quickly to the conclusion that we should be concentrating, at least in the first instance, on printed books that were unencumbered with copyright restrictions. We chose printed books rather than manuscripts or other primary research materials for two main reasons: first, we wanted to aim for critical mass as the best way of making serious inroads into our vast collections, with mass-digitisation techniques for printed books lending themselves most readily to the highest levels of throughput and giving rise to fewer concerns about potential damage to the originals; and secondly, we already had in place a number of arrangements for tackling the much more sensitive issues surrounding the digitisation of 'high-end', non-printed materials, for which we do not consider mass-digitisation procedures to be particularly appropriate.

And we chose out-of-copyright books because that meant that we could have a clear run at producing digital copies of millions of books without having any concerns over legal considerations. This was, if you like, in effect, our 'line of least resistance', and we remain convinced that it was the right decision for us in our present circumstances...

As far as the 'how?' goes: from the earliest days of our serious discussions with Google during 2003, it was clear that we both envisaged setting up a Google operation in Oxford, with the space and facilities to work, on a semi-industrial scale, through our vast collections of 19th and early 20th-century printed books. And, having concluded our formal agreement with Google in December, we are now jointly putting into place a Google mass-digitisation unit in Oxford, with the aim being, by the autumn at the latest, to begin digitising up to 10,000 printed volumes per week. And the only expense to Oxford in all of this will be the manpower costs of selecting the material for digitisation, and of nominating two members of existing staff as our operational and technical liaisons.

The 'when?' is covered by the early autumn operational start date for the Google unit in Oxford, and by the three-year agreement which extends to the end of 2007, but which is renewable, by mutual consent, on an annual basis thereafter.

As far as 'the conditions' of the agreement itself are concerned, these are relatively few and, from our point of view, not particularly onerous. Two copies of the digitised works will be made: one for Google, and one for Oxford; and the use and re-use of both these copies will be covered, on a non-exclusive basis, by the simple principle, stated up front in the agreement, that "the digitised works will be…included in Google's search services, and be available to Oxford for its purposes as a university". Google, in other words, will be free to exploit its digital copies of Oxford's materials in any way it pleases; while Oxford's use will be no more or less restricted than it is of those same materials in their physical form.

Google's copies will be made available via the Google Search and Google Print services. But, since our own digital infrastructure is not yet as robust or as sophisticated as we would like it to be, we will be taking initial advantage of Google's willingness to provide a dedicated website for our copies to be made immediately available to all accredited Bodleian users (probably at a URL such as myBodley@Google), pending our ability to deliver local access to these materials from our own Oxford machines.

Each Oxford copy will consist of the image and OCR files, and also associated metadata linking the item to an existing machine-readable record in our catalogue database, together with an RFID physical volume identifier. It is our ultimate intention to make all our copies clickably accessible and downloadable direct from our University-wide OLIS system, and discoverable from a seamless Oxford interface which we are currently developing.
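By way of illustration only, the package just described for each Oxford copy might be modelled along the following lines. This is a minimal sketch and not the structure laid down in the agreement: every class name, field name, file name and identifier value below is hypothetical, and the actual file formats and metadata schema are not specified here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageFiles:
    """One scanned page: an image file plus its OCR text file (names hypothetical)."""
    image_path: str
    ocr_path: str


@dataclass
class OxfordCopy:
    """Illustrative model of one digitised volume, following the description above."""
    catalogue_record_id: str  # link to the existing machine-readable catalogue record
    rfid_volume_id: str       # RFID identifier of the physical volume
    pages: List[PageFiles] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True only if both identifiers are present and every page has both files."""
        return (
            bool(self.catalogue_record_id)
            and bool(self.rfid_volume_id)
            and len(self.pages) > 0
            and all(p.image_path and p.ocr_path for p in self.pages)
        )


# Hypothetical example values, for illustration only.
copy = OxfordCopy(
    catalogue_record_id="example-record-0001",
    rfid_volume_id="example-rfid-0001",
    pages=[PageFiles("0001.img", "0001.txt")],
)
print(copy.is_complete())
```

The point of the sketch is simply that the digital files, the catalogue link and the physical-volume identifier travel together as one unit, so that a copy can be traced back both to its catalogue record and to the book on the shelf.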

Additionally, we have agreed to protect Google's interests by cooperating with them to prevent robotic 'systematic' downloading of the materials from our own machines, and by agreeing not to distribute the materials to any of Google's competitors - both of which conditions appear to us, given the nature and size of Google's investment in the programme, to be perfectly reasonable.

For us in Oxford, the availability on the Internet, within three years, of the full texts of more than a million of our out-of-copyright printed books, through Google and our own website, represents "one small (but not insignificant) step" towards a 21st-century re-interpretation of our Founder's desire to make the Bodleian 'a library for the world'. But it is also, we believe, part of a "giant leap for mankind", in helping to 'bring the world's information to the world'. We are therefore very pleased to be able to bring our own few modest bricks to what promises to be a great new wall…

Reg Carr
April 2005