The Oxford-Google mass-digitisation programme


University of Oxford
Director of University Library Services and Bodley's Librarian

The Oxford-Google mass-digitisation project: How, why and what?

An EDUCAUSE Webcast, 15 June 2005

Slide 1

Like John [Price Wilkin], I want to describe the how, why and what of Oxford's deal with Google. But I guess we're about 12 months behind Michigan, and I want to come at it in a slightly different way. While my title slide is showing, I'd like to start with just a few words of background about the Bodleian Library, as I guess some of you out there might be wondering why a 400-year-old library sees it as part of its modern mission to enter into a mass-digitisation agreement with the world's leading Internet search engine service.

When Sir Thomas Bodley founded the Bodleian Library in 1602, he made it a specific condition that his new library should serve not just the University of Oxford, but also what he called the worldwide 'republic of letters'. So, from its very beginnings, the Bodleian Library has been open to all-comers; and its huge physical collections have been accessible to external readers of every kind. And to this day the Bodleian functions very much as a 'library for the world', with more than 60% of its currently registered users having no direct affiliation with the University of Oxford. And it's into this historic context that the mass-digitisation programme fits as a modern version of what has been a strategic aim for the Bodleian for the whole of its existence: to make its great collections available to the wider world. (I'll say a bit more about this re-interpretation of the Bodleian's historic mission later in this presentation; but, in my next slide, I want to say a few words about the mass-digitisation initiative from Google's point of view.)

Slide 2

So here's my 'take' on at least some of the rationale that lies behind Google's entry into the world of mass-digitisation. (These are my own views, of course; but they're based on personal experience of almost three years of discussions with Google staff.)

First of all, it's a well-known fact that Google's founders (and Larry Page in particular) nurture a long-held ambition to 'bring all of the world's information to the world'. And this is clearly reflected in the part of Google's mission statement that talks about 'organizing the world's information and making it universally accessible and useful'. Now Google has certainly done this very successfully as far as existing Web-based electronic information is concerned. But they are well aware, as the rest of us are, that the Internet, for all its amazing achievements, is still desperately poor in terms of really high-quality information. They know, as we do, that most of the world's valuable content is still 'locked up' in physical printed materials, and that a high percentage of this physical stuff is held and owned by publishers and by the world's great 'knowledge repositories', or 'libraries', as we know them more familiarly.

So it was bound to happen, sooner or later, that Google would set its sights on gaining access to this kind of stuff for the benefit of its worldwide audience. And so it was that, in 2002, Google opened up discussions with all kinds of publishers and libraries, to see if there might be a shared common interest in putting at least some of the publisher output and library materials into digital form for wider presentation through Google's search services. Froogle was the first substantial outcome from this new approach, and it was a 'no-brainer' for Google and the large sales catalogue companies to collaborate in putting mountains of sales catalogue material online. Other publishers, too, began to see that Google could help them to open up the global Internet market for their products, and so Google was able to conclude quite a number of agreements that permitted them to digitise particular kinds of books and journals, and to include them in their search results.

You've heard already, from John Wilkin, that Michigan University Library was the first large knowledge repository to be approached by Google in their efforts to make even more printed materials available to their users.

And all this, of course, was entirely consistent with Google's ongoing efforts to enhance their web-based services and to make them even more compelling to a larger market share of Internet users. In a severely competitive sector of the commercial Internet world, it was, and remains, an essential aspect of Google's drive as a company to add greater richness and coverage to their search engine results. And so they saw their ongoing discussions with publishers and libraries as an important means to populate their site with a hugely increased amount of digital stuff - and stuff of a higher quality too.

Slide 3

My third slide switches the focus, and looks at the mass-digitisation initiative from Oxford's point of view, as one of those major knowledge repositories.

For most of the Bodleian's long history, of course, the accessibility of the Library's collections has been almost entirely dependent on the ability of its users to come physically to Oxford. But the emergence of the Internet, and the scope for creating digital surrogates of library materials for networked availability, have opened up a whole new meaning for the Bodleian as a 'library for the world' in the 21st century.

In line with this, several years ago, we explicitly adopted a 'Hybrid Library' approach to the management of our collections and resources, with the ultimate aim being to manage and deliver all the information about our materials, and an increasing amount of the materials themselves, by electronic means. As part of this approach, five years ago, with the help of the Mellon Foundation, we established the Oxford Digital Library initiative as a strategic means to exploit the Web as a public window into a digital subset of the Bodleian's collections.

And it was consistent with this that, towards the end of 2002, we first began to talk with Google about the possibility of a joint programme of digitising large quantities of the Bodleian's materials that were unreachable via the search engines. At that time, we were unaware of Google's discussions with other libraries; but we approached Google because they seemed to be a natural source of potential support for what we were already trying to achieve.

In particular, we were well aware that many of our students were increasingly relying on Internet search services like Google as the first (and sometimes the only) port of call for their information needs. So it seemed a natural step to begin a conversation with Google, to see if we could harness their technological expertise, and their huge financial potential, to find a win:win circumstance, in which Google could satisfy its ambition to bring large quantities of quality information into its reach, and where we could greatly expand access to our own collections.

In January 2003, therefore, we reached an understanding with Google that we would work together to find a mutually beneficial way of making Oxford material available electronically both to Google and to Bodleian Library users. It remained only for us to work out what we would offer up to Google for digitisation, and how, and when, and under what arrangements and conditions.

As far as what would be digitised is concerned, we quickly came to the conclusion that we should be concentrating, at least in the first instance, on printed books that were unencumbered with copyright restrictions. (We chose printed books rather than manuscripts or other primary research materials for two main reasons: first, because we wanted to aim for critical mass as the best way of making serious inroads into our vast collections; and secondly, we already had in place a number of arrangements for tackling the much more sensitive issues surrounding the digitisation of 'high-end', non-printed materials (like manuscripts), and we don't regard mass-digitisation processes as particularly appropriate for this kind of stuff in any case.)

And we chose out-of-copyright books because that meant that we could have a clear run at producing digital copies of hundreds of thousands of books without being held up by legal considerations. This was, if you like, in effect, our 'line of least resistance'; and we're convinced that it was the right decision for us in our present circumstances...

It was the classic 'win:win' situation, and we were both very comfortable with it.

Slide 4

My fourth slide, which I'll deal with very quickly, simply brings the story up to date as of December 2004, by which time Google was able to announce publicly that it had concluded five separate digitisation agreements - with the libraries of Harvard, Stanford, Michigan, New York Public, and Oxford (Oxford being the only non-North American library included in these arrangements).

With a number of local variations, dependent mostly on the type and copyright status of the materials to be digitised, Google was setting itself up to create digital full-text versions of huge quantities of items, with OCR'd text, and with indexes for search and retrieval via the Google Search and Google Print services; and by this means they would be able to offer up vastly increased amounts of online information contained in hitherto inaccessible printed materials held in some of the world's greatest libraries.

Slide 5

And my fifth slide summarises what we in Oxford see as the main benefits to be gained from involvement in the Google programme.

It will help us bring a huge amount of Oxford content into the digital domain, for scholarly and public access and use. It will help us to attract very significant investment into our 'Hybrid Library', and to 'populate' it with electronic versions of locally-held materials. And it will enable us to supplement our ambitious plans to expand our provision of digital resources, as part of our more recent ELISO initiative (ELISO stands for 'the Electronic Library and Information Service for Oxford'). So our work with Google has many aspects to it, and they're all positive…

[I'll stop there, and hand back to Steve, to see if there are any questions or comments on my presentation so far.]

Slide 6

OK, thanks Steve. I'll resume my presentation with my sixth slide, which lists some other specific benefits that we expect to gain from our involvement with Google.

At the present stage of progress in setting up the mass-digitisation facilities here in Oxford, we expect to be operational from this October. Over the past few months we've been working closely with Google staff on the logistics for their operations in Oxford, and we've carried out trials on the spot here, to satisfy ourselves that both we and Google can handle the large amounts of stock from our extensive 19th-century printed collections. And we expect, once we're up and running, to receive from Google, free-of-charge, digital copies of perhaps more than a million of our books. Each Oxford copy will consist of the image and OCR files, and also the associated metadata linking the item to a record in our online catalogue. And it's our ultimate intention to make all our copies clickably accessible and downloadable direct from our University-wide OLIS system, and to make them discoverable from a seamless Oxford interface which we are currently developing.

Google's own copies of the digitised materials will be made available via the Google Search and Google Print services. But, since our own digital infrastructure is not yet as robust or as sophisticated as we would like it to be, we will be taking initial advantage of Google's willingness to provide a dedicated website for our copies to be made immediately available (probably at a URL such as myBodley@Google), pending our ability to deliver local access to these materials from our own Oxford machines.

And this particular bullet point leads me on to another major benefit that we envisage will emerge for us, and that's the opportunity of working closely with one of the world's leading Internet companies. This whole thing can be seen as an important learning experience for both of us, as well as for our other library partners. We're very conscious here that we're not only making history as we proceed, but also that we're doing it with the help of a very smart technology company, for whom 'innovation is standard'. So we expect to learn a great deal as we proceed, through the sharing of technology innovation. It's a great enterprise on which we're all embarked, and we're very glad to be a part of it!

Slide 7

As far as the formal agreement between Oxford and Google is concerned, you'll all appreciate that the details of this have to remain 'commercial in confidence'; and, given Google's position as a commercial company, this is all perfectly right and proper. But I can at least give you this general overview of the basic terms involved in our collaboration.

In the first place, Oxford's current agreement takes the form of a three-year deal, extending to the end of 2007; but this is renewable, by mutual consent, on an annual basis thereafter. And, if all goes well, we hope that this will be just the beginning of a long-term relationship.

The agreement involves setting up a Google operation in Oxford, with the space and facilities to work, on a semi-industrial scale, through our vast collections of 19th-century printed books. And we're now well on the way towards setting this up, with the facilities being prepared as we speak, and with Google already in the process of appointing local staff to oversee their end of the operations.

One of the major benefits to us in Oxford, of course, is that Google is willing to pick up virtually all of the costs of the Oxford operation; and this, if you like, is the essential quid pro quo for us that makes it such a compelling opportunity. In effect, we're regarding this major financial investment by Google as a significant contribution to the achievement of our own strategic objectives in mass-digitising so many of our locally-held materials, and in bringing so much added benefit to the Bodleian's worldwide community of users.

Our own costs in relation to the programme are relatively restricted. We've already seconded a local Project Manager from our existing staff, and we will also need to bear the local staff costs associated with selecting the particular materials to be digitised. There will also be the notional costs of the Oxford staff time involved in the steering of the project, in the designation of a primary technical contact, and in the various working parties we've already established in connection with particular specialist and technical aspects of the project. But we consider these costs as a very small price to pay for all the benefits we envisage.

Two copies of the digitised works will be made: one for Google, and one for Oxford; and the use and re-use of both these copies will be covered, on a non-exclusive basis, by the simple principle that "the digitised works will be…included in Google's search services, and be available to Oxford for its purposes as a university". Google, in other words, will be free to exploit its digital copies of Oxford's materials in any way it pleases; while Oxford's use will be no more or less restricted than it is of those same materials in their physical form.

And, finally on this slide, we're especially grateful to Google for meeting the University of Oxford's local concerns about the relevant legal jurisdiction governing our formal agreement. As the only non-North American library in the programme, we warmly acknowledge Google's flexibility in being willing to accept the jurisdiction of English, rather than Californian, law in the unlikely event of any future contractual issues.

Slide 8

My penultimate slide (Slide 8) looks forward, beyond the immediate benefits of the Oxford-Google partnership, to a number of 'down-the-line' possibilities for further developments from our association with Google.

If all goes well with our collaboration over the next three years, we'd naturally like to think that the partnership might be extended, as we believe that both parties have even more to offer as the printed world becomes even more dramatically digital. We have about 14 million printed items here in Oxford, and it would be good to see a much higher proportion of these being out there on the Web, even if it does take many years of hard work to achieve it. Of course, if and when we get to talking with Google about in-copyright materials, we hope that, by then, some appropriate arrangements with the rights owners may have been worked out. But all that's for the future, and meanwhile, we have a whole lot of work to do!

We're hopeful, too, that down the line it will become possible for us to share these full-text materials with our library partners, in some or all of the library consortia of which the Bodleian is an active member. I believe that, subject to the necessary legal agreements, this will become a reality, with even wider benefits for scholars throughout the world.

In Oxford's libraries, also, as elsewhere, digital preservation is a major preoccupation right now, and we will be looking to use our growing collection of digital surrogates as a lever to find long-term solutions to the challenge of preserving our digital assets in perpetuity, as we do with our physical materials. Ultimately, too, we will be looking to use our digitised book collections to reduce the dependency of our inter-library lending services on the physical copies, and as a means of enhancing the services we offer to the wider scholarly world.

Further down the line, too, we can envisage real benefits for the physical materials themselves, in the reduction of wear and tear upon them. We have not yet developed a policy for de-accessioning physical materials which are also available in digital form; and, as a legal deposit library, it is unlikely that we will be throwing away huge quantities of printed stock. But it may well become possible, as we develop shared retention policies with certain UK national partner-libraries, that we will be able to save some physical storage costs for some of the more common materials (and especially perhaps as the archiving of digital materials becomes more robust and reliable for the long-term).

Slide 9

And my final slide is by way of a somewhat self-congratulatory summary of this whole mass-digitisation enterprise.

We consider that we've made a 'good beginning'; and we're looking to build on that, and to try to spread the benefits of our work as widely as we possibly can.

In spite of its venerable age, the Bodleian Library is a forward-looking institution, and we are convinced that, for us, the future is virtual as well as physical. Our 'Hybrid Library' approach is based on the twin pillars of access and holdings, and we believe that our future success will be a combination of these, managed in ways which are both increasingly and deeply digital.

For us at least, the availability, within three years, of the full electronic texts of such a large corpus of our printed books, through Google and our own website, represents "one small (but not insignificant) step" towards a 21st-century re-interpretation of our Founder's desire to make the Bodleian 'a library for the world'. But it is also, we believe, part of a "giant leap for mankind", in helping to 'bring the world's information to the world'. We're convinced that it's the right way for us to go in staying true to our Founder's continuing vision…

Thanks for listening. And back to you, Steve…

Reg Carr
June 2005