
DOMINUS ILLUMINATIO MEA

University of Oxford
Director of University Library Services and Bodley's Librarian

THE CHALLENGE OF e-SCIENCE FOR RESEARCH LIBRARIES

CURL Members’ Meeting Dublin, 26 March 2004
BODLEIAN LIBRARY

Slide 1

My brief today is to ask rather than to answer questions: to stimulate a group discussion about the 'big issue' of e-Science from the point of view of our research libraries, as a means of helping the CURL Board to decide whether there's a way forward on e-Science for CURL collectively.


I claim no special expertise in this arena; but I have at least had some involvement with discussions at national and institutional level about the potential implications of e-Science for research libraries. It has arisen for me in the context of my role as Chairman of the JISC's Committee on Electronic Information, in the revision of the JISC Strategy, in the discussions of the JISC's Scholarly Communications Group, through my membership of the Research Support Libraries Group (RSLG, whose e-Science sub-group has prepared an extensive report), and in the meetings of Oxford's e-Science Centre Management Board, of which I have been a member since its inception.


But I want to stress that I'm giving this presentation now not because I have any special insights into the challenge of e-Science for us all, but simply because I was rash enough to raise my head above the parapet when the CURL Executive Secretary was asking me for background information in preparation for today's session!


Like all of us here, though, I'm well aware of the increasing profile of e-Science (and the Grid) in the national research support agenda; and I'm happy to share just a few thoughts about its strategic significance, and to try to summarise, as a basis for discussion, what I see as the main aspects of the challenge of e-Science for research libraries like our own.


Slide 2

So what is e-Science?

Dr John Taylor (until recently the Director-General of the Associated Board of the Research Councils, who are the principal funders of the UK's e-Science initiative) has described e-Science as being "about global collaboration in key areas of big science and the next generation of infrastructure that will enable it".


In other words, e-Science is about the large-scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. And, of course, although Dr Taylor doesn't say so in this short 'definition' of e-Science, such collaborative global research activities will rely very heavily on access to large data collections, on large-scale computing resources, and on high-performance visualisation facilities.

So what's the challenge for us in all of this?


Slide 3

Well, for the purposes of today's discussion, I've summarised the challenge as I see it under these five separate headings: the background and the drivers; the 'stuff' itself; 'getting on board'; the big questions; and the 'enduring conundrums'.

So there are just five slides left in this brief overview, and I want to take you through each one of these very swiftly before we break up into groups and discuss what we think our approach to all of this might be…


Slide 4

Where then has it all come from? What's the background to e-Science, and what's driving it?

First and foremost, all the current e-Science activity is coming from a massive national investment. Large-scale national research funding has been put into the e-Science initiative, by the Research Councils, by OST, by the DTI, and by industry - a total of almost £300 million over 5 years, from 2001/2 to 2006/7. And to put that into some kind of context, that's slightly more than the whole of the JISC budget for those same five years, which of course includes the cost of the JANET network and a whole lot of other things besides. So the investment is on a fairly large scale…


And an integral part of the e-Science programme is the so-called Research Grid - a sort of parallel high-performance computing network to support e-Science, based around a dozen or so Regional Grid Centres, with the Operations Centre at the Rutherford-Appleton Laboratory (RAL), just south of Oxford. So, at the moment, there's a big parallel universe being set up out there that our research libraries are not (yet) a part of!


And we need to remember that e-Science and the Research Grid are themselves only a part of the explosive growth of e-Research, which is being increasingly enabled by the Web and other developing technologies; and it must be a big question for us as to how far we think we may have anything to offer in support of all the new e-Research efforts that are presently being developed without our input. You may think that all of this is none of our business; and you may be happy that we should just carry on doing what we do to support the kinds of research that we know and love so well. But there's a whole new research agenda out there, and we should at least be clear about whether we think we should be trying to add value to it or not…


We also need to take account of the fact that most, if not all, of our own institutions are involved with these new research methodologies. Every one of the Research Councils is funding new e-Research initiatives, which our own academics are taking part in, and many of them are related directly to the e-Science programme. So, whether we like it or not, it's a local issue for most of us as to whether we're going to leave them to it, or whether we're going to try to support these things at our institutional level in some way or other.


It's worth noting, too, that one of the five strategic aims in the new JISC Strategy is "supporting research, and in particular e-Science, and helping to embed e-Science more widely across research". Our local computing centres, and our e-Science centres if we have them, are certainly going to be involved in this strategic aim; and we ourselves need to decide if we have a part to play in any of that. There's going to be a lot of money swilling around the system, too, and none of it will come to us if we don't throw our hats into the ring in some way or other…


And it's coming a whole lot closer to home, too, through the RSLG report and, potentially, through the Research Libraries Network once it's established. The RSLG report has a whole appendix on e-Science, produced by its e-Science sub-group. And if the seven existing sponsors of the RLN are successful in bringing the Research Councils on board with the RLN agenda (as we must hope they will), then it seems highly likely to me that e-Science/e-Research will necessarily become part of the RLN's work (and possibly in direct collaboration with JISC).

Those, then, are some of the principal drivers of e-Science as I see them.


Slide 5

But what about the other aspects of the challenge?

What about 'the stuff' itself?

Although it's perhaps not even the biggest potential challenge for us, what about the sheer quantity of the electronic information used and produced by e-Science - what Tony Hey has graphically characterised as the Data Deluge? How on earth is all that material going to be handled and managed? If some of us feel that we already have far too much stuff to deal with, we simply have no conception yet of the vast amount of electronic data that's already being generated within the e-Science community. And there's very much more to come…


A quotation from Tony Hey may help to put this into some kind of perspective: "In many planned and future experiments, several orders of magnitude more data will be generated than has been collected in the whole of human history". One single e-Science project is currently producing datasets of up to 10 petabytes every year: and that's more data than the whole of the contents of the Bodleian Library, which has taken Oxford over 400 years to collect!


But the challenge doesn't end there! These data won't all be of the highest quality, and they won't come uniformly shrink-wrapped: e-Science is producing raw data galore, and it won't all be ready to use…

And it will come in an infinite variety of formats, some of which we've never even heard of yet!


And, if it's going to be controlled in any meaningful and usable way, it's going to need new forms of customised metadata, with standards that haven't been invented yet…

And if it's going to be searched and delivered effectively on a global basis, it's going to need a new and robust infrastructure. It will need entirely new search engines, and search tools that haven't even been created yet. Data-mining facilities will need to be developed, and a whole new range of middleware too.


And some of it, at least, is going to need archiving and curating, and on a scale which makes our present challenges look like child's play!

So is the challenge just too great for us? Should we simply leave it to the e-Scientists and their technology support staff to grapple with?


Slide 6

Well, I have to tell you that, if we are going to try to get on board in any way, there are even more aspects to the challenge that faces us!

For example, we're going to have to face the fact that many - perhaps most - of the scientific communities where all these data are coming from are already largely disconnected from what our research libraries currently provide by way of support. The Bioinformatics community, for example; the Particle Physicists; the Electron Microscopists; the Astronomers - the list goes on. Most of these communities already do much of their research almost entirely without our help.


So can we get closer to them? Or is there really nothing we can do to get on board with their work?

Why don't we try to make some inroads into the Research Councils' agendas? (Easier said than done, perhaps: even JISC sometimes struggles to get a look in!)


But we could certainly get a whole lot closer to the e-Science Core Programme. I know, for example, that Tony Hey, the Programme's Director, would welcome our input: he's already on record as criticising what he sees as our inertia; so we have a lever to pull with him at least.

But be warned! There's a whole new language out there; and we'll have to learn it. There are already e-Scientists who are actively engaged in creating new descriptive taxonomies to characterise and classify the outputs of e-Science; and we'll have to meet this semantic challenge if we're going to make any inroads into this parallel universe that's developing so fast…


And we'll need to gain new skills, and get a whole lot smarter at what we and our staff are able to do.

We'll need to tackle a completely new set of political and PR issues, too. As things stand right now, it's my impression that the e-Science community doesn't generally perceive that there is a role for the traditional research libraries in their work; so there'll be a whole lot of lobbying and convincing for us to do…


And of course, funding will be an important part of the challenge: we'll need to get our hands on new sources of funding if we're going to be able to resource any meaningful contribution to this new kind of information and research support. So how are we going to get such funds, and where will they come from?

And finally there's the question of whether we should 'get on board' singly and separately, at local institutional level, or collaboratively and nationally. And that's the particular challenge for us to consider today, as CURL…


Slide 7

So here are 'the big questions' for us to discuss among ourselves today:

  • Are we, as the UK's major research libraries, going to get involved in the support of e-Science? Or are we just going to let it happen without us?
  • Are we all going to try to get involved, or just some of us?
  • And if some of us are going to get involved, are we going to do it collaboratively, in the context of CURL for example, or are we just going to plough our own individual furrow?
  • And, if we do seek some kind of involvement in the support of e-Science, can we afford it? Or, to put that same question the other way round: can we afford not to be involved?


All of which leaves us with the big questions for today's discussion: who's going to step up to the gate, and how and when?

So let me leave you with my final slide, which I've called 'The enduring conundrums' (or should it be 'conundra'?). And I hope that these few headings will serve to summarise the issues for us to discuss right now.


Slide 8

  • First of all: Are we up for it?
  • And if we are, how can we add value to the e-Science agenda?
  • And if we think we can add value, will 'they' (that is, the various e-Science communities) let us get involved? Do they even want us to? (The Director of the e-Science Programme has said he does. But how far is his personal view likely to be shared by the rest of his community?)
  • Is it really our job?
  • And finally (finally!): If we don't do it, who will?


Discuss!


Reg Carr
Dublin
26 March 2004