Appendix G: Assessment Criteria for Digitization

Stuart D. Lee, 7/1/99

Why do you need to assess material?

‘It is at this point expensive to select, create, and maintain digital resources, with the cost of image-capture accounting for less than a half of the total expense.’
‘Selection Criteria for Digital Imaging Projects’, Columbia University, http://www.columbia.edu/cu/libraries/digital/criteria.htm

There are many answers to this opening question. The most obvious reason to assess a collection carefully, and ultimately to select the documents you wish to digitize, is that it is essential to recognise the full nature and scope of the project before it is undertaken. The comment above from Columbia’s page on selection criteria points immediately to one of the main problems with digitization projects: cost. Digitizing a collection, or items from a collection, is a lengthy and financially demanding process, and the expenditure is not limited to the actual capture of analogue material in digital form. As the above comment indicates, the stages before and after digitization can often be the most expensive. In this sense the initial assessment of the project is extremely important. It is unsatisfactory to realise, part way through a project, that it cannot be completed or, even worse, that you should have been concentrating on another part of the archive or indeed another collection altogether.

In short, assessment is required on a macro scale to verify that the project can be completed within the constraints of time, money, and competing priorities for your limited resources. It should then allow you to choose which collections to digitize and, moreover, which items within those collections to target. Finally, your assessment should indicate that the project will be able to deliver the digital material to a satisfactory technical standard and will be seen to be cost-effective.

Every project, large or small, needs to perform some form of assessment and selection procedure before it progresses:

‘Announced in November 1994, the National Digital Library Program (NDLP) has identified two hundred of the Americana collections at the Library of Congress as an initial pool of candidates for conversion to digital form. [Libraries and archives traditionally handle many historical and other special materials as collections rather than as individual items, particularly when the items are personal papers or pictures rather than bound volumes or published recordings.] Factors that influence selection for conversion include uniqueness of the materials, synergy with other activities in custodial divisions (such as preservation), the availability of suitable digitizing technology, and the value of the materials for education.’
Arms, C. R. ‘Historical Collections for the National Digital Library’ D-Lib Magazine (April 1996; http://www.dlib.org/dlib/april96/loc/04c-arms.html)

Previous Studies

Several other libraries and institutions have attempted to provide assessment and selection criteria for proposed digitization projects. Wherever possible these have been studied and used for this report. These include:

Classes of material

Traditionally reformatting assessments (Atkinson, 1986 (1)) have classified documents into such categories as:

In the past microfilm, microfiche, etc. have acted as the preferred surrogates for reformatting. Digital files offer a potential replacement for these and bring further advantages: they are easier to access, including from remote sites, and they can increase the scholarly use of the material by harnessing the power of computers to search and analyse large collections. On the other hand, digital file formats are often unstable, so the files need to be monitored and repeatedly migrated.

Yet the person assessing a collection for possible digital reformatting faces a daunting array of material: not only articles, journals, large print runs, books, manuscripts, maps, photographs, records, cassettes, film, video, and so on, but also existing surrogates such as microfilm.

Perhaps even more pressing is the need to question the scope of the assessment. In some cases you may be faced with one collection and are simply attempting to select items from it. In a more complicated scenario you will be selecting items from different collections and virtually reassembling them. Alternatively, you will be choosing one archive over another. The most common scenario is a choice between several archives, followed by the task of selecting items (‘cherry-picking’) from within the chosen collection because there are not sufficient resources to digitize the entire holding.

A typical example of cherry-picking would be the digitization of a rare manuscript of x-hundred folios of which only one or two are ever consulted. Bearing in mind all of the costs of digitization, when assessing the manuscript should you simply elect to digitize the high-demand folios or should it be the policy to digitize the whole manuscript (the availability of the other folios, for example, could lead to an increase in demand for other sections of the manuscript)?

On a general level there are a few answers to this:

In an ideal world with unlimited funding the third scenario would be clearly the recommended approach. However, in reality one is faced with limited time and money, and competing preservation priorities. Furthermore, the third scenario relies on the ability to define a collection, i.e. where it begins and ends. This is not always the case in large archives.

It is suggested that although the complete digitization of an archive is always the ultimate goal, it is often unattainable. Furthermore, although the second scenario retains a sense of fairness, particularly in a system of distributed collections, it could be very difficult to administer. It is suggested that the first scenario is the one which will be most commonly adopted, based on the ‘needs’ and ‘feasibility’ criteria above, but that wherever possible complete digitization of an archive should be aimed for.

If one is resigned to only being able to partly digitize a collection (e.g. high demand items, ones in need of conservation, etc.) then one will need to be even more clear about the selection criteria employed. One way would be to design a matrix of all the varying categories by which an item in the archive can be assessed, e.g. subject, author, genre, time period, size, medium, and so on. It could then be a simple matter of making sure that all representative categories are covered, to give a balanced view of the collection (a policy adopted by the British National Corpus). Pilot projects may also wish to select items which can present challenges for digitization, so that potential problems for a future more comprehensive project will be encountered early on.
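The matrix idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalogue records, the three category names, and the greedy selection rule (a standard set-cover heuristic, not something prescribed by this report or by the British National Corpus) are all hypothetical. The sketch builds a matrix of (category, value) cells and keeps choosing the item that fills the most empty cells until every cell is represented.

```python
from collections import defaultdict

# Hypothetical catalogue records; every value here is invented for illustration.
ITEMS = [
    {"id": "MS-001", "genre": "letter", "period": "18th c.", "medium": "manuscript"},
    {"id": "MS-002", "genre": "diary", "period": "19th c.", "medium": "manuscript"},
    {"id": "PH-001", "genre": "portrait", "period": "19th c.", "medium": "photograph"},
    {"id": "MP-001", "genre": "survey", "period": "18th c.", "medium": "map"},
    {"id": "MS-003", "genre": "letter", "period": "19th c.", "medium": "manuscript"},
]

# Assumed assessment categories (the text also mentions subject, author, size, ...).
CATEGORIES = ("genre", "period", "medium")


def build_matrix(items, categories):
    """Map each (category, value) cell to the ids of the items that fall in it."""
    matrix = defaultdict(set)
    for item in items:
        for category in categories:
            matrix[(category, item[category])].add(item["id"])
    return matrix


def select_representative(items, categories):
    """Greedily pick items until every (category, value) cell is covered."""
    uncovered = set(build_matrix(items, categories))
    remaining = list(items)
    chosen = []
    while uncovered:
        # Take the item that fills the largest number of still-empty cells.
        best = max(
            remaining,
            key=lambda it: len({(c, it[c]) for c in categories} & uncovered),
        )
        chosen.append(best["id"])
        uncovered -= {(c, best[c]) for c in categories}
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    print(select_representative(ITEMS, CATEGORIES))
```

A greedy pass like this gives a small balanced sample rather than a guaranteed minimum; in practice the result would still need to be weighed against demand, condition, and cost.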

Need and Feasibility

But where do you start? The matrix approach noted in the preceding paragraph is one method, but this will not always be suitable (especially if you are choosing between competing collections). Although there is no single definitive check-list which can be applied, the following discussion, weighing up the demand for the archive (‘need’) with the realistic chances of successfully digitizing it (‘feasibility’), should act as a good set of guidelines.

Tamara Swora, in Selecting Library and Archive Collections for Digital Reformatting (Mountain View, CA, 1996), argued that the primary focus for any assessment procedure should be that of increasing access to the collection. This was reinforced by the conclusions reached by the Focus Group attached to the Library and Information Commission, which met in October 1997 to look at selection criteria for digitization(2). They decided that:

‘Access’, however, is not always the over-riding priority. Sometimes you may be considering digitizing for preservation or conservation. More often than not the project will be a combination of all three: access, preservation, and conservation. Needless to say, this will directly influence such things as the methods of digital capture you employ.

Specifically, assessment of an archive should look at the need to digitize and the feasibility of the project (in terms of technical ability, cost, IPR, etc.)(3). The former will be heavily influenced by the University’s priorities for digitization, which in turn may not always align with national priorities. Both, however, will have to take into account the overall cost of digitization. When assessing an archive, therefore, attempt to address the following questions and issues:

Feasibility

  1. Technical requirements needed to complete the project effectively:
    i) is there sufficient hardware to digitize?
    ii) is there sufficient software to digitize?
    iii) are there adequate storage and preservation facilities in place?
    iv) is there sufficient software and hardware to provide access to the documents, and can the documents be delivered at reasonable speed?

This will be directly influenced by the physical nature of the documents. At a basic level are you dealing with simple text-capture, images, or time-based media? Are you dealing with surrogates? On a more advanced level you will need to consider whether the collection will need to be supplemented to fill in notable gaps or omissions. Furthermore are the documents of such value, or in such a delicate state that they will need special preparation before digitizing?

Need

Understandably, when looking at ‘feasibility’ appropriate technical expertise should be brought in, with particular reference to facilities already in place. Furthermore, in terms of ‘needs’ there should be direct consultation with academics, curators, subject-specific librarians, and experts in user requirements (e.g. reader services). However, to quote the findings of the TASI/eLib/NFF Digital Image Library Discussion Group:

‘the group seemed to feel that there were no perfect answers, and that future generations might regard our choices as flawed, but there was no way round this.’

How to Proceed

Before beginning individual assessments of collections you should clarify the current level of digitization activity in your institution. You will need to discover what digital projects already exist, or have recently been completed, and what the potential for digitization is. Once you have done this you should proceed to the interview stage.

There are three types of interview you will need to conduct. The first is at the institutional level, i.e. with the people or committees responsible for the development of infrastructures (e.g. networks, institutional policies, etc.). This will help you clarify where proposed digital projects will fit in with the overall structure, and make you aware of any institutional policies which may have direct effects on your assessments.

Second, you should interview the staff of current or recently completed digitization projects. This will allow you to get feedback on problems and solutions already encountered, the scope of present commitments to maintaining existing digital collections, work-flows and facilities already in place, and overall evaluations of the strengths and weaknesses of the various approaches adopted.

Third, you will need to assess potential collections for digitization. Here you will need to get an overall feel for the size of the archive, the potential problems it poses, and the cost of digitizing it (or parts thereof), so that you will be able to assess whether the project should go ahead and, if so, to prioritize it in relation to other projects.

Stuart Lee

January 1999


(1) Atkinson, R. ‘Selection for Preservation: A Materialistic Approach’ LRTS 30 (1986), pp. 34-62.

(2) The report of the focus group’s discussions points to many interesting problems and conclusions. ‘The first concern raised by the Focus Group was that in many local studies and special collections automated cataloguing will have to be a priority before digitization, and that for archives automating catalogues and indexes is definitely a first priority. The Group also found it difficult to discuss criteria for selection without continually coming back to the issues of indexing standards and metadata standards. Criteria for selection of content were identified broadly as issues of access and of preservation, but it was not felt that either broad category should be a priority, that both were valid reasons for selection. Photographs and images were singled out as the area where digitization is most effective and for which there is most public demand. There was support for the idea of producing what the public want, and what might attract funding and be marketable, possibly as a way of subsidising the cataloguing elements of projects.’ (Appendix D, Virtually New: Creating the Digital Collection: A Review of Digitisation Projects in Local Authority Libraries and Archives, 1997, a report to the Library and Information Commission prepared by Information North, http://www.lic.gov.uk/publications/virtually/index.html).

(3) Columbia University categorise their selection criteria into ‘Collection Development’ and ‘Handling and Use’, but broadly speaking these equate to Need and Feasibility.

(4) Conway, P. ‘Yale University Library’s Project Open Book’ D-Lib Magazine (February, 1996; http://www.dlib.org/dlib/february96/yale/02conway.html).