The medium on which the analogue material is presented to the digitizer needs to be considered, focussing on the ‘physical’ (as opposed to ‘content’) attributes of the material. The analysis of this material has to be done by the digitizer in collaboration with the curators of the material and experts in conservation methods. For example, the most common physical attributes that need to be accounted for include:
Physical constituency: Paper (matt and gloss), Vellum, Papyri, Microform and other Transparencies (e.g. 35mm slides), Glass, Three-Dimensional Objects (e.g. artefacts such as pottery, statues, book-bindings), Glass plates, Vinyl Records, Audio Cassettes, Audio CDs, Audio Tape Spools, Film, Video (NTSC/PAL/SECAM), etc.
Physical dimensions: With non-time-based media the actual dimensions of the object are extremely important; for example, it is difficult to digitize large maps or posters using conventional scanning equipment, and this may require creating a surrogate (e.g. a photograph) and scanning from that. With time-based media you need to consider the length of the clip, the frame size, and the frames-per-second rate.
Physical robustness: Can the document be disbound, for example? Or is it so valuable or delicate that it needs to be digitized under certain conditions? For example, the Refugee Studies Programme Project at Oxford (akin to Yale’s Open Book Project) was in a position whereby it could disbind all its material and thus greatly increase the digitizers’ throughput; whereas the ILEJ project at Oxford could not disbind any of its material (in that case 18th- and 19th-century journals), and its digitization process had to account for curvature of the page resulting from tight binding. At the other end of the scale are the requirements the Celtic and Medieval Manuscripts Project (Oxford) had to work under, demanding the design and construction (from scratch) of special cradles to hold the manuscripts, and the buying-in of new lighting equipment.
In addition to the physical attributes the ‘content’ attributes of the document need to be analysed (to feed directly into ‘Benchmarking’ below). Expanding Kenney and Chapman’s original list to include time-based media, content attributes fall into the following categories:
Text/line art: monochrome documents, i.e. with no tonal variation. Examples might be texts (such as the ILEJ journals), woodcuts, and black-and-white microforms. N.B. with regard to text, all references in this article are to scanned images of the text (which may subsequently be OCRed), not keyboarded text.
Continuous tone: varying gradation in tones, either monochrome (i.e. grey gradations between black and white) or colour. This would cover photographs, works of art, manuscripts, etc.
Half-tones: spaced pattern of dots (either monochrome or colour). Used in line engravings and etchings.
Mixed: contains two or more of the above.
Artefacts: three-dimensional objects. Texture, shadows, etc., all need to be taken into account(1).
Audio: spoken word, music, sound effects (or a combination of all three). Either mono or stereo.
Film: in most cases continuous tone (black and white or colour), but occasionally line art for animations. Can include an audio soundtrack.
However, there is considerable unease within the library sector at the prospect of relying on a digital copy as a substitute for other preservation formats(3); a particular problem being the long-term institutional commitment to the maintenance of digital files. It is very rare to find any institution that has a fully comprehensive policy in place to guarantee the active migration and refreshment of digital objects to ensure longevity of access. Many of the variables involved in such a process are as yet unknown, and where they are known it is clear such maintenance involves considerable cost for the host archive(4).
Where digitization can help in ‘preservation’ is, of course, in the deflection of demand to view the original document. Most curators of rare or valuable material are acutely aware of the damage that repeated handling can do to the original document, and are constantly seeking to limit this access. In some cases, where the document is in a particularly bad state of repair (or is classed as a security risk, e.g. due to loose leaves), the material may have to be withdrawn from general use, and might only be made available to satisfy the most pressing of research needs, or in some cases not at all. The availability of a high-resolution digital surrogate will only help the curator if it acts as another avenue for the researcher to use before handling the original. As P. Noerr (1998) notes:
Physical handling is one of the most destructive things that can happen to a fragile object. One of the best ways to preserve it is to limit physical access to it. This is a very strong case for creating a digital library.
Yet this should not detract from the unavoidable truth that any copy, be it digital or microform, can only serve as a surrogate, not as a replacement for the original. Even with microfilm no surrogate has ever been regarded as a perpetual preservation copy of the original item and this rule should equally be applied to digitization. Above all, there should be no detraction from the continued efforts to preserve the original.
However, when it comes to scanning material for which no surrogate exists, particularly material that is in need of preservation, things become more difficult. It has already been observed that, at present, microfilm provides the best certainty of preservation for graphical and textual material (if stored under ideal conditions). It is clear, for example, that a microfilm held under standard preservation conditions can last centuries, whereas the longevity of a digital file, bearing in mind the costs of maintaining its currency, the problems of migration, etc., is uncertain at best. Alan Howell, in his survey of newspaper digitization projects (1997, http://www.thames.rlg.org/preserv/diginews/diginews2.html#film-scanning) noted:
The most effective means to preserve the intellectual contents of newspapers is preservation microfilming. Ideally, reformatting should be undertaken when newspapers are acquired. It must be done before they become too brittle to handle - somewhere between 25 and 100 years depending on their initial strength, use, and the environment in which they are stored. If preservation reformatting is on 35 mm polyester-based silver-halide microfilm, and the film is processed to recognised international standards for chemical stability, housed in inert containers, and then stored under controlled environmental conditions, the microfilm is expected to last several hundred years. The microfilm can serve as a preservation master which can be scanned to provide access copies in digital image form.

He concluded from his survey of existing projects that microfilming is ‘perhaps the most important’ preservation reformatting strategy, and this importance could increase as optics improve and as standards for microfilming with a view to later digitization are refined. As Yale’s Project Open Book ‘Organizational Phase’ noted:
The first working hypothesis--that microfilm is satisfactory as a long-term medium for preserving content--builds on the features of microfilm as a long-lasting, inexpensive technology that is well understood in libraries. However, the linear nature of microfilm does not provide easy access. It is cumbersome to browse and read, it requires special equipment at a single location, it does not facilitate use of an item's internal structure, and it does not produce high quality paper copies. (http://www.clir.org/cpa/reports/openbook/openbook.html)

In the case of black and white/greyscale material, microfilming seems relatively clear-cut. Colour images, however, present an extra dimension to the problem, as colour images held on film degrade much more quickly and new copies have to be taken every couple of decades. Yet there have been considerable technical improvements in colour microfilms recently which will serve to increase their longevity. More importantly, in common with all microfilms, they do not have to be refreshed for at least twenty years (i.e. their maintenance is relatively low), and it is just possible that they could miss one cycle of refreshment without causing too much concern. Digital images, however, probably need to be looked at with a view to refreshing/migration every three to five years, and if one of these cycles is missed it could be disastrous (sometimes termed the ‘fast fires’ of digital obsolescence, as opposed to the ‘slow fires’ seen in acid-based printing methods).
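The maintenance burden being compared here can be made concrete by counting refresh or migration cycles over a planning horizon; a minimal sketch, using the intervals quoted above (the 100-year horizon is an assumption for illustration):

```python
def refresh_cycles(horizon_years, interval_years):
    """How many refresh/migration cycles a medium needs over a planning horizon."""
    return horizon_years // interval_years

# Over an assumed 100-year horizon:
print(refresh_cycles(100, 20))  # colour microfilm refreshed every ~20 years: 5 cycles
print(refresh_cycles(100, 4))   # digital files migrated every 3-5 years: ~25 cycles
```

Each of those twenty-five digital cycles carries a cost and a risk of being missed, which is the nub of the ‘fast fires’ argument.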
However, in the digitization arena there are two approaches to the use of film. As Columbia University states in its ‘Technical Recommendations for Digital Imaging Projects’:
Scanning can be done directly from the item or a film intermediary can be made and scanned. Film intermediaries include most commonly 35 mm slides, 4 x 5 transparencies, microfilm, and single-frame microfiche. If properly made and stored, the film intermediary can act as a preservation copy of the item.
The quality of the intermediary will have a direct impact on the quality of the digital image. If the intermediary is poorly made, scratched, faded, or out of focus, the scanned image will be inferior. If the intermediary is of high quality, the scanned image will normally also be high quality. It is best to use camera negatives whenever possible. Every time a slide or other type of film is duplicated, it loses detail and resolution, and the resulting scan is poorer quality.
(http://www.columbia.edu/acis/dl/imagespec2.html)

The question that needs to be asked (for new projects with a preservation element built in) is whether microfilming should be performed first and digital images taken from the film, or whether digitization should be the first action, the digital files then being used to output to microfilm; hence the term ‘Computer Output Microfilm’ or ‘COM’. (In the case of retrospective conversion of old microfilm stock to digital format, this decision is not applicable unless the masters are of sufficiently low standard to warrant rephotographing.)
The most comprehensive review of the potential of COM was performed by the Cornell Digital Microfilm Conversion Project (see Kenney, 1997, http://www.thames.rlg.org/preserv/diginews/diginews2.html#com). This was a sister project to Yale’s Open Book initiative, as both were involved in creating 600dpi bi-tonal images of 19th century brittle books. The COM project investigated the quality and cost effectiveness of the scan first, and then output to microfilm approach, as opposed to filming first and then scanning. They concluded that:
Chapman, Conway, and Kenney (1999) have conducted the most recent study of these issues. Comparing the work done at Cornell (COM) and Yale (film first then scan) the study focuses on ‘The Future of the Hybrid Approach for the Preservation of Brittle Books’ (hybrid meaning digital and microfilm). It notes that ‘despite predictions that microfilm could be replaced by digital imaging, many have come to appreciate that digitization may increase access to materials but it does not guarantee their continued preservation’. The study rests on the assumption that ‘until digital preservation capabilities can be broadly implemented and shown to be cost-effective, microfilm remains the primary reformatting strategy’ with reformatting being the only viable strategy for the preservation of brittle paper, and that ‘although digital imaging can be used to enhance access, preservation goals will not be considered met until a microfilm copy or computer output microfilm recording of digital image files has been produced that satisfies national standards for quality and permanence’.
In short, digitizing first and then outputting to microfilm can produce significantly better quality, as noted above. From a purely systematic outlook it involves only one step in the digitization chain from the original to produce access-level images. In addition, microfilming first (with the knowledge that the microfilm is to be digitized) can be a troublesome process, as experienced by the Bodleian Broadside Ballads Project, which noted a considerable drop in throughput by the microfilming unit when attempting to meet the needs of future digitization. However, the equipment needed to produce COM in-house is costly and not readily available at Oxford (though out-sourcing of COM is clearly an option to be considered). Therefore additional resources would have to be found to make this a viable option for any digitization unit.
For a comprehensive review of agencies performing Microfilm Scanning (including the Zuma Corporation used by the Bodleian Photographic Studio), and COM vendors, see ‘Technical Review’ RLG Diginews 1.2 (August, 1997 - http://www.rlg.org/preserv/diginews/diginews2.html#hardware&software).
This should not imply, however, that digitization should be regarded as ephemeral, or short-term. Chapman and Kenney’s observation that ‘digital conversion efforts will be economically viable only if they focus on selecting and creating electronic resources for long-term use’ still applies (http://www.dlib.org/dlib/october96/cornell/10chapman.html).
Digital access can also enhance the potential for analysis: that is to say, a digital object can be edited, spliced, filtered, etc. without any damage to the original master, and researchers can subject the file to all manner of analyses (e.g. image analysis software) without causing any damage. Increased access is also, unfortunately, a double-edged sword. The widespread availability of digital surrogates (e.g. via the web) can ultimately lead to increased demand for access to the original (as borne out by previous experiences with microfilms). Therefore it is essential that high-quality surrogates be available at the institution housing the original document to deflect this demand (though it must be recognised that it is almost certainly impossible to reduce demand for access to the original to zero, even with the highest-quality surrogates available).
Having accepted the advantages digitization presents for facilitating access, and the disadvantages digitization has in acting as a substitute for standard preservation methods, it is important not to be misled into digitizing only to a standard which meets current user needs. It is clear from previous projects that it is most cost-effective to digitize at a master level quality to allow for multiple output (e.g. print, microfilm, access images, thumbnails, etc.). This, however, needs to be balanced with the constraints of time and money the project or service is working under.
Assessment and Selection of Source Material (see previous section)
Application of Metadata
In the abstract this is satisfactory as it covers all the stages one must go through to successfully complete the digitization part of a project. However, in terms of actual practices in the working environment, this is clearly too generalized to be of much use. At the University of Oxford it has been recognised that the priority for the institution is to establish an on-demand (reactive) digitization service that replicates the functionality of a reprographics unit. In addition, bearing in mind the number of unique/rare collections the University currently holds, it should also work towards a more proactive digitization unit that could target collections on a project scale (i.e. not simply reacting to reader requests) and could also offer a cost-effective service to projects throughout the University, operating on a semi-commercial basis. The digitization chain for both of these would be much more elaborate, as can be seen from the two suggested workflows drawn up by this study (see http://www.bodley.ox.ac.uk/scoping/matrix.htm).
Understandably, when looking at a collection under the ‘Assessment and Selection’ stage some digitization assessment will take place. However, it should be noted that there may be a considerable time delay between the assessment/selection stage and the actual digitization of the collection, and changes in technology may bring into question some decisions made earlier on. In the case of this study, for example, all the collections analysed will have been studied during the first half of 1999; however, it is highly unlikely that funding will be available before the end of the year at the very least. In those few months there may have been significant advancements in the hardware and software available for capturing, and a second digitization assessment must be performed in the light of this before full digitization can go ahead.
It is at this stage that one should confirm previous decisions made as to whether the digital surrogates are meant to act as a preservation copy, an access copy, a print copy, or all three, and whether the digitization is part of a hybrid solution in which other surrogates are to be used (e.g. microfilms) as part of the conservation process.
More formally, ‘digitization assessment’ should consider:
To give a simple example, when dealing with a graphic original there might be four scanning options to choose from:
For time-based audio, things are considerably more complex. Digitized audio cannot be categorised using the techniques above as such, but instead has to be differentiated by such measures as the sampling standard (e.g. either mono or stereo) and the sampling rate (e.g. 11 kHz, 22 kHz, 44 kHz). Film involves the issues of graphical scanning (e.g. reproduction of the picture in the frame), of audio (if a soundtrack is included), and also of fluidity of motion, focussing on the frames-per-second (fps) rate.
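To illustrate how these audio parameters drive storage requirements, the following sketch estimates uncompressed (PCM) file sizes; the one-minute durations and the assumption of uncompressed storage are illustrative only:

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Uncompressed PCM audio size: rate x (depth/8) x channels x duration."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds

# One minute of speech sampled at 11 kHz, 8-bit mono:
print(pcm_size_bytes(11_000, 8, 1, 60))   # 660,000 bytes (~0.6 MB)
# One minute at CD quality, 44.1 kHz, 16-bit stereo:
print(pcm_size_bytes(44_100, 16, 2, 60))  # 10,584,000 bytes (~10 MB)
```

The gap between the two figures shows why the sampling standard and rate must be benchmarked against the content, just as resolution is for images.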
It is risky, however, to proceed with any step before fully considering the relationship between conversion -- where quality, throughput, and cost are primary considerations -- and access, where processibility, speed, and usability are desirable. Informed project management recognizes the interrelationships among each of the various processes, and appreciates that decisions made at the beginning affect all subsequent steps. An excessive concern with user needs, current technological capabilities, image quality, or project costs alone may compromise the ultimate utility of digital collections. At the outset, therefore, those involved in planning a conversion project should ask, "How good do the digital images need to be to meet the full range of purposes they are intended to serve?" (http://www.dlib.org/dlib/october96/cornell/10chapman.html)

The most obvious problem with benchmarking is ascertaining what level of capture is satisfactory, i.e. for present and future needs. Kenney and Chapman (June, 1996, p.7) advocate a ‘full informational capture’ policy, i.e. ‘ensuring that all significant information contained in the source document is fully represented’. Elsewhere they elaborate on this by stating that:
The ‘full informational capture’ approach to digital conversion is designed to ensure high quality and functionality while minimizing costs. The objective is not to scan at the highest resolution and bit depth possible, but to match the conversion process to the informational content of the original -- no more, no less. At some point, for instance, continuing to increase resolution will not result in any appreciable gain in image quality, only a larger file size(6).
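The file-size half of this trade-off is easy to make concrete. A minimal sketch (uncompressed raster sizes only; real file formats add headers and may compress, and the page size is an assumption):

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed raster size for a scanned page."""
    pixels = int(width_in * dpi) * int(height_in * dpi)
    return pixels * bits_per_pixel // 8

# A roughly A4-sized 8 x 11.5 inch page:
print(raw_image_bytes(8, 11.5, 300, 1))   # bi-tonal at 300dpi: 1,035,000 bytes (~1 MB)
print(raw_image_bytes(8, 11.5, 600, 1))   # doubling the dpi quadruples the size (~4 MB)
print(raw_image_bytes(8, 11.5, 300, 24))  # 24-bit colour at 300dpi: ~25 MB
```

Because size grows with the square of the resolution, scanning beyond the level of significant detail inflates storage without improving informational capture.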
Yet what is the smallest level of significant detail? For text it might be the smallest letter or symbol that the reader needs to be able to see. In printed books this is often to be found in the footnotes, but in maps and line drawings the object might be an individual house or cartographic symbol. In manuscripts it could come down to distinguishing between textures (e.g. hair and flesh) of the vellum. In photographs or pictures it could be a number of things depending upon the user. To paraphrase, ‘the significant detail is in the eye of the beholder’. James Reilly, Director of the Image Permanence Institute, describes a strategy for scanning photographs, applicable more generally, of ‘knowing and loving your documents’. He advocates choosing a representative sample of photographs and, in consultation with those with curatorial responsibility, identifying key features that are critical to the documents' meaning. It is assumed that those with curatorial responsibility will be aware of two important features:
Chapman and Kenney (October, 1996) list the selected attributes of source documents which can help in assessing significance as:
size of details (in mm)
text characteristics (holograph, printed)
medium and support (e.g., pencil on paper)
illustrations (content and process used)
tones, including colour
dynamic range, density, and contrast
detail and edge reproduction
Furthermore, in their Digital Imaging for Libraries and Archives (June 1996, pp. 7-34) they provide a comprehensive system for checking most of the above categories based on a Quality Index system, and using target examples such as the RIT Alphanumeric Test Object, the IEEE Std 167A.1-1995 Facsimile Test Chart, the AIIM Scanner Test Chart 2, the Kodak Q13 Greyscale Control Bar, and the Kodak Q60 Colour Target, noting particular success with the RIT and AIIM tests for resolution(8).
Similarly, NARA’s EAP guidelines (http://www.nara.gov/nara/vision/eap/eapspec.html) provide extensive guidance on benchmarking and calibration, assessing:
Choose a sample of hard-copy originals, along with print negative counterparts.
Digitize portions of the original volume at 600 dpi (title page, table of contents, selected illustrations, indexes) using a calibrated Xerox WG-40 flat-bed scanner with as many of the enhancement features invoked as possible and practical, and following the operational guidelines developed by Cornell University.
Produce laser prints at 600 dpi on the Xerox DocuTech.
Digitize the identical pages from the microfilm print negative version.
Produce laser prints at 600 dpi on the Xerox DocuTech.
Compare matching prints under an eye-loupe (10X magnification), paying particular attention to letter fill-in or drop-out, highlights and shadows in line drawings, etc.
Choose one combination of filter settings for the microfilm scanner that most closely achieves the appearance of the digitized original.
Note the characteristics of the film source, once "maximum" quality has been obtained.
Scan a volume with similar basic characteristics without benchmarking from the original.
Compare prints of the "benchmarked" volume with the unbenchmarked one; adjust settings accordingly; note sources of discrepancies for future reference.

The above suggestions for benchmarking are entirely valid and form an extremely useful base. However, the most important rule is the rule of eye/experience. Regardless of what the more scientific approaches to benchmarking indicate, one has to produce a pilot scan and judge the results according to what one can see, what one can print, etc. Furthermore, it is widely recognised that no benchmarking can be truly accurate, as nearly every collection encountered will have considerable variation within it.
Abstracting from all of this, however, the over-riding message is that benchmarking must be viewed as central to the digitization process. The original source document’s dimensions, condition, and attributes, and above all the finest level of detail you need to capture (bearing in mind user requirements), must all be considered. Having established these you will need to perform the benchmarking tests themselves. Current standards of digitization and the nature of the source documents themselves will have direct effects on how successful you are. For a real-life example of this process in action readers should consult D’Amato and Klopfenstein (1996), especially section 6 on the benchmarking of the illustrations (http://www.nmnh.si.edu/cris/techrpts/imagopts/section6.html#RTFToC31). In this study the ‘characteristics’ (the ‘full informational capture’ approach detailed above) of the illustrations were noted, benchmarking was performed at various levels of detail, and the curators were then consulted as to what would be the best standard.
The discussion above, of course, relates to the digitization of manuscripts, texts, graphics, etc., but not to time-based media. For the latter (i.e. film and audio), extra benchmarking standards will need to be brought in, looking at pitch, tone, volume, and smoothness of motion and transmission, and bringing into play extra measures such as fps (frames per second) and kHz.
However, before looking at how these additional costs may increase the funding required, it is worthwhile getting some idea of the unit costs related to digitizing material. There is no simple check-list that will provide accurate and comprehensive figures for digitizing a single item, and one can only look at figures presented by other projects. In all cases the figures presented are simply guidelines and should not be regarded as formulaic, as there are numerous variables that may come into play related to the condition of the original source document that could lead to marked increases in the unit costs. For example, in the feasibility study for the JIDI project, the Higher Education Digitization Service drew up the following matrix:
All prices are exclusive of VAT, in pounds sterling, and are for outputting uncompressed TIFFs.
The majority of the projects (5 out of 7) were delivered within the costs identified in the JIDI Feasibility Study report, which is all the more impressive given the time lapse (over a year) between the report and the actual projects’ commencement. Of the remaining two contracts, HEDS priced one well over the amount in the JIDI report because of the nature of the originals and the low volume (costs had been estimated on a figure of 5,000 items, though in reality most projects brought in only 500-1,500 items, which pushes up unit prices), whilst the other contract was costed at £2.20 per item when the JIDI report allowed for an upper limit of £2.00.
A different example, but equally instructive, can be found in the case study of costs performed by the BUILDER project for two of their collections (the University of Birmingham Exam Papers, and the Midland Collection). The digitization of each was analysed using various methods and scanners, with the costs equating to:
The two collections were: the University of Birmingham Exam Papers (1587 exam papers, 4539 pages, all A4, loose-leaf, typed/printed text, captured as bi-tonals at 300dpi and converted to PDF), and Midland History (1971-1997) (3987 images, 6" x 9", strippable journal, captured as 600dpi bi-tonal TIFFs, plus 50 photographs as 600dpi greyscale 8-bit TIFFs).

Out-sourcing to HEDS, delivered back on CD:
Exam Papers - Unit cost: 11p per page. Overall cost (including set-up and production): £775.00 (i.e. 17p per page).

Using the Minolta PS3000P Scanner (scanner already available):
Exam Papers - Hourly production rate: 90 pages. Overall cost (including weighted annual salary but not hardware): £724.06 (i.e. 16p per page).
Midland History - Hourly production rate: 80 pages. Overall cost (including weighted annual salary but not hardware): £715.50 (i.e. 18p per page). Overall cost plus Adobe Capture software (for the photographs): £1,203.13 (30p per page).

Flat-bed scanner with sheet-feeder (Fujitsu ScanPartner 600C for the Exam Papers, Fujitsu M3093DE/DG for Midland History):
Exam Papers - Hourly production rate: 180 pages. Overall cost (including weighted annual salary): £362.03 (i.e. 8p per page). Overall cost if hiring scanner: £2,007.17 (44p per page). Overall cost if buying scanner: £2,124.53 (47p per page).
Midland History - Hourly production rate: 180 pages. Overall cost (including weighted annual salary): £362.03 (i.e. 8p per page). Overall cost if hiring scanner, plus Adobe Capture software: £4,449.66 (£1.12 per page). Overall cost if buying scanner, plus Adobe Capture software: £5,092.75 (£1.28 per page).

Optical Character Recognition:
Exam Papers - Not calculated.
Midland History - OCR processing time: 133 hours (@ 2 mins per page). Proof-reading time: 665 hours (@ 10 mins per page). Total cost: £11,456.64 (or £2.94 per page).
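The OCR figures above can be cross-checked with a simple staff-cost model; a sketch in which the hourly rate is a hypothetical back-calculation, not a figure given by BUILDER:

```python
def ocr_cost(pages, ocr_mins_per_page, proof_mins_per_page, hourly_rate):
    """Total staff cost of OCR plus proof-reading for a run of pages."""
    hours = pages * (ocr_mins_per_page + proof_mins_per_page) / 60
    return hours * hourly_rate

# ~3,990 pages at 2 min OCR + 10 min proof-reading per page comes to 798 staff
# hours; at a hypothetical £14.36/hour this lands close to BUILDER's £11,456.64:
total = ocr_cost(3990, 2, 10, 14.36)
print(round(total, 2))
```

The model makes the key point plain: proof-reading, at five times the per-page minutes of the OCR pass itself, dominates the total cost.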
BUILDER’s study comes down heavily on the side of out-sourcing material for digitization. Experience there shows that the costs of out-sourcing material via HEDS are considerably less than attempting to perform the task in-house, even taking into account postage, preparation, and quality assurance. In addition, via the external-vendor approach no staff time was involved in the scanning, and negotiations were smooth. On the other hand, although it can be seen that costs increase considerably with in-house scanning (note the Minolta scanner was made available through a previous project), this must be balanced against the benefits to the host institution of the experience gained, the proximity of digitization to the material, internal management of source documents and files, and easier quality control. One additional point worth noting relates to buying equipment outright as opposed to hiring it in. BUILDER discovered that the increase in costs brought about by purchasing a flat-bed scanner as opposed to hiring one was negligible, thus making hiring questionable for projects that run for 2 years or more (the conditions of hiring the scanners demanded a 2-year lease).
Costs experienced by Oxford projects are broadly in line with the figures above, but illustrate the importance of taking into account the hidden costs of a digitization project. The Bodleian Broadside Ballads project, combining in-house microfilming with out-sourced scanning of the surrogate, noted a cost of around 61-65p per image. The Celtic and Medieval Manuscripts project, digitizing at a high level using either a Kontron or Dicomed camera directly from source, noted a cost per image ranging from £2.50 to £4.50 based on weighted annual salaries and throughput. Yet, like the other projects, it emphasised the need to stress the additional costs. It was noted, for example, with the Celtic and Medieval MSS that updates to hardware should be at a rate of a new PC every 3 years (c. £1,000) and a new camera every 5 years (c. £20,000). ILEJ, in its final report, noted that each page cost c. 18p to scan, but on top of this each image cost 29p to index and 25p to process, bringing the true cost up to 75p per image. Furthermore, the RSP noted that out-sourcing to Xerox (as recommended by HEDS) resulted in a complete costing breakdown of:
Unit cost: 12p per page
OCR: 16p per page (but does not include cost of proof-reading)
Medium costs: £40.00 per CD
The Wilfred Owen project (see Lee and Groves, 1999) produced a reasonably detailed costing sheet looking at digitization costs for graphics (ranging from 50p to £10.00 per image), keying-in (£1.50 per 1,000 characters), and audio/video capture (£5.00 per 3 minutes). In total the full project cost £62,000, delivering around 2,000 digital objects (pages of manuscripts and from a journal, photographs, video/audio clips, still shots), averaging £31 per digital object. Yet, outside of staffing, consultancy, hardware/software, and copyright, only around £3,000 was actually spent on digitizing (i.e. £1.50 per digital object).
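A quick sketch of the arithmetic behind those Wilfred Owen unit costs (figures as reported above) makes the scale of the hidden costs explicit:

```python
def unit_costs(total_project_cost, capture_only_cost, n_objects):
    """Full project cost per object versus bare digitization cost per object."""
    return total_project_cost / n_objects, capture_only_cost / n_objects

full, bare = unit_costs(62_000, 3_000, 2_000)
print(f"£{full:.2f} per object overall; £{bare:.2f} for capture alone")
```

Capture itself accounts for under 5% of the true per-object cost; the rest is staffing, consultancy, hardware/software, and copyright.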
Kenney and Rieger (1998) include a much more developed costing sheet than the one produced in the Wilfred Owen Report. As well as indicating hourly production rates for in-house scanning they include a full range of costs for scanning images (various sizes and bindings) at differing resolutions (ranging from $0.25 to $12.00 per image). Once again, they stress the hidden costs of the digital project which are not always apparent when looking at the costs per page level.
In terms of in-house throughput this varied considerably, depending upon the methods used. At its peak, for example, the Mekel MX500 XL-G used in the ILEJ project should achieve 1,200 images per hour (i.e. two microfilms), but in reality this dropped to as low as 300 a day. The Celtic and Medieval Manuscripts project's high-level cameras peaked at a maximum of 200 scans a day (small pages, easy handling), but more realistically settled at between 40 and 70 images a day for the larger files. These figures were for consecutive pages of a given volume, or for comparatively easy single-sheet material; throughput rates would be much slower for odd pages from many volumes, just as the initial set-up times can be substantial. Using similar equipment, the JIDI John Johnson collection is only managing to achieve around 40 scans a day (c. 800 a month). An example of a project study external to Oxford is the NDLP Digital Conversion Team, which noted:
A current plan to scan 60,000 pages from early congressional journals in bound volumes calls for three people to prepare materials to keep five scanners (with two people per scanner) busy for twelve weeks. Another three full-time people are expected to take twenty weeks to review scanned page-images and derived text versions marked up with SGML after delivery by the contractor. In some cases, preparation and quality review are performed by members of the NDLP Digital Conversion Team. In others, the NDLP has supported the hiring of staff to be based within the divisions responsible for different types of material (such as Music, Prints & Photographs, or Geography & Maps). (Arms, April 1996).
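The throughput implied by the NDLP plan quoted above can be estimated with simple arithmetic (the five-day working week below is an assumption of ours, not a figure from the source):

```python
# Back-of-envelope throughput implied by the NDLP plan quoted above.
# The five-day working week is an assumption, not stated in the source.
pages = 60_000
scanners = 5
weeks = 12
working_days_per_week = 5  # assumption

pages_per_scanner_per_day = pages / (scanners * weeks * working_days_per_week)
print(pages_per_scanner_per_day)  # 200.0 pages per scanner per working day
```

At roughly 200 pages per scanner per day, the plan sits at the upper end of the in-house rates reported above, which helps explain the heavy staffing (two people per scanner, plus preparation and quality-review staff).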
Related to this is the concern expressed by many librarians and curators over the need to process high-demand material quickly. Items sent for digitization will in many cases have to be returned to the library or department as quickly as possible to satisfy reader requests.
Bearing in mind the problems associated with drawing up accurate costs for a project, the policy adopted by this scoping study, in full consultation with HEDS, has been as follows:
High resolution (also known as archiving resolution): used for the highest-quality digitization (e.g. Oxford’s Celtic and Medieval Manuscripts project), for an archival file format with ‘full informational capture’, and for outputting to high-quality print and film surrogates (in the case of graphics/text documents). The distinction between preservation copies and preservation-quality images should be noted (see http://memory.loc.gov/ammem/pictel/index.html).
Medium resolution: used for screen display and low-quality printing.
Low resolution: often used for thumbnails.
Depending on the project, all of the above could in theory be treated as access-level images, though the term usually refers to medium-resolution images for screen display and/or low-resolution thumbnails allowing quicker browsing of the collection. The NARA EAP Project has detailed guidelines and specifications for digitizing, including a matrix which summarizes the specifications derived for the project (http://www.nara.gov/nara/vision/eap/eapspec.html).
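The practical consequence of these resolution levels is file size, which grows with the square of the resolution. The sketch below is purely illustrative (the page dimensions, bit depth, and dpi values are our assumptions, not figures from this study):

```python
# Illustrative only: uncompressed size of a scanned page.
# bytes = (width_in * dpi) * (height_in * dpi) * bits_per_pixel / 8
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel // 8

# A roughly A4-sized 8" x 11" page in 24-bit colour, at three assumed levels:
for label, dpi in [("high", 600), ("medium", 300), ("low", 72)]:
    mb = uncompressed_bytes(8, 11, dpi, 24) / 2**20
    print(f"{label:>6} ({dpi}dpi): {mb:.1f} MB")
```

Halving the resolution quarters the file size, which is why projects typically keep one high-resolution archival master and derive smaller access copies from it.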
With graphics, one of the most important factors is resolution. Different resolutions are required for different purposes, with higher resolution generally equating to an increase in unit costs. Furthermore, the digitization industry at present is awash with projects digitizing at different resolution levels, and there are no archival or access standards to which everyone adheres. Resolution usually refers to the number of horizontal and vertical pixels, e.g. a 640 x 480 image has 640 pixels along the horizontal axis and 480 along the vertical. Dots per inch (DPI) refers both to the number of dots/pixels captured per inch from the source document and to the number of pixels per inch displayed on computer monitors or produced by printers. As with most studies of this nature, however, when referring to DPI this report simply means the dots per inch used in the scanning or capturing process. It is also worth considering the problem of ‘effective dpi’. Take, for example, a photograph of a map. The digitizer scans the photograph at 600 dpi (e.g. a 5” x 4” print, with the image of the map filling the whole photograph exactly). However, if the original map was in fact 25 inches across (i.e. five times the width of the photograph), the ‘effective dpi’ would be 600/5, i.e. 120 dpi.
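The 'effective dpi' calculation can be expressed as a small function (a sketch; the function name and parameter names are our own, not from any cited guideline):

```python
def effective_dpi(scan_dpi, original_width_in, surrogate_width_in):
    """Resolution of a scan relative to the *original* object.

    Scanning a surrogate (e.g. a 5-inch-wide print of a 25-inch-wide
    map) at scan_dpi yields fewer real dots per inch of the original:
    scan_dpi divided by the reduction factor of the surrogate.
    """
    reduction_factor = original_width_in / surrogate_width_in
    return scan_dpi / reduction_factor

# The map example from the text: a 5" x 4" print of a 25"-wide map,
# scanned at 600 dpi.
print(effective_dpi(600, original_width_in=25, surrogate_width_in=5))  # 120.0
```

In other words, a nominally high-resolution scan of a small surrogate may still fall well short of the resolution required of the original.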
Franziska Frey notes that ‘a growing consensus within the preservation community is that a number of image files must be created for every photograph to meet a range of uses’ (1997). The article goes on to outline a set of standards for three example access files:
* The digital image is used only as a visual reference image in an electronic database.
* The digital image is used for reproduction.
* The digital image represents a "replacement" of the original in terms of spatial and tonal information content.
These should be looked on as simple ‘generic’ guidelines and cannot be viewed as an accurate forecast of the digitizing standards for all projects of a similar nature. As noted above, too many variables may come into play to differentiate between seemingly similar types of source documents. For example, the Library of Congress’s Manuscript Digitization Demonstration Project (http://memory.loc.gov/ammem/pictel/index.html) outlined the types of issue it considered when looking at its source documents:
In short, any attempt to define a set of digitizing standards is fraught with difficulties. Instead, this study has collected notes on the varying standards used by numerous Oxford, national, and international digitization projects. These have been presented in tabular form for ease of consultation and reflect the decisions made to date (i.e. by March 1999). [N.B. This table has not been made available in this report as it is meant for internal study only.]
These should not be used as definitive requirements for any other projects, but a few overall points can be made:
* If a high-quality film intermediary already exists, it is cheaper and safer to scan from film rather than from the original item. The quality of the intermediary will have a direct impact on the quality of the digital image. If, as with older film (i.e. captured prior to the British standards established in the 1970s), the intermediary is of poor quality, the scanned image will be inferior. It is recognised that it is best to use camera negatives whenever possible, as subsequent duplication leads to loss of detail and resolution:
‘In general, it is better to work from a negative than from a positive, not only because of generational loss but because the negative provides a smoother curve in the dynamic range, so that highlights and shadows are handled better’ (Ester, 1996).
However, within the libraries sector, Oxford is fortunate in the equipment it has for high-level scanning. At present it has:
The disparate locations of this equipment, however, highlight the need to address where digitizing equipment should be housed. Even with the most stringent security systems and safety precautions, some digitization would have to be done at the point of collection: when dealing with rare or unique items (some of which are priceless), one does not wish to move material around too much, particularly through public areas. As Arms (1996) notes, it is usual to ‘capture from original materials … on site under curatorial supervision’, something particularly relevant to the needs of Oxford given the intellectual and financial value of many of the holdings which might be digitized. The solution to this problem is two-fold:
The Library of Congress concurs with many of these findings. For the American Memory Project running under the NDLP (National Digital Library Program) it established a fifteen-point workflow practice for the Quality Review stage covering such areas as:
Aids and more specific guidelines to help in testing such qualities as contrast and noise are available, with accepted test cards and procedures. A good place to start looking for these is the Photographic and Imaging Manufacturer’s Association/IT10 Still Picture Imaging page (http://www.pima.net/it10a.htm), which has a growing list of standards. (See also Reilly and Frey, 1996.)
Arms, C. ‘Historical Collections for the National Digital Library: Lessons and Challenges at the Library of Congress’ Part I D-Lib Magazine (April, 1996, http://www.dlib.org/dlib/april96/loc/04c-arms.html); Part II in D-Lib Magazine (May, 1996, http://www.dlib.org/dlib/may96/loc/05c-arms.html). A useful preliminary review of the National Digital Library Program’s (NDLP) digitization of Americana at the Library of Congress.
Arnamagnæan Institutes in Copenhagen and Reykjavík (http://www.hum.ku.dk/ami/aminst.html; http://www.hum.ku.dk/ami/amproject.html). Aims to produce a catalogue with links to digital images of the complete collection. Access to these images will be tiered: low-quality (75 dpi) watermarked images available freely over the Web; higher quality (300 dpi) images available to subscribers; and high-quality images (600 dpi) available for sale.
Arts and Humanities Data Service (http://www.ahds.ac.uk/). The main web site of the AHDS with links to all the service providers. In particular the Managing Digital Collections section (http://ahds.ac.uk/manage/manintro.html) with its series of reports.
Ashmolean Museum (http://www.ashmol.ox.ac.uk/). Various projects have been going on at the Ashmolean Museum. These include a collaboration with the Bridgeman Art Library to build up an image library from transparencies, and a forthcoming project looking at the Allen Photography Archive (c. 1,500 black and white photographs).
Australian Co-Operative Digitization Project (http://www.nla.gov.au/ferg/). Collaborative scanning project of Australian newspapers from 1840-45. Produced (via out-sourcing) 400dpi bi-tonal TIFFs (CCITT Group 4 compression). Noted that only 15-20% of digital images could be produced from existing microfilm stock, so had to cost in new microfilming. Reviewed in Howell (1997).
Beazley Archive (http://www.beazley.ox.ac.uk). Four projects currently underway: Database of Athenian Pottery; Beazley’s Drawings; Cast Collection; and Ancient Gems and Finger-Rings. Working with a system to automatically watermark images.
Besser, H., and Trant, J. ‘Introduction to Imaging’ (1995 - http://www.gii.getty.edu/intro_imaging/0-Cover.html). A good overview including a very approachable description (with images) of the types of equipment used in scanning (http://www.gii.getty.edu/intro_imaging/11-Scan.html). It also includes an extremely useful Glossary of technical terms (http://www.gii.getty.edu/intro_imaging/Gloss.html).
Besser, H., and Yamashita, R. ‘The Cost of Digital Image Distribution’ (http://sunsite.berkeley.edu/Imaging/Databases/1998mellon). An extensive Mellon-funded report of the Museum Education Site License Project (MESL). Provides comprehensive figures on costings and processes.
Bodleian Broadside Ballads Project (http://www.bodley.ox.ac.uk/mh/ballads/). C. 30,000 images of Broadside Ballads held at the Bodleian Library, Oxford. Images are photographed to microfilms. These are then sent to an outside vendor for scanning to bi-tonal TIFFs at 400 dpi. On their return they are batch-processed to GIFs and made available via the Allegro database system.
BUILDER: Birmingham University’s Integrated Library Development and Electronic Resource (http://builder.bham.ac.uk). A major hybrid library project funded under the ELIB programme, aimed at developing ‘a working model of the hybrid library within both a teaching and research context, seamlessly [integrating] information sources, local and remote, using a Web-based interface, and in a way which will be universally applicable.’
Burney Collection at the British Library (http://minos.bl.uk/diglib/access/microfilm-digitisation.html). 1,500 reels of early English newspapers from the Civil War onwards. Used a Mekel 400 XL scanner, in-house, to produce 400 dpi bi-tonal TIFFs (using CCITT Group 4 compression). Reviewed and outlined in Howell (1997).
California Heritage Collection ‘Digitizing the Collection: Image Capture’ (http://sunsite.berkeley.edu/CalHeritage/image.html). Discusses the 1996 project which involved adding images (thumbnails) to finding aids for collections held at the Bancroft Library. Used 35mm slides captured to PhotoCD, converted to JPEGs and then to GIFs for thumbnails, but noted the problems in this multi-staged workflow.
Caribbean Newspapers Imaging Project (http://www.karamelik.uflib.ufl.edu/projects/mellon/). Based at the University of Florida, and funded by the Mellon foundation. Used collections at the A. Smathers Libraries, digitizing from microfilm 265,000 pages of Caribbean newspapers. Produced 400dpi bi-tonal scans (TIFF CCITT Group 4 compression) but experimented with 400dpi greyscales. Reviewed in Howell (1997).
Celtic and Medieval Manuscripts (http://image.ox.ac.uk/). High-resolution digitization of a series of medieval manuscripts held at the Bodleian Library and college libraries at the University of Oxford, using one Kontron and two Dicomed cameras. All three employed special cradles but could use a traditional copy stand. The project found that the maximum size of document that could be handled was A3, and special high-frequency fluorescent cold lighting had to be used (based on the model of the National Library of Scotland), as traditional lamps produced too much heat (one hour could induce a 0.1% shrinkage of vellum). Images are scanned as 24-bit uncompressed TIFFs aiming at 600dpi (but achieving at best 570dpi). Browsable using HTML, only allowing access to JPEGs and GIFs (but the project has experimented with FlashPix).
Centre for the Study of Ancient Documents (University of Oxford; http://www.csad.ox.ac.uk/CSAD/Images.html). This unit, part of the faculty of Literae Humaniores, is conducting a series of imaging projects on rare and unique material, including ‘squeezes’ (filter paper impressions of inscriptions), stylist tablets (using 180 degree imaging in collaboration with the University’s Department of Engineering), ink tablets, and papyrology.
Chapman, S., Conway, P., and Kenney, A. R. ‘Digital Imaging and Preservation Microfilm: The Future of the Hybrid Approach for the Preservation of Brittle Books’ RLG DigiNews 3.1 (February 15, 1999; http://www.thames.rlg.org/preserv/diginews/diginews3-1.html). A full report (and a decision matrix on film-first/scan-first or COM approaches) will appear on the CLIR Web site.
Colorado Digitization Project (http://coloradodigital.coalliance.org/toolbox.html). An extremely useful site of links to the major topics surrounding digitization.
Columbia University’s ‘Technical Recommendations for Digital Imaging Projects’ (http://www.columbia.edu/acis/dl/imagespec.html). A concise set of guidelines for digitization projects prepared by the Image Quality Working Group of ArchivesCom, a joint Libraries/AcIS committee.
Council on Library and Information Resources (http://www.clir.org/) - CLIR. Runs four programmes (Commission on Preservation and Access, Digital Libraries, The Economics of Information, and Leadership) and commissions numerous publications. Their Commission on Preservation and Access state that: ‘some information is created digitally and exists only that way, but historical materials are also being digitized as a means of providing access to special collections that have been locked in libraries and archives to prevent their deterioration. All digital files pose serious preservation problems, and finding ways to assure the safekeeping and accessibility of knowledge in this new format is among CLIR's highest priorities’ (http://www.clir.org/programs/cpa/cpa.html).
D’Amato, D., and Klopfenstein, R. C., ‘Requirements and Options for the Digitization of the Illustration Collections of the National Museum of Natural History’ (March 1996, http://www.nmnh.si.edu/cris/techrpts/imagopts/index.html). A comprehensive study of the digitization of fish illustrations for the Museum. Takes the project through its various stages of selection, benchmarking and digitization.
DEBORA Project (http://www2.echo.lu/libraries/en/projects/debora.html). Although just starting, this EC project aims to ‘develop tools for accessing collections of rare 16th century documents via networks. This includes the setting up of a production chain for digitizing old books. Digitisation will yield sets of images to be stored and indexed in an Image Base Management System (IBMS), accessible via the World-wide Web. The tools will also incorporate image recognition and features supporting co-operative work.’
Digital Heritage and Cultural Content (http://www.echo.lu/digicult/en/backgrd.html). EC-funded site looking at libraries and technology. Includes the full copy of ‘Digitisation of Library Materials’, report of the concertation meeting and workshop held in Luxembourg on 14 December 1998.
Donovan, K. ‘The Promise of FlashPix Image File Format’ RLG DigiNews 2.2 (April 15, 1998 -http://www.rlg.org/preserv/diginews/diginews22.html#FlashPix). A useful overview and analysis of the FlashPix image file format, which may provide a useful solution for access level images. This format has been successfully used by the Celtic and Medieval Manuscripts project at the University of Oxford (http://image.ox.ac.uk/) for a stand-alone exhibition in Ireland.
Elkington, N. ‘Joint RLG and NPO Conference on Guidelines for Digital Imaging’ RLG DigiNews 2.5 (October, 1998; http://www.thames.rlg.org/preserv/diginews/diginews2-5.html#feature1). A good overview, with links, of the workshop held in Warwick, 1998.
Frey, F. ‘Digital Imaging for Photographic Collections: Foundations for Technical Standards’ RLG DigiNews 1.3 (December 15, 1997 - http://www.rlg.org/preserv/diginews/diginews3.html#com). A comprehensive discussion of the standards used for digitizing photographs and many of the issues involved.
Gertz, J. ‘Oversize Color Images Project’ Phase I (http://www.columbia.edu/dlc/nysmb/reports/phase1.html), and Phase II (http://www.columbia.edu/dlc/nysmb/reports/phase2.html)
Global Inventory Project (http://www.gip.int). An EC and G8 funded project that allows one to search an inventory of digital initiatives. Described as a ‘one stop facility’ linking distributed national and international inventories of projects, studies and other activities relevant to the promotion and the further development of knowledge and understanding of the Information Society.
Hawaiian Newspaper Project (http://hypatia.slis.hawaii.edu/~hnp/welcome.html). This project seeks to make available selected, heavily used Hawaiian language newspapers (1834-1948) to students throughout the state of Hawaii who have access to the World Wide Web (WWW). Uses a Minolta Microdax 3000 digital microfilm workstation, scanning approx. 3,800 images to TIFF and GIF formats.
Howell, A. ‘Film Scanning of Newspaper Collections: International Initiatives’ RLG DigiNews 1.2 (August, 1997, http://www.thames.rlg.org/preserv/diginews/diginews2.html#film-scanning). A useful review of three initiatives: the Burney Collection at the BL, the Caribbean Newspaper Imaging Project at the University of Florida, and the Australian Co-Operative Digitization Project. Outlines the problems and opportunities of scanning newspapers (all from microfilm). Mentions that the ideal is 600dpi bi-tonal TIFFs, but only the Yale Open Book project has successfully achieved this, and it needed modifications to its Mekel scanner. All three projects resorted to 400dpi bi-tonal scanning, though the University of Florida experimented with conversion from 400dpi greyscale scanning.
Internet Library of Early Journals (ILEJ, http://www.bodley.ox.ac.uk/ilej/). A collaborative project between Oxford, Leeds, Birmingham and Manchester. Six journals were chosen: The Builder, Notes & Queries, and Blackwood's (19th century), and PTRS, the Gentleman's Magazine, and the Annual Register (18th century), covering a 10- or 20-year run from each, for a total of 108,000 images. Oxford, Manchester, and Birmingham provided the main scanning locations (using scanning assistants), with Oxford also doing microfilm scanning. For paper-based documents two Minolta PS3000P scanners were used (one in Manchester, one in Birmingham), scanning to bi-tonal TIFFs (400dpi) converted to 100dpi GIFs (usually measured in pixels, c. 1,000 across); conversion was performed using ImageAlchemy. Microfilm scanning used a Mekel MX500XL-G based in the Bodleian Photographic Studio, scanning greyscale and bi-tonal double images (i.e. two pages) then split into single-page images (21,000 in total from the Gentleman's Magazine and The Builder). The Builder was scanned as 200dpi TIFFs (c. 10MB an image) converted to JPEGs (better compression than GIF); the Gentleman's Magazine was 70% scanned as 300dpi bi-tonal TIFFs converted to GIFs, and 30% as 100dpi greyscale TIFFs converted to JPEGs.
Kenney, A. R. ‘The Cornell Digital Microfilm Conversion Report: Final Project to NEH’ RLG DigiNews 1.2 (August, 1997, http://www.thames.rlg.org/preserv/diginews/diginews2.html#com). A summary report of the Computer Output Microfilm project involving 177 reels of 19th and 20th century agricultural history documents.
Kenney, A. R. and Chapman, S. ‘Digital Conversion of Research Library Materials: A Case for Full Informational Capture’ D-Lib Magazine (October, 1996; http://www.dlib.org/dlib/october96/cornell/10chapman.html). This also provides a useful example of benchmarking with an analysis of a real life example using a 1914 ‘brittle book’ entitled Farm Management.
Kenney, A. R. and Chapman, S. Digital Imaging for Libraries and Archives (New York, 1996 ISBN 1 85604 207 3). This book accompanied a series of workshops conducted in the US. It is an invaluable book full of extremely useful formulae, reading lists, etc. Reviewed in Ariadne (http://www.ariadne.ac.uk/checkout/digital-imaging/intro.html) by Brian Kelly, 14th March, 1997.
Kenney, A. R., and Rieger, O. Y. Managing Digital Imaging Projects: An RLG Workshop (RLG: May, 1998). Another book to accompany a workshop on digital imaging projects, which is once again extremely useful. It tackles the area of costing and managing projects, but also has an overview of the basic technologies.
Lee, S. D., and Groves, P. ‘On-Line Tutorials and Digital Archives or ‘Digitising Wilfred’’ (Jan 1999, http://www.jtap.ac.uk). Full report on the Wilfred Owen Multimedia Digital Archive including digitization costs.
Library of Congress American Memory Project and National Digital Library Program (http://lcweb2.loc.gov/). It is strongly recommended that interested parties look at the ‘Quality Review of Document Images’ internal training guide, which provides a comprehensive discussion of the problems and recommended solutions adopted in the project (http://lcweb2.loc.gov/ammmem/award/docs/docimqr.html). In addition, in association with Ameritech, the LOC has run a National Digital Library Competition (http://memory.loc.gov/ammem/award/lessons.html). These ‘lessons learned’ briefings cover a range of projects including:
National Archives and Records Administration’s Electronic Access Project (http://www.nara.gov/nara/), especially their Guidelines for Digitizing Archival Materials for Electronic Access (http://www.nara.gov/nara/vision/eap/eapspec.html).
Noerr, P. ‘The Digital Library Toolkit’ (April 1998, http://www.sun.com/edu/libraries/digitaltoolkit.html). A good overview of the questions and processes involved in setting up a digital library.
Photographic and Imaging Manufacturer’s Association (http://www.pima.net/it10a.htm). This has numerous ‘standards’ and downloadable test cards for quality assurance tests of digital images.
Refugee Studies Programme Digital Library Project (Oxford). An extensive collection (c.25,000 items) of grey literature focusing on refugee studies, drawn from the collections at Oxford (currently in the pilot stage). All material had to be disbound as appropriate and was then scanned off-site by Xerox to 300dpi bi-tonal TIFFs with Group 4 compression (some colour and some greyscale also). A copy is sent to RAMOT Digital for batch processing into IOTA. Xerox also provides uncorrected OCR for use in the access system, which employs a TEI catalogue and OpenText 5.0. The original feasibility study performed by the Higher Education Digitization Service (http://heds.herts.ac.uk) is now publicly available at: http://heds.herts.ac.uk/Guidance/RSP_fs.html.
Reilly, J. and Frey, F. ‘Recommendations for the Evaluation of Digital Images Produced from Photographic, Micrographic, and Various Paper Formats’ (http://lcweb2.loc.gov/ammem/ipirpt.html). A detailed evaluation of the performance of scanners, commissioned by the NDLP.
‘Scanners and Digital Cameras’ RLG DigiNews 1.1 (April 15, 1997 - http://www.thames.rlg.org/preserv/diginews/diginews1.html#hardware&software). Although a bit dated, many of the links to sites evaluating digital cameras are valid.
Sharpe, L. ‘Preservation-Quality Scanning of Bound Volumes: Integration of the Picture Elements ISE Board with the Minolta PS-3000 Book Scanner’, RLG DigiNews 1.1 (April 15, 1997 - http://www.thames.rlg.org/preserv/diginews/diginews1.html#feature). Outlines some of the problems with bound volume scanning, and notably use of the Minolta PS3000 scanner.
Smith, A. 'Why Digitize?' and 'The Future of the Past: Preservation in American Research Libraries' (1999 - http://www.clir.org/pubs/reports/reports.html). New reports at the CLIR site. Both studies come down heavily on the side of digitization for access, as opposed to preservation.
Süsstrunk, S. ‘Imaging Production Systems at CORBIS Corporation’ RLG DigiNews 2.4 (August 15, 1998; http://www.rlg.org/preserv/diginews/diginews2-4.html#technical). Describes the large digital archives created by the CORBIS Corporation, which uses high-quality drum scanners (costing around $30,000-$60,000 each).
Technical Advisory Service for Images (TASI - http://www.tasi.ac.uk/). See especially their guidelines and summaries for creating a digital archive (http://www.tasi.ac.uk/building/building2.html).
‘Technical Review: Outsourcing Film Scanning and Computer Output Microfilm (COM) Recording’ RLG DigiNews 1.2 (August, 1997; http://www.thames.rlg.org/preserv/diginews/diginews2.html#hardware&software). A comprehensive review of vendors, including hardware, software, contact details, output formats, etc.
Toyota City Imaging Project (http://www.bodley.ox.ac.uk/toyota/openpage.html). Drawn from the material held in the John Johnson collection at Oxford. Photographed onto 35mm slides by the Bodleian Photographic Studio and then outsourced for conversion to PhotoCD (project began in 1993 when PhotoCDs were widely available but high resolution digitization equipment was not). Access images were taken from PhotoCDs using ImageAlchemy for conversion. Base/16 converted to GIFs, Base x 4 to JPEGs, and also created thumbnails.
UMI’s Early English Books project. Digitization of Early English Books I and II (following Pollard and Redgrave) from microfilm collections: approximately 22 million pages (or 11 million images). Scanning to 400dpi TIFFs using 13 SunRise SR50s, occupying 3.5TB of storage. The project performs 100% QA on all images, with indexing to page level delivered by a Fulcrum database. Compressed images are delivered on the fly by AT&T’s DjVu software, with Digimarc watermarking. Back-ups are on CD-ROM (c. 5,000) and currently delivered by a jukebox system (with the first 24 images of each item stored on a hard drive, using a Sun Ultra 450 web server).
Webb, C. ‘The Ferguson Project: A Hybrid Approach to Reformatting Rare Australiana’ (http://www.nla.gov.au/nla/staffpaper/cwebb1.html). A National Library of Australia project based on John Alexander Ferguson’s Bibliography of Australia. Outlines the benefits of the hybrid approach (microfilm and digitization).
Wilfred Owen Multimedia Digital Archive (http://info.ox.ac.uk/jtap). Manuscripts, photographs, audio, and video digitization project centred around the poet Wilfred Owen and the Great War. Used various methods including outsourcing to high resolution digitization unit (using the Kontron camera) at Oxford, but also employed flat-beds, etc. Audio delivered as RealAudio files, and video as MPEG II and QuickTime.
Yale’s Open Book Project (http://www.library.yale.edu/preservation/pobweb.htm). A major Yale University Library project to convert 10,000 books from microfilm to digital form, using Xerox Corp. for the out-sourced scanning (with a Mekel M400 microfilm scanner). Reports are available at CLIR’s site (http://www.clir.org/cpa/reports/openbook/openbook.html).
(1) For example, the Centre for the Study of Ancient Documents (Oxford) is working in collaboration with the University’s Department of Engineering to investigate ways of scanning stylus inscriptions via a 180-degree arc.
(2) Here we are talking about the suggestion that a digital preservation copy can be created as opposed to using standard preservation surrogates. This is not the same as attempts (highly valid) to study how to preserve digital objects such as those being conducted by the CEDARS project.
(3) As Arms (April, 1996) notes: ‘One issue that can not be adequately addressed here is an ongoing topic of discussion at the Library: the potential for digital versions to serve as preservation copies. Traditionally, preservation of content has focussed on creating a facsimile, as faithful a copy of the original as feasible, on a long-lasting medium. The most widely accepted method for preserving the information in textual materials is microfilming and for pictorial materials is photographic reproduction.’
(4) The overall problems of preserving digital information are being addressed by the United Kingdom’s CEDARS project (http://www.curl.ac.uk/cedarsinfo.shtml). For a concise overview of the problem (with historical perspective) see the University of Iowa’s on-line exhibition on preserving information (http://www.lib.uiowa.edu/ref/exhibit/contents.htm).
(5) Kenney, A. R. and Chapman, S. Digital Imaging for Libraries and Archives (New York, 1996 ISBN 1 85604 207 3), p. iv. Reviewed in Ariadne (http://www.ariadne.ac.uk/checkout/digital-imaging/intro.html) by Brian Kelly, 14th March, 1997.
(6) Kenney, A. R. and Chapman, S. ‘Digital Conversion of Research Library Materials: A Case for Full Informational Capture’ D-Lib Magazine (October, 1996; http://www.dlib.org/dlib/october96/cornell/10chapman.html). This also provides a useful example of benchmarking with an analysis of a real life example using a 1914 ‘brittle book’ entitled Farm Management.
(7) For accurate reproduction of colour, one should look to the work of the International Color Consortium (ICC - http://color.org/) in particular their ‘ICC Profile Format’. See the discussion of Color Management Systems in the ‘Technical Review’ of RLG DigiNews 1.3 (December 15, 1997 - http://www.rlg.org/preserv/diginews/diginews3.html#hardware&software).
(8) See also the MTF Target: Sine Patterns M-13-60, discussed in Williams, D. ‘What is an MTF and Why Should You Care?’ RLG DigiNews 2.1 (February 15, 1998 http://www.rlg.org/preserv/diginews/).
(9) See RLG worksheet (http://www.rlg.org/preserv/RLGWorksheet.pdf) and Chapman and Kenney on Costs and Benefits (http://www.dlib.org/dlib/october96/cornell/10chapman.html).
(10) For a quick overview of the types of some of the common file formats available (notably via the Internet) see Perlman, E. and Kallen, I. ‘Common Internet File Formats’ (1995 - http://www.matisse.net/files/formats.html).
(11) It should be noted that the Centre for the Study of Ancient Documents also has a Phase One Fuji camera, but this is clearly owned by the institution.
(12) The Photographic Studio at the Bodleian employs 1 FTE for quality control on all microfilms.