This is an alternative route to a point that Walt Crawford and Michael Gorman make very well in their snappy 1995 book Future Libraries: Dreams, Madness, and Reality. It would take, Crawford and Gorman estimate, about 168 gigabytes of memory, after compression, to store one year’s worth of page-images of The New Yorker, scanned at moderate resolution, in color; thus, if you wanted to make two decades of old New Yorkers accessible in an electronic archive, you would consume more memory than OCLC uses to hold its entire ASCII bibliographic database. “No amount of handwaving, mumbo-jumbo, or blithe assumptions that the future will answer all problems can disguise the plain fact that society cannot afford anything even approaching universal conversion,” Crawford and Gorman write. “We have not the money or time to do the conversion and cannot provide the storage.”
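The scale of Crawford and Gorman’s estimate is easy to restate as back-of-envelope arithmetic. The sketch below simply multiplies their 168-gigabyte-per-year figure by twenty years; it is an illustration of the magnitude involved, not their published worksheet.

```python
# Rough illustration of the Crawford/Gorman storage estimate.
# The 168 GB/year figure is theirs; everything else is simple arithmetic.

GB_PER_YEAR = 168   # compressed page-images, one year of The New Yorker
YEARS = 20          # "two decades of old New Yorkers"

total_gb = GB_PER_YEAR * YEARS
print(f"{total_gb:,} GB, roughly {total_gb / 1024:.1f} terabytes, for the image archive")
```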
E-futurists of a certain sort — those who talk dismissively of books as tree-corpses — sometimes respond to observations about digital expense and impermanency by shrugging and saying that if people want to keep reading some electronic copy whose paper source was trashed, they’ll find the money to keep it alive on whatever software and hardware wins out in the market. This is the use-it-or-lose-it argument, and it is a deadly way to run a culture. Over a few centuries, library books (and newspapers and journals) that were ignored can become suddenly interesting, and heavily read books, newspapers, and journals can drop way down in the charts; one of the important functions, and pleasures, of writing history is that of cultural tillage, or soil renewal: you trowel around in unfashionable holding places for things that have lain untouched for decades to see what particularities they may yield to a new eye. We mustn’t model the digital library on the day-to-day operation of a single human brain, which quite properly uses-or-loses, keeps uppermost in mind what it needs most often, and does not refresh, and eventually forgets, what it very infrequently considers — after all, the principal reason groups of rememberers invented writing and printing was to record accurately what they sensed was otherwise likely to be forgotten.
Mindful of the unprovenness of long-term digital storage, yet eager to spend large amounts of money right away, Lesk, Lynn, Battin, and the Technology Assessment Advisory Committee adopted Warren Haas’s position: microfilm strenuously in the short term, digitize from the microfilm (rather than from originals) in the fullness of time. “Turn the pages once” was the TAAC’s motto. Microfilm has, Stuart Lynn noted in 1992, higher resolution and superior archival quality, and we can convert later to digital images at “only a small increment of the original cost” of the microfilming. He sums up: “The key point is, either way, we can have our cake and eat it, too.”
As ill luck would have it, the cake went stale quickly: people just don’t want to scan from microfilm if they can avoid it. It isn’t cheap, for one thing: Stuart Lynn’s “small incremental cost” is somewhere around $40 per roll — that is, to digitize one white box of preexisting microfilm, without any secondary OCR processing, you are going to spend half as much again to convert from the film to the digital file as it cost you to produce the film in the first place. If you must manually adjust for variations in the contrast of the microfilm or in the size of the images, the cost climbs dramatically from there. And resolution is, as always, an obstacle: if you want to convert a newspaper page that was shrunk on film to a sixteenth of its original size, your scanner, lasering gamely away on each film-frame, is going to have to resolve to 9,600 dots per inch in order to achieve an “output resolution” of six hundred dots per inch. This is at or beyond the outer limits of microfilm scanners now.
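The resolution arithmetic in that last example is worth restating plainly: the density a scanner must resolve on the film itself is the desired output resolution multiplied by the linear reduction ratio. A minimal sketch of that calculation, using the sixteen-to-one and six-hundred-dots-per-inch figures from the paragraph above:

```python
def film_scan_dpi(output_dpi: int, reduction_ratio: int) -> int:
    """Dots per inch the scanner must resolve on the film-frame
    to deliver the desired resolution relative to the original page."""
    return output_dpi * reduction_ratio

# A newspaper page filmed at a 16x linear reduction, targeting 600 dpi output:
print(film_scan_dpi(600, 16))  # 9600 dpi required on the film
```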
And six hundred dots per inch doesn’t do justice to the tiny printing used on the editorial pages of nineteenth-century newspapers anyway. In an experiment called Project Open Book, Paul Conway demonstrated that it was possible to scan and reanimate digitally two thousand shrunken microfilm copies of monographs from Yale’s diminished history collection (1,000 volumes of Civil War history, 200 volumes of Native American history, 400 volumes on the history of Spain before the Civil War, and 400 volumes having to do with the histories of communism, socialism, and fascism) — but Conway was working from post-1983, preservation-quality microfilm made at the relatively low reduction-ratios employed for books. “We’ve pretty much figured out how to do books and serials and things up to about the size of, oh, eleven by seventeen, in various formats, whether it’s microfilm or paper,” Conway says. “We’ve kind of got that one nailed down, and the affordable technology is there to support digitization from either the original document or from its microfilm copy. But once you get larger than that, the technology isn’t there yet, [and] the testing of the existing technology to find out where it falls off is not there.” Conway hasn’t been able to put these scanned-from-microfilm books on the Web yet. “The files are not available now,” he wrote me,
because we chose (unwisely it now turns out) to build the system around a set of proprietary software and hardware products marketed by the Xerox Corporation. Our relationship with Xerox soured when the corporation would not give us the tools we needed to export the image and index data out of the Xerox system into an open, non-proprietary system. About two years ago, we decided not to upgrade the image management system that Xerox built for us. Almost immediately we started having a series of system troubles that resulted in us abandoning (temporarily) our goal of getting the books online…. In the meantime, the images are safe on a quite stable medium (for now anyway).
The medium is magneto-optical disk; the project was paid for in part by the National Endowment for the Humanities.
Newspapers have pages that are about twenty-three by seventeen inches — twice as big as the upper limits Conway gives. The combination of severe reduction ratios, small type, dreadful photography, and image fading in the microfilmed inventory makes scanning from much of it next to impossible; one of the great sorrows of newspaper history is that the most important U.S. papers (the New York Herald Tribune, the New York World, the Chicago Tribune, etc.) were microfilmed earliest and least well, because they would sell best to other libraries. We may in time be able to apply Hubble-telescopic software corrections to mitigate some of microfilm’s focal foibles, but a state-of-the-art full-color multimegabyte digital copy of a big-city daily derived, not from the original but from black-and-white Recordak microfilm, is obviously never going to be a thing of beauty. And no image-enhancement software can know what lies behind a pox of redox, or what was on the page that a harried technician missed.
In the late eighties, the Commission on Preservation and Access wanted an all-in-one machine that would reformat in every direction. It commissioned Xerox to develop specifications for “a special composing reducing camera capable of digitizing 35mm film, producing film in different reductions (roll and fiche), paper, and creating CD-ROM products.” As with Verner Clapp’s early hardware-development projects at the Council on Library Resources, this one didn’t get very far. The master digitizers — Stuart Lynn, Anne Kenney, and others at Cornell, and the Mellon Foundation’s JSTOR team, for example — realized almost immediately that they shouldn’t waste time with microfilm if they didn’t have to. “The closer you are to the original, the better the quality,” Anne Kenney told me. “So all things being equal, if you have microfilm and the original, you scan from the original.” JSTOR came to the same conclusion: