One interesting discovery that we made in the process of obtaining bids is that working from paper copies of back issues of journals, rather than from microfilm, produces higher quality results and is — to our surprise — considerably cheaper. This conclusion has important implications beyond JSTOR.
It sure does have important implications: it means that most of the things that libraries chopped and chucked in the cause of filming died for nothing, since the new generation of facsimilians may, unless we can make them see reason, demand to do it all over again.
CHAPTER 35. Suibtermanean Convumision
The second major wave of book wastage and mutilation, comparable to the microfilm wave but potentially much more extensive, is just beginning. At the upper echelons of the University of California’s library system, a certain “Task Force on Collection Management Strategies in the Digital Environment” met early in 1999 to begin thinking about scanning and discarding components of its multi-library collections. Two of the librarians “anticipated resistance to the loss of printed resources, especially by faculty in the Humanities, but agreed that the conversation had to begin.” Others prudently pointed out that the “dollar and space savings would likely be minimal for the foreseeable future and should not be used to justify budget reductions or delays in needed building improvements.” Still others wanted to be sure that the organizers of the program arranged things so that the “campuses which discarded their copies would not be disadvantaged.”
For some years, Cornell’s Anne Kenney has been a leader of the scan clan. She knows (as she told the attendees at a Mellon Foundation-sponsored conference in 1997) that “the costs of selecting, converting, and making digital information available can be staggering.” Since it is so horribly expensive, she believes that the only way libraries will be able to pay for it is if “digital collections can alleviate the need to support full traditional libraries at the local level.” Therefore, over the past decade, in its various grant-funded scanning projects, Cornell University has snarfed its way through a banquet of old material, employing the language of earnest preservationism whenever it was expedient. (The books are “deteriorating,” “rapidly self-destructing,” etc.) They have disbound a collection of what are in some cases extremely rare math books of a century ago (and printed up germ-free facsimiles of them on a Xerox DocuTech printer), and books on Peruvian guano and butterflies and forestry. The paper facsimiles are a way of easing the transition to the digital library: “Conceivably, this may at some point allow librarians to propose other service alternatives as a substitute for traditional shelf storage,” says a footnote to the report, meaning that to use a book, you would have it printed out on demand or you would read it on-screen. Of course, if the book had to be printed out, there was an opportunity to generate a little revenue, too: “There may also be opportunities to underwrite some of the costs of preservation through the sale of facsimile editions.”
A few years later, Cornell got Mellon money to scan original runs of nineteenth-century American magazines like Scribner’s, Scientific American, Harper’s, and Atlantic Monthly. A wonderful nineteenth-century monthly magazine, replete with many hundreds of engravings, called The Manufacturer and Builder (already microfilmed in 1989 by the Northeast Document Conservation Center), was unmade by Cornell as one of its contributions to the digital Making of America project. (The Making of America was conceived by Stuart Lynn and others at Cornell in part to alleviate the problem of the “escalating cost of storage and the lack of adjacent building space.”) Ah, but it’s searchable, you may say, and neither the microfilm nor the original issues are: it’s worth destroying an illustrated run of The Manufacturer and Builder to get a fully searchable copy of it up on the Web. Yes, it is searchable, but because the type of the original is small and the resolution of the scanning is only six hundred dots per inch, the image-processing software doesn’t have enough information to chew on. As a result, while the images Cornell offers are legible, the OCR text available for your searches sometimes speaks in a language entirely its own. Here, for example, is Cornell’s searchable text of the beginning of an 1883 article about a subterranean convulsion in Java:
As intimated in our editomial remnarks last month, the gm-eat suibtermanean convumision imi, Java gmoxvs mom-c appalling as time facts relating to it become better kumouvum, antI time meal magnitude of time tlisturbance can be mucasumably compiehended.
“It xviii be unnecessamy,” the writer continues, “for us to enter into further details of time catastrophic, save to remark that all we mepouted of time changes in time configumation of time hind lmadi time suiruounding ocemun bottom, has beemi comifimmed, amid much mom-c extensive changes noticed.” An explanatory note about viewing the plain text derived from Making of America page-images says that “OCR accuracy is high but varies from page-to-page depending on a number of variables”; the note blames the errors on the “brittle, faded, and foxed” originals, saying that proofreading “would be very expensive and time consuming.” A production note preceding the OCR text says that Cornell did the work in order “to preserve the informational content of the deteriorated original.” The “best available copy” of the original was used, of course. The originals were disbound, we learn elsewhere, “due to the brittle nature of many of the items.” I asked Anne Kenney how the library determined brittleness; she said they used the double-fold test. “I’m not as wedded to retaining, at each site, the original sources as some may be,” she said.
It’s extremely kind of Cornell’s librarians to put these images on the Web, and one can’t blame them for the untutoredness of their OCR software (which despite its sometimes garbled output unquestionably helps researchers in their truffle hunting), but it’s truly a shame, after the decades of havoc wrought by microfilming, that pages bearing such a wealth of engravings are once again needlessly dying to feed the sausage factory. The faculty and students of Cornell were not asked whether they wanted valuable runs of nineteenth-century magazines sacrificed for this experiment. “These things were never aired out in a public forum,” says Joel Silbey, a Cornell historian who served on an advisory board to the library. “I was stunned when I first heard that they would have to disembowel the things.” He began to “express consternation.” The response of Anne Kenney and her colleagues was, according to Silbey, that “this was the only way it could be done and that it had to be done or we would lose things.” Which is a curious rationale, since the intellectual content of The Manufacturer and Builder, along with many other Making of America titles, was already backed up on microfilm when Cornell began their work: the emergency last-ditch “rescue” of this supposedly at-risk title had already happened. After some discussion, several of the disbound runs were sent to Rare Books, where they are or will be boxed. It was too late for The Manufacturer and Builder, though.
So the machine-induced loss begins all over again. But it can be stopped: there is no reason why one medium must mandatorily stab another one in the back. John Warnock, head of Adobe Software and a book collector of catholic tastes and deep pockets, discovered that he could create extremely fine-grained, full-color electronic copies of his own antiquarian books, using an overhead camera with a four-by-five-inch digital camera-back, without doing anything to them more injurious than turning their pages. He founded Octavo Corporation, which has published searchable facsimiles of early editions of Robert Hooke, William Harvey, Franklin, Galileo, Newton, and Copernicus; Octavo recently finished photographing one of the Folger Library’s First Folio editions of Shakespeare. It takes several minutes for the array of sensors in the camera to process the detail in one double-page spread, each of which consumes one hundred and forty megabytes of storage; the resulting scans have a serene luminosity and depth of detail. I described to Warnock the Cornell project of digitizing and throwing away rare nineteenth-century math books and replacing them with black-and-white printouts at six hundred dots per inch. “I have no sympathy with that, I’m afraid,” Warnock said.