CHAPTER 34. Turn the Pages Once
“This is all about Pat Battin’s vision of a digital library,” Randy Silverman, Utah’s conservator, said to me: “One digital library fits all.” Microfilm is really only a passing spasm, a “cost-effective buffer technology” (as one of the newsletters of the Commission on Preservation and Access has it) that will carry us closer to the far digital shore. Silverman sums up the Commission’s thinking as follows: “How to fund technology in a time of barely increasing acquisition funds? And even if all the scholars have monitors, it’s like having a color TV in 1964. What are you going to watch? Somehow we’ve got to get some goods online. Preservation was a natural cause to help justify the conversion to an international electronic library. Battin played it for all it was worth.” If you unwrapped three million word-mummies — if you mined them from the stacks, shredded them, and cooked their brittle bookstock with the help of steady disaster-relief money — you could pump the borderless bitstream full of rich new content.
In 1994, shortly before Battin retired from the Commission, her newsletter published a special “Working Paper on the Future.” Its authorship is credited to the Commission’s board and staff, but it reads like one of her own heartfelt manifestos. “The next step for our nation’s libraries and archives is an affordable and orderly transition into the digital library of the future,” the working paper contends. All “information repositories,” large and small, private and public, “must” make this transition. (Why must they all?) We’re going to need continuing propaganda for it to be successful, too: “Changing a well-entrenched paradigm requires frequent and public articulation of the new mind set required in many arenas.” And, yes, there will be “high initial costs” involved in making the transition; these may threaten to “paralyze initiative,” unless we make a “thoughtful” comparison to what is described as “the rapidly escalating costs of traditional library storage and services.”
Doesn’t that sound a lot like Michael Lesk — once of Bellcore, now at the National Science Foundation handing out bags of seed money for digital library projects, who believes that libraries would save money if they got rid of the vast majority of their nineteenth-century duplicates in favor of middling-resolution networked facsimiles, and who says he routinely tells libraries that they might not want to repair their buildings, since they could digidump most of what their stacks held instead?
One reason Battin sounds like Lesk is that she worked closely with him for several years; she invited him to serve, along with a number of other resolute anti-artifactualists, on what has proved to be her most consequential committee: the Technology Assessment Advisory Committee. Lesk wrote the TAAC’s first report, “Image Formats for Preservation and Access”: “Because microfilm to digital image conversion is going to be relatively straightforward,” Lesk mispredicted, “and the primary cost of either microfilming or digital scanning is in selecting the book, handling it, and turning the pages, librarians should use either method as they can manage, expecting to convert to digital form over the next decade.” He and another non-librarian, Stuart Lynn — at the time Cornell’s vice president for information technologies (who retired in 1999 from the chief information officership of the University of California, where he kept an eye on digital-library projects partially funded by Michael Lesk’s National Science Foundation) — took the position in the advisory committee’s discussions that the costs of digital conversion and storage had dropped to the point (Verner Clapp’s long-dreamed-of point) that it was almost as cheap to scan-and-discard as to build.
Michael Lesk is uncharismatic and plodding; Stuart Lynn, however, is an Oxford-educated mathematician with a measure of brusque charm. He and his onetime colleague Anne Kenney (currently Cornell’s assistant director of preservation) became, with the financial support of Battin’s Commission, the Mellon Foundation (always ready to help), and Xerox Corporation, the progenitors of some of the most successful digital-library projects of the nineties. Stuart Lynn believed in an economic model in which digital preservation would be, as he told me, “self-funding.” If you were able to “funge immortality dollars into operating dollars”—that is, if you assumed a certain (fairly high) per-item cost for physical book storage, and if you ejected the original books once you digitized them and relied on virtual storage, and if you sold facsimiles-on-demand produced on a Xerox DocuTech high-speed printer (Lynn was serving on an advisory panel at Xerox at the time), you would, so his hopeful model suggested, come out more or less even — and you’d have all the emoluments of networked access.
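How such a “self-funding” model might pencil out can be sketched in a few lines. The figures below are hypothetical placeholders chosen only to show the shape of the argument; they are not Lynn’s, Cornell’s, or Xerox’s numbers.

```python
# A rough sketch of a scan-and-discard break-even model of the kind Lynn
# described. Every figure here is a hypothetical placeholder, not a number
# from Cornell or Xerox.

ANNUAL_SHELF_COST_PER_VOLUME = 4.00    # assumed yearly cost of keeping one physical volume shelved
SCAN_COST_PER_VOLUME = 30.00           # assumed one-time cost to scan (and then discard) a volume
FACSIMILE_PRICE = 15.00                # assumed price of one print-on-demand facsimile
FACSIMILE_PRINT_COST = 8.00            # assumed cost of printing that facsimile
SALES_PER_VOLUME_PER_YEAR = 0.2        # assumed demand: one facsimile sale every five years

def years_to_break_even(volumes: int) -> float:
    """Years until avoided shelving costs plus facsimile margins repay the scanning outlay."""
    annual_savings = volumes * ANNUAL_SHELF_COST_PER_VOLUME
    annual_margin = volumes * SALES_PER_VOLUME_PER_YEAR * (FACSIMILE_PRICE - FACSIMILE_PRINT_COST)
    return (volumes * SCAN_COST_PER_VOLUME) / (annual_savings + annual_margin)

print(f"{years_to_break_even(1_000_000):.1f} years")  # about 5.6 years under these assumptions
```

Note what the model leaves out: the recurring cost of keeping the digital copies themselves alive, which is precisely the objection the next paragraph takes up.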
But digital storage, with its eternally morphing and data-orphaning formats, was not then and is not now an accepted archival-storage medium. A true archive must be able to tolerate years of relative inattention; scanned copies of little-used books, however, demand constant refreshment, software-revision-upgrading, and new machinery, the long-term costs of which are unknowable but high. The relatively simple substitution of electronic databases for paper card catalogs, and the yearly maintenance of these databases, have very nearly blown the head gaskets of many libraries. They have smiled bravely through their pain, while hewing madly away at staffing and book-buying budgets behind the scenes; and there is still greater pain to come. Since an average book, whose description in an online catalog takes up less than a page’s worth of text, is about two hundred pages long, a fully digitized library collection requires a live data-swamp roughly two hundred times the size of its online catalog. And that’s just for an old-fashioned full-text ASCII digital library — not one that captures the appearance of the original typeset pages. If you want to see those old pages as scanned images, the storage and transmission requirements are going to be, say, twenty-five times higher than those of plain ASCII text — Lesk says it’s a hundred times higher, but let’s assume advances in compression and the economies of shared effort — which means that the overhead cost of a digital library that delivers the look (if not the feel) of former pages at medium resolution is going to run about five thousand times the overhead of the digital catalog. If your library spends three hundred thousand dollars per year to maintain its online catalog, it will have to come up with $1.5 billion a year to maintain copies of those books on its servers in the form of remotely accessible scanned files. If you want color scans, as people increasingly do, because they feel more attuned to the surrogate when they can see the particular creamy hue of the paper or the brown tint of the ink, it’ll cost you a few billion more than that. These figures are very loose and undoubtedly wrong — but the truth is that nobody has ever overestimated the cost of any computer project, and the costs will be yodelingly high in any case. “Our biggest misjudgment was underestimating the cost of automation,” William Welsh told an interviewer in 1984. “Way back when a consultant predicted the cost of an automated systems approach, we thought it was beyond our means. Later, we went ahead, not realizing that even the first cost predictions were greatly underestimated. The costs of software and maintenance just explode the totals.”
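The arithmetic behind these figures is simple enough to restate; the sketch below uses only the numbers given above (the variable names are mine).

```python
# Restating the chapter's back-of-envelope estimate of digital-library overhead.

PAGES_PER_BOOK = 200               # an average book, versus a sub-one-page catalog record
IMAGE_VS_ASCII_FACTOR = 25         # the text's generous assumption (Lesk says a hundred)
CATALOG_BUDGET_PER_YEAR = 300_000  # dollars spent yearly maintaining the online catalog

full_text_factor = PAGES_PER_BOOK                             # ASCII full text: ~200x the catalog
page_image_factor = full_text_factor * IMAGE_VS_ASCII_FACTOR  # scanned pages: ~5,000x the catalog
annual_cost = CATALOG_BUDGET_PER_YEAR * page_image_factor

print(f"{page_image_factor:,}x the catalog, about ${annual_cost:,} per year")
# prints: 5,000x the catalog, about $1,500,000,000 per year
```

Swapping in Lesk’s factor of one hundred instead of twenty-five pushes the same estimate to six billion dollars a year, before color.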
Things that cost a lot, year after year, are subject, during lean decades, to deferred maintenance or outright abandonment. If you put some books and papers in a locked storage closet and come back fifteen years later, the documents will be readable without the typesetting systems and printing presses and binding machines that produced them; if you lock up computer media for the same interval (some once-standard eight-inch floppy disks from the mid-eighties, say), the documents they hold will be extremely difficult to reconstitute. We will certainly get more adept at long-term data storage, but even so, a collection of live book-facsimiles on a computer network is like a family of elephants at a zoo: if the zoo runs out of money for hay and bananas, for vets and dung-trucks, the elephants will sicken and die.