The Universe at Your Fingertips


Second Thoughts About Web Publishing

Stephen Sottong
Science Librarian
Payson Library
Pepperdine University


Copyright 1997, Stephen Sottong. Used with permission.

Abstract

Although web publishing is increasing at a great rate, there are certain disadvantages which should be taken into account before deciding either to publish on the web or to subscribe to web-based publications. The disadvantages fall into three basic categories: hardware, software and archival.

Hardware problems include limitations in display technology, and the speed and loading of the World Wide Web and local networks over time.

Software problems include the lack of an open, accepted standard for publication and the limited lifespan of software.

Archival problems include the short-term nature of the electronic publishing industry and its associated companies, and the limited lifespan of archival media.

I will conclude that electronic publishing may be useful for certain limited applications, but that, for the present, material which is needed over time should be archived on paper.

Discussion

In the rush to move into the brave new world of Web publishing, very little consideration has been given to the possible technical problems that could delay or damage acceptance of the medium for mainstream publication. My view of Web publishing is unconventional since I was an Electrical Engineer who worked in the field of digital communications before my change of career to librarianship. In this talk, I will describe, in non-engineering terms, some of the technical difficulties with Web publication. With this information, you can make a better informed decision about whether electronic publishing is correct for you or your institution.

Let me start by saying that I am not a Luddite. I write Web pages for my Library and use the Web on a daily basis. There is a definite place for the Web in an overall publication mix, but its use should be tempered by an understanding of its inherent strengths and weaknesses. You will hear about the strengths from most of the other presenters; I will talk about the technical weaknesses. These weaknesses fall into three categories: hardware, software and archiving.

In the first category, the foremost hardware problem is display technology. Most of you would not seriously consider reading a novel from your computer. The chief reason for this is eyestrain. In a Lou Harris poll, "computer related eyestrain" was called the number one office related health complaint. The National Institute for Occupational Safety and Health said that 88 percent of people in the U.S. who work at PCs for more than 3 hours a day suffer from eyestrain. This amounts to 58 million people. The reasons for this eyestrain are related to the structure of the ubiquitous Video Display Tubes that each of us spends much of our day using. The maximum resolution of the best video displays is only about 100 dots per inch. A standard 14 inch display running in standard VGA mode is displaying at only 60 dots per inch. By contrast, the cheapest laser printer has a resolution of 300 dots per inch. This is not 3 times the maximum resolution of a video display, but 9 times, since it is 3 times the resolution horizontally and 3 times the resolution vertically.

To contrast further, books are generally printed at a resolution of 1,200 dots per inch, or 144 times the resolution of a video display. The difference between photographs and video images is even more startling since photographic film has a resolution of approximately 4,500 dots per inch, or roughly 2,000 times the resolution of video. This increased resolution results in finer lines and greater detail. In addition to the problem of resolution, the contrast (the ratio of dark to light) can be as much as 10 times greater in a book than on a video screen.
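
To make the arithmetic behind these comparisons explicit, here is a brief sketch (in Python, purely illustrative) that turns the dots-per-inch figures quoted above into areal ratios; the linear resolution is squared because detail increases both horizontally and vertically.

    # Areal resolution ratios relative to a 100 dots-per-inch video display,
    # using the dots-per-inch figures quoted in the text. Linear resolution is
    # squared because detail increases both horizontally and vertically.

    VIDEO_DPI = 100     # best video displays
    LASER_DPI = 300     # inexpensive laser printer
    BOOK_DPI = 1200     # typical book printing
    FILM_DPI = 4500     # approximate resolution of photographic film

    def areal_ratio(dpi, reference=VIDEO_DPI):
        """Times more dots per square inch than the reference display."""
        return (dpi / reference) ** 2

    for name, dpi in [("laser printer", LASER_DPI),
                      ("book", BOOK_DPI),
                      ("photographic film", FILM_DPI)]:
        print(f"{name}: {areal_ratio(dpi):,.0f} times the detail of a video display")

    # laser printer: 9 times; book: 144 times; photographic film: about 2,025 times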

These problems are caused by the basic design of a video tube. Please follow along using the diagram in the handout. Briefly, a video tube has an electron gun at the back that shoots a stream of electrons toward the screen. At the sides of the tube are electromagnets that move the electron beam back and forth and up and down. On the screen are a series of dots or stripes of phosphorescent material that glow when struck by the electron beam. The number of electrons shot by the electron gun varies with the intensity of the color you want to display.

To increase the resolution of this tube to, say, 300 dots per inch so that it equals the resolution of an inexpensive laser printer, you must first reduce the size of the phosphorescent dots -- this is not difficult. But then you must decrease the size of the electron beam so that it still only hits one dot at a time -- this is difficult because electrons tend to repel each other, which causes the beam to spread. So the electron gun must be made to finer tolerances. The electron gun must also vary the electron stream 9 times faster.
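
As a rough illustration of why the electron stream must vary so much faster, the following sketch estimates how many dots per second the gun must paint at each resolution; the 14-inch screen dimensions and 60 Hz refresh rate are assumptions chosen for illustration, not figures from the text.

    # Rough dot-rate estimate for a hypothetical 14-inch display
    # (about 11.2 x 8.4 inches visible) refreshed 60 times per second.
    # The screen dimensions and refresh rate are illustrative assumptions.

    WIDTH_IN, HEIGHT_IN = 11.2, 8.4   # assumed visible screen area
    REFRESH_HZ = 60                   # assumed redraws per second

    def dots_per_second(dpi):
        return (WIDTH_IN * dpi) * (HEIGHT_IN * dpi) * REFRESH_HZ

    low = dots_per_second(100)    # today's best displays, per the text
    high = dots_per_second(300)   # laser-printer resolution

    print(f"100 dpi: {low / 1e6:.0f} million dots per second")
    print(f"300 dpi: {high / 1e6:.0f} million dots per second")
    print(f"ratio:   {high / low:.0f} times faster")   # 9 times, as stated above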

The magnets on the sides of the tube must work 3 times faster and 3 times more accurately (only 3 since one set handles the horizontal and a different set handles the vertical). The magnets would have to be built smaller and more precisely, which would be expensive. In addition, aiming the beam must be very precise, so any external magnetic fields would cause more interference than they do with the present video display. External fields are generated by speakers, clocks, lights, or the computer itself. To overcome this interference the display would have to be heavily shielded from outside magnetic fields.

This does not mean that video display tubes with higher resolution could not be made; it simply means that they could not be made economically enough to be marketable.

Liquid crystal displays have similar problems. The best liquid crystal displays use a system called active matrix which requires fabricating transistors on the glass surface of the display. Every color dot in the display has its own set of transistors. To increase the display density 9 times would require approximately 9 times the number of transistors. Recently, the Department of Defense has contracted for an LCD display with 300 dot per inch resolution, but the price of such displays will be beyond the means of the average buyer for the foreseeable future.

A problem common to both kinds of displays is that as resolution increases, the amount of video memory also increases. To display the standard VGA resolution of 640 by 480 in true color mode requires almost 1 megabyte of memory. As mentioned earlier, VGA on a 14 inch monitor displays only 60 dots per inch. To raise this to 300 dots per inch would require 25 times the video memory, or about 22 and one half megabytes of memory. Updating the video display, which is already slow with graphics intensive software, would be even slower.
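
The memory figures above can be checked with a short calculation; the sketch below assumes 24-bit "true color" (three bytes per dot) and uses the 60 and 300 dots-per-inch figures already quoted.

    # Video memory for a true-color frame, assuming 3 bytes per dot (24-bit color).

    BYTES_PER_DOT = 3            # 24-bit true color (assumption)
    VGA_DOTS = 640 * 480         # standard VGA frame

    vga_bytes = VGA_DOTS * BYTES_PER_DOT
    print(f"VGA true color: {vga_bytes / 2**20:.2f} megabytes")     # about 0.88 MB

    # Raising a 60 dpi display to 300 dpi multiplies the dot count by (300/60)^2 = 25.
    scale = (300 / 60) ** 2
    print(f"At 300 dpi: {vga_bytes * scale / 2**20:.1f} megabytes")  # about 22 MB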

Overall, display technology has lagged significantly behind all other parts of the computer. Present display technologies are not capable of equaling the readability of print and no technology on the horizon offers any better hope. Some day, there will be a display as inexpensive and easy to read as a book, but that day is not in the foreseeable future. The timing of the technical breakthroughs needed to create such a display is unpredictable. We cannot design present day libraries around phantom technologies.

A second hardware problem lies in the nature of the Web itself. The World Wide Web is a network of networks. The Internet was originally designed as a high-speed, flexible pathway system that would provide reliable transmission of scientific information in wartime. The recent popularity of the Web has increased the load on this network. The LA Times reported in February that Internet traffic has tripled every year for the past three years, and a Forbes article reports that MCI traffic has increased 50 times in the last three years. MCI is one of the major providers for the Internet backbone. As traffic on the net increases and begins to approach capacity, messages are routed less directly and delayed.
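
For a sense of scale, those growth figures can be restated as compound rates; the short sketch below is ordinary compounding over the three-year period mentioned in the reports.

    # Compound growth implied by the traffic figures quoted above.

    years = 3
    tripling_total = 3 ** years          # tripling every year for 3 years: 27x overall
    mci_annual = 50 ** (1 / years)       # annual factor implied by 50x over 3 years

    print(f"Tripling yearly for {years} years gives {tripling_total}x total growth")
    print(f"Growing 50x over {years} years is about {mci_annual:.1f}x per year")  # ~3.7x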

The very design of the Net may help to cause slowdowns. Routers (the hardware devices which decide what path a message takes on its circuitous journey along the net) have been blamed for periodic slowdowns of traffic on the network. A National Science Foundation study found that more than 90% of the routing updates are unnecessary. Routing updates are messages sent between routers for internal, Internet traffic control. In some cases, this amounts to 20 million update messages a day when only 10,000 were expected. This could amount to as much as 10 to 20 percent of network traffic. Cisco Systems, which makes over 85% of routers, has been attempting to diagnose the problem, but does not yet understand it. Ken Siegmann writes that, due to the Internet's "amorphous and disorganized nature ... Nobody really knows how much traffic is flowing on it nor where the problems are." The network has grown beyond the comprehension of its creators.

The recent America Online debacle with busy signals and outages is only the worst example of what could happen to users of the Net. The most likely scenario would be a gradual slowdown of network traffic with periods when traffic comes to a virtual standstill. If we rely on the Web as the major source of information for our libraries, we will be, essentially, out of business when the Web slows to a crawl.

The second technical problem with electronic publishing is software. In order to understand the problems with software, one must first understand the structure of digital documents. A digital document, whether it is transmitted over the Web or read from a disk, is only a string of binary numbers. There is nothing inherent in this stream of numbers to identify their format, content or purpose. Whether those numbers are displayed as letters, numbers, a spreadsheet, a database or a picture is solely determined by the software that produced the file, the software reading the file and the operating system software. When these pieces of software are compatible, the end product is exactly what the originator intended. Problems arise when there are software incompatibilities. Some examples will illustrate this. Older versions of software cannot read documents generated by later versions. When a document is generated on a Macintosh, it almost never looks the same on a PC, even if the same variety of word processor is used. Similarly, when converting a document from one word processor to another, the converted document is almost never identical to the original. Formatting is omitted or skewed with a loss of information from the document.
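
A minimal sketch of the point that a digital document is only a string of numbers: the same four bytes below can be read as text, as a list of integers, or as a single 32-bit number, and nothing in the bytes themselves says which reading the producing software intended.

    # The same four bytes interpreted three different ways. Nothing in the data
    # identifies which interpretation the software that produced it intended.
    import struct

    data = bytes([0x57, 0x65, 0x62, 0x21])

    print(data.decode("ascii"))           # read as ASCII text: "Web!"
    print(list(data))                     # read as integers: [87, 101, 98, 33]
    print(struct.unpack("<I", data)[0])   # read as one unsigned 32-bit number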

But software is a moving target as well as a movable feast. The only long-term standard for binary files is ASCII, and it describes text only in the Roman alphabet. UNICODE is emerging as a standard that will describe all character sets, but it will still not describe text formatting. SGML has been proposed as a page description language standard, but it is still in development and implementations of the standard vary. HTML can describe the basic format of a Web page as well as its links to other pages, but the exact appearance of the page will vary from Web Browser to Web Browser and from computer to computer. HTML is also a new and emerging standard that is being pulled in disparate ways by the two major players in the Web Browser field, Netscape and Microsoft, as we witnessed only two weeks ago in the controversy over differing methods of "Push" technology.

The volatility of software tends to make valuable, older electronic documents unreadable. To guard against such unplanned obsolescence, electronic documents must be periodically converted to the latest software at an expense of time, money and some loss of formatting or information. This is equivalent to periodically rebinding your entire collection.

This leads directly to the third technical problem, archiving. Because of the nature of digital documents and the volatility of software, archiving an electronic document requires more than simply preserving the binary data. As the Committee on Government Operations said in its report to Congress: "Preservation of electronic information may require continued availability of computer software, operating systems, manuals and hardware, as well as various types of electronic storage media. Continual changes in computer hardware and software ... contribute to the complexity of long-term preservation."

The traditional method of preserving electronic documents is to store them on magnetic or optical media. The National Media Laboratory web site (the Web site is listed in the bibliography) shows the projected data lifespans for various media. Magnetic media have data lifespans from 10 to 40 years depending on storage conditions. Increased temperature and humidity shorten the data lifespan of these media. Even with good storage, NML recommends moving the data on tapes to new media every 10 years to avoid possible data loss. The added expense of this kind of preservation could be prohibitive, particularly with little-used documents, and, indeed, preservation is not taking place. The government is a key culprit in this. The National Archives and Records Administration reports that 80% of government agencies required to turn over data tapes for long-term storage have failed to do so. Some of the agencies are nearly 15 years overdue.
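
As a back-of-the-envelope sketch of what the ten-year migration recommendation implies, the figures below assume a collection that must stay readable for a century and a flat, invented cost for each recopying pass; only the ten-year interval comes from the text.

    # Illustrative cost of recopying a tape collection on the recommended schedule.
    # Only the 10-year migration interval comes from the text; the retention
    # horizon and per-pass cost are invented for illustration.

    RETENTION_YEARS = 100        # assumed archival horizon
    MIGRATION_INTERVAL = 10      # years between recopies (NML recommendation)
    COST_PER_PASS = 25000        # assumed dollars to recopy the whole collection once

    passes = RETENTION_YEARS // MIGRATION_INTERVAL
    print(f"Recopying passes needed: {passes}")
    print(f"Total migration cost: ${passes * COST_PER_PASS:,}")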

The magnitude of the problem is already staggering. The General Services Administration in a 1993 model program transferred 640 tapes of petroleum geology data to compact disk. This may seem hopeful, but it amounts to just 3.2% of the 20,000 petroleum data tapes. These tapes contain information that cost the taxpayers over $1 billion to collect. In an era of tightening budgets, preservation of thousands of valuable data tapes from agencies such as NASA and NOAA (the National Oceanographic and Atmospheric Administration -- the nation's weather bureau) is taking a back seat to more pressing, current needs, and a wealth of data compiled at a cost of billions of dollars is decaying.

CD-ROMs were initially thought to be the panacea for long-term preservation of digital data, but this has not proven to be the case. CD-ROMs suffer from forms of deterioration such as depolymerization of the polycarbonate substrate of the disk. This makes the disk brittle. Depolymerization is caused by moisture in the air, which also causes the aluminum reflective layer of the disk to oxidize or delaminate.

Sulfur compounds, such as those found in smog, can also attack CD-ROMs. In one case where CD-ROMs were accidentally packaged in high sulfur cardboard containers, the disks failed within months. Overall, the data lifespan of normal CD-ROMs is 10 to 30 years, and this assumes near ideal storage conditions with stable temperature and humidity and little or no handling of the disks. This is shorter than the useful lifespan of acid paper, which most libraries will no longer buy.

Certain expensive CDs may have longer data lifespans, but this is irrelevant since media are generally supplanted before they actually wear out. The time before a medium becomes obsolete is estimated by an NML scientist to be "10 or twenty years (or less)". The successor to the CD-ROM may have already arrived in the DVD-ROM, which can store up to seven times the data of a present CD-ROM. The new DVD-ROM drives will read today's CDs but not the re-writable CDs being used to transfer older data formats by preservation programs such as the GSA program mentioned earlier. The DVD-ROM shares the same technical problems mentioned earlier for the CD-ROM and is also more susceptible to data loss from scratches due to its higher data density.

The case of the 1960 Census is illustrative of how vital public documents can be jeopardized by format obsolescence. That census was archived on a proprietary magnetic tape format which became obsolete a few years after the census. About a decade later, when it was found that the tapes were deteriorating and it was necessary to transfer the data, there were only 2 machines in the world which could read them, one in the Smithsonian and one in Japan.

As Will Manley pointed out in the latest American Libraries, "It's a historical oddity that World War II is a better documented war than Vietnam. The reason is that Vietnam was a computer-era war, and many of its documents are stored on electronic tapes that can be accessed only by digital equipment that no longer exists."

All of us have experienced obsolete storage media. Many of us still have programs and data on 5 1/4" disks which are now obsolete. To give a personal example, my wife stored her Master's thesis on an 8 inch floppy disk 13 years ago. Equipment is no longer available to read the disk. The hardware problem is further exacerbated by the WordStar software used to create the file. At the time the thesis was written, WordStar was the standard word processor, but it is now defunct. Only the paper copies preserve her work.

If we truly intend to move to electronic publishing, then some standards and methods for long-term archiving of digital sources must be established. The alternative is to archive electronic documents on paper since acid-free paper has a projected archival life of 500 years or longer.

A final technical problem with archiving harks back to the nature of a digital document, for "a bit is a bit is a bit". Digital documents do not age and wear gradually the way paper documents do. No matter how often they are used or how much they are modified or tampered with, they seem to the reader utterly pristine. Unlike a paper document, tampering cannot be verified. This is especially important where, for proprietary or convenience reasons, electronic documents are kept on a centralized server. In such cases, "Official Copies" of a document can easily be modified to suit "Official" purposes. This "1984" scenario is possible, if we allow it to happen.

The vision of the library of the future is an institution where any patron, anywhere, can find any work from any era. This is a noble vision and electronic publishing may be a step toward it. But, if the mission of libraries is, as Crawford and Gorman put it, to "...serve their users and preserve the culture by acquiring, listing, making available, and conserving the records of humankind...", then we are remiss in our duties when we collect in media which cannot be adequately preserved and disseminated to our patrons today and in the future. Where information and knowledge are of lasting value, the best means for transmitting them into the future is still print on paper. When the technical problems I have outlined here are solved, then we can begin electronic publication in earnest and move toward the vision. Until then, we must preserve knowledge in the best medium available.

Selected Bibliography

Books

Crawford, Walt and Gorman, Michael. Future Libraries: Dreams Madness & Reality. Chicago: American Library Association, 1995.

Nikles, David E. and Forbes, Charles E. "Accelerated aging studies for polycarbonate optical disk substrates," IN Optical Data Storage '91: Proceedings of the SPIE Volume 1499, edited by James J. Burke, et al., 39-41. Bellingham, WA: SPIE, 1991.

Stoll, Clifford. Silicon Snake Oil. New York: Anchor Books, 1995.

Journal & Newspapers

Atencio, Rosemarie. "Eyestrain: the number one complaint of computer users," Computers in Libraries, September 1996, 40-43.

Berinstein, Paula. "Images in your future: The missing picture in an online search," Online, 21 (Jan/Feb 1997): 38.

Cloonan, Michele Valerie. "The preservation of knowledge," Library Trends, 41(22 March): 594.

Day, Rebecca. "Pixel Puzzle: building the infrastructure for digital photography," Electronic Engineering Times, 1 February 1997, 52.

Fox, Barry. "Can CD companies stop the rot?", New Scientist, 4 December 1993, 19.

G.F. "Survey finds health concerns among VDT workers at LC," American Libraries 22 (January 1991): 15.

Harmon, Amy. "Why the French Hate the Internet," The Los Angeles Times, 27 January 1997, A1.

Heger, Kyle. "Print: a road kill on the information superhighway?", Communication World 11(October 1994): 30.

Hotz, Robert Lee. "Fragile virtual libraries," The Los Angeles Times, 8 October 1995, Record edition, sec. A, 1.

Lange, Larry. "Cisco acknowledges, fixes one problem -- Study: router flaws jamming Net traffic," EE Times, 6 January 1997, sec. News.

Leek, Matthew R. "Will a Good Disc Last Forever," CD-ROM Professional, 8 (November 1995): 102.

Manley, Will. "Guerrilla Librarians," American Libraries 28 (April 1997): 192.

McCarthy, Shawn P. "NARA official: most agencies are late with archiving," Government Computer News, 14 September 1992, 98.

Metcalfe, Bob. "From the Ether" (recurring column), Info World, 30 September 1996, 11 November 1996, 18 November 1996, 25 November 1996, 3 February 1997.

Mulligan, Thomas S. "The Internet Backbone," The Los Angeles Times, 3 February 1997, D1.

Olsen, Florence. "Scientific legacy facing ruin as old mag tapes deteriorate," Government Computer News, 6 December 1993, 1.

Parker, Dana J. "CD-Recordable: the large, the small and the printable," CD-ROM Professional, 9 (July 1996): 86-99.

Schofield, Julie Anne. "DARPA awards contract for battlefield LCDs," Design News, 20 January 1997, 16.

Seymour, Jim. "Your eyes come first: preventing eyestrain and other problems caused by monitors," PC Magazine, 24 October 1995, 93.

Sprout, Alison L. "Waiting to download," Fortune, 5 August 1996, 64.

Report

Taking a Byte out of History: The Archival Preservation of Federal Computer Records. By John Conyers, Jr., chairman Committee on Government Operations, 6 November 1990, Hrpt 101-978.

Internet Sources

B.R. "Could it be Routers Causing Congestion?", LAN Times, 3 February 1997. [{http://www.lantimes.com/97/97feb/702a016b.html}].

Caruso, Jeff and Rogers, Amy. "Bogus Messages Snarling the 'Net," Communications Week Interactive, 17 January 1997. [http://techweb.cmp.com/cw/cwi/netnews/011397/news0117-4.html].

Gilder, George. "Feasting on the giant peach," Forbes ASAP, 26 August 1996. [http://www.forbes.com/asap/gilder/telecosm19.html].

Siegmann, Ken. "Is the Internet about to Crash?", thesite, 7 February 1997. [http://www.thesite.com/0297/w2/work/work395_020797.html].

National Media Laboratory documents are available at: http://www.nml.org/

