A: Centigrade 232

July 6, 2017 | Author: Raymond Dessy | Category: Chemical Engineering, Analytical Chemistry



A/C

WebWorks

Centigrade 232

Books have been burned for many reasons, for often no one defends libraries: not in Alexandria or Berlin, nor in China under Chin Shih Huang Ti, nor when Spaniards sacked Tenochtitlan. Ray Bradbury's classic science fiction novel, Fahrenheit 451, portrays a society in which books lead to discontent, and society's firemen light bonfires with them. The WWW has ignited its own tempest over digital libraries. Rising costs of scientific journals, publishers' panic over being left behind, and desires for some of the money are accelerants for a conflagration that is changing libraries and even the laboratory. Proponents of the blaze promise the end of a chaotic age and trumpet a new epoch of online information, all retrievable by intelligent agents.

The arguments for digital libraries are many. Institutional libraries, facing limited budgets and shelf space, are canceling serials and delaying book purchases. As a result, prices increase and subscriber numbers drop, creating a vicious circle. In response, universities and publishers all hungrily eye electronic publishing, and with reason. The percentage of articles published in 1984 that have not been cited since is stunning: science, 25%; social sciences, 50%; humanities, 90% (1). Electronic publishing may be cheaper, but sadly, most efforts don't incorporate available facilitating features. This column provides links to demonstration sites that may represent the future for digital libraries.

Classic approaches

A pioneering example of what digital libraries can do is Project Gutenberg, a volunteer effort typifying the WWW at its best (http://www.gutenberg.net/). Gutenberg offers full-text versions of literature in the public domain. Many exemplary sites now offer keyword proximity searching, hypertext, advanced Boolean strategies, and mirror sites: http://www.bibliomania.com/, http://www.hti.umich.edu/, http://etextlib.virginia.edu/modeng/modengO.browse.html.

When will chemistry similarly flood the Internet? Dissertations are already there (http://scholar.lib.vt.edu/theses/theses.html), although not all agree that instant, open availability is correct. Some scientists worry that their best ideas, not yet fully developed, will drift too early into other hands. Others enjoy the accelerated pace of discovery that rapid distribution promotes (http://www.ijc.com/). Some print publishers, including Analytical Chemistry, will not accept papers based on an author's research that has previously been posted on the WWW. Online journals, parallel with their printed antecedents, are common, although their pricing strategies are often opaque. Where could it all lead?

Current efforts

Storing items in a digital library is technically easy. Searching is not. Traditional query strategies are often frustrating. For example, CyberStacks, a collection of Internet resources, has been categorized by the Library of Congress classification scheme (http://www.public.iastate.edu/~CYBERSTACKS/). The problem with assigning documents to single categories within a hierarchy is that most documents discuss several different topics simultaneously. A different approach, called "text clustering", uses the Scatter/Gather interface to group documents according to overall similarities in content. Each cluster is represented by a list of topical words that attempts to convey what the documents in the cluster are about. Users can scatter documents into clusters, or groups, then gather subsets of these groups and rescatter them to form more relevant subgroups. Search for "star" using your favorite browser and be overwhelmed, and then examine an approach developed to handle large distributed collections and compound documents (http://www.parc.xerox.com/istl/projects/ia/sg-overview.html).

When users enter a query, the search engine usually brings back many different pages. How can you identify the best returned hits? One solution is a graphical or symbolic spatial presentation relating the query to actual words in the document. TileBars is an example: http://www.parc.xerox.com/istl/projects/ia/tb-overview.html; http://elib.cs.berkeley.edu/tilebars (request top three hits, wetlands+chemical pollutants). TileBar documents are represented as long, thin rectangles, proportional to the source length. Each document is partitioned into a set of multiparagraph segments, or tiles, using a TextTiling algorithm. The query is specified by a list of topics, which are individually applied to a copy of the tiled document. Each topic is written as a list of synonyms or related words, constituting a term set. For example, a simple analytical chemistry query might consist of topics such as "calcium", "ICP", and "matrix". The top row of each TileBar document diagram corresponds to hits for the first term set (e.g., calcium) and the next rows to the other term sets. The first column of each diagram corresponds to the first segment of the document, the second column to the second segment, and so on. The darker the segment or tile, the more frequently the query term occurs in that part of the document. Dark and light patterns quickly show the relationship between the documents and the query terms in both proximity and text-density space, and also where the terms are located in the document. Clicking on a darkened area retrieves the full-text document segment with all the term set words highlighted in different colors.

TileBar matches between a query and returned documents are shown graphically and spatially by the TileBar patterns shown. Dark, vertical alignments mean a good match. (Adapted with permission from Marti Hearst, Ref. 2.)

Analytical Chemistry News & Features, March 1, 1998, 209 A

Images and voice

Retrieving black-and-white or colored images from large and varied collections with the object's content as the search key is a challenging, important problem. Chemical examples include surface-analysis images, false-color 2-D representations, or plasma profiling. One approach uses an image representation that segments the raw pixel data into a small set of regions coherent in color and texture space. This is "blob world" (Binary Large OBjects). Blobs are used for stills, videos, and sound. Try searching for all the striped objects at http://http.cs.berkeley.edu/~carson/blobworld/index.html.

IBM's Query by Image Content system also lets you search large image databases based on visual image properties such as color percentages, color layout, and textures. You match these attributes using an image as the query, not words. However, content-based image queries can be combined with text and keyword predicates. Online demos and free short-term downloads are available. Search trademark images at http://wwwqbic.almaden.ibm.com/; then, take a side trip to patent searches at http://patent.womplex.ibm.com/. Other demo sites are Alexandria Digital Library (http://alexandria.sdc.ucsb.edu/framesl.html), WebSeek (http://www.ctr.columbia.edu/webseek/ and http://www.ctr.columbia.edu/videoq/), and Excalibur (http://www.excalib.com/products/vrw/vrw.html).
These initiatives provide content-based image and video search/cataloging tools that variously use natural language understanding, image processing, speech recognition, and video compression. Some can retrieve short video "clips" in response to queries. Video, image, and voice are now an archival, searchable medium.
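
The "color percentages" matching mentioned above can be illustrated with a small sketch. It is not a description of how Query by Image Content or any of the listed demo systems is actually implemented. The Python below quantizes each pixel's RGB values into coarse bins, records the fraction of pixels falling in each bin, and ranks candidate images by histogram intersection with a query image. The function names, the four-level quantization, and the choice of histogram intersection as the similarity score are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]          # (red, green, blue), each 0-255
Histogram = Dict[Tuple[int, int, int], float]


def color_percentages(pixels: List[Pixel], levels: int = 4) -> Histogram:
    """Return the fraction of pixels in each coarse RGB bin.

    `levels` (hypothetical parameter) is the number of bins per channel.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    step = 256 // levels

    def quantize(value: int) -> int:
        # Clamp so that 255 never spills into an extra bin.
        return min(value // step, levels - 1)

    counts = Counter((quantize(r), quantize(g), quantize(b)) for r, g, b in pixels)
    total = len(pixels)
    return {color_bin: n / total for color_bin, n in counts.items()}


def histogram_intersection(a: Histogram, b: Histogram) -> float:
    """Similarity in [0, 1]: the color mass the two histograms share."""
    return sum(min(frac, b.get(color_bin, 0.0)) for color_bin, frac in a.items())


def rank_by_color(query: List[Pixel],
                  collection: Dict[str, List[Pixel]]) -> List[Tuple[str, float]]:
    """Rank named images by color similarity to the query image, best first."""
    query_hist = color_percentages(query)
    scored = [
        (name, histogram_intersection(query_hist, color_percentages(pixels)))
        for name, pixels in collection.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Tiny made-up "images" given as flat pixel lists.
    red_query = [(255, 0, 0)] * 4
    images = {
        "blue": [(0, 0, 255)] * 4,
        "mostly_red": [(250, 10, 5)] * 3 + [(0, 0, 255)],
    }
    for name, score in rank_by_color(red_query, images):
        print(name, round(score, 2))
```

The query here is itself an image, which matches the column's point that attributes are matched "using an image as the query, not words". A text or keyword predicate could be applied as a filter on the collection before ranking.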

Languages and dialects

People will always use different languages. Human cross-language information retrieval resources are in their infancy, but French, German, Spanish, Japanese, and English examples exist: http://crl.nmsu.edu/users/madavis/mundial.html, http://siing.navi.ntt.co.jp/titan/iitan-e.html, http://www.eurospider.ch/eurospider/DemoPage.html, http://www.globalink.com. "Inductively coupled plasma" searches become "induction acoplamiento acoplar plasma" queries; others become "spectrometrie de masse" or "magnetische resonanzspektroskopie".

As digital libraries grow, computer metadata dialects become an important issue. Metadata are bibliographic-like information, representing title, author, subject, resource type, resource identifier, format, language, relation, coverage, and rights-management information. The first three items are common card catalog items; the rest are essential for electronic searches. Each item must be rigorously specified. One standard, the Dublin Core, is described at http://purl.oclc.org/metadata/dublin_core/. Achieving worldwide agreement on metadata standards is difficult.

Sources and concerns

How can a chemist keep up with the field? Entry into many of the existing electronic journals is via the Colorado Alliance of Research Libraries, http://www.coalliance.org/ejournal/. The Digital Library Federation is at http://lcweb.loc.gov/loc/ndlf/. D-Lib Magazine offers monthly stories, commentary, and briefings: http://www.dlib.org/. United Kingdom's digital library sites are found at http://ukoln.bath.ac.uk/elib. A seminal view of digital libraries is at http://community.bellcore.com/lesk/diglib.html (1).

For those concerned about copyright issues, a good source is http://s9000.furman.edu/DD/book/chap2/essay/propl.html. Fanning the flames of worry in this area are recent governmental steps toward new protections for electronic database owners, which are being promoted by politicians, publishers, and lawyers. Just when the means to easily share data is available, it may instead become more difficult. Will libraries be able to continue sharing via interlibrary loan, or will "first-sale" rights change? Will current "fair-use" practices be curtailed? These issues deal with the vendor's ability to control the first sales of an object, but not necessarily subsequent sales, and with the rights of reasonable users to reproduce material for research and scholarly activity. Increasingly, stringent contracts are also replacing copyright practice. Unfortunately, our professional needs, technology, and the law all beat to a different rhythm.

The future

Will salamander-badged computers of Centigrade 232 eliminate the traditional scientific journal and book? At EDUCOM '97, a keynote talk was entitled "Death of the Book". It predicted that "scholars would cut their grieving short to read the latest journal articles online, to become the publishers of their own papers and books... and, that parallel tracks of images, sounds, and text will be packed as tight as possible, speeding the flowing of information to the human brain." Catchy book titles, such as Death of the Author and Future of the Book, proliferate. Hype fills the air. The quandary lurks in the question: "If we haven't standardized book or reference formats in 500 years, what is the expectation that such efforts with search engines, document standards, and required legal changes will be any more successful?"

Raymond E. Dessy, Virginia Tech (Comments, e-mail, and forums are invited at http://www.chem.vt.edu/chem-dept/dessy/internet/)

Thanks to Ed Fox, Virginia Tech.

(1) Lesk, M. Practical Digital Libraries; Morgan Kaufmann: San Francisco, 1997.
(2) http://www.sims.berkeley.edu/~hearst/, http://www.sciam.com/0397issue/0397hearst.html, http://www.acm.org/sigchi/chi95/Electronic/documnts/papers/mah bdvhtm
