A Library-Based Computer System for Indexing, Storing and Retrieving Literature in the Cardiovascular Field, 1964

Martin M. Cummings
National Library of Medicine, Bethesda, Md.

Medical bibliography has been a valuable tool of the physician and scientist since the sixteenth century. William Harvey's classical publication of 1628, "De Motu Cordis,"[1] marks the beginning of cardiovascular physiology and contains no fewer than 79 references to previous work on the anatomy and physiology of the heart. These include 30 citations of Galen, 16 of Aristotle and one of Hippocrates.

[1] HARVEY, WILLIAM. 1963. The Circulation of the Blood and Other Writings. Trans. by Kenneth J. Franklin. : 1-236. Dent. London, England.

Today, scientific publications contain an average of nearly 14 references per article,[2] many of the citations referring to the author's own previous work. In part, this reflects a diminution of scholarship among contemporary scientists; in large measure it results from the "information overload" which has made it impossible for the working scientist to keep informed of published findings now being reported annually in more than 50,000 scientific journals published in forty languages throughout the world. This does not include the vast number of books and monographs which are also printed each year.

[2] GARFIELD, EUGENE & I. H. SHER. 1963. New factors in the evaluation of scientific literature through citation indexing. Am. Documentation 14: 195-201.

In seeking to contend with this problem, the National Library of Medicine has established MEDLARS (Medical Literature Analysis and Retrieval System), which represents an advanced development in the storage and retrieval of scientific information through the use of computer processes. To understand this system, its power and its limitations, it is necessary to have a perspective on the needs and circumstances surrounding its development.
Previous Work of NLM

For many years medical libraries have had the responsibility of aiding scientists, teachers, practitioners and students by classifying the literature into specialized subject areas, by answering specific reference questions, by providing lists of current literature and by making the documents themselves available for study. Beginning with the farsighted work of John Shaw Billings 100 years ago, the National Library of Medicine has been carrying out these services in all disciplines of medicine and biology. Billings began the publication of the monthly Index Medicus 85 years ago, in 1879, at a time when medicine and biology were still attempting to determine their own dimensions. At that time the production of scientific printed materials in these fields was relatively small, and the collection of the entire Library, then under the U. S. Army, probably did not exceed 100,000 pieces.

At the turn of the century the quantity of the medical and biological literature had begun increasing at a pronounced rate. The literature had also become more complex, with more sophisticated studies of disease, the continued development of new disciplines and the splintering of others, and research achievements which stimulated scientists to seek continually new approaches to the problems of man's health.

By the 1920's, the Library of the Surgeon General's Office was collecting medical literature on a worldwide basis. It was seeking, with methods developed in the previous century, to process this material for effective use by the medical and biological sciences. Even at that point in history it was becoming apparent that library systems would be swamped by the quantity of the literature and that new methods for effectively indexing these materials would be needed. Library methods were polarized in terms of the disciplinary structure of medicine and biology and, understandably perhaps, there was no indication of the vast changes which were to be developed in the field of medicine in subsequent years.

The impact of medical research after World War II further and dramatically increased the quantity of the literature, the demand for scientific information and the means of gaining access to the literature. By 1955, indexes to the medical literature were suffocating in backlogs. Upwards of 14,000 journals were being published in the field of biomedicine alone.

Use of Computers

The successful use of computers for data processing suggested that, with careful system design, the computer might also serve as a reference storage and retrieval device. In 1961, after careful study, the following objectives were established for MEDLARS:[3]

1. "Improve the quality of and enlarge Index Medicus and at the same time reduce the time required to prepare the monthly edition for printing from 22 to 5 working days.
2. "Make possible the production of other compilations similar to Index Medicus in form and content.
3. "Make possible for Index Medicus and other compilations, the inclusion of citations derived from other sources, as well as from journal articles.
4. "Make possible the prompt (a maximum of two days) and efficient servicing of requests for special bibliographies, on both a demand and a recurring basis, regularly searching up to five years of stored computer files.
5. "Increase the average depth of indexing per article by a factor of five, i.e., 10 headings versus 2.
6. "Nearly double the number of articles that may be handled annually, from 140,000 now to 250,000 in 1969.
7. "Reduce the need for duplicative total literature screening operations."

[3] DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE. 1963. U. S. Public Health Service, National Library of Medicine. The MEDLARS Story. : 1-74. Government Printing Office. Washington, D. C.

For the most part, the National Library of Medicine has achieved these objectives, and at this point, after ten months of operational experience, we might recast our principal MEDLARS objectives as follows:

Provide rapid dissemination of lists of current publications in the medical field, including the Index Medicus and other regular recurring bibliographies in more specialized subject areas such as cancer and cardiovascular disease.

Provide bibliographic control of a wide segment of the periodical literature, available for rapid retrieval in response to subject-oriented queries of the computer files (demand bibliographies).

Provide wide availability of the MEDLARS data base to other medical research institutions, which can duplicate the retrieval capability and, what is equally important, make more specialized use of the contents of the file in their own research programs.

MEDLARS Equipment

The equipment used by MEDLARS includes thirteen paper-tape typewriters (Friden Flexowriters) for preparation of computer input; a Honeywell-800 computer for editing, sorting, compressing, merging, storing and formatting data for subsequent printing; and a special optical printer called GRACE (Graphic Arts Composing Equipment), used to convert computer output into high-quality photocopy for publication purposes.

MEDLARS is subdivided into three component parts: an Input Subsystem, a Retrieval Subsystem, and a Publication Subsystem. The Input Subsystem joins the scientific and linguistic talents of trained literature analysts to the tremendous processing capabilities of the computer.
Medical journals are forwarded to the Index Unit, where professional indexers classify the subject content of each article in the journals by assigning subject headings from the Library's controlled list of terms called MeSH (Medical Subject Headings). The indexers translate titles of foreign-language papers and transliterate those in non-Latin alphabets. The indexer data sheets are processed by Flexowriter operators, who prepare a paper-tape record for computer input. This basic unit record includes the article's title, author names, journal reference, and subject headings assigned by the indexer. After verification of the Flexowriter hard copy, corrected tapes are batched and spliced for entry into the computer. The computer input programs are run once a day. These programs edit the input extensively, reject improperly prepared unit records, and build the two major data files: the Compressed Citation File (CCF), which is used in the Retrieval Subsystem, and the Processed Citation File (PCF), which is used in the Publication Subsystem. Currently, about 150,000 papers from 2,300 medical journal titles are processed annually and added to the computer files. More than half of the articles are in foreign journals, requiring a massive translation effort.

The Retrieval Subsystem is initiated by receipt of a request for a demand bibliography from a medical researcher or practitioner. Such requests are forwarded to a staff of search specialists who have had extensive training both in indexing and in the logic of machine retrieval. The searcher formulates the request into logical statements intelligible to the computer system. Search parameters may include subject headings, specific journal titles, specific languages, author names, year of publication, and computer entry data. Formulated search requests are punched into paper tape, proofread, and batched for computer processing. The Demand Search computer programs are designed to match a group of search questions against every record in the Compressed Citation File. The demand bibliographies which result from this search process are printed in any one of a variety of output formats by means of report generator programs. Demand bibliographies are normally printed on the computer's high-speed printer.

The Publication Subsystem is concerned with preparation of periodic indexes to current literature. Each working day punched cards are entered into the computer telling which recurring bibliographies (or the Index Medicus) are to be compiled on that particular day. The computer selects the appropriate citations from the Processed Citation File, performs a rather complicated task of page composition, and prepares a magnetic tape file of print records for the computer-driven phototypesetter, GRACE (Graphic Arts Composing Equipment).

GRACE

GRACE is a revolutionary computer-driven typesetter which prints from a high-quality font of 226 characters (upper and lower case) onto positive photographic film or paper.
Operating at a speed of approximately 300 characters per second, GRACE represents the only system currently capable of delivering high-quality typography directly from a computer and at computer speeds. GRACE converts digital information from magnetic tape to characters on photographic film. The exposed film is developed by an automatic film processor, inspected, cut into page-size sheets, and packaged for mailing to a printer. The resulting film masters are used directly for plate-making, printing, and binding of the final publication. GRACE composes Index Medicus in 16 hours, a task which would require 55 linotype operators. The output printing load is expected to increase from about 290 million characters in 1964 to 590 million in 1969.

MEDLARS is unique in several respects. It is the only system of this type operating in the medical field. It is also the only large-scale information retrieval project based in a research library, thus providing both bibliographic control and access to the documents themselves. The revolutionary computer-driven printing device, GRACE, adds to the system's uniqueness. The problems of system engineering have been solved to produce an operational reality, with an average of 700 new documents processed and entering the files each day. The total store of citations indexed is now 190,000. We expect that by 1969 the total will be close to one million.

Importance of Library Base

It should be noted that MEDLARS derives much of its power from being a part of the Library. NLM is not only the world's largest medical library; it is the world's largest research library in any single scientific and professional field. Thus MEDLARS has the benefit not only of the Library's collecting capacities but also of the high level of literature dissemination built up over the past 130 years. Its photocopy services to physicians and scientists have reached a level of three million pages per year.

The next major development in MEDLARS will be its decentralization. From the beginning, the Library has realized that the system offered the opportunity of extending its retrieval capacities to other institutions in this country and abroad. This will be accomplished through the provision of duplicate tapes of the MEDLARS holdings. These tapes can be produced at low cost, and their use on compatible computer equipment will give universities, medical schools and other organizations the same capacity we have at NLM for searching the literature. NLM is entering into contracts this year with two universities so that reprogrammed tapes may be processed by noncompatible computers. Next year, NLM will be ready to implement the further decentralization of converted MEDLARS tapes to medical libraries which provide regional services. Using the same basic units of information, MEDLARS can handle recurring bibliographic and/or demand search requests related to a specific discipline or to an entire mission.
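A minimal sketch, in Python, of how the same unit records might serve both a batch of demand searches and a recurring bibliography. The record fields follow the unit record described earlier (title, authors, journal reference, subject headings). The representation of a search question as a predicate, the function names, the language codes, and the three sample records are assumptions made purely for illustration. None of this describes the actual MEDLARS programs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class UnitRecord:
    """One citation, with the fields named in the unit record description."""
    title: str
    authors: Tuple[str, ...]
    journal: str
    year: int
    language: str
    headings: FrozenSet[str]


# Assumption: a formulated search question is a yes/no test on one record.
Question = Callable[[UnitRecord], bool]


def has(heading: str) -> Question:
    """Question that is true when the record carries the given heading."""
    return lambda record: heading in record.headings


def all_of(*questions: Question) -> Question:
    """Logical AND of several questions."""
    return lambda record: all(q(record) for q in questions)


def demand_search(
    records: Iterable[UnitRecord], batch: Dict[str, Question]
) -> Dict[str, List[UnitRecord]]:
    """Match a whole batch of questions in a single pass over the file."""
    results: Dict[str, List[UnitRecord]] = {name: [] for name in batch}
    for record in records:
        for name, question in batch.items():
            if question(record):
                results[name].append(record)
    return results


def recurring_bibliography(
    records: Iterable[UnitRecord], profile: FrozenSet[str]
) -> List[UnitRecord]:
    """Select records carrying at least one heading from a stored profile."""
    return [r for r in records if r.headings & profile]


# Hypothetical sample file; titles, authors and journals are placeholders.
FILE = [
    UnitRecord("Hypothetical paper A", ("Author A",), "Journal X", 1963, "eng",
               frozenset({"blood coagulation", "child"})),
    UnitRecord("Hypothetical paper B", ("Author B",), "Journal Y", 1964, "ger",
               frozenset({"heart block", "artificial pacemakers"})),
    UnitRecord("Hypothetical paper C", ("Author C",), "Journal X", 1964, "eng",
               frozenset({"fibrinolysis", "thrombosis"})),
]

batch = {
    "coagulation in children": all_of(has("blood coagulation"), has("child")),
    "heart block, English only": all_of(
        has("heart block"), lambda r: r.language == "eng"
    ),
}

hits = demand_search(FILE, batch)
coagulation_list = recurring_bibliography(
    FILE, frozenset({"blood coagulation", "fibrinolysis"})
)
```

In this sketch every question is tested while a record is in hand, so adding a question to the batch does not require another pass over the file. The recurring bibliography differs only in that its profile of headings is stored and reapplied, rather than formulated anew for each request.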
The changing patterns of medicine and medical research make this ability to serve either a single discipline or an entire mission tremendously important today, and it is the aim of NLM to exploit it to the fullest.

Cardiovascular Medicine

There are about 150 journals published in the field of cardiovascular medicine, and we acquire and catalog all of them. Of this total, only 40 are published in English and the other 110 in foreign languages. Fifty of the highest-quality journals in this field are now being analyzed and indexed for MEDLARS. Thirty of these are published in English and 20 in other languages.

I should like to give you a number of examples of how we are assisting research in cardiovascular medicine. Several months ago the National Heart Institute decided to consider the advisability of producing a recurring bibliography on blood coagulation and fibrinolysis. The focus of the initial considerations was on the selection of appropriate subject headings drawn from the 6,200 descriptors NLM uses for MEDLARS indexing. After several successful trial runs, 69 subject headings were chosen. The National Heart Institute has made the decision to proceed with the publication of this recurring bibliography using MEDLARS output. It will appear in November, 1965 and will contain approximately 6,000 bibliographic citations annually. I have been given permission by the Heart Institute to say that its staff and consultants have been delighted with the product. They have found a minimum of nonrelevant citations in response to the search questions. This is in keeping with our experience in the past 10 months in hundreds of other MEDLARS searches in a broad spectrum of disciplinary and disease areas.

Some of you may be familiar with the cerebrovascular bibliography which we have been producing since 1961 for the National Institute of Neurological Diseases and Blindness and the National Heart Institute. It yields about 15,000 citations annually. The most recent issue of this bibliography has been prepared by MEDLARS, with the type for its printing being set and composed by GRACE.

MEDLARS is now producing approximately 100 bibliographies per month. It has a potential capacity of producing 100 per day. Since April, 1964, MEDLARS has performed 17 demand searches in the field of cardiovascular medicine. These searches produced a total of nearly 7,000 citations going back to January 1963, when the MEDLARS input was begun. I would like to mention a few of these: In response to a request from the Division of Pediatric Cardiology of the Duke University Medical Center, we carried out a search of the literature on recordings of the electrocardiogram in the embryo.
For the Heart Disease Control Program of the Public Health Service we searched out references pertaining to environmental influences on the cardiovascular system of man and animals, including the effects of temperature, humidity, water, air, space and nutrition. For the University of Colorado Medical Center we fulfilled a request "To review the recent literature of heart block, particularly that type which is iatrogenic in origin (as by inadvertently placing a suture around the bundle of His during heart surgery), and the use of artificial pacemakers." For the National Institute of Child Health and Human Development, MEDLARS searched the literature on "The thyroid and atherosclerosis, with particular reference to coronary disease and hypothyroidism and myxedema." Other searches were concerned with areas as relatively broad as thrombosis and with narrow fields such as reserpine toxicology. A search for the Stanford Medical Center on blood coagulation factors in children brought forth 507 citations from 18 months of the literature. Other areas we have searched have had headings such as: chlorothiazide toxicology; cardiac glycosides; epidemiology of arteriosclerosis; ultrastructure of cardiac muscle in the embryo; blood substitutes; myocarditis; pulmonary vein damage.

We believe MEDLARS, coupled with the tremendous literature resources of the Library, has an unprecedented capacity to serve the field of cardiovascular medicine. The problems of the practitioners are of special concern because of the urgency of need, the requirement for speed and the latitude of interest. Moreover, a bibliographic citation, unto itself, is of little use to a practitioner unless he has the whole article and other necessary data which can be translated to the needs of his patients. MEDLARS offers a powerful tool for that purpose. It offers a new modality for improving the continuing education of practitioners.

The NLM requires assistance from the medical community in developing new subject headings which reflect accurately the current state of the art of medicine. Thus we ask for critical feedback from the Library's customers as to the value and effectiveness of MEDLARS products. This is necessary not only for our immediate programs, but also to help ensure that our systems for dealing with the literature keep pace with the changing character and needs of modern medicine.

The Future

We do not consider MEDLARS an end, but only a beginning. We already are looking forward to the development of a MEDLARS Mark II and even a Mark III. It is hoped that our research will lead also to improvements in the selection, acquisition, and cataloging of new materials, including the entry of pertinent monographs into MEDLARS. Other NLM projects include automation of the card catalog, improved graphic image storage and retrieval systems, and mechanization of the serial record.
The information overload demands that we find new methods to cope with scientific data. The computer's versatility must be exploited to serve literature needs as well as those directly related to the conduct of research.