Competency E

Design, query, and evaluate information retrieval systems

Introduction

Design Principles

Best practices for the design of any given information retrieval system rely most heavily on the type of information included in the system and on what its users need. However, there are standards which can help guide the design. Ultimately, the design of any given system comes down to the details. Chowdhury (2010, p. 10) describes the process of information retrieval system design as “a series of choices from which the designer selects each element and tries to fit it with the proposed objective of the system.” As such, each decision must work to improve the quality of the information retrieval system. If a given decision worsens the design, or simply fails to improve it, the decision should be reevaluated.

Chowdhury (2010, p. 2) writes, “The major objective of an information retrieval system is to retrieve the information – either the actual information or the documents containing the information – that fully or partially match the user’s query.” This has implications far beyond just what an information retrieval system does. In fact, it shapes the design of any given information retrieval system. The type of information being retrieved should influence the design of the system. A good understanding of the type of information which makes up an information retrieval system should include an understanding of how users are likely to search for that information. This, too, should hold significant weight in the design.

Another important aspect to consider is cross-system compatibility. Although accessing a database’s information directly through the database may be the most effective way to find information, allowing the search of this database through another system, such as an OPAC, can increase its value to users. Backwards compatibility is equally important. As systems improve and evolve, their ability to continue to access past information is crucial. Standards help make this possible. “[Standards] aim to enable machines and information systems to communicate with one another by sharing and exchanging data and to enable users to have access to more than one information system…using the same techniques and interface” (Chowdhury, 2010, p. 484).

Ultimately, designers of information retrieval systems must ensure their systems are easily searchable and display the results in an organized fashion which permits users to locate relevant information.

Querying and Information Retrieval

The method of searching a given information retrieval system is, like so many other aspects of Library and Information Science, dependent on a number of factors. The type of database and what kind of information it holds matters immensely. A website like Tumblr or Twitter may encourage users to tag their content in order to organize and synthesize it, but because these sites do not hold their users to any standard, there is no vocabulary control. Finding related content may require, then, a large list of synonyms and an understanding of the particular jargon of a given subject. Unlike Google, for example, these sites do not produce results that reflect synonyms. If a searcher wishes to capture the whole of a subject on Tumblr, Twitter, or a similarly designed information retrieval system, they would have to search diligently.
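The burden that uncontrolled tagging places on the searcher can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the posts, tags, and synonym list below are invented, and the function names are hypothetical rather than any real site's search interface.

```python
# Hypothetical posts with free-form tags, as on a site with no vocabulary control.
POSTS = {
    1: {"cats", "pets"},
    2: {"kitty"},
    3: {"feline", "photography"},
    4: {"dogs"},
}

# A synonym list the searcher has to assemble by hand (assumed for illustration).
SYNONYMS = {"cat": ["cat", "cats", "kitty", "feline"]}


def search_tag(tag):
    """Return the ids of posts carrying exactly this tag."""
    return {post_id for post_id, tags in POSTS.items() if tag in tags}


def search_with_synonyms(term):
    """Run one exact-tag search per known variant and merge the results."""
    results = set()
    for variant in SYNONYMS.get(term, [term]):
        results |= search_tag(variant)
    return results


print(search_tag("cat"))            # exact tag only: nothing is tagged "cat"
print(search_with_synonyms("cat"))  # merged results across all variants
```

A single exact-tag search misses every post here, while the merged search finds three of them, which is the extra diligence the searcher must supply when the system does not.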

Meanwhile, a more structured system, such as an EBSCO database, will do more to guide the user through an efficient and fruitful search. Features like articles linked by subject headings and lists of articles that cite another, single article make finding related information simple. Once the “correct” vocabulary for a set of information is determined, locating a wealth of information is straightforward as the creators of the database have cataloged the content to fit this vocabulary.

Other information retrieval systems categorize by the type of information. This is common in the typical public library catalog and is also exemplified in Facebook. Although a keyword search may locate the content which the user seeks, these information retrieval systems are also capable of filtering by type. A public library catalog, for instance, may allow limiting a search to audiobooks only. Facebook, meanwhile, will allow a user to restrict a search to events, people, or other type-specific content.

A search strategy will largely depend, again, on the information retrieval system and the desired outcome. Chowdhury (2010, p. 201) remarks that prior to diving into a search strategy, the type of search being performed should be defined. The types of searches, listed by Chowdhury, are as follows: known item search, search for specific information or a fact, search for information related to a problem or issue, exploratory search, and search to keep up to date in a specific field.

Once the type of search has been identified, the user can move forward with the search. A Boolean search will allow the user to search the content in a more dynamic way by eliminating, substituting, or supplementing words or phrases for a more complete approach. Depending on the design of the information retrieval system, the user may be able to search not only by keyword, but by date range, author, publisher, title, or whatever other identifier the designer has allowed for.
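As a rough illustration of how Boolean operators narrow, broaden, or exclude results, the following Python sketch builds a tiny inverted index and combines term lookups with set operations. The documents and the helper names (build_index, term) are assumptions made for this example, not part of any particular database product.

```python
from collections import defaultdict

# Hypothetical three-document collection.
DOCS = {
    1: "library catalog design",
    2: "database design standards",
    3: "library database search",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Look up one keyword; unknown words match no documents."""
    return INDEX.get(word.lower(), set())


and_result = term("library") & term("design")    # AND narrows the search
or_result = term("catalog") | term("standards")  # OR broadens it
not_result = term("database") - term("library")  # NOT excludes documents

print(and_result, or_result, not_result)
```

Here AND returns only document 1, OR returns documents 1 and 2, and NOT returns document 2, mirroring the eliminating and supplementing described above.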

By enabling a function which displays results by relevance in a database, a user can take advantage of what Chowdhury (2010, p. 207) calls a probabilistic retrieval model. The top search results should, if the user has entered appropriate vocabulary, best match the topic in which they are interested. The farther down the list the user views results, the less the database “believes” the articles are related to the search. This can save the searcher time in finding the most relevant articles or materials. A similar approach applies when the behavior of other searchers is factored in. Some systems will learn and produce better results based on which items past searchers seeking the same information have selected or viewed in greater detail.
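Ranked output can be illustrated with a small scoring function. The sketch below uses a simple term-frequency and inverse-document-frequency weighting, which is a stand-in chosen for brevity and not the probabilistic model Chowdhury describes; the documents and the rank function are likewise invented for the example.

```python
import math
from collections import Counter

# Hypothetical collection of short records.
DOCS = {
    "a": "candle wax candle scent",
    "b": "candle holder",
    "c": "library catalog",
}


def rank(query, docs):
    """Return ids of matching documents, highest score first."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    total = len(tokenized)
    scores = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for word in query.lower().split():
            # Number of documents containing this query word.
            doc_freq = sum(1 for other in tokenized.values() if word in other)
            if doc_freq == 0:
                continue
            # Rarer words carry more weight than common ones.
            score += counts[word] * math.log(1 + total / doc_freq)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


print(rank("candle wax", DOCS))
```

Document "a" ranks first because it contains both query words and repeats one of them, document "b" follows with a single match, and document "c" is left out entirely.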

Evaluation

Evaluation is an important part of maintaining a well-functioning and useful information retrieval system. Chowdhury (2010, p. 282) notes, “Many information scientists advocate that evaluation of an information retrieval system should always be user-oriented” as opposed to manager-oriented. This makes sense: the users are the customers, and an information retrieval system with unhappy customers may go unused and therefore become pointless.

Prior to beginning an evaluation, those designing the evaluation must determine what successes and failures are to be measured and how. Different information scientists have come up with differing lists of criteria for these evaluations, but, ultimately, criteria should reflect the needs of the community being served. Taking this into account, any good information retrieval system will be reasonably precise in its returns. Beyond this, the evaluative criteria will be far better determined with the community of users and potential users in mind.
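Precision lends itself to a short worked example. The sketch below computes precision alongside recall, a companion measure commonly reported with it; the function name and the sample identifiers are assumptions for illustration, and deciding which items count as relevant remains a judgment made with the user community in mind.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four items returned, three items actually relevant.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented case, half of the returned items are relevant (precision 0.50) and two of the three relevant items were found (recall 0.67).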

Evidence

LIBR 202 Individual Evaluation

This evaluation discusses the successes and failures of a group project in which the group designed an information retrieval system for candles. With input from another group in the class, this assignment examines the shortcomings not only through the lens of a producer but also through that of a user, which exposed weaknesses the designers might not otherwise have been aware of. The lessons learned include the difficulty of building a comprehensive profile of a given candle (or whatever item is to be registered in a system) without all of the necessary information, and the importance of giving users clear rules and instructions on how to get the most out of an information retrieval system. As it actively describes and critiques an information retrieval system, I believe it is sufficient evidence for Competency E.

 

LIBR 202 Exercise 1C

This assignment is an example of a rule written for an item to be entered into an information retrieval system. As the same person may not necessarily be entering every item into an information retrieval system, it is important to have rules that establish a standard for entering new data so that the information for all items matches. If, for example, one of the data points for an item was its length and some data enterers supplied length in inches while others used centimeters, the database may well be useless for anyone trying to find items of a particular length. Additionally, information should be entered consistently. Continuing with the inches example, if items are entered as “6 inches,” “six inches,” and “6″,” locating all items that are six inches long will take additional time, as all three formats would have to be searched. Users who are unaware of this may miss out on entire lists of items, too. This rule for entering data on pens provides an example of a well-written rule for an information retrieval system and therefore serves as evidence for Competency E.
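The cost of inconsistent entry can be made concrete with a small normalization sketch in Python. It is an illustration only: the accepted formats, the word-to-number table, and the normalize_length function are assumptions for this example and are not the rule written for the assignment.

```python
import re

# Spelled-out numbers the sketch understands (assumed, deliberately short).
WORD_NUMBERS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}
CM_PER_INCH = 2.54

# A value (digits or a word) followed by a unit.
PATTERN = re.compile(r'([a-z]+|\d+(?:\.\d+)?)\s*(inches|inch|in|"|centimeters|cm)')


def normalize_length(raw):
    """Return the length in inches, or None if the entry cannot be parsed."""
    # Treat the double-prime and curly-quote inch marks like a plain quote.
    text = raw.strip().lower().replace("\u2033", '"').replace("\u201d", '"')
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    value, unit = match.groups()
    if value in WORD_NUMBERS:
        number = float(WORD_NUMBERS[value])
    else:
        try:
            number = float(value)
        except ValueError:
            return None  # a word that is not a known number
    if unit in ("centimeters", "cm"):
        number /= CM_PER_INCH
    return round(number, 2)


for entry in ["6 inches", "six inches", '6"', "15.24 cm"]:
    print(entry, "->", normalize_length(entry))
```

All four entries reduce to the same value, so a single search for six-inch items would find them. A clear entry rule achieves the same result at the source, without requiring cleanup after the fact.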

 

Conclusion

The advance of technology is giving people far greater capabilities in their information retrieval systems. While machine learning may allow machines to take on more of the work (such as searching for synonyms without the user having to enter each synonym individually), the first step to successful searching is a solid, well-designed information retrieval system. A good understanding of how these systems work not only allows library staff to build them, but also allows library staff to search them in a more nuanced way and with better results.

References

Chowdhury, G. G. (2010). Introduction to modern information retrieval. New York, NY: Neal-Schuman Publishers, Inc.