Topic: information retrieval with an index

full-text indexing
information retrieval by cross reference
information retrieval with queries
manual indexing
thesaurus and information retrieval
using keywords to search hypertext


An index maps terms and phrases to pages. Indexing is important for information retrieval because it allows random access into text. For example, most non-fiction books include an index.
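
As an illustration of an index as a mapping from terms to pages, here is a minimal sketch. The function name `build_index` and the three-page sample text are hypothetical, and splitting on whitespace is a simplifying assumption rather than a recommended tokenizer.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the sorted list of page numbers containing it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in text.lower().split():
            # Strip surrounding punctuation so "pages." and "pages" are one term.
            index[word.strip(".,;:!?()\"'")].add(page_number)
    index.pop("", None)  # drop the empty term left by punctuation-only tokens
    return {term: sorted(numbers) for term, numbers in index.items()}

# Hypothetical three-page text.
pages = {
    1: "An index maps terms to pages.",
    2: "Stemming merges variant terms.",
    3: "Most books include an index.",
}
index = build_index(pages)
print(index["index"])  # [1, 3]
print(index["terms"])  # [1, 2]
```

Looking up a term is then a single dictionary access, which is the random access into text that a sequential scan cannot offer.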

Search is particularly important on the Web.

Individual words may be too general or too specific for indexing. Manual indices usually mix words, phrases, and names to limit their scope. Stemming, inexact matching, synonyms, and variants may be helpful. (cbb 10/07)

Subtopic: importance of indexing

Quote: the CADES indices for documentation and source files are very important [»pratGD9_1976]
Quote: besides providing an access path for retrieval, an index describes the term usage of a database; signature files lose this information [»eastCM1_1988]

Subtopic: World Wide Web

Quote: even though the World Wide Web was built on the idea of links, global navigation has been replaced by search engines [»furnGW3_1997]

Subtopic: keyword selection

Quote: discrimination value weighting: best keywords occur frequently inside particular documents but infrequently outside [»saltG7_1986]
Quote: the best keywords occur neither too rarely nor too frequently; 10% hits is good [»saltG7_1986]
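
The idea that good keywords are frequent inside a document but rare outside it can be sketched with the familiar tf-idf weight. This is a stand-in chosen for brevity, not the discrimination value computation of the cited work; the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return weights

# Hypothetical sample collection.
docs = [
    "the index maps the terms",
    "the stemmer merges the variants",
    "the index index index lists pages",
]
weights = tf_idf(docs)
best = max(weights[2], key=weights[2].get)
print(best)               # index
print(weights[0]["the"])  # 0.0
```

A term that appears in every document, such as "the" here, gets weight zero, while a term concentrated in one document rises to the top.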

Subtopic: search scope

Quote: to increase precision, default search scope is limited to title, key terms, general description, application, and captions
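
A minimal sketch of a restricted default scope, assuming records are dictionaries of text fields. The field names are taken from the quote above, but the record layout, the `body` field, and the `search` function are hypothetical.

```python
# Field names follow the quote; the record structure is an assumption.
DEFAULT_SCOPE = ("title", "key_terms", "general_description", "application", "captions")

def search(records, term, scope=DEFAULT_SCOPE):
    """Return ids of records whose in-scope fields contain the term."""
    term = term.lower()
    return [
        record["id"]
        for record in records
        if any(term in record.get(field, "").lower() for field in scope)
    ]

# Hypothetical records.
records = [
    {"id": 1, "title": "Pump maintenance", "body": "Check the seals monthly."},
    {"id": 2, "title": "Valve guide", "body": "See the pump section for details."},
]
print(search(records, "pump"))                                    # [1]
print(search(records, "pump", scope=DEFAULT_SCOPE + ("body",)))   # [1, 2]
```

Widening the scope to the body text finds the passing mention in record 2, which raises recall at the cost of precision.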

Subtopic: result display

Quote: used page miniatures for visual index color-coded by chapter [»vandA7_1988]
Quote: best retrieval with 150-300 word passages starting every 25 words [»kaszM10_1999]
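
The overlapping passages described above can be sketched as a sliding window over the word list. The default length of 200 words is an assumption picked from inside the quoted 150-300 range, and the function name is hypothetical.

```python
def overlapping_passages(words, length=200, step=25):
    """Return (start, passage) pairs, with a new passage starting every `step` words."""
    result = []
    for start in range(0, len(words), step):
        result.append((start, words[start:start + length]))
        if start + length >= len(words):
            break  # this window already reaches the end of the text
    return result

# Hypothetical 260-word document.
words = [f"w{i}" for i in range(260)]
windows = overlapping_passages(words)
print([start for start, _ in windows])  # [0, 25, 50, 75]
print(len(windows[-1][1]))              # 185
```

Because consecutive windows share most of their words, a relevant stretch of text is likely to fall well inside at least one passage instead of being split across a boundary.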

Subtopic: user studies of indexing

Quote: subjects using an index located facts more quickly than those with embedded menus; experience reduced differences [»marcG1_1988]
Quote: keyword searches were both preferred and performed better than a menu system based on Dewey Decimal numbers [»sowaJF_1984]
Quote: sets of terms are the most effective index and query procedure; only a synonym dictionary produces improved performance [»saltG4_1970]
Quote: study found searching on titles and abstracts as good as controlled keywords [»saltG7_1986]
Quote: most Hyperties subjects used an index when asked to make an efficient search [»marcG1_1988]

Subtopic: multiple index terms

Quote: sets of terms are the most effective index and query procedure; only a synonym dictionary produces improved performance [»saltG4_1970]
Quote: phrases are not substantially superior to single terms for indexing; nor are sophisticated analysis tools [»saltG4_1970]

Subtopic: stemming

Quote: document characterized by stem counts; query converted to stem count followed by best fit lookup [»dattRT1_1979, OK]
Quote: different stemmers behave similarly with respect to retrieval effectiveness [»paicCD8_1996]
Quote: test stemmers with a dictionary of groups of semantically equivalent words; use an interactive program to divide words into tightly and loosely related groups [»paicCD8_1996, OK]
Quote: stemmer behavior depends on stemmer weight; i.e., how aggressively the stemmer combines words [»paicCD8_1996]
Quote: stemmers are up to a third better than simply truncating words at five to seven characters
Quote: truncating words at six characters is better than five or seven characters
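
A minimal sketch of the stem-count approach, using six-character truncation as the crudest possible "stemmer". Cosine similarity as the best-fit measure is an assumption, since the quotes do not name one, and all function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def truncate(word, length=6):
    """Crude stand-in for a stemmer: keep the first `length` characters."""
    return word.lower()[:length]

def stem_counts(text):
    """Characterize a text by the counts of its truncated words."""
    return Counter(truncate(word) for word in text.split())

def cosine(a, b):
    """Cosine similarity between two count vectors (assumed best-fit measure)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_fit(query, documents):
    """Return the name of the document whose stem counts best match the query."""
    query_counts = stem_counts(query)
    return max(documents,
               key=lambda name: cosine(query_counts, stem_counts(documents[name])))

# Hypothetical two-document collection.
documents = {
    "a": "indexing indexes retrieval",
    "b": "stemmer stemming truncation",
}
print(truncate("retrieval"))            # retrie
print(best_fit("stemmers", documents))  # b
```

Truncation merges "stemmers" with "stemmer" (both become "stemme") but not with "stemming" ("stemmi"), which hints at why a real stemmer can do better than a fixed cutoff.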

Subtopic: partial match

Quote: handle partial match queries by a permuted dictionary where each word appears in all possible rotations [»wittIH_1991]
Quote: Glimpse features a small index and approximate matching; divide data into blocks, index the blocks, full text search of possible matches [»manbU10_1993]
Quote: perform approximate matching on words in index, then use inverted index to locate data [»manbU10_1993]
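
The permuted dictionary mentioned above can be sketched as a sorted list of rotations. The end marker `$`, the restriction to patterns with exactly one `*` wildcard, and the function names are assumptions of this sketch, not details from the cited source.

```python
import bisect

END = "$"  # assumed end-of-word marker, so rotations remember where the word starts

def build_permuted_dictionary(words):
    """Return a sorted list of (rotation, word) pairs covering every rotation."""
    entries = []
    for word in words:
        marked = word + END
        for i in range(len(marked)):
            entries.append((marked[i:] + marked[:i], word))
    entries.sort()
    return entries

def lookup(entries, pattern):
    """Match a pattern containing exactly one '*' by a prefix search on rotations."""
    head, _, tail = pattern.partition("*")
    prefix = tail + END + head  # rotate the pattern so the wildcard falls at the end
    start = bisect.bisect_left(entries, (prefix,))
    matches = set()
    for rotation, word in entries[start:]:
        if not rotation.startswith(prefix):
            break
        matches.add(word)
    return sorted(matches)

# Hypothetical word list.
entries = build_permuted_dictionary(["index", "inbox", "inlet", "apex"])
print(lookup(entries, "in*x"))  # ['inbox', 'index']
print(lookup(entries, "*ex"))   # ['apex', 'index']
print(lookup(entries, "ind*"))  # ['index']
```

Storing every rotation multiplies the dictionary size by roughly the average word length, which is the price paid for turning any single-wildcard query into a binary search.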

Subtopic: file search

Quote: a Presto document collection is the documents in its inclusion list, plus those matching its query, minus those in its exclusion list [»dourP6_1999]
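
Read literally, the collection rule above is a set expression: (inclusion list ∪ query matches) − exclusion list. The sketch below follows that reading; it is not Presto's API, and the function name, attribute dictionaries, and file names are hypothetical.

```python
def collection(documents, query, inclusion=frozenset(), exclusion=frozenset()):
    """Return names in the inclusion list or matching the query, minus exclusions."""
    matching = {name for name, attributes in documents.items() if query(attributes)}
    return (set(inclusion) | matching) - set(exclusion)

# Hypothetical documents with attribute dictionaries.
documents = {
    "draft.txt": {"author": "ann"},
    "notes.txt": {"author": "bob"},
    "memo.txt": {"author": "ann"},
}

def by_ann(attributes):
    return attributes["author"] == "ann"

result = collection(documents, by_ann,
                    inclusion={"notes.txt"}, exclusion={"memo.txt"})
print(sorted(result))  # ['draft.txt', 'notes.txt']
```

Because the query is re-evaluated each time, the collection tracks new matching documents, while the two lists let a user pin or hide individual items.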

Subtopic: automatic synonyms, aliasing and variants

Quote: study showed improved recall with variant forms, synonyms, and hierarchical thesaurus [»marcG1_1988]
Quote: unlimited aliasing of keywords can increase recall from 20% to 80%
Quote: tested search by unlimited aliasing with controlled vocabulary; found the opposite: some index terms are better than others [»brooTA4_1993]
Quote: SuperBook users can add synonyms for keywords to improve access by all users [»remdJR11_1987]
Quote: a thesaurus represents the relations between word senses rather than words; be careful when expanding query words with a thesaurus [»krovR4_1992]
Quote: use a thesaurus to define index terms and groups of related terms; first use of thesaurus in information retrieval [»robeN12_1984]
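
A minimal sketch of query expansion through an alias table, assuming the index is a dictionary from terms to document ids. The alias table, the sample index, and both function names are hypothetical. The sketch treats aliases as word-to-word links, which is exactly the simplification the word-sense caution above warns about.

```python
def expand(query_terms, aliases):
    """Return the query terms together with every alias recorded for them."""
    expanded = set()
    for term in query_terms:
        expanded.add(term)
        expanded.update(aliases.get(term, ()))
    return expanded

def search(index, query_terms, aliases):
    """Return sorted ids of documents indexed under any expanded term."""
    hits = set()
    for term in expand(query_terms, aliases):
        hits.update(index.get(term, ()))
    return sorted(hits)

# Hypothetical index and alias table.
index = {"car": [1, 4], "automobile": [2], "vehicle": [3]}
aliases = {"car": {"automobile", "auto"}}

print(search(index, ["car"], {}))       # [1, 4]
print(search(index, ["car"], aliases))  # [1, 2, 4]
```

Adding the alias retrieves document 2, which a literal match misses; an alias with a different sense would add false hits in the same way.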

Subtopic: examples

Quote: Compendium supports boolean search of any part of an entry [»glusRJ5_1989]

Related Topics

Topic: full-text indexing (37 items)
Topic: information retrieval by cross reference (7 items)
Topic: information retrieval with queries (18 items)
Topic: manual indexing (19 items)
Topic: thesaurus and information retrieval (29 items)
Topic: using keywords to search hypertext (26 items)

Updated barberCB 6/05
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.