Topic: full-text indexing
Topic: information retrieval by cross reference
Topic: information retrieval with queries
Topic: manual indexing
Topic: thesaurus and information retrieval
Topic: using keywords to search hypertext
Summary
An index maps terms and phrases to pages. Indexing is important for information retrieval because it allows random access into text. For example, most non-fiction books include an index.
Search is particularly important on the Web.
Individual words may be too general or too specific for indexing. Manual indices usually mix words, phrases, and names to limit their scope. Stemming, inexact matching, synonyms, and variants may be helpful. (cbb 10/07)
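The mapping from terms to pages can be sketched as a small inverted index. This is a minimal illustration, not a method taken from the cited sources; the function name `build_index`, the sample pages, and the lowercase word tokenizer are assumptions.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the sorted page numbers that contain it."""
    index = defaultdict(set)
    for page_no, text in pages.items():
        # Assumed tokenizer: runs of letters and digits, case-folded.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_no)
    return {term: sorted(nums) for term, nums in index.items()}

# Hypothetical three-page document.
pages = {
    1: "An index maps terms to pages",
    2: "Stemming and synonyms help retrieval",
    3: "Manual indices mix terms and phrases",
}
index = build_index(pages)
print(index["terms"])  # [1, 3]
```

A lookup is then a single dictionary access, which is the random access into text that the summary describes. Phrases and names would need a tokenizer that keeps multi-word entries together.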
Subtopic: importance of indexing
Quote: the CADES indices for documentation and source files are very important [»pratGD9_1976]
| Quote: besides providing an access path for retrieval, an index describes the term usage of a database; signature files lose this information [»eastCM1_1988]
| Subtopic: World Wide Web
Quote: even though the World Wide Web was built on the idea of links, global navigation has been replaced by search engines [»furnGW3_1997]
| Subtopic: keyword selection
Quote: discrimination value weighting: best keywords occur frequently inside particular documents but infrequently outside [»saltG7_1986]
| Quote: the best keywords occur neither too rarely nor too frequently; 10% hits is good [»saltG7_1986]
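One way to explore these two observations is to measure how often a term occurs inside a document and in how many documents it occurs at all. The sketch below uses plain tf-idf as a simpler stand-in; it is not the discrimination value computation of [»saltG7_1986]. The function names, the sample documents, and the cutoff are assumptions.

```python
import math
from collections import Counter

def document_frequency(docs):
    """Fraction of documents in which each term occurs at least once."""
    df = Counter()
    for words in docs:
        df.update(set(words))
    return {term: count / len(docs) for term, count in df.items()}

def tf_idf(term, words, docs):
    """Relative frequency inside one document times log inverse document frequency."""
    tf = words.count(term) / len(words)
    containing = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / containing) if containing else 0.0

# Hypothetical collection of four tokenized documents.
docs = [
    "the index maps the terms".split(),
    "the stemmer strips the suffix".split(),
    "the thesaurus lists the synonyms".split(),
    "the index of the book".split(),
]
df = document_frequency(docs)
# Assumed cutoff: drop terms that occur in every document.
candidates = sorted(t for t, f in df.items() if f < 1.0)
```

Here `the` occurs in every document, so its weight is zero and it is dropped, while `index` gets a positive weight in the documents that contain it. A real collection would also need a lower cutoff for terms that are too rare.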
| Subtopic: search scope
Quote: to increase precision, default search scope is limited to title, key terms, general description, application, and captions
| Subtopic: result display
Quote: used page miniatures for visual index color-coded by chapter [»vandA7_1988]
| Quote: best retrieval with 150-300 word passages starting every 25 words [»kaszM10_1999]
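The overlapping passages of [»kaszM10_1999] can be sketched as a sliding window over the words of a document. This is only an illustration of the windowing; the function name `passages` and the default of 200 words (a value inside the quoted 150-300 range) are assumptions.

```python
def passages(words, length=200, step=25):
    """Return overlapping windows of `length` words, one starting every `step` words.

    Words past the last full window are dropped in this sketch.
    """
    if len(words) <= length:
        return [words]
    return [words[i:i + length] for i in range(0, len(words) - length + 1, step)]

# Hypothetical 300-word document whose words are their own positions.
words = [str(i) for i in range(300)]
windows = passages(words)
print(len(windows))  # 5
```

Each window would then be indexed and ranked as if it were a separate short document.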
| Subtopic: user studies of indexing
Quote: subjects using an index located facts quicker than those with embedded menus; experience reduced differences [»marcG1_1988]
| Quote: keyword searches were both preferred and performed better than a menu system based on Dewey Decimal numbers [»sowaJF_1984]
| Quote: sets of terms are the most effective index and query procedure; only a synonym dictionary produces improved performance [»saltG4_1970]
| Quote: study found searching on titles and abstracts as good as controlled keywords [»saltG7_1986]
| Quote: most Hyperties subjects used an index when asked to make an efficient search [»marcG1_1988]
| Subtopic: multiple index terms
Quote: sets of terms are the most effective index and query procedure; only a synonym dictionary produces improved performance [»saltG4_1970]
| Quote: phrases are not substantially superior to single terms for indexing; nor are sophisticated analysis tools [»saltG4_1970]
| Subtopic: stemming
Quote: document characterized by stem counts; query converted to stem count followed by best fit lookup [»dattRT1_1979, OK]
| Quote: different stemmers behave similarly with respect to retrieval effectiveness [»paicCD8_1996]
| Quote: test stemmers with a dictionary of groups of semantically equivalent words; use an interactive program to divide words into tightly and loosely related groups [»paicCD8_1996, OK]
| Quote: stemmer behavior depends on stemmer weight; i.e., how aggressively the stemmer combines words [»paicCD8_1996]
| Quote: stemmers are up to a third better than simply truncating words at five to seven characters
| Quote: truncating words at six characters is better than five or seven characters
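The stem-count approach can be sketched with the simplest stemmer mentioned above, truncation at six characters. The best-fit measure is not specified in the notes; cosine similarity is an assumption here, as are the function names and the sample documents.

```python
import math
from collections import Counter

def truncate_stem(word, length=6):
    """Crude stemmer: lowercase the word and keep its first `length` characters."""
    return word.lower()[:length]

def stem_counts(text, length=6):
    """Characterize a text by how often each stem occurs."""
    return Counter(truncate_stem(w, length) for w in text.split())

def cosine(a, b):
    """Cosine similarity between two stem-count vectors (assumed fit measure)."""
    dot = sum(a[s] * b[s] for s in a if s in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_fit(query, documents):
    """Return the name of the document whose stem counts best match the query."""
    q = stem_counts(query)
    return max(documents, key=lambda name: cosine(q, stem_counts(documents[name])))

# Hypothetical two-document collection.
documents = {
    "a": "retrieval systems retrieving documents",
    "b": "stemmers combine related words",
}
print(best_fit("retrieve documents", documents))  # a
```

Truncation conflates `retrieval`, `retrieving`, and `retrieve` into `retrie`, but it cannot merge a short word such as `index` with `indexing`; that gap is one reason a real stemmer can do better.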
| Subtopic: partial match
Quote: handle partial match queries by a permuted dictionary where each word appears in all possible rotations [»wittIH_1991]
| Quote: Glimpse features a small index and approximate matching; divide data into blocks, index the blocks, full text search of possible matches [»manbU10_1993]
| Quote: perform approximate matching on words in index, then use inverted index to locate data [»manbU10_1993]
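The permuted dictionary can be sketched as follows: every word is stored in all rotations of itself plus an end marker, so a pattern with one wildcard becomes a prefix lookup. This is a minimal sketch, not the structure from [»wittIH_1991]; the `$` end marker, the function names, and the restriction to a single `*` are assumptions.

```python
import bisect

def rotations(word):
    """All rotations of the word followed by the end marker '$'."""
    marked = word + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]

def build_permuted(words):
    """Sorted list of (rotation, original word) pairs."""
    return sorted((rot, w) for w in words for rot in rotations(w))

def wildcard(entries, pattern):
    """Find words matching a pattern that contains exactly one '*'."""
    if pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")
    head, _, tail = pattern.partition("*")
    # Rotate the pattern so the wildcard falls at the end: head*tail -> tail$head*
    prefix = tail + "$" + head
    start = bisect.bisect_left(entries, (prefix,))
    found = set()
    for rot, word in entries[start:]:
        if not rot.startswith(prefix):
            break
        found.add(word)
    return sorted(found)

# Hypothetical vocabulary.
entries = build_permuted(["index", "indexing", "reindex", "stem"])
print(wildcard(entries, "in*ing"))  # ['indexing']
```

The cost is space: a word of n letters contributes n + 1 entries to the dictionary.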
| Subtopic: file search
Quote: a Presto document collection is the documents in its inclusion list, plus those matching its query, minus those in its exclusion list [»dourP6_1999]
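The collection rule reads naturally as set arithmetic. The sketch below is an illustration only and does not use the Presto API; the function name `collection`, the attribute dictionaries, and the file names are hypothetical.

```python
def collection(all_docs, query, inclusion, exclusion):
    """Documents in the inclusion list, plus those matching the query, minus the exclusion list."""
    matching = {name for name, attrs in all_docs.items() if query(attrs)}
    return (inclusion | matching) - exclusion

# Hypothetical documents with attribute dictionaries.
all_docs = {
    "a.txt": {"topic": "index"},
    "b.txt": {"topic": "stem"},
    "c.txt": {"topic": "index"},
}
members = collection(
    all_docs,
    query=lambda attrs: attrs["topic"] == "index",
    inclusion={"b.txt"},
    exclusion={"c.txt"},
)
print(sorted(members))  # ['a.txt', 'b.txt']
```

Because the query is re-evaluated each time, the collection follows changes to the documents, while the two lists record the user's manual overrides.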
| Subtopic: automatic synonyms, aliasing and variants
Quote: study showed improved recall with variant forms, synonyms, and hierarchical thesaurus [»marcG1_1988]
| Quote: unlimited aliasing of keywords can increase recall from 20% to 80%
| Quote: tested search by unlimited aliasing with controlled vocabulary; found the opposite: some index terms are better than others [»brooTA4_1993]
| Quote: SuperBook users can add synonyms for keywords to improve access by all users [»remdJR11_1987]
| Quote: a thesaurus represents the relations between word senses rather than words; be careful when expanding query words with a thesaurus [»krovR4_1992]
| Quote: use a thesaurus to define index terms and groups of related terms; first use of thesaurus in information retrieval [»robeN12_1984]
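Aliasing and synonym expansion can be sketched as a lookup that widens the query before it reaches the index. This is a minimal sketch under stated assumptions: the function names, the synonym table, and the index contents are hypothetical, and the expansion ignores word senses, which is exactly the risk noted by [»krovR4_1992].

```python
def expand_query(terms, synonyms):
    """Return the query terms together with any aliases listed for them."""
    expanded = set()
    for term in terms:
        expanded.add(term)
        expanded.update(synonyms.get(term, ()))
    return expanded

def search(index, terms, synonyms):
    """OR-search: pages that contain any query term or any of its aliases."""
    hits = set()
    for term in expand_query(terms, synonyms):
        hits.update(index.get(term, ()))
    return sorted(hits)

# Hypothetical index (term -> pages) and alias table.
index = {"car": [1], "automobile": [2], "bank": [3]}
synonyms = {"car": {"automobile", "auto"}}
print(search(index, ["car"], synonyms))  # [1, 2]
```

Without the alias table the same query finds only page 1, which illustrates the recall gain. An alias for an ambiguous word such as `bank` would pull in unrelated pages and lower precision.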
| Subtopic: examples
Quote: Compendium supports boolean search of any part of an entry [»glusRJ5_1989]
Related Topics
Topic: full-text indexing (37 items)
Topic: information retrieval by cross reference (7 items)
Topic: information retrieval with queries (18 items)
Topic: manual indexing (19 items)
Topic: thesaurus and information retrieval (29 items)
Topic: using keywords to search hypertext (26 items)