Topic: problems with information retrieval

Summary

Problems of information retrieval include scale, relevance, the tradeoff between precision and recall, archiving information, and junk knowledge. (cbb 11/07)

Subtopic: measuring retrieval effectiveness

Quote: determine recall by exhaustive search, preidentified relevant documents, random sample in a relevant domain [»blaiDC_1990]
Quote: can use Zipf's law to determine retrieval system effectiveness; want Zipfian rank:frequency for context and subject description usage [»blaiDC_1990]
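
The Zipf's law quote can be made concrete with a rank:frequency check. The note does not say how the cited work applies the law, so the following is only a sketch under assumptions: the helper names `rank_frequency` and `loglog_slope` and the term counts are hypothetical. It ranks index terms by how often they are used and fits a line to log(frequency) against log(rank); a slope near -1 is the classic Zipfian shape.

```python
import math
from collections import Counter

def rank_frequency(terms):
    """Return (rank, frequency) pairs, most frequent term first."""
    freqs = sorted(Counter(terms).values(), reverse=True)
    return list(enumerate(freqs, start=1))

def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(pairs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical usage counts for five index terms (frequency = 60 / rank).
terms = ["a"] * 60 + ["b"] * 30 + ["c"] * 20 + ["d"] * 15 + ["e"] * 12
pairs = rank_frequency(terms)
slope = loglog_slope(pairs)  # close to -1 for this idealized data
```

Real term usage will not be this clean; the point is only that the rank:frequency curve can be measured and compared against the Zipfian ideal.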

Subtopic: precision/recall tradeoff

Quote: high precision if narrow query, low precision if wide query [»saltG7_1986]
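
The tradeoff can be illustrated with the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below uses a hypothetical eight-document collection and a hypothetical helper `precision_recall`; the numbers are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection: documents 1-4 are relevant to the need.
relevant = {1, 2, 3, 4}
narrow = {1, 2}                    # narrow query: few documents, all relevant
wide = {1, 2, 3, 4, 5, 6, 7, 8}    # wide query: all relevant documents plus noise

narrow_pr = precision_recall(narrow, relevant)  # (1.0, 0.5)
wide_pr = precision_recall(wide, relevant)      # (0.5, 1.0)
```

The narrow query is perfectly precise but misses half of the relevant documents; the wide query finds them all at the cost of precision.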

Subtopic: problem of relevance

Quote: only the seeker of information is sure of what he or she is looking for; the push concept of information delivery only sells ads [»dvorJC3_1997]
Quote: only the user can judge what is relevant to his or her own need [»greeR9_1995]
Quote: relevance is being potentially helpful to a user in the resolution of a need [»greeR9_1995]
Quote: a document may be rejected as irrelevant, even though it would help resolve the need at hand

Subtopic: problem of scale

Quote: as a document retrieval system becomes larger, queries require intersecting terms to satisfy the futility point [»blaiDC_1990]
Quote: with pervasive networking will want access to a trillion resources; use agents
Quote: compare describing someone to meet at an airport gate vs. someone attending a baseball game; like information retrieval [»blaiDC_1990]
Quote: pre-tests of STAIRS were successful because of small-scale databases [»blaiDC3_1985]
Quote: vector space model does not work well on the web; returns short documents [»brinS4_1998]
Quote: the fallacy of abundance: in a large information retrieval system, it is hard to write reasonable queries that do not retrieve at least some relevant documents [»blaiDC1_1996]
Quote: the fallacy of abundance: the ease of retrieving information about a subject creates an illusion that little remains hidden [»swanDR10_1960]
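
The remark that the vector space model returns short documents can be seen in a minimal sketch. It assumes raw term-frequency weights and cosine similarity, which is one common formulation and not necessarily the one the cited authors evaluated; the function name `cosine` and both documents are hypothetical. A document that is little more than the query itself scores highest, even though a longer document mentions the query terms more often.

```python
import math
from collections import Counter

def cosine(query_terms, doc_terms):
    """Cosine similarity between raw term-frequency vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

query = "information retrieval".split()
short_doc = "information retrieval".split()
long_doc = ("information retrieval systems face problems of scale relevance "
            "precision recall information retrieval evaluation").split()

short_score = cosine(query, short_doc)  # about 1.0: the document is just the query
long_score = cosine(query, long_doc)    # lower: other terms dilute the vector
```

Length normalization penalizes every term that is not in the query, so on a very large collection the top ranks fill up with near-empty pages.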

Subtopic: brute fact vs. meaning

Quote: information retrieval is based on the brute facts of documents; can not capture the meaning of a document

Subtopic: data vs. document retrieval

Quote: in data retrieval, queries and data descriptions are fairly precise; simple matching is sufficient [»blaiDC1_1996]
Quote: in document retrieval, queries and data descriptions are imprecise; especially for documents with certain intellectual content

Subtopic: location/application vs. keyword/pattern

Quote: users overwhelmingly prefer to locate files by location or application rather than by keyword or filename pattern [»barrD7_1995]
Quote: people prefer location-based searching over keywords and filenames; better reminding; place files where they will be seen [»barrD7_1995]

Subtopic: problem of recall

Quote: the implicit assumption of simple full-text retrieval systems is that we recall words and phrases in a document exactly; but psychologists have shown that memory is inexact [»blaiDC1_1996]

Subtopic: problem of manipulation

Quote: problem of manipulating search engines for profit; e.g., metadata is easily abused since it is invisible [»brinS4_1998]

Subtopic: problem of junk

Quote: if an organization keeps all of its documents, searchers must wade through irrelevant information to find important documents; this noise degrades search performance [»blaiDC1_1996]

Subtopic: problems with archiving information

Quote: since an individual's total information needs are large and complex, his Smalltalk system will also be large and complex
Quote: designers need help in keeping track of notes and recalling them appropriately; requires understanding of the design process and the problem [»soloE5_1984]
Quote: out-of-control users of electronic mail are archivers, read mail often, read and often file everything, keep a large inbox; and can't find messages [»mackWE10_1988]
Quote: archiver-type users of electronic mail try to read and file everything; many distribution lists; problems with finding old mail [»mackWE10_1988]
Quote: personal computer files are ephemeral, working, or archival; users archived little information

Subtopic: problem of change

Quote: a fully successful, manual index would imply that knowledge can be organized by an immutable and unambiguous indexing scheme; but, knowledge and language change [»swanDR10_1960]

Subtopic: multiple index terms -- anchor vs. qualifiers

Quote: search queries for STAIRS may have four or five intersecting terms; performed poorly [»blaiDC3_1985]
Quote: inquirers will tend to fix an anchor set of terms and add additional ones; since they can't judge the anchor set, they blame the added terms [»blaiDC_1990]
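
The effect of intersecting terms can be sketched with a toy Boolean AND search over an inverted index. The documents, the helper names `build_index` and `intersect`, and the query are all hypothetical and have no connection to the STAIRS collection. Each added term shrinks the result set toward a size the searcher is willing to read, but it also drops documents that describe the same matter in different words.

```python
def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

def intersect(index, terms):
    """AND the terms in order; return the result size after each term and the final set."""
    result = None
    sizes = []
    for term in terms:
        postings = index.get(term, set())
        result = set(postings) if result is None else result & postings
        sizes.append(len(result))
    return sizes, (result or set())

# Hypothetical five-document collection.
docs = {
    1: "train brake failure report",
    2: "train schedule update",
    3: "brake contract",
    4: "train brake test",
    5: "train budget memo",
}
index = build_index(docs)
sizes, final = intersect(index, ["train", "brake", "failure"])
# sizes == [4, 2, 1]; final == {1}
```

Document 3 concerns brakes but never uses the word "train", so it is lost as soon as the first two terms are intersected. The searcher sees a small, plausible result and has no direct sign of what was excluded.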

Subtopic: STAIRS study of information retrieval

Quote: twenty percent recall during evaluation of STAIRS full-text document-retrieval system [»blaiDC3_1985]
Quote: the STAIRS study used interactive retrieval; searchers could revise their queries until they believed that they had retrieved all of the documents they wanted [»blaiDC1_1996]
Quote: the STAIRS study used the lawyers and paralegals who selected the 40,000 documents in the collection; like a personal document collection [»blaiDC1_1996]
Quote: in STAIRS, users believed they were retrieving 75% instead of actual 20% [»blaiDC3_1985]
Quote: searches in the first half of the STAIRS study had the same mean level of success as searches in the second half; evidence that searchers were operating at the best of their ability [»blaiDC1_1996]
Quote: lawyers very surprised at low recall rate for STAIRS [»blaiDC3_1985]
Quote: in STAIRS many search terms would retrieve ten thousand documents [»blaiDC3_1985]
Quote: the STAIRS database concerned San Francisco's BART system [»blaiDC1_1996]
Quote: the lawsuit between San Francisco and BART contractors was settled before the STAIRS evaluation

Subtopic: studies of information retrieval

Quote: all known indexing procedures produce relatively mediocre results [»saltG4_1970]
Quote: tested retrieval with questions that used words from text and/or headings, or neither [»eganDE5_1989, OK]
Quote: in Viewdata study, half of questions answered incorrectly and search strategy often failed [»graySH2_1989]
Quote: early comparison of full-text, computerized search with a manual index [»swanDR10_1960]
Quote: in a 100 article collection, manual and full-text search retrieved less than half of the relevant documents on average
Quote: full-text search retrieved more relevant documents than a manual index

Subtopic: estimating recall

Quote: a candidate set is formed by negating one or more query terms; the STAIRS study estimated recall by sampling the candidate sets; usually small enough and rich enough in unretrieved, relevant documents to sample confidently [»blaiDC1_1996]
Quote: recall studies of large document retrieval systems depended on the persistence of the evaluators and where they looked for unretrieved, relevant documents [»blaiDC1_1996]
Quote: in a large collection, the percentage of unretrieved, relevant documents is too low to sample with confidence
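
The sampling approach in the quotes above can be written as simple arithmetic. This is a sketch under assumptions, not the procedure of the cited study: the function name `estimate_recall` and all numbers are hypothetical. The fraction of relevant documents found in a sample of the candidate set is scaled up to the whole candidate set, and recall is the share of relevant documents that the original query actually retrieved.

```python
def estimate_recall(retrieved_relevant, candidate_size, sample_size, sample_relevant):
    """Estimate recall from a random sample of an unretrieved candidate set."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Scale the sample's relevant fraction up to the whole candidate set.
    est_unretrieved = candidate_size * sample_relevant / sample_size
    total_relevant = retrieved_relevant + est_unretrieved
    return retrieved_relevant / total_relevant if total_relevant else 0.0

# Hypothetical numbers: the query retrieved 30 relevant documents; a sample of
# 50 documents from a 500-document candidate set contained 7 relevant ones.
recall = estimate_recall(retrieved_relevant=30, candidate_size=500,
                         sample_size=50, sample_relevant=7)
# Estimated unretrieved relevant documents: 70, so recall is about 0.3.
```

The estimate is only as good as the sample. If relevant documents make up a tiny fraction of the sampled set, as in the large-collection case quoted above, a sample of practical size will often contain none and the estimate becomes unreliable.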

Subtopic: directed search vs. boolean queries

Quote: scan and select did as well as boolean queries when searching an electronic encyclopedia [»marcG1_1988]

Subtopic: implicit structure

Quote: with named relationships can not follow paths implicitly defined by the data [»kentW_1978]

Subtopic: automatic indexing

Quote: automatic indexing must input and verify twenty times as much data as manual indexing

Related Topics

Group: meaning and truth (18 topics, 634 quotes)
Group: problems with hypertext (7 topics, 98 quotes)
Topic: comparing paper to electronic access to information (35 items)
Topic: loosely structured data (20 items)
Topic: private language argument for skepticism about meaning (34 items)
Topic: problem of assigning names (25 items)
Topic: problem of classifying information (42 items)
Topic: problem of information overload (23 items)
Topic: problem of screen size (12 items)
Topic: problems with reading hypertext (9 items)
Topic: searching the Web (53 items)
Topic: skepticism about knowledge (34 items)