Summary
Problems of information retrieval include scale, relevance, the tradeoff between precision and recall, archiving information, and junk knowledge. (cbb 11/07)
Subtopic: measuring retrieval effectiveness
Quote: determine recall by exhaustive search, preidentified relevant documents, random sample in a relevant domain [»blaiDC_1990]
Quote: can use Zipf's law to determine retrieval system effectiveness; want Zipfian rank:frequency for context and subject description usage [»blaiDC_1990]
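A rough way to check whether term usage looks Zipfian is to rank terms by frequency and fit a line to log(frequency) against log(rank); an ideal Zipfian distribution gives a slope near -1. The sketch below is only an illustration: the function names and the toy term list are assumptions, and the quote does not say how close to -1 the slope should be.

```python
import math
from collections import Counter

def rank_frequency(terms):
    """Return (rank, frequency) pairs, most frequent term first."""
    freqs = sorted(Counter(terms).values(), reverse=True)
    return list(enumerate(freqs, start=1))

def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two (rank, frequency) pairs.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(pairs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy index vocabulary (made up): term i is used 120 // i times,
# i.e. 120, 60, 40, 30, 24, 20, which is exactly Zipfian.
terms = [f"t{i}" for i in range(1, 7) for _ in range(120 // i)]
pairs = rank_frequency(terms)
slope = loglog_slope(pairs)
```

For real index terms the slope will only approximate -1; a slope far from -1 would suggest that usage departs from the rank:frequency pattern the quote asks for.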
Subtopic: precision/recall tradeoff
Quote: high precision if narrow query, low precision if wide query [»saltG7_1986]
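The tradeoff can be made concrete with the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The following is a minimal sketch; the document identifiers and the two result lists are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query result."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection with five relevant documents.
relevant = {"d1", "d2", "d3", "d4", "d5"}

# A narrow query returns few documents, all of them relevant.
narrow = ["d1", "d2"]
# A wide query finds more relevant documents but also more noise.
wide = ["d1", "d2", "d3", "d4", "d6", "d7", "d8", "d9"]

narrow_pr = precision_recall(narrow, relevant)  # (1.0, 0.4)
wide_pr = precision_recall(wide, relevant)      # (0.5, 0.8)
```

In this toy case, widening the query doubles recall while halving precision.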
Subtopic: problem of relevance
Quote: only the seeker of information is sure of what he or she is looking for; the push concept of information delivery only sells ads [»dvorJC3_1997]
Quote: only the user can judge what is relevant to his or her own need [»greeR9_1995]
Quote: relevance is being potentially helpful to a user in the resolution of a need [»greeR9_1995]
Quote: a document may be rejected as irrelevant, even though it would help resolve the need at hand
Subtopic: problem of scale
Quote: as a document retrieval system becomes larger, queries require intersecting terms to satisfy the futility point [»blaiDC_1990]
Quote: with pervasive networking, users will want access to a trillion resources; use agents
Quote: compare describing someone to meet at an airport gate vs. someone attending a baseball game; like information retrieval [»blaiDC_1990]
Quote: pre-tests of STAIRS were successful because of small-scale databases [»blaiDC3_1985]
Quote: vector space model does not work well on the web; returns short documents [»brinS4_1998]
Quote: the fallacy of abundance: in a large information retrieval system, it is hard to write reasonable queries that do not retrieve at least some relevant documents [»blaiDC1_1996]
Quote: the fallacy of abundance: the ease of retrieving information about a subject creates an illusion that little remains hidden [»swanDR10_1960]
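One plausible reading of the vector space quote above is that length-normalized similarity rewards documents that contain little besides the query terms. The sketch below uses cosine similarity over raw term counts; the absence of any term weighting is a simplifying assumption, and the query and both documents are made up.

```python
import math
from collections import Counter

def cosine(query_terms, doc_terms):
    """Cosine similarity between two bags of words (raw term counts)."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

query = "zipf law".split()

# A very short page: the query plus one extra word.
short_doc = "zipf law page".split()

# A longer page that actually discusses the subject.
long_doc = ("zipf law describes how word frequency falls with rank "
            "in large text collections and retrieval systems").split()

short_score = cosine(query, short_doc)
long_score = cosine(query, long_doc)
```

Both documents contain every query term, yet the three-word page scores higher because normalization penalizes the extra vocabulary of the longer page.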
Subtopic: brute fact vs. meaning
Quote: information retrieval is based on the brute facts of documents; cannot capture the meaning of a document
Subtopic: data vs. document retrieval
Quote: in data retrieval, queries and data descriptions are fairly precise; simple matching is sufficient [»blaiDC1_1996]
Quote: in document retrieval, queries and data descriptions are imprecise; especially for documents with certain intellectual content
Subtopic: location/application vs. keyword/pattern
Quote: users overwhelmingly prefer to locate files by location or application rather than by keyword or filename pattern [»barrD7_1995]
Quote: people prefer location-based searching over keywords and filenames; better reminding; place files where they will be seen [»barrD7_1995]
Subtopic: problem of recall
Quote: the implicit assumption of simple full-text retrieval systems is that we recall words and phrases in a document exactly; but psychologists have shown that memory is inexact [»blaiDC1_1996]
Subtopic: problem of manipulation
Quote: problem of manipulating search engines for profit; e.g., metadata is easily abused since it is invisible [»brinS4_1998]
Subtopic: problem of junk
Quote: if an organization keeps all of its documents, searchers must wade through irrelevant information to find important documents; this noise degrades search performance [»blaiDC1_1996]
Subtopic: problems with archiving information
Quote: since an individual's total information needs are large and complex, his Smalltalk system will also be large and complex
Quote: designers need help in keeping track of notes and recalling them appropriately; requires understanding of the design process and the problem [»soloE5_1984]
Quote: out-of-control users of electronic mail are archivers; they read mail often, read and often file everything, keep a large inbox, and can't find messages [»mackWE10_1988]
Quote: archiver-type users of electronic mail try to read and file everything; many distribution lists; problems with finding old mail [»mackWE10_1988]
Quote: personal, computer files are ephemeral, working, or archival; users archived little information
Subtopic: problem of change
Quote: a fully successful, manual index would imply that knowledge can be organized by an immutable and unambiguous indexing scheme; but, knowledge and language change [»swanDR10_1960]
Subtopic: multiple index terms -- anchor vs. qualifiers
Quote: search queries for STAIRS may have four or five intersecting terms; performed poorly [»blaiDC3_1985]
Quote: inquirers will tend to fix an anchor set of terms and add additional ones; since they can't judge the anchor set, they blame the added terms [»blaiDC_1990]
Subtopic: STAIRS study of information retrieval
Quote: twenty percent recall during evaluation of STAIRS full-text document-retrieval system [»blaiDC3_1985]
Quote: the STAIRS study used interactive retrieval; searchers could revise their queries until they believed that they had retrieved all of the documents they wanted [»blaiDC1_1996]
Quote: the STAIRS study used the lawyers and paralegals who selected the 40,000 documents in the collection; like a personal document collection [»blaiDC1_1996]
Quote: in STAIRS, users believed they were retrieving 75% instead of actual 20% [»blaiDC3_1985]
Quote: searches in the first half of the STAIRS study had the same mean level of success as searches in the second half; evidence that searchers were operating at the best of their ability [»blaiDC1_1996]
Quote: lawyers very surprised at low recall rate for STAIRS [»blaiDC3_1985]
Quote: in STAIRS many search terms would retrieve ten thousand documents [»blaiDC3_1985]
Quote: the STAIRS database concerned San Francisco's BART system [»blaiDC1_1996]
Quote: the lawsuit between San Francisco and BART contractors was settled before the STAIRS evaluation
Subtopic: studies of information retrieval
Quote: all known indexing procedures produce relatively mediocre results [»saltG4_1970]
Quote: tested retrieval with questions that used words from text and/or headings, or neither [»eganDE5_1989, OK]
Quote: in Viewdata study, half of questions answered incorrectly and search strategy often failed [»graySH2_1989]
Quote: early comparison of full-text, computerized search with a manual index [»swanDR10_1960]
Quote: in a 100-article collection, manual and full-text search retrieved less than half of the relevant documents on average
Quote: full-text search retrieved more relevant documents than a manual index
Subtopic: estimating recall
Quote: a candidate set is formed by negating one or more query terms; the STAIRS study estimated recall by sampling the candidate sets; usually small enough and rich enough in unretrieved, relevant documents to sample confidently [»blaiDC1_1996]
Quote: recall studies of large document retrieval systems depended on the persistence of the evaluators and where they looked for unretrieved, relevant documents [»blaiDC1_1996]
Quote: in a large collection, the percentage of unretrieved, relevant documents is too low to sample with confidence
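The sampling idea in the quotes above can be sketched as follows: judge a random sample from the candidate set, scale the relevant fraction up to the whole candidate set, and divide the relevant documents actually retrieved by the estimated total. This is an illustrative sketch, not the procedure used in the STAIRS study; the function name, the `is_relevant` judgment callback, and all numbers are assumptions.

```python
import random

def estimate_recall(retrieved_relevant, candidates, is_relevant,
                    sample_size, seed=0):
    """Estimate recall by sampling unretrieved candidate documents.

    retrieved_relevant: count of relevant documents the query retrieved
    candidates: list of unretrieved documents (the candidate set)
    is_relevant: hypothetical judgment function, document -> bool
    """
    rng = random.Random(seed)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    if sample:
        rate = sum(1 for doc in sample if is_relevant(doc)) / len(sample)
    else:
        rate = 0.0
    # Scale the sampled rate up to the whole candidate set.
    estimated_missed = rate * len(candidates)
    total = retrieved_relevant + estimated_missed
    return retrieved_relevant / total if total else 0.0

# Made-up example: 1000 candidates, every fourth one relevant,
# and 50 relevant documents already retrieved by the query.
candidates = list(range(1000))
judge = lambda doc: doc % 4 == 0

exact = estimate_recall(50, candidates, judge, sample_size=1000)
sampled = estimate_recall(50, candidates, judge, sample_size=100)
```

Judging the whole candidate set gives 50 / (50 + 250); the sampled estimate varies with the sample drawn. The estimate only works when the candidate set is rich in relevant documents: if the relevant fraction is very low, a sample of practical size is likely to contain none, which is the limitation the last quote describes.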
Subtopic: directed search vs. boolean queries
Quote: scan and select did as well as boolean queries when searching an electronic encyclopedia [»marcG1_1988]
Subtopic: implicit structure
Quote: with named relationships cannot follow paths implicitly defined by the data [»kentW_1978]
Subtopic: automatic indexing
Quote: automatic indexing must input and verify twenty times as much data as manual indexing
Related Topics
Group: meaning and truth (18 topics, 634 quotes)
Group: problems with hypertext (7 topics, 98 quotes)
Topic: comparing paper to electronic access to information (35 items)
Topic: loosely structured data (20 items)
Topic: private language argument for skepticism about meaning (34 items)
Topic: problem of assigning names (25 items)
Topic: problem of classifying information (42 items)
Topic: problem of information overload (23 items)
Topic: problem of screen size (12 items)
Topic: problems with reading hypertext (9 items)
Topic: searching the Web (53 items)
Topic: skepticism about knowledge (34 items)