Summary
Barber performed an informal test of directed search in October 1989 with several subjects, and presented the results in a research seminar. The experiment should be redone in a formal setting.
In this test, directed search had higher recall and higher precision than either keyword search or read-everything search.
Procedure
One person selected 10 passages ("needles") in (QuoteRef: akscRM7_1988) and wrote a question for each. The questions did not use keywords from the passages. The subjects were computer science graduate students. They were given a short introduction to Thesa and to the task, and then searched for relevant material for each question. There were three conditions: reading a preprint of the article, using a keyword index, and using directed search in Thesa.
See notes below for further details.
Read-everything Search
| question | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| found | 1 | 8 | 15 | 9 | 7 | 2 | 6 | 1 | 4 | 1 | 54 |
| missed | 0 | 2 | 6 | 0 | 2 | 0 | 1 | 0 | 1 | 0 | 12 |
| searched | 220 | 220 | 220 | 220 | 220 | 220 | 220 | 220 | 220 | 220 | 2200 |
| found needle | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | n/a | 8 |

| results | |
|---|---|
| recall | 0.82 |
| precision | 0.02 |
| needles | 8 of 9 |
Keyword Search
| question | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| found | 1 | 2 | 7 | 3 | 3 | 1 | 3 | 0 | 1 | 0 | 21 |
| missed | 0 | 8 | 14 | 6 | 6 | 1 | 4 | 1 | 4 | 1 | 45 |
| searched | 1 | 13 | 17 | 5 | 32 | 42 | 31 | 3 | 8 | 17 | 169 |
| found needle | 1 | 0 | 0 | 1 | n/a | 1 | 1 | 0 | 0 | n/a | 4 |

| results | |
|---|---|
| recall | 0.39 |
| precision | 0.12 |
| needles | 4 of 8 |
Thesa Directed Search
| question | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| found | 1 | 10 | 21 | 9 | 8 | 2 | 5 | 1 | 5 | 1 | 63 |
| missed | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 3 |
| searched | 17 | 44 | 38 | 13 | 18 | 18 | 28 | 5 | 25 | 5 | 211 |
| found needle | 1 | 1 | 1 | 1 | n/a | 1 | 1 | 1 | 1 | n/a | 8 |

| results | |
|---|---|
| recall | 0.95 |
| precision | 0.30 |
| needles | 8 of 8 |
Notes
Found is the number of relevant quotations found. Missed is the number of relevant quotations that were found in some other search but not in this one. Searched is the number of quotations viewed. Recall is 'found/(found+missed)'. Precision is 'found/searched'. Needles is the number of needles (i.e., pre-selected passages) found.
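As a check on the definitions above, the following minimal sketch recomputes recall and precision from the Total column of each table. The names `SearchTotals`, `TOTALS`, and `summarize` are illustrative only and are not part of Thesa; the counts are copied from the three tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchTotals:
    """Totals for one search condition (hypothetical helper, not part of Thesa)."""
    found: int     # relevant quotations found in this search
    missed: int    # relevant quotations found only in some other search
    searched: int  # quotations viewed

    def recall(self) -> float:
        # Recall is found / (found + missed), as defined in the Notes.
        return self.found / (self.found + self.missed)

    def precision(self) -> float:
        # Precision is found / searched, as defined in the Notes.
        return self.found / self.searched


# Total column of each table.
TOTALS = {
    "read-everything": SearchTotals(found=54, missed=12, searched=2200),
    "keyword": SearchTotals(found=21, missed=45, searched=169),
    "directed": SearchTotals(found=63, missed=3, searched=211),
}


def summarize(totals):
    """Return {condition: (recall, precision)}, each rounded to two decimals."""
    return {
        name: (round(t.recall(), 2), round(t.precision(), 2))
        for name, t in totals.items()
    }


if __name__ == "__main__":
    for name, (recall, precision) in summarize(TOTALS).items():
        print(f"{name:16s} recall={recall:.2f} precision={precision:.2f}")
```

With the counts as tabulated, the computed values agree with the reported results except for keyword recall, which comes out near 0.32 rather than the 0.39 listed. The source does not indicate whether the counts or the reported figure is in error.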
The passage for question #5 was from a figure of the preprint. It was not in the electronic copy of the article.
The subjects decided that the passage for question #10 was not relevant to the question. So the directed and keyword searches had eight needles to locate, and the read-everything searches had nine needles to locate.
In the read-everything case, Searched is the total number of quotations identified for the directed and keyword searches plus the number of tables and figures. Subjects were explicitly asked about the relevance of unfound needles. Any other quotation was counted as relevant if at least one of the subjects deemed it relevant.
Related Topics
ThesaHelp: how to find everything relevant to some topic or question
ThesaHelp: needle-in-a-haystack test of directed search
ThesaHelp: read-everything test of directed search
ThesaHelp: why does directed search succeed for Thesa