ThesaHelp: a preliminary test of directed search

Summary

Barber ran an informal test of directed search in October 1989 with several subjects and presented the results in a research seminar. The experiment should be redone in a formal setting.
The results suggest that directed search outperforms both keyword search and read-everything search: of the three conditions, it had the highest recall (0.95) and the highest precision (0.30), and it located all eight of its needles.

Procedure

One person selected 10 passages ("needles") in (QuoteRef: akscRM7_1988) and wrote a question for each. The questions did not use keywords from the passages. The subjects were computer science graduate students. They were given a short introduction to Thesa and to the task, and then searched for material relevant to each question. There were three conditions: reading a preprint of the article, using a keyword index, or using directed search in Thesa.
See notes below for further details.

Read-everything Search

question         #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
found             1    8   15    9    7    2    6    1    4    1     54
missed            0    2    6    0    2    0    1    0    1    0     12
searched        220  220  220  220  220  220  220  220  220  220   2200
found needle      1    1    1    1    0    1    1    1    1  n/a      8
results

recall 0.82

precision 0.02

needles 8 of 9

Keyword Search

question         #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
found             1    2    7    3    3    1    3    0    1    0     21
missed            0    8   14    6    6    1    4    1    4    1     45
searched          1   13   17    5   32   42   31    3    8   17    169
found needle      1    0    0    1  n/a    1    1    0    0  n/a      4
results

recall 0.32

precision 0.12

needles 4 of 8

Thesa Directed Search

question         #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
found             1   10   21    9    8    2    5    1    5    1     63
missed            0    0    0    0    1    0    2    0    0    0      3
searched         17   44   38   13   18   18   28    5   25    5    211
found needle      1    1    1    1  n/a    1    1    1    1  n/a      8
results

recall 0.95

precision 0.30

needles 8 of 8

Notes

Found is the number of relevant quotations found. Missed is the number of relevant quotations that were found in some other search but not in this one. Searched is the number of quotations viewed. Recall is 'found/(found+missed)'. Precision is 'found/searched'. Needles is the number of needles (i.e., pre-selected passages) found.
The passage for question #5 was from a figure of the preprint. It was not in the electronic copy of the article.
The subjects decided that the passage for question #10 was not relevant to the question. So the directed and keyword searches had eight needles to locate, and the read-everything searches had nine needles to locate.
In the read-everything case, Searched is the total number of quotations identified for the directed and keyword searches plus the number of tables and figures. Subjects were explicitly asked about the relevance of unfound needles. Any other quotation was counted as relevant if one of the subjects deemed it relevant.
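
The recall and precision definitions in these notes can be checked against the Total columns with a few lines of Python. This is only a sketch: the names TOTALS, recall, precision, and summarize are illustrative assumptions and are not part of Thesa. The (found, missed, searched) figures are copied from the three tables.

```python
# Totals copied from the three result tables: (found, missed, searched).
# The dictionary and function names are illustrative, not part of Thesa.
TOTALS = {
    "read-everything": (54, 12, 2200),
    "keyword": (21, 45, 169),
    "directed": (63, 3, 211),
}


def recall(found, missed):
    """Recall as defined in the notes: found / (found + missed)."""
    return found / (found + missed)


def precision(found, searched):
    """Precision as defined in the notes: found / searched."""
    return found / searched


def summarize(totals):
    """Return {condition: (recall, precision)}, each rounded to two places."""
    return {
        name: (round(recall(f, m), 2), round(precision(f, s), 2))
        for name, (f, m, s) in totals.items()
    }


if __name__ == "__main__":
    for name, (r, p) in summarize(TOTALS).items():
        print(f"{name:16s} recall {r:.2f}  precision {p:.2f}")
```

Every condition has the same denominator for recall, found + missed = 66, because Missed counts relevant quotations that turned up in another search. With these totals, keyword recall works out to 21/66, about 0.32, and directed recall to 63/66, about 0.95.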

Related Topics

ThesaHelp: how to find everything relevant to some topic or question
ThesaHelp: needle-in-a-haystack test of directed search
ThesaHelp: read-everything test of directed search
ThesaHelp: why does directed search succeed for Thesa