
ThesaHelp: a preliminary test of directed search




ThesaHelp: how to find everything relevant to some topic or question
ThesaHelp: needle-in-a-haystack test of directed search
ThesaHelp: read-everything test of directed search
ThesaHelp: why does directed search succeed for Thesa

Summary

Barber performed an informal test of directed search in October 1989 with several subjects, and presented the results in a research seminar. The experiment should be redone in a formal setting.

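The results indicate that directed search is better than keyword search or read-everything search. Directed search had the highest recall (0.95) and precision (0.30), compared with 0.39 and 0.12 for keyword search and 0.82 and 0.02 for read-everything search.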

Procedure

One person selected passages ("needles") in (QuoteRef: akscRM7_1988) and wrote 10 questions about them. The questions did not use keywords from the passages. The subjects were computer science graduate students. They were given a short introduction to Thesa and to the task. The subjects searched for relevant material for each question. There were three conditions: reading a preprint of the article, using a keyword index, or using directed search in Thesa.
See notes below for further details.

Read-everything Search

question        #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
found            1    8   15    9    7    2    6    1    4    1     54
missed           0    2    6    0    2    0    1    0    1    0     12
searched       220  220  220  220  220  220  220  220  220  220   2200
found needle     1    1    1    1    0    1    1    1    1  n/a      8

results
recall     0.82
precision  0.02
needles    8 of 9

Keyword Search

question        #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
found            1    2    7    3    3    1    3    0    1    0     21
missed           0    8   14    6    6    1    4    1    4    1     45
searched       1131753242313817 (per-question split not recoverable)    169
found needle     1    0    0    1  n/a    1    1    0    0  n/a      4

results
recall     0.39
precision  0.12
needles    4 of 8

Thesa Directed Search

question        #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
found            1   10   21    9    8    2    5    1    5    1     63
missed           0    0    0    0    1    0    2    0    0    0      3
searched        17   44   38   13   18   18   28    5   25    5    211
found needle     1    1    1    1  n/a    1    1    1    1  n/a      8

results
recall     0.95
precision  0.30
needles    8 of 8

Notes

Found is the number of relevant quotations found. Missed is the number of relevant quotations that were not found in this search but were found in some other search. Searched is the number of quotations viewed. Recall is 'found/(found+missed)'. Precision is 'found/searched'. Needles is the number of needles (i.e., pre-selected passages) found.
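
The recall and precision definitions above can be checked against the Total columns of the three tables. The following is a minimal sketch, not part of the original test. The names SearchTotals, TOTALS, and summarize are illustrative assumptions, and the numbers are the table totals. Applied to the keyword totals, the recall formula gives about 0.32 rather than the 0.39 reported in the table. The source does not explain the difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchTotals:
    """Totals for one search condition (hypothetical helper type)."""
    found: int     # relevant quotations found
    missed: int    # relevant quotations found only in some other search
    searched: int  # quotations viewed

    @property
    def recall(self) -> float:
        # Recall is found/(found+missed), as defined in the notes.
        return self.found / (self.found + self.missed)

    @property
    def precision(self) -> float:
        # Precision is found/searched, as defined in the notes.
        return self.found / self.searched


# Values copied from the Total column of each table.
TOTALS = {
    "read-everything": SearchTotals(found=54, missed=12, searched=2200),
    "keyword": SearchTotals(found=21, missed=45, searched=169),
    "directed": SearchTotals(found=63, missed=3, searched=211),
}


def summarize(totals):
    """Return {condition: (recall, precision)} rounded to two decimals."""
    return {
        name: (round(t.recall, 2), round(t.precision, 2))
        for name, t in totals.items()
    }


if __name__ == "__main__":
    for name, (recall, precision) in summarize(TOTALS).items():
        print(f"{name:16s} recall={recall:.2f} precision={precision:.2f}")
```

In all three conditions found plus missed equals 66, so each recall value uses the same pool of relevant quotations.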

The passage for question #5 was from a figure of the preprint. It was not in the electronic copy of the article.
The subjects decided that the passage for question #10 was not relevant to the question. So the directed and keyword searches had eight needles to locate, and the read-everything searches had nine needles to locate.

In the read-everything case, Searched is the total number of quotations identified for the directed and keyword searches plus the number of tables and figures. Subjects were explicitly asked about the relevance of unfound needles. Any other quotation was counted as relevant if at least one subject deemed it relevant.



Updated barberCB 10/1997
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.