Topic: full-text indexing
Topic: suffix trie and suffix array
Topic: pattern matching
Topic: signature files
Topic: searching the Web
Topic: hypertext links
Reference
Zobel, J., Moffat, A.,
"Inverted files for text search engines",
ACM Computing Surveys, 38, 2, July 2006, pp. 1-56.
Quotations
2 ;;Quote: indexing core for constructing a document-level index with ranked query evaluation; refinements for reorganization, phrases, and compression; includes bibliography
8 ;;Quote: use inverted files for text query evaluation; better than suffix arrays and signature files
8 ;;Quote: all terms should be indexed; including numbers, URL tokens, and stopwords
13 ;;Quote: build two indices, a word-level index for phrase and Boolean searches and a document-level index for ranked queries
13 ;;Quote: for phrase queries, use an index for word pairs that begin with a common word; doing so for just the three commonest words halves phrase querying time
16 ;;Quote: merge-based index construction scales well; 100MB of memory, can minimize disk space overhead, only one parsing pass, compression effective
32 ;;Quote: while fast for grep, suffix arrays do not scale well; no compression, 1.7x memory-resident data, no ranked queries
32 ;;Quote: grep pattern matching by suffix array of vocabulary or by inverted index of digrams or trigrams
33 ;;Quote: problems with signature files -- false matches are linear in the collection size, large index, more disk accesses for short queries, and no ranked queries
35 ;;Quote: index the anchor text as a title of the target page; helps identify a document; good for Web searching
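
Note: the two-index arrangement in the page 13 quotation can be illustrated with a small sketch. It is not code from the survey. It assumes whitespace tokenization, lowercase folding, and in-memory dictionaries, and the names build_word_level_index, to_document_level, and phrase_query are hypothetical. The word-level index keeps term positions so a phrase can be checked for adjacency; the document-level index keeps only per-document term frequencies, which is the information a ranked query evaluator would typically consume.

```python
from collections import defaultdict


def build_word_level_index(docs):
    """Map each term to {doc_id: [positions]}.

    Every token is indexed, stopwords included (assumption: tokens are
    whitespace-separated and case is folded).
    """
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def to_document_level(word_index):
    """Collapse positions into term frequencies: {term: {doc_id: count}}."""
    return {
        term: {doc_id: len(positions) for doc_id, positions in postings.items()}
        for term, postings in word_index.items()
    }


def phrase_query(word_index, phrase):
    """Return sorted doc ids in which the terms of `phrase` occur consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in word_index for t in terms):
        return []
    # Only documents that contain every term can match the phrase.
    candidates = set(word_index[terms[0]])
    for t in terms[1:]:
        candidates &= set(word_index[t])
    hits = []
    for doc_id in sorted(candidates):
        later = [set(word_index[t][doc_id]) for t in terms[1:]]
        # A match starts at `start` if term k+1 sits at position start+k+1.
        if any(
            all(start + k + 1 in positions for k, positions in enumerate(later))
            for start in word_index[terms[0]][doc_id]
        ):
            hits.append(doc_id)
    return hits


docs = [
    "the cat sat on the mat",
    "the mat sat on the cat",
    "a cat and a mat",
]
word_index = build_word_level_index(docs)
doc_index = to_document_level(word_index)
```

Here phrase_query(word_index, "cat sat") consults positions and matches only the first document, while doc_index["the"] records just how often the term occurs in each document. A production index would store compressed postings on disk rather than Python dictionaries.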