
Quote: index gigabyte text collections with compression and an external, multi-way mergesort; average of one byte per pointer; less than 4 hours

QuoteRef: moffA8_1995, p. abstract



Topic: text compression

Quotation Skeleton

This paper describes a new indexing algorithm designed … the positive integers and an in-place external multi-way … [of nearly 2 million short documents and over 500,000 terms] in under 4 hours, using less than 40 megabytes of … [p. 538] The final compressed index occupies less than … [It indexes the byte-address of every case-folded and stemmed term in the collection using] an average of 1 byte for each … [pointer] … [p. 539] [An in memory algorithm requires 5 gigabytes of main memory.] … [p. 540] [A multi-pass, disk-based merge sort without compression takes 22 hours with 40 megabytes of main memory and 10 gigabytes of disk space.]
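
The two ideas named in the quotation, compressing positive integers and merging sorted runs multi-way, can be illustrated with a small sketch. It is not the paper's method: the quotation does not say which integer code is used, so variable-byte coding of the gaps between successive positions is swapped in here as an assumption, and the merge runs entirely in memory rather than in place on disk. All function names (`vbyte_encode`, `compress_postings`, `merge_runs`, and so on) are hypothetical.

```python
import heapq


def vbyte_encode(n):
    """Encode one positive integer, 7 data bits per byte, low bits first.

    The high bit is set only on the last byte of each number.
    """
    if n <= 0:
        raise ValueError("only positive integers can be encoded")
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by concatenated vbyte_encode calls."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:                      # last byte of this number
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


def compress_postings(positions):
    """Store strictly increasing positive positions as encoded gaps."""
    out, prev = bytearray(), 0
    for p in positions:
        out += vbyte_encode(p - prev)     # small gaps need only one byte
        prev = p
    return bytes(out)


def decompress_postings(data):
    """Rebuild the original positions by summing the decoded gaps."""
    positions, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        positions.append(total)
    return positions


def merge_runs(runs):
    """Multi-way merge of sorted runs of (term, position) pairs.

    Returns a dict mapping each term to its compressed postings list.
    """
    index, current, positions = {}, None, []
    for term, pos in heapq.merge(*runs):
        if term != current:
            if current is not None:
                index[current] = compress_postings(positions)
            current, positions = term, []
        positions.append(pos)
    if current is not None:
        index[current] = compress_postings(positions)
    return index


if __name__ == "__main__":
    # Three sorted runs, as an external sort might leave them (made-up data).
    runs = [
        [("cat", 3), ("dog", 10)],
        [("cat", 1), ("cat", 200)],
        [("dog", 5)],
    ]
    index = merge_runs(runs)
    for term, blob in index.items():
        print(term, decompress_postings(blob), len(blob), "bytes")
```

Gaps between nearby positions are small, so most pointers fit in a single byte under this code. The figure of about one byte per pointer in the quotation comes from the paper's own coding scheme, not from this sketch.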

Copyright clearance needed for quotation.


Related Topics

Topic: text compression (16 items)

Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.