This paper describes a new indexing algorithm designed … the positive integers and an in-place external multi-way … [of nearly 2 million short documents and over 500,000 terms] in under 4 hours, using less than 40 megabytes of … [p. 538] The final compressed index occupies less than … [It indexes the byte-address of every case-folded and stemmed term in the collection using] an average of 1 byte for each … [pointer] … [p. 539] [An in-memory algorithm requires 5 gigabytes of main memory.] … [p. 540] [A multi-pass, disk-based merge sort without compression takes 22 hours with 40 megabytes of main memory and 10 gigabytes of disk space.]
Copyright clearance needed for quotation.