Topic: external search and sort

topics > computer science > programming > Group: patterns

database keys and indexing
database queries, joins, and relational algebra
full-text indexing
information retrieval by searching
search algorithms
searching compressed data
signature files
sort algorithms
World-Wide Web
Subtopic: external memory subroutines up

Quote: survey of external memory algorithms; includes sorting, permuting, FFT, graphs, databases, GIS, text processing [»vittJS6_2001]
Quote: survey of sorting in database systems; in-memory sorting, disk-based external sorting, implementation techniques [»graeG9_2006]

Subtopic: external sort up

Quote: simple, optimal, external sort using sampling and a priority queue; exactly two reads and writes per record [»leuFC2_2000]
Quote: hillsort generalizes heapsort for external sorting; no temporary external storage, brothers in 1 seek, as good as merge sort [»wegnLM7_1989]
Quote: Grab records the blocks in a file that contain an index term; speeds up search with very low storage overhead [»leskM3_1988]
Quote: the retrieval and scan of inverted lists is the dominant cost of search queries on large text databases [»moffA10_1996]
Quote: with an internal index for each compressed inverted list, reduce processing time 80% with an index that is 10% of the text

Subtopic: external trie up

Quote: a pointer-free trie structure for very large sets on secondary storage; 2 bits per node; organized into pages in a tree [»shanH8_1996]

Subtopic: disk index up

Quote: B+-trees are nearly optimal for disk-based indices; good storage utilization [»bayeR_1972]

Subtopic: spatial search up

Quote: R*-trees for efficient and robust spatial access for points and rectangles; e.g., spatial join; analysis of clipping, transformations, and overlapping regions [»beck5_1990]
Quote: a grid file guarantees multi-dimensional access in at most two I/O operations; the union of grid cells is a bounding box [»shasD_2004]

Subtopic: merging up

Quote: use file snapshot and record append for producer-consumer queues and many-way merging; minimizes synchronization overhead; record append at-least-once at a known offset [»gherS10_2003]

Subtopic: time-split B-tree up

Quote: use time-split B-tree (TSB-tree) for time-slice queries of a transaction-time database; data clustered by time-slice and key; replicate historical versions as needed; store on WORM disk [»lomeD_1993]
Quote: time-split B-trees are space and time efficient; the fraction of redundant records is bounded [»lomeD_1993]
Quote: time-split B-trees allow concurrent updates via short, independent atomic actions [»lomeD_1993]

Subtopic: n-grams up

Quote: n-grams of text streams are efficient, robust, and capture statistically significant phrases [»coheJD12_1998]

Subtopic: read-only sort up

Quote: if insert into an index without rewriting can write during reads and can use optical disks [»faloC9_1988]

Subtopic: searching compressed data up

Quote: use Burrows-Wheeler block-sorted data for dynamic searching; compressed, rapid searches; groups similar substrings together [»willK12_2003]
Quote: fast searching of compressed text using Huffman encoding, shift-or search, and Boyer-Moore filter; 30% compression, 2-8x faster than agrep [»demoES8_1998]
Quote: DCA compression and antidictionaries allow search and parallel algorithms because their encodings do not depend on context except for the block's prefix [»crocM7_1999]

Subtopic: distributed, exhaustive search up

Quote: use Chinese lotto to rapidly perform exhaustive search; perform random tests by millions of participants; e.g., code breaking [»quisJJ11_1991]
Quote: the Connection Machine can search for substrings in time proportional to the length of the substring [»hillWD_1985]

Subtopic: manual sort up

Quote: Murray's Scriptorium was a simple shed lined with a 1000 pigeon holes and later converted to shelves [»murrKM_1977]

Related Topics up

Topic: b-trees (16 items)
Topic: database keys and indexing (18 items)
Topic: database queries, joins, and relational algebra (34 items)
Topic: full-text indexing (37 items)
Topic: information retrieval by searching (35 items)
Topic: search algorithms (40 items)
Topic: searching compressed data (9 items)
Topic: signature files (21 items)
Topic: sort algorithms (24 items)
Topic: World-Wide Web
(42 items)

Updated barberCB 7/05
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.