
QuoteRef: brinS4_1998




ThesaHelp: references a-b
Topic: searching the Web
Topic: hypertext as a global database
Topic: probability
Topic: hypertext links
Topic: problems with information retrieval
Topic: information retrieval with queries
Topic: data caching
Group: data structures
Topic: unique numeric names as surrogates
Topic: examples of file systems
Topic: data compression algorithms
Topic: error safe systems
Topic: full-text indexing
Topic: bugs

Reference

Brin, S., Page, L., "The anatomy of a large-scale hypertextual web search engine", Seventh International World-Wide Web Conference, April 1998, Brisbane, Australia, Computer Networks and ISDN Systems, 30, 1-7, pp. 107-117.

Other Reference

notes from full version at
http://www.elsevier.nl:80/cas/tree/store/comnet/free/www7/00/index.htm or http://google.stanford.edu

Notes

section numbers from http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

Quotations
1.3.2 ;;Quote: Google stores all web documents it finds; allows independent, efficient research of the web
2.1.1 ;;Quote: rank web pages by counting backlinks to page; normalize by number of links; PageRank is probability of a visit
2.1.1 ;;Quote: definition of PageRank algorithm; iterative calculation using citations, out links, and damping factor
2.1.2 ;;Quote: high PageRank if many pages point to the page, or if highly ranked pages point to the page
2.2 ;;Quote: index anchor text as well as target; 259 million anchors for 24 million pages
2.3 ;;Quote: Google records proximity, font size, and raw HTML for all pages
3.1 ;;Quote: vector space model does not work well on the web; returns short documents
3.1+;;Quote: a search for "Bill Clinton" should return reasonable results
3.2 ;;Quote: problem of manipulating search engines for profit; e.g., metadata is easily abused since it is invisible
4.2 ;;Quote: designed Google data structures to avoid disk seeks; a seek still takes 10 milliseconds
4.2.1 ;;Quote: Google uses BigFiles with 64-bit offsets, multiple file systems, compression, and allocation/deallocation
4.2.2 ;;Quote: Google compresses the repository using zlib; faster than bzip
4.2.2+;;Quote: Google stores docID, length, URL and document; with error log, can rebuild everything
4.2.5 ;;Quote: compact encoding of indexed hit list; plain hits with capitalization, font size, offset; fancy hits for URL, anchor, etc
4.3 ;;Quote: running the web crawler generated a fair amount of e-mail and phone calls; need to solve problems as they occur
5.2 ;;Quote: Google processes 4 million pages a day; indexer keeps up with the crawler
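
Sketch for 2.1.1 and 2.1.2: the iterative PageRank calculation summarized above can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name pagerank, the dict-of-lists graph, the fixed iteration count, and the four-page example are all hypothetical. The sketch uses the normalized form (1 - d)/N so that the ranks sum to 1 and can be read as the probability of a visit, and it spreads the rank of pages without out links evenly over all pages; both choices are assumptions of this sketch. The damping factor d = 0.85 is the value the paper says is usually used.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively estimate PageRank.

    links: dict mapping each page to the list of pages it links to.
    d: damping factor (probability of following a link).
    Returns a dict mapping page -> rank; the ranks sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Assumption: rank held by pages with no out links is shared evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in pages}

        # Each page passes its rank to its targets, normalized by
        # its number of out links.
        for page, targets in links.items():
            if targets:
                share = d * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical four-page web: C has the most backlinks.
    example = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(example).items()):
        print(page, round(score, 4))
```

In this example, page C ranks highest because three pages point to it, and page A ranks well because the highly ranked C points to it, which matches the two conditions in 2.1.2. Page D has no backlinks, so it keeps only the baseline (1 - d)/N.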


Related Topics

ThesaHelp: references a-b (396 items)
Topic: searching the Web (45 items)
Topic: hypertext as a global database (28 items)
Topic: probability (21 items)
Topic: hypertext links (45 items)
Topic: problems with information retrieval (51 items)
Topic: information retrieval with queries (18 items)
Topic: data caching (28 items)
Group: data structures   (12 topics, 275 quotes)
Topic: unique numeric names as surrogates (67 items)
Topic: examples of file systems (44 items)
Topic: data compression algorithms (53 items)
Topic: error safe systems (75 items)
Topic: full-text indexing (35 items)
Topic: bugs (65 items)

Collected barberCB 12/98 5/04
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.