Summary
A real number is the least upper bound of a set or segment of rational ratios. A number is irrational if the segment is open.
Floating point numbers represent a real number with a bounded number of bits. They are widely used for scientific and graphics programming. IEEE Standard 754 is a formal specification of floating point numbers. It is implemented by SANE and Modula-3. The standard carefully defines roundoff, floating point formats, exceptions, infinity, and denormalized numbers. These allow careful analysis of an algorithm.
Fixed point fractions and rational ratio approximation may be used in place of floating point numbers. The EDSAC represented all reals by fractional numbers. EDSAC programmers found scaling difficult and could use a floating point interpreter instead.
Also mentioned: multiprecision arithmetic, real to string conversion, and arithmetic coding. (cbb 4/98)
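The roundoff mentioned in the summary can be measured exactly. The following is a minimal sketch, assuming Python 3.9 or later (for `math.ulp`) and IEEE 754 doubles; the variable names are illustrative only.

```python
import math
from fractions import Fraction

# Fraction(float) recovers the exact rational value stored in the double.
stored = Fraction(0.1)
exact = Fraction(1, 10)
roundoff = stored - exact            # nonzero: 1/10 has no finite binary expansion

# Round-to-nearest keeps the error within half a unit in the last place (ulp).
half_ulp = Fraction(math.ulp(0.1)) / 2
within_bound = abs(roundoff) <= half_ulp

# The same roundoff is visible in ordinary arithmetic.
sum_is_exact = (0.1 + 0.2 == 0.3)    # False on IEEE 754 doubles
```

Because the error bound is defined by the format, it can be carried through an error analysis of a whole algorithm.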
Subtopic: real numbers
Quote: construct the real numbers as segments of series of rational ratios in order of magnitude; an irrational number is a segment without boundary [»russB_1919, OK]
 Quote: every bounded subset of the reals has a least upper bound [»simmGF_1963]
 Quote: every rational and irrational number is a symbol for a cut that divides the real numbers into two
 Quote: the computable numbers are the real numbers whose decimal expression is calculable by finite means; because human memory is limited [»turiAM11_1936]
 Subtopic: complex numbers
Quote: instantiating 'complex' instantiates two instances of the form 'real' with the names 'r' and 'i' [»wulfWA4_1974]
 Quote: for three numbers to be a vector they must be associated with a coordinate system so that rotating the coordinate system rotates the vector [»feynRP_1963]
 Subtopic: floating point
Quote: students preferred drawing in screen pixels until they had written several custom widgets; floating point easier and more abstract [»pausR10_1992]
 Quote: real numbers are unrepresentable ideals which are approximated in a computer [»wegnP10_1986, OK]
 Quote: interpreter represents numbers as a 24 bit mantissa and a 6 bit exponent; 8 significant decimal digits [»laniJH1_1954]
 Subtopic: floating point scale
Quote: floating-point support should include the precision of a number, and conversions between the number and its components [»reidJK6_1980]
 Quote: Modula-3 provides three fixed floating-point types for efficiency: real, longreal, and extended
 Quote: Modula-3's strict conversions require separate representations for real, longreal, and extended literals; makes it difficult to write generic procedures [»goldD6_1992]
 Quote: use one or more 'long' annotations instead of required decimal places; otherwise mismatch between number and its representation [»wirtN6_1966]
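One way to expose a number's precision and its components is sketched below in Python. It assumes IEEE 754 doubles; `decompose` and `recompose` are hypothetical helper names, not part of any cited proposal.

```python
import math
import sys

MANT_BITS = sys.float_info.mant_dig   # binary digits of precision (53 for a double)
DEC_DIGITS = sys.float_info.dig       # decimal digits that always survive a round trip

def decompose(x: float):
    """Return integers (m, e) such that x == m * 2**e exactly."""
    frac, exp = math.frexp(x)          # x == frac * 2**exp with 0.5 <= |frac| < 1
    m = int(frac * 2**MANT_BITS)       # scaling by a power of two is exact
    return m, exp - MANT_BITS

def recompose(m: int, e: int) -> float:
    """Inverse of decompose: rebuild the float from its components."""
    return math.ldexp(m, e)
```

For example, `decompose(1.0)` yields the integer mantissa `2**52` with exponent `-52`, and `recompose` restores the original value.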
 Subtopic: floating-point standards
Quote: tutorial on floating point, rounding error, standards, and improved support of floating point [»goldD3_1991]
 Quote: programming languages do not fully support IEEE floating-point arithmetic; e.g., rounding direction and floating-point exceptions [»versD3_1997]
 Quote: SANE is a thorough implementation of IEEE Standard 754 for binary floating-point arithmetic [»appl_1988]
 Quote: SANE supports extended precision, NaNs, Infinities, unordered comparisons, rounding, and floating point exceptions; no signaling NaNs
 Quote: floating-point semantics need to allow efficient implementations with strict error bounds for proving algorithms correct [»goldD6_1992]
 Quote: floating point in Modula-3 supports forward error analysis with precisely defined rounding operations and exception handling [»goldD6_1992]
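Several of the IEEE 754 features named above (infinities, NaNs, unordered comparisons, denormalized numbers) can be observed directly. This is a small sketch in Python, assuming the platform's `float` is an IEEE 754 double; it does not show rounding-direction control, which standard Python does not expose.

```python
import math
import sys

inf = math.inf
nan = math.nan

# Overflow of a finite product produces infinity rather than wrapping around.
overflowed = sys.float_info.max * 2

# Invalid operations such as inf - inf produce a NaN.
invalid = inf - inf

# Comparisons involving a NaN are unordered: every one of these is False.
unordered = [nan < 1.0, nan > 1.0, nan == nan]

# Below the smallest normalized number, values underflow gradually
# through denormalized (subnormal) numbers instead of jumping to zero.
smallest_normal = sys.float_info.min
subnormal = smallest_normal / 2
```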
 Subtopic: floating-point and optimization
Quote: an optimizer should not rearrange the order of floating-point evaluation in any way that changes the computed value or side effects
 Quote: in C, all floating arithmetic is carried out in double precision [»ritcDM7_1978c]
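The reason an optimizer may not reorder floating-point evaluation is that floating-point addition is not associative. A minimal illustration in Python, assuming IEEE 754 doubles with round-to-nearest; the values are chosen only for the demonstration.

```python
a, b, c = 1e17, -1e17, 1.0

# Left to right: the large terms cancel first, so c survives.
left = (a + b) + c

# Regrouped: c is far below the spacing of doubles near 1e17 (which is 16),
# so b + c rounds back to b and c is lost.
right = a + (b + c)

reordering_changes_result = (left != right)
```

Here `left` is 1.0 and `right` is 0.0, so a compiler that regroups the sum changes the computed value.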
 Subtopic: conversion
Quote: algorithm for printing floating-point numbers; as free-format, generates shortest string that converts to the same result; multiple rounding modes [»burgRG5_1996]
 Quote: efficient algorithm for correctly rounded decimal-to-binary conversion; avoids high-precision arithmetic 99.6% of the time [»clinWD6_1990]
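The "shortest string that converts to the same result" property can be seen with Python's `repr` for floats, which produces the shortest decimal string that round-trips. This is only an illustration of the property; it is not claimed to be the cited printing algorithm, and `float()` stands in for a correctly rounded decimal-to-binary conversion.

```python
x = 0.1

shortest = repr(x)               # shortest decimal string that reads back as x
padded = format(x, ".17g")       # 17 significant digits also round-trip, but are longer

# Both strings convert back to exactly the same double.
round_trips = (float(shortest) == x) and (float(padded) == x)

samples = [1.0 / 3.0, 2.0 ** 0.5, 1e23, 5e-324]
all_round_trip = all(float(repr(v)) == v for v in samples)
```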
 Subtopic: simulating floating point
Quote: use i*j/k for efficient calibration, scaling, and rational approximation; e.g., multiply by pi with an error of 10^-7 [»rathED_1996]
 Quote: CORDIC algorithms compute one-bit-at-a-time using small lookup tables, right shifts, and additions; represents numbers by alternating series; good for microcontrollers [»pashM9_2000]
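The i*j/k idiom can be sketched with integers only. The example below assumes the well-known ratio 355/113 as the approximation of pi; the source does not state which ratio it uses, and `scale_by_pi` is a hypothetical name.

```python
import math

J, K = 355, 113   # assumed rational approximation of pi

def scale_by_pi(i: int) -> int:
    """Multiply an integer by pi using only integer multiply and divide.

    The product i * J is formed first (a double-length intermediate on a
    fixed-width machine) and then divided by K, truncating toward zero
    for non-negative i.
    """
    return (i * J) // K

# Relative error of the ratio itself, measured against the double value of pi.
relative_error = abs(J / K - math.pi) / math.pi
```

With this ratio the relative error is a little under 1e-7, and `scale_by_pi(1_000_000)` gives 3141592.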
 Subtopic: fixed point
Quote: Ada has fixed-point numbers since commonly used in peripheral devices such as analog-to-digital converters [»maclBJ_1987]
 Quote: an Ada fixed-point constraint gives a range and a maximum delta (the absolute error bound) [»maclBJ_1987]
 QuoteRef: rtl2 ;;fixed point fractions ('x' .lt. 1) with double length intermediates eg big integer fine integers and fine fractions etc
 QuoteRef: clouMJ7_1983 ;;analysis of single-precision fixed point arithmetic for doing arithmetic means.
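A fixed-point type with a range and a delta can be modeled as an integer count of delta-sized steps. The sketch below is in Python rather than Ada and is not Ada's exact semantics; `DELTA`, `LOW`, `HIGH`, `to_fixed`, and `from_fixed` are hypothetical names and values chosen for illustration.

```python
from fractions import Fraction

DELTA = Fraction(1, 16)                  # assumed step size (absolute error bound)
LOW, HIGH = Fraction(-8), Fraction(8)    # assumed range constraint

def to_fixed(x) -> int:
    """Return the integer number of DELTA steps nearest to x."""
    q = Fraction(x)
    if not LOW <= q <= HIGH:
        raise ValueError(f"{x} is outside the range [{LOW}, {HIGH}]")
    return round(q / DELTA)

def from_fixed(n: int) -> Fraction:
    """Convert a step count back to the value it represents."""
    return n * DELTA

# Addition of fixed-point values is plain integer addition of step counts.
total = from_fixed(to_fixed(1.5) + to_fixed(0.25))
```

Unlike floating point, the error of a conversion is bounded in absolute terms: it never exceeds half of `DELTA` anywhere in the range.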
 Subtopic: multiprecision arithmetic
Quote: gives a fast O(n^2) algorithm for division of multiprecision floating-point numbers; as fast as multiplication; accuracy to machine epsilon [»ozawK3_1991]
 Quote: fast arbitrary-precision addition and multiplication up to a thousand bits; uses floating point numbers; adaptive
 Quote: represent arbitrary precision floating point numbers with multiple, nonoverlapping terms; e.g., 1100 + -10.1 [»shewJR5_1996]
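The building block for representing a value as a sum of several floating point terms is an error-free addition. The sketch below uses the classical branch-free two-sum (attributed to Knuth), assuming IEEE 754 doubles with round-to-nearest and no overflow; it shows a single step only, not a complete arbitrary-precision package.

```python
from fractions import Fraction

def two_sum(a: float, b: float):
    """Return (s, err) where s is the rounded sum and s + err == a + b exactly."""
    s = a + b
    b_virtual = s - a             # the part of b that made it into s
    a_virtual = s - b_virtual     # the part of a that made it into s
    err = (a - a_virtual) + (b - b_virtual)   # what rounding discarded
    return s, err

# The pair (s, err) is a two-term expansion of the exact sum.
s, err = two_sum(1e17, 1.0)

def exact_value(terms) -> Fraction:
    """Exact rational value of an expansion, used here only for checking."""
    return sum((Fraction(t) for t in terms), Fraction(0))
```

For `two_sum(1e17, 1.0)` the rounded sum is 1e17 and the error term is 1.0, so no information is lost even though the sum itself is not representable in one double.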
 Subtopic: arithmetic coding
Quote: arithmetic coding represents a message by an interval of real numbers; allows fractional bits for a symbol [»wittIH6_1987]
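The interval narrowing behind arithmetic coding can be sketched with exact rationals. This is a toy model, assuming a fixed two-symbol alphabet with hypothetical probabilities in `PROBS`; a practical coder works with finite-precision integers and emits bits incrementally.

```python
import math
from fractions import Fraction

PROBS = {"a": Fraction(3, 4), "b": Fraction(1, 4)}   # hypothetical symbol model

def encode_interval(message: str, probs=PROBS):
    """Narrow [0, 1) once per symbol and return the final (low, high)."""
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        # Cumulative probability of the symbols listed before sym.
        cum = Fraction(0)
        for s, p in probs.items():
            if s == sym:
                break
            cum += p
        low += width * cum
        width *= probs[sym]
    return low, low + width

def information_bits(low: Fraction, high: Fraction) -> float:
    """Bits of information carried by an interval of this width."""
    return -math.log2(high - low)

low, high = encode_interval("aaaa")
bits_per_symbol = information_bits(low, high) / 4
```

A likely symbol shrinks the interval only slightly, which is why it can cost less than one bit: here four occurrences of "a" cost well under four bits in total.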
 Subtopic: history
Quote: interpreter represents numbers as a 24 bit mantissa and a 6 bit exponent; 8 significant decimal digits [»laniJH1_1954]
 Quote: use inline expansion to extend a machine's order code; use interpretative subroutines to reduce memory; e.g., floating point [»wilkMV_1957]
 Quote: the EDSAC used 1024 numbers of ultrasonic memory; 17 or 35 binary digits from -1 to 1 [»wilkMV_1951]
 Quote: scaling was the most difficult part of programming the EDSAC [»wilkMV_1951]

Related Topics
Group: algorithms (6 topics, 94 quotes)
Group: mathematics (23 topics, 560 quotes)
Topic: continuum in mathematics (7 items)
Topic: discrete vs. continuous (47 items)
Topic: FOCUS number system (8 items)
Topic: geometry (33 items)
Topic: integer values and operations (13 items)
Topic: kinds of numbers (24 items)
Topic: numerical error (19 items)
Topic: science as measurement (36 items)
Topic: type conversion (33 items)
Topic: unbounded precision (9 items)
Topic: units (23 items)
Topic: value as an abstraction (25 items)
