rounding
--------

Arithmetic operations on floating point values often compute results
that cannot be represented in the given amount of precision.
So, we round results.

There are MANY ways of rounding. They each have "correct" uses,
and exist for different reasons. The goal in a computation is to
have the computer round such that the end result is as "correct"
as possible. There are even arguments as to what is really correct.

First, how do we get more digits (bits) than were in the
representation? The IEEE standard requires that a result be rounded
as if it had first been computed exactly. A common way to do this is
to carry 3 extra bits of less significance than the 24 bits (of
mantissa) implied in the single precision representation.

  mantissa format plus extra bits:

     1.XXXXXXXXXXXXXXXXXXXXXXX   0   0   0
     ^           ^               ^   ^   ^
     |           |               |   |   |
     |           |               |   |   -  sticky bit (s)
     |           |               |   -  round bit (r)
     |           |               -  guard bit (g)
     |           -  23 bit mantissa from a representation
     -  hidden bit

This is the format used internally (on many, not all processors)
for a single precision floating point value.

When a mantissa is to be shifted in order to align radix points,
the bits that fall off the least significant end of the mantissa
go into these extra bits (guard, round, and sticky bits).
These bits can also be set by the normalization step in
multiplication, and by extra bits of quotient in division
(a nonzero remainder sets the sticky bit).

The guard and round bits are just 2 extra bits of precision that
are used in calculations. The sticky bit is an indication of
what is/could be in lesser significant bits that are not kept.
If a value of 1 ever is shifted into the sticky bit position,
that sticky bit remains a 1 ("sticks" at 1), despite further shifts.
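The shifting rule just described can be sketched in a few lines of
Python. This is only an illustration, not how any particular
processor does it: the function name shift_right_with_grs and its
parameters are made up for this sketch, and the mantissa is assumed
to be held in an integer as 24 bits with the hidden bit included.

```python
def shift_right_with_grs(mantissa, places, g=0, r=0, s=0):
    """Shift a mantissa right by `places`, tracking guard, round, sticky.

    `mantissa` is an integer holding the 24 bits (hidden bit included).
    All names here are illustrative, not from a standard or a library.
    """
    for _ in range(places):
        s = s | r            # sticky: once a 1 arrives, it stays 1
        r = g                # old guard bit moves into the round bit
        g = mantissa & 1     # bit falling off the mantissa becomes guard
        mantissa >>= 1
    return mantissa, g, r, s


# A sample mantissa: hidden bit plus the field 11000000000000000000100
m = 0b1110_0000_0000_0000_0000_0100
for n in range(9):
    shifted, g, r, s = shift_right_with_grs(m, n)
    print(f"after {n} shifts: {shifted:024b}  {g} {r} {s}")
```

Each pass through the loop is one 1-bit shift. The sticky bit is
updated with an OR, which is what makes it "stick" at 1.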
Example:
  mantissa from representation,  11000000000000000000100
  must be shifted by 8 places (to align radix points)

                                                   g r s
  Before first shift:   1.11000000000000000000100  0 0 0
  After 1 shift:        0.11100000000000000000010  0 0 0
  After 2 shifts:       0.01110000000000000000001  0 0 0
  After 3 shifts:       0.00111000000000000000000  1 0 0
  After 4 shifts:       0.00011100000000000000000  0 1 0
  After 5 shifts:       0.00001110000000000000000  0 0 1
  After 6 shifts:       0.00000111000000000000000  0 0 1
  After 7 shifts:       0.00000011100000000000000  0 0 1
  After 8 shifts:       0.00000001110000000000000  0 0 1

The IEEE standard for floating point arithmetic requires that the
programmer be allowed to choose 1 of 4 methods for rounding:

Method 1.  round toward 0 (also called truncation)

   figure out how many bits (digits) are available. Take that many
   bits (digits) for the result and throw away the rest.
   This has the effect of making the value represented closer
   to 0.0.

   example in decimal:
       .7783
          if 3 decimal places available,  .778
          if 2 decimal places available,  .77
          if 1 decimal place available,   .7

   examples in binary, where only 2 bits are available to the right
   of the radix point:
   (underlined value is the representation chosen)

              1.1101
                |
      1.11      |      10.00
      ----

              1.001
                |
      1.00      |      1.01
      ----

              -1.1101
                 |
      -10.00     |     -1.11
                       -----

              -1.001
                 |
      -1.01      |     -1.00
                       -----

   With results from floating point calculations that generate
   guard, round, and sticky bits, just leave them off. Note that
   this is VERY easy to implement!

   examples in the floating point format with guard, round and sticky bits:
                                   g r s
        1.11000000000000000000100  0 0 0
        1.11000000000000000000100  (mantissa used)

        1.11000000000000000000110  1 1 0
        1.11000000000000000000110  (mantissa used)

        1.00000000000000000000111  0 1 1
        1.00000000000000000000111  (mantissa used)

Method 2.  round toward positive infinity

   regardless of the value, round towards +infinity.
   example in decimal:
        1.23    if 2 digits available,   1.3
       -2.86    if 2 digits available,  -2.8

   examples in binary, where only 2 bits are available to the right
   of the radix point:

              1.1101
                |
      1.11      |      10.00
                       -----

              1.001
                |
      1.00      |      1.01
                       ----

   examples in the floating point format with guard, round and sticky bits:
                                   g r s
        1.11000000000000000000100  0 0 0
        1.11000000000000000000100  (mantissa used, exact representation)

        1.11000000000000000000100  1 0 0
        1.11000000000000000000101  (rounded "up")

       -1.11000000000000000000100  1 0 0
       -1.11000000000000000000100  (rounded "up")

        1.11000000000000000000001  0 1 0
        1.11000000000000000000010  (rounded "up")

        1.11000000000000000000001  0 0 1
        1.11000000000000000000010  (rounded "up")

Method 3.  round toward negative infinity

   regardless of the value, round towards -infinity.

   example in decimal:
        1.23    if 2 digits available,   1.2
       -2.86    if 2 digits available,  -2.9

   examples in binary, where only 2 bits are available to the right
   of the radix point:

              1.1101
                |
      1.11      |      10.00
      ----

              1.001
                |
      1.00      |      1.01
      ----

   examples in the floating point format with guard, round and sticky bits:
                                   g r s
        1.11000000000000000000100  0 0 0
        1.11000000000000000000100  (mantissa used, exact representation)

        1.11000000000000000000100  1 0 0
        1.11000000000000000000100  (rounded "down")

       -1.11000000000000000000100  1 0 0
       -1.11000000000000000000101  (rounded "down")

Method 4.  round to nearest

   use the representation NEAREST to the desired value.
   This works fine in all but 1 case: where the desired value
   is exactly half way between the two possible representations.

   The half way case: if the bits to the right of the digits to be
   kept are exactly 1000..., then round to nearest uses the
   representation that has zero as its least significant bit
   (this is often called "round to nearest even").
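   The four rules described so far can be written as one small
   Python function. This is a sketch under assumptions, not the way
   hardware must do it: the name round_mantissa, the mode strings,
   and the argument layout are all invented here. The mantissa is an
   integer holding the kept bits (hidden bit included), sign is 0
   for positive and 1 for negative, and g, r, s are the guard, round
   and sticky bits.

```python
def round_mantissa(sign, mantissa, g, r, s, mode):
    """Return the rounded mantissa magnitude (illustrative sketch).

    sign: 0 for positive, 1 for negative
    mode: "zero", "pos_inf", "neg_inf" or "nearest" (made-up names)
    """
    inexact = bool(g or r or s)      # any discarded bit set?
    if mode == "zero":
        round_up = False             # truncation: drop g, r, s
    elif mode == "pos_inf":
        round_up = inexact and sign == 0
    elif mode == "neg_inf":
        round_up = inexact and sign == 1
    elif mode == "nearest":
        lsb = mantissa & 1
        # g=1 with r or s set: more than halfway, go up in magnitude.
        # g=1 with r=s=0: exactly halfway, go up only if lsb is 1,
        # so that the result ends in a zero bit.
        round_up = bool(g and (r or s or lsb))
    else:
        raise ValueError("unknown rounding mode: " + mode)
    # Adding 1 can carry out of the top bit (1.11...1 becomes 10.00...0);
    # a real implementation would then renormalize. Not handled here.
    return mantissa + 1 if round_up else mantissa


m = 0b1110_0000_0000_0000_0000_0100      # 1.11000000000000000000100
for mode in ("zero", "pos_inf", "neg_inf", "nearest"):
    print(mode, f"{round_mantissa(1, m, 1, 0, 0, mode):024b}")
```

   Note that "rounding up" here means increasing the magnitude of
   the mantissa. For a negative value that moves the result toward
   -infinity, which is why the sign has to be checked in Methods 2
   and 3.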
   Examples:

              1.1111   (3/4 of the way from 1.11 to 10.00,
                |       so 10.00 is nearest)
      1.11      |      10.00
                       -----

              1.1101   (1/4 of the way from 1.11 to 10.00,
                |       so 1.11 is nearest)
      1.11      |      10.00
      ----

              1.001    (the case of exactly halfway between)
                |
      1.00      |      1.01
      ----

              -1.1101  (1/4 of the way from -1.11 to -10.00,
                 |      so -1.11 is nearest)
      -10.00     |     -1.11
                       -----

              -1.001   (the case of exactly halfway between)
                 |
      -1.01      |     -1.00
                       -----

   NOTE: this is a bit different from the "round to nearest" algorithm
   (for the "tie" case, .5) learned in elementary school for
   decimal numbers.

   examples in the floating point format with guard, round and sticky bits:
                                   g r s
        1.11000000000000000000100  0 0 0
        1.11000000000000000000100  (mantissa used, exact representation)

        1.11000000000000000000000  1 1 0
        1.11000000000000000000001  (more than halfway, rounded up)

        1.11000000000000000000000  0 1 0
        1.11000000000000000000000  (less than halfway, rounded down)

        1.11000000000000000000000  1 1 1
        1.11000000000000000000001  (more than halfway, rounded up)

        1.11000000000000000000000  0 0 1
        1.11000000000000000000000  (less than halfway, rounded down)

        1.11000000000000000000000  1 0 0   (the "halfway" case)
        1.11000000000000000000000  (lsb is a zero)

        1.11000000000000000000001  1 0 0   (the "halfway" case)
        1.11000000000000000000010  (lsb is a zero)

A complete example of addition, using rounding.
        S     E                 F
        1  10000000  11000000000000000011111
      + 1  10000010  11100000000000000001001
      -------------------------------------------

   First, align the radix points by shifting the top value's mantissa
   2 places to the right (increasing the exponent by 2)

        S     E       mantissa (+h.b)           g r s
        1  10000000  1.11000000000000000011111  0 0 0  (before shifting)
        1  10000001  0.11100000000000000001111  1 0 0  (after 1 shift)
        1  10000010  0.01110000000000000000111  1 1 0  (after 2 shifts)

   Add mantissas

                                         g r s
         1.11100000000000000001001       0 0 0
       + 0.01110000000000000000111       1 1 0
       --------------------------------------
        10.01010000000000000010000       1 1 0

   This must now be put back in the normalized form,

           E          mantissa                  g r s
        10000010  10.01010000000000000010000    1 1 0

        (shift mantissa right by 1 place, causing the exponent to
         increase by 1; the old round bit falls into the sticky bit)

        10000011   1.00101000000000000001000    0 1 1

        S     E       mantissa                  g r s
        1  10000011  1.00101000000000000001000  0 1 1

   Now, we round. Keep in mind that the sign bit is 1, so the value
   is negative.

   If round toward zero (drop the extra bits),
        1  10000011  1.00101000000000000001000
      giving a representation of
        1  10000011  00101000000000000001000
      in hexadecimal:
           1100 0001 1001 0100 0000 0000 0000 1000
        0x    c    1    9    4    0    0    0    8

   If round toward +infinity (the value is negative, so dropping the
   extra bits moves it toward +infinity),
        1  10000011  1.00101000000000000001000
      giving a representation of
        1  10000011  00101000000000000001000
      in hexadecimal:
           1100 0001 1001 0100 0000 0000 0000 1000
        0x    c    1    9    4    0    0    0    8

   If round toward -infinity (the value is negative and inexact, so
   the magnitude is increased by 1 in the last place),
        1  10000011  1.00101000000000000001001
      giving a representation of
        1  10000011  00101000000000000001001
      in hexadecimal:
           1100 0001 1001 0100 0000 0000 0000 1001
        0x    c    1    9    4    0    0    0    9

   If round to nearest (g r s = 0 1 1 is less than halfway, so the
   extra bits are dropped),
        1  10000011  1.00101000000000000001000
      giving a representation of
        1  10000011  00101000000000000001000
      in hexadecimal:
           1100 0001 1001 0100 0000 0000 0000 1000
        0x    c    1    9    4    0    0    0    8

A diagram to help with rounding when doing floating point multiplication.
                       [              ]
                     X [              ]
                  --------------------

        [              ] [ ][ ][              ]
         result in the    g  r  combined to
         given                  produce
         precision              a sticky bit

------------------------------------------------------------------------------
Downloaded from http://www.cs.wisc.edu/~cs354-1/cs354/karen.notes/rounding.html
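
The multiplication diagram above can be sketched in Python as
follows. This is a rough illustration only. The function name
multiply_mantissas and its return layout are invented for this
sketch, both inputs are assumed to be normalized 24-bit mantissas
(hidden bit included) held in integers, and the normalization step
is an assumption added so that the kept bits start with a 1.

```python
def multiply_mantissas(a, b):
    """Multiply two 24-bit mantissas; split the product as in the diagram.

    Returns (result, g, r, s, exp_adjust). All names are illustrative.
    Assumes 2**23 <= a, b < 2**24 (normalized, hidden bit included).
    """
    product = a * b                  # 47 or 48 bits wide
    if product >> 47:                # product is 1x.xxx: normalize
        shift = 24
        exp_adjust = 1               # exponent must go up by 1
    else:                            # product is already 1.xxx
        shift = 23
        exp_adjust = 0
    result = product >> shift        # result in the given precision
    dropped = product & ((1 << shift) - 1)
    g = (dropped >> (shift - 1)) & 1           # first dropped bit
    r = (dropped >> (shift - 2)) & 1           # second dropped bit
    rest = dropped & ((1 << (shift - 2)) - 1)  # everything below r
    s = 1 if rest else 0             # combined to produce a sticky bit
    return result, g, r, s, exp_adjust


one_and_a_bit = 0x800001             # 1.00000000000000000000001
res, g, r, s, adj = multiply_mantissas(one_and_a_bit, one_and_a_bit)
print(f"{res:024b}  {g} {r} {s}  exponent adjust {adj}")
```

The sticky bit is just the OR of all the product bits below the
round bit, so no information about "something nonzero was dropped"
is lost even though those bits are not kept.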