- 1 bit = Sign
- 8 bits = Exponent with a bias of 127
- 23 bits = Mantissa (the 1 before the point is implicit)
How do we extract the bits in C from a floating point number?
- Masking and shifting only works on ints (preferably unsigned ints), not floating point numbers
- You can't just cast a floating point number to an unsigned int; 29.314 will become 29, which has a completely different bit pattern
- Instead copy the bits from the floating point number to an unsigned int
Copying the bits from a floating point number to an unsigned int is done by using pointers
- Create an unsigned int pointer to point to the float
- Dereference the pointer
// changing a floating point number to an unsigned int
float f = 29.314;
unsigned int x;
unsigned int *p = (unsigned int *)&f;
x = *p;
Shortcut!
// changing a floating point number to an unsigned int (shorter version)
float f = 29.314;
unsigned int x = *(unsigned int *)&f;
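The pointer cast is the technique these notes use. As a hedged alternative, the same bit copy can be sketched with memcpy, which avoids the strict-aliasing problems that the cast can cause with some optimising compilers. The helper name float_bits is made up for this sketch, and it assumes float is a 32-bit IEEE 754 single.

```c
#include <stdint.h>
#include <string.h>

/* float_bits is a hypothetical helper name, not from the notes.
   Assumption: float is a 32-bit IEEE 754 single, the same width as uint32_t. */
uint32_t float_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);  /* copies the raw bytes, no numeric conversion */
    return x;
}
```

For example, float_bits(1.0f) gives 0x3F800000: sign 0, exponent field 127, mantissa field 0.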
How do we extract the sign bit, exponent bits and mantissa bits?
Extracting the sign:
// Shift 31 places to the right (the rightmost position), &1 is the mask
// if the result is 1 the number is negative, if the result is 0 the number is positive
float f = 3.14159;
unsigned int i = *(unsigned int *)&f;
int sign = (i >> 31)&1;
Extracting the exponent:
The exponent occupies 8 bits starting at bit 23
#define MASK8 0xff
#define EXPSHIFT 23
float f = 3.14159;
unsigned int i = *(unsigned int*)&f;
unsigned int exp = (i >> EXPSHIFT) & MASK8; // This result will be the actual exponent + 127
int actual_exp = (int)exp - 127; // cast first so the subtraction is signed
Extracting the mantissa:
- Rightmost 23 bits (no shift needed)
#define MASK23 ((1 << 23) -1)
float f = 3.14159;
unsigned int i = *(unsigned int*)&f;
int mantissa = (i & MASK23); // bitwise AND with mask to extract mantissa (i is not a pointer, so no *)
Alternatively:
#define MASK23 0x7fffff
float f = 3.14159;
unsigned int i = *(unsigned int*)&f;
int mantissa = (i & MASK23); // bitwise AND with mask to extract mantissa (i is not a pointer, so no *)
1 << 23: This creates a value with a single 1 at bit position 23 and 0 everywhere else.
((1 << 23) - 1): Subtracting 1 clears bit 23 and sets the 23 least significant bits (positions 0 - 22) to 1. These are the bits of the mantissa field in IEEE notation, so this value is the mask.
- The extracted mantissa contains only the digits (bits) after the point
- Not the implicit 1 before the point
Now, how do we insert the 1 before the point into the mantissa bits?
- The 23 stored bits are already there at positions 0 - 22
- We need to insert a 1 at bit position 23
// After extracting the bits from the above code
mantissa |= (1 << 23); // Insert the 1 before the point
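The three extractions and the implicit 1 can be put together in one place. The following is a minimal sketch, not a definitive implementation: the names float_fields and split_float are made up here, the bits are copied with memcpy (a swapped-in alternative to the pointer cast used in these notes), and it assumes a normalised 32-bit IEEE 754 float (exponent field neither 0 nor 255).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for this sketch; assumes a normalised 32-bit IEEE 754 float. */
struct float_fields {
    unsigned int sign;      /* 1 = negative, 0 = positive */
    int exponent;           /* actual exponent, bias of 127 removed */
    unsigned int mantissa;  /* 24 bits: the 1 at bit 23 plus the 23 stored bits */
};

struct float_fields split_float(float f)
{
    uint32_t i;
    struct float_fields r;

    memcpy(&i, &f, sizeof i);                    /* copy the bits, do not convert */
    r.sign = (i >> 31) & 1u;                     /* leftmost bit */
    r.exponent = (int)((i >> 23) & 0xffu) - 127; /* 8 bits starting at bit 23 */
    r.mantissa = (i & 0x7fffffu) | (1u << 23);   /* 23 bits plus the implicit 1 */
    return r;
}
```

For -6.5f, which is -1.625 x 2^2, this gives sign 1, exponent 2 and mantissa 0xD00000.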
- To read a bit use &
- To set a bit use |
- To clear a bit use & with the inverse of the mask (using the NOT ~ operator)
0b11001100 (the original number)
AND 0b11110111 (the inverse of the mask for the 4th bit)
= 0b11000100 (the result with the 4th bit cleared)
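The three bit operations can be written as small helpers. This is only an illustrative sketch; the function names are made up, and bit positions are counted from 0 at the right, so the "4th bit" in the example above is position 3.

```c
/* Hypothetical helper names; pos counts from 0 at the rightmost bit. */

/* Read: shift the wanted bit to position 0, then mask with 1. */
unsigned int read_bit(unsigned int x, unsigned int pos)
{
    return (x >> pos) & 1u;
}

/* Set: OR with a mask that has a 1 only at pos. */
unsigned int set_bit(unsigned int x, unsigned int pos)
{
    return x | (1u << pos);
}

/* Clear: AND with the inverse of the mask, built with the NOT (~) operator. */
unsigned int clear_bit(unsigned int x, unsigned int pos)
{
    return x & ~(1u << pos);
}
```

With these, clear_bit(0xCC, 3) returns 0xC4, which matches the 0b11001100 to 0b11000100 example.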
How do we add floating point numbers?
- In scientific notation
Decimal:
- Make the exponents the same and then add the mantissas
- Shift the number with the smaller exponent to the right to make the exponent larger
The result of the addition may have 2 non-zero digits before the "point". If so, normalise by shifting the mantissa of the result to the right by one and adding 1 to the exponent
The algorithm for binary floating point addition is the same
- Make the smaller exponent the same as the larger one
- Add the mantissas
- Normalise the result
Note: If exactly one of the numbers is negative, subtract the smaller magnitude from the larger magnitude; the sign of the result is the sign of the number with the larger magnitude
If both of the numbers are negative then add the numbers together and keep the sign
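The three steps (align the exponents, add the mantissas, normalise) can be sketched in C. This is a simplified illustration under stated assumptions, not a full IEEE 754 adder: both inputs are positive and normalised, bits shifted out during alignment are simply dropped (no rounding), and overflow to infinity is not handled. The function name add_positive is made up, and memcpy is used for the bit copy as an alternative to the pointer cast in these notes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: adds two positive, normalised 32-bit IEEE 754 floats.
   Truncates instead of rounding and ignores overflow. */
float add_positive(float a, float b)
{
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);

    /* Stored (biased) exponents and 24-bit mantissas with the leading 1 inserted. */
    uint32_t ea = (ia >> 23) & 0xffu;
    uint32_t eb = (ib >> 23) & 0xffu;
    uint32_t ma = (ia & 0x7fffffu) | (1u << 23);
    uint32_t mb = (ib & 0x7fffffu) | (1u << 23);

    /* Step 1: make the smaller exponent the same as the larger one
       by shifting the mantissa of the smaller number to the right. */
    if (ea < eb) {
        uint32_t d = eb - ea;
        ma = (d < 32) ? (ma >> d) : 0;
        ea = eb;
    } else {
        uint32_t d = ea - eb;
        mb = (d < 32) ? (mb >> d) : 0;
    }

    /* Step 2: add the mantissas. */
    uint32_t m = ma + mb;
    uint32_t e = ea;

    /* Step 3: normalise. A carry into bit 24 means two digits before the point. */
    if (m & (1u << 24)) {
        m >>= 1;
        e += 1;
    }

    /* Reassemble: sign 0, exponent field, mantissa without the leading 1. */
    uint32_t bits = (e << 23) | (m & 0x7fffffu);
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

For example, add_positive(1.5f, 2.25f) shifts the mantissa of 1.5 right by one place, adds, and returns 3.75f without needing the normalise step, while add_positive(1.0f, 1.0f) produces a carry and is normalised to 2.0f.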
How to multiply floating point numbers?
The sign of the result is just the XOR of the signs of the operands
In binary, multiplication works exactly the same way as in decimal scientific notation: multiply the mantissas and add the exponents
Important
- The stored exponent has a bias of 127
- So adding the 2 exponent fields together adds the bias twice; subtract 127 once from the sum to get the exponent field of the result
- Insert the leading 1 into each mantissa before multiplying them (integer multiplication)
- You may need to re-normalise due to the mantissa having too man
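The points above can be sketched in C. This is a simplified illustration, not a complete IEEE 754 multiplier: it assumes both inputs are normalised, that the exponent of the result stays in the normal range, and it truncates the extra product bits instead of rounding. The function name multiply_floats is made up, and memcpy is used for the bit copy as an alternative to the pointer cast in these notes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: multiplies two normalised 32-bit IEEE 754 floats.
   Truncates instead of rounding; no overflow, underflow, zero or NaN handling. */
float multiply_floats(float a, float b)
{
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);

    /* The sign of the result is the XOR of the two sign bits. */
    uint32_t sign = ((ia >> 31) ^ (ib >> 31)) & 1u;

    /* Adding the two stored exponents counts the bias twice, so subtract 127 once. */
    int32_t e = (int32_t)((ia >> 23) & 0xffu) + (int32_t)((ib >> 23) & 0xffu) - 127;

    /* Insert the leading 1 into each mantissa, then do an integer multiplication.
       Two 24-bit values give a product of up to 48 bits, so use 64-bit integers. */
    uint64_t ma = (ia & 0x7fffffu) | (1u << 23);
    uint64_t mb = (ib & 0x7fffffu) | (1u << 23);
    uint64_t product = ma * mb;

    /* Each mantissa is in [1, 2), so the product is in [1, 4).
       If bit 47 is set the product is 2 or more: shift right and add 1 to the exponent. */
    if (product & ((uint64_t)1 << 47)) {
        product >>= 1;
        e += 1;
    }

    /* Keep the top 24 bits (the leading 1 is now at bit 46 of the product). */
    uint32_t m = (uint32_t)(product >> 23);

    uint32_t bits = (sign << 31) | ((uint32_t)e << 23) | (m & 0x7fffffu);
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

For example, multiply_floats(-1.5f, 1.5f) multiplies 1.5 by 1.5 to get 2.25, which needs one normalising shift, and returns -2.25f.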