I find that if I have to keep revisiting a subject over and over, I haven’t fully understood it. One subject I have revisited many times since my university days is the IEEE floating point representation.
By working through examples that convert numbers to the IEEE float format, I hope to put the subject to bed and keep the knowledge lodged in my brain once and for all.
Dusting off (Clements, 2000)[1], Section 4.8.3 "IEEE floating point format" gives us the representation of a 32-bit floating point number:
| 1 bit | 8 bits | 23 bits |
| S | Biased exponent | Fractional mantissa |
The value is given by the formula

$$(-1)^S \times 2^{E - B} \times 1.F$$

where S is the sign bit, E is the biased exponent, B is the bias to be removed from the exponent and F is the fractional mantissa.
The bias is used so that the stored exponent, an unsigned 8-bit field, can represent negative exponents. With a bias of 127 the true exponent can be in the range -126 to +127. The two remaining stored values are reserved: 0 represents zero (and denormalised numbers) and 255 represents -infinity, +infinity and NaN.
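To make the bias concrete, here is a minimal sketch that adds the bias to a true exponent and shows the result as the 8-bit field that would be stored. The names `bias`, `biased-exponent` and `exponent-bits` are my own for illustration, and the sketch assumes the exponent is already within -126 to +127.

```scheme
;; Illustrative helpers (hypothetical names, not from Clements).
(define bias 127)

;; Stored (biased) exponent for a true exponent e.
(define (biased-exponent e)
  (+ e bias))

;; The stored exponent as an 8-character binary string,
;; left-padded with zeros. Assumes -126 <= e <= 127.
(define (exponent-bits e)
  (let* ((s (number->string (biased-exponent e) 2))
         (pad (- 8 (string-length s))))
    (string-append (make-string pad #\0) s)))

(exponent-bits -126) ;; => "00000001"
(exponent-bits 0)    ;; => "01111111"
(exponent-bits 11)   ;; => "10001010"
(exponent-bits 127)  ;; => "11111110"
```

Note that the smallest and largest results stop one short of "00000000" and "11111111", which are the two reserved patterns.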
The fractional mantissa is the normalised fraction of the number (the most significant bit and radix point removed). A little explanation:
If we take the decimal value -2345.125, we can normalise it into scientific notation: -2.345125e3. To achieve this algorithmically we can use:
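    (define (log10 n) (/ (log n) (log 10)))

    ;; Return a list (mantissa exponent) in base 10.
    (define (normalise10 d)
      (if (= d 0)
          (list 0 0) ; Treat zero as a special case!
          ;; floor, not truncate, so that |d| < 1 gets the right
          ;; (negative) exponent, e.g. 0.05 has exponent -2.
          (let ((ex (floor (log10 (abs d)))))
            (list (/ d (expt 10 ex)) (inexact->exact ex)))))

    (normalise10 -2345.125)
    ;; => (-2.345125 3)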
Normalising base 2 (binary) numbers can use the same method, calling log2 in place of log10:
    (define (log2 n) (/ (log n) (log 2)))

    ;; Return a list (mantissa exponent) in base 2.
    (define (normalise2 d)
      (if (= d 0)
          (list 0 0) ; Treat zero as a special case!
          ;; floor, not truncate, so that |d| < 1 gets the right
          ;; (negative) exponent, e.g. 0.75 has exponent -1.
          (let ((ex (floor (log2 (abs d)))))
            (list (/ d (expt 2 ex)) (inexact->exact ex)))))

    (normalise2 -2345.125)
    ;; => (-1.14508056640625 11)

    ;; Although we normalise for base 2, the result is
    ;; displayed in decimal. As a curiosity we can check
    ;; that the number has been normalised to binary by
    (string-append
     (number->string -1.14508056640625 2)
     "e"
     (number->string 11 2))
    ;; => "-1.00100101001001e1011"
We now have all the required components to make an IEEE float: a sign, a fraction and an exponent. We could write a crude function to package up a float like this:
    (define (ieee-single d)
      (let ((v (make-bitvector 32 #f)))
        (if (not (= d 0))
            (let ((bias 127)
                  (normal (normalise2 d)))
              ;; Write the sign bit into the bitvector.
              (bitvector-set! v 0 (< d 0))
              ;; Write the biased exponent into the bitvector,
              ;; least significant bit first (bit 8 down to bit 1).
              (let loop ((n (+ bias (cadr normal)))
                         (i 8))
                (if (not (= 0 n))
                    (let ((r (remainder n 2))
                          (q (quotient n 2)))
                      (bitvector-set! v i (= r 1))
                      (loop q (- i 1)))))
              ;; Write the normalised fractional mantissa into the
              ;; bitvector. We don't need to bother with the "1.";
              ;; IEEE 754 assumes this bit.
              (let loop ((fraction (- (abs (car normal)) 1))
                         (i 9))
                (if (not (= 0.0 fraction))
                    (let* ((f (* 2 fraction))
                           (bit (truncate f)))
                      (bitvector-set! v i (= bit 1))
                      (loop (- f bit) (+ i 1)))))))
        v))

    (ieee-single -2345.125)
    ;; => #*11000101000100101001001000000000
    ;;      S|--E---||----------M----------|
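As a sanity check, the bit pattern can be run back through the value formula. The sketch below is my own addition rather than anything from Clements: `decode-single` is a hypothetical helper that takes the 32 bits as a string of "0" and "1" characters (so it does not depend on any bitvector API) and assumes a normalised number, that is, a stored exponent between 1 and 254.

```scheme
;; Hypothetical helper: decode a 32-character string of 0s and 1s
;; as a normalised IEEE single. Zero, denormals, infinities and
;; NaN are not handled.
(define (decode-single bits)
  (let* ((bias 127)
         ;; Bit 0 is the sign.
         (sign (if (char=? (string-ref bits 0) #\1) -1 1))
         ;; Bits 1-8 are the biased exponent.
         (e (string->number (substring bits 1 9) 2))
         ;; Bits 9-31 are the fraction, scaled down by 2^23.
         (f (/ (string->number (substring bits 9 32) 2)
               (expt 2 23))))
    ;; (-1)^S * 2^(E - B) * 1.F, computed exactly and then
    ;; converted to an inexact number for display.
    (exact->inexact (* sign (expt 2 (- e bias)) (+ 1 f)))))

(decode-single "11000101000100101001001000000000")
;; => -2345.125
```

Getting the original value back suggests the sign, exponent and mantissa all landed in the right places.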
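If a number can’t be exactly represented as an IEEE single precision floating point number, the above function will fail: the mantissa loop only stops when the remaining fraction reaches zero, so a fraction that needs more than 23 bits runs past the end of the 32-bit bitvector. In another post I will discuss this problem and how values are approximated.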
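A rough way to see which values are affected is to run the same doubling loop with a bit budget instead of a bitvector. This is only a sketch under my own assumptions: `fraction-fits?` is a hypothetical helper, and it expects the fraction that remains after the leading "1." has been removed (a value from 0 up to, but not including, 1).

```scheme
;; Hypothetical helper: does the fraction f (0 <= f < 1) terminate
;; within `bits` binary digits?
(define (fraction-fits? f bits)
  (let loop ((fraction f) (used 0))
    (cond ((= 0 fraction) #t)  ; nothing left to encode
          ((= used bits) #f)   ; out of mantissa bits
          (else
           (let* ((doubled (* 2 fraction))
                  (bit (truncate doubled)))
             (loop (- doubled bit) (+ used 1)))))))

;; The fraction of -2345.125 after normalising needs only 14 bits.
(fraction-fits? 0.14508056640625 23) ;; => #t

;; 0.6 is roughly what is left over when 0.1 is normalised to
;; 1.6 x 2^-4; its binary expansion does not stop within 23 bits.
(fraction-fits? 0.6 23)              ;; => #f
```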
[1] The Principles of Computer Hardware, Third Edition. Alan Clements. Oxford University Press.
