Floating point binary representation in javascript

Inspired by this post which explains how computer stores floating point number, Here is a bit of javascript code that print out a float32 number in binary format. Firstly, the main idea is, floating point number is represented similar to scientific representation of numbers using E notation.

\[ 0.07 = 7.0 * 10^{-2} = 7.0E-2 \]

Although, instead of decimal, computer representation uses binary digit. For example with \(0.5 = \frac{1}{2}\):

\[ 0.5 = 2^{-1} = 1.0 * 2^{-1} = 1.0 E_{2} -1 \]

Left of E is significant part. Right of E is exponent part. And the actual bit layout is 1 sign bit, followed by exponent followed by significant. The most significant bit (MSB) of significant part is always assumed to be 1, and is omitted from the representation.

(1 sign bit) - (8 exponent bit) - (23 significant bit)

For the 0.5 example, from the scientific representation above, we could derive the binary representation as below:

Sign bit: 0 for positive number
Exponent = -1. Binary representation = -1 + 127 = 126
Significant = 1.0. Since only the MSB is 1, which is omitted, significant part should have 23 zero bits.

This representation has the advantage of flexible position of dot, (hence the name floating point), with the exponent part control numbe of decimal places, thus trades off between precision (more decimal places) and top value range. Now let’s put out some code to verify this representation. First, to get the underlying byte sequences of a float32 number. (The complete script can be found here)

var buffer = new ArrayBuffer(4);
new DataView(buffer).setFloat32(0, 0.5);
var bytes = new Uint8Array(buffer);

// Expected output: [63, 0, 0, 0]

Assuming big endian, sign bit is obtained by:

// 0 - positive, 1 - negative
var sign = (bytes[0] >> 7) > 0 ? '-': '';

Get exponent

var exp = (bytes[0] << 1) | (bytes[1] >> 7);
// expected: 126 , in binary 01111110
// for float32, minus 127 from raw bits to get signed value
var expVal = exp - 127;
// expected: -1
var expSign = expVal > 0? '': '-';

Get the significant

// the most significant bit is 1 and is ommited
// So we put it back here
var sig = (((bytes[1] & 0x7F) | 0x80) << 16) | (bytes[2] << 8) | bytes[3];

// expected: 8388608 , which is 10...0 (23 zeros)

Putting things together, we have the mathematical binary representation of float literal 0.5. toBinaryStr and toBits are helper functions listed at the bottom of this post.

var mathRep = toBinaryStr(toBits(sig, 24)) + 'E' 
			+ expSign 
			+ toBits(Math.abs(expVal), 8).join('').replace(/^0+/,'');

//output: 1.0E-1

Try it on for another few numbers:

toMathString(0.5);
//1.0E-1 = 2 ** -1
toMathString(0.25);
//1.0E-10 = 2 ** -2
toMathString(0.375);
//1.1E-10 = 2 ** -2 + 2 ** -3 

toMathString(0.15)
//1.0011001100110011001101E-11
toMathString(0.1)
//1.10011001100110011001101E-100

Numbers like 0.5, 0.25, 0.375 is neatly representable in a binary system. On the other hand, numbers like 0.1, 0.15 are rounded. To get rounding error:

//TODO

You can try the codepen at the bottom of this post. Though this post examines single precision floating point number (32-bit), number in javascript is floating point with double precision (64-bit), minor changes are needed to print out the actual binary representation used by javascript VM.

Helper functions

// assuming BigEndian
/**
 * getting bit sequence of an input number
 */
function toBits(n, bitCount) {
	var x = n;
	var bs = [];
	for (var b = 0; b < bitCount; b++) {
		x = n >> b;		
		bs.push(x & 0x01);
	}
	return bs.reverse();
}

/**
 * nicely print a binary number. omit trailing 0 after the dot.
 */
function toBinaryStr(bits) {	
	var suffix = bits.slice(1).join('').replace(/0+$/, '');
	suffix = suffix.length == 0? '0': suffix;
	return bits[0] + '.' + suffix;
}

Reference

https://floating-point-gui.de/formats/fp/

Codepen

See the Pen Floating point binary format by Vu Minh Tam (@vuamitom) on CodePen.