Python Course #5: Adding and Subtracting Binary Numbers
After you have learned how to convert decimal numbers into the binary system in Python Course #4: Binary Numbers, you will now see how to add and subtract binary numbers and how to represent negative numbers in the binary system.
Adding Binary Numbers
Adding two binary numbers works in the same way as you would add two decimal numbers on paper. To add two decimal numbers by hand, you first write one number over the other such that the digit positions align:
Then you start adding each digit position individually. In this example, the first digits are 5 and 9, which results in 14. The 4 will be written under the two digits that you just added together. The 1 of the 14 is added to the result of the next digits:
\[\begin & 2345 \\ & 189 \\ +& \quad_1\hspace\\\hline & 4 \end\]
The next column of digits is 4, 8, and 1, which sums up to 13. Again the 3 is written under the digits just added, and the one is carried over to the next digit:
\[\begin & 2345 \\ & 189 \\ +& \quad_1\hspace_1\hspace\\\hline & 34 \end\]
Repeat this procedure until you run out of digits to add together:
\[\begin & 2345 \\ & 189 \\ +& \quad_1\hspace_1\hspace\\\hline & 2534 \end\]
In the end, you can read the result at the bottom of your calculation.
Adding two binary numbers together works the same way, but instead of using the \(+\) you use XOR \(\oplus\) to add the digits together. As introduced in Python Course #3: Introduction to Boolean Algebra, the \(\oplus\) is only 1 if the operands have different values. Let’s add the numbers 10 and 12 in binary together. First you have to convert 10 and 12 to binary which is \(10_ = 1010_2\) and \(12_ = 1100_2\). Then write one number under the other one such that the bits align:
\[\begin & 1010 \\ \oplus& 1100 \\\hline \end\]
Now start with adding the first two digits on left \(0 \oplus 0 = 0\):
\[\begin & 1010 \\ \oplus& 1100 \\\hline & 0 \end\]
Then add the second and the third column, which will both result in 1 because each time the digits are different:
\[\begin & 1010 \\ \oplus& 1100 \\\hline & 110 \end\]
In the last column are two 1 which results in 0. However, when two 1 are added together, they will generate a carry-bit. Because \(1_ + 1_ = 2_ = 10_2\). The carry-bit is written under the next column of bits:
\[\begin & 1010 \\ & 1100 \\\ \oplus& \hspace_1\quad\quad \\\hline & 0110 \end\]
This carry-bit is then added together with two zeros (leading zeros are not written down) \(0\oplus 0 \oplus 1 = 1\):
\[\begin & 1010 \\ & 1100 \\\ \oplus& \hspace_1\quad\quad \\\hline & 10110 \end\]
The result of this addition is \(10110_2 = 22_ = 10_ + 12_\). You can also check that the result is correct with the following Python code:
x = 0b1010 y = 0b1100 z = x + y print(f"z> = z:b>")
Negative Binary Numbers
Before you can start subtracting binary numbers, let’s first talk about representing negative values in the binary system.
Ones’ Complement
The ones’ complement or just complement of a binary number is the negation of every bit. When you remember Python Course #3: Introduction to Boolean Algebra, the negation \(\neg\) will turn every 0 into 1 and every 1 into a 0. This ones’ complement can be used as a negative representation of a binary number. It allows you to represent numbers ranging from \(-(2^-1)\) to \(2^-1\) where \(N\) is the number of bits (or digits) of a binary number. Let’s take a look at 8-bit numbers:
Here you can spot one very odd thing, and that is negative zero. This negative zero is because the ones’ complement states that the negative of a number is every bit negated. A number system involving two representations of one number leads to problems because exceptional cases have to be respected for operations involving this number.
Twos’ Complement
Another way of representing negative numbers in binary is the twos’ complement. The twos’ complement takes the ones’ complement of a number and then adds a one. Before you can get the negative number for \(10_ = 1010_2\) lets think about the number of bits that are necessary to represent all values. With \(N\) bits \(2^N\) values can be represented. When only postive numbers are used the numbers from \(0\) to \(2^N\) can be displayed in binary using \(N\) bits. However, when you only got \(N\) bits, a part of the \(2^N\) representable values must be used for negative numbers. For the ones’ and twos’ complement representation, half of the values are used for negative numbers. For example, if you got 4 bits you can repesent positive values from \(0\) to \(7_ = 0111_2\). So if you want to represent values up to \(15_ = 01111_2\) you have to add a bit.
Now it’s time to get the twos’ complement negative representation of \(10_ = 01010_2\):
\[\begin \neg & 01010 \\\hline & 00101 \\ \oplus& 1 \\\hline & 10110 \end\]
Adding a one to the ones’ complement avoids a negative 0 from the ones’ complement. When you got 5 bits to store your numbers, zero is represented as \(00000_2\) negating all bits results in \(11111_2\) and when you now add a single one to it and discard all bits that come after the fifth position, you will get \(00000_2\) again:
\[\begin \neg & 00000 \\\hline & 11111 \\ & 1 \\ \oplus& \hspace_1\hspace_1\hspace_1\hspace_1\hspace_1\hspace \\\hline & \cancel00000 \end\]
This means the “negative” representation of 0 in the twos’ complement is the zero itself.
Now take a look at the 8-bit numbers in two’s complement:
Adding a one to any binary number in twos’ complement will result in the correct successor number, which wasn’t the case for the ones’ complement due to the negative zero. Another nifty property of the twos’ complement is that the last bit on the left-hand side, also called the most significant bit (MSB), of every negative number is 1. This means you can identify a negative number by only looking at the MSB. The twos’ complement can represent the numbers from \(-2^\) to \(2^-1\) which is one number more than the ones’ complement.
Subtracting Binary Numbers
Now that you know how to represent negative numbers in the binary system with the twos’ complement, subtracting numbers binary numbers from one another can be achieved pretty easily. Because subtraction can be interpreted as addition with a negative number:
To subtract \(12_ = 01100_2\) from \(10_ = 01010_2\) in binary you first have to convert \(12_ = 01100_2\) into its twos’ complement (carry-bits not shown):
\[\begin \neg & 01100 \\\hline & 10011 \\ \oplus& 1 \\\hline & 10100 \end\]
Now the twos’ complement of \(12_\) which is \(10100_2\) can be added to \(10_ = 01010_2\) (carry-bits not depicted):
\[\begin & 01010 \\ \oplus& 10100 \\\hline & 11110 \end\]
The result of this addition is a negative number because the MSB is 1. But how do you figure out what negative number? You get the twos’ complement of it because:
The twos’ complement of \(11110_2\) is \(00010_2 = 2_\).
Subtracting binary numbers from each other concludes this article. You’ve learned how to add binary numbers using \(\oplus\), how to represent negative numbers with the twos’ complement, and how to subtract binary numbers from each other. I encourage you to add and subtract binary numbers to get more practice because doing things yourself is one of the best ways to learn.
If you have any questions about this article, feel free to join our Discord community to ask them over there.
Python representation of negative integers
I am having trouble figuring out how Python represents negative integers, and therefore how bit operations work. It is my understanding that Python attempts to emulate two’s complement, but with any number of bits. Therefore, it is common to use 32-bit masks to force Python to set a standard size on integers before bit operations. As you can see in my example, -4 & 0xFFFFFFFF yields a large positive number. Why does Python seem to read this as an unsigned integer, instead of a two’s complement negative number? Later, the operation ~(x ^ mask) , which should yield the exact same two’s complement bit pattern as the large positive, instead gives -4 . What causes the conversion to a signed int? Thanks!
When you’re dealing with arbitrary-length 2’s-complement integers, the sign bit is notionally an infinite distance to the left. AND with a positive number will always give a positive result, because none of the 1 bits in that positive number can possibly be the sign bit.
@jasonharper so anytime I work with a negative integer, I should treat it as if it extends infinitely to the left? And using a 32-bit mask is just to make it easier to think about as long as I unmask it in the end?
note: it is not how CPython represents int objects internally. It uses sys.int_info.bits_per_digit (30) bits per digit.
A Python 3 integer is represented internally as an array of digits of base 2^30 with the sign stored separately. That is how it does integer arithmetic of arbitrary precision. So internally it is quite different from a C long or long long .
1 Answer 1
TLDR; CPython integer type stores the sign in a specific field of a structure. When performing a bitwise operation, CPython replaces negative numbers by their two’s complement and sometimes (!) performs the reverse operation (ie replace the two’s complements by negative numbers).
Bitwise operations
The internal representation of an integer is a PyLongObject struct, that contains a PyVarObject struct. (When CPython creates a new PyLong object, it allocates the memory for the structure and a trailing space for the digits.) What matter here is that the PyLong is sized: the ob_size field of the PyVarObject embedded struct contains the size (in digits) of the integer (digits are either 15 or 30 bits digits). If the integer is negative then this size is minus the number of digits .
As you see, the inner CPython’s representation of an integer is really far from the usual binary representation. Yet CPython has to provide bitwise operations for various purposes. Let’s take a look at the comments in the code:
static PyObject * long_bitwise(PyLongObject *a, char op, /* '&', '|', '^' */ PyLongObject *b) < /* Bitwise operations for negative numbers operate as though on a two's complement representation. So convert arguments from sign-magnitude to two's complement, and convert the result back to sign-magnitude at the end. */ /* If a is negative, replace it by its two's complement. */ /* Same for b. */ /* Complement result if negative. */ >
To handle negative integers in bitwise operations, CPython use the two’s complement (actually, that’s a two’s complement digit by digit, but I don’t go into the details). But note the «Sign Rule» (name is mine): the sign of the result is the bitwise operator applied to the signs of the numbers. More precisely, the result is negative if nega negb == 1 , ( negx = 1 for negative, 0 for positive). Simplified code:
Binary formatting
On the other hand, the formatter does not perform the two’s complement, even in binary representation: format_long_internal calls long_format_binary and remove the two leading characters, but keeps the sign. See the code:
/* Is a sign character present in the output? If so, remember it and skip it */ if (PyUnicode_READ_CHAR(tmp, inumeric_chars) == '-')
The long_format_binary function does not perform any two’s complement: just output the number in base 2, preceded by the sign.
Your question
I will follow your REPL sequence:
>>> x = -4 >>> print("<> ".format(x, x)) -4 -100
Nothing surprising, given that there is no two’s complement in formatting, but a sign.
>>> mask = 0xFFFFFFFF >>> print("<> ".format(x & mask, x & mask)) 4294967292 11111111111111111111111111111100
The number -4 is negative. Hence, it is replaced by its two’s complement before the logical and, digit by digit. You expected that the result will be turned into a negative number, but remember the «Sign Rule»:
>>> nega=1; negb=0 >>> nega & negb 0
Hence: 1. the result does not have the negative sign; 2. the result is not complemented to two. Your result is compliant with the «Sign Rule», even if this rule doesn’t seem very intuitive.
>>> x = 0b11111111111111111111111111111100 >>> print("<> ".format(x, x)) 4294967292 11111111111111111111111111111100 >>> print("<> ".format(~(x ^ mask), ~(x ^ mask))) -4 -100
Again, -4 is negative, hence replaced by it’s two’s complement 0b11111111111111111111111111111100 , then XORed with 0b11111111111111111111111111111111 . The result is 0b11 ( 3 ). You take the complement unary, that is 0b11111111111111111111111111111100 again, but this time the sign is negative:
>>> nega=1; negb=0 >>> nega ^ negb 1
Therefore, the result is complemented and gets the negative sign, as you expected.
Conclusion: I guess there was no perfect solution to have arbitrary long signed number and provide bitwise operations, but the documentation is not really verbose on the choices that were made.