Thursday, April 19, 2018

Bit Twiddling: Understanding Bit Operations

This post explains bit operations and terms: And, Or, Xor, Complement, Shifting & Masking, Unsigned & Signed data types, sign extension, and more!

Bit manipulation operations operate on binary bits, which are either 0 or 1; however, often we don't know whether a bit is 0 or 1, so we will define a notion for single bit variables, which will be a, b, c, d, e, f, g, and, h.  Each of these represents either a 0 or 1.  Strung together, as abcdefgh, these represent each bit in an 8-bit byte of interest.

We'll use such an 8-bit byte as an input to show how the various bit operations produce their outputs.  Using this notation we will be able to see what is happening to the various bits.

A number of the bit manipulation operators are binary operators, meaning they consume two sources, and produce one result.  These operators are: And, Or, Xor, along with a Left-shift, and two Right-shifts.  (Some instruction sets and programming languages also include rotate instructions in addition to the shifts.)

Very often, when we perform bit manipulations we supply for the two input operands one variable, and one predetermined constant.  Of the two operands, the constant is referred to as a mask, and, performing bit manipulation operations using such a constant mask is referred to as masking.  The value of the mask has to do with both the operation we are using and the desired result we want.

Field Extraction

Lets turn to an example.  Let's say that we have a sensor that delivers some bit values as readings, and, that these bit values are packed into a single byte memory location.  For illustrative purposes, let's say there is a one 1-bit value, one 4-bit value, and one 3-bit value.  So, our memory location has a byte that is packed as follows:

    76543210     bit position (in decimal)

    abcdefgh     bit variables for illustration

where a represents the 1-bit field, bcde represents the 4-bit field, and fgh represents the 3-bit field.  For the following examples, we'll concern ourselves with the 4-bit field: bcde.  In order to treat this 4-bit field like any other number, we need to isolate the wanted portions of the byte, and then, right justify it.  Our intent is to go from abcdefgh, a packed byte, to 0000bcde, this field, now unpacked and right justified.  Typically, we'll first read the packed byte into a CPU register and then unpack it there.  These following use a CPU register of 8-bit in width.

Note that for now, we're treating the 4-bit field as an unsigned field.  In the next section will treat it as a signed field, which means that we're interested in an answer of bbbbbcde, instead of 0000bcde.

Unsigned Field Extraction

There are at least 3 techniques to do this unpacking, so we'll describe those.  The first uses a combination of shifts: (1) a Left-shift followed by a Right-shift.  The other two use a combination of (2) Right-Shift followed by a masking And operation, or (3) vice versa.

Approach (1u) — Left-shift, then Right-shift

First, we left-justify the field in our CPU register.  In our case, this means shifting it left by 1 bit, or register high bit number - field start position = 7 - 6 = 1. 

    76543210     bit position (in decimal)

    abcdefgh     our input byte, shown in binary as variables
           1     shift amount in decimal
    -------- <<  logical left shift operation
    bcdefgh0     result

The field we are interested in is now left-justified in our 8-bit CPU register.  This left justification has effectively removed any bits in higher positions than we are interested in.  Next, we shift right to right-justify the field, which requires a shift right of 4 bits, which comes from the field's length: register size - field length = 8 - 4 = 4.

    bcdefgh0     intermediate output/input
           4     shift amount in decimal
    -------- >>  logical right shift operation
    0000bcde     result

Because we want the item right justified when we're done, we do the right shifting last.

Approach (2u) — Right-shift, then masking And

First, the right shift:

    abcdefgh     our input byte, shown in binary as variables
           3     shift amount in decimal
    -------- >>  logical right shift operation
    000abcde     result


Next, the masking And.  The constant mask uses 1's where we want to keep bits from  other input and 0's where we want to ignore bits from the other input.  In this case our mask value is 15 in decimal, but the binary rendition is more illuminating.


    000abcde     intermediate output/input
    00001111     constant mask for And operation
    -------- &   And operation
    0000bcde     result

Approach (3u) — Masking And, then Right-shift

First, the masking And.  The constant mask uses 1's where we want to keep bits from  other input and 0's where we want to ignore bits from the other input.  In this case our mask value is 120 in decimal, but the binary rendition is more illuminating.


    abcdefgh     our input byte, shown in binary as variables
    01111000     constant mask for And operation
    -------- &   And operation
    0bcde000     result

Next, the right shift:

    0bcde000     intermediate output/input
           3     shift amount in decimal
    -------- >>  logical right shift operation
    0000bcde     result


To compare these sequences we have to understand a bit about the processor we're using; however all of these sequences are two instructions on most processors, so roughly comparable.

Some (older) processors shift more slowly the larger the shift count, which would mean that shift-heavy solutions would be slower; however, that isn't usually the case these days.  Certain sequences use shorter immediate (constant) values than others, and this can result in shorter instructions, which is a space and sometimes speed advantage.  For example, Approach (3) uses a constant of 120, for the mask, whereas Approach (4) uses a constant of 15 for the mask.

Signed Field Extraction

For signed field extraction, we need to replicate the field's sign bit across the rest of the CPU register.  This means that where we had 0's in the unsigned field extraction, 0000bcde, we need the sign bit instead: bbbbbcde.  In some sense this places the field's sign bit into the registers sign bit, though for correctness, this bit is not just placed there but replicated to all the rest of the high bits in the register.

The approach here is that we will first left-justify the value in the CPU register using a Left-shift, and then, right justify the value as in (1u) — but this time using an Arithmetic Right-shift, which keeps the sign bit as is, instead of shifting zero bits into the leftmost bit (as with Logical Right-shift).  We need to position the field's sign bit in the registers sign bit, using the left shift, in order to productively use the Arithmetic Right-shift operation for sign extension.  Essentially, we left justify the field in the CPU register, then right justify it using arithmetic shifting.

Approach (1s) — Left-shift, then Right-shift

First, we left-justify the field in our CPU register.  In our case, this means shifting it left by 1 bit, or register high bit number - field start position = 7 - 6 = 1. 

    76543210     bit position (in decimal)

    abcdefgh     our input byte, shown in binary as variables
           1     shift amount in decimal
    -------- <<  logical left shift operation
    bcdefgh0     result

The field we are interested in is now left-justified in our 8-bit CPU register. Next, we shift right to right-justify the field, which requires a shift right of 4 bits, which comes from the field's length: register size - field length = 8 - 4 = 4.

    bcdefgh0     intermediate output/input
           4     shift amount in decimal
    -------- >>  arithmetic right shift operation
    bbbbbcde     result

Masking Operations

Next we'll cover each bit manipulation operation.  Note that we are using constant masks, and that the mask chosen is arbitrary.  Masks can have consecutive 1's (as in 00111100), or mixed 1's and 0's: as in 01011010.

And Operation

The And operation using a constant mask: the result passes through the variable input to the result where the constant mask has a 1, and clears the result where the mask has a 0.

    abcdefgh     our input byte, shown in binary as variables
    01011010     constant mask
    -------- &   And operation
    0b0de0g0     result

This operation is useful for extracting information of interest, while ignoring information that is not of interest.

Or Operation

The Or operation using a constant mask: the result passes through the variable input to the result where the constant mask has a 0, and sets the result where the mask has a 1.

    abcdefgh     our input byte, shown in binary as variables
    01011010     constant mask
    -------- |   And operation
    a1c11f1h     result

This operation is useful for setting (to 1 or true) some bits while leaving other, neighboring bits unaffected.

Xor Operation

The Xor operation using a constant mask: the result passes through the variable input to the result where the constant mask has a 0, and complements the result where the mask has a 1.  Here we're using the notation that A is the complement of a. that is, A = ~aB = ~b, and so forth.

    abcdefgh     our input byte, shown in binary as variables
    01011010     constant mask
    -------- ^   And operation
    aBcDEfGh     result

This operation is useful for toggling some bits while leaving other, neighboring bits unaffected.

Complement Operation

The Complement operation reverses every bit.  It is like using Xor with a constant mask of all 1's

    abcdefgh     our input byte, shown in binary as variables
    -------- ~   Complement operation
    ABCDEFGH     result

This operation is useful for inverting a mask or other value, and also used in negation.

Negation Operation

Most processors have an operation to perform negation on signed numbers, however, the negation construct for 2'c complement representation of signed numbers, is to apply the complement and then add one (increment).

Numeric representation

Note: the terms signed and unsigned refer to data types.  These are ways of looking at variables.  Positive and negative are qualities of numbers of signed data types.  

Inappropriate treatment of variable & storage is a common problem in programming, especially in assembly language.  In assembly language, the programmer determines how to interpret data by each and every selection of instruction to use.  The hardware generally doesn't remember what data type was last used in what memory location or CPU register.  

Accidentally treating some data, for example, as signed when it isn't or vice versa is a very real problem, and high-level languages seek to mitigate these problems by using type systems, which detect problems either at compile time or at runtime but before the program does something really bad.  

In assembly language, however, we don't have these type systems, and sometimes these problems go undetected until values toward the edges of the range of representation are used. 

Unsigned Representation

Unsigned representation allows all bits of a number, register, or field to be used for magnitude.  Let's take a 3-bit number for example:

    3-bit value      in decimal
        000              0
        001              1
        010              2
        011              3
        100              4
        101              5
        110              6
        111              7

Signed Representation

2's complement representation uses numbers with the high bit set to indicate negative numbers.  Here's the same chart for a signed 3-bit value:

    3-bit value      in decimal
        100             -4
        101             -3
        110             -2
        111             -1
        000              0
        001              1
        010              2
        011              3

2's complement has the following qualities:
  • Negation is performed in 2's complement by inverting all the bits, and then adding one.
  • The range allows for negative numbers that are 1 past the maximum positive, so -4 to 3, or for 8-bit bytes -128 to 127, for 16-bit words: -32768 to 32767.
    • That means that negation has a non-zero chance of overflow (and if it happens the value stays the same!).
  • The same hardware circuitry can be used for addition, subtraction, multiplication, and division regardless of whether the operands are signed or unsigned, which made it heavily favored when circuit capacity was limited (as with initial microprocessors).
    • The advent of the computer chip, while a huge step forward in speed and other aspects actually placed a new (but constantly growing) hard limit on the amount of logic that could be provided on chip, whereas before cost was the main limiting factor in the amount of logic provided as individual transistors & diodes.
  • Detection of overflow for these arithmetic operations is different for signed vs. unsigned.  For example,
    • For unsigned addition, a carry out of the most significant bit indicates overflow.  
    • For signed addition, if both inputs have the same sign, and the result doesn't, it is overflow.  All other conditions either cannot overflow or indicate no overflow.
We can compare with 1's complement, which was used in the 50's and 60's before computer makers settled on 2's complement.  1's complement had the following qualities:
  • The same range of negative numbers as positive numbers.
  • Both a positive zero, and negative zero.
    • Some computers automatically converted negative 0 to positive 0 after each arithmetic operation
  • Negation is just the complement (no subsequent increment required).
  • Different hardware is required for signed vs. unsigned for all arithmetic operations

Shifting Operations

In these examples, we shift by only one bit.  Though modern hardware can shift by multiple bits in one cycle, a shift result for a higher shift count can still be thought of as multiple consecutive 1-bit shifts.

Note that some hardware has anomalies when using a shift count that is equal to or larger than the CPU register size, or when using a variable shift amount that turns out to be zero.  (I view these as unfortunate inconsistencies, shifting by zero, for example, should simply return the input value rather than something unexpected).

Left Shift

The Left-shift operation shifts the input by the shift count, and inserts zeros into vacated bits on the right.

    abcdefgh     our input byte, shown in binary as variables
           1     shift count in decimal
    -------- <<  Left shift operation
    bcdefgh0     result

This shift operation is also the same as multiplication by a power of 2, where the power is the shift count.  This is a more efficient way to multiple by such values, if you know in advance you're dealing with a power of 2.

Logical Right-shift

The logical Right-shift operation shifts the input by the shift count, and inserts zeros into vacated bits on the left.

    abcdefgh     our input byte, shown in binary as variables
           1     shift count in decimal
    -------- >>  Logical Left shift operation
    0abcdefg     result

This also divides by a power of 2, where the power of 2 is the shift count.

Arithmetic Right-shift

The arithmetic Right-shift operation shifts the input by the shift count, and inserts copies of the sign bit into vacated bits on the left.

    abcdefgh     our input byte, shown in binary as variables
           1     shift count in decimal
    -------- >>  Logical Left shift operation
    aabcdefg     result

Of particular note here is that the sign bit is kept and/or replicated depending on how we look at it.  Larger shift counts will result in a string of aaa's starting with the high bit.  This is the result we want in order to properly sign extend a smaller value to the size of a CPU register, for example.

This shift operation is similar to a divide by a power of 2 but has an issue with an input value of -1 as we would expect the result of -1/2 to be 0 when truncated to integer, but using arithmetic shifting it simply stays as -1 instead of going to 0.

Dynamic Mask Generation

1 << n

With 0 <= n <= register size (e.g. 32), this construct computes a value with a single bit set.  Here's a table of 1<<n for n:

      1 << n      n      1 << n (decimal)
     00000001     0           1
     00000010     1           2
     00000100     2           4
     00001000     3           8
     00010000     4          16
     00100000     5          32
     01000000     6          64 
     10000000     7         128

Notice that the 1's march across the word.

(1 << n) - 1

This construct generates a set of consecutive 1's.

   (1 << n) - 1   n    (1 << n) - 1 (decimal)
     00000000     0           0
     00000001     1           1
     00000011     2           3
     00000111     3           7
     00001111     4          15
     00011111     5          31
     00111111     6          63 
     01111111     7         127

((1 << n) - 1) << m

Let's say that m is one.   This generates a bit mask of consecutive ones at a position other than right justified.

   ((1<<n)-1)<<1  n
     00000000     0
     00000010     1
     00000110     2
     00001110     3
     00011110     4
     00111110     5
     01111110     6
     11111110     7

The complement of each of these can form a useful mask:

~ (1 << n)

     ~1 << n      n      
     11111110     0
     11111101     1
     11111011     2
     11110111     3
     11101111     4
     11011111     5
     10111111     6
     01111111     7

~((1 << n) - 1)

  ~((1 << n)-1)   n 
     11111111     0
     11111110     1     11111100     2     11111000     3     11110000     4     11100000     5     11000000     6     10000000     7


~(((1 << n) - 1) << m)

 ~(((1<<n)-1)<<1) n
     11111111     0
     11111101     1
     11111001     2
     11110001     3
     11100001     4
     11000001     5
     10000001     6
     00000001     7

Using dynamic mask notation for constant mask creation

In C and other languages, you can use the above dynamic mask generation with constant values for n.  That can make it easy to generate masks of interest.  For example, a 4-bit mask (e.g. for the low nibble) can be generated using the constant (1 << 4)-1.  A mask for the upper nibble of a byte is (the former) << 4.

Other Operations

Some CPUs have direct signed and/or unsigned field extraction instructions, which can replace the above sequences.

Many CPUs also have sign extension and zero extension operations for 8-bit and 16-bit inputs to convert them to 32-bit values.

Older CPUs may have only 1-bit shift operations.  In such cases, shift operations need to be used in sequence to obtain the desired shift.  If a processor is absent a variable shift amount, a loop of 1-bit shifting can be used.


See Also https://graphics.stanford.edu/~seander/bithacks.html

This post first appear on Erik Eidt's "Discussions on Programming Models" at https://erikeidt.blogspot.com/2018/04/bit-twiddling-understanding-bit.html