Thursday, April 19, 2018

Bit Twiddling: Understanding Bit Operations

This post explains bit operations and terms: And, Or, Xor, Complement, Shifting & Masking, Unsigned & Signed data types, sign extension, and more!

Bit manipulation operations operate on binary bits, which are either 0 or 1; however, often we don't know whether a bit is 0 or 1, so we will define a notation for single-bit variables: a, b, c, d, e, f, g, and h.  Each of these represents either a 0 or a 1.  Strung together, as abcdefgh, they represent the bits of an 8-bit byte of interest.

We'll use such an 8-bit byte as an input to show how the various bit operations produce their outputs.  Using this notation we will be able to see what is happening to the various bits.

A number of the bit manipulation operators are binary operators, meaning they consume two sources, and produce one result.  These operators are: And, Or, Xor, along with a Left-shift, and two Right-shifts.  (Some instruction sets and programming languages also include rotate instructions in addition to the shifts.)

Very often, when we perform bit manipulations, the two input operands are one variable and one predetermined constant.  The constant operand is referred to as a mask, and performing bit manipulation operations with such a constant is referred to as masking.  The value of the mask depends on both the operation being used and the desired result.

Field Extraction

Let's turn to an example.  Let's say that we have a sensor that delivers some bit values as readings, and that these bit values are packed into a single byte memory location.  For illustrative purposes, let's say there is one 1-bit value, one 4-bit value, and one 3-bit value.  So, our memory location has a byte that is packed as follows:

    76543210     bit position (in decimal)

    abcdefgh     bit variables for illustration

where a represents the 1-bit field, bcde represents the 4-bit field, and fgh represents the 3-bit field.  For the following examples, we'll concern ourselves with the 4-bit field: bcde.  In order to treat this 4-bit field like any other number, we need to isolate the wanted portion of the byte and then right-justify it.  Our intent is to go from abcdefgh, a packed byte, to 0000bcde, the field now unpacked and right-justified.  Typically, we'll first read the packed byte into a CPU register and then unpack it there.  The following examples use a CPU register that is 8 bits wide.

Note that for now, we're treating the 4-bit field as an unsigned field.  In the next section we will treat it as a signed field, which means that we're interested in an answer of bbbbbcde, instead of 0000bcde.

Unsigned Field Extraction

There are at least three techniques for this unpacking, and we'll describe each.  The first uses a pair of shifts: (1) a Left-shift followed by a Right-shift.  The other two combine a shift and a mask: (2) a Right-shift followed by a masking And operation, or (3) vice versa.

Approach (1u) — Left-shift, then Right-shift

First, we left-justify the field in our CPU register.  In our case, this means shifting it left by 1 bit: register high bit position - field's high bit position = 7 - 6 = 1.

    76543210     bit position (in decimal)

    abcdefgh     our input byte, shown in binary as variables
           1     shift amount in decimal
    -------- <<  logical left shift operation
    bcdefgh0     result

The field we are interested in is now left-justified in our 8-bit CPU register.  This left justification has effectively removed any bits in higher positions than we are interested in.  Next, we shift right to right-justify the field, which requires a shift right of 4 bits, which comes from the field's length: register size - field length = 8 - 4 = 4.

    bcdefgh0     intermediate output/input
           4     shift amount in decimal
    -------- >>  logical right shift operation
    0000bcde     result

Because we want the item right justified when we're done, we do the right shifting last.

Approach (2u) — Right-shift, then masking And

First, the right shift:

    abcdefgh     our input byte, shown in binary as variables
           3     shift amount in decimal
    -------- >>  logical right shift operation
    000abcde     result


Next, the masking And.  The constant mask uses 1's where we want to keep bits from the other input and 0's where we want to ignore bits from the other input.  In this case our mask value is 15 in decimal, but the binary rendition is more illuminating.


    000abcde     intermediate output/input
    00001111     constant mask for And operation
    -------- &   And operation
    0000bcde     result

Approach (3u) — Masking And, then Right-shift

First, the masking And.  The constant mask uses 1's where we want to keep bits from the other input and 0's where we want to ignore bits from the other input.  In this case our mask value is 120 in decimal, but the binary rendition is more illuminating.


    abcdefgh     our input byte, shown in binary as variables
    01111000     constant mask for And operation
    -------- &   And operation
    0bcde000     result

Next, the right shift:

    0bcde000     intermediate output/input
           3     shift amount in decimal
    -------- >>  logical right shift operation
    0000bcde     result


To compare these sequences we have to know a bit about the processor we're using; however, all of them are two instructions on most processors, so they are roughly comparable.

Some (older) processors shift more slowly the larger the shift count, which would mean that shift-heavy solutions would be slower; however, that isn't usually the case these days.  Certain sequences use shorter immediate (constant) values than others, and this can result in shorter instructions, which is a space and sometimes speed advantage.  For example, Approach (3u) uses a mask constant of 120, whereas Approach (2u) uses a mask constant of only 15.
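
As a concrete sketch, here are the three unsigned extraction sequences in C, assuming the packed byte has already been loaded into an 8-bit unsigned variable (the function names are just for illustration):

    #include <stdint.h>

    /* extract the 4-bit field bcde (bits 6..3) from a packed byte, unsigned */

    uint8_t extractShiftShift ( uint8_t packed ) {      /* approach (1u) */
        return ( uint8_t ) ( packed << 1 ) >> 4;        /* left-justify, then right-justify */
    }

    uint8_t extractShiftMask ( uint8_t packed ) {       /* approach (2u) */
        return ( packed >> 3 ) & 0x0F;                  /* right-justify, then And with 15 */
    }

    uint8_t extractMaskShift ( uint8_t packed ) {       /* approach (3u) */
        return ( packed & 0x78 ) >> 3;                  /* And with 120, then right-justify */
    }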

Signed Field Extraction

For signed field extraction, we need to replicate the field's sign bit across the rest of the CPU register.  This means that where we had 0's in the unsigned field extraction, 0000bcde, we need the sign bit instead: bbbbbcde.  In some sense this places the field's sign bit into the register's sign bit, though for correctness, this bit is not just placed there but replicated to all the rest of the high bits in the register.

The approach here is that we first left-justify the value in the CPU register using a Left-shift, and then right-justify it as in (1u), but this time using an Arithmetic Right-shift, which keeps the sign bit as is instead of shifting zero bits into the leftmost bit (as a Logical Right-shift does).  We need to position the field's sign bit in the register's sign bit, using the left shift, in order to make productive use of the Arithmetic Right-shift for sign extension.  Essentially, we left-justify the field in the CPU register, then right-justify it using arithmetic shifting.

Approach (1s) — Left-shift, then Right-shift

First, we left-justify the field in our CPU register.  In our case, this means shifting it left by 1 bit: register high bit position - field's high bit position = 7 - 6 = 1.

    76543210     bit position (in decimal)

    abcdefgh     our input byte, shown in binary as variables
           1     shift amount in decimal
    -------- <<  logical left shift operation
    bcdefgh0     result

The field we are interested in is now left-justified in our 8-bit CPU register. Next, we shift right to right-justify the field, which requires a shift right of 4 bits, which comes from the field's length: register size - field length = 8 - 4 = 4.

    bcdefgh0     intermediate output/input
           4     shift amount in decimal
    -------- >>  arithmetic right shift operation
    bbbbbcde     result
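
A sketch of (1s) in C: shifting a signed operand right is what typically produces the arithmetic shift (strictly, both the narrowing conversion and the right shift of a negative value are implementation-defined in C, but they behave as shown on 2's complement machines).  The names are just for illustration:

    #include <stdint.h>

    /* extract the 4-bit field bcde (bits 6..3) as a signed value in -8..7 */
    int8_t extractSigned ( uint8_t packed ) {
        int8_t leftJustified = ( int8_t ) ( packed << 1 );   /* bcdefgh0, field sign bit now in bit 7  */
        return leftJustified >> 4;                           /* arithmetic shift replicates b: bbbbbcde */
    }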

Masking Operations

Next we'll cover each bit manipulation operation.  Note that we are using constant masks, and that the mask chosen is arbitrary.  Masks can have consecutive 1's (as in 00111100), or mixed 1's and 0's: as in 01011010.

And Operation

The And operation using a constant mask: the variable input passes through to the result where the constant mask has a 1, and the result is cleared (to 0) where the mask has a 0.

    abcdefgh     our input byte, shown in binary as variables
    01011010     constant mask
    -------- &   And operation
    0b0de0g0     result

This operation is useful for extracting information of interest, while ignoring information that is not of interest.

Or Operation

The Or operation using a constant mask: the variable input passes through to the result where the constant mask has a 0, and the result is set (to 1) where the mask has a 1.

    abcdefgh     our input byte, shown in binary as variables
    01011010     constant mask
    -------- |   Or operation
    a1c11f1h     result

This operation is useful for setting (to 1 or true) some bits while leaving other, neighboring bits unaffected.

Xor Operation

The Xor operation using a constant mask: the variable input passes through to the result where the constant mask has a 0, and is complemented in the result where the mask has a 1.  Here we're using the notation that A is the complement of a; that is, A = ~a, B = ~b, and so forth.

    abcdefgh     our input byte, shown in binary as variables
    01011010     constant mask
    -------- ^   Xor operation
    aBcDEfGh     result

This operation is useful for toggling some bits while leaving other, neighboring bits unaffected.

Complement Operation

The Complement operation reverses every bit.  It is like using Xor with a constant mask of all 1's.

    abcdefgh     our input byte, shown in binary as variables
    -------- ~   Complement operation
    ABCDEFGH     result

This operation is useful for inverting a mask or other value, and also used in negation.
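
These masking operations map directly onto the familiar clear/set/toggle idioms; here is a minimal sketch in C (the flag names and values are hypothetical):

    #include <stdint.h>

    #define FLAG_READY  0x01u    /* hypothetical flag bits */
    #define FLAG_ERROR  0x02u

    uint8_t updateFlags ( uint8_t flags ) {
        flags &= ( uint8_t ) ~FLAG_ERROR;   /* And with a complemented mask: clear the error bit */
        flags |= FLAG_READY;                /* Or with a mask: set the ready bit                 */
        flags ^= FLAG_READY;                /* Xor with a mask: toggle the ready bit             */
        return flags;
    }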

Negation Operation

Most processors have an operation to perform negation on signed numbers; however, the negation construct for the 2's complement representation of signed numbers is to apply the complement and then add one (increment).
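
As a small sketch in C (assuming a 2's complement machine, which is essentially universal today):

    #include <stdio.h>

    int main ( void ) {
        int x = 42;
        int negated = ~x + 1;                  /* complement, then add one */
        printf ( "%d %d\n", negated, -x );     /* prints: -42 -42          */
        return 0;
    }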

Numeric representation

Note: the terms signed and unsigned refer to data types.  These are ways of looking at variables.  Positive and negative are qualities of numbers of signed data types.  

Inappropriate treatment of variables & storage is a common problem in programming, especially in assembly language.  In assembly language, the programmer determines how to interpret data by each and every selection of instruction to use.  The hardware generally doesn't remember what data type was last used in what memory location or CPU register.

Accidentally treating some data, for example, as signed when it isn't or vice versa is a very real problem, and high-level languages seek to mitigate these problems by using type systems, which detect problems either at compile time or at runtime but before the program does something really bad.  

In assembly language, however, we don't have these type systems, and sometimes these problems go undetected until values toward the edges of the range of representation are used. 

Unsigned Representation

Unsigned representation allows all bits of a number, register, or field to be used for magnitude.  Let's take a 3-bit number for example:

    3-bit value      in decimal
        000              0
        001              1
        010              2
        011              3
        100              4
        101              5
        110              6
        111              7

Signed Representation

2's complement representation uses numbers with the high bit set to indicate negative numbers.  Here's the same chart for a signed 3-bit value:

    3-bit value      in decimal
        100             -4
        101             -3
        110             -2
        111             -1
        000              0
        001              1
        010              2
        011              3

2's complement has the following qualities:
  • Negation is performed in 2's complement by inverting all the bits, and then adding one.
  • The range allows for negative numbers that are 1 past the maximum positive, so -4 to 3, or for 8-bit bytes -128 to 127, for 16-bit words: -32768 to 32767.
    • That means that negation has a non-zero chance of overflow (and if it happens the value stays the same!).
  • The same hardware circuitry can be used for addition, subtraction, multiplication, and division regardless of whether the operands are signed or unsigned, which made it heavily favored when circuit capacity was limited (as with initial microprocessors).
    • The advent of the computer chip, while a huge step forward in speed and other aspects, actually placed a new (but constantly growing) hard limit on the amount of logic that could be provided on chip, whereas before, cost was the main limiting factor in the amount of logic provided as individual transistors & diodes.
  • Detection of overflow for these arithmetic operations is different for signed vs. unsigned (a sketch in C follows this list).  For example,
    • For unsigned addition, a carry out of the most significant bit indicates overflow.  
    • For signed addition, if both inputs have the same sign, and the result doesn't, it is overflow.  All other conditions either cannot overflow or indicate no overflow.
We can compare with 1's complement, which was used in the 50's and 60's before computer makers settled on 2's complement.  1's complement had the following qualities:
  • The same range of negative numbers as positive numbers.
  • Both a positive zero, and negative zero.
    • Some computers automatically converted negative 0 to positive 0 after each arithmetic operation
  • Negation is just the complement (no subsequent increment required).
  • Different hardware is required for signed vs. unsigned for all arithmetic operations
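
Here is the promised sketch in C of those two overflow checks, for 8-bit addition (the function names are just for illustration; the narrowing conversions behave as shown on 2's complement machines):

    #include <stdint.h>
    #include <stdbool.h>

    /* unsigned overflow: a carry out of the most significant bit,
       which shows up as the 8-bit sum wrapping around below an input */
    bool uadd8Overflows ( uint8_t a, uint8_t b ) {
        uint8_t sum = ( uint8_t ) ( a + b );
        return sum < a;
    }

    /* signed overflow: both inputs have the same sign, and the result does not */
    bool sadd8Overflows ( int8_t a, int8_t b ) {
        int8_t sum = ( int8_t )( uint8_t )( ( uint8_t ) a + ( uint8_t ) b );   /* wraparound sum */
        return ( ( a ^ sum ) & ( b ^ sum ) ) < 0;      /* result sign differs from both inputs  */
    }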

Shifting Operations

In these examples, we shift by only one bit.  Though modern hardware can shift by multiple bits in one cycle, a shift result for a higher shift count can still be thought of as multiple consecutive 1-bit shifts.

Note that some hardware has anomalies when using a shift count that is equal to or larger than the CPU register size, or when using a variable shift amount that turns out to be zero.  (I view these as unfortunate inconsistencies, shifting by zero, for example, should simply return the input value rather than something unexpected).

Left Shift

The Left-shift operation shifts the input by the shift count, and inserts zeros into vacated bits on the right.

    abcdefgh     our input byte, shown in binary as variables
           1     shift count in decimal
    -------- <<  Left shift operation
    bcdefgh0     result

This shift operation is also the same as multiplication by a power of 2, where the power is the shift count.  This is a more efficient way to multiply by such values, if you know in advance you're dealing with a power of 2.
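
For instance, a minimal sketch in C of multiplying by 8 (2 to the 3rd power) with a shift:

    unsigned multiplyBy8 ( unsigned x ) {
        return x << 3;      /* same result as x * 8 */
    }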

Logical Right-shift

The logical Right-shift operation shifts the input by the shift count, and inserts zeros into vacated bits on the left.

    abcdefgh     our input byte, shown in binary as variables
           1     shift count in decimal
    -------- >>  Logical right shift operation
    0abcdefg     result

For unsigned values, this also divides by a power of 2, where the power is the shift count.

Arithmetic Right-shift

The arithmetic Right-shift operation shifts the input by the shift count, and inserts copies of the sign bit into vacated bits on the left.

    abcdefgh     our input byte, shown in binary as variables
           1     shift count in decimal
    -------- >>  Arithmetic right shift operation
    aabcdefg     result

Of particular note here is that the sign bit is kept and/or replicated depending on how we look at it.  Larger shift counts will result in a string of aaa's starting with the high bit.  This is the result we want in order to properly sign extend a smaller value to the size of a CPU register, for example.

This shift operation is similar to dividing by a power of 2, but it differs for negative values: for an input of -1, we would expect -1/2 to be 0 when truncated to integer, yet under arithmetic shifting -1 simply stays -1 instead of going to 0.
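
A short demonstration of the difference in C (right-shifting a negative value is implementation-defined by the C standard, but mainstream compilers perform the arithmetic shift):

    #include <stdio.h>

    int main ( void ) {
        int x = -1;
        printf ( "%d\n", x / 2 );     /* prints 0: integer division truncates toward zero */
        printf ( "%d\n", x >> 1 );    /* prints -1: arithmetic shift rounds downward      */
        return 0;
    }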

Dynamic Mask Generation

1 << n

With 0 <= n < register size (e.g. 32), this construct computes a value with a single bit set.  Here's a table of 1<<n for n:

      1 << n      n      1 << n (decimal)
     00000001     0           1
     00000010     1           2
     00000100     2           4
     00001000     3           8
     00010000     4          16
     00100000     5          32
     01000000     6          64 
     10000000     7         128

Notice that the 1's march across the word.

(1 << n) - 1

This construct generates a set of consecutive 1's.

   (1 << n) - 1   n    (1 << n) - 1 (decimal)
     00000000     0           0
     00000001     1           1
     00000011     2           3
     00000111     3           7
     00001111     4          15
     00011111     5          31
     00111111     6          63 
     01111111     7         127

((1 << n) - 1) << m

Let's say that m is one.   This generates a bit mask of consecutive ones at a position other than right justified.

   ((1<<n)-1)<<1  n
     00000000     0
     00000010     1
     00000110     2
     00001110     3
     00011110     4
     00111110     5
     01111110     6
     11111110     7

The complement of each of these can form a useful mask:

~ (1 << n)

    ~(1 << n)     n
     11111110     0
     11111101     1
     11111011     2
     11110111     3
     11101111     4
     11011111     5
     10111111     6
     01111111     7

~((1 << n) - 1)

  ~((1 << n)-1)   n 
     11111111     0
     11111110     1
     11111100     2
     11111000     3
     11110000     4
     11100000     5
     11000000     6
     10000000     7


~(((1 << n) - 1) << m)

 ~(((1<<n)-1)<<1) n
     11111111     0
     11111101     1
     11111001     2
     11110001     3
     11100001     4
     11000001     5
     10000001     6
     00000001     7

Using dynamic mask notation for constant mask creation

In C and other languages, you can use the above dynamic mask generation with constant values for n; the compiler folds such expressions into a single constant, so there is no runtime cost.  That makes it easy to generate masks of interest.  For example, a 4-bit mask (e.g. for the low nibble) can be generated using the constant (1 << 4) - 1.  A mask for the upper nibble of a byte is (the former) << 4.
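
Putting this together, here is a sketch in C of approach (3u) from earlier, with the mask and shift amount built from the field's position and width (the macro and function names are just illustrative):

    #include <stdint.h>

    #define FIELD_POS   3                                              /* bit position of the field's low bit */
    #define FIELD_LEN   4                                              /* width of the field in bits          */
    #define FIELD_MASK  ( ( ( 1u << FIELD_LEN ) - 1 ) << FIELD_POS )   /* 01111000, i.e. 120                  */

    uint8_t extractField ( uint8_t packed ) {
        return ( packed & FIELD_MASK ) >> FIELD_POS;    /* mask, then right-justify */
    }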

Other Operations

Some CPUs have direct signed and/or unsigned field extraction instructions, which can replace the above sequences.

Many CPUs also have sign extension and zero extension operations for 8-bit and 16-bit inputs to convert them to 32-bit values.

Older CPUs may have only 1-bit shift operations.  In such cases, shift operations need to be used in sequence to obtain the desired shift.  If a processor lacks a variable-amount shift, a loop of 1-bit shifts can be used, as sketched below.
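
A sketch in C of that idea (on such hardware this would really be a small loop of 1-bit shift instructions in assembly):

    #include <stdint.h>

    /* logical right shift by a variable count, using only 1-bit shifts */
    uint8_t shiftRightByLoop ( uint8_t value, unsigned count ) {
        while ( count-- > 0 )
            value = value >> 1;     /* one 1-bit shift per iteration */
        return value;
    }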


See also: https://graphics.stanford.edu/~seander/bithacks.html

This post first appeared on Erik Eidt's "Discussions on Programming Models" at https://erikeidt.blogspot.com/2018/04/bit-twiddling-understanding-bit.html

Saturday, March 3, 2018

Understanding Overflow with Integer Arithmetic

Introduction

In programming we use a lot of arithmetic.  Much of it is simple addition, a fair amount of multiplication, some subtraction, etc...  Use of arithmetic can also be more exposed or more hidden depending on the languages we use, say, C vs. assembly.

Arithmetic falls into two categories: that for which overflow simply cannot occur, and that for which it can.  Generally speaking it is the programmer's responsibility to know which is which!

Understanding when Integer Arithmetic is Safe from Overflow

Much of the integer arithmetic we do is in accessing our (in-memory) data structures.  Such calculations require additions, subtractions, multiplications (and shifts), and are known, especially in C and assembly language, as "pointer arithmetic".

The rule of thumb is that if we access memory using pointer arithmetic, we do not need to worry about overflow in that pointer arithmetic.

However, to be sure, there are a number of caveats, as not all pointer arithmetic is overflow-free:
  • One catch is that we must be sure the memory we're trying to reach exists.

    How can we be sure that the memory indeed exists?

    1. Global variables, including arrays and other data structures, always exist, or else your program would not be running.  If there were insufficient memory for the operating system to load the program (code and global data), it would not be running.

    2. Always check dynamic memory allocation to see if it succeeds or fails!
      • if the allocation fails, then obviously, the memory does not exist, and our programs must not attempt to use it!
      • if the allocation succeeds, we can avoid overflow checking (subject to the following)

    3. Do not access memory prior to, or past an object, where an object is either:
      • a simple variable or struct or data structure whose address was taken to obtain a pointer or obtained by dynamic memory allocation, or,
      • an array,
      (It is ok for a pointer to be computed as a terminal value (a value used only for comparison) one byte prior to or one byte past an object, as long as we don't access memory through that pointer.)

    4. For arrays, understand the provenance of indexing variables or indexing expressions. 

      • The values of indexing variables & expressions can come from algorithms.  When indexing in a simple loop that knows the actual allocated size of an array, we can be sure that overflow will not occur.  When using complex indexing expressions that involve algorithmic computation, we trust the algorithm.  For example, a heap data structure is an array whose elements have children at computable positions in the array.  A heap sort or priority queue algorithm (both use this kind of heap data structure) will operate on the elements and their children using complex indexing computations that (given a valid algorithm) always access existing elements in the array.

      • However, pointer arithmetic involving indexing values of external and/or unverified provenance can result in overflow.

    5. For objects, understand the type of the object.

      This is particularly important when we have objects that share common initial fields and otherwise variable lengths — which can be done with classes & subclasses, tagged unions, or untagged/undiscriminated unions.  The latter — untagged and undiscriminated unions, if allowed by the programming language — can be problematic, since they are not necessarily automatically detected by the runtime of the programming language (and hence the programmer's concern).

  • Another catch is that we must use integers that are (at least) the same size as pointers (i.e. addresses) for the given programming environment. 
    • For 16-bit environments (16-bit pointers & 16-bit integers), the integers almost certainly need to be unsigned as well. 
    • For 32-bit environments, we can generally use 32-bit signed integers, as long as the operating system will not allow us to allocate objects larger than half the address space (i.e. 2 GB), or allocate objects spanning the 2 GB boundary.  This is generally the case.  If not, however, then the integers need to be unsigned.
    • For 64-bit environments, we can assume that the operating system will not allow us to allocate objects larger than 2^63, and will not locate objects on the 2^63 boundary, so we can use signed 64-bit integers.

When should we have doubts about Overflow?

  • When doing non-pointer arithmetic.
  • Whether doing pointer arithmetic or not, if input from external sources is used, e.g. values that come from the user, a file, or the network.

What should we do when we have doubts?

  • Establish and agree upon bounds and limitations in API contracts, and, validate input values against agreed upon limitations.
  • Use overflow-checking arithmetic (see the sketch after this list), and document the conditions under which overflow will cause failures, such as implementation choices for the size of fixed-size numbers.
  • Use alternative data types, such as larger integers, floating point types, decimal types, and variable length types like BigInteger.  (Overflow with these other data types is beyond the scope of this post.)
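
As an example of overflow-checking arithmetic, here is a sketch of a checked signed addition in C; note that GCC and Clang also offer __builtin_add_overflow, which typically compiles down to a direct test of the hardware's overflow flag:

    #include <limits.h>
    #include <stdbool.h>

    /* checked signed addition: returns false (and leaves *result untouched) if the sum would overflow */
    bool checkedAdd ( int a, int b, int *result ) {
        if ( ( b > 0 && a > INT_MAX - b ) ||
             ( b < 0 && a < INT_MIN - b ) )
            return false;
        *result = a + b;
        return true;
    }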

Final Notes

There are many issues with using pointers safely, such as accessing, by pointer, an object that has been deleted, which will almost certainly cause problems for the program.  However, in accessing memory for an object that has been deleted (and/or reclaimed for another purpose), overflow in pointer arithmetic is not the immediate concern (because the memory itself still exists); rather, the concern is the mismatch between the expected object and the real object at that memory location.

This post first appeared on Erik Eidt's "Discussions on Programming Models" at http://erikeidt.blogspot.com/2018/03/understanding-overflow-with-integer.html

Tuesday, February 20, 2018

Understanding Recursion

To illustrate recursion, we start with a very simple function for computing string length, which, obviously, will operate on strings.  So, a quick recap of strings in C: they are arrays of characters followed by a null terminating character.

    char *ptrHello = "hello";

In memory, say at address 1000, we have the string:
    1000 h
    1001 e
    1002 l
    1003 l
    1004 o
    1005 '\0' -- the null character

    ptrHello holds 1000

Let's also observe that ptrHello+1 would be 1001 and that is also a valid string — though shorter by 1.

Further, let's note that the C standard library has a function strlen, which we will use, and eventually replace.  When we call the library's strlen, we won't worry about how it works, rather we simply trust that it does.

This is the point of view of the consuming client programmer: one of trusting that something (in a library or somewhere else) just works without knowing how it works.

As programmers, we alternate hats: sometimes we are an implementer crafting an algorithm or function, and other times we are consumers, just using some function(s).

As consumers we trust in the contract that some function will do what it says, if we provide it with the right arguments, and we don't worry much about specific implementation details.  This is the nature of abstraction, hiding certain details so we can focus on more immediate concerns.  This will be important to do in our look into recursion.

Here is our first version of stringLength — and this one not recursive:

(1)
    int stringLength ( char *strPtr ) {
        return strlen ( strPtr );
    }

This implementation simply calls strlen.  This is a valid implementation of a function to compute the length of a string.  It does almost no work, simply invoking strlen.  We should be able to convince ourselves with a very high degree of confidence that stringLength produces the same answers as strlen.

Now, we will handle one case — the empty string — and then turn all the other cases over to strlen:

(2)
    int stringLength ( char *strPtr ) {
        if ( *strPtr == '\0' )
            return 0;
        return strlen ( strPtr );
    }

This implementation (2) either returns 0 for the empty string, or else, it uses strlen to compute the length of a non-empty string.  Here strlen is still doing almost all of the work.  The only case we're handling directly is the case of the empty string.

Next, we're going to change our function to do a bit more work as follows:  Recall that here an expression strPtr + 1 points to the second character of a string.  (In our above string example, ptrHello + 1 would be 1001, which is the string "ello".)

(3)
    int stringLength ( char *strPtr ) {
        if ( *strPtr == '\0' )
            return 0;
        return 1 + strlen ( strPtr + 1 );
    }

This implementation (3) also handles the empty string directly (as does (2)).  It still calls strlen, however, it is now counting the first character of the string itself and only calling strlen to count the rest of the characters.  Notice that, compared to (2), it advances the pointer that it passes to strlen by 1 character.  So if we called stringLength ( ptrHello ), then strlen would be invoked with "ello", and after computing the length of "ello" and returning 4 to us, our stringLength adds 1 to that result, so returns 5, the correct answer for "hello".

We know it is ok to do the pointer arithmetic of strPtr+1 because we have first checked and handled the case of the empty string.  So, if we reach the second return statement, we know the string is not empty, and thus must have at least one character.

Now for the fun part: as we have confidence that stringLength (3) produces correct answers, and in fact the same answers as stringLength (1) and (2) and as strlen itself, we can use it internally instead of strlen:

(4)
    int stringLength ( char *strPtr ) {
        if ( *strPtr == '\0' )
            return 0;
        return 1 + stringLength ( strPtr + 1 );
    }

Our stringLength, version (4), is now recursive.

As much as possible I encourage looking at the recursive invocation of stringLength for what it does for us, the caller, rather than that it does recursion at all.

In particular, let's compare this version (4) with version (3).  Version (3) works by relying upon strlen.  Version (4) simply replaces strlen with another function we know works the same: stringLength; by simply trusting that stringLength works, we can expect it to continue to work for us internally.  So, without diving into the recursion, we can see that it will work properly.

So, one trick with recursion is to pretend that we are simply using the recursive routine rather than thinking through the complete recursion.  In order to do this, we need to switch our hats from the implementer of stringLength to a user of stringLength.  This is a bit tricky because normally we switch our hats between writing implementations (of functions) and using them at different places in our code.  However, with recursion, we have to switch our hats within the implementation.

Now, we can also trace the recursive approach.  One way is to expand the source code in our minds, something like this:

(5)
stringLength ( "hello" ) 
        1+stringLength ( "ello" )
            1+(1+stringLength ( "llo" ))
                1+(1+(1+stringLength ( "lo" )))
                    1+(1+(1+(1+stringLength ( "o" ))))
                        1+(1+(1+(1+(1+stringLength ( "" )))))
                            1+(1+(1+(1+(1+0))))

Each line represents the next recursive invocation, along with the addition of 1 for each character that is removed by each level of recursive invocation.

The following equivalent implementation uses expression syntax instead of statement syntax:
             
(6)
    int stringLength ( char *strPtr ) {
        return strPtr[0] == '\0' ? 0 : 1 + stringLength ( strPtr + 1 );
    }

Using the expression syntax we can also inline one level of the recursion:

        return strPtr[0] == '\0' ? 0 : 1 + (strPtr[1] == '\0' ? 0 : 1 + stringLength ( strPtr + 2 ));

and after a bit of optimization on some of the constants:

        return strPtr[0] == '\0' ? 0 : (strPtr[1] == '\0' ? 1 : 2 + stringLength ( strPtr + 2 ));

To wrap up the discussion, recursive approaches have several elements:
  1. one or more terminating conditions
  2. decomposition of the problem into a smaller problem or problems
  3. a recursive call or calls used on those smaller problem or problems
  4. some operations that compose the smaller solution into the larger solution
In our examples,

  • the terminating condition is the empty string, for which the answer is zero.  
  • the decomposition is working on a string that is 1 character shorter.
  • we have the recursive call, and,
  • after the recursive invocation solves the 1-shorter string problem for us, we simply add 1 to that result to compose the smaller solution into the larger solution.
That concludes our introduction to recursion.  The next section looks at a special kind of recursion involving a tail call.



Next, we look at enabling tail calls.  Tail call optimization is a technique that allows a compiler or assembly writer to replace an invocation, in certain cases, with a branch, which does not create an additional stack frame.  When applicable to and applied to a recursive function, this can result in significant performance and space gains over the unoptimized recursion.

Tail call optimization cannot be expressed in a high-level language, but it can be enabled or precluded by the specific code in the function.

Enabling tail calls requires that the recursive invocation returns directly to the caller without any additional computation on the result.  This is not the case in (4):

(4r)
        return 1 + stringLength ( ... );

Here we see that the return value from the recursive invocation is used in a simple computation, an addition of 1.  This computation on the value returned by the recursive invocation precludes the tail call optimization.

However, we can transform our solution into one that does directly return the result of the recursive invocation; we do this by moving the addition of one before the recursive function that we're calling.

We're going to need a helper function, stringLengthPlus, and it will be the recursive one.

As with any recursion, we will attempt to understand this function first from the perspective of a consuming client programmer without looking inside the recursive implementation.

The contract for stringLengthPlus is that it receives an extra parameter: a length value.  So, it takes two parameters: a string and a length.  It returns to the caller the length of the string plus the initially supplied length value.

If we supply zero as the initial length value, then it will return just the length of the string.  That's what stringLength does below:

(7)
    int stringLength ( char *strPtr ) {
        return stringLengthPlus ( strPtr, 0 );
    }

    int stringLengthPlus ( char *strPtr, int accumulatedLength ) {
        if ( *strPtr == '\0' )
            return accumulatedLength;
        return stringLengthPlus ( strPtr + 1, 1 + accumulatedLength );
    }

In the case of the empty string, stringLengthPlus simply returns the parameter-supplied length for the answer, as in this case there are no characters in the string to add to that length.  Hopefully, you can see that if stringLengthPlus is invoked on the empty string, and with a length value of zero it will return zero.

However, if stringLengthPlus is invoked on a non-empty string, it will compute that string's length plus the initially provided length value.

Internally, it uses that feature to request (of itself) the length of a string that is 1 character shorter.  As it does this, it provides a meaningful length value that accounts for the characters already handled.  For example, the following recursive calls are made in computing the length of "hello":

    stringLengthPlus ( "hello", 0 )
        stringLengthPlus ( "ello", 1 )
            stringLengthPlus ( "llo", 2 )
                stringLengthPlus ( "lo", 3 )
                    stringLengthPlus ( "o", 4 )
                        stringLengthPlus ( "", 5 ) => 5

As compared with (4), (7) effectively moves the composition computation (the adding of one) from after the recursive invocation to before it.  In other words, instead of performing the computation (the add-one operation) on the returned value as in (4), the computation takes place during the passing of parameters in (7).

Because there is no work being done after the recursive call, the underlying machine code implementation could eliminate the creation of each stack frame for the recursive call, potentially improving performance quite a bit.

When the underlying system uses the tail call optimization, the return statement from the innermost recursive call returns directly to the outer caller (in our case, that would be stringLength).

The astute reader may notice that tail call optimization can be applied to non-recursive functions as well.  For example, stringLength, version (7), is a candidate for tail call optimization, as it returns the return value of stringLengthPlus unmodified.  If tail call optimization were applied here as well, then the innermost recursive call to stringLengthPlus would return directly to the caller of stringLength!

This post first appeared on Erik Eidt's "Discussions on Programming Models" at http://erikeidt.blogspot.com/2018/02/understanding-recursion.html

Thursday, August 18, 2016

How to use closure compiler advanced optimizations to build a chrome extension

Running the source code for a chrome extension through the closure compiler with advanced optimization is easier than you might think, and I will demonstrate below.


Background

Chrome's extension model is a bit like client-server.  You can have independent pages for UI (client), which communicate with a background page (the server), for business logic, persistence, and global event handling and awareness.  The difference is that while client and server typically communicate using some form of network protocol (e.g. http), the chrome extension UI components can access the background page more-or-less directly using JavaScript.

The closure compiler provides an ADVANCED_OPTIMIZATIONS mode that minifies not only local variables, but also members as well, plus it does inlining and dead code removal, all great stuff that we want it to do for us!  The problems with using the closure compiler in the chrome extension environment stem from the fact that the client UI pages and the (server) background page are all different contexts, yet need to share name minifications.


My code before I was using the Closure Compiler

My background page hooks a number of event handlers, such as those of chrome.windows, chrome.bookmarks, and chrome.tabs (which I will not show).  It also provides an API for my other pages; for the purposes of this example, I will show it with regard to my options page.

background.js:

const _options = {};

const optionsPageInterface = {
    getOption ( name ) {
        return _options [ name ];
    },
    setOption ( name, value ) {
        _options [ name ] = value;
    }
};

options.js:

var opi = chrome.extension.getBackgroundPage().optionsPageInterface;

opi.setOption ( "some-option", 100 );
var someOptionValue = opi.getOption ( "some-option" );
alert ( someOptionValue );

We can see that the options page uses chrome.extension.getBackgroundPage() to access the background page, and then selects the global variable optionsPageInterface using a member expression (the .optionsPageInterface part), then stores that value in its own global variable (opi) for later use.

To use the closure compiler, first, we need it to generate separate files for background and options pages.  We'll use the closure compiler's --module option.  We also want it to forgo minification of chrome.* APIs, which we do with the --externs option.  (see closure-compiler chrome extensions declarations)  Here's my command line:

java -jar bin\compiler.jar
--compilation_level ADVANCED_OPTIMIZATIONS
--assume_function_wrapper
--formatting PRETTY_PRINT
--externs chrome_extensions.js
--property_renaming_report propertyRenamingReport.txt
--variable_renaming_report variableRenamingReport.txt
src\background.js --module background:1: 
src\options.js --module options:1:background

Sadly, the result with advanced optimizations enabled is an empty background.js and all that's in options.js is the following:

var a = chrome.extension.getBackgroundPage().a;
a.c("some-option", 100);
var b = a.b("some-option");
alert(b);

Obviously, with the background page having been stripped, that's not going to run.  We need to tell the closure compiler that there is a linkage between the background page and the options page: we want the closure compiler to keep the background functions used by the options page, and also to minify optionsPageInterface, getOption, and setOption to the same names for both the background and options pages.

The closure compiler documentation recommends using exports via string literals as those are never modified.  However, this suggests switching from simple dot-property notation to subscript-property notation with string literals, and also forgoing minification of property names across the chrome extension client-server boundary.  In our case this would mean assigning to:

window [ "optionsPageInterface" ] = { ...

in background.js and consuming in options.js as:

var opi = chrome.extension.getBackgroundPage()[ "optionsPageInterface" ];

This is more restricted than strictly necessary for a chrome extension, if we can get the closure compiler to use the same minified names in the background page as in the client pages, and also if we're willing to live with a restriction that the whole chrome extension is versioned together.


Changes for use with the Closure Compiler

Simply change the declaration of the global interface variable, and the assignment to it, into a window member assignment instead:

background.js:

const _options = {};

window.optionsPageInterface = { /***was var optionsPageInterface = {***/
    getOption ( name ) {
        return _options [ name ];
    },
    setOption ( name, value ) {
        _options [ name ] = value;
    }
};

For good measure, I also wrap the code in options.js in an anonymous function closure.  It doesn't have any exports, so that keeps all its variables out of the global namespace and from interfering unnecessarily with the global level variables in the background page during the minification process.

options.js:

(function () {
    var opi = chrome.extension.getBackgroundPage().optionsPageInterface;

    opi.setOption ( "some-option", 100 );
    var someOptionValue = opi.getOption ( "some-option" );
    alert ( someOptionValue );
})();

Run through the closure compiler, these files generate the following background.js and options.js:

background.js (minified, with advanced optimizations)

var b = {};
window.a = {b:function(a) {
  return b[a];
}, c:function(a, d) {
  b[a] = d;
}};

options.js (minified, with advanced optimizations)

var c = chrome.extension.getBackgroundPage().a;
c.c("some-option", 100);
alert(c.b("some-option"));

As you can see, the closure compiler has minified even the property named optionsPageInterface, and is using the same name on both the client pages and the background page.

I have tested this on a fairly large chrome extension with multiple client UI pages.

Even though components of my chrome extension communicate internally with each other using minified names, that is fine with me because the only consumers of the background page-provided APIs are all versioned together with the provider -- in other words, the background is really providing an internal-only API for the client pages.  The same is also true for uses of sendMessage(), as the closure compiler is able to minify names shared between client UI pages and the background page.

You may likely still need to use subscript-property notation with string literals for any property names / members that escape the extension, for example, query parameters.

I also found -- and fixed in my copy -- a bug of incompleteness in the declaration of chrome.windows.* (from chrome_extensions.js), which made the closure compiler fail to recognize that some property names should not be minified under advanced optimizations.  See closure-compiler issue 1961 on GitHub.