Bit-field Packing in GCC and Clang

Recently for a school assignment we were given a C bit-field and tasked with packing a similar struct into the bit-field without the help of the compiler. All struct and bit-field manipulations had be be done manually using only the bytes of the structures. Amazingly, I didn’t even know what a bit-field was before starting this assignment, so I’ll give a little background before getting into the nitty-gritty details.

What is a bit-field? #

So, almost anyone familiar with C knows that it allows you to pack a group of variables into a new type called a ‘struct’. They look like this in C code:

struct mystruct {
    int a;
    short b;
    unsigned int c;
};

The C standard specifies that each platform will define a specific set of ‘alignments’ that will specify how structures like the one above are actually represented in memory. For example, the linux 32 bit standard specifies that chars can be aligned on any address, shorts are aligned on addresses that are multiples of two, and integers and larger are aligned on addresses that are multiples of four.

In addition to normal structures, C also allows structures to have ‘bit-fields’,
they look like this:

struct mystruct {
    int a : 8;
    short b : 7;
    unsigned int c : 29;
};

The part after the field name, specifies a specific bit-width for the field.
In the example above field a is explicitly declared to be eight bits wide.
As you might guess, this features primary function is to allow the programmer to save space. Because bit-fields are supposed to take up less space than their full-width counterparts, the C standard specifies special rules for how they are laid out in memory [2]:

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

Basically, a compiler is allowed to pack as many bit-fields as it can into
a block of bytes in whatever way it wishes, which really doesn’t help anyone
who is trying, for whatever reason, to build one of these bit-fields manually.

Because of this lack of specificity, I was curious about what rules compilers like clang and gcc actually used to build these things. Below is the results of my research that basically involved a lot of trial and error, print-outs of bit-fields, and a little dumb luck.

How gcc and clang build bit-fields #

Note that each byte in the following examples is printed low bit->high bit (instead of the normal high->low). This just makes it easier to see the segments of the bit fields, if the bytes are printed high->low, it looks like there are bits in the middle of various sequences and it makes things confusing. Also, all of this was discovered through trial and error, there may be corner cases I didn’t catch, but it seems to hold up.

Before I give some examples, I’d like to point out that in the C standard [1] it says that the bits of each bit-field are put into, or packed into, ‘addressable storage units’ [1]. You can think of as a block of bytes. Clang follows this procedure to store each bit-field:

  1. Jump backwards to nearest address that would support this type. For example if we have an int jump to the closest address where an int could be stored according to the platform alignment rules.
  2. Get sizeof(current field) bytes from that address.
  3. If the number of bits that we need to store can be stored in these bits, put the bits in the lowest possible bits of this block.
  4. Otherwise, pad the rest of this block with zeros, and store the bits that make up this bit-field in the lowest bits of the next block.

Here’s some examples (remember that all bytes are printed from lowest bit
to highest bit):

struct test {
    char f0 : 7;
    char f1 : 2;
}

This struct will be packed like this:

f0 set: 
0x0 11111110 00000000
f1 set: 
0x0 00000000 11000000

When attempting to store f1 clang checks the sizeof(type of f1) from the nearest address that can support that block size. Since the sizeof(type of f1) is 1, the nearest block is the preceding one. Now, that block is 8 bits wide and we’re trying to pack 2 bits into it. Since packing the 2 additional bits with the original 7 would require overflowing the block, we skip to the next block and put the bits there. Now, lets see what happens when I change the struct slightly:

struct test {
    char f0 : 7;
    short f1 : 2;
}

The struct gets packed like this:

f0 set: 
0x0 11111110 00000000
f1 set: 
0x0 00000001 10000000

Now f1 is straddling the boundary between the first and second bytes. When trying to pack the bits for f1, clang jumped to the first address that could store a sizeof(type of f1) block (2 bytes), grabbed that block and noticed that only 7 of the 16 available bits were occupied, so it just stored the two bits of f1 in the lowest block. Now, if we change the bit-width of f1 to 10, the struct will pack like this:

f0 set: 
0x0 11111110 00000000 00000000 00000000
f1 set: 
0x0 00000000 00000000 11111111 11000000

The method for packing this struct is similar to the previous struct. When we want to pack field f1 we grab the short sized block at address 0 and notice that since there are 7 taken of the 16 total, we can’t store the additional 10 bits of f1 without overflowing the block. So, we jump to the next block and write the 10 bits of f1 to the lowest bits.

Just to show that this procedure can scale to even complex bit-fields, I’ll try and pack a larger, more complex bit-field:

struct big_bitfield {
    unsigned short f0 : 8;
    long           f1 : 16;
    unsigned long  f2 : 29;
    long long      f3 : 9;
    unsigned long  f4 : 2; 
    unsigned long  f5 : 31;
};

Here’s how clang packs it:

f0 set:
0x0  11111111 00000000 00000000 00000000
f1 set:
0x0  00000000 11111111 11111111 00000000
f2 set:
0x4  11111111 11111111 11111111 11111000
f3 set:
0x4  00000000 00000000 00000000 00000111
0x8  11111100 00000000 00000000 00000000
f4 set:
0x8  00000011 00000000 00000000 00000000
f5 set:
0xc  11111111 11111111 11111111 11111110

(remember that sizeof(long) with -m32 is 4).

First off, f0 is easy enough, we grab the first block of two bytes, there are no bits stored in it, so we just put all 8 bits of the field into the first bits of the block. The second field f1 is four bytes wide, so we grab the first four bytes at address 0x0, there are 8 bits already taken, but since we’re only trying to store 16 we won’t overflow the block. We just add the 16 bits after the 8 bits from f0.

Next, for field f2 we grab the first block of four bytes, we’ve already used 24 of the 32 available bits so we won’t be able to fit our 29 bit in that block, we have to move to the next one. The next four byte block is located at address 0x4, so we write our bits there.

Field f3 is a little more interesting. Since sizeof(long long) is 8, we have to grab the nearest 8 byte block, since we just put the bits of field f2 into the four byte block at address 0x4 and long long is aligned on 4 byte blocks, we grab the 8 byte block from 0x4 to 0xc because it’s the nearest. Since only 29 of that block’s 64 bits are used, we write our 9 bits into lowest available bits of the 8 byte block.

Fields f4 and f5 should be relatively easy, and are left as an exercise
for the reader.

There are two other interesting cases related to packing bit-fields, the first
is mixed structures:

struct test2 {
    short f0 : 8;
    long f1 : 16;
    int f2;
    char f3 : 7;
}

In this example f2 is not a bit-field. The structure will be packed like this:

f0 set:
0x0  11111111 00000000 00000000 00000000
f1 set:
0x0  00000000 11111111 11111111 00000000
f2 set:
0x4  11111111 11111111 11111111 11111111
f3 set:
0x8  11111110 00000000 00000000 00000000

Essentially f2 is treated like a bit-field with the maximum number of bits set, it can’t possibly be packed so it’s simply put into the next available block.

The other interesting case is unnamed, zero-width fields:

struct test2 {
    short f0 : 8;
    long f1 : 16;
    int : 0;
    char f3 : 7;
}

This struct is packed like this:

f0 set:
0x0  11111111 00000000 00000000 00000000
f1 set:
0x0  00000000 11111111 11111111 00000000
f3 set:
0x4  11111110 00000000 00000000 00000000

The field in-between f1 and f2 basically forces the struct to pad to the next block of size int, at which point packing continues as normal. It allows you to have a small amount of control over how the individual bits are actually laid out.


[1]: section 6.7.2.1, paragraph 10
[2]: based on section 3.6, section 3.2, and section 6.2.6.1 paragraph 4
[3]: section 6.7.2.1, paragraph 10

 
122
Kudos
 
122
Kudos

Now read this

Random Linux Oddity #1: ru_maxrss is Inherited

These days I do roughly 100% of my development on and for systems running Linux. Since my work and personal interests are pretty “low level”, I have spent quite a bit of time investigating the weird, and surprising details of the Linux... Continue →