<p><em>Josh Kunz · jkz.wtf · josh@kunz.xyz</em></p>
<h1>Howto: Gargle Authentication in Jupyter</h1>
<p><em>2022-06-30</em></p>
<p><strong>TL;DR:</strong> See the <a href="#a-fix-at-last_1">code snippet</a> at the bottom of this post for the code needed to get this to work.</p>
<p><a href="https://gargle.r-lib.org/">gargle</a> is a great little library for the R programming language that implements several authentication mechanisms against Google accounts. It’s used to power useful integrations like <code class="prettyprint">googlesheets4</code> and <code class="prettyprint">googledrive</code> that allow R scripts to load data out of Google cloud resources. I recently started trying out <a href="https://www.tillerhq.com/">Tiller</a> for managing my personal finances, and I wanted to access my data from a JupyterLab notebook. When running on the command line or in RStudio, gargle helpfully executes the OAuth flow by running a little local webserver that handles the browser redirect. In my JupyterLab setup, though, the kernel isn’t running on my machine directly, so that approach doesn’t work. I tried running googlesheets4’s <code class="prettyprint">gs4_auth</code>, but got back:</p>
<pre><code class="prettyprint">Error in `gs4_auth()`:
! Can't get Google credentials.
ℹ Are you running googlesheets4 in a non-interactive session? Consider:
• Call `gs4_deauth()` to prevent the attempt to get credentials.
• Call `gs4_auth()` directly with all necessary specifics.
ℹ See gargle's "Non-interactive auth" vignette for more details:
ℹ &lt;https://gargle.r-lib.org/articles/non-interactive-auth.html&gt;
</code></pre>
<h1 id="attempting-to-use-oob-oauth_1">Attempting to use OOB OAuth <a class="head_anchor" href="#attempting-to-use-oob-oauth_1">#</a>
</h1>
<p>Luckily, the OAuth standard recognizes that there may be cases where the OAuth flow can’t be done automatically via browser redirects. In these cases OAuth allows for “out-of-band” (OOB) authentication, where the application prints out a URL, you open that URL in your browser, do the authentication flow, get a token, and then paste that token back into the application. Basically, it’s the typical flow done “by hand”. gargle supports this option with the <code class="prettyprint">use_oob</code> parameter. Here we pass it to <code class="prettyprint">gs4_auth</code> (a googlesheets4-specific wrapper around gargle) and get:</p>
<pre><code class="prettyprint">Error in `gs4_auth()`:
! Can't get Google credentials.
ℹ Are you running googlesheets4 in a non-interactive session? Consider:
• Call `gs4_deauth()` to prevent the attempt to get credentials.
• Call `gs4_auth()` directly with all necessary specifics.
ℹ See gargle's "Non-interactive auth" vignette for more details:
ℹ &lt;https://gargle.r-lib.org/articles/non-interactive-auth.html&gt;
</code></pre>
<p>Again! Ugh! What’s going on here? Jupyter should be plenty “interactive” to do an OOB flow. Why won’t it work…</p>
<h1 id="interactive-sessions_1">Interactive Sessions <a class="head_anchor" href="#interactive-sessions_1">#</a>
</h1>
<p>After a brief spelunking expedition, I found <a href="https://github.com/r-lib/gargle/blob/d7e5465ddcdd44d02a5a5cf1f691e8d2b50f5caf/R/Gargle-class.R#L239">this line</a>. gargle will refuse to execute the OAuth flow if it doesn’t think the session is interactive. It uses the <code class="prettyprint">is_interactive()</code> function to make that decision. Turns out this function comes from <a href="https://rlang.r-lib.org/reference/is_interactive.html">rlang</a>, a standard suite of tools used by tidyverse packages. Luckily, it’s configurable, so I tried again, this time with a slightly more complex invocation:</p>
<pre><code class="prettyprint">rlang::with_interactive({ gs4_auth(use_oob=TRUE) })
</code></pre>
<p>And… The same error! Ugh!</p>
<h1 id="interactive-sessions-again_1">Interactive Sessions, Again <a class="head_anchor" href="#interactive-sessions-again_1">#</a>
</h1>
<p>At this point I decided to turn on debug logs for gargle, and noticed that gargle was encountering a stop signal. Further spelunking tracked it down to a check in the httr HTTP client library. This library is wrapped by gargle, and it performs the underlying OAuth flow. Unfortunately, it uses <a href="https://github.com/r-lib/httr/blob/21ff69f219ad11298854a63b8f753389088cf382/R/oauth-init.R#L230/">a different</a> (more standard) definition of an interactive session: the built-in <code class="prettyprint">interactive()</code> function. Of course running <code class="prettyprint">interactive()</code> in my Jupyter session showed my worst fears were reality:</p>
<pre><code class="prettyprint">> interactive()
FALSE
</code></pre>
<p>Cool.</p>
<p>Well, I guess I’m not running R directly. Maybe there’s some good reason that IRKernel isn’t recognized as interactive. Maybe there’s some option I can enable… Searching around turned up <a href="https://github.com/IRkernel/IRkernel/issues/236">this issue</a> from the IRkernel GitHub repo, a six-year-old bug with the ominous line:</p>
<blockquote class="short">
<p>OK, this is pretty hard:</p>
</blockquote>
<p>Turns out that R bases <code class="prettyprint">interactive()</code> solely on how the interpreter was started (its <code class="prettyprint">--interactive</code> flag), which restricts the ways that input can be provided, and breaks IRKernel. The IRKernel developers swiftly <a href="https://bugs.r-project.org/show_bug.cgi?id=17134">filed a feature request against R</a> to make this more flexible, but that request has sat safely untouched for the intervening 6 years.</p>
<h1 id="a-fix-at-last_1">A Fix at Last <a class="head_anchor" href="#a-fix-at-last_1">#</a>
</h1>
<p>Great! Well, luckily, the developers of httr left us one final hope. They wrapped the <code class="prettyprint">interactive</code> builtin with another function called <code class="prettyprint">is_interactive</code> (though notably, <em>not</em> the <code class="prettyprint">rlang</code>-provided <code class="prettyprint">is_interactive</code>). Since this is R, we can just overwrite that function to do what we want, and everything will work great. We can use the built-in <code class="prettyprint">assignInNamespace</code> to perform this dubious action. Putting this together with our earlier hack for gargle, we get:</p>
<pre><code class="prettyprint">assignInNamespace("is_interactive", function() { TRUE }, "httr")
rlang::with_interactive({ gs4_auth(use_oob=TRUE) })
</code></pre>
<p>And, that’s the ticket! gargle, or maybe httr (I didn’t bother to check), prints out the authentication URL, and displays a textbox for us to type our authentication code into. Hooray!</p>
<h1>Random Linux Oddity #1: ru_maxrss is Inherited</h1>
<p><em>2020-07-05</em></p>
<p>These days I do roughly 100% of my development on and for systems running Linux. Since my work and personal interests are pretty “low level”, I have spent quite a bit of time investigating the weird and surprising details of the Linux kernel. Every time I spend hours or days tracking one of these little oddities down, I have a strong desire to shout it from the rooftops and “share it with the world”. Unfortunately, these tidbits are too small to really fill out a full-length “blog entry”, so to date I have avoided posting them. But today, after spending <em>another</em> 6 hours investigating yet another quirk, I’ve once again become overwhelmed with the need to share. So I’m creating a series for all the small annoyances and surprising wonders I’ve run into working on Linux, called “Random Linux Oddities”. Each will be numbered starting from 1, for no particular reason.</p>
<p>The oddity that has the honor of being first in the series is the one I spent several hours on today, and that prompted this post: <code class="prettyprint">ru_maxrss</code>.</p>
<h1 id="background_1">Background <a class="head_anchor" href="#background_1">#</a>
</h1>
<p>Today I was working on <a href="https://github.com/joshkunz/ashuffle">ashuffle</a>, a little music-shuffling client I wrote for the music playing server <a href="http://musicpd.org">MPD</a>. One of the earliest <a href="https://github.com/joshkunz/ashuffle/issues/14">bug reports</a> I got for ashuffle was that it would often crash or get OOM-killed when running on memory-constrained devices (think Raspberry Pis) for users with large (10k-50k song) music libraries. This is unfortunately not entirely surprising. Due to the quirks of the MPD protocol, I have to keep a copy of the “URI” for every song the user wants to shuffle. For most users, this is their entire library. The URI for a song is just the path to that song on the user’s local filesystem (with the common “library root” prefix removed), so these can be a bit long. When you scale that to tens of thousands of songs, running out of memory is plausible.</p>
<h1 id="investigation_1">Investigation <a class="head_anchor" href="#investigation_1">#</a>
</h1>
<p>One of the most important things I’ve learned about performance work is that <em>measuring</em> the performance you care about is critical. Otherwise you’re playing whack-a-mole in the dark, just guessing at what is and isn’t causing you problems. The first step was to write a test harness that could say how much memory ashuffle was using. From previous experience I knew that the <a href="https://man7.org/linux/man-pages/man2/wait4.2.html"><code class="prettyprint">wait4</code> syscall</a> provides a helpful <code class="prettyprint">rusage</code> parameter that reports several interesting statistics. One of those looked useful: <code class="prettyprint">ru_maxrss</code>, the maximum resident set size the process ever reached, in kilobytes. Seems like a perfect fit!</p>
<p>I wrote a new test that ran ashuffle against a large test library, and then logged out the <code class="prettyprint">ru_maxrss</code> value received from <code class="prettyprint">wait4</code>. Much to my dismay, it was ~800MiB for a library of “only” 20k or so songs! That could certainly be an issue on a device like the Raspberry Pi Zero that has only 512 MiB of RAM in total.</p>
<p>So, with the target clear, I set to work on figuring how all that memory was being used. I hooked up a heap profiler (specifically the one bundled with <a href="https://github.com/google/tcmalloc">tcmalloc</a>). After re-running the test with the profiler enabled, I got this:</p>
<p><a href="https://svbtleusercontent.com/4iAWy91CVz9p69JJxSmo1R0xspap.png"><img src="https://svbtleusercontent.com/4iAWy91CVz9p69JJxSmo1R0xspap_small.png" alt="Memory usage graph depicting 60 MiB used by the `main` function"></a></p>
<p>Only 60MiB in use when the program exited! That’s quite a bit less than the 840MiB reported by <code class="prettyprint">ru_maxrss</code>. What is going on here?</p>
<h1 id="investigating-the-investigation_1">Investigating the Investigation <a class="head_anchor" href="#investigating-the-investigation_1">#</a>
</h1>
<p>I’m no expert in memory profilers, so at this point, I didn’t know what to think. Since I was just looking at the final memory usage, and <code class="prettyprint">ru_maxrss</code> measures peak usage, I figured there was probably <em>something</em> I was missing given the 10x increase. I re-ran the profiler, but this time had it dump samples for every 10MiB that was allocated. Looking at each of the samples, I didn’t see any weird peak-y behavior. The in-use memory increase was linear and just gradually ramped up to the final 60MiB count.</p>
<p>So where to go from here? The <code class="prettyprint">ru_maxrss</code> value was consistently high, it wasn’t just a one off. <code class="prettyprint">ru_maxrss</code> is measuring the maximum “Resident Set Size” (RSS) which is not quite the same thing the heap profiler was checking. The heap profiler captures all memory allocated through “normal” user-level memory allocation APIs like <code class="prettyprint">malloc</code>/<code class="prettyprint">free</code> or C++‘s <code class="prettyprint">new</code>/<code class="prettyprint">delete</code>. The resident set size contains other things, like memory retrieved from the operating system using <code class="prettyprint">mmap</code> or <code class="prettyprint">sbrk</code>. Maybe the “resident set size” captures some other big chunk of memory as well?</p>
<p>After a bit of digging, I found an alternative way to get the RSS of a process: <code class="prettyprint">/proc/&lt;pid&gt;/statm</code>. This file contains several helpful numbers that represent the current memory usage of the process. The man page for <code class="prettyprint">proc</code> also helpfully explains that the RSS is actually the sum of three different values:</p>
<ul>
<li>The number of shared memory pages mapped into the virtual memory of the process.</li>
<li>The number of files mapped into the process (including things like the binary itself, and any shared libraries).</li>
<li>The number of “anonymous” pages mapped into the process. Anonymous pages are what I would think of as “process memory”: pages used exclusively by that process.</li>
</ul>
<p>To figure out whether some of these other components of the RSS were messing with my <code class="prettyprint">ru_maxrss</code> value, I rigged up a bit of code to print the size of the RSS, shared+file mapped pages, and anonymous pages right at the start of my program (first line of main) and right before I called <code class="prettyprint">exit</code>. The numbers lined up perfectly with the results from the heap profiler:</p>
<pre><code class="prettyprint lang-text">Startup: rss 1.61 MiB, anon 164.00 KiB, shared 1.45 MiB
Exit: rss 67.34 MiB, anon 63.74 MiB, shared 3.60 MiB
Change: rss 65.73 MiB, anon 63.58 MiB, shared 2.15 MiB
</code></pre>
<p>A growth of ~60MiB over the life of the program. So the heap profiler was right, but where the hell was that 800 MiB coming from!?</p>
<p>As a final lark, I decided to just print the <code class="prettyprint">ru_maxrss</code> from within the process itself using the <a href="https://man7.org/linux/man-pages/man2/getrusage.2.html"><code class="prettyprint">getrusage</code></a> syscall (rather than getting it from <code class="prettyprint">wait4</code>). I thought that maybe the semantics of <code class="prettyprint">wait4</code> were different, or my Go-based test runner was using different units. The result was even more surprising:</p>
<pre><code class="prettyprint lang-text">Startup MaxRSS: 846.71 MiB
Startup: rss 1.61 MiB, anon 164.00 KiB, shared 1.45 MiB
...
</code></pre>
<p>The <code class="prettyprint">ru_maxrss</code> I was getting from <code class="prettyprint">wait4</code> matched exactly what I was seeing on the very first line of <code class="prettyprint">main</code>. How did I manage to allocate 846MiB before my program even started?</p>
<h1 id="original-sin_1">Original Sin <a class="head_anchor" href="#original-sin_1">#</a>
</h1>
<p>Googling didn’t turn up anyone else surprised to see their <code class="prettyprint">ru_maxrss</code> value being sky high on the first line of <code class="prettyprint">main</code>, so I started digging into the Linux source. Grepping for <code class="prettyprint">ru_maxrss</code>, I found <a href="https://github.com/torvalds/linux/blob/35e884f89df4c48566d745dc5a97a0d058d04263/kernel/sys.c#L1768">this line</a> in the implementation for <code class="prettyprint">getrusage</code>:</p>
<pre><code class="prettyprint lang-c"> if (maxrss < p->signal->maxrss)
maxrss = p->signal->maxrss;
</code></pre>
<p><code class="prettyprint">p</code> in this case is the current task. From my previous adventures into the kernel, I knew that <code class="prettyprint">p->signal</code> is used to keep track of lots of “process” related stuff (i.e. stuff related to all threads in a process), not just signal information. Further grepping for <code class="prettyprint">signal->maxrss</code> turned up <a href="https://github.com/torvalds/linux/blob/35e884f89df4c48566d745dc5a97a0d058d04263/fs/exec.c#L1117">this bit of code</a>:</p>
<pre><code class="prettyprint lang-c"> if (old_mm) {
mmap_read_unlock(old_mm);
BUG_ON(active_mm != old_mm);
setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm);
mm_update_next_owner(old_mm);
mmput(old_mm);
return 0;
}
</code></pre>
<p>This code is run during the implementation of <code class="prettyprint">execve</code>, specifically in <code class="prettyprint">exec_mmap</code>, which officially switches the task over to its new, clean, empty virtual memory map. It says that if the newly exec’d task already had a memory map (<code class="prettyprint">if (old_mm) {</code>, which is always true for userland tasks), then we update the process’s <code class="prettyprint">maxrss</code> value to be the previous memory map’s high-water RSS. This means that <code class="prettyprint">task->signal->maxrss</code> is effectively inherited! A new process is created with a <code class="prettyprint">fork()</code> call followed by an <code class="prettyprint">execve()</code> call. <code class="prettyprint">fork()</code> preserves memory mappings, so it preserves the <code class="prettyprint">maxrss</code> value, and then, as we see here, <code class="prettyprint">execve()</code> carries that value over from the pre-exec memory map into the fresh process. If one of our ancestors pegged <code class="prettyprint">task->signal->maxrss</code> at some high value, say 800MiB, then our process too will see that as its max RSS.</p>
<p>Why is this? I have no clue. As with most other oddities, probably some legacy reason lost to time. That’s just how it worked, and we cannot break backwards compatibility, so that is how it shall always work no matter how surprising or useless it is. <a href="https://www.hyrumslaw.com/">Hyrum’s law</a> dictates that someone, somewhere, must be relying on this.</p>
<p>After this discovery, I did a more targeted search and found <a href="https://github.com/golang/go/issues/32054">this Go issue</a> where someone else found this surprising behavior (and Ian Lance Taylor confirmed <code class="prettyprint">ru_maxrss</code> is inherited). I’m not the first person to have been tripped up by this. Hopefully this post will save someone else a bit of time.</p>
<h3 id="ps_3">P.S. <a class="head_anchor" href="#ps_3">#</a>
</h3>
<p>Another good takeaway from this experience: always RTFM. While I was adding links to this post, I searched “exec” in the <code class="prettyprint">getrusage</code> manual page and found this helpful line in the NOTES section:</p>
<blockquote class="short">
<p>Resource usage metrics are preserved across an <code class="prettyprint">execve(2)</code>.</p>
</blockquote>
<h1>Selective "force" re-sync with syncthing</h1>
<p><em>2017-02-11</em></p>
<p>To synchronize files from one of my servers to my local machine I use <a href="https://syncthing.net/">syncthing</a> with the server’s folder in <a href="https://docs.syncthing.net/users/foldertypes.html#folder-sendonly">“Send Only”</a> mode. Usually this works flawlessly. As files are added to the directory, syncthing copies them to my local machine without a hitch. However, sometimes it doesn’t work, and the remote files just never show up. In this case syncthing provides the <a href="https://docs.syncthing.net/users/foldertypes.html#folder-sendonly">“Override Changes”</a> button, but this is usually not what I want to do. I don’t want to re-sync all of the files from master to my local machine, I just want to sync a single file, or a single folder’s worth of files.</p>
<p>In this case there’s a neat little trick you can do, which is obvious in hindsight but took me a while to think of: just update the mtimes. SSH onto the server and run something like:</p>
<pre><code class="prettyprint lang-bash">$ find /path/to/folder/to/force/sync -type f -print0 | xargs -0 -n1 touch
</code></pre>
<p>This will go through every file in the folder we want to sync and update its mtime (using <code class="prettyprint">touch</code>) to be the current time. Once syncthing re-scans the directory, it will see these updated mtimes and push the files down.</p>
<h1>Bit-field Packing in GCC and Clang</h1>
<p><em>2013-10-19</em></p>
<p>Recently, for a school assignment, we were given a C bit-field and tasked with packing a similar struct into the bit-field without the help of the compiler. All struct and bit-field manipulations had to be done manually using only the bytes of the structures. Amazingly, I didn’t even know what a bit-field was before starting this assignment, so I’ll give a little background before getting into the nitty-gritty details.</p>
<h2 id="what-is-a-bitfield_2">What is a bit-field? <a class="head_anchor" href="#what-is-a-bitfield_2">#</a>
</h2>
<p>So, almost anyone familiar with C knows that it allows you to pack a group of variables into a new type called a ‘struct’. They look like this in C code:</p>
<pre><code class="prettyprint lang-C">struct mystruct {
int a;
short b;
unsigned int c;
};
</code></pre>
<p>The C standard specifies that each platform will define a specific set of ‘alignments’ that determine how structures like the one above are actually represented in memory. For example, the 32-bit Linux ABI specifies that chars can be aligned on any address, shorts are aligned on addresses that are multiples of two, and integers and larger are aligned on addresses that are multiples of four.</p>
<p>In addition to normal structures, C also allows structures to have ‘bit-fields’;<br>
they look like this:</p>
<pre><code class="prettyprint lang-C">struct mystruct {
int a : 8;
short b : 7;
unsigned int c : 29;
};
</code></pre>
<p>The part after the field name specifies a specific <em>bit-width</em> for the field.<br>
In the example above, field <code class="prettyprint">a</code> is explicitly declared to be eight <em>bits</em> wide.<br>
As you might guess, this feature’s primary function is to allow the programmer to save space. Because bit-fields are supposed to take up less space than their full-width counterparts, the C standard specifies special rules for how they are laid out in memory [2]:</p>
<blockquote>
<p>An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.</p>
</blockquote>
<p>Basically, a compiler is allowed to pack as many bit-fields as it can into<br>
a block of bytes in whatever way it wishes, which really doesn’t help anyone<br>
who is trying, for whatever reason, to build one of these bit-fields manually.</p>
<p>Because of this lack of specificity, I was curious about what rules compilers like <code class="prettyprint">clang</code> and <code class="prettyprint">gcc</code> actually use to build these things. Below are the results of my research, which basically involved a lot of trial and error, print-outs of bit-fields, and a little dumb luck.</p>
<h2 id="how-gcc-and-clang-build-bitfields_2">How gcc and clang build bit-fields <a class="head_anchor" href="#how-gcc-and-clang-build-bitfields_2">#</a>
</h2>
<p><em>Note</em> that each byte in the following examples is printed low bit->high bit (instead of the normal high->low). This just makes it easier to see the segments of the bit-fields; if the bytes are printed high->low, it looks like there are stray bits in the middle of various sequences, which makes things confusing. Also, all of this was discovered through trial and error; there may be corner cases I didn’t catch, but it seems to hold up.</p>
<p>Before I give some examples, I’d like to point out that the C standard [1] says that the bits of each bit-field are put into, or packed into, ‘addressable storage units’. You can think of these as blocks of bytes. Clang follows this procedure to store each bit-field:</p>
<ol>
<li>Jump backwards to the nearest address that would support this type. For example, if we have an <code class="prettyprint">int</code>, jump to the closest address where an <code class="prettyprint">int</code> could be stored according to the platform alignment rules.</li>
<li>Get <code class="prettyprint">sizeof(current field)</code> bytes from that address.</li>
<li>If the bits that we need to store fit into the unused bits of this block, put them into the lowest available bits of the block.</li>
<li>Otherwise, pad the rest of this block with zeros, and store the bits that make up this bit-field in the lowest bits of the next block.</li>
</ol>
<p>Here’s some examples (remember that all bytes are printed from lowest bit<br>
to highest bit):</p>
<pre><code class="prettyprint lang-C">struct test {
char f0 : 7;
char f1 : 2;
}
</code></pre>
<p>This struct will be packed like this:</p>
<pre><code class="prettyprint lang-text">f0 set:
0x0 11111110 00000000
f1 set:
0x0 00000000 11000000
</code></pre>
<p>When attempting to store <code class="prettyprint">f1</code>, clang grabs a <code class="prettyprint">sizeof(type of f1)</code>-byte block at the nearest address that can support that block size. Since <code class="prettyprint">sizeof(type of f1)</code> is 1, the nearest block is the preceding one. Now, that block is 8 bits wide and we’re trying to pack 2 more bits into it. Since packing the 2 additional bits with the original 7 would overflow the block, we skip to the next block and put the bits there. Now, let’s see what happens when I change the struct slightly:</p>
<pre><code class="prettyprint lang-C">struct test {
char f0 : 7;
short f1 : 2;
}
</code></pre>
<p>The struct gets packed like this:</p>
<pre><code class="prettyprint lang-text">f0 set:
0x0 11111110 00000000
f1 set:
0x0 00000001 10000000
</code></pre>
<p>Now <code class="prettyprint">f1</code> is straddling the boundary between the first and second bytes. When trying to pack the bits for <code class="prettyprint">f1</code>, clang jumped to the first address that could store a <code class="prettyprint">sizeof(type of f1)</code> block (2 bytes), grabbed that block, and noticed that only 7 of the 16 available bits were occupied, so it just stored the two bits of <code class="prettyprint">f1</code> in the lowest available bits. Now, if we change the bit-width of <code class="prettyprint">f1</code> to 10, the struct will pack like this:</p>
<pre><code class="prettyprint">f0 set:
0x0 11111110 00000000 00000000 00000000
f1 set:
0x0 00000000 00000000 11111111 11000000
</code></pre>
<p>The method for packing this struct is similar to the previous struct. When we want to pack field <code class="prettyprint">f1</code> we grab the <code class="prettyprint">short</code> sized block at address 0 and notice that since there are 7 taken of the 16 total, we can’t store the additional 10 bits of <code class="prettyprint">f1</code> without overflowing the block. So, we jump to the next block and write the 10 bits of <code class="prettyprint">f1</code> to the lowest bits.</p>
<p>Just to show that this procedure can scale to even complex bit-fields, I’ll try and pack a larger, more complex bit-field:</p>
<pre><code class="prettyprint lang-C">struct big_bitfield {
unsigned short f0 : 8;
long f1 : 16;
unsigned long f2 : 29;
long long f3 : 9;
unsigned long f4 : 2;
unsigned long f5 : 31;
};
</code></pre>
<p>Here’s how <code class="prettyprint">clang</code> packs it:</p>
<pre><code class="prettyprint">f0 set:
0x0 11111111 00000000 00000000 00000000
f1 set:
0x0 00000000 11111111 11111111 00000000
f2 set:
0x4 11111111 11111111 11111111 11111000
f3 set:
0x4 00000000 00000000 00000000 00000111
0x8 11111100 00000000 00000000 00000000
f4 set:
0x8 00000011 00000000 00000000 00000000
f5 set:
0xc 11111111 11111111 11111111 11111110
</code></pre>
<p>(remember that <code class="prettyprint">sizeof(long)</code> with <code class="prettyprint">-m32</code> is 4).</p>
<p>First off, <code class="prettyprint">f0</code> is easy enough: we grab the first block of two bytes; there are no bits stored in it, so we just put all 8 bits of the field into the first bits of the block. The second field, <code class="prettyprint">f1</code>, has a four-byte type, so we grab the first four bytes at address 0x0. There are 8 bits already taken, but since we’re only trying to store 16, we won’t overflow the block. We just add the 16 bits after the 8 bits from <code class="prettyprint">f0</code>.</p>
<p>Next, for field <code class="prettyprint">f2</code> we grab the first block of four bytes, we’ve already used 24 of the 32 available bits so we won’t be able to fit our 29 bit in that block, we have to move to the next one. The next four byte block is located at address 0x4, so we write our bits there.</p>
<p>Field <code class="prettyprint">f3</code> is a little more interesting. Since <code class="prettyprint">sizeof(long long)</code> is 8, we have to grab the nearest 8-byte block. Since we just put the bits of field <code class="prettyprint">f2</code> into the four-byte block at address 0x4, <em>and</em> <code class="prettyprint">long long</code> is aligned on 4-byte boundaries here, we grab the 8-byte block from 0x4 to 0xc because it’s the nearest. Since only 29 of that block’s 64 bits are used, we write our 9 bits into the lowest available bits of the 8-byte block.</p>
<p>Fields <code class="prettyprint">f4</code> and <code class="prettyprint">f5</code> should be relatively easy, and are left as an exercise<br>
for the reader.</p>
<p>There are two other interesting cases related to packing bit-fields, the first<br>
is mixed structures:</p>
<pre><code class="prettyprint lang-C">struct test2 {
short f0 : 8;
long f1 : 16;
int f2;
char f3 : 7;
}
</code></pre>
<p>In this example <code class="prettyprint">f2</code> is not a bit-field. The structure will be packed like this:</p>
<pre><code class="prettyprint">f0 set:
0x0 11111111 00000000 00000000 00000000
f1 set:
0x0 00000000 11111111 11111111 00000000
f2 set:
0x4 11111111 11111111 11111111 11111111
f3 set:
0x8 11111110 00000000 00000000 00000000
</code></pre>
<p>Essentially <code class="prettyprint">f2</code> is treated like a bit-field with the maximum number of bits set: it can’t possibly be packed alongside anything else, so it’s simply put into the next available block.</p>
<p>The other interesting case is unnamed, zero-width fields:</p>
<pre><code class="prettyprint lang-C">struct test2 {
short f0 : 8;
long f1 : 16;
int : 0;
char f3 : 7;
}
</code></pre>
<p>This struct is packed like this:</p>
<pre><code class="prettyprint">f0 set:
0x0 11111111 00000000 00000000 00000000
f1 set:
0x0 00000000 11111111 11111111 00000000
f3 set:
0x4 11111110 00000000 00000000 00000000
</code></pre>
<p>The unnamed field in-between <code class="prettyprint">f1</code> and <code class="prettyprint">f3</code> basically forces the struct to pad to the next block of size <code class="prettyprint">int</code>, at which point packing continues as normal. It allows you a small amount of control over how the individual bits are actually laid out.</p>
<hr>
<p>[1]: section 6.7.2.1, paragraph 10<br>
[2]: based on section 3.6, section 3.2, and section 6.2.6.1 paragraph 4<br>
[3]: section 6.7.2.1, paragraph 10</p>