dlclose(): Not Even Once
In another life[1] I had the (mis)fortune of working on a project where I needed to implement a PKCS-11[2] library in C++[3] that acted as a shim, translating signature requests from PKCS-11's interface to that of another service on the box available over D-Bus.
If that last paragraph made sense to you then I'm so sorry.
In either case, while going about making this horrible thing a reality, I ran into a fun bug that seemed to happen every so often. After the application loading the library had finished reading the certificates and keys provided by the library and performed a signing operation, it would crash with a SIGBUS or a SEGFAULT. This was unusual, because it really wasn't meant to do that.
After an hour or so of hopeless printf debugging I finally Did The Right Thing and reached for good old gdb. gdb caught the address fault, but (being somewhat unfamiliar with gdb at the time[4]) it took me a while to figure out what exactly was causing it. Eventually I presented gdb with a query whose response I hope never to see again:
    (gdb) print *$pc
    Cannot access memory at address 0xe8e7a948
To translate: my program counter (the number the CPU uses to keep track of where it's heading) was pointing to memory that isn't accessible. Understandably, this makes it difficult for the CPU to figure out which instruction it should run next.
After some more debugging (honestly I don't remember this part, but I'm sure it involved hitting my desk at least a few times an hour), I came to the realization that the thread mysteriously trying to execute memory that seemed to have nothing behind it was a worker thread from the GDBus library, a library that's part of GLib's GIO set of libraries and that I was using to talk over D-Bus.
Did GLib betray me? Are the GNOME devs actively out to eat my lunch by having GIO jump to wild far-flung addresses in an attempt to ruin my day? These questions can't possibly be answered, but what I do know is that we eventually have to get back to the title of this post, and we've reached that point now.
dlclose() is the deceptively pleasant, even banal-sounding counterpart to dlopen(). But don't be fooled: dlclose() hates you. It hates you, it hates your family, and it will eat your lunch if given the chance.
You see, PKCS-11 modules are implemented as C libraries meant to be dynamically opened with dlopen(), as was the style at the time[5]. dlopen() helpfully loads an arbitrary .so file and its runtime dependencies into the current address space, and gives you a handle you can pass to dlsym() to look up functions within it. dlclose(), on the other hand, mercilessly rips the aforementioned .so file and its dependencies right out of mapped memory, no matter how you or your dependencies feel about it[6].
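For anyone who hasn't used this API, here is a minimal sketch of the dlopen()/dlsym() half of the lifecycle a host application goes through. The helper name open_and_lookup is hypothetical, and passing NULL as the path (which asks for the running program plus the libraries it was started with) is only there so the sketch runs without a real PKCS-11 module; a PKCS-11 host would instead pass the module's .so path and look up C_GetFunctionList. On glibc older than 2.34 you need to link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: dlopen() `path` (NULL means "the running program and
 * the libraries it was started with"), then dlsym() `symbol` out of it.
 * On success, stores the handle in *handle_out and returns the symbol's
 * address; on failure, prints dlerror() and returns NULL. */
void *open_and_lookup(const char *path, const char *symbol, void **handle_out)
{
    assert(symbol != NULL && handle_out != NULL);

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    dlerror(); /* clear any stale error before calling dlsym() */
    void *sym = dlsym(handle, symbol);
    if (sym == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle); /* drop the reference we just took */
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```

To call the result you convert it to a function pointer type, e.g. `*(void **)(&fn) = sym;`; POSIX guarantees this works for dlsym() results even though ISO C doesn't. The matching dlclose(handle) is where the trouble described here begins.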
Which brings us back to GLib. As it turns out, GLib's worker thread was still running when dlclose() was called on my library. So GLib messed up, right? Well, I didn't really tell it that its instructions might be unmapped right out from under it. And GLib doesn't have a way for me to tell it "hey, you got like 1 second, be ready to be completely unmapped from memory, sg ok?".
And really, it's fair that it doesn't do that. It's hard enough to write a C library that can handle all the weird edge cases around what process its workers are in, or what state its threads may or may not be in at any given moment. Adding complex code on top of that to clean up all your state and be ready to be unmapped at any moment just isn't worth it.
Heck, even widely used libraries that are, in theory, meant to be properly dlclose()d don't always get this right. Take this example I found in openCryptoki, in which a file descriptor is leaked every time the library is dlclose()d[7].
So how do we fix this? It's simple: never dlclose(). Just don't do it. I know you might think you're some Real Smart Programmer That Really Actually Knows What They're Doing, but even if you're right, the benefits simply aren't there. Once you've dlopen()'d, you've opened Pandora's box, and there's no way you're going to cleanly stuff all those bits back inside. And what do you gain when you run dlclose() anyway? You maybe get back a few megabytes of address space. Not memory, address space. If you're on a 16-bit machine, sure, save that address space (but how the heck are you using a POSIX-compliant dlclose() to begin with?!?). But if you're in the modern 64-bit world, you should never use dlclose().
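On the host side, following this advice can be as simple as opening the module once and caching the handle for the life of the process. Below is a minimal sketch under that assumption: get_module and the single global handle are hypothetical names (a real host might key the cache on the module path), and `path` only matters until the first successful call. A pthread mutex keeps concurrent first calls from racing; link with -pthread on older glibc.

```c
#include <assert.h>
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical host-side cache: the first successful call dlopen()s the
 * module and every later call returns that same handle. There is
 * deliberately no matching "unload" function, so the handle is never
 * dlclose()d and the module stays mapped until the process exits. */
static pthread_mutex_t module_lock = PTHREAD_MUTEX_INITIALIZER;
static void *module_handle = NULL;

void *get_module(const char *path)
{
    int rc = pthread_mutex_lock(&module_lock);
    assert(rc == 0);

    if (module_handle == NULL) {
        /* If this fails, the handle stays NULL and the next call retries. */
        module_handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (module_handle == NULL)
            fprintf(stderr, "dlopen: %s\n", dlerror());
    }
    void *handle = module_handle;

    rc = pthread_mutex_unlock(&module_lock);
    assert(rc == 0);
    (void)rc; /* silence "set but unused" when NDEBUG disables assert() */
    return handle;
}
```

Because there is no unload path, nothing in the host can pull the module (or GLib underneath it) out of memory while one of its threads is still running.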
But what if you're in my unfortunate position, writing the poor little library that just wanted to live free with its instructions mapped until the day of exit()ing? Well, it's pretty simple, actually: you just dlopen() the library that's being loaded. That is, yourself.
By doing this you bump the reference count for your library and its dependencies and put an end to any possibility of your instruction space being unceremoniously unmapped out from under you again. Unless, that is, whatever application is using you is poorly written and double (or triple!) closes your library for some reason. In that case, maybe just dlopen() yourself a few times. 10 should do.
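Here is a minimal sketch of that self-dlopen() trick, assuming a glibc- or musl-style loader that provides dladdr() (a common extension, not part of POSIX). The helper names pin_object_containing and pin_self are hypothetical, as is calling pin_self() from the module's init entry point (C_Initialize, for a PKCS-11 module). The idea is to ask the loader which file contains one of your own functions, then dlopen() that file; since it is already loaded, dlopen() just returns the existing handle and increments its reference count. Passing RTLD_NODELETE where the platform defines it is my addition, not part of the original fix: on glibc it tells the loader never to unload the object on dlclose(), which makes counting dlopen()s unnecessary.

```c
#define _GNU_SOURCE /* for dladdr(), Dl_info and RTLD_NODELETE on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: find the shared object that contains `addr` and
 * dlopen() it again, deliberately leaking the handle so one stray
 * dlclose() from the host can no longer unmap it.
 * Returns 0 on success, -1 on failure. */
int pin_object_containing(const void *addr)
{
    assert(addr != NULL);

    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL) {
        fprintf(stderr, "dladdr: no loaded object contains %p\n", addr);
        return -1;
    }

    int flags = RTLD_NOW | RTLD_LOCAL;
#ifdef RTLD_NODELETE
    flags |= RTLD_NODELETE; /* extension: never unload on dlclose() */
#endif

    /* Already resident, so this bumps the reference count rather than
     * loading a second copy. The handle is never dlclose()d. */
    if (dlopen(info.dli_fname, flags) == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", info.dli_fname, dlerror());
        return -1;
    }
    return 0;
}

/* Inside the library itself: pin whichever object contains this function,
 * i.e. the library. Call once from its init entry point. */
int pin_self(void)
{
    return pin_object_containing((const void *)&pin_self);
}
```

If you control the build, linking the library with -Wl,-z,nodelete asks for the same never-unload behavior at link time on GNU-style toolchains.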
[1] about five years ago
[4] not to imply present familiarity
[6] interestingly, the POSIX spec for dlclose() says this approach of ripping out the library and its dependencies is completely optional. musl, for example, opts to take the "do nothing because we don't have to" approach. what some may consider an inefficiency in musl's implementation, I consider a strategy deep in wisdom (or laziness, but in this case it's serendipitous laziness).
[7] i have no idea if this was ever patched (i nih'd my way out of needing to use openCryptoki, as you might guess from this post), but if you have a long-running process that happens to dlclose() openCryptoki over its lifetime and you're running into file descriptor exhaustion, you might want to fix that (or better yet, as mentioned above, just stop using dlclose()).