Utility of Unions in C
“Why are unions in C useful?” I was driven to write this post because I couldn’t find any compelling examples of the utility of the union keyword in C. There was a lot of great reference material breaking down the differences between struct and union and the mechanics of union, but not much showing how union could be powerful. I only found one example that I felt was exemplary, but it dealt with union at the byte level, which is great if you’re doing embedded programming but not relatable at all if you’re coming from a different background.

So I’ll give a short, direct explanation of how I wrapped my brain around union and why I think this mental model is better than a lot of the explanations on the internet. I think of union as a way of declaring a group of members that overlap one another in the same area of memory. That immediately clarifies the behavior of union and why its members need to be accessed “one at a time”: whichever member you use, you’re referencing the same block of memory.

Disclaimer: I don’t know if that’s actually how it works, but I went ahead and wrote a program to see if I could validate my mental model, and lo and behold it worked.

So here’s an example I would call useful. Consider this: you’re writing code for a business that sells books at scale, you need it to be performant, or maybe it’s a really old system - at any rate you have to write it in C. You sell physical books and are extending the system to sell eBooks, so naturally you think to yourself, “It’d be great if I could just write the extension and leave most of the existing code alone.” Let’s see some sample code. Consider the following snippet as part of your existing code:
#include <stdio.h>   /* printf */
#include <string.h>  /* strcpy */

struct Paperback {
    char bookId[25];
    char bookTitle[200];
    char author[200];
};

void printPaperbackInfo(struct Paperback pb) {
    printf(
        "This is a book with ID %s and Title of %s written by %s.\n",
        pb.bookId, pb.bookTitle, pb.author
    );
}

int main(int argc, char* argv[]) {
    struct Paperback theFall;
    strcpy(theFall.bookId, "PHY255ACTF");
    strcpy(theFall.bookTitle, "The Fall");
    strcpy(theFall.author, "Albert Camus");

    printf("Okay so far we have...\n");
    printPaperbackInfo(theFall);
    return 0;
}
You think to yourself, “Okay, well I need to add some concepts about eBooks, so I’d probably create some eBook-specific stuff and use it as I need it.” So that’s what you do:
struct ElectronicBook {
    char bookId[25];
    char bookTitle[200];
    char author[200];
    char format[10];
    char DRMS[50];
};

void printElectronicBookInfo(struct ElectronicBook eb) {
    printf(
        "This is a book with ID %s and Title of %s written by %s.\n",
        eb.bookId, eb.bookTitle, eb.author
    );
    printf(
        "This book is in format %s signed with %s.\n",
        eb.format, eb.DRMS
    );
}
Well, now you notice you have duplicated code. ElectronicBook and Paperback also seem to share a lot in common. Either way you’re left wondering something to the effect of, “I guess I could re-use void printPaperbackInfo(struct Paperback pb) and just cast my structs every time I want to use it, but I’d still be using two distinct structs and really I wish I had just the one.”

Enter union. At this point I would say, “Hey, I want a union of these two data structures, because they’re basically the same thing in memory with some added stuff, and I don’t want to be casting stuff all the time and dealing with compiler errors.” So your code would evolve to look like this:
#include <stdio.h>   /* printf */
#include <string.h>  /* strcpy */

struct Paperback {
    char bookId[25];
    char bookTitle[200];
    char author[200];
};

struct ElectronicBook {
    char bookId[25];
    char bookTitle[200];
    char author[200];
    char format[10];
    char DRMS[50];
};

/* pb and eb occupy the same memory */
union Book {
    struct Paperback pb;
    struct ElectronicBook eb;
};

void printPaperbackInfo(struct Paperback pb) {
    printf(
        "This is a book with ID %s and Title of %s written by %s.\n",
        pb.bookId, pb.bookTitle, pb.author
    );
}

/* only prints the eBook-specific fields now */
void printElectronicBookInfo(struct ElectronicBook eb) {
    printf(
        "This book is in format %s signed with %s.\n",
        eb.format, eb.DRMS
    );
}
Okay, now we have some definitions, and this seems strictly better than the duplicated code we had before. Let’s see where this goes. Its usage might look something like:
int main(int argc, char* argv[]) {
    union Book theFall;
    strcpy(theFall.pb.bookId, "PHY255ACTF");
    strcpy(theFall.pb.bookTitle, "The Fall");
    strcpy(theFall.pb.author, "Albert Camus");

    printf("Okay so far we have...\n");
    printPaperbackInfo(theFall.pb);

    // but wait maybe we reach out to some part of our system
    // and discover this is available as an eBook and we
    // want to treat it as such now
    printf("We're assigning ebook data\n");
    strcpy(theFall.eb.format, "kindle");
    strcpy(theFall.eb.DRMS, "Digital Rights Management Signature");

    printf("That's cool the paperback code is still happy...\n");
    printPaperbackInfo(theFall.pb);

    printf("But so is the eBook code...\n");
    printElectronicBookInfo(theFall.eb);
    return 0;
}
So, I don’t know about you, but I’m feeling pretty confident in this mental model.

In summary, unions allow us to manipulate the same area of memory through “overlapping” members (aka a union of the members). This is useful (as I’ve shown above) because we can define a union labelled Book and then use it both in parts of the code that only know about Paperback and in parts that only know about ElectronicBook, by passing around pb and eb respectively.

I think this is a sound analysis on my part. I did wonder whether I was just getting lucky, with the pb data still sitting in memory because it hadn’t been deallocated yet, but nothing is deallocated here: theFall lives for the whole of main, and pb and eb begin with the same three members in the same order. The C standard calls that a common initial sequence and allows those members to be read through either struct in the union. This example is also congruent with the expected behavior from the accepted StackOverflow answer referenced above.

Hopefully this helps people out. If it backfires on you horribly, I’d also like to hear about that so I’m not misinforming people.










