Fixed Size Memory Pool is my close friend
For low level code, such as network code, good performance and dynamic memory handling are closely related. Free/malloc or new/delete pairs cost significant CPU cycles and their implementation may or may not be cache efficient for a given scenario. A common way to improve this is to use thread/cache aware implementations such as tcmalloc or implement your own allocator. Here I'd like to talk about a variation of a typical memory pool for fixed size objects.
When memory allocation patterns are known in advance, rolling out your own memory pool/allocator may be a good solution. Recently, I needed a data structure as well as a memory pool for the following requirements:
Fast memory pool for fixed size objects
Access these objects with an integer index key
Given an object fetch its integer key
Walk a list of currently allocated objects
Typically I would have used a hash table and a memory pool where I'd include the key inside the object itself. But I wanted to investigate if I could provide all this functionality with a custom memory pool.
template<typename T> struct Pool { struct Item { T data; int flags; IListNode node; }; IList freeList; IList allocList; size_t poolSize; Item *items; bool initialize(int size); void shutdown(); T *allocate(); void deallocate(T *&obj); int getIndex(T *obj); T *getObj(int index); };
Pool provides initialize(), shutdown(), allocate(), deallocate(), getIndex(), getObj() methods for the requirements above, where IList is a simple intrusive list:
struct IListNode {
IListNode *prev; IListNode *next; }; struct IList { IListNode root; void prepend(IListNode *node) { link_ilist_node( node, &root, root.next); } void append(IListNode *node) { link_ilist_node( node, root.prev, &root); } void remove(IListNode *node) { unlink_ilist_node(node); } }; void unlink_ilist_node(IListNode *node) { node->next->prev = node->prev; node->prev->next = node->next; node->next = 0; node->prev = 0; } void link_ilist_node(IListNode *node, IListNode *prev, IListNode *next) { node->next = next; node->prev = prev; next->prev = node; prev->next = node; }
The implementation of Pool is also very simple. By decoupling free/allocated list (intrusive lists) from actual memory allocation / object construction, we can easily keep this code exception safe as well:
bool initialize(int size) { assert(size>0); if (items) return false; // already-initialized items = static_cast<Item*>( malloc( sizeof(Item)*size) ); if (!items) return false; // out of mem poolSize = size; freeList.clear(); allocList.clear(); for (int idx = 0; idx < size; ++idx) { freeList.append( &(items[idx].node) ); items[idx].flags = FLAGS_FREE; } } T *allocate() { if (freeList.empty()) return false; // out of memory Item *item = fetch_base<Item,IListNode>(freeList.root.next, offsetof(Item,node); assert(item); assert(item->flags == FLAGS_FREE); T *obj = new(&(item->data)) T(); assert( obj == &(item->data)); item->flags = FLAGS_ALLOC; freeList.remove( &(item->node) ); allocList.append( &(item->node) ); return obj; } void deallocate(T *&obj) { Item *item = fetch_base<Item,T>(obj,offsetof(Item,data)); assert(item); assert(item->flags == FLAGS_ALLOC); obj->~T(); obj = 0; item->flags = FLAGS_FREE; allocList.remove( &(item->node)); freeList.prepend( &item->node)); }
This code not only catches double delete or invalid pointer delete, but it also has some basic buffer overrun checks using 'flags'. This could be expanded to include more stats/counters and more sophisticated canary values for stricter buffer overrun checks.
One important, but elusive trick is to 'prepend' to the freelist in deallocate(). This is important, because we'd like to put such cache hot items in front of our free list. The impact of this small change was visible in my performance tests.
Another variation to this pool is to add a boundary marker to keep track of how much of the pool is used. Using this, we can avoid the big for-loop in initialize() and instead gradually add more items to circulation.
In summary, as simple as this code is, it easily beats default new/delete performance significantly. Running a series of memory functions with this code, I observed over 3X performance boost in my environment.












