Update: one of my friends linked to this presentation from NVIDIA (pdf) which specifically states: "Do not create a single FBO and then swap out attachments on it."
I didn't know this when I was implementing it, and swapping attachments was the most obvious approach within the architecture of the renderer we were using. With this warning in mind, using separate FBOs to store the rendered output for each frame seems like a more reasonable approach. Apart from that, the synchronization guards below still make sense.
I know, I know, you must all think I'm crazy, but bear with me while I try to explain.
Imagine a situation where you have a renderer. This renderer renders a scene (with animation and so on) in real-time. For the purposes of this exercise, real-time means 24 or 30 frames per second. The UI has to be responsive and "fluid", which means it should be able to update at around 60 frames per second. I'm sure you can already see the problem: if the UI has to wait for the renderer, it can never update faster than the renderer does.
This could be a non-issue if you could decouple the UI completely from OpenGL. But when you are using a framework where drawing OpenGL content into a window means the whole window must be drawn with OpenGL, you don't really have much choice.
So, with the scene set, let us dig into the problem. We obviously have to use threading, because our UI and renderer must be able to update more-or-less independently. Fortunately, that is possible with OpenGL, although the driver and GPU usually only have one command queue, so you have to ensure that you synchronize the OpenGL commands to prevent race conditions.
We are in the fortunate situation that the end result from the renderer is a texture attached to a frame buffer (let's call it the output frame buffer). This texture can simply be bound in the UI thread and drawn on a fullscreen quad symbolizing our viewport.
To allow the renderer to render somewhat independently of the UI thread, we must have multiple attachments to the output frame buffer, so the renderer can render to one while the UI is painting the texture from the other to screen, and so on.
This means the problem is more or less a producer-consumer problem, with a few changes from the classic version because of the application's requirements. The elements in the queue are pre-allocated and filled in by the renderer. If several elements have been rendered and none are left to render into, the renderer may take the oldest one and overwrite its contents, as long as the newest render is always available to the UI and the renderer doesn't re-render the element it just rendered (e.g. because the UI wasn't done painting it when the renderer was done).
To solve this I created an AttachmentQueue. This queue holds elements representing the pre-allocated attachments on the output frame buffer. Each element has an internal unique ID, a name[1], a time changed, a synchronization object, and a status.
The synchronization object is used because the number of commands the renderer pushes to the GPU is huge, and we need to make sure that the GPU has actually drawn to the attachment we want to paint in the UI before the UI command to bind the texture is put onto the command queue. This is the race condition I mentioned above. The status simply says whether the attachment has been rendered to yet.
In addition to a list of elements, the queue also has a mutex to ensure only one thread at a time makes modifications to it. It has two references to the elements representing the attachments currently held by the UI and by the renderer. It also has a wait condition and a reference to the last rendered attachment, both of which are explained in further detail below. The references to attachment elements are all initialized to the empty element (EMPTY).
There is a pair of functions, reserveAttachment() and releaseAttachment(), for both the UI and the renderer. The main difference is that the UI, when it can't reserve an attachment, will simply not get one and will instead paint some alternative UI to show it could not get the texture. In contrast, the renderer needs an attachment to be able to render, and will have to wait if there is no attachment available.
The queue should always have the newest rendered attachment at the front, where the UI will grab it, while the renderer takes from the back.
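To make the bookkeeping concrete, here is a minimal sketch of the data described above, in plain C++ without any OpenGL headers. The names AttachmentElement, AttachmentQueueState, SyncHandle and initQueue, as well as the "base name plus index" naming scheme, are my assumptions for illustration and not the actual renderer code. The standard-library mutex and condition variable stand in for whatever your framework provides.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>

// Stand-in for GLsync so the sketch compiles without OpenGL headers (assumption).
using SyncHandle = void*;

struct AttachmentElement {
    int           id = -1;             // internal unique ID; -1 marks the EMPTY element
    std::string   name;                // name the renderer uses for this attachment
    std::uint64_t timeChanged = 0;     // when the attachment was last rendered to
    SyncHandle    sync = nullptr;      // fence created after rendering, if any
    bool          isRendered = false;  // status: has it been rendered to yet?

    bool operator== (const AttachmentElement& other) const { return id == other.id; }
};

const AttachmentElement EMPTY{};

struct AttachmentQueueState {
    std::mutex                    mutex;          // one thread modifies the queue at a time
    std::condition_variable       waitCondition;  // the renderer waits on this
    std::deque<AttachmentElement> queue;          // newest render at the front
    AttachmentElement uiAttachment           = EMPTY;
    AttachmentElement renderAttachment       = EMPTY;
    AttachmentElement lastRenderedAttachment = EMPTY;
};

// Pre-allocates `count` elements named baseName0, baseName1, ... (hypothetical scheme).
inline void initQueue (AttachmentQueueState& state, const std::string& baseName, int count)
{
    for (int i = 0; i < count; ++i) {
        AttachmentElement element;
        element.id = i;
        element.name = baseName + std::to_string (i);
        state.queue.push_back (element);
    }
}
```

A deque is used here only because both ends of the queue are touched: the UI works at the front and the renderer at the back.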
These requirements leave us with a reserve and release function for the UI (in my best C++ pseudo-code).
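    void reserveUIAttachment()
    {
        AttachmentWriteLock lock (_mutex);

        // Nothing to show yet: the queue is empty or the front has not been rendered.
        if (_queue.empty() || !_queue.first().isRendered)
            return;

        _uiAttachment = _queue.first();
        _queue.pop_first();

        if (_uiAttachment.sync) {
            // Make the GPU wait for the renderer's fence before the UI commands run.
            glWaitSync (_uiAttachment.sync, 0, GL_TIMEOUT_IGNORED);
            // Safe to delete now: the object stays alive until the pending wait is done.
            glDeleteSync (_uiAttachment.sync);
            _uiAttachment.sync = 0;
        }
    }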
Pretty straightforward stuff. The only thing to note is the use of an OpenGL sync object: if one exists for this attachment, we have to put a wait onto the command queue.
The release is not too complicated either, except that we have to cover two cases. If the attachment at the front of the queue is older than the one the UI is holding, the UI's attachment is still the newest and goes back to the front. Otherwise it goes to the back. We also notify the wait condition, which releases the renderer if it is waiting and does nothing if it is not.
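    void releaseUIAttachment()
    {
        AttachmentWriteLock lock (_mutex);

        if (_uiAttachment == EMPTY)
            return;

        // Still the newest frame? Put it back at the front. Otherwise the renderer
        // has produced something newer, so recycle this one at the back.
        if (_queue.empty() || _queue.first().timeChanged < _uiAttachment.timeChanged)
            _queue.push_front (_uiAttachment);
        else
            _queue.push_back (_uiAttachment);

        _uiAttachment = EMPTY;
        _waitCondition.notify_one();  // wakes the renderer if it is waiting
    }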
The render side is very similar, but with a few crucial differences. We do not overwrite the last attachment we rendered to, and will wait if we can't get an attachment (rather than returning immediately like the UI reservation does).
    void reserveRenderAttachment()
    {
        AttachmentWriteLock lock (_mutex);

        // Loop rather than test once, so a spurious wake-up cannot hand back
        // the attachment we just rendered.
        while (_queue.empty() || _queue.last() == _lastRenderedAttachment)
            _waitCondition.wait (_mutex);

        _renderAttachment = _queue.last();
        _queue.pop_last();
        _lastRenderedAttachment = EMPTY;
    }
For releasing the rendered attachment, we have to remember to push a sync object.
    void releaseRenderAttachment()
    {
        AttachmentWriteLock lock (_mutex);

        // The fence marks the end of this frame's commands; the flush pushes it to the GPU.
        _renderAttachment.sync = glFenceSync (GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        glFlush();

        _queue.push_front (_renderAttachment);
        _lastRenderedAttachment = _renderAttachment;
        _renderAttachment = EMPTY;
    }
Note that we have a call to glFlush() right after creating the fence, and before we release the mutex for the queue. This means the fence will be pushed to the GPU before the wait is put on the queue in reserveUIAttachment().
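If you want to play with the queue logic without an OpenGL context, the four functions can be modelled on the CPU. The sketch below is my approximation, not the production code. The class name AttachmentQueueModel and its members are made up, the fence is reduced to a boolean, std::mutex and std::condition_variable replace the framework's lock types, and I assume the status and time changed are set when the renderer releases an attachment (the pseudo-code above doesn't show where that happens).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

// CPU-only model: the GL fence is reduced to a flag, so no OpenGL context is needed.
struct Element {
    int           id = -1;           // -1 marks the EMPTY element
    std::uint64_t timeChanged = 0;
    bool          hasFence = false;  // stands in for the GLsync object
    bool          isRendered = false;
};

class AttachmentQueueModel {
public:
    explicit AttachmentQueueModel (int count) {
        for (int i = 0; i < count; ++i) { Element e; e.id = i; _queue.push_back (e); }
    }

    // UI side: never blocks; returns false when no rendered element is available.
    bool reserveUI() {
        std::lock_guard<std::mutex> lock (_mutex);
        if (_queue.empty() || !_queue.front().isRendered) return false;
        _ui = _queue.front();
        _queue.pop_front();
        _ui.hasFence = false;  // the real code issues glWaitSync here
        return true;
    }

    void releaseUI() {
        std::lock_guard<std::mutex> lock (_mutex);
        if (_ui.id < 0) return;
        if (_queue.empty() || _queue.front().timeChanged < _ui.timeChanged)
            _queue.push_front (_ui);  // still the newest frame
        else
            _queue.push_back (_ui);   // a newer frame arrived in the meantime
        _ui = Element{};
        _waitCondition.notify_one();
    }

    // Render side: blocks until an element other than the last rendered one is free.
    void reserveRender() {
        std::unique_lock<std::mutex> lock (_mutex);
        _waitCondition.wait (lock, [this] {
            return !_queue.empty() && _queue.back().id != _lastRenderedId;
        });
        _render = _queue.back();
        _queue.pop_back();
        _lastRenderedId = -1;
    }

    void releaseRender() {
        std::lock_guard<std::mutex> lock (_mutex);
        _render.hasFence = true;         // the real code calls glFenceSync and glFlush
        _render.isRendered = true;       // assumption: status and time are updated here
        _render.timeChanged = ++_clock;
        _queue.push_front (_render);
        _lastRenderedId = _render.id;
        _render = Element{};
    }

    // Unlocked accessors, meant for single-threaded inspection only.
    int uiId() const     { return _ui.id; }
    int renderId() const { return _render.id; }

private:
    std::mutex              _mutex;
    std::condition_variable _waitCondition;
    std::deque<Element>     _queue;
    Element                 _ui, _render;
    int                     _lastRenderedId = -1;
    std::uint64_t           _clock = 0;
};
```

With three elements, a single-threaded walk through looks like this: the first reserveUI() fails because nothing has been rendered, the renderer takes element 2 from the back and releases it to the front, and the UI then picks up element 2. While the UI holds it, the renderer can take element 1, and once the UI releases element 2 it goes to the back because element 1 is newer.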
And that's basically all there is to it… Well, not really. You also have to use the attachments.
If you remember, I said above that we store a name for each attachment. The names are set up during initialization, which also ensures that we set up the right number of attachments. When we want to render to an attachment in the output frame buffer, we ask which base name the render pass wants to write to. If it matches the base name of the attachments in the output frame buffer, we map it to the name of the _renderAttachment from the AttachmentQueue. The call to glDrawBuffers (…) then gets the correct buffer.
Similarly, when the UI wants the texture ID for the attachment it has reserved, the frame buffer finds the attachment with the name from _uiAttachment and returns the corresponding texture ID, which can be bound with glBindTexture (…).
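The name lookup could be sketched roughly like this. It is only an illustration. OutputFrameBuffer, drawBufferFor and textureFor are hypothetical names, the "base name plus index" scheme is my assumption, and the texture IDs are passed in rather than created with OpenGL. The constant repeats the value of GL_COLOR_ATTACHMENT0 so the sketch compiles without the OpenGL headers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Same value as GL_COLOR_ATTACHMENT0 in the OpenGL headers.
constexpr unsigned kColorAttachment0 = 0x8CE0;

struct OutputAttachment {
    unsigned drawBuffer;  // kColorAttachment0 + i, the value to pass to glDrawBuffers
    unsigned textureId;   // texture attached at that point, for glBindTexture
};

class OutputFrameBuffer {
public:
    // Registers one attachment per texture, named baseName0, baseName1, ...
    OutputFrameBuffer (const std::string& baseName, const std::vector<unsigned>& textureIds)
        : _baseName (baseName)
    {
        for (unsigned i = 0; i < textureIds.size(); ++i)
            _attachments[baseName + std::to_string (i)] = { kColorAttachment0 + i, textureIds[i] };
    }

    // Render side: if the pass targets our base name, return the draw buffer of the
    // attachment the queue reserved. Returns 0 (GL_NONE) when nothing matches.
    unsigned drawBufferFor (const std::string& passTarget,
                            const std::string& renderAttachmentName) const
    {
        if (passTarget != _baseName) return 0;
        auto it = _attachments.find (renderAttachmentName);
        return it == _attachments.end() ? 0 : it->second.drawBuffer;
    }

    // UI side: texture ID for the attachment the UI reserved, or 0 if unknown.
    unsigned textureFor (const std::string& uiAttachmentName) const
    {
        auto it = _attachments.find (uiAttachmentName);
        return it == _attachments.end() ? 0 : it->second.textureId;
    }

private:
    std::string                             _baseName;
    std::map<std::string, OutputAttachment> _attachments;
};
```

The render pass only ever knows the base name, so which physical attachment it writes to is decided entirely by what the queue handed out.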
Now you know about as much as me on the topic of multi-threaded rendering. You can always ask. As an experiment I have also turned on replies to this post, so we'll see how that goes.
[1] The renderer uses these names internally to represent each attachment.