OpenGL FDTD Tutorial: Episode 2 – Simulation Cycle and Shader Programs


Here we go, the second episode of my tutorial on how to implement real-time physical modelling synthesis on the GPU, using C++ and OpenGL shaders. The first episode can be found here.

We’ll finish with the main file, working out the simulation cycle. Then, we’ll work on the shaders finally! This means having a close look to the solver and its visual representation.
The list of covered files:

  • main.cpp [second–and last–part]
  • fbo_vs.glsl and fbo_fs.glsl [FBO shader program]
  • render_vs.glsl and render_fs.glsl [render shader program]

Main is still in C++, while the shader files are written in OpenGL Shading Language [GLSL to friends].

Straight to Business!



We have to start by explaining the STRUCTURE of the SIMULATION CYCLE

We left off from a perfectly initialized code, ready to flip audio samples in a continuous simulation cycle!

In the ideal situation of running our drum head in real-time [and pushing the computed samples directly to the audio output], we would syncronize the advancement of the simulation with the audio rate set by the audio interface. This could be easily done by implementing the simulation cycle inside the audio callback function likely present in the real-time audio programming environment we are using. In particular, per each audio call we would:

  • advance the simulation [FBO shader program calls] for as many steps as the size of the audio buffer we have to fill
  • retrieve the block of computed samples from the audio texture portion and copy them into the audio buffer
  • render the latest status of the simulation on screen [a single render shader program call]

Then return and chill out until the next callback happens.

Well, this can be theoretically done on any OS, by using a cross-platform real-time audio programming environment like PortAudio, or native solutions like Core Audio, Alsa and the Windows equivalent whose name I don’t know. However, all these solutions will have slightly different details [e.g., multi-period buffers, extra callback parameters]. To avoid confusion, for this tutorial I remained as generic as possible and instead of choosing a specific platform I simply made the code run offline, but maintaining a buffer-based structure. In other words, a number of total samples is defined a priori together with the size of a dummy audio buffer; then the steps needed for the simulation cycle to compute a buffer [the 3 bullet points] are looped, until the number of synthesized samples satisfies the chosen limit; per each iteration, instead of pushing the computed buffer to the audio card, the samples are appended to a global array where, in the end, the whole simulated audio is at our disposal.
Too many words, here is some pseudo code:

_Set number of total samples
_Set audio buffer size
_Compute number of needed buffers
_While buffer counter is less than needed buffers
___While sample counter is less than buffer size
_____Advance simulation
___Retrieve audio texture samples
___Append audio texture samples to global array
___Render to screen
_End of simulation


Not that complicated at this level of detail!
Furthermore, moving back from this approach to a real-time architecture should be a piece of cake.

Before analyzing the code, it’s good to write down some more words about the “Advance simulation” line of the pseudo code, since it’s the core of the algorithm.
As explained in this section of Episode 1, we deal with 2 copies of the domain, a snapshot of the model at time step N and a snapshot at time step N-1, at start respectively stored in quad0 and quad1. When we want to advance the simulation of one step, we use both the snapshots to calculate the new pressure values and store them in the N-1 domain; this is done in the shader, as we’ll soon. Then we swap the indices, so that the N-1 domain becomes N and, vice versa, the N domain becomes N-1. This is done on the CPU side. According to this scheme, quad0 and quad1 alternatively contain the most recent pressure values.
This swapping also affects the way we fill the audio texture portion in the shader. When copying the audio from the listener’s position into the buffer [quad2], we have to alternatively pick from quad0 and quad1, so that all the synthesized samples are correctly lined up in the buffer. The safest way to do this, is to always sample the LEAST UPDATED quad. It sounds weird, I know, but there is a reason…
When we advance the simulation we have to execute a simulation step [update of one of the main quads] followed by an audio step [to fill the audio texture]. The 2 steps are done with 2 OpenGL calls that are non blocking, this means that if in the simulation step we update quad0 and then, in the audio step, we try to immediately read quad0 listener’s pressure value, we run the risk to pick an old value, cos the shader might not have finished to update all the fragments yet. This indeterminacy resolves into noise in the output audio stream [sometimes the picked sample is correct, sometimes it’s not…]. In this example, the solution is to pick from quad1 [which is done for sure] and, more in general, to read from the least updated quad.
This introduces a delay of one sample. Can you hear the difference?

To handle this swapping scheme into the code, we define 4 states on the CPU side, that drive the operations within the shader:

  • state0: draw quad0 [simulation step]
  • state1: read audio from quad1, cos quad0 might not be ready yet [ audio step]
  • state2: draw quad1 [simulation step]
  • state3: read audio from quad0, cos quad1 might not be ready yet [ audio step]

The simulation starts from state0, then the states are incremented and looped [0 and 1, then 2 and 3, then 0 and 1, then 2 and 3…] until the application is running.
Hopefully the code will help understand this simple concept that sounds scary in words!

Note that we haven’t talked about how these states are actually implemented in the FBO shader yet! We are simply describing how we cycle through the commands in the simulation cycle.




Now it’s time for the CODE for the SIMULATION CYCLE [main.cpp]

Let’s see in detail how the central portion of the pseudo code—the lines “While sample counter is less than buffer size” “Advance simulation”—translates into code:

This is the cycle that makes the simulation advance enough to fill a single audio buffer. It is a straightforward implementation of the pseudo code, we simply add some simple extra features.

First, we pass the current excitation value to the shader as a single floating point number. More info about the excitation in a few lines.

Then, it comes the simulation step. To advance the simulation we first have to tell the shader that we are either in state0 or in state2, according to which quad needs to be updated next [i.e., the N-1 portion of the domain]; once we have sent state as uniform to the shader, we can trigger the actual update of the chosen quad by passing its vertices to the function glDrawArrays(…). At this point the FBO shader program asynchronously starts to compute the new pressure values [more details in the shader program section].

Next is the audio step. The shader needs 2 uniforms for this step. The first one is the next available audio texture’s fragment, to incrementally fill the buffer; we pass wrCoord[2], which contains both the x coordinate of the fragment [y is fixed, on top of the texture] and the index of the next available RGBA channel, so that we line up 4 samples in each channel. The second uniform is again state, which before being passed is incremented to turn state0 in state1, or state2 in state3. The shader is then called by passing to glDrawArrays(…) the vertices of quad2; the call will trigger the copy of the pressure value of the listener point in the coordinates specified in wrCoord[2] [more details in the shader program section].

The next cycle is prepared by swapping the quads’ indices and properly incrementing wrCoord[2]. We also have to compute the next excitation sample. In this simple application the excitation is an impulse train, i.e., a long line of zeros that periodically turn into a spike, according to a fixed frequency.  In the above code we use two user define values to achieve this signal: the frequency of the train [expressed as the number of zero-samples between spikes, excitationDt] and the amplitude of the spike [maxExcitation].

The final sync call to glMemoryBarrier(…) is fundamental. As I wrote before, whenever we trigger the shaders with  glDrawArrays(…) the pressure updates happen asynchronously on the GPU, while the CPU program keeps going forward. With the memory barrier, the CPU code stops and waits for all the GPU threads to be done before resuming its execution. This allows us to be sure that the computation of the new pressure values all over the domain is done and the current listener sample has been copied in the audio texture; then we can safely start the computation of the next iteration [simulation and audio step].


The previous code block is contained in the following lines, again translated from the pseudo code [plus some extra operations]:

Some remarks.
To compute audio samples we make use of the FBO shader, while the render shader is needed to display the result on screen. This means that we have to constantly switch back and forth between the 2 shader programs and their settings [e.g., viewport size]. In this example, we decide to render only at the end of each full buffer; going faster would be useless, since the frame rate is physically bounded to the refresh rate of the monitor we use [VSync is disabled, wait for Episode 3 to see how].

The 2 shaders have slightly different viewport settings. For the FBO shader we pass to glViewport(…) the whole texture, to grant full access to it; indeed the simulation step updates one quad and reads from the other one, while the audio step has to fill the audio texture—so we need it all. When rendering to screen we don’t need the full texture, since what we need to show is the domain only [unless we are debugging this weird structure]; however, to make the window match either quad0 or quad1 [the domain], we have to go for the full viewport again, also including the magnifier factor we applied to the window when created. Since we put no offset in the viewport, the quad that will overlap with the window will be quad0…so there’s no point in drawing quad1. Since the buffer size is generally a power of 2, at the end of the inner for loop the most updated is quad1…so why we pass quad0’s vertices? There is a trick that explains this…the most attentive of you might have guessed it already, but I will disclose it at the end of this post.
Last comment, at the end of the simulation we have all of our freshly made samples stored in totalAudioBuffer, ready to be used as we please. This marks the end of our big, inelegant main function.

Here we’re using standard OpenGL functions only (:


This concludes the main file, as promised here you can find the full C++ source file:

It is in an a slightly rearranged form [better structured hopefully] and it also includes some EXTRA features, useful for debugging (;
More files to come!




Alright, time to see how things are dealt with on the GPU side.
We have 2 shader programs, each composed of a vertex and a fragment shader. As you probably noticed, we work with VERY FEW vertices and a quite complicated texture layout, so it’s easy to guess that the vertex shaders will be quite simple, while most of the magic will happen in the fragment ones.
Let’s start from the FBO shader program and in particular by it’s vertex component:

This is quite self-explanatory, I just want to remark how we place all the vertices on the same plane and how the in and out variables are conceived to conveniently packed the data [vertex attributes] coming from the CPU and pass it along to the fragment shader.
Yes, we DON’T DO ANYTHING here!


Moving on to the fragment shader of the FBO program, here is the list of variables [in, out, uniforms, const and parameters] that we use:

The texture is a sampler2d variable, named inOutTexture to underline that we both read its pixels and update a portion of them.
The only out variable is frag_color, cos this shader simply updates the RGBA channels [a color…] of the fragments belonging to the passed quad.
The uniform deltaCoordX  contains the width of each fragment and it will come super handy to sample points starting from the coordinated of the current one.
Finally, have a close look at the 3 float simulation parameters mu, rho and gamma cos they will be part of the FDTD implementation that I will show shortly!


The main function of this guy:

Nice! —> at this point you should immediately get why we have this structure, right?!


Let’s focus on computeFDTD()  first, a.k.a. the simulation step, a.k.a the FDTD solver.
Here is the mathematical implementation, which shows how we actually use all the data we made available between CPU and GPU:


Equation number 2 describes the boundary conditions.
As mentioned before, both the equations contain the 3 simulation parameters that are defined as user variables in the shader!

And here is the code implementation (:

Nothing to highlight here, except for the parallel computation of the neighbors, which makes the code slightly more complicated.
Let’s conclude with  saveAudio(), which implements the audio step, i.e., lines up the audio samples into the audio texture [quad2] sampling from the listener fragment:

Again, nothing to declare here, other than the fact that we have to retrieve the index of the quad we read from [readState = 0 or 1] from the state uniform [whose values here are 1 or 3].





It’s time to conclude this long post with the render shader program!
The render shader program does nothing but sticking the main texture where we do all of our calculations on the passed quad and display it on the window…sort of…since each fragment will be characterized by a color that depends on the RGBA channels of the texture, according to the type of grid point it is related to, but it is not directly copied from the texture.
Here its the super simple vertex shader:

Here we use only a subset of the attributes we pass from the main file, namely the position of the vertex and the coordinates of the current fragment. The attributes are shared between the 2 shader programs [as well as between the 2 shaders in each program], so we can only choose from the list of attributes we prepared for the FBO shaders.



Finally, here we have the fragment shader.
This guy’s main tasks consist of detecting what type of grid point the current texture coordinate is associated to and of coloring it accordingly, putting particular care in a proper representation of the pressure values contained in each regular point.
First are the used variables [in and out, uniforms, const and parameters] and a function that indeed turns the a pressure into a colored field:

Again, the only out var is frag_color, the new color of the current fragment. We computed it from the data carried in inputTexture. We also declare a bunch of useful colors and some settings  that are used in get_color_from_pressure() to turn the passed pressure value into a color field that fluctuates between dark blue [smallest pressure value] to bright white [maximum pressure value], passing through pitch black.


Awesome, now let’s see how the main uses the RGBA channels of the texture to color the fragments:

Everything’s straightforward, the type of the texture fragment is decoded from the BA channels and we assign either a fixed color [to boundaries and excitations], or a dynamic color obtained by passing to get_color_from_pressure() the current pressure saved in the R channel.

There is only one important comment left!
If you paid a lot of attention to these first 2 episodes, you’ll probably remember that tex_c does NOT exactly contain the texture coordinates of the current fragment, rather it contains the coordinates fragment AT THE PREVIOUS TIME STEP [i.e., on the opposite side of the texture, in the “dual” domain]. This is done to optimize the number of vertex attributes passed through the VBO, as explained in Episode 1. Sooo, whenever, we check the type of the local grid point [to compute the fragment’s color] we actually access the corresponding fragment on the other side of the domain. Luckily, the type do not change over time, they are always the same at every time step, so checking either side of the texture is fine.
We also access the previous listener fragment with listenerFragCoord; this is a uniform, which means it may change dynamically in a real-time application, making concrete the risk of displaying a non-updated position for a frame…
beside the fact that this would not be that big of a deal [one frame delay, come on!], the listener fragments for quad0 and quad1 are crossed [see the initialization code in Episode 1], to make sure that every sample placed in the audio texture has been correctly updated before sampling. So, in the end we’re so lucky to get to display the latest listener position without bothering at all!




Holy cow, we’re done with the second episode too!
Before jumping into the third and last episode, here are the 4 GLSL files for the 2 shader programs.
FBO shader program:

Render shader program:

In the next episode you’re gonna get the source code for the full project.


c u later_

Victor Zappi

Leave a reply