Logo TooMuchIdle
OpenGL FDTD Tutorial: Episode 1 – Simulation Initialization

Welcome to the first episode of my tutorial to implement real-time physical modelling synthesis on the GPU, using C++ and OpenGL shaders. More context can be found in this introduction.

We’re gonna focus on how to set up OpenGL to run the simulation, giving some details about the physical model we want to implement.
In the context of the full [but small] project, this means looking at two files only:

  • main.cpp [first part]
  • openGL_functions.h

The main file contains the C++ initialization code, while the header file has all the prototypes of the functions that wrap the standard OpenGL calls; the trick lies in what we tell the standard functions to do. The actual implementations of these wrappers will be discussed in Episode 3



This tutorial is not meant as an introduction to OpenGL C++ programming,
nor the code snippets are supposed to be tested one by one in an incremental way!
Here I am just dissecting a whole [but small] project, that shows you what I think is the simplest way possible to implement a synthesis algorithm based on numerical analysis physical modelling using OpenGL shaders.



First of all, let’s introduce the PHYSICAL MODEL

The physical model we need to implement has to be capable of simulating the propagation of vibrations on a flat surface [the head of a virtual drum in our specific case]. To do so, we can adapt an FDTD solver for the generic simulation of pressure propagation in an empty 2D domain. This domain is a matrix of grid points, each characterized by a value of pressure that changes over time, according to external excitation and to the physical characteristics of the domain itself [i.e., the simulated material]. We can then include some boundaries in the domain to simulate the rim of the drum head and interpret the pressure changes as the variation in time of the head’s displacement with respect to the rest position.  When excited by an impulse [like a drumstick hit], the variations in displacement describe vibrations travelling on the surface the drum that we can pick up and turn into sound by sampling any of the grid points, much like using a contact mic on a percussive instrument. For simplicity, throughout the whole tutorial I will keep addressing the head’s displacement in a specific point as local pressure .
I am not a physicist nor a mathematician, so I am not gonna go into details here; however, it’s good to know that our solver is based on an explicit method, meaning that, to compute the next state of a specific point, we only need to know the system’s state at the current and previous time steps [i.e., we don’t need to know additional info about the next state of other points].
In English this means that per each new pressure point we need to know:

  • the current pressure value of the local point
  • the previous pressure value of the local point
  • the current pressure value of the top neighbor point
  • the current pressure value of the bottom neighbor point
  • the current pressure value of the left neighbor point
  • the current pressure value of the right neighbor point

Finally, the simulation is affected by the type of the grid points too. These can be:

  • regular points, through which the excitation waves feely travel
  • excitation points, that we use to inject an excitation. They also behave as regular points when not externally excited
  • boundary points, which reflect any wave they are hit by

Other physical characteristics of the simulation are shared among all the points and define:

  • the speed at which the sound wave travel through in the simulated material
  • the absorption of the simulated material

This is the most compressed list of things you have to know about the physical model to understand the structure we are about to code!


How are we gonna slam this directly inside the graphics pipeline?
Well, if we map every grid point to a fragment of a texture, we could theoretically calculate all the next pressure values in parallel using a shader program! An FBO would allow us to update the texture in place at a fast rate, even audio rate. Doing so, the texture would become the domain, that carries all the needed info from above within its RGBA channels [pressure values and point types at time N and N-1] and gets updated step by step. We could use uniforms to inject sound at run time into the texture through excitation points [or better fragments] and, vice versa, we could sample a specific fragment of the texture to get an output audio stream! All this stuff sounds awesome, we could even render the whole texture to screen time to time, to have a useful visual feedback of the system’s status. That would be overall simple and definitely pretty sick.


There’s only one little problem to complicate things and some of you might have guessed already:
how do we READ from and WRITE to the SAME TEXTURE in parallel on ALL THE FRAGMENTS without incurring RACE CONDITIONS?


The problem specifically arises when dealing with neighbor points, which have to be read while possibly updated by concurrent threads. Luckily we have a solution.
We can store two copies of the domain in the same texture, two side-by-side portions that contain snapshots of the system at time step N [current] and time step N-1 [previous]. When we want to advance to step N+1, we will:

  • read the 5 current values [local point+neighbors] from portion N
  • read the previous local value from portion N-1 [at this point we have everything we need to compute the next pressure]
  • update the N-1 local point with the newly computed N+1 pressure
  • then swap indices [N becomes N-1 and N-1 becomes N] and restart the loop

Here is a graphical representation of this point [or fragment] access algorithm [figure 0]:


This decouples read and write operations and saves the day, cos everything can safely happen in parallel!
Let’s have a closer look. The first step of the algorithm, the reading of the current values [the yellow squares on the green side of figure 0], can be recklessly parallelized, since portion N of the texture is not changed. The writing process is not a problem either; each parallel thread modifies only the local fragment on portion N-1 [the yellow square in the red side of figure 0], allowing this process to happen concurrently on each point.



Before diving into the code, it’s good to have clear in mind how this can be done in OpenGL.
From OpenGL 4.5, we can “easily” update [render] a fragment of a texture while reading [sampling] another texture portion, as long as the latter is not rendered. We need to set up an FBO to read from and write to the same texture, in a kind of bizarre way.
These are the required steps:

  • define an FBO that renders to our texture
  • define two adjacent quads [sets of 4 co-planar vertices], where to place the two portions of the texture [these vertices are showed as 6 white dots in figure 0 – 2 couples are overlapped]
  • add to each vertex the coordinates of the current texture fragment [pn-1 in the figure 0] and the coordinates to read the 5 fragments on the opposite texture portion [pn and pu,d,l,r in figure 0] —> the figure refers only to the texture coordinates needed by the left texture portion; the right case is analogous, in the opposite direction
  • make successive FBO draw calls alternatively passing the first quad [left texture portion] and the second quad [right texture portion]



Finally, how about the output audio stream?
We could hypothetically sample the texture from the CPU at every simulation step, but this would not work in real-time, cos reading from the video memory takes a lot of time. The solution is to save the audio sample at each step into a buffer in video memory and then read it from the CPU only once every audio callback. It sounds easy, but this approach requires the definition of an extra texture portion, a single pixel line on top of the whole texture where to save the audio output, like an audio buffer. If we couple it to another thin quad, we can update it by calling the same FBO at each simulation step; the update will simply copy the pressure value taken from a chosen domain point/texture fragment [called listener point], a fairly lightweight operation. At this point, we can sample this texture portion once in a while from the CPU and copy its content in the audio buffer directed to the audio card, without excessively affecting the overall performance of the system.
The final texture/quad layout is showed in figure 1:


As you can see, there is a further texture portion [in grey in figure 1]. This is a simple separator, with empty/black fragments, that is needed to prevent the neighbor sampling of the topmost fragments of the domain from sampling as top neighbors the audio fragment values. This portion is not coupled with any quad/vertices and is never updated with draw calls. The combination of this extra texture portion plus the audio texture [2 lines of pixels on top of each other] we’ll be referred to as “ceiling”.

Fantastic, we have defined a super cool structure to access and modify all the data needed to our FDTD solver, but I spent no words on its mathematical implementation [i.e., how to update a pressure point]! For sake of simplicity and mental order, this stuff will be described in the next episode (:




Now let’s delve into the CODE
The whole project makes use of the glfw3 library, which is one of the most convenient OpenGL libs you can find out there. Hence, the top of the main file looks like this:

If you have troubles installing the libs on your machine, you might find a fix in Episode 3, at least for Linux. Another good source is Dr. Anton Gerdelan’s C-OpenGL tutorials.
We understood the physical model and we have a plan on how to run it in OpenGL, so let’s start the implementation. We have to initialize in C++ the OpenGL structures we’re gonna use.

Trivially, the first thing we want to create is a window tied to an OpenGL context and then load the two shader programs we are gonna need, one to render using the FBO, the other to render to screen.
[The following code can be carelessly put inside a big, inelegant main function]

The used wrapper functions are declared in openGL_functions.h as follows:

The window should be as big as the domain, but what’s magnifier? As explained before, in our implementation each grid point basically is a fragment on the texture. By default, when we render the texture on its quad, each fragment becomes a pixel. Given the typical resolution of modern GPUs, this translates into a TINY window! So we use a magnifier as a multiplier factor to stretch the window, so that each fragment can be represented by more pixels. Same domain resolution, nothing magic, just a bigger visualization and slightly more intense rendering process.
In the code I interchangeably use the terms “fragment” and “pixel”, despite this small eventual difference.

The function initOpenGL(…) is quite self explanatory, it creates the window and the context. The only important thing to remark is the fact that it disables V-Sync, so that the screen render does not have to wait for the refresh rate of the display, which would cause several buffer underruns.

With loadShaderProgram(…) we compile the 2 couples of vertex and fragment shaders we want to use for our FBO and screen rendering. The shaders are simple textual files [more info in Episode 3].



And here comes the tough part, the implementation of the 3 quads represented in figure 1.
This means defining an array of vertices [their positions] in that specific layout
which also contains the texture coordinates showed in figure 0!

avoid to include among the texture coordinates the one related to the local previous pressure value pn-1 [more detailed explanation in the section about the texture initialization].

We also separatetly save the indices of the two main quads:

We will use vertices[][] to alternatively draw the two main quads within the simulation cycle, covered in the next episode.



Then, we pass all the attributes to the VBO and properly set up the VAO, so that the shaders can distinguish vertices’ positions and texture coordinates:

The used wrapper functions from openGL_functions.h:



We move on to the initialization texture. The texture is supposed to contain two copies of the domain and two empty rows of fragments on top.
We first allocate and initialize in CPU memory a flattened multidimensional float array that can contain all the fragments [RGBA channels] we’ll use to fill the texture:

What’s with that previous pressure? We need two channels only to define the type of the local point [B and A], while the R channel is used to store the current pressure. This allows us to reserve the G channel to the storage of the previous local pressure value! Basically, instead of retrieving the previous value from the texture, at every cycle we save a copy in the fragment’s channel, which is always readily accessible. This is quite cool, it can be implemented directly in the shader [next episode] and makes possible to reduce the number of texture coordinates passed along with every vertex position, as mentioned in the section about the used attributes, saving up some memory.
We can do this only because we have a spare channel though! When implementing other FDTD solvers [e.g., implicit] or other physical models we might not have this luxury…indeed we often run short of channels. In those cases, the fallback plan is to simply line up the texture coordinate with the other attributes, as originally conceived.

Once texturePixels[] is allocated, we can focus on the actual domain. The domain is just a portion of the whole texture with pixels carrying particular B and A channel values [beta and excitation terms] that define regular points, boundaries and excitation.  We can build it from the CPU side, by simply using a two-dimensional array each containing 2 values for the B and A channel values [so 3 dimensions in total]. The domain we are gonna use is totally empty [regular points], except for a rim of boundary points that surrounds it and an arbitrary excitation point:

We finish with the texture by copying pointType[][][] [the domain] in the first portion of the array of fragments texturePixels[] [the red pixels in figure 1] and then passing everything to the wrapper function that copies these bits in GPU memory, where the actual texture is allocated:

As stressed out in the comment, in the FBO shader program the texture will be both read and written, in line with the targeted algorithm, while in the render shader we will simply read its content and accordingly draw pixels on screen. The variable texture [not an original name] contains the handler to the actual texture in GPU memory.
This is the prototype of th used wrapper functions from openGL_functions.h:



Next we initialize the FBO and the PBO.

The variable buffer_size is the size in samples of the audio buffer.
Here we use these OpenGL wrapper functions:

The function setUpFbo(…)  initializes an FBO that writes to the passed texture [to actually update the new pressure values in the texture].
The function setUpPbo(…) initializes a PBO that we will use to sample the texture from the CPU [to read and pass the samples stored in the upper part of the texture to the audio buffer].
More details about the actual code in the last episode.



We quickly compute some useful values that will be needed by the shaders:

The variable listenerPos[2] contains the integer coordinates [row and column of the pixel] of the point of the domain from where we want to sample the audio. It can be freely chosen. From this position we extract the coordinates of the corresponding fragment on the texture. It is important to notice that, since there are 2 copies of the domain, listenerFragCoord[2][2] contains 2 different sets of coordinates linked to the same listener position, one on quad0, the other on quad1. Strangely enough, the fragment for quad0 [listenerFragCoord[0]] points to the listener’s location on quad1, while the fragment for quad1 [listenerFragCoord[1]] points to the location on quad0…they are crossed! I will explain the reason for this crossing in the next episode. Sorry for keeping you hanging /:



Then we conclude the initialization with the uniforms, i.e., the variables the CPU can use to send data to the shaders.
We divide unifroms in 2 groups: static and dynamic.
Static uniforms are set only once, at the beginning of the simulation, hence we immediately init them without saving their location; these are typically related to fixed characteristics of the simulation, like the domain size. The dynamic ones are for values that need to be modified within the simulation cycle, like for example the current excitation value; we save location and update them in the loop.
We start with those for the FBO shader program [shader program = combination of passed vertex and fragment shaders]:

In the next episode I’ll show how exactly we use these uniforms and why we init the static ones with these values.
The texture uniform (sample2d type) is a case on its own. It is needed to give the FBO shader the capability to read from it [so far it could only write to it]; it is in a sense dynamic, in the sense that it will be constantly changed, but at the same static, since we are not directly updating its values from the CPU [the shader will do that]. Here we are simply initializing it by mapping it to the first texture we created [index 0].

Then same thing for render shader program uniforms:

In this case we only have static uniforms, cos the render pass depends exclusively on the pressure values read from the texture and is not influenced by any direct external parameter [see next episode].
The texture itself in this case is accessed only [there is no FBO], as suggested by the name inputTexture.  But it is actually the same texture we use with the FBO! It is still indexed as 0 indeed.







That’s all for now, stay tuned for the next post:
Episode 2 – Simulation Cycle and Shader Programs [in progress]

Hopefully it will add an important piece to the puzzle and clarify some doubts. It will also contain the full downloadable version of the main.cpp file.
In the meanwhile, always happy to answer questions and correct possible mistakes/imprecision.






Victor Zappi

Comments are closed.