Tips to Boost Performance of OpenGL-Based Applications
OpenGL is a software interface to graphics hardware, used for rendering 2D and 3D vector graphics. It is only a specification provided to graphics hardware vendors: it describes the expected output, and each vendor chooses how to implement it. We programmers use this interface (i.e. the set of OpenGL commands) directly in our applications to render 2D and 3D graphics.
Let’s Make It Simple:
To give you an analogy, consider a car manufacturing company that doesn’t manufacture tyres itself and instead consults a “decision-making authority”. This authority draws up the tyre specifications (say radius, strength, etc.) and publishes them to all tyre manufacturing companies. Any tyre manufacturing company that satisfies the specifications can then supply tyres to the car company. So, as a car manufacturing company, you don’t care which tyre company you use as long as it satisfies the specifications.
- “car manufacturing company” is analogous to your “application” [like any CAD/CAM/CAE based application or video game].
- “tyre specification” is analogous to “OpenGL” [which includes the OpenGL commands].
- “decision-making authority” is analogous to the “Architecture Review Board”.
- “tyre manufacturing company” is analogous to “graphics hardware vendors” [like NVIDIA, ARM, Intel, etc.].
The Only Thing That Is Constant Is Change:
“Graphics hardware vendors” work continuously to improve performance and introduce new features. These vendors approach the “Architecture Review Board” (now the “Khronos Group”) to have the new features they have come up with added to the OpenGL specification. If a new feature is specific to one vendor only, it can be added as an OpenGL extension. However, if multiple vendors agree to provide the feature, it can be added to the OpenGL specification itself in a new version of OpenGL. New versions are decided by collective opinion and agreement among the Group’s members, including graphics vendors, operating system designers, etc.
This whole scenario is analogous to one “tyre manufacturing company” coming up with the new feature of tubeless tyres and asking the “decision-making authority” to add this feature to the “tyre specification”. If multiple tyre manufacturing companies agree to provide tubeless tyres, the feature can be added to the updated version of the “tyre specification”.
To get the best performance, it is then the responsibility of the “car manufacturing company” to use the latest “tyre specification”, given that the “tyre manufacturing company” is building tyres to that latest specification.
So, in a continuously changing world with increasing competition, applications should use the features provided in the latest OpenGL versions to keep their performance and speed on par with what current hardware can deliver.
To test the effect on performance of rendering the same model with old (fixed-function pipeline) and modern (programmable pipeline) OpenGL features, we created a prototype application.
For the same model, the fixed-function pipeline gives 4.84 frames per second, whereas the programmable pipeline gives 14.97 frames per second, roughly three times as many.
Here, on the same graphics hardware, we get inferior performance when we use deprecated OpenGL features. So the performance of an application depends not only on the graphics hardware you are using, but also on many other factors.
Let’s go through some of these factors and some tips and tricks to improve the performance of the display engine of OpenGL based applications.
Profile your application to find out whether performance is limited by CPU operations, such as computations, or by rendering operations on the GPU.
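As a starting point for the CPU side of that profiling, the following is a minimal sketch of a section timer built on `std::chrono`. The names `SectionTimer` and `frames_per_second` are hypothetical helpers introduced here for illustration; they are not part of OpenGL or of the prototype mentioned above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: accumulates wall-clock time spent in one section of a frame.
struct SectionTimer {
    using Clock = std::chrono::steady_clock;
    double total_ms = 0.0;  // accumulated time in milliseconds
    int samples = 0;        // number of measured sections

    // Runs `work` once and adds its duration to the running total.
    template <typename F>
    void measure(F&& work) {
        const auto start = Clock::now();
        work();
        const auto end = Clock::now();
        total_ms += std::chrono::duration<double, std::milli>(end - start).count();
        ++samples;
    }

    // Average duration of one measured section, in milliseconds.
    double average_ms() const {
        assert(samples > 0);
        return total_ms / samples;
    }
};

// Converts an average frame time in milliseconds to frames per second.
inline double frames_per_second(double frame_ms) {
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}
```

Note that timing draw calls this way only measures how long the CPU takes to submit them; the GPU executes the commands asynchronously, so GPU time has to be measured separately (for example with a vendor profiler).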
Reduce the number of entities submitted to the rendering pipeline:
- Frustum culling: no need to render entities that lie outside the view frustum
- Back-face culling: no need to render faces that point away from the camera and won’t be visible anyway
- Occlusion culling: no need to render entities that are hidden behind other entities and so cannot currently be seen by the camera
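Frustum culling can be sketched on the CPU as a bounding-sphere test against the six frustum planes. This is one common approach, not necessarily what the prototype used; the types `Vec3`, `Plane` and `Sphere` are hypothetical, and the planes are assumed to have unit-length normals that point into the frustum.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0.
// Assumption: n is unit length and points towards the inside of the frustum.
struct Plane { Vec3 n; float d; };

// Bounding sphere of an entity, in the same space as the planes.
struct Sphere { Vec3 center; float radius; };

// Signed distance from point p to the plane (positive means inside).
inline float signed_distance(const Plane& pl, const Vec3& p) {
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// Returns false only if the sphere lies completely behind at least one plane.
// The test is conservative: a sphere near a frustum corner may be kept
// even though it is slightly outside.
inline bool sphere_in_frustum(const std::array<Plane, 6>& frustum, const Sphere& s) {
    assert(s.radius >= 0.0f);
    for (const Plane& pl : frustum) {
        if (signed_distance(pl, s.center) < -s.radius) {
            return false;  // entirely outside this plane: skip the draw
        }
    }
    return true;
}
```

An entity whose sphere fails the test can be skipped before any draw call is issued for it.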
Level of Detail:
Use a simpler version of the mesh based on factors like its distance from the camera and the number of pixels (the area on the screen) it occupies.
- Switch the mesh for a simpler one (i.e. one with reduced complexity)
- Dynamically tessellate the mesh
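The mesh-switching variant can be sketched as a lookup against distance thresholds, with an optional estimate of on-screen size. Both functions and the threshold values used with them are hypothetical and would need tuning per model; the size estimate assumes a perspective projection and a bounding sphere around the mesh.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Picks a level-of-detail index: 0 is the full mesh, higher values are coarser.
// `thresholds` holds ascending switch distances (hypothetical, tuned per model).
inline std::size_t select_lod(float distance, const std::vector<float>& thresholds) {
    assert(distance >= 0.0f);
    std::size_t lod = 0;
    for (float t : thresholds) {
        if (distance < t) break;  // still closer than the next switch distance
        ++lod;
    }
    return lod;
}

// Approximate on-screen height, in pixels, of a bounding sphere seen through
// a perspective camera with vertical field of view `fov_y_rad`.
inline float projected_height_px(float radius, float distance,
                                 float fov_y_rad, float viewport_h_px) {
    assert(distance > 0.0f);
    return radius * viewport_h_px / (distance * std::tan(fov_y_rad * 0.5f));
}
```

For example, with thresholds of 10, 50 and 200 units, a mesh 60 units away would use level 2. The pixel estimate can drive the same lookup if the thresholds are expressed in pixels instead of distances.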
Deprecated functionality:
- Deprecated functionality is often slow. This is NOT because it is deprecated; it was deprecated because it was slow. Not all deprecated functionality is slow, however.
- So, deprecated functionality like immediate mode (glBegin() … glEnd(), etc.) should be replaced by modern alternatives [as seen in the programmable pipeline frame rate above].
OpenGL state changes:
Avoid unnecessary state changes, as they may cause re-computation of internal state and introduce delays.
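One way to avoid redundant state changes is to remember the last value bound and skip the call when nothing changes. The sketch below uses a hypothetical `BindingCache` class that wraps any bind-style function; in a real application the callback would be assumed to forward to a call such as `glUseProgram` or a texture bind.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Remembers the last bound id and skips the call if it has not changed.
class BindingCache {
public:
    // `bind` is the function that performs the real state change.
    explicit BindingCache(std::function<void(unsigned)> bind)
        : bind_(std::move(bind)) {
        assert(static_cast<bool>(bind_));
    }

    // Returns true if the underlying bind function was actually called.
    bool bind(unsigned id) {
        if (valid_ && id == current_) {
            return false;  // redundant change: nothing to do
        }
        bind_(id);
        current_ = id;
        valid_ = true;
        return true;
    }

private:
    std::function<void(unsigned)> bind_;
    unsigned current_ = 0;
    bool valid_ = false;  // false until the first bind has happened
};
```

The cache is only correct if every change to that piece of state goes through it; a direct call elsewhere would leave the remembered value stale.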
Heterogeneous Parallel Programming:
- Use parallel programming platforms like CUDA and OpenCL to greatly increase throughput
- The thousands of cores on a GPU, running thousands of threads in parallel, can complete data-parallel computations far faster than a quad-core or octa-core CPU
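A few GPU optimizations:
- Avoid branching in shaders
- Disable the Z-buffer (depth test) if you are not using it
- Draw the scene front to back, so that hidden fragments can be rejected by the depth test early
- Use compressed textures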
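The front-to-back tip can be sketched as a sort of the opaque draw list by view depth before submission. The `DrawItem` struct and its fields are hypothetical; the depth is assumed to be computed per item from the camera transform.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct DrawItem {
    int mesh_id;       // hypothetical handle of the mesh to draw
    float view_depth;  // distance from the camera along the view direction
};

// Orders opaque items nearest first, so that nearer surfaces fill the depth
// buffer before farther ones are drawn.
inline void sort_front_to_back(std::vector<DrawItem>& items) {
    auto nearer = [](const DrawItem& a, const DrawItem& b) {
        return a.view_depth < b.view_depth;
    };
    std::stable_sort(items.begin(), items.end(), nearer);
    assert(std::is_sorted(items.begin(), items.end(), nearer));
}
```

This ordering applies to opaque geometry only; blended transparent geometry is usually drawn back to front instead.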
A few CPU optimizations:
- Choose the number and size of batches wisely
- Use inline functions where possible
- Use SIMD (single instruction, multiple data) where possible
- Use cache-friendly memory access patterns
- Use VBOs with the “right” amount of data
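The batching tip can be sketched as grouping draw requests that share the same render state, so each group can be submitted together. The `Draw` and `Batch` structs and the `material` key are hypothetical stand-ins for whatever identifies shader and texture state in a given application.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Draw {
    unsigned material;  // hypothetical key for shader and texture state
    unsigned count;     // number of vertices in this draw
};

struct Batch {
    unsigned material;      // state shared by all draws in the batch
    std::size_t draws;      // how many draws were grouped together
    unsigned vertex_count;  // total vertices across those draws
};

// Sorts draws by material and merges neighbours with the same key.
// Takes the list by value so the caller's order is left untouched.
inline std::vector<Batch> build_batches(std::vector<Draw> draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const Draw& a, const Draw& b) { return a.material < b.material; });
    std::vector<Batch> batches;
    for (const Draw& d : draws) {
        assert(d.count > 0);
        if (batches.empty() || batches.back().material != d.material) {
            batches.push_back(Batch{d.material, 0, 0});  // start a new batch
        }
        ++batches.back().draws;
        batches.back().vertex_count += d.count;
    }
    return batches;
}
```

Fewer, larger batches reduce per-call CPU overhead, but very large batches can work against culling, so the right size remains a tuning decision.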
To keep the performance of any OpenGL-based application optimized and up to date, try to use the latest functionality (shaders, etc.) provided by OpenGL and the graphics hardware, and try to reduce CPU-based computations or shift some of them to the GPU.