Multi-threading often causes problems for several reasons: synchronization, deadlocks, concurrent access to memory… But in my recent experience with multi-threading in XNA/C#, some optimizations turn out to be both simple and efficient.
While looking for possible performance improvements in Spring Up Harmony for Xbox 360 (PC), I spotted a simple situation well suited to multi-threading. You can see it in the following screenshot of my in-game profiler:
I have had two tasks with performance issues since I started running the game on Xbox 360: the dynamics of the background effect, and the estimation of the ball's launch (which runs a simplified physics simulation multiple times, through Box2D.XNA). In this screenshot, these tasks are called VelGrid.U(9) and Launchers.U(14). The first part of VelGrid.U (Comp.Balls,10) uses the data of the physics simulation, but DynaColLines(11) doesn’t. Therefore, VelGrid.DynaColLines and Launchers.U are totally independent of each other. No data is shared between these code sections. Perfect candidates for my initiation to multi-threading in XNA! 🙂
The e-book found here (recommended in multiple threading articles) is really awesome and helped me find the approach I needed. I want to start Launchers.U at the same time as VelGrid.DynaColLines and wait until both are finished. I know this is not optimal, but it simplifies the problem and avoids many threading pitfalls.
I just have to use two AutoResetEvents. The Launchers.U section runs in its own thread and waits for a “Go” signal from VelGrid. After doing its job, it sets a “Ready” signal so that VelGrid, once it has finished its own work, knows when to carry on. Two different cases can happen, depending on which section finishes first.
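Here is a minimal sketch of that two-event handshake. It is not the actual game code: the class name `LauncherWorker`, its methods, and the `work` delegate are hypothetical stand-ins for the Launchers.U section.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: runs one piece of work on a worker thread each
// time the main thread signals "go", then signals "ready" when done.
public sealed class LauncherWorker
{
    private readonly AutoResetEvent go = new AutoResetEvent(false);
    private readonly AutoResetEvent ready = new AutoResetEvent(false);
    private readonly Action work;
    private readonly Thread thread;
    private volatile bool running = true;

    public LauncherWorker(Action work)
    {
        this.work = work;
        thread = new Thread(Loop);
        thread.IsBackground = true;
        thread.Start();
    }

    private void Loop()
    {
        while (true)
        {
            go.WaitOne();           // sleep until the main thread says "go"
            if (!running) return;   // woken up only to shut down
            work();                 // e.g. the Launchers.U section
            ready.Set();            // tell the main thread we are done
        }
    }

    // Main thread: start the worker's job for this frame.
    public void Begin() { go.Set(); }

    // Main thread: block until the worker's job is finished.
    public void WaitUntilDone() { ready.WaitOne(); }

    // Main thread: stop the worker thread and release the events.
    public void Shutdown()
    {
        running = false;
        go.Set();
        thread.Join();
        go.Close();
        ready.Close();
    }
}
```

In the frame update, the main thread would call `Begin()`, run its own independent section (VelGrid.DynaColLines here), then call `WaitUntilDone()`. Note that in this sketch an exception thrown by `work` would leave the main thread waiting forever, so real code would need to handle that.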
Here are screenshots of the in-game profiler showing the differences (taken on X360). Every code section prefixed by a * is executed on the worker thread.
Case 1: the code on the main thread completed before the code on the worker thread:
The VelGrid section has to wait for Launchers to complete (red arrow).
Case 2: the code on the main thread completed after the code on the worker thread:
The worker thread is idle while the main thread is still working (red arrow).
With this simple threading mechanism (coded and tested in only a few hours), I have been able to optimize the game. In the basic test level shown here, instead of 2.3 + 3.1 = 5.4 milliseconds spent on the main thread for both features, the game now spends 3.6 milliseconds, with two threads running at the same time. That saves 1.8 of the 5.4 milliseconds, a 33% gain. In this test scene, VelGrid.DynaColLines is “free”.
Since I took these screenshots, I have made a few more changes: the whole Box2D update used while the game is in motion is now threaded too.
One last important thing to know: on Xbox 360, you must choose which core your thread runs on. Just read the MSDN page about SetProcessorAffinity; everything you need to know is there. You can see in this article that I still have a few cores available 🙂
In conclusion, if you have independent sections of code that take some time to run, you can speed them up very quickly using this method.
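As a rough sketch of what that looks like (the helper `WorkerThreads.Start` is a hypothetical name, not from the game): `Thread.SetProcessorAffinity` only exists in the Xbox 360 version of the framework, so the call is wrapped in `#if XBOX`. As far as I understand the documentation, it has to be called from inside the thread being pinned, and hardware threads 0 and 2 are reserved, which leaves 1, 3, 4 and 5 for the game. Check the MSDN page before relying on this.

```csharp
using System.Threading;

public static class WorkerThreads
{
    // Hypothetical helper: starts `body` on a new background thread.
    // On Xbox 360 the thread first pins itself to `hardwareThread`;
    // on other platforms the argument is ignored.
    public static Thread Start(ThreadStart body, int hardwareThread)
    {
        Thread t = new Thread(delegate()
        {
#if XBOX
            // Assumption: called from within the thread being pinned.
            Thread.CurrentThread.SetProcessorAffinity(new int[] { hardwareThread });
#endif
            body();
        });
        t.IsBackground = true;
        t.Start();
        return t;
    }
}
```

For example, `WorkerThreads.Start(MyWorkerLoop, 4)` would run a hypothetical `MyWorkerLoop` method on hardware thread 4 on the console, and on whatever core the OS picks on PC.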
Nice! Multithreading really is quite easy if you think a bit beforehand. My rule for easy multithreading is ‘never cross the streams’.
Even if things seem completely separate, they may still read and write some shared state. I just make them one frame out of sync, and any writes to them happen later on in the main thread (reads happen as normal), which keeps data issues to a minimum 🙂
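One way to read that idea, as a sketch only (the `DeferredWrites` class is a hypothetical illustration, not the commenter's code): worker threads queue their writes instead of applying them, and the main thread applies the queue once per frame.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects writes from any thread and applies
// them later, all at once, on the main thread.
public sealed class DeferredWrites
{
    private readonly object gate = new object();
    private List<Action> pending = new List<Action>();
    private List<Action> applying = new List<Action>();

    // Called from any thread: record a write instead of doing it now.
    public void Enqueue(Action write)
    {
        lock (gate) { pending.Add(write); }
    }

    // Called on the main thread once per frame: swap the two lists
    // under the lock, then run the queued writes outside it.
    public int Apply()
    {
        lock (gate)
        {
            List<Action> tmp = pending;
            pending = applying;
            applying = tmp;
        }
        int count = applying.Count;
        for (int i = 0; i < count; i++) applying[i]();
        applying.Clear();
        return count;
    }
}
```

Shared state then only changes at one known point in the frame, at the cost of the worker's results showing up one frame late.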
I don’t think I will try to multithread Update and Draw because of all the shared data, but I might try something more complex if I still need to optimize the next time 😉
That profiler looks amazing, any chance of sharing it?
Thanks Xerios. TBH, I never thought of releasing it because of the way the code is integrated into my other libs and game. But it was quite simple to do: it basically uses the Stopwatch class to save the timings in a data structure, which I can then parse and draw using rectangles.
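To give an idea, here is a minimal sketch of that approach. It is not the real profiler: `FrameProfiler`, `ProfileSample`, and the method names are made up for illustration, the drawing is left out, and it is not thread-safe.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// One timed code section within a frame (hypothetical structure).
public struct ProfileSample
{
    public string Name;
    public double StartMs;
    public double DurationMs;
}

public sealed class FrameProfiler
{
    private readonly Stopwatch clock = new Stopwatch();
    private readonly List<ProfileSample> samples = new List<ProfileSample>();

    public List<ProfileSample> Samples { get { return samples; } }

    // Call at the start of each frame: drops old samples, restarts the clock.
    public void BeginFrame()
    {
        samples.Clear();
        clock.Reset();
        clock.Start();
    }

    // Marks the start of a section; returns a handle to pass to End().
    public int Begin(string name)
    {
        ProfileSample s = new ProfileSample();
        s.Name = name;
        s.StartMs = clock.Elapsed.TotalMilliseconds;
        samples.Add(s);
        return samples.Count - 1;
    }

    // Marks the end of a section and stores how long it took.
    public void End(int handle)
    {
        ProfileSample s = samples[handle];
        s.DurationMs = clock.Elapsed.TotalMilliseconds - s.StartMs;
        samples[handle] = s;
    }
}
```

Each sample then maps to one rectangle on screen: its left edge is `StartMs` times some pixels-per-millisecond scale, and its width is `DurationMs` times the same scale.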
Wow, thanks a lot! I plan to implement this to separate my AI from my main thread so it doesn’t lock up when I am doing my large calculations. If I can ever get the AI to work that is lol.
I was wondering if this would be an okay strategy. Let’s say you can have 4 threads running simultaneously. Then wouldn’t it be ideal to slice the workload into 4?
So let’s say each thread can now spend 4x the amount of time on update/draw and the game will still run at 60fps. E.g.
Thread A – Spends 40 ms performing the updates.
Thread B – Spends 30 ms performing the updates.
Thread C – Spends 25 ms performing the updates.
Thread D – Spends 45 ms performing the updates.
Thread A completes in 40 ms, then draws its output. Thread B is already done, so it draws its output; thread C is already done, so it draws its output; thread D is already done and draws its output. Staggering the threads gives 16.67 x 3 ms, which is about 50 ms, for thread A to complete, so by the time each one has to draw, it has already completed its updates and can draw its output.