Archive for the ‘profiling’ Category

3 Simple Tips to Avoid Memory Allocations with XNA/C#

Friday, May 21st, 2010

Long time without posting. I’ve spent a lot of time working on the game, contracting art, buying sound effects and music assets, designing the levels… New screenshots/video should come very soon.

Anyway, back on topic: when moving from C++ to C#, I found it very strange that you can allocate memory but cannot free it, and have to let the garbage collector do it. I quickly got used to it and found it quite handy… until I found out that garbage collection is not free and usually causes framerate drops, especially on X360.

[Screenshot: CLR Profiler]

Using the CLR Profiler (screenshot above), you can easily see where you are wasting memory (it can be scary the first time you see it, but in fact it’s easy). I removed nearly all my runtime allocations with this tool. Here are a few tips that can be useful:

1. Be careful with the class string

Strings are immutable, so every operation that returns a new string (concatenation, formatting, ToString()…) allocates memory. StringBuilder is the key to the problem. I didn’t even know what StringBuilder was before working on memory, and now I use it everywhere I need to be careful about memory. There are some awesome tips (and source code) about StringBuilder on Gavin Pugh’s blog. Especially this and this. You have everything you need now :)
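To make the idea concrete, here is a minimal sketch of a reused StringBuilder. The class name ScoreText, the helper AppendInt and the capacity of 32 are made up for this example. It assumes, as was the case on the .NET versions XNA targets as far as I know, that StringBuilder.Append(int) creates a temporary string internally, which is why the digits are appended by hand.

```csharp
using System.Text;

// Hypothetical HUD helper: builds a score text without creating a new string every frame.
public class ScoreText
{
    // Allocated once; the capacity (32) is an arbitrary value for this example.
    private readonly StringBuilder _builder = new StringBuilder(32);

    public StringBuilder Build(int score)
    {
        _builder.Length = 0;            // empties the builder but keeps its internal buffer
        _builder.Append("Score: ");     // appending a string literal creates no new string
        AppendInt(_builder, score);
        return _builder;
    }

    // Appends the decimal digits of value one char at a time (hypothetical helper).
    private static void AppendInt(StringBuilder sb, int value)
    {
        long v = value;                 // long, so that negating int.MinValue is safe
        if (v < 0)
        {
            sb.Append('-');
            v = -v;
        }

        int start = sb.Length;
        do
        {
            // Digits come out least significant first, so each one is inserted at 'start'.
            sb.Insert(start, (char)('0' + (int)(v % 10)));
            v /= 10;
        } while (v > 0);
    }
}
```

The returned StringBuilder can be passed directly to SpriteBatch.DrawString, which has an overload taking a StringBuilder, so no string is needed at all on the drawing side.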

2. Reuse your lists

I had multiple sections of code creating a temporary List<> in order to collect elements that need special treatment. This was in an update function in my case. It is handy and fast to write code like that, but it allocates memory every time it runs. To fix this, I now have the List<> as a member of the class. I clear it and reuse it when needed. When creating a list, you should use the constructor that takes an int defining the initial capacity of the list (List<Vector2> l = new List<Vector2>(64) for instance), so that it does not have to grow at runtime.
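Here is a small sketch of that pattern. The class SpeedChecker and its members are invented for the illustration, and plain floats stand in for game objects so that the snippet compiles without XNA.

```csharp
using System.Collections.Generic;

// Hypothetical example: collects the values that need special treatment each update.
public class SpeedChecker
{
    // Scratch list created once, with an initial capacity chosen for this example (64).
    private readonly List<float> _tooFast = new List<float>(64);

    // Exposed only so the example can show that the capacity survives Clear().
    public int ScratchCapacity
    {
        get { return _tooFast.Capacity; }
    }

    public int Update(List<float> speeds, float limit)
    {
        _tooFast.Clear();   // removes the elements but keeps the allocated storage

        for (int i = 0; i < speeds.Count; i++)
        {
            if (speeds[i] > limit)
            {
                _tooFast.Add(speeds[i]);
            }
        }

        // ... the special treatment of _tooFast would go here ...
        return _tooFast.Count;
    }
}
```

As long as the number of collected elements stays below the initial capacity, Update does not allocate. If the list ever grows beyond it, that costs one reallocation and the larger storage is then kept for later frames.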

3. foreach loops

I converted my foreach loops into C++-style for loops and it saved me some allocations. Memory allocated by foreach loops on lists shows up in the CLR Profiler as System.Collections.Generic.List<T>.
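For reference, the conversion looks like this on a simple sum (LoopExample is a made-up name). Whether a given foreach really allocates depends on the collection and on how it is typed, so treat this as an illustration and check your own loops in the profiler.

```csharp
using System.Collections.Generic;

public static class LoopExample
{
    // foreach: the compiler asks the collection for an enumerator behind the scenes.
    public static float SumForeach(List<float> values)
    {
        float sum = 0.0f;
        foreach (float v in values)
        {
            sum += v;
        }
        return sum;
    }

    // C++-style loop: only Count and the indexer are used, no enumerator is involved.
    public static float SumFor(List<float> values)
    {
        float sum = 0.0f;
        for (int i = 0; i < values.Count; i++)
        {
            sum += values[i];
        }
        return sum;
    }
}
```

Both methods return the same result; only the way the list is traversed differs.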

Conclusion

I hope you’ll find this useful. In Spring Up Harmony, I still have some run-time allocations, mostly in the particle engine, when a new fx is launched (I use Mercury for now). However, the garbage collector does not seem to cause hiccups, so I’m OK with it.

Yes, multi-threading in XNA/C# can be that simple!

Wednesday, April 14th, 2010

I never really liked multi-threading because of all the additional problems it creates: synchronization, deadlocks, concurrent access to memory… But my recent experience with multi-threading in XNA/C# showed me that you can do simple and efficient optimizations with it.

While looking for possible performance improvements in Spring Up Harmony for Xbox 360 (PC), I spotted a simple situation really suited for multi-threading. You can see that situation in the following screenshot of my in-game profiler:

[Screenshot: Frozax-4, in-game profiler]

I have had two tasks with performance issues since I started to run the game on Xbox 360: the dynamics of the background effect and the computation of the estimated launch of the ball (running a simplified physics simulation multiple times, through Box2D.XNA). On this screenshot, these tasks are called VelGrid.U (9) and Launchers.U (14). The first part of VelGrid.U (Comp.Balls, 10) uses the data of the physics simulation but DynaColLines (11) doesn’t. Therefore, VelGrid.DynaColLines and Launchers.U are totally independent of each other. No data is shared between these code sections. Perfect candidates for my initiation to multi-threading in XNA! :)

The e-book found here (recommended in multiple threading articles) is really awesome and helped me find the algorithm I needed. I want to start Launchers.U at the same time as VelGrid.DynaColLines and wait until both are finished. I know this is not optimal, but it allows me to simplify the problem and avoid many threading pitfalls.

I just have to use two AutoResetEvents. The Launchers.U section is created in a thread and waits for a “Go” signal from VelGrid. After doing its job, it sets a Ready signal so that VelGrid knows when to carry on after doing its own stuff. Two different cases can happen, depending on which section finishes before the other.
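The handshake described above can be sketched as follows. This is not the game's actual code: ParallelSection and its members are names invented for the example, and the hardware thread index 4 in the Xbox-only part is an arbitrary choice.

```csharp
using System.Threading;

// Minimal sketch of the "Go" / "Ready" handshake with two AutoResetEvents.
public class ParallelSection
{
    private readonly AutoResetEvent _go = new AutoResetEvent(false);     // main -> worker
    private readonly AutoResetEvent _ready = new AutoResetEvent(false);  // worker -> main
    private readonly ThreadStart _workerJob;
    private readonly Thread _worker;
    private volatile bool _quit;

    public ParallelSection(ThreadStart workerJob)
    {
        _workerJob = workerJob;
        _worker = new Thread(WorkerLoop);
        _worker.IsBackground = true;
        _worker.Start();
    }

    private void WorkerLoop()
    {
#if XBOX
        // Xbox 360 only: the thread selects its own hardware thread (4 is an assumption here).
        Thread.CurrentThread.SetProcessorAffinity(new int[] { 4 });
#endif
        while (true)
        {
            _go.WaitOne();          // sleep until the main thread says "Go"
            if (_quit)
            {
                return;
            }
            _workerJob();           // e.g. the launchers update
            _ready.Set();           // tell the main thread the job is done
        }
    }

    // Called once per frame from the main thread.
    public void Run(ThreadStart mainJob)
    {
        _go.Set();                  // start the worker job
        mainJob();                  // meanwhile, do the independent work on the main thread
        _ready.WaitOne();           // whichever side finishes first waits for the other
    }

    public void Stop()
    {
        _quit = true;
        _go.Set();
        _worker.Join();
    }
}
```

In a game loop, the two delegates should be created once and stored, because creating a new delegate or lambda on every frame is itself an allocation. The two jobs must not touch the same data, otherwise locks are needed and the simplicity is gone.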

Here are screenshots of the in-game profiler showing the differences (taken on X360). A * prefix on a code section means it is executed on the worker thread.

Case 1: the code on the main thread completed before the code on the worker thread:

[Screenshot: thread_slower]

The Velgrid section has to wait for the Launchers to complete, shown with the red arrow.

Case 2: the code on the main thread completed after the code on the worker thread:

[Screenshot: thread_faster]

The worker thread is idle while the main thread still works (red arrow).

With this simple threading mechanism (coded and tested in a few hours only), I have been able to optimize the game. In the basic test level shown here, instead of 2.3 + 3.1 = 5.4 milliseconds spent on the main thread for both features, the game now spends 3.6 milliseconds, running two threads at the same time (33% gain). In this test scene, the Velgrid.DynaColLines is “free”.

Since I took these screenshots, I have made a few more changes: now, the whole Box2D update used while the game is in motion is threaded too.

One last important thing to know: on Xbox 360, you must choose which core your thread runs on. Just read the MSDN page about SetProcessorAffinity, everything you need to know is there. You can see in this article that I still have a few cores available :)

In conclusion, if you have independent sections of code that take some time to run, you can improve that very quickly using this method.

On-screen profiling for XNA

Wednesday, April 7th, 2010

A few days ago, I read an article called Among Friends: How Naughty Dog Built Uncharted 2. The interesting screenshots on the third page gave me the urge to create what I call an “on-screen profiling” tool for Spring Up XNA. I talked before about profilers and how to use them to find which sections of your code need to be optimized. But a visual tool is really handy, because it’s real time, and with Edit and Continue on PC, you can even see your improvements live!

So, I spent some time working on it, and here is a sample screenshot:

[Screenshot: onscreen_profiling_sample_x360]

The bars show the timeline of the current frame. Each bar contains the name of its section. For clarity, all profiled sections are also listed in plain text with the following information: name, time elapsed this frame (in milliseconds), smoothed time elapsed, and percentage of the 60 fps frame (16.6 ms). The vertical magenta line is the limit of the 60 fps frame. On this screenshot, my objective of 60 fps is not met: there are some bars displayed after it :)
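The values listed for each section can be computed with very little code. The sketch below is only an illustration: ProfiledSection is an invented name, and the smoothed time is computed as an exponential moving average with a factor of 0.1, which is an assumption since the post does not say how the smoothing is done.

```csharp
using System.Diagnostics;

// Hypothetical record for one profiled section of the frame.
public class ProfiledSection
{
    public const float FrameBudgetMs = 1000.0f / 60.0f;  // about 16.6 ms at 60 fps
    private const float SmoothingFactor = 0.1f;          // assumption: weight of the newest sample

    private readonly Stopwatch _watch = new Stopwatch();
    private bool _hasSample;

    public readonly string Name;
    public float LastMs;       // time elapsed this frame
    public float SmoothedMs;   // smoothed time elapsed

    public ProfiledSection(string name)
    {
        Name = name;
    }

    // Share of the 60 fps frame used by this section, in percent.
    public float PercentOfFrame
    {
        get { return LastMs / FrameBudgetMs * 100.0f; }
    }

    public void Begin()
    {
        _watch.Reset();
        _watch.Start();
    }

    public void End()
    {
        _watch.Stop();
        Record((float)_watch.ElapsedTicks / (float)Stopwatch.Frequency * 1000.0f);
    }

    // Stores a measurement and updates the moving average.
    public void Record(float ms)
    {
        LastMs = ms;
        SmoothedMs = _hasSample ? SmoothedMs + SmoothingFactor * (ms - SmoothedMs) : ms;
        _hasSample = true;
    }
}
```

A section is timed by calling Begin() and End() around the code to measure. Drawing the bars is then a matter of scaling the start time and LastMs by a pixels-per-millisecond factor.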

I am really happy with the results, you instantly see where optimization is needed and you also see the results quickly. For instance, after seeing the previous screenshot, I concentrated on the “Velgrid.D” section (section drawing the background effect that is seen in motion here). After some improvement, it still was not fast enough. I noticed that the problem was simply the overhead of calling SpriteBatch.Draw more than 5000 times. I therefore decided to write my own vertex+pixel shaders and you can see the result here:

[Screenshot: onscreen_profiling_final_x360]

It’s now so fast that we can’t even see the name of the section in the bars and I have to have a look at the list to see the exact time taken by this section of the game (number 15).

The profiling obviously works on both Xbox 360 and PC, so I could compare the two platforms. As I had read here and there, the X360 build is much slower than the Windows one. I made two screenshots of the same scene, scaling the bars so that the length in pixels of the full frame is around the same on each platform. Here is the screenshot of the Windows build that you can compare to the first screenshot of this post:

[Screenshot: onscreen_profiling_pc_scaled]

The “slower” sections are not the same. Globally, the X360 is better at using its GPU and the PC at using its CPU. Of course, this is heavily dependent on the PC platform. For information, I am using a 3.4 GHz computer with an X850 graphics card.

And just for fun :) , the first screenshot of the profiler on X360, before I started any optimizations:

[Screenshot: onscreen_profiling_first version x360]

The frame took 165% of my objective target frame (60 fps). I can’t even see it all on screen. I am now at 75% of the frame, and with an additional feature (glow around objects, section 9 on the second screenshot).

So the small amount of time taken to make this in-game tool was very well spent. It’s obviously quicker than running a profiler and waiting for the results, even if it’s not as precise. And the great thing is that this was easy and interesting to program :)

Measuring game performance : framerate is not everything

Thursday, March 11th, 2010

I am currently having a few problems with performance in Spring Up XNA. I have two features that are too expensive in processing power : the background effect and the computation of the preview trajectory of the ball.

I know these features are not usable at this cost because as soon as I activate one of them, my framerate drops and the game does not run at 60 FPS. When your game runs at 60 FPS, everything’s fine. When it does not, how can you tell how “far” you are from it? And because XNA tries to catch up when your game is slow, this is even more obscure (there is a very good article on Shawn Hargreaves’ blog about this topic called Understanding GameTime).

[Screenshot: fps]

Therefore, in order to have accurate and useful performance information in real time, I am displaying on screen the time spent to compute and draw the last frame, in milliseconds.

This is done easily using the Stopwatch class. This class is part of the standard .NET library (namespace System.Diagnostics); it is not part of XNA. It is very easy to use:

stopwatch.Reset();
stopwatch.Start();
// Code to time here
stopwatch.Stop();

You just have to time the whole Update and Draw functions. Then you can read the number of milliseconds directly in the ElapsedMilliseconds member of Stopwatch. However, as this value is a long, it only gives whole milliseconds, so I prefer to compute the number of milliseconds more precisely using this code:

(float)StopwatchUpdate.ElapsedTicks/(float)Stopwatch.Frequency*1000.0f

And to display it properly, I use the following formatting:

string fps = string.Format(
    "fps:   {0:00.0}\nupdate:{1,4:00.0}\ndraw:  {2,4:00.0}",
    _fps,
    (float)StopwatchUpdate.ElapsedTicks / (float)Stopwatch.Frequency * 1000.0f,
    (float)StopwatchDraw.ElapsedTicks / (float)Stopwatch.Frequency * 1000.0f );

I first thought that the numbers would change so quickly that they would not be usable, but they are pretty constant in my game. I can see huge differences when disabling some GameComponents. It is also useful to see whether you spend more time in the update part or the draw part. You must have separated game logic and rendering properly, of course.

I said before in this blog that you must not spend time optimizing when you do not need to. But having these few lines of code is harmless and can be useful while adding new features:

  • If you see that you spend 5 more milliseconds in this session than in your previous one, you may need to have a closer look at this new code.
  • As Spring Up XNA is heavily using dynamics (through BOX2D), I can also compare the performance of the different levels.
  • You can see when a single frame takes much longer to update/draw (choppy framerate).

Hope this helps!

NProf : A Simple, Efficient and Free XNA Profiler (3.1 and 4.0)

Sunday, February 14th, 2010

Update: I have many visitors coming here from search engines, so here is a quick update if you want to use NProf with XNA 4.0. If the UI of NProf stays empty after running your game, it’s because you are running an XNA 4.0 game. There is a workaround here: run NProf.exe from a .bat file containing:

set COMPLUS_ProfAPI_ProfilerCompatibilitySetting=EnableV2Profiler
NProf.exe

Original post:

On its google code page, NProf is described as a statistical profiler for .NET applications. Good news: XNA being .NET, it can be used to profile games.

It’s well known that in order to optimize, you first need to find what part of your code is slow. That’s where a profiler comes in handy: it finds the bottlenecks of your game and then you can optimize what needs to be.

The title of the post says it all, NProf is trivial to use, efficient and free. Once you download it, you select your XNA game executable, play it for some time and that’s it! I spent some time trying to find great profilers when I developed for PC and Mac but could not find anything really usable and affordable. I finally timed critical code sections with custom code and studied the log manually. Fortunately, NProf does all that for me in XNA.

I had a few surprises when I ran NProf. For instance, yesterday I could see that 21% of the game was spent in one method:
Color.Color( Vector3 )

This XNA constructor converts a Vector3 into a Color and it is called a lot in my game to create my background effect (see an old version in motion here). In this constructor, the expensive call is PackUtils.ClampAndRound. I do not have access to this code, but I can use another basic constructor and do the conversion myself. Here is what I ended up doing:

Before
Color col = new Color( vColor +
    new Vector3( start_val, start_val, start_val ) );
After
int r = (int)(( vColor.X + start_val) * 255 );
int g = (int)(( vColor.Y + start_val) * 255 );
int b = (int)(( vColor.Z + start_val) * 255 );
Color col = new Color( ( r > 255 ) ? (byte)255 : (byte)r,
    ( g > 255 ) ? (byte)255 : (byte)g,
    ( b > 255 ) ? (byte)255 : (byte)b );

The new code doesn’t clamp values below zero but I don’t need that feature. The really important thing is that it is now more than 12 times faster than before!
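If negative values could occur, the same manual conversion can clamp on both sides for the price of one more comparison per channel. This is only a sketch with a made-up helper name (ColorPacking.ToByte); like the code above, it truncates instead of rounding, and its speed has not been measured.

```csharp
public static class ColorPacking
{
    // Converts a channel in the [0, 1] range to a byte, clamping values outside that range.
    public static byte ToByte(float channel)
    {
        int v = (int)(channel * 255);   // truncates toward zero, like the cast in the original code
        if (v > 255)
        {
            return 255;
        }
        if (v < 0)
        {
            return 0;
        }
        return (byte)v;
    }
}
```

With such a helper, the three channels would be converted with ToByte( vColor.X + start_val ) and so on, then passed to the Color constructor taking bytes.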

Without NProf, I would not have guessed that this constructor was that critical.

The main problem I see with NProf is that it only profiles your game on Windows and bottlenecks may be different on the X360.

Anyway, I really advise you to run your game in this profiler before you optimise anything. Be sure to use a release build.

Feel free to share your experiences with NProf!