Optimizing Graphics Performance on iOS and Android

The main device I used during the development of Don’t Feed the Trolls is an Acer Iconia Tab A500. It’s quite powerful and I never ran into performance issues with it.

Of course, I tested the game on many other iOS and Android devices. I was surprised to discover serious performance issues on an HTC Desire.

In this article, I will explain the reasons for this poor performance and how I fixed it. I will also give you tips on how to find out where the potential problems are.

Problem 1 : in-game, the game got slower and slower toward the end of the level

[Screenshot: Slap Party Game Mode]

Of course, the difference at the end of the level is that many bears and trolls have been displayed, so the problem probably has to do with that. However, there were never more than 3 or 4 characters visible at the same time.

Also, to avoid memory allocation during gameplay, I pre-allocate all the character sprites used in the level at the start of the level. So the problem was not memory related. Finally, the game code dealing with inactive or invisible characters was doing nearly nothing.

I finally found out that the problem was caused by characters that had disappeared: they were still drawn, just fully transparent. I used a CCFadeOut action to hide the sprites, but at the end of the CCFadeOut the object was still displayed, with an opacity of 0. An easy fix is to hide the sprite (CCHide action) at the end of the fade-out to avoid a useless draw call.

Fix 1 : Make sure you do not have invisible sprites (completely transparent or with very low opacity). Destroy or hide them instead.
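
To make Fix 1 concrete, here is a minimal, self-contained sketch. It does not use cocos2d-x: FakeSprite, fadeOut and countDrawCalls are hypothetical stand-ins that only model the behaviour described above, assuming the renderer checks visibility but not opacity. In cocos2d-x itself, the equivalent would be to run a CCHide right after the CCFadeOut in a sequence (the exact factory function names depend on your cocos2d-x version).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a cocos2d-x sprite: only the two properties
// that matter for this problem.
struct FakeSprite {
    int  opacity = 255;   // 0 = fully transparent, 255 = opaque
    bool visible = true;  // a hidden sprite is skipped by the renderer
};

// Fades the sprite out over `steps` updates (steps must be > 0), like a
// CCFadeOut would. When `hide_at_end` is true the sprite is also hidden
// once the fade is over, which is what a CCHide after the fade achieves.
static void fadeOut(FakeSprite &s, int steps, bool hide_at_end)
{
    const int start = s.opacity;
    for (int i = 1; i <= steps; ++i)
        s.opacity = start - (start * i) / steps;
    if (hide_at_end)
        s.visible = false;
}

// Counts the sprites that would still cost a draw call. Assumption:
// visibility is the only thing checked, opacity is ignored.
static int countDrawCalls(const std::vector<FakeSprite> &sprites)
{
    return static_cast<int>(std::count_if(sprites.begin(), sprites.end(),
        [](const FakeSprite &s) { return s.visible; }));
}
```

With three sprites, fading one without hiding it leaves three draw calls; fading one and hiding it at the end leaves two.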

Problem 2 : the main menu is very slow

[Screenshot: Menu - Achievements List]

My menus have many animated transitions. For instance, when you choose the “achievements” menu, the game mode selection screen leaves to the left while the achievements screen comes in from the right. There is also a credits screen, a level selection screen (for the classic mode) and a difficulty selection screen (for the versus mode). All these screens are built once when loading the main menu. They are simply off-screen when not displayed to the player.
The performance problem was therefore obvious: all the off-screen items were slowing down the whole program. I thought cocos2d-x or the device itself would skip off-screen items, but I was wrong. Simply hiding these items improved the performance greatly. In the achievements list (see screenshot above), I hide every line and sprite as soon as it leaves the screen. It requires specific code but it’s really worthwhile.

Fix 2 : Do not have elements displayed off screen. Remove them from the scene or hide them instead.
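
As an illustration of Fix 2, here is a small self-contained sketch of the kind of visibility test involved. Rect, MenuItem and cullOffscreen are hypothetical names, not cocos2d-x types; the sketch assumes each item knows its bounding box in screen coordinates.

```cpp
#include <vector>

// Hypothetical axis-aligned rectangle (origin at the bottom-left corner).
struct Rect {
    float x, y, w, h;
};

// True when the two rectangles overlap (touching edges do not count).
static bool intersects(const Rect &a, const Rect &b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

// Hypothetical menu element: a bounding box and a visibility flag.
struct MenuItem {
    Rect bounds;
    bool visible = true;
};

// Hides every item that is completely outside the screen and shows the
// others. Returns the number of items left visible.
static int cullOffscreen(std::vector<MenuItem> &items, const Rect &screen)
{
    int visible_count = 0;
    for (MenuItem &item : items) {
        item.visible = intersects(item.bounds, screen);
        if (item.visible)
            ++visible_count;
    }
    return visible_count;
}
```

Calling something like this after each scroll or transition step keeps partially visible items displayed and hides only the ones that are entirely outside the screen.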

Problem 3 : the main menu is slow when the rankings are displayed

[Screenshot: Menu - Rankings]

In the main menu, when the rankings are displayed (see screenshot above), the frame rate drops sharply. Obviously, this is caused by the quantity of text. cocos2d-x optimizes text by using a single OpenGL draw call to display a string, instead of drawing each character one by one. That’s good, but not enough in my case. I initially used a simple (and logical) approach and had one CCLabelBMFont per element of the rankings. There are up to 10 lines with 3 elements per line (position, name, score), and two headers. That’s a total of 10 × 3 + 2 = 32 draw calls, only for text. I updated the code so that all the rankings text is now displayed with a single draw call (see below for implementation details). The game now runs smoothly.

Fix 3 : Use as few draw calls as possible by grouping sprites together

How to reduce draw calls with Cocos2d-x?

The easiest way to reduce draw calls is to use a CCSpriteBatchNode. If you have multiple sprites using the same texture, they can be displayed in one call if they share the same CCSpriteBatchNode parent. Fortunately, as I use texture pages, nearly all my sprites are in the same texture (see How to pack your textures). Here are some examples of optimizations I did in Don’t Feed the Trolls:

  • All the elements of the UI (timer bar, troll head, red cross…) now share the same CCSpriteBatchNode (up to five draw calls removed)
  • The bears are displayed with two sprites, the trolls with three sprites. Each character is now displayed as a CCSpriteBatchNode (one or two draw calls removed per character)
  • In the main menu, the rankings background is using the same sprite twice, they now share the same CCSpriteBatchNode (one draw call removed)
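
The savings listed above can be checked with a simplified counting model. The sketch below is self-contained and does not use cocos2d-x: SpriteInfo and estimateDrawCalls are hypothetical, and the model assumes one draw call per unbatched sprite plus one per batch node, which ignores everything else the engine may do.

```cpp
#include <set>
#include <vector>

// Hypothetical description of a sprite, for counting purposes only.
struct SpriteInfo {
    int batch_id;  // id of its batch node parent, or -1 if it has none
};

// Simplified model: one draw call per unbatched sprite, plus one per
// distinct batch node.
static int estimateDrawCalls(const std::vector<SpriteInfo> &sprites)
{
    int unbatched = 0;
    std::set<int> batches;
    for (const SpriteInfo &s : sprites) {
        if (s.batch_id < 0)
            ++unbatched;
        else
            batches.insert(s.batch_id);
    }
    return unbatched + static_cast<int>(batches.size());
}
```

In this model, a troll made of three unbatched sprites costs three draw calls, and the same three sprites under one batch node cost one, which matches the "two draw calls removed" figure above.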

For the fonts, CCLabelBMFont is itself a CCSpriteBatchNode (it derives from it). However, I have so much text in the rankings that it still slows down the game (Problem 3). I wrote some code to group all the rankings text under one CCLabelBMFont. As I use different scales, colors and alignments, it’s not trivial and can’t be done automatically with a single CCLabelBMFont. I decided to let the CCLabelBMFont class position the letters properly, then move each letter under a “mega label”, and finally delete the original label. Here is the raw code I used (_mega_label is the node that contains all the text, bmf is the temporary label we want to merge into the mega label):


	// copy the label to mega_label and delete it
	CCArray *children = bmf->getChildren();
	if( children )
	{
		// loop through all characters
		while( children->count() )
		{
			CCNode *ch = (CCNode*)(children->objectAtIndex( 0 ));
			// remove characters from current label (retain it before removing it)
			ch->retain();
			bmf->removeChildAtIndex( 0, true );
			Vector2 char_pos = ch->getPosition();
			// compute proper position taking scale into account
			char_pos.x = char_pos.x*scale + p.x;
			char_pos.y = char_pos.y*scale + p.y;
			ch->setPosition( char_pos );
			ch->setScale( scale );
			// add it to label
			_mega_label->addChild( ch );
			// Don't Feed the Trolls specific : I keep a list of animated sprites
			// (they have their opacity animated)
			// I do it manually instead of using an action
			if( is_local_player )
			{
				// need an animation, add the sprite to a specific list
				_animated_sprites.push_back( (CCSprite*)ch );
			}
			ch->release();
		}
	}
	// free the original label
	bmf->release();
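
The position computation in the loop above can be isolated as a small worked example. This is a self-contained sketch: Point2 and toMegaLabelSpace are hypothetical names, and it assumes that p is the position of the original label inside the mega label and that scale is the uniform scale of that label.

```cpp
// Hypothetical 2D point, standing in for the engine's point type.
struct Point2 {
    float x, y;
};

// Converts a letter position, local to its original label, into the
// coordinate space of the shared parent: apply the label scale first,
// then offset by the label position `p`.
static Point2 toMegaLabelSpace(Point2 local, float scale, Point2 p)
{
    return Point2{ local.x * scale + p.x, local.y * scale + p.y };
}
```

For instance, a letter at (10, 4) in a label scaled by 0.5 and placed at (100, 200) ends up at (105, 202) under the shared parent.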

How to find out where to optimize?

You don’t need to optimize everything, just the slow sections of your game. Here is how I found out where to optimize:

  • I display the draw call count on screen. To do that, I hacked into the cocos2d-x code and simply added a global counter before each call to glDrawArrays or glDrawElements. I used a #define to easily remove this code in release versions of the game. Be careful to ignore the sprites used to display the debug text itself.
  • I also have a function that recursively walks a CCNode and counts visible/hidden nodes, nodes with low opacity (less than 10) and so on. It is very handy for detecting items that cost draw calls without being seen, because they are off-screen or have a very low opacity.
    Here is the code used, adapt it to your needs:

    
    int nb_nodes, nb_visible_nodes, nb_rgba_nodes, nb_not_letters_nodes,
    	nb_not_letters_visible_nodes, nb_low_opacity_nodes;
    
    // recursive function counting
    // please note that I'm using an old version of cocos2d-x, with
    // convertToLabelProtocol and convertToRGBAProtocol: I found the
    // new version with RTTI a terrible idea and reverted it
    void DEBUGCountNodes( CCNode *n, bool parent_visible, bool is_letter )
    {
    	nb_nodes++;
    	bool we_are_visible = n->getIsVisible() && parent_visible;
    	bool father_is_label = is_letter;
    	bool we_are_label = is_letter || (n->convertToLabelProtocol() != NULL);
    
    	if( we_are_visible )
    		nb_visible_nodes++;
    	if( we_are_visible && !father_is_label )
    		nb_not_letters_visible_nodes++;
    	if( !father_is_label )
    		nb_not_letters_nodes++;
    
    	if( n->convertToRGBAProtocol() )
    	{
    		// count low opacity nodes (if below 10/255)
    		if( n->convertToRGBAProtocol()->getOpacity() < 10 && we_are_visible )
    			nb_low_opacity_nodes++;
    		nb_rgba_nodes++; 
    	}
     	// recursive part
    	if( n->getChildren() )
    	{
    		for( unsigned int i = 0; i < n->getChildren()->count(); i++ )
    		{
    			DEBUGCountNodes( (CCNode*)(n->getChildren()->objectAtIndex(i)),
    				we_are_visible, we_are_label );
    		}
    	}
    }
    
    // initial function to call, usually with the initial node of your sprite
    // hierarchy (mine is called _main_node)
    string fgGame::DEBUGCocosNodes( CCNode *start )
    {
    	if( start == NULL )
    		start = _main_node;
    	nb_nodes = nb_visible_nodes = nb_rgba_nodes = nb_not_letters_nodes =
    		 nb_not_letters_visible_nodes = nb_low_opacity_nodes = 0;
    	DEBUGCountNodes( start, true, false );
    	char str[256];
    	sprintf( str, "All:%d/%d NoTxt:%d/%d LowOp:%d", nb_visible_nodes, nb_nodes,
    		nb_not_letters_visible_nodes, nb_not_letters_nodes, nb_low_opacity_nodes );
    	return str;
    }
    

The great thing with this logging is that you can run it on the Windows build.
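
The draw call counter from the first bullet above can be sketched as follows. This is a self-contained illustration, not the actual patch applied to cocos2d-x: DEBUG_DRAW_CALLS, COUNT_DRAW_CALL, fakeDrawSprite and consumeDrawCallCount are hypothetical names, and fakeDrawSprite stands in for the engine code that issues the real GL call.

```cpp
// Set to 0 (or leave undefined) in release builds to compile the
// counter out entirely.
#define DEBUG_DRAW_CALLS 1

#if DEBUG_DRAW_CALLS
static int g_draw_call_count = 0;
#define COUNT_DRAW_CALL() (++g_draw_call_count)
#else
#define COUNT_DRAW_CALL() ((void)0)
#endif

// Hypothetical stand-in for the engine code that issues a GL draw.
static void fakeDrawSprite()
{
    COUNT_DRAW_CALL();
    // glDrawArrays(...) or glDrawElements(...) would be called here.
}

// Returns the number of draw calls counted since the last call and
// resets the counter. Meant to be called once per frame.
static int consumeDrawCallCount()
{
#if DEBUG_DRAW_CALLS
    const int n = g_draw_call_count;
    g_draw_call_count = 0;
    return n;
#else
    return 0;
#endif
}
```

Reading and resetting the counter once per frame gives the per-frame draw call count. To ignore the debug text itself, one option is to read the counter before drawing that text.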

Conclusion

Make sure you test your app on a few less powerful devices. If you keep in mind the tips in this article and display debug information from time to time, you should easily be able to optimize the slowest code of your game.
From my tests, I found that keeping to a maximum of 30 draw calls is a good target if you want to reach 50 or 60 frames per second on nearly all mid-range devices. Of course, this is heavily empirical and you’ll have to do your own tests.

I hope this helped. Feel free to share your own optimization techniques in the comments, on Twitter or on Facebook.

And don’t forget to try and rate Don’t Feed the Trolls (Free download, iTunes, Google Play)!