Stop Pasting Code

2008-05-31

This topic seems almost too basic to even discuss but I've run into these issues fairly often so appearently they do need discussion

When coding somethings to keep in mind, how can we code both efficiently and defensively. In other words, how can we get things done faster and with less bugs.

Here's a few ideas. They might seem obvious but like I said, I see lots of code that doesn't follow these ideas and so I'm bringing them up.

#1) If you do something a lot, figure out a way to make it easier to do.

Here's two examples from my own experience. I have a library to read .ini files. .Ini files are something that Windows first introduced me to. They are a very simple text based format that look something like this

; ---- game settings ----
[player]
numlives=5

[ogre]
hitpoints=20
weapon=sword
speed=1.34

[dragon]
hitpoints=300
weapon=firebreath
speed=6.5

I wrote a library to read these files. It parses the file into sections and label−value pairs. Let's assume I started off with an interface something like this

// some global vars because it's simple to understand
int    g_playerNumLives;
int    g_ogreHitPoints;
string g_ogreWeapon;
float  g_ogreSpeed;
int    g_dragonHitPoints;
string g_dragonWeapon;
float  g_dragonSpeed;

// ...

const IniFile* pIni = ReadIniFile("c:/test/game_setttings.ini");
const IniSection* pSection = pIni->GetFirstSection();
while (pSection != NULL)
{
    if (strcmp(pSection->GetName(), "player") == 0)
    {
        const IniLine* pLine = pSection->GetFirstLine();
        while (pLine != NULL)
        {
            if (strcmp(pLine->GetId(), "numlives") == 0)
            {
               g_playerNumLives = atol(pLine->GetValue());
            }
            pLine = pSection->GetNextLine(pLine);
        }
    }
    else if (strcmp(pSection->GetName(), "ogre") == 0)
    {
        const IniLine* pLine = pSection->GetFirstLine();
        while (pLine != NULL)
        {
            if (strcmp(pLine->GetId(), "hitpoints") == 0)
            {
               g_orgeHitPoints = atol(pLine->GetValue());
            }
            else if (strcmp(pLine->GetId(), "weapon") == 0)
            {
               g_orgeWeapon = pLine->GetValue();
            }
            else if (strcmp(pLine->GetId(), "speed") == 0)
            {
               g_orgeSpeed = (float)atof(pLine->GetValue());
            }
            pLine = pSection->GetNextLine(pLine);
        }
    }
    else if (strcmp(pSection->GetName(), "dragon") == 0)
    {
        const IniLine* pLine = pSection->GetFirstLine();
        while (pLine != NULL)
        {
            if (strcmp(pLine->GetId(), "hitpoints") == 0)
            {
               g_dragonHitPoints = atol(pLine->GetValue());
            }
            else if (strcmp(pLine->GetId(), "weapon") == 0)
            {
               g_dragonWeapon = pLine->GetValue();
            }
            else if (strcmp(pLine->GetId(), "speed") == 0)
            {
               g_dragonSpeed = (float)atof(pLine->GetValue());
            }
            pLine = pSection->GetNextLine(pLine);
        }
    }
    pSection = pIni->GetNextSection(pSection);
}

That seems fine and I've seen lots of code like that but let's look at ways to improve it both for efficiency and for preventing bugs. And, as a someone who creates lots of tools to support less techie users, let's try to make it more user friendly.

Some of the things that come to mind.

We could add a way to find a specific section instead of having to loop through all of them. We could provide functions for getting values by name within a section. If we did that the code would then look something like this.

// ...

const IniFile* pIni = ReadIniFile("c:/test/game_setttings.ini");
const IniSecton* pSection;
pSection          = pIni->FindSection("player");
g_playerNumLives  = pSection->GetInt("numlives");
pSection          = pIni->FindSection("ogre");
g_orgeHitPoints   = pSection->GetInt("hitpoints");
g_orgeWeapon      = pSection->GetString("weapon");
g_orgeSpeed       = pSection->GetFloat("speed");
pSection          = pIni->FindSection("dragon");
g_dragonHitPoints = pSection->GetInt("hitpoints");
g_dragonWeapon    = pSection->GetString("weapon");
g_dragonSpeed     = pSection->GetFloat("speed");

Of course it took a little more thought and time to write IniFile::FindSection() and the IniSection::Get???() functions but in the end, all the new code I write is much smaller and much simpler.

Once I've changed to this style of API it's easier to add new functionally to make it more robust. For example, originally when I read "numlives" the code used atol(). Well, the code in IniSection::GetInt handles not only formats like 123, −456 but it also handles hex formats like 0xABCD or $ABCD or even ABCDh as well as binary formats like 0101010b. That means I only have to change that code in the library and all programs that use the library get that new functionality.

Another is the find function. It doesn't just use strcmp but instead uses stricmp making it case−insensitive which is better for less techie users. Because it's in one place, all code that uses the library gets the same treatment. If we leave the compare external like before, then it's up to the whim of each programmer if they call strcmp or stricmp or something else. Since we put it in a library we get more consistancy. We don't have to give some rules to try to get all programmers to conform. Conformancy and consistency become free and it's even less work.

Other things we could do. We could change the interface so that each Get___ function takes a reference and returns a bool if success or failure. We could also supply a default and maybe print an error if the value is not found.

// ...

const IniFile* pIni = ReadIniFile("c:/test/game_setttings.ini");
const IniSection* pSection;
pSection = pIni->FindSection("player");
pSection->GetInt(g_playerNumLives, "numlives", 10);
pSection = pIni->FindSection("ogre");
pSection->GetInt(g_orgeHitPoints, "hitpoints", 10);
pSection->GetString(g_orgeWeapon, "weapon", "club");
pSection->GetFloat(g_orgeSpeed, "speed", 1.0f);
pSection = pIni->FindSection("dragon");
bool error = pSection->GetInt(g_dragonHitPoints, "hitpoints", 100) |
             pSection->GetString(g_dragonWeapon, "weapon", "fire") |
             pSection->GetFloat(g_dragonSpeed, "speed", 1.0f)      ;
if (error)
{
    printf ("dragon was missing settings\n");
}

I only showed an example of the dragon checking for errors where it only cares if any field was missing where as the other ones don't care. Of course if the library itself printed the errors that saves us the trouble of doing it ourselves. That could be problematic since every program usually has a different way of reporting errors but putting the error reporting in will save time most of the time and in the worst case you could design your library to take an error reporting callback that you could use to get it to report errors to fit whatever program you are using the library in.

In another example. We could do a similar thing with IniFile::FindSection. Make it take a reference to a section pointer that it fills in and have it return a bool for errors.

bool IniFile::FindSection(const IniSection*& pSection, const char* pSectionName)
{
   bool status = true; // assume success;
   // find the section.
   ...
   // if the section does not exist set pSection to a fake section with no fields
   if (sectionNotFound)
   {
      static IniSection bogus;

      pSection = &bogus;
      status   = false; // return failure
      printf ("couldn't find section (%s)\n", pSectionName);
   }
   return status;
}

Now, even if the section does not exist your code doesn't have to check for it but it still can if it wants to. This is important, at least in game development, the game needs to run at all times. With teams of 50 to 200 people you can't have the game fail to run when one person puts in bad data but at the same time you don't want errors to go un−noticed. With a design like this the game will continue to run but errors will be printed for any section not found and for any fields missing and resonable defaults will be supplied if they are missing.

We've made the code easier to use. We've made it harder to use wrong. You don't have to check for errors if you don't want but they will be reported. You can still check though if you absolutely need to.

You can apply similar ideas to other areas of your code.

One example, don't hand parse command line arguments. Languages like Perl or Python come with command line parsing libaries. Use them. For C/C++, write one or use one. Stop copying and pasting code. Stop writing more code than you need to. Stop making your code have to work hard.

The standard parsers deal with all kinds of issues and they make your code consistant. You don't have worry if some command format is

command -arg1 -arg2=param

and another is

command -arg1 -arg2 param

and another is

command /arg /arg2 param

You don't have to wonder if

command "thing with space"

will work and even if you wrote your own library, if you find a bug it's fixed in one place and that bug gets fixed in all your tools. My library also handles responses files so if I need to supply 1000 files names or something I can put them in a file can do

command @listoffiles.txt

Anyway, maybe I shouldn't have used the commandline example because I know the majority of programs I've seen the source too don't use libraries or the standards. But, those same issues above are so common in other areas as well. I've seen this before.

void FunctionThatEffectsAGlobal()
{
   ... // effect global here
}

Later it's decided to make the code threaded the globals need to be locked for thread safe access. I've seen this alot.

FunctionToLock();
FunctionThatEffectsAGlobal()
FunctionToUnlock();

...

FunctionToLock();
FunctionThatEffectsAGlobal()
FunctionToUnlock();

...

FunctionToLock();
FunctionThatEffectsAGlobal()
FunctionToUnlock();

where as wouldn't this be better?

void FunctionThatEffectsAGlobal()
{
   FunctionToLock();
   ... // effect global here
   FunctionToUnlock();
}

Not only is it less code, I only had to change one place, not every place that called the function that effects a global. It also means you can't call that function incorrectly. You can't accidentally call it without locking which would be a really hard to find bug in a multi threaded program. Sure, the first time you do it it might surround some global access with a lock. But the second time you find yourself locking the same variable somewhere else in the code should trigger the thought that maybe you should turn it into a function.

The title of this article is "Stop Pasting Code". The reason is I see so many programmers code something once and then copy and paste it again and again in the same program. Sure, at first it seems like the easiest thing to do but in the long run it makes your code harder to maintain, harder to know what the right thing to do is and harder to make global changes. And, it's even in the end more work for you. So please, give it a little more thought and see if it wouldn't be better to refactor a little if you find yourself copying and pasting code.

Comments
Video Games in Legos
std::vector and std::remove