Let's Debate Coding Style

2012-08-18

This might be somewhat of rant but what the hell. Let's discuss coding style. We each probably have our personal preferences. Tabs, Spaces, 2 spaces, 3 spaces, 4 spaces, tabs set to 4, braces on lines by themselves or not, etc, etc. Until I started working at my present company I've never in 25 years worked at a company that had any real official coding style and things worked mostly fine. There might have been some un−official style or informal discussions but mostly people just did their own thing at it worked.

At my present company though they have a STRICT with a capital S style guide. I'm mixed on it. On the one hand it's nice to agree on a style. On the other hand, I hate Hate HATE some of their decisions.

Let me say first off there are 2 classes of style. Functional style and Formatting style. For functional ones I have few if any beefs with my current company's decisions. One good example, according to the style guide any constructor that takes just 1 argument must be declared explicit.

class Foo {
 public:
  explicit Foo(const Bar& b);   // 'explicit' is required here.
  ...
};

That's a functional style issue. The reason for this rule is that single argument constructors act as implicit conversions. With out the explicit this code would compile

   ...
   Bar bar;
   Foo foo;

   foo = bar;   // WTF!?
   ...

Of course there are places where that kind of conversion would be useful but in general it can cause more trouble than it's worth. Adding the explicit keyword prevents this kind of error and training coders to look for it is a good thing.

Formatting style on the other hand is often comes down to personal preference. It's possible that if my company's style guide said "this is just our personal preference and we aren't going to justify it" I'd have less issues with it (doubtful) but it really irks me that they claim their style choices make code 'more readable' and that they value 'consistency' since many of their choices are neither.

So some examples with I hope some objective arguments in favor of my preferred styles

Braces

There's 2 major brace styles I've seen (there are more but I've never run into them). The first is sometimes called K&R style since it was used in the original C book

// Style 1
void SomeFunction(int a) {
  if (a > 10) {
    printf("Yea!\n");
  } else {
    printf("Boo!\n");
  }
}

The other common style

// Style 2
void SomeFunction(int a) 
{
  if (a > 10) 
  {
    printf("Yea!\n");
  } 
  else 
  {
    printf("Boo!\n");
  }
}

I prefer style 2. Style 1's only good argument is that it's more vertically compact letting more code appear on the screen at once. That's certainly a valid argument. For 25 years I used style 2. I never once thought "If only I could see more lines I'd be more productive" so that argument doesn't hold a lot of water for me.

On the other hand, style 2 makes it very clear there are 3 blocks and those blocks can easily be copied, pasted, duplicated, and moved around with. The blocks are

{
  printf("Yea!\n");
}

and

{
  printf("Boo!\n");
}

And the outer block

{
   if (...) 
     block
   else
     block
}

Style 2 is also more consistent. In the example above braces appear in only 2 columns instead of 5. In style 2 there's no where in the code where anything comes after a closing brace. In style 1 there's the exception, else. Even better, it's easier to comment style 2.

  // If were going too fast apply the brakes.
  if (velocity > max_velocity) 
  {
     ApplyBrakes():
  }
  // Otherwise if we're going too slow accelerate.
  elif (velocity < min_velocity)
  {
     StepOnTheGas();
  }

With style 1 where does the second comment go?

  // If were going too fast apply the brakes.
  if (velocity > max_velocity)  {
     ApplyBrakes():
  } elif (velocity < min_velocity) {
     // Otherwise if we're going too slow accelerate
     StepOnTheGas();
  }

There the comment is too late. All other comments come before the code they are describing. But doing that is just as bad

  // If were going too fast apply the brakes.
  if (velocity > max_velocity)  {
     ApplyBrakes():
  // Otherwise if we're going too slow accelerate
  } elif (velocity < min_velocity) {
     StepOnTheGas();
  }

Now the comment is inside the wrong code block.

Preprocessor #if/#else/#elif/#endif indentation

The Google style guide says always put preprocessor definitions flush left like this.

#if ENABLED

  DoIt();

#if SOME_FEATURE

  DoThis();

#if SOME_SUB_FEATURE

  DoThat();

#endif  // SOME_SUB_FEATURE

  DoSomethingElse();

#endif  // SOME_FEATURE

#endif  // ENABLED

I don't understand that decision at all. (1) It's inconsistent with normal if/else formatting. (2) If not indenting preprocessor #if why indent regular if/else statements? What's the argument for why indenting code blocks is good like this

  if (something)
  {
    doSomething();
    if (something_extra) 
    {
      DoSomethingExtra();
    }
  }

and indenting preprocessor blocks like this is not good?

#if ENABLED

  DoIt();

  #if SOME_FEATURE

    DoThis();

    #if SOME_SUB_FEATURE

      DoThat();

    #endif  // SOME_SUB_FEATURE

    DoSomethingElse();

  #endif  // SOME_FEATURE

#endif  // ENABLED

I prefer to indent preprocessor blocks like this bottom version. It's consistent. The same reasons that make normal block indentation more readable apply just as much to nested preprocessor directive blocks.

Formatting Function Arguments

There are a ton of ways to list arguments to a function. The most common is just across the line as in

  void SomeFunction(int arg1, float arg2, char arg3) 

The debate usually comes up when you have many arguments or arguments with long names or types such that you'd like to use multiple lines. My company prefers these styles. First do this

  void SomeFunction(int arg1, 
                    float arg2, 
                    char arg3) 

If things are still to long then do this

  void SomeFunction(
      int arg1, 
      float arg2, 
      char arg3) 

I generally jump straight to the second format. Why? Because the first style is inconsistent

void RunInPlace(int speed,
                float duration);

void Sprint(int speed,
            float duration);

void SwimAcrossOcean(int speed,
                     float duration);

void CrawlOnBack(int speed,
                 float duration);

vs

void RunInPlace(
    int speed,
    float duration);

void Sprint(
    int speed,
    float duration);

void SwimInOcean(
    int speed,
    float duration);

void CrawlOnBack(
    int speed,
    float duration);

The first one arguments appear in 4 different columns. In the second they are all in the same column. Another reason is the first style often ends up generating extra work. Let's stay I rename the functions using search and replace or refactoring tools.

void Run(int speed,
                float duration);

void Sprint(int speed,
            float duration);

void Swim(int speed,
                     float duration);

void Crawl(int speed,
                 float duration);

vs

void Run(
    int speed,
    float duration);

void Sprint(
    int speed,
    float duration);

void Swim(
    int speed,
    float duration);

void Crawl(
    int speed,
    float duration);

The first style is now broken and someone has to go back and manually fix the lines. The second style just works whether it's a function declaration or calling the function.

Variable Declaration

When declaring variables, some developers prefer

{
   var speed, hitpoints, damage;
}

or possibly

{
   var speed, 
       hitpoints, 
       damage;
}

I prefer

{
   var speed;
   var hitpoints;
   var damage;
}

Why? One reason is consistency. The first 2 styles are arguably less consistant than the 3rd. In the first 2, 2 variables have commas after them, 1 has a semicolon. 1 variable is preceded by 'var' and 2 are not. In the 3rd style every variable is treated consistently.

Another reason is ease of editing. Delete the first or last variable from the first 2 styles and they break and have to be edited further. With the 3rd style that doesn't happen. Let's say I want to search and replace to get rid of 'speed'. With the first 2 styles I need to know the order of declaration to do the search and replace. It might be "speed, " or it might be "speed;". With the 3rd style without looking I know exactly what will work. "var speed;"

Where to put *

You've got 2 options

  int *ptr;

or

  int* ptr;

Even K from K&R prefers the second in C++ but i'd argue even in C style 2 is more consistent.

  int   speed;
  float duration;
  char  *name;

vs

  int   speed;
  float duration;
  char* name;

With style 1, sometimes a variable name is by itself with its complete type on the left. Other times it's not with part of its type on the left and another part with part of its type connected to it. Style 2 is always the same. Type on the left, name by itself.

Constructors

Google prefers

SomeClass::SomeClass()
    : field1(value1),
      anotherField(value2),
      field3(value3),
      yetAnotherField(value4) {
}

I prefer

SomeClass::SomeClass()
    : field1(value1)
    , anotherField(value2)
    , field3(value3)
    , yetAnotherField(value4) 
{
}

Again it comes down to consistency and easy of editing as well as easy of checking. On consistency the first style has 3 different cases; 1 for the first field, 1 for the last field, and 1 for the fields in between. The 2nd style only has 2 cases. 1 for the first field, 1 for the rest. On editing, with the 2nd style adding or moving fields is easy. Except for the first line adding or removing a new field doesn't effect any other lines. With the second style, if you add a field to the end you have to edit 2 lines instead of just 1. Finally, which one of these is easier to spot the error?

SomeClass::SomeClass()
    : field1(value1),
      anotherField(value2),
      field3(value3)
      yetAnotherField(value4) {
}

or

SomeClass::SomeClass()
    : field1(value1)
    , anotherField(value2)
      field3(value3)
    , yetAnotherField(value4) 
{
}

BTW WebKit prefers the second style here.

Splitting Expressions

Slightly similar to the last one is how to split long expressions. Google prefers operators at the end of lines

  damage = speed +
           weight +
           strength;

I prefer something along the lines of

  damage = speed
         + weight
         + strength;

I think this one could maybe be argued either way. Style 2 has the advantage that if you can't see the end of the line you know that the 2nd and 3rd lines are continuations of the first since they start with an operator that requires or at least suggests they are a continuation. Not so with the first style. On the other hand, Google style has an 80 column limit. If you accept that the 80 column limit is a good thing (I don't) then maybe style 1 has no cons since you'll likely always be able to see the end of the line.

Lining things up

When it's not too inconvenient I like to line things up. Some examples

   struct Creature 
   {
     int         speed;
     float       endurance;
     const char* name;
   };

   Creature creatures[] = 
   {
      // spd  endure  name
      {  75,  1.075f, "Blarg",     },
      {   6,  0.500f, "Gern",      },
      { 103, 13.500f, "Nargblat",  },
   };

Google requires no padding so

   struct Creature  {
     int speed;
     float endurance;
     const char* name;
   };

   Creature creatures[] = {
      // spd, endure, name
      { 75, 1.075f, "Blarg", },
      { 6, 0.5f, "Gern", },
      { 103, 13.5f, "Nargblat", },
   };

The best argument I can think of for padding is email, spreadsheets, mp3 players and file explorers. Padding is basically turning stuff into tables. Would you rather see

ArtistAlbumSongDuration
MadonnaEroticaRain3:54
U2The Joshua TreeIn God's Country2:57
Lady GagaThe FamePoker Face3:35
Red Hot Chili PeppersMother's MilkHigher Ground3:23

or this?

ArtistAlbumSongDuration
MadonnaEroticaRain3:54
U2The Joshua TreeIn God's Country2:57
Lady GagaThe FamePoker Face3:35
Red Hot Chili PeppersMother's MilkHigher Ground3:23

Some people, especially in JavaScript or Python, would be happy with this style

var creatures = [
  {  
    speed: 75, 
    endurance: 1.075, 
    name: "Blarg"  
  },
  {  
    speed: 6,  
    endurance: 0.5, 
    name: "Gern"
  },
  { 
    speed: 103, 
    endurance: 13.5, 
    name: "Nargblat"
  }
];

But I'd argue it's still often better in table form.

var creatures = [
  { speed:  75, endurance:  1.075, name: "Blarg"    },
  { speed:   6, endurance:  0.500, name: "Gern"     },
  { speed: 103, endurance: 13.500, name: "Nargblat" }
];

It makes it far easier to compare them. Which one has the fastest or slowest speed or which one has the most or least endurance is very clear in table for and not clear at all in non−table form. Of course if the tables are giant or edited by non−engineers it might be better to look into editing them outside of code.

(I'd also like to point out that any language designer that doesn't allow trailing comma's in lists needs to be shot. I'm taking to you JavaScript!)

Arguably formatting items in a table makes them more readable otherwise we wouldn't do it. Therefore formatting code in what is effectively a table format is often more readable. This includes expressions. I'll often do something like this

  uint32_t rgba8888 = ((red   & 0xFF) << 24) 
                    | ((green & 0xFF) << 16)
                    | ((blue  & 0xFF) <<  &&#0035;56;)
                    | ((alpha & 0xFF)      );

I might even shift by 0 for more consistency

  uint32_t rgba8888 = ((red   & 0xFF) << 24) 
                    | ((green & 0xFF) << 16)
                    | ((blue  & 0xFF) <<  &&#0035;56;)
                    | ((alpha & 0xFF) <<  0);

where as Google requires this

  uint32_t rgba8888 = ((red & 0xFF) << 24) | 
                      ((green & 0xFF) << 16) |
                      ((blue & 0xFF) << &&#0035;56;) |
                      (alpha & 0xFF);

I think the padded style is objectively more readable. Google thinks so to because they request that comments be aligned as in

  int speed;       // Speed in units per frame
  float duration;  // Duration in seconds

Why the inconsistency? If alignment is helps readability in one place it arguably can help it in many others.

Braces Even on Single Line Blocks

For a single line block styles can often be this

  if (condition)
     code

or

  if (condition) 
  {
     code
  }

Google's style lets you choose. WebKit's style requires the first. Several people would argue the second is objectively better. For one, it prevents this error

  if (condition)
     code
     more code

If braces are required that error is impossible. Another is consistency. Having braces sometimes and not others is inconsistent. Having them always is consistent.

I'm sure some of you are thinking, hey, python solves this problem. No, python's solution sucks even though I love python otherwise. It's often useful to insert extra lines for debugging and it's nice to make them stick out but. Example

    def SomeFunction():
        DoSomething()
print "here1"
        DoSomethingElse()
        DoAnotherthing()
print "here2"
        DoMore()

This makes it easy to go back and delete the debugging lines as they stick out. But I can't do that in python. I have to make the debugging lines blend in with everything else. If the debugging lines are more complicated, say a loop to validate something, they start looking like the real code making it harder to find them later. Yes there are ways around that. End every line with #DEBUG or put # −− DEBUG −− before and after the debug block. If python had real code blocks like most languages I wouldn't have to find ways to work around it's formatting limits.

Line Length Limits

Google requires an 80 character line limit. More than 80 characters and you must split the line following several rules. After 25 years of living without a line length limit I found this to be one of the hardest to accept. In particular, the Google Style guide talks about "readability" and requires not abbreviating names unless their super common. (src, dst, num, len are ok. mgr, rcvr, sgnl are probably not ok.) But these 2 conflicting rules leads to incredibly split lines.

Assume the limit was 40 columns. Which is more readable

  int w = desired_width 
      * scale_factor + padding;
  int h = desired_height
      * scale_factor + padding;

  glBindTexture(
      GL_TEXTURE_2D, someTexture);
  glTexImage2D(
      GL_TEXTURE_2D, level, format, 
      w, h, border, format, type, data);
  glTexParameter(
      GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, 
      GL_REPEAT);

vs

  int w = desired_width  * scale_factor + padding;
  int h = desired_height * scale_factor + padding;

  glBindTexture(GL_TEXTURE_2D, someTexture);
  glTexImage2D(GL_TEXTURE_2D, level, format, w, h, border, format, type, data);
  glTexParameter(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);

I find the second far more readable. The function of the code flows better. I see

compute size bind texture upload data set parameters

where as in the first I see

compute width bla bla bla compute height bla bla bla bind texture bla bla bal upload data bla bla bla set parameters bla bla bla

80 columns is left over from the 1980s and earlier computers with text displays. Nowadays we have 24 inch or 30 inch monitors that can easily display 320 columns or more depending on the font size. It seems silly to hold us back to a 1980s standard. Just as silly as it would seem to send a fax today instead of an email or other form of electronic sharing.

In fairness I've found the 80 column limit can come in handy. It kind of comes in handy when doing side by side diffs as all of both the old code and new code fit on the screen. On the other hand, for the previous 25 years of coding I never felt like "if only I could see the end of every line this diff wouldn't be so hard". Most modern visual diff programs will scroll both views together, or show the code in other ways (top, bottom), etc.

The other plus to 80 columns is it makes code snippets easier to paste into web pages 😉 Of course that's rare enough that I can hand format those cases and write samples that fit.

Strict Naming Rules

Many companies have rules for naming. For example Google's is constants are kCamelCase, functions, classes, structs and typedefs are CamelCase, variables, arguments, and fields are under_score, macros are ALLCAPS. Example:

class GameObject {
 public:
  enum MoveState {
    kWalk,
    kJog,
    kRun,
  };

  static const int kStartSpeed = 10;

  void MoveObject(float elasped_time);

 private:
  int max_speed_;
  int current_speed_;
};

That's all fine. But sometimes those rules should be broken. Upper and Lower case are a western language feature. Languages like Korean, Japanese, Chinese, don't have upper and lower case. (which suggests that languages and frameworks that use upper and lower case to select features might be culturally insensitive design).

The issue comes up with code generation. Sometimes you'd like to auto generate code but if you have to derive Uppercase, lowercase, CamelCase and under_score styles from the same data you're wasting your time just for the sake of style.

Edit: April 2013

Actually I now think naming should be more consistent. Google's style is listed above. Specifically, variables, getters, and setters are under_score and most everything else is CamelCase. The problem with this style is that it's not consistent. I'm not sure I can come up with a good example but...
class ThingAmaJigManager
{
public:
  int get_num_thing_ama_jigs() const 
  {
    return m_num_thing_ama_jigs;
  }

  void SetNumThingAmaJigs(int num);

  ...

private:
   ThingAmaJig* m_thing_ama_jigs;
   int          m_num_thing_ama_jigs;  
};
First rule is, setters and getters are under_score unless they are slow. In other words, if you wouldn't call the function in a tight loop then name it GetNumThingAmaJigvs not get_num_thing_ama_jigs. That's arguably inconsistent. Why, as a user of the class, should I have to guess what the name of the function is. I get that it's nice to know it's inexpensive but if that's the rule then it should be any function which is inexpensive follows this rule, not just setters and getters. Second problem, it makes code search worse. I was recently trying to add a simple flag to some Chrome/WebKit code. There were 6+ classes involved. Most of the classes had functions like setEnableThisFeature and getEnableThisFeature. Some were virtual so they had no field named m_enable_this_feature. Searching the entire code base for "enableThisFeature" only finds some of the references. You also have to search for "enable_this_feature" to find everywhere that stuff is referenced. It seems arguable that having only to search for one of the 2 styles instead of both styles is better. For me, I searched for "enableThisFeature" and found all the classes. I edited the classes and saw the ones that had a field "enable_this_feature" and fixed those clases. What I didn't find was there was some marshaling code directly referencing "enable_this_feature" with no reference to "enableThisFeature". I spent 4 hours tracking that down. A more consistent naming convention would have avoided that. By that argument the consistent WebKit naming convention is better than the inconsistent Google one.

Member names

Lots of project require member names to be named differently than local variables. Microsoft and WebKit usually use m_name. I've seen some projects that use mName. Google uses name_, the name followed by an underscore. I hate that decision. The whole point of given them a different name is to make them stick out clearly. Underscore is not enough to do that in my opinion. Which of these is easier to spot the error.

void SomeClass:&&#0035;128539;rocessVeloctiy(float velocity) 
{
   velocity_  = velocity_ + accleration_;
   position_ += velocity_;
}   

vs

void SomeClass:&&#0035;128539;rocessVeloctiy(float velocity) 
{
   m_velocity  = m_velocity + m_accleration;
   m_position += m_velocity;
}   

For me at least the second one is far more clear that I never used 'velocity'. Maybe that's not a good example since the compiler would warn me that velocity is unused but a few times a year I've used name_ where I meant name or visa−versa. I feel like using m_name I'd be less likely to make that mistake.

I like that python and JavaScript mostly solve this problem as they require self.field or this.field which forces the issue. Of course even in those language you need to distinguish global variables, class variables, privately scoped variables, etc. Choose a naming convention that makes things clear and helps you prevent errors.

Functional Style

A few minor functional style preferences.

Google's style guide lets you use pre−increment or post−increment for simple types. I prefer pre−increment always when it's appropriate. If it's not clear pre−increment

   ++index;

means "increment index" where as post increment

   index++;

means "<code>temp = index; ++index; return temp;". If your iterator is complex that's important since making a temporary copy of the iterator might actually do something. For a simple integer the compiler will of course throw away the temp part but from my point of view it's better to say what you mean. If you mean increment write "<code>++index". If you mean "<code>temp = index; ++index; return index;" then write "<code>index++".

Use Unique Names

I'm really surprised when I come on a new project and people have chosen global variables or more often preprocessor defines with very simple names. Example DEBUG, OS_WIN, WIN, LINUX, MAC, OSX, LOGGING, ENABLE_LOGGING, etc.. The problem with using these kinds of names is you eventually want to use some third party code and they also picked similar names and you end up having conflicts. Choosing names that are unlikely to ever conflict like MYPROJECTNAME_PLATFORM_WIN or MYPROJECTNAME_ENABLE_LOGGING etc..., means you won't run into that problem.

Don't make single line functions

Some style guides, including Google's, allow 1 line functions as in

class Foo 
{
 public:
   int get_width() const { return m_width; }
   void set_width(int width) { m_width = width; }
};

The problem with these kinds of lines is they are often un−debuggable. The debugger won't stop on the line or if it does it's not possible to inspect 'this' or local fields or arguments. Where as this

class Foo 
{
 public:
   int get_width() const
   { 
     return m_width; 
   }
   void set_width(int width) 
   { 
     m_width = width; 
   }
};

Has no such problem. I've run into this issue on gdb, cgdb and VC++ so it's no specific to one platform's debugger.

I'd even go so far to say that I wouldn't be against a style guide that required values to be calculated outside calls. For example.

   ...
   drawCircle(x, y, radius * scaleFactor * fudge);
   ...

vs

   ...
   float effectiveRadius = radius * scaleFactor * fudge;
   drawCircle(x, y, effectiveRadius);
   ...

I've run into several situations where I needed to change the code from the first example into the second just so I could debug. I'm sure you're thinking I could step into 'drawCircle' to find out what the calculation is but that's not always the case. Often you can't debug into certain functions easily. If this is JavaScript in a browser and we're calling 'context2d.arc(...)' there is no stepping into 'context2d.arc()' as it's a native function. Similarly in C/C++ calling system functions or libstdc functions might not be easily debugged. There's no efficiency issue as the compiler will or at least should generate the optimal code when optimization is turned on but following this kind of style it will make debugging much easier.

Conclusion

The funny thing is, I've been at my present employer for over 4 years now and so I've had to follow their style for 4 years. That's made it so it's hard for me to write the style I prefer. Even in personal code I'm so used to doing it their way that I just continue to do it. That doesn't mean it doesn't still bother me though. Anytime I know something would be more readable if I could align it like a table or anytime I rename some method and then have to manually walk through all the files and changes and fix their formatting it can be frustrating. We also have to torture new hires as we beat them into style submission from whatever their previous style was. They run into similar issues where the style guide is arbitrary and not well thought out.

sigh...

If you've got some particular style and objective reasons for them I'd love to hear about them.

Comments
WebGL 3D - Part 1, Orthographic 3D
WebGL Boilerplate