What if good code colorization came before naming standards?

Related to this post on time wasted because of naming standards, I just ran into this 2018 talk about tree-sitter, a fast language parser for code colorization written by Max Brunsfeld at GitHub.

It seems pretty clear that something like this had to be developed later than naming standards, but it's at least interesting to imagine: what if it came first? Would we even need naming standards?

In the talk they point out that most editors use a complex set of regular expressions to guess how to color things.

Here's a pretty typical example:

class Foo {
 public:
  static Foo* create(int bar);
  int getBar();

 private:
  explicit Foo(int bar);
  int bar;
};

Foo::Foo(int bar) : bar(bar) {};

Foo* Foo::create(int bar) {
  return new Foo(bar);
}

int Foo::getBar() {
  return bar;
}

What to notice:

Foo is only green in 2 places; the other 7 are not colorized.
There are 2 kinds of bar: bar as the parameter to create and Foo::Foo, and bar as a member of an instance of Foo.

This is because most colorizers have no actual knowledge of the language. They just have a list of known keywords and some regular expressions to guess at what is a type, a function, a string, or a comment.
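To make that concrete, here's a hypothetical sketch (in C++, like the example) of the kind of keyword-plus-regex rules such a colorizer might use. The rules, the `Color` enum, and `colorizeLine` are made up for illustration and aren't taken from any real editor; they just show that a colorizer looking at one line at a time, with no idea where anything was declared, has very little to go on.

```cpp
#include <cstddef>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical colors a simple colorizer might assign.
enum class Color { Keyword, Type, Function, Plain };

struct Token {
  std::string text;
  Color color;
};

// Colors one line using made-up heuristics, with no knowledge of scopes
// or declarations:
//   1. a word in a fixed keyword list is a keyword
//   2. a word right after "class" is a type
//   3. a word immediately followed by '(' is a function
//   4. anything else is left plain
std::vector<Token> colorizeLine(const std::string& line) {
  static const std::set<std::string> kKeywords = {
      "class", "public", "private", "static", "explicit", "int", "return", "new"};
  static const std::regex kWord(R"([A-Za-z_]\w*)");

  std::vector<Token> tokens;
  std::string previous;  // the word before this one, used by rule 2
  for (auto it = std::sregex_iterator(line.begin(), line.end(), kWord);
       it != std::sregex_iterator(); ++it) {
    const std::string word = it->str();
    const auto end = static_cast<std::size_t>(it->position() + it->length());
    Color color = Color::Plain;
    if (kKeywords.count(word) > 0) {
      color = Color::Keyword;
    } else if (previous == "class") {
      color = Color::Type;
    } else if (end < line.size() && line[end] == '(') {
      color = Color::Function;
    }
    tokens.push_back({word, color});
    previous = word;
  }
  return tokens;
}
```

Fed `  return new Foo(bar);`, these rules color `Foo` as a function simply because a `(` follows it, and that `bar` gets exactly the same color as the `bar` in `  int bar;`, even though one is a parameter and the other is a member.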

What if the colorizer actually understood the language?

For example:

class Foo {
 public:
  static Foo* create(int bar);
  int getBar();

 private:
  explicit Foo(int bar);
  int bar;
};

Foo::Foo(int bar) : bar(bar) {};

Foo* Foo::create(int bar) {
  return new Foo(bar);
}

int Foo::getBar() {
  return bar;
}

What to notice: every type is green, every function is yellow, and every bar is red when it's a member of the class. This means we don't need to name it _bar, mBar, or bar_ as many style guides suggest, because the editor knows what it is and shows us by color.
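Roughly speaking, a colorizer that really parses the code can do what the compiler does: look each name up through the scopes where it could have been declared. Here's a minimal sketch, assuming a parser has already collected the declarations into a tiny symbol table. `Scope`, `Role`, and `classify` are hypothetical names, and real C++ lookup has many more rules (inheritance, nested blocks, using-declarations) that this ignores.

```cpp
#include <set>
#include <string>

// Hypothetical roles a parsing colorizer could map to colors.
enum class Role { Parameter, Member, Type, Unknown };

// A tiny, hypothetical symbol table for one member-function body.
// A real parser would fill this in from the declarations it has seen.
struct Scope {
  std::set<std::string> parameters;  // declared by the function itself
  std::set<std::string> members;     // data members of the enclosing class
  std::set<std::string> types;       // type names visible here
};

// Resolves an unqualified name in roughly the order C++ searches inside a
// member function: the function's own scope first, then the class, then
// the surrounding scope. Inheritance, nested blocks, etc. are ignored.
Role classify(const std::string& name, const Scope& scope) {
  if (scope.parameters.count(name) > 0) return Role::Parameter;
  if (scope.members.count(name) > 0) return Role::Member;
  if (scope.types.count(name) > 0) return Role::Type;
  return Role::Unknown;
}
```

For `Foo::create`, `bar` resolves to the parameter because the parameter hides the member; for `Foo::getBar`, which has no parameters, the same lookup lands on the member, so that's the `bar` that would get the member color.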

We could also distinguish between member functions and global functions:

void Foo::someMethod() {
  doThis();  // is this a member function or a global function?
  doThat();  // is this a member function or a global function?
}
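With the class's declarations in hand, answering that question becomes a lookup rather than a guess. Another hypothetical sketch, assuming the parser has recorded which functions each class declares and which free functions exist (`Declarations`, `CallKind`, and `classifyCall` are made-up names): an unqualified call inside `Foo::someMethod` is checked against `Foo`'s members first and only then against free functions, which is the order C++ itself searches. Base classes and namespaces are ignored to keep it short.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical kinds of call a parsing colorizer could color differently.
enum class CallKind { MemberFunction, FreeFunction, Unknown };

// Hypothetical declarations a parser would have collected from headers.
struct Declarations {
  std::map<std::string, std::set<std::string>> methodsByClass;  // class -> its member functions
  std::set<std::string> freeFunctions;                          // non-member functions
};

// Classifies an unqualified call `callee` made inside the definition
// `definedIn` (e.g. "Foo::someMethod"). The class's members are searched
// before free functions; base classes and namespaces are ignored.
CallKind classifyCall(const std::string& definedIn, const std::string& callee,
                      const Declarations& decls) {
  const auto sep = definedIn.rfind("::");
  if (sep != std::string::npos) {
    const std::string className = definedIn.substr(0, sep);
    const auto it = decls.methodsByClass.find(className);
    if (it != decls.methodsByClass.end() && it->second.count(callee) > 0) {
      return CallKind::MemberFunction;
    }
  }
  if (decls.freeFunctions.count(callee) > 0) return CallKind::FreeFunction;
  return CallKind::Unknown;
}
```

With `Foo` declaring `doThis` and `doThat` being a free function, `doThis()` comes back as a member call and `doThat()` as a global one, so an editor could give them different colors without relying on a naming convention.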

Some of these issues go away through language design. In Python and JavaScript, member functions and properties both have to be accessed through self / this, so yes, there are other solutions besides coloring and naming conventions to make code more understandable at a glance.

I haven't used tree-sitter directly (apparently GitHub uses it for colorization, though). I just found it interesting that a language-parsing colorizer could make code more readable and distinguish between the things naming conventions are often used for. I get that color isn't available everywhere, so it may not be a complete solution, but it's still fun to think about other ways we could make code easier to grok.

PS: The coloring above was done by hand, not with tree-sitter.

games.greggman.com

2022-06-04