Reading Source Code

When I started working at Smallworld, which used its own programming language called Magik, I was aghast that there wasn't any documentation for the class library. And it was a large class library – think of it as comparable to the Java (or Python or Ruby or whatever) library. Having worked in Delphi (yeah, dating myself), where every method of every class was documented, I couldn't believe it. But getting outraged didn't do much to solve my problem – I was a newly hired developer expected to deliver working code built on top of an undocumented class library. But I did have the source code. And coworkers who could point me in the right direction. Looking back, that might have been one of the greatest gifts I received as a young developer: learning how to navigate large code bases.

To be a great developer, you need to be great at reading source code. Source code is the truth of a program. To fix a bug, you need to find it in the source code. To add new functionality, you need to know where in the source code to add it. To interface with a library, you have to understand what the library is doing behind the scenes (because the docs are incomplete or wrong, or you didn't understand them). Most of the time no one is going to tell you where to look. Even the original developers may no longer know the answer – many times I have looked at code I wrote just six months earlier and had to reconstruct what I was trying to accomplish.

But reading source code is hard. The first problem is that most of the time the core logic is overwhelmed by support code: code that validates inputs, casts data from one type to another, verifies various environmental or business constraints, deals with corner cases, etc. It is all important, but it is noise that hides the intent of the code. And that's just looking at a single method.
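To make the ratio concrete, here is a small Python sketch of a hypothetical `apply_discount` function (the name and the business rules are invented for illustration). Only one line is the actual intent; everything else is the kind of support code described above, and it is exactly what you learn to skim past when reading.

```python
from decimal import Decimal, InvalidOperation


def apply_discount(price, percent):
    """Return price reduced by percent (hypothetical example).

    Only the line marked CORE expresses the intent; the rest is support code.
    """
    # Support: validate inputs
    if price is None or percent is None:
        raise ValueError("price and percent are required")

    # Support: cast strings/floats to Decimal so money math is exact
    try:
        price = Decimal(str(price))
        percent = Decimal(str(percent))
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {exc}") from exc

    # Support: enforce (hypothetical) business constraints
    if price < 0:
        raise ValueError("price cannot be negative")
    if not (0 <= percent <= 100):
        raise ValueError("percent must be between 0 and 100")

    # Support: corner case, nothing to discount
    if price == 0 or percent == 0:
        return price

    # CORE: the actual intent of the function
    discounted = price * (1 - percent / 100)

    # Support: round to cents for storage/display
    return discounted.quantize(Decimal("0.01"))
```

Reading this, the useful skill is to jump straight to `price * (1 - percent / 100)` and only come back to the guards if the bug turns out to live there.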

The second problem is scale. Large projects built over many years accrete a *lot* of code. It's not possible to read 20,000 lines of code and understand what it's doing in a reasonable amount of time – let alone hundreds of thousands or millions of lines. Once a code base reaches a certain size, no one fully understands it, not even the authors.

The third problem is that if you aren't familiar with a code base, you don't have a mental model of the system architecture. Without the big picture, you can't understand how the program is broken down or how the pieces fit together.

So how do you do it? You practice it over and over and over. You look for patterns. You think about how programs are written and use that to your advantage. You know that the only way to deal with complexity is to break it down into smaller pieces – so how did the developers of the program break it down? How did they organize the code? There are clues all over: random incomplete architecture documents, the names of source code directories, the names of source files, etc. You also learn to recognize the support code discussed above so you can skip it.

Sometimes you are lucky: you find what you are looking for, read the code, understand it, and move on. Most of the time you aren't. At that point, use your debugger! I'm always shocked by how few developers use debuggers. In fact, I think I can count on my fingers the number of developers I've met who regularly use them. To me they are invaluable: stepping through code shows you the call stack of how a method is reached, how variables and fields are modified by each line, how conditional statements are evaluated, etc.

How you use the debugger depends on what you are trying to accomplish. If you need to fix a bug, you need a beachhead: some part of the code path that is executed by the action that triggers the bug. Until you get that, you are wandering in the dark. Use your experience to find the beachhead. If you know a calculation is wrong, grep the code for likely names of the methods or classes that do the calculation. If you find a class, put a breakpoint in its constructor. If it's a method, put a breakpoint in the method. The next time you reproduce the bug, you'll know whether you guessed correctly.
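Plain `grep -rn` does the searching part of this job. The Python sketch below (the function name `find_candidates` and the `.py`-only filter are assumptions for illustration) narrows the results to class and method definitions, since those are where a beachhead breakpoint goes. Each hit comes back as a path and line number, which is the `filename:lineno` form that Python's built-in debugger, pdb, accepts in its `break` command. Alternatively, you can temporarily add a `breakpoint()` call (Python 3.7+) as the first line of the constructor you found.

```python
import re
from pathlib import Path


def find_candidates(root, fragment, suffixes=(".py",)):
    """Find class/def lines whose name contains `fragment` (case-insensitive).

    Returns a list of (path, line_no, stripped_line) tuples.
    """
    # Only match definition lines: those are where a breakpoint can go.
    # re.escape so the fragment is matched literally, not as a regex.
    definition = re.compile(
        r"^\s*(?:class|def)\s+\w*" + re.escape(fragment) + r"\w*",
        re.IGNORECASE,
    )
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and non-source files.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            for line_no, line in enumerate(f, start=1):
                if definition.match(line):
                    hits.append((str(path), line_no, line.strip()))
    return hits


# Hypothetical usage: print hits as path:line, ready for pdb's `break` command.
# for path, line_no, text in find_candidates("src", "tax"):
#     print(f"{path}:{line_no}  {text}")
```

If the bug report says "the tax is wrong", searching for `tax` and breaking on each hit quickly tells you which of them actually runs when the bug is reproduced.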

Much of the same logic applies if you are trying to understand how a piece of code works. If reading it isn’t enough, and you need to step through it with live data to understand it, you’ll still need your beachhead.

Once you have the beachhead, you start to build your mental model of how a piece of code works. First you understand the local code. Then you work through the inputs – what are they, and how are they calculated? And what about the outputs? Little by little you understand more and more. At the same time, take a step back and think. What is this code doing, and why? Why is it structured like this? Why is it in this folder? Build theories in your mind, and then test those theories.
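When a theory is about what a function receives and returns across many calls, stepping through each call can get tedious. One option, sketched here in Python as a complement to the debugger rather than a replacement, is a small logging decorator. The decorator name `trace` and the logger name are hypothetical.

```python
import functools
import logging

log = logging.getLogger("trace")


def trace(func):
    """Log each call's inputs and output so you can check a theory about func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("-> %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("<- %s returned %r", func.__qualname__, result)
        return result
    return wrapper


# Hypothetical usage: temporarily wrap the function under investigation,
# e.g. billing.compute_total = trace(billing.compute_total), then call
# logging.basicConfig(level=logging.DEBUG) to see the messages.
```

Because the wrapper only observes, removing it afterwards leaves the program's behavior unchanged, which makes it a cheap way to confirm or discard a theory with live data.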

With persistence, sometimes a whole lot of persistence, eventually you’ll get it. You’ll understand what you need to fix the bug you are looking for, or understand the code you are trying to reuse, or figure out how some ancient crufty part of a program can be modernized and improved.

CFIS