Document and comment everything!

More often than not, I come across code that has absolutely no documentation. Thankfully, most of the time I have access to the source code and can figure out what is going on on my own. This, however, is a waste of my time and resources: time that could be spent using the code rather than figuring out how to use it.

Programmers are lazy; we have a lot of work and little time to do it in. More often than not, documentation is not a priority and will not make the cut before the software is due. As a result, maintaining your own software or someone else’s code becomes a burden rather than a pleasure.

Imagine for a moment that you have spent the last three months working on a piece of software that will serve 100 people in a company. The software meets the requirements put forth by the customer and, as it turns out, they are very happy with the results. Three more months go by and the customer comes back to you because something needs to be changed or, even worse, upgraded. This is a common scenario for programmers who work with clients, and it leads to one of two outcomes:

  • You have to go and build on top of the code you have already written to make the changes or add new features.
  • Someone else has to do the job for you; someone who does not completely understand how your code works and yet must go in and make changes.

In either scenario, you are working with code that has to be reused. If you are the one in charge of making the changes, there is a very good chance you do not remember exactly how to use the libraries and code structures you wrote. If someone else has to work with your code, they have no idea how it can be used to create new functionality; even worse, they have no idea how it works internally should something go wrong.

In my experience, many programmers don’t want to take the time to run through their code to add comments or write documentation. I can relate to that; I want to get stuff done, and stopping to comment anything is, well… time-consuming. But no matter how much we flinch at the idea of commenting code and writing the appropriate documentation, it is truly a necessary bother.

It is safe to say that, most of the time, we can determine what a function or a class is responsible for from its signature. For instance, a function that obtains an employee might look something like this:

        public Employee getEmployee(string ID) //String IDs are common, since many databases store them with leading zeros

Looking at this, we can safely say that this function provides an employee object based on the employee’s ID. Despite the simplicity of the example, I can already outline a potential problem that consulting the documentation would solve. Sticking with simple examples, let’s imagine that the documentation for this function is as follows:

getEmployee is an obsolete function used to obtain employees who carry the NNNNN ID number format. All employees currently have both a new and an old ID number. Because many of our applications still need to be updated to the new ID number system, this function will not be removed until all our systems are upgraded. For the new system, please use getEmployeeSomeNewSystem(string ID) instead.
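Documentation like this does not have to live only on a wiki; it can sit right on the method. Below is a minimal sketch in Java, whose Javadoc comments and @Deprecated annotation fill roughly the same role as C#’s XML comments and [Obsolete] attribute. The EmployeeDirectory class, its add helper, and the sample IDs are hypothetical; only the two method names come from the example above.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal employee record used only for this sketch. */
class Employee {
    final String id;
    final String name;

    Employee(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** Hypothetical lookup class; the ID formats and data are placeholders. */
class EmployeeDirectory {
    // Old-format IDs (NNNNN, kept as strings so leading zeros survive) and new-system IDs.
    private final Map<String, Employee> oldIds = new HashMap<>();
    private final Map<String, Employee> newIds = new HashMap<>();

    /** Registers one employee under both the old and the new ID. */
    void add(String oldId, String newId, String name) {
        Employee e = new Employee(newId, name);
        oldIds.put(oldId, e);
        newIds.put(newId, e);
    }

    /**
     * Returns the employee with the given old-format ID (NNNNN).
     *
     * @param id an old-system ID such as "00042"
     * @return the matching employee, or {@code null} if none exists
     * @deprecated Kept only until every application moves to the new ID system.
     *             Use {@link #getEmployeeSomeNewSystem(String)} instead.
     */
    @Deprecated
    Employee getEmployee(String id) {
        return oldIds.get(id);
    }

    /**
     * Returns the employee with the given new-system ID.
     *
     * @param id a new-system ID
     * @return the matching employee, or {@code null} if none exists
     */
    Employee getEmployeeSomeNewSystem(String id) {
        return newIds.get(id);
    }
}
```

With the annotation in place, IDEs typically show getEmployee struck through, and javac reports uses of it from other classes as deprecated API usage, so the note reaches people even if they never open the wiki.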

This can happen when a company adopts a new software package, for instance PeopleSoft, to manage back-end infrastructure such as employee records. If your old system is running out of ID numbers, you may be implementing a new ID number system while slowly porting old applications over to it.

In the above example, you would probably know about this detail fairly well. However, as software becomes more and more complex, it becomes difficult to keep track of these kinds of things.

Simple examples aside, it is important to point out that everyone programs differently. Despite coding standards, each programmer will still structure code differently, name things differently and approach coding problems differently. For instance, in video games memory allocation commonly becomes a serious consideration; after all, you are running your code 25 to 60 times a second (depending on frame rate). A common way to curb the amount of memory a game allocates is to use an object pooling mechanism. When an object reaches the end of its life cycle, for instance a Vector3, instead of disposing of it you push it onto an object stack. When a new Vector3 is needed and there are objects on that stack, you simply pop one off and reuse it. This saves you from creating a new copy (the implications of doing this are a topic for a different post). Though there are libraries that do this already, I always like to write my own class for it. It avoids the overhead that large libraries carry and keeps the class down to a few functions. Pooling, however, is not a familiar approach to every programmer. Some might have to figure out what

//Pool has been instantiated somewhere else
SomeObject myObject = pool.resurect<SomeObject>();

does, and why I am not just writing

SomeObject myObject = new SomeObject();

In this case I would provide documentation and examples for the Pool class that takes care of managing object allocation. A programmer may or may not have seen this before, but either way, s/he has to figure out whether this approach is worth using. We often package our libraries as DLLs, so it’s quite frustrating to sift through some other solution to figure out what the hell this does. Documentation, however, makes this clear, straightforward and easily accessible.
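To make the idea concrete, here is a minimal sketch of such a pool in Java (close enough to C# that it should read almost one to one). It is an illustration under stated assumptions, not the Pool class referred to above: the names Pool, resurrect, recycle and available are hypothetical, and this version makes the whole pool generic over one type instead of using a generic method on each call.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/**
 * Keeps retired objects on a stack so they can be reused instead of reallocated.
 * Not thread-safe; callers must reset any state on a reused object themselves.
 */
class Pool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    /** @param factory creates a fresh object when the pool is empty */
    Pool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Pops a retired object if one is available, otherwise allocates a new one. */
    T resurrect() {
        T obj = free.poll(); // null when the stack is empty
        return (obj != null) ? obj : factory.get();
    }

    /** Pushes an object that has reached the end of its life back onto the stack. */
    void recycle(T obj) {
        free.push(obj);
    }

    /** Number of objects currently waiting to be reused. */
    int available() {
        return free.size();
    }
}

/** Mutable 3D vector used to exercise the pool. */
class Vector3 {
    float x, y, z;

    Vector3 set(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
        return this;
    }
}
```

A typical use is `Vector3 v = pool.resurrect().set(1, 2, 3);` followed later by `pool.recycle(v);`. Because the sketch never clears a recycled object, the caller reinitializes it (here through set) every time it is resurrected.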

Ok cool, but how do I approach this “documentation” stuff?

Everyone has their own approach to documentation; there is no golden rule. To me, “good documentation” is documentation that makes it clear why a class or function exists and how to use it properly. But there are different types of documentation to consider. For instance, I consider comments extremely important, but they are not “documentation”. The line between the two can blur, however. Consider the following example:


/// <summary>
/// Use these as the basis for your transformations. It is tempting to allocate a new vector each time, but it is very costly,
/// especially if you run your game on a mobile device.
/// </summary>
static class Vectors
{
    [..STUFF..]
}

In C# and other .NET languages, you can create what I like to call “internal documentation”. This is not the same as a comment, because the description travels with the class: when I use the Vectors class, IntelliSense displays the <summary> text. That matters here, because the class name “Vectors” can mean a few things. One might think the class processes vector transformations, when it actually just holds transformation vectors such as [0 1] for common use, so that I don’t have to create a new one for each computation. Java offers the same thing through Javadoc comments, which IDEs such as Eclipse and IntelliJ IDEA display in the same way.
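For comparison, here is roughly what the same kind of class could look like in Java, where a /** ... */ Javadoc comment plays the part of the <summary> block. This is a sketch: the Vec2 type and the RIGHT, UP and ZERO constants are assumptions, since the contents of the original Vectors class are elided.

```java
/**
 * Shared, read-only vectors to use as the basis for transformations.
 * Reusing these avoids allocating a new vector for every computation,
 * which is especially costly on mobile devices.
 */
final class Vectors {
    /** Unit vector along the x axis: [1 0]. */
    static final Vec2 RIGHT = new Vec2(1, 0);
    /** Unit vector along the y axis: [0 1]. */
    static final Vec2 UP = new Vec2(0, 1);
    /** The zero vector: [0 0]. */
    static final Vec2 ZERO = new Vec2(0, 0);

    private Vectors() {} // holder class; never instantiated
}

/** Immutable 2D vector, so shared instances cannot be modified by accident. */
final class Vec2 {
    final float x, y;

    Vec2(float x, float y) {
        this.x = x;
        this.y = y;
    }

    /** Dot product; returns a float, so no new vector is allocated. */
    float dot(Vec2 o) {
        return x * o.x + y * o.y;
    }
}
```

Making Vec2 immutable is a deliberate choice in this sketch: every caller shares the same instances, so a mutable shared vector could be changed by one piece of code and silently break another.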

The other kind of documentation is written-out documentation with examples. It is not part of the code and normally lives in some kind of shared resource; a common place to store it is a wiki restricted to the development team. It does not have to stay internal, though. Almost every class in Microsoft’s frameworks is documented with examples, sometimes in multiple programming languages, and this is probably the best resource a programmer can hand to another. If you carry a library from one project to another without any documentation, you have to dig into the solution of a project that uses it and figure out how it was implemented there. This is a huge waste of time. A standalone example, written outside any particular solution, can show how to use a class in general. If you instead look at how someone else used it for their own specific purpose, you’re going to be stuck with the question:

Well that’s great, but how do I use that class or function in general?

So, how do I approach this “commenting” stuff?

My rule of thumb: leave short but precise comments everywhere you can, but don’t tip the boat over with them. Make sure they are constructive.

As a programmer, I don’t need to be told that you created a list of employees, unless there was some kind of serious decision behind it. Take this, for instance:

     //Create a list of employees
     List<Employee> employees = new List<Employee>();

Cool, but I am not learning how to code. Reading the code already tells me this is a list of employees. However, if you do something like this:


       //This list is used for all employees, those pulled from the old system and the new system.
       List<Employee> employees = new List<Employee>();

This is A LOT more useful, because now I know why you created a new list reference and initialized it with an empty list: you are going to load it with employees from both systems, rather than re-point the reference to a list returned from somewhere else.
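Here is a slightly longer sketch in Java of comments that explain why rather than what. The fetchOldSystem and fetchNewSystem methods are hypothetical stand-ins for the two employee systems, and plain string IDs replace the Employee class to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

class EmployeeLoader {
    // Hypothetical sources standing in for the old and new employee systems.
    static List<String> fetchOldSystem() { return List.of("00042", "00077"); }
    static List<String> fetchNewSystem() { return List.of("E-1042"); }

    static List<String> loadAllEmployeeIds() {
        // This list holds employees from BOTH systems, so it is created empty
        // and filled below rather than pointed at either system's result.
        List<String> employees = new ArrayList<>();

        // Old-system IDs keep their leading zeros, which is why IDs stay strings.
        employees.addAll(fetchOldSystem());
        employees.addAll(fetchNewSystem());
        return employees;
    }
}
```

Neither comment restates what the line does; each records a decision a reader could not recover from the code alone.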

As I have mentioned before, comment as much as you can. Programmers can always read code well enough to sift through it fairly quickly, but comments tell us why you built something a certain way. When there is a bug, I can quickly follow your comments and figure out where you went wrong.

To wrap things up

I think it’s about time I start wrapping this blog post up. I hope I have made a pretty good point: take the time to document. It will pay off in the end. If you work in a company that makes software, there is a pretty good chance you will be passing on your code to someone else. While you’re still around, you might get away with “Steve, how do I use this function properly?”. But as your code base grows and more and more people start using it, you can no longer answer everyone’s questions. You are now responsible for the efficiency of those around you, and the quicker they can reuse your code, the better. One day you will not be there anymore and, well, you might not care, but some programmers do. After all, it is YOUR work that they are using in their software. If you ever find yourself relying on documentation left by someone who stopped working there long ago, you will appreciate it.

 

Early Programmers: Initializing an empty object to be replaced

This is my first programming post, and I thought I would start with something I see all the time from programmers who are still gaining experience. It is a mistake I used to make as well, and I believe it stems from a misunderstanding of how variables and references relate to one another. Thankfully, .NET languages (including those that compile down to CIL, the Common Intermediate Language) reclaim unused objects automatically through garbage collection. Without it, this mistake would be a pretty serious memory leak.

Let’s look at a common example. We are using C# here, but the Java version would look almost identical. Say we have a simple class:

class SomeObject
{
     string someString = "";
     int someInt = 0;
}

which we use as follows:

public void main() //Let's assume this is our entry point
{
      SomeObject obj = new SomeObject();

      //SomeBLL is an arbitrary static class that gets your data from the database or some other source and creates a new object to hold that data

      obj = SomeBLL.fetchObjectData();
}

What happens here is that a new reference is created and pointed at a new object, only to be re-pointed at another object right away. The object created by

     SomeObject obj = new SomeObject();

is immediately replaced by the object that SomeBLL.fetchObjectData() returns. As mentioned before, C# has a garbage collector that reclaims objects nothing refers to anymore, so strictly speaking this is not a memory leak. It is still waste, though: an object is allocated and initialized, never used, and left for the garbage collector to clean up. For those who are just starting to program in C# (and Java), there are a few things that need to be outlined:

  • “obj”, as I have named it above, is a reference to an object in memory. It is not the object itself; it points to the location in memory where the object lives.
  • If you assign to the same variable again, as in obj = SomeBLL.fetchObjectData(); above, “obj” now points to a different location in memory, holding a completely different object.
  • When you re-point your reference, the old object stays in memory, taking up space, until the garbage collector reclaims it.
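The following Java sketch (the C# version would look almost the same) makes these points observable. It adds a hypothetical allocation counter to SomeObject and a stand-in SomeBLL, so you can see that the pattern above builds two objects to produce one result, while fetching directly builds only one. Once obj is re-pointed, nothing refers to the empty object anymore, so the garbage collector is free to reclaim it; the cost is the wasted allocation and collection work.

```java
class SomeObject {
    static int constructed = 0; // counts allocations, for this demo only

    String someString = "";
    int someInt = 0;

    SomeObject() {
        constructed++;
    }
}

/** Hypothetical stand-in for the post's SomeBLL data source. */
class SomeBLL {
    static SomeObject fetchObjectData() {
        SomeObject loaded = new SomeObject();
        loaded.someString = "from the database";
        loaded.someInt = 42;
        return loaded;
    }
}

class ReferenceDemo {
    /** Two variables, one object: writing through one is visible through the other. */
    static boolean sharedThroughReferences() {
        SomeObject a = SomeBLL.fetchObjectData();
        SomeObject b = a;          // copies the reference, not the object
        b.someInt = 7;
        return a.someInt == 7;     // true: a and b point at the same object
    }

    /** The pattern from the post: allocate an empty object, then re-point the variable. */
    static int allocationsWasteful() {
        SomeObject.constructed = 0;
        SomeObject obj = new SomeObject();   // empty object, never read
        obj = SomeBLL.fetchObjectData();     // obj now refers to a different object
        return SomeObject.constructed;       // two objects were built for one result
    }

    /** Fetching directly: only the data source allocates. */
    static int allocationsDirect() {
        SomeObject.constructed = 0;
        SomeObject obj = SomeBLL.fetchObjectData();
        return SomeObject.constructed;
    }
}
```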

So what is the solution to this? It’s very simple: don’t create a new object before re-pointing your reference somewhere else. To correct the code above, we can do the following:

public void main() //Let's assume this is our entry point
{
      SomeObject obj = SomeBLL.fetchObjectData();
}

However, it is often not quite so simple. For instance, what if I need to decide where my data comes from? I get the same object type back from either source, but each one loads the object with different data. Consider the following code:

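//Using an int here might not be most preferred (use an enum instead), but it will suffice
public SomeObject getMyData(int dataType)
{
      SomeObject obj;
      switch (dataType)
      {
           case 0:
                obj = SomeBLL.getDataA();
                break; //C# does not allow falling through into the next case
           case 1:
                obj = SomeBLL.getDataB();
                break;
           default:
                //obj must be assigned on every path; an unknown dataType yields null, not an empty object
                obj = null;
                break;
      }
      return obj;
}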

With the code above, it is very tempting to initialize the “obj” reference with a default value, especially since C# will not let you return a local variable that might be unassigned. However, if you initialize it with a blank object, you will get back deceptively valid-looking data even when you pass 3 into your function. If the function is used wrong, you’re better off with your code returning null. Otherwise, an empty object will eventually surface somewhere down the road, and it’s much harder to track down where it came from.
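Following the comment in the code above, here is a Java sketch that uses an enum instead of an int. It is a slightly stricter variant of the advice: the default branch throws an exception rather than returning null, so misuse fails immediately at the call site. Returning null from the default branch instead also fits the point made above. DataSource and DataFetcher are hypothetical names, and SomeBLL’s methods are stubs.

```java
/** Where the data should come from; replaces the magic int 0/1. */
enum DataSource { A, B }

class SomeObject {
    String someString = "";
    int someInt = 0;
}

/** Hypothetical data layer standing in for SomeBLL. */
class SomeBLL {
    static SomeObject getDataA() {
        SomeObject o = new SomeObject();
        o.someString = "A";
        return o;
    }

    static SomeObject getDataB() {
        SomeObject o = new SomeObject();
        o.someString = "B";
        return o;
    }
}

class DataFetcher {
    static SomeObject getMyData(DataSource source) {
        // No placeholder object: every path returns real data or fails loudly.
        // A null source throws NullPointerException from the switch itself.
        switch (source) {
            case A:
                return SomeBLL.getDataA();
            case B:
                return SomeBLL.getDataB();
            default:
                // Reached only if a constant is added to DataSource without updating this switch.
                throw new IllegalArgumentException("Unknown data source: " + source);
        }
    }
}
```

With an enum, the caller can no longer pass 3, so the only way to reach the default branch is to extend DataSource and forget this method, which is exactly the mistake you want to hear about early.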

Regardless, this is the approach you want to follow. With today’s large memory sizes and a garbage collector to lean on, I have seen this mistake even from fairly seasoned programmers. For some it might be a tough habit to break, but we must never forget that we should not pollute the heap with empty objects; it is simply bad programming.