2011-10-31

You're (Probably) Documenting That Wrong

Programmers Don't Like to Document Code

Code documentation is one of those tasks that software developers like to slack on. It's not the documentation that's important, it is the code that is important! Lots of developers either don't add documentation blocks or fill them in with only the basic amount of information. This post will look to address bad software documentation habits and how they can be improved.

Why Should We Document Code?

I've heard lots of arguments as to why writing code documentation is a waste of time. It takes too much time to write! The code is self-documenting! It will just be out of date by the next release, so why bother! Sure, documentation is time-consuming. Agreed, well-written code can be partially self-documenting. Yes, if you don't have good habits in place the documentation will be out of date.

A lot of the arguments are perfectly valid if you are documenting the wrong things. Good code documentation provides a clear understanding of the contract that code will adhere to. Good code documentation will alert consumers of the code of the "gotchas" that can crop up from using it. Good code documentation will serve as a reminder for why something was done or not done.

What Should You Be Documenting Internally?

By internally I mean inside functions. This refers to code that is not part of an interface, but in the lowest level blocks of code that only developers on the same project will ever be able to see.

Internal Code that Is Hard to Understand

First, you don't need to be documenting each and every line of code in a function with what it does. If the code is not clear enough, it should be rewritten. One of the "code smells" from Martin Fowler's book Refactoring: Improving the Design of Existing Code is too much inline documentation. If you think you need to write a lengthy treatise, first consider renaming some variables for clarity or extracting the block into a well-named method with its own documentation. That's not to say that there aren't cases where you need internal developer documentation. If the code is difficult to understand because it is a complex formula or a particularly tricky regular expression, then by all means throw some comments above it. Just don't do this:

// Adds one and one together and saves the result in a variable
int two = 1 + 1;
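For contrast, here is a minimal sketch of the extraction suggested above, where well-named methods and constants replace the comment entirely. The class name, method names, and thresholds are hypothetical and chosen only for illustration.

```java
class OrderRules {
    // Hypothetical thresholds, chosen only for illustration.
    private static final int MINIMUM_AGE_YEARS = 18;
    private static final int MAXIMUM_OPEN_ORDERS = 5;

    // Before extraction, the condition needed a comment to explain it:
    //   if (age >= 18 && open < 5) { ... }
    // After extraction, the method names carry the explanation.
    static boolean canPlaceOrder(int ageYears, int openOrders) {
        return isAdult(ageYears) && hasOrderCapacity(openOrders);
    }

    private static boolean isAdult(int ageYears) {
        return ageYears >= MINIMUM_AGE_YEARS;
    }

    private static boolean hasOrderCapacity(int openOrders) {
        return openOrders < MAXIMUM_OPEN_ORDERS;
    }
}
```

The magic numbers now have names, and a reader of canPlaceOrder() no longer needs an inline comment to know what is being checked.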

The "Why" and not the "How"


So we've covered not documenting how something is done unless absolutely necessary, but sometimes comments are still needed to explain why something is being done. Often far more important than how you did something is why you did it, especially if it is not immediately clear to someone unfamiliar with the code. Sometimes comments are needed because the code looks funny, or relies on a side effect elsewhere that most readers of the code wouldn't know about. Comments like the following can be extremely useful for other team members (or yourself weeks later when you forget why you wrote the code that way):

// Subtraction must be performed first to prevent an 
// off-by-one error

// The list is guaranteed to be pre-sorted as a side 
// effect of validation. We do not need to sort the 
// list again here.

// We are adding the initial database entry here 
// because we need the unique ID for <reason> before 
// we will have status. The actual database entry will
// be updated after processing with the actual status.

What Should You Be Documenting Externally?

Far more important than what you document internally is what you document externally. By externally, I mean an interface exposed to consumers (either internal through private methods, or external through public methods). While the code's interface provides a contract between the code and its consumer, the external documentation provides the finer details of that contract.

What You Should Definitely NOT Document

Before we dive into what you should document, I think it is very important to call out exactly what you should not document. Never expose implementation details in your external documentation. Doing so can lead to a lot of the issues that developers complain about. Internal details are far more likely to change and there's a good chance someone will forget to update the documentation. Even worse, a consumer of the code might make assumptions based on the details, so if the implementation does change the user's code might be broken. Consider the following:

/**
 * Returns an iterable object that can be used to access the 
 * individual line items in the order that have been sorted 
 * using a bubble sort.
 *
 * @return a LinkedHashSet object that provides access to 
 *         the sorted line items in the order as an iterator.
 */
public Iterable<LineItem> getSortedLineItems();

Not the best interface, but it serves a point. The only method you can call on Iterable is iterator(), so that is all that is really exposed to the consumer. Except, the documentation explicitly calls out which Iterable will be returned. Since the documentation is a contract, if you change the implementation of the Iterable from LinkedHashSet you will be breaking that contract. The sorting method mentioned is purely fluff; all the consumer really needs to know is that the items in the iterator are sorted. Now consider what a consumer can do with this:

LinkedHashSet<LineItem> sortedLineItems = 
  (LinkedHashSet<LineItem>) obj.getSortedLineItems();
sortedLineItems.add(new LineItem(...));

Most likely we don't want users adding new line items to the collection of sorted ones, but by documenting the actual type we allow users who want to be a little risky to try to anyway. I say risky because a few months down the road you may realize that line items don't have to be unique and so a set is not appropriate, so you switch to a LinkedList instead. You may also realize that the bubble sort is not as efficient as a quick sort and change that too. In both cases the documentation would have to be updated, and in the first case you may have just broken a consumer relying on the original documentation.

In general, make sure your code documentation is a black box that is only exposing information that its consumers really need. Avoid mentioning specific classes used for interfaces (I see ArrayList called out a lot for the List return type). Also avoid describing the algorithms used, since you might want to change them later.
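A black-box version of the earlier example might look like the sketch below. It is only an illustration: the Order class is hypothetical, String stands in for LineItem to keep the block self-contained, and alphabetical ordering is an assumed sort order. The documentation promises only what the consumer needs (sorted, read-only, never null) and says nothing about the collection class or the sorting algorithm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class Order {
    // String stands in for LineItem in this sketch.
    private final List<String> lineItems = new ArrayList<>();

    void addLineItem(String item) {
        lineItems.add(item);
    }

    /**
     * Returns the line items in this order, sorted in ascending
     * alphabetical order.
     *
     * @return a read-only iterable over the sorted line items
     *         (never null).
     */
    Iterable<String> getSortedLineItems() {
        // Sort a copy so the internal list keeps its insertion order.
        List<String> sorted = new ArrayList<>(lineItems);
        sorted.sort(Comparator.naturalOrder());
        // The wrapper rejects modification even if a caller casts it.
        return Collections.unmodifiableList(sorted);
    }
}
```

Because neither the type nor the algorithm is part of the documented contract, both can change later without touching the documentation or breaking a consumer who relied on it.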

Document At Minimum What the Documentation Tool Supports

So what do you want to document? Most programming languages have a documentation system that can be used to generate detailed developer documentation. Java has Javadoc, there is XML documentation for .NET, and C++ and C code can use Doxygen. In most of those cases the documentation will include a description of the class or method, any parameters it takes, the return value, and any exceptions it might throw. Make sure that you document all of these things at a very minimum.

Document Units of Measure

Here's something I see developers trip up on all the time. Suppose you're going to sell your software and I come up and hand you a contract offering to buy it for one million! Would you accept that offer? You shouldn't! One million dollars? One million pesos? One million bottle caps? There's a vital piece of information missing from that contract: the units. At least one ~$200M spacecraft was lost due to using the wrong units of measure. Now consider the following piece of code:

/**
 * Calculates the amount of time it takes to travel the 
 * distance at the speed provided.
 *
 * @param distance the distance that will be travelled.
 * @param speed the speed at which is being travelled.
 *
 * @return the amount of time it will take to travel the
 *        distance at the provided speed.
 */
public static double calculateTravelTime(double distance, 
                                         double speed);

Does that look fairly standard to you? I've seen code like it at least half a dozen times. Since the person who wrote that was probably also the consumer, it made perfect sense to him at the time. That is not a good specification, though. What is the distance measured in? What is the speed measured in? What unit of time is being returned? Now consider the following:

/**
 * Calculates the amount of time it takes to travel the 
 * number of kilometers at the speed provided.
 *
 * @param distanceKm the distance (in kilometers) that will 
 *        be travelled.
 * @param speedKph the speed (in kilometers per hour) at which 
 *        the distance is being travelled.
 *
 * @return the amount of time (in minutes) it will take to 
 *         travel the distance at the provided speed.
 */
public static double calculateTravelTime(double distanceKm, 
                                         double speedKph);

That looks a lot better and provides the missing information that consumers will need to call the method. Common places to look for missing units of measure include times, distances, sizes, amounts, weights, temperatures, and screen measurements (em vs. pixel).
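A possible implementation of that contract is sketched below. The TravelTime class and the Minutes suffix on the method name are my own additions for illustration. Naming the conversion factor keeps the unit change from hours to minutes visible in the code as well as in the documentation.

```java
class TravelTime {
    private static final double MINUTES_PER_HOUR = 60.0;

    /**
     * Calculates the amount of time it takes to travel the
     * number of kilometers at the speed provided.
     *
     * @param distanceKm the distance (in kilometers) that will
     *        be travelled.
     * @param speedKph the speed (in kilometers per hour) at which
     *        the distance is being travelled.
     *
     * @return the amount of time (in minutes) it will take to
     *         travel the distance at the provided speed.
     */
    static double calculateTravelTimeMinutes(double distanceKm,
                                             double speedKph) {
        // km / (km per hour) gives hours; convert hours to minutes.
        double hours = distanceKm / speedKph;
        return hours * MINUTES_PER_HOUR;
    }
}
```

For example, 120 kilometers at 60 kilometers per hour is 2 hours, so the method returns 120.0 minutes. Where the language offers a type that carries its own unit, such as java.time.Duration for time, returning that type is another option.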

Assumptions About Parameters

When we write new functions we automatically make a lot of unconscious assumptions about the parameters, especially if we are also writing the code that will be consuming that function. A lot of common assumptions we make are about the format or expected values of parameters. We expect that certain parameters will never be null. We expect that the string provided representing a phone number will be in the form "XXX-XXX-XXXX" and not "(XXX) XXX-XXXX". We expect that the number of items the user wants to add to their cart is a positive number. We expect the user's GPS latitude value to be between -90 and 90. Many times we test for these assumptions, and sometimes we don't. The important thing, though, is to document those assumptions so that consumers can know why something went wrong when they pass in bad values.

I've found it is a good practice to always document any restrictions on parameters next to the parameters themselves. I usually do it in parentheses following the parameter description. Here is an example:

/**
 * Records the GPS location of a phone at a specific time.
 *
 * @param trackingTime a timestamp indicating when the phone's 
 *        GPS data was recorded (must not be null, must be in 
 *        the format "MM/dd/yyyy HH:mm:ss" with hours in 
 *        24-hour (0-23) military time).
 * @param phoneNumber the number of the phone being tracked 
 *        (must not be null, must be in the format
 *        "XXX-XXX-XXXX").
 * @param gpsLatitude the latitude of the phone at the time it 
 *        was recorded (must be in the valid range of -90 to 
 *        90 degrees).
 * @param gpsLongitude the longitude of the phone at the time 
 *        it was recorded (must be in the valid range of -180 
 *        to 180 degrees).
 *
 * @throws NullPointerException if any parameter is null.
 * @throws IllegalArgumentException if trackingTime 
 *         or phoneNumber is not formatted 
 *         properly, or if gpsLatitude or 
 *         gpsLongitude is not in the valid range 
 *         of degrees.
 */
public static void trackPhoneLocation(String trackingTime,
                                      String phoneNumber,
                                      double gpsLatitude,
                                      double gpsLongitude);
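Documented restrictions are most useful when the code enforces exactly what the documentation promises. The sketch below is one way the checks could be written, not the only way. The PhoneTracker class name and the exception messages are assumptions, and the actual recording step is left out. In java.time pattern syntax, HH is the 24-hour (0-23) hour field.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Objects;
import java.util.regex.Pattern;

class PhoneTracker {
    // HH is the 24-hour (0-23) hour field in java.time patterns.
    private static final DateTimeFormatter TIME_FORMAT =
        DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");
    private static final Pattern PHONE_FORMAT =
        Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    static void trackPhoneLocation(String trackingTime,
                                   String phoneNumber,
                                   double gpsLatitude,
                                   double gpsLongitude) {
        Objects.requireNonNull(trackingTime, "trackingTime must not be null");
        Objects.requireNonNull(phoneNumber, "phoneNumber must not be null");
        try {
            LocalDateTime.parse(trackingTime, TIME_FORMAT);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException(
                "trackingTime must be in the format MM/dd/yyyy HH:mm:ss", e);
        }
        if (!PHONE_FORMAT.matcher(phoneNumber).matches()) {
            throw new IllegalArgumentException(
                "phoneNumber must be in the format XXX-XXX-XXXX");
        }
        // Written as negated range checks so that NaN is rejected too.
        if (!(gpsLatitude >= -90.0 && gpsLatitude <= 90.0)) {
            throw new IllegalArgumentException(
                "gpsLatitude must be between -90 and 90 degrees");
        }
        if (!(gpsLongitude >= -180.0 && gpsLongitude <= 180.0)) {
            throw new IllegalArgumentException(
                "gpsLongitude must be between -180 and 180 degrees");
        }
        // Recording the location is omitted from this sketch.
    }
}
```

Each check maps to one documented restriction, so a consumer who trips one can find the matching sentence in the documentation.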

The Reason for Thrown Exceptions

I see this one quite a lot. Take a look at this example:

/**
 * Updates the note about a customer in the database.
 *
 * @param customerId the ID of the customer for which 
 *        the note will be updated (must be a valid
 *        customer ID).
 * @param note the new note for the customer (must 
 *        not be null, must not exceed 255 characters).
 *
 * @throws NullPointerException
 * @throws IllegalArgumentException
 * @throws NamingException
 * @throws SQLException
 */
public void updateCustomerNote(int customerId, 
                               String note);

It is nice that at least you know what checked and unchecked exceptions can be thrown from the method. Unfortunately, that's all you know. If a consumer does catch a NullPointerException thrown from this method, will they know why (especially if the message is the wonderful default "null")? Just as important as listing the exceptions is to list why they are thrown so consumers can troubleshoot their code. Consider the following revised version instead:

/**
 * Updates the note about a customer in the database.
 *
 * @param customerId the ID of the customer for which 
 *        the note will be updated (must be a valid
 *        customer ID).
 * @param note the new note for the customer (must 
 *        not be null, must not exceed 255 characters).
 *
 * @throws NullPointerException if the note 
 *         parameter is null.
 * @throws IllegalArgumentException if the 
 *         note parameter exceeds its maximum
 *         length or if customerId is not a 
 *         valid customer ID.
 * @throws NamingException if there is an error looking up the 
 *         database connection details from the naming 
 *         context.
 * @throws SQLException if there is an error connecting 
 *         to the database or updating the customer
 *         record.
 */
public void updateCustomerNote(int customerId, String note);

That adds some clarity as to why an exception would be thrown and gives the user something to look for in their own code.


Null Return Values

I encountered this issue with a fairly well known API a few weeks ago. According to the documentation, I provide a URL to the function and I get an InputStream back with the contents of the file located at the URL. For the protection of the offender I offer this version:

/**
 * Opens a connection to the provided URL and returns 
 * an InputStream that can be used to read the 
 * contents of the file located at the URL.
 *
 * @param url the URL pointing to the file location 
 *        that the input stream will read (must not 
 *        be null).
 *
 * @return an InputStream that can be used to read 
 *         the contents of the URL.
 *
 * @throws NullPointerException if url 
 *         is null.
 * @throws IOException if there was an error 
 *         connecting to the file location specified 
 *         by the URL.
 */
public static InputStream openUrl(URL url);

That seems pretty good. If the URL can't be reached, exceptions will be thrown. Ok, I can handle that. I write my code, wrap it in a try...catch block, and use the InputStream. During testing we get a NullPointerException and we trace it back to the InputStream returned from the method being null. What? Nothing in the documentation says that the InputStream returned can be null. It turns out that if the URL can be reached but the contents of the document are blank then null is returned instead of an empty InputStream. Well, instead of having to figure that out through trial and error, it would have been nice for the documentation to have read more like this:

/**
 * Opens a connection to the provided URL and returns an 
 * InputStream that can be used to read the contents of 
 * the file located at the URL.
 * 
 * @param url the URL pointing to the file location that 
 *        the input stream will read (must not be null).
 *
 * @return an InputStream that can be used to read the 
 *         contents of the URL, or null if the contents 
 *         of the InputStream would be empty.
 *
 * @throws NullPointerException if url 
 *         is null.
 * @throws IOException if there was an error connecting 
 *         to the file location specified by the URL.
 */
public static InputStream openUrl(URL url);


The general rule is that if your methods can return null, make sure that the user knows that so they can null check the response. A common offender is "search" methods that don't find a result.
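As a small illustration of the "search" case, here is a sketch of a lookup method whose documentation states the null return up front. The CustomerDirectory class and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CustomerDirectory {
    private final Map<Integer, String> namesById = new HashMap<>();

    void add(int customerId, String name) {
        namesById.put(customerId, name);
    }

    /**
     * Looks up the name of a customer.
     *
     * @param customerId the ID of the customer to look up.
     *
     * @return the customer's name, or null if no customer with
     *         the given ID exists.
     */
    String findName(int customerId) {
        // Map.get returns null when the key is absent.
        return namesById.get(customerId);
    }
}
```

With the null case spelled out in the @return description, the caller knows to write `if (name != null)` before using the result instead of discovering the need for it during testing.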

All Values and Meanings of "Return Codes"

Return codes, which are integer values that represent the response from a function, have a way of slipping in at the last minute. Often a developer will realize after writing a method that the caller needs to know the result of the operation (especially if it fails), so they change "void" to "int" and return some magic number. The programmer then updates his code to make use of this number and moves on with his day, and the actual meaning of it is lost in time. Consider the following:

/**
 * Scans a dropbox for new files and processes the files it 
 * locates.
 *
 * @return a return code indicating the result of the dropbox 
 *         scan.
 */
int scanDropbox();

I've seen code like that too many times before. I'm certain that the author knows what magic codes can be returned from that function, but I don't. Consider this revision:


/**
 * Scans a dropbox for new files and processes the files it 
 * locates.
 *
 * @return a return code indicating the result of the 
 *         dropbox scan. Possible return values include:
 *         0 -- Dropbox was empty
 *         1 -- Dropbox contained files that were processed
 *         2 -- Dropbox did not exist
 *         3 -- Dropbox could not be read
 *         4 -- An error occurred while processing a file
 */
int scanDropbox();

That's a nice start. The next step is a bit more refactoring than it is documentation, but it really adds clarity. With a few constants declared for dropbox return values the comments and code will be much clearer:


/**
 * Represents the return code indicating that the dropbox 
 * was empty and did not contain any files to process.
 */
const int EMPTY = 0;

/**
 * Represents the return code indicating that the dropbox 
 * contained files and that they were processed successfully.
 */
const int SUCCESS = 1;

// Other return codes here
...

/**
 * Scans a dropbox for new files and processes the files it 
 * locates.
 *
 * @return a return code indicating the result of the 
 *         dropbox scan. Possible return values include:
 *         EMPTY            -- Dropbox was empty
 *         SUCCESS          -- Dropbox contained files that 
 *                             were processed
 *         ERR_NOT_FOUND    -- Dropbox did not exist
 *         ERR_CANNOT_READ  -- Dropbox could not be read
 *         ERR_FILE_PROCESS -- An error occurred while 
 *                             processing a file
 */
int scanDropbox();

That makes the documentation and the code make more sense than the magic number return codes presented before.
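In Java, the same idea can be taken one step further with an enum, so that the set of possible results is part of the type. This is a sketch of an alternative, not a required change. The ScanResult name and the getDescription() method are assumptions, and the constant names are taken from the list above.

```java
enum ScanResult {
    /** The dropbox was empty and did not contain any files to process. */
    EMPTY("Dropbox was empty"),

    /** The dropbox contained files and they were processed successfully. */
    SUCCESS("Dropbox contained files that were processed"),

    /** The dropbox did not exist. */
    ERR_NOT_FOUND("Dropbox did not exist"),

    /** The dropbox could not be read. */
    ERR_CANNOT_READ("Dropbox could not be read"),

    /** An error occurred while processing a file. */
    ERR_FILE_PROCESS("An error occurred while processing a file");

    private final String description;

    ScanResult(String description) {
        this.description = description;
    }

    /** Returns a human-readable description of this result. */
    String getDescription() {
        return description;
    }
}
```

A scanDropbox() method declared to return ScanResult cannot hand back an undocumented magic number, and each constant carries its own documentation next to its declaration.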

Side Effects and Relationships

Code that has "side effects" (i.e. affects something other than it was intended to or that is not obvious) should definitely have the side effect documented. Side effects, in general, should be avoided. To reduce possible confusion about the side effects, always make sure that they are well documented and stand out (preferably in bold text). Additionally, if there is a relationship between two calls, such as one call cannot be made until another one is made, those relationships need to be listed. In those cases, always make sure to point out the required order of the calls and indicate exactly which method must be called first.

/**
 * Scans the database for outstanding orders and notifies the 
 * shipping system. 
 *
 * NOTE: This assumes that a connection to the database has 
 * already been established. Ensure that "init_db_connection()" 
 * is called prior to calling this function.
 *
 * SIDE EFFECT:
 * Any error that occurs will be recorded in the global 
 * "error_code" variable and a textual description of the error 
 * will be set in the global "error_msg_ptr" variable.
 */
void prepareOutstandingOrders();
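Where possible, a documented call order can also be enforced so that a mistake fails loudly. The sketch below shows the idea in Java. The OrderNotifier class, the method names, and the boolean flag standing in for a real connection are all hypothetical.

```java
class OrderNotifier {
    // Stands in for a real database connection in this sketch.
    private boolean connected = false;

    /**
     * Establishes the database connection. Must be called before
     * prepareOutstandingOrders().
     */
    void initDbConnection() {
        connected = true;
    }

    /**
     * Scans the database for outstanding orders and notifies the
     * shipping system.
     *
     * @throws IllegalStateException if initDbConnection() has not
     *         been called first.
     */
    void prepareOutstandingOrders() {
        if (!connected) {
            throw new IllegalStateException(
                "initDbConnection() must be called before "
                + "prepareOutstandingOrders()");
        }
        // Scanning and notification are omitted from this sketch.
    }
}
```

The documentation still states the required order, but a caller who misses it gets an exception that names the method to call first.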

Use Examples for Tricky Interfaces


Someone actually asked me the other day if it was appropriate to put code examples in the documentation. Yes! They say a picture is worth a thousand words, and sometimes a simple example code snippet can be worth a few paragraphs. I have encountered many classes where I have read the class documentation and method documentation and still have had no idea how to use them. Usually, I end up searching Google for an example to use for clarification. For classes that are complex to use (for instance, ones that have a lot of associations between calls, calls that have to be made in a specific order, specific "states", methods with lots of parameters, etc.) it is a good idea to show an example of how the class is used in its own documentation. The same can be true for methods that require very specific sets of parameters. This is one of those areas where the documentation can easily become out of sync with the code, so make sure to always check for examples when you change an interface.
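In Javadoc, a common way to embed such an example is a `<pre>{@code ...}</pre>` block in the class comment. The sketch below shows the shape. The CsvLineBuilder class is hypothetical and deliberately tiny.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Builds a comma-separated line from individual field values.
 *
 * <p>Example usage:
 * <pre>{@code
 * CsvLineBuilder builder = new CsvLineBuilder();
 * builder.add("id").add("name");
 * String line = builder.build(); // "id,name"
 * }</pre>
 *
 * <p>add() may be called any number of times before build().
 */
class CsvLineBuilder {
    private final List<String> fields = new ArrayList<>();

    /** Appends a field and returns this builder for chaining. */
    CsvLineBuilder add(String field) {
        fields.add(field);
        return this;
    }

    /** Returns the fields added so far, joined by commas. */
    String build() {
        return String.join(",", fields);
    }
}
```

Keeping the example next to the class means it is in front of whoever changes add() or build(), which makes it easier to keep in sync.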

Conclusion

I hope that these tips help you to better document your code. Code documentation can take a lot of time, but really good code documentation can pay off big in providing clarity and understanding of how the code works and should be used. If you can think of anything I missed, please feel free to drop a comment.

2011-09-20

Confessions of a Programming Language Bigot

Hi. My Name is Michael and I'm a Programming Language Bigot.

One of the things that I've come to realize is that I'm a one trick pony. It's a very good trick though, at least for the moment. I'm a Java developer. I've been doing Java development professionally for over five years, and semi-professionally for several more years than that. I have mostly worked on user interfaces, and in the past few years have migrated to Java EE and Google Web Toolkit. I taught myself Java in college while most of the formal training was in Pascal (yes, Pascal, it was a great learning language), C, and C++. I had a brief stint with a company where I did C# and ASP.NET work, but for the most part I'm a Java guy.

So where does my Java bias come from? Why Java and not Python or C#? Well, some of it comes from experience, some from politics, but most of it comes from stupidity and laziness. The following expresses my feelings about other languages over the years and why I've come to feel I was wrong. In the end, I'll summarize the root of my language bigotry and how I plan to overcome it.

Python: It Wasn't Java

Early on in my semi-professional career I encountered Python. My first paid programming job was a Java project and consisted of a desktop client and a back-end server. My job was going to be to add some features to the client and server. I would finally get to test those Java skills I honed in college! One problem: the individual whose code I inherited was a Python fan and wrote his client code using Jython (Python interpreted by Java) and the server using a Python server called Zope. None of that code was documented either. The result was a frankensteined monster of Java and Python code. I had no experience with scripting languages, so the concept of no semicolons and indentation based blocks was utterly foreign to me. So in my first "Java" job I started behind the curve and was forced to learn Python on my own time very rapidly. That left a bitter taste in my mouth for the language that still exists today.

I know several people who are very fond of the language, and a lot of great tools (like Mercurial) are built using it. I've since gone back and looked into it more and it doesn't seem that bad, but still I don't want to learn it.

C#: Java Ripped Off by the Devil

When C# first came out I loathed the language. Not that there is anything inherently wrong with it, but I was a serious Microsoft hater back in the day and the initial version of C# looked like a blatant Java clone that would only run on Windows. Basically, it was built by Microsoft so I wanted nothing to do with it.

Years after C#'s debut I would work for about half a year as a C# developer. I learned to kind of like C#, especially how property getters and setters are paired. C# was easy to learn coming from a Java background. I've read up a bit on LINQ and think that it is a pretty solid idea, and something that I wish would come to Java. C# also seems to evolve much faster than Java, which is a great thing. Last year I received some training in C# and was actually looking forward to starting work on a C# project, but just before my transfer I was placed on another team that needed a seasoned Java developer.

Objective-C: Square Brackets Go Where?

Like everyone and their mother and their grandmother I wanted to learn how to make apps for my iPhone. There were two great hurdles to learning how to program for the iPhone: Objective-C and XCode. Even though Objective-C is considered a "C like" language and is compatible with both C and C++, its syntax, object system, and memory model differ greatly from those languages. When I first saw Objective-C code I couldn't make heads or tails of it. I quickly got over "message passing" using square brackets, strings with "@" signs, and the weird "release/retain" memory model. I never did get the hang of the way the named parameters worked though. XCode and Interface Builder also added to my confusion. The concept of doing work in one or the other and switching back and forth, and dragging and dropping visual elements to link objects was just a bit much for me. Needless to say, I gave up.

Since XCode 4 was released, things seem much simpler. At least it is all in one user interface. Someday I still want to write an app for the iPhone, but it's going to take some cramming on Objective-C to get there.

Ruby: A Language About Duck Enclosures on Tracks...Or Something

As Ruby, and especially Rails, started to gain traction I began to get interested in the language. I actually started a book club for Dave Thomas' Programming Ruby book. As we progressed through the book, however, the book club became more and more lost. We all had backgrounds in C, C++, and Java, so the concepts of a dynamically typed language, duck typing, and closures confused us. It takes a large shift in thought to go from statically typed and structured code to a more dynamic language, and due to a lack of time I lost interest in Ruby.

Later on I would lead another book club about Groovy, which is a scripting language similar to Ruby with Java as its foundation. I found that by relating Groovy's implementations of dynamic typing and closures to facets of Java I already understood, my understanding of those features in Ruby finally snapped into place. I actually respect Ruby a lot more now that I understand what is going on. The problem for me was the large leap from Java to Ruby, but with Groovy as a stepping stone Ruby makes a lot more sense now. I'm actually looking at learning Ruby and Rails to play with on side projects.

JavaScript: Only Useful for Annoying Popups and Browser Tricks

I do have to say that if there is one language I have truly looked down on it is JavaScript. Part of it stems from the name. When I was first learning Java I tried to learn JavaScript as well because, you know, it had "Java" in it. Within the first paragraph of the JavaScript book I was reading at the time I learned that the two languages were completely unrelated and the "Java" moniker was all for marketing. Back then, there was very little that JavaScript could do. It could pop up windows, move the browser window, scroll text in the status bar, and a few other "tricks" that didn't seem to have any use to me. On top of that, the weak typing and strange classless objects were too different from what I was used to.

Fast forward to today and HTML5. My belief is no longer that JavaScript is for silly scripts or pointless web tricks. JavaScript is the way forward for most applications. More and more applications are moving into the cloud and executing in the browser: image editing, word processing, full-blown software IDEs. JavaScript now stands to be the most important of all of the programming languages, which is why I've been cramming my head with as much JavaScript know-how as I can.

Lisp and Clojure: I Can't Wrap My Head Around Reverse Reverse Polish Notation

I'm lumping these two into one category because they are both functional languages. The "functional" languages have just never made sense to me. I just don't think that way. I mean, in a sense I get it. In English we say "Add one, two, and three together.", but mathematically we're used to seeing "1 + 2 + 3". Writing "(+ 1 2 3)" to express that same concept just seems foreign.

I haven't worked on any code that would need to make use of the functional languages, so I'm sure I'm just missing the point. Personally, I just can't imagine trying to write a user interface or any other extensive amount of code in them.

Scala and Clojure: What Part of Java Virtual Machine Did They Not Understand?

Clojure gets to show up twice, because it is both a functional language and one that runs on the JVM. My initial reaction to hearing about these languages was "why?" They are designed to run on the Java Virtual Machine (JVM). Isn't there already a language that runs fine on the Java Virtual Machine? Did they miss the "J" in JVM?

I admit, this is extremely shortsighted of me. Especially since I took an interest in Groovy, which also runs on the JVM. The difference was that Groovy code makes sense to me and for the most part looks like Java code. Once again, the functional languages are hard for me to read and understand, so I was biased against Scala and Clojure because they ran on the JVM and did not look like Java code.

To be honest, I still don't know much about Scala, other than it must be fairly good since it is running one of the most popular websites of our time (Twitter). It is supposedly more succinct than Java and is completely object oriented, and both of those aspects would be welcomed by any Java developers working today.

So Why be a Bigot?

I have to say, for the most part it is not intentional. There are a number of factors that have led to my bigotry.

First, Java became my preferred language because of my ideals. I was totally down with the "write once, run anywhere" slogan. I didn't want to write in a language that only ran on Windows/Macintosh/Linux. Back when I was learning Java I hated Windows, revered Linux, and dreamed of being able to afford a Mac. I believed that code should be portable and run on whatever operating system I had available or was forced to use. Not being a fan of Microsoft, I had no interest in the .NET languages. I didn't agree with Microsoft's philosophies so I shunned their languages. Quite honestly, I'm starting to feel the same way about my beloved Java now that Oracle has the reins.

Second, there is the matter of tooling. Java was easy to learn because there were some powerful IDEs with great tools such as code completion. IDEs and editors are another area where I have a bias (Eclipse for development, Vi for text editing). Having tools that are cheap, easy to obtain, and easy to work with made Java a better choice for me. Another black mark against Microsoft's languages was that you needed Visual Studio, which is anything but free (I do realize there are free learning versions now). Tools for other languages also have different layouts, commands, and keyboard shortcuts.

Third, the other languages don't look like the ones I know. Learning something new can be hard, especially if it differs greatly from what you've been taught is normal. New languages come with all new syntax and semantic rules which can be completely foreign, like Python's indentation-based blocks, JavaScript's blocks not defining scope, Objective-C's message passing and unique memory model, or functional programming in general.

Fourth, I don't need the other languages to do my job. My company doesn't use Ruby or Python or Scala or Lisp or many of the other languages I mentioned. Having skills in those languages provides me with no immediate benefits.

Lastly, the other languages pose a threat to my preferred language. Every week I see another blog post with the headline "Java Is Dead: Long Live Scala/C#/Ruby/Etc." What if one of those languages does manage to surpass Java? Java is my favorite language and the one I use professionally to put food on the table. All the years of knowledge I've gained about Java could all be thrown out if one of these other languages does gain a significant foothold.

The Ultimate Root of My Bigotry

With the exception of the ideals, there is one root to all of my bigotry: time. I lack it. Learning a new language takes time, especially if that language is vastly different from the languages you already know. It takes time to learn new concepts. It is not that I'm too dumb to learn new languages and concepts, it is that I'm too dumb to take the time to try to learn them. It is far easier to downplay a language than to take the time to learn it.

With the exception of perhaps the functional languages, the worst part of learning a new language isn't even the language itself (syntax is easy); the worst part about learning a new language is learning the huge set of libraries. I know a ton of different Java libraries, and it has taken a long time to gain that knowledge. Learning the equivalent libraries in a different language is a time-consuming prospect, so it's easier to just dismiss learning the language, especially if you have no immediate use for it.

Learning new IDEs and editors also suffers from the same time constraints. I know Eclipse's shortcuts by heart. Every time I use NetBeans I am completely lost and frustrated. That doesn't mean that NetBeans is a bad IDE, it just means that I don't want to take the time to relearn everything I already know how to do very well. The same is true for Vi vs. Emacs. I learned Vi first and I am decently adept at it, and every time I'm forced to use Emacs I can't figure anything out, including how to exit it (I looked this up after writing this and now know how). I know people who swear by Emacs, but I just haven't taken the time to learn it.

The Cost of Programming Language Bigotry

So what is the downside to all of my bigoted ways? This whole article was spurred by my renewed interest in JavaScript. Like I said, it looks like web applications are the way forward, so going forward means casting aside my old feelings about that language and taking a fresh look at it. My fear is that if I don't, I'll get left behind when software development moves in that direction.

In the beginning I mentioned that I've come to view myself as a one trick pony (that trick being Java development). Right now that is a solid position to be in, and I don't see Java going away any time soon. Some day that trick won't be as useful though, and if I don't learn any new ones then I'm afraid I'll be like the developers who stuck with COBOL or FORTRAN because those were the only languages they needed. Those people are most likely out of jobs and can't find new ones with their skill set (and with the recent layoffs in the space program, I actually know that some of them are in that position).

Who knows, if I stuck with Objective-C I might be making the big bucks as an app developer (I seriously doubt this).

Really good programmers are lifelong learners and keep their skills up to date. While I've definitely done this in regard to Java, I've spurned other languages in the process. It turns out, those other languages are quickly becoming more applicable. Java 8 is set to include closures. When I first encountered closures in Ruby they were foreign and made no sense, so I didn't care for them. After seeing them in Groovy I started to understand the power of closures, and now I'm excited about the fact they will be coming to Java. Without that exposure from other languages I probably would have a hard time understanding closures when they do come to the one language I'm most familiar with as it evolves.
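
For readers who haven't met closures yet, here is a minimal sketch of the idea, written with the lambda syntax that Java 8 eventually shipped with. The class and method names (ClosureSketch, makeAdder) are made up for illustration; the point is only that the returned function remembers a value from the scope that created it.

```java
import java.util.function.IntUnaryOperator;

class ClosureSketch {
    // Returns a function that "closes over" the amount parameter:
    // the lambda still remembers amount after makeAdder has returned.
    static IntUnaryOperator makeAdder(int amount) {
        return x -> x + amount;
    }

    public static void main(String[] args) {
        IntUnaryOperator addFive = makeAdder(5);
        IntUnaryOperator addTen = makeAdder(10);

        // Each function carries its own captured value.
        System.out.println(addFive.applyAsInt(1));
        System.out.println(addTen.applyAsInt(1));
    }
}
```

Calling makeAdder twice produces two independent functions, each holding on to the amount it was created with. That captured state is what separates a closure from a plain function.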

Conclusion

So how am I going to overcome my past views? Well, part of it is just carving out time to learn new things and being more open to learning them. Using an e-reader and my phone I've already read a couple of great JavaScript books (Douglas Crockford's JavaScript: The Good Parts being a fantastic quick read). I've started experimenting with JavaScript to learn it better. I'm ramping up to start learning Ruby and Rails and challenged myself with a simple project. I'm even challenging myself to learn TextMate just to have a different editor under my belt. I just don't want my old views to affect my future value as a developer.

Still, don't expect me to be writing Lisp or Python code in Emacs any time soon...I only have enough time for my open mindedness to go so far :)

2011-08-29

Coding in the Cloud: Diving In with Online IDEs

It seems like everything today is moving to "the cloud," which is that nebulous term for software applications and data storage hosted remotely on some server far across the internet. So if all of our software is moving to the cloud, why not the tools used to create that software? Well, in some cases, those tools are rolling out right now. Heck, even GitHub recently added basic editing capability. This article will look at some of the up-and-coming online IDEs.

Pastebins: The Little Brothers of Online IDEs

The first category of online IDEs that we will explore can't really be called IDEs. These IDE-lite web tools are more like pastebins on steroids. Pastebins provide basic text editing capabilities and features such as syntax highlighting, but do not support version control or deployment. Instead, pastebins allow snippets of code to be saved and shared with other users. The following web tools extend beyond the basic premise of typical pastebins in that they also allow your code to be executed, something vital for testing. This category of IDE-lite is excellent for simple proof of concept work, but not for developing full applications. The main feature that these web tools support is quick and easy tinkering and the ability to share small snippets of code. The three IDEs in this space that we will be covering are JS Fiddle, Python Fiddle, and ideone.

JS Fiddle

JS Fiddle may not be a fully featured IDE, but for trying out quick ideas with HTML, JavaScript, and CSS it is a great place to start. The web site provides separate windows for HTML, JavaScript, and CSS, and another window that will display the final output. The editors support syntax highlighting. JS Fiddle allows you to choose from multiple versions of each of the major JavaScript libraries like jQuery, MooTools, Prototype, YUI, and Dojo when testing your JavaScript snippets. It even allows you to link in the URL to your own JavaScript and CSS files hosted remotely. Your "fiddles" can also be saved and shared with other users. JSLint is also integrated to ensure your JavaScript code is tip-top.

Python Fiddle

Python Fiddle is similar to JS Fiddle except it is focused on Python instead of the HTML/CSS/JavaScript combination. Python Fiddle provides a sandbox to test snippets of Python code, and provides support for many of the popular Python libraries. Like JS Fiddle, Python Fiddle supports syntax highlighting, and your snippets can be saved and shared with other users.

ideone

ideone is by far the most ambitious of the glorified pastebins. Much like JS Fiddle and Python Fiddle, ideone provides syntax highlighting and the ability to save and share snippets of code. Unlike the other two, however, ideone aims at supporting well over 20 different languages, including C, C++, C#, Java, JavaScript, Go, Groovy, Objective-C, Perl, Python, and Ruby. ideone aims to be a one-stop shop for all of your snippet testing needs.

Will the Real Cloud Based IDEs Please Stand Up

What separates the real cloud based IDEs from their little brothers is that they include support for importing projects, working with version control systems (primarily Git), and in some cases deployment. Of the fully-featured online IDEs we will be examining Eclipse Orion, Cloud9 IDE, and eXo Cloud IDE.

Eclipse Orion

Eclipse Orion is an online IDE effort being led by part of the Eclipse Foundation. The IDE is primarily focusing on web development languages such as HTML, JavaScript, and CSS, but includes initial support for Java. Orion differs from the other online IDEs in that it is not necessarily meant as a hosted solution, but actually intended to be deployed into private environments. You can download Orion and host your own instance. For instance, your company might host an Orion instance for all of its developers, or even for specific projects.

Right from the start Orion lets you initialize new projects by cloning from Git (the only supported version control system), uploading a zip file of your project, or by creating a new HTML5 site using initializr.

For my testing I cloned a repository in Git. For some strange reason I had to use HTTP for Git instead of SSH, since I could not find the SSH key listed anywhere in the IDE. Orion provides a GUI for interacting with Git, but it can be a bit confusing. While I understand it is a good practice to use Git's "status" command before performing a commit, Orion hides the commit interface under the Repository window's "Show Status" option, which is not where I expected to find it.

The editor is fairly nice. It supports line numbers and syntax highlighting. Support for outlining and content assistance is very spotty. Currently outlining is only supported for JavaScript and HTML, and content assist is only available for CSS. Basic editing commands such as Alt+Up/Down to move lines, Ctrl+D to delete a line, and Ctrl+L to go to a line number are supported.

For more information you should check out Eclipse's "Getting Started Guide" for Orion.

Cloud9 IDE

Cloud9 IDE is another web-based IDE specialized towards web development, and is the only one on the list with a paid solution. Cloud9 IDE is free for projects that will be publicly visible, but for private projects the cost is $15 a month. The IDE supports HTML, JavaScript, and CSS, but also includes support for Ruby and PHP. Cloud9 is also the only IDE on this list that supports Mercurial as well as Git. GitHub and BitBucket support is baked in from the start, and you can actually create your Cloud9 IDE account by logging in through one of those services. In fact, when I created my account that was the only way (hopefully something that has been fixed, but I don't want to create another account to test the idea).

Provided you log in through GitHub or BitBucket, importing projects is a breeze. You will need to be familiar with the command line versions of Git and Mercurial though, as all version control support is implemented through a console at the bottom of the screen. The tool does provide an SSH key that you can register with GitHub so you won't be prompted every time you try to push code.

The editor is very responsive and supports some of the niceties I've come to expect from Eclipse (Alt+Up/Down to move lines of code, Ctrl+D to delete a line, Ctrl+L to go to a line). The syntax highlighter is nice, and there is support for code "beautification." The editor will also notify you of code errors as you type.

Your HTML pages can be tested right inside the interface. If you are looking to deploy to an outside host, Cloud9 IDE supports deployment to Joyent and Heroku.

eXo Cloud IDE

eXo Cloud IDE is the last online IDE we will be looking at, and probably the most ambitious. eXo Cloud IDE aims to support the three web languages as well as Ruby, Java and JSP, and PHP. It also provides the most deployment options: CloudBees, CloudFoundry, Heroku, and Red Hat OpenShift.

eXo Cloud IDE supports Git for version control, and provides a nice menu at the top of the screen for all of the Git operations. A bit tucked away under the "Window" menu there is an "Ssh Key Manager" utility so you can set up an SSH key to use with your remote Git host (such as GitHub). Importing a project from Git was a bit more difficult with eXo Cloud IDE in that first you had to create a folder, initialize a repository in the folder, and then perform the clone. Most of the other services let you start with "clone" and took care of the rest for you.

The editor is nice enough. It supports syntax highlighting and line numbers. It doesn't have some of the text editing niceties I mentioned for the others (Alt+Up/Down is absent, but Ctrl+D for delete line and Ctrl+L for navigating to a line do exist). It does support a nice outline view though for navigating your code, which is very helpful. One extremely nice feature that eXo Cloud IDE supports is a minimal form of code completion (with the standard Ctrl+Space command). eXo Cloud IDE provides basic auto-completion for HTML, JavaScript, CSS, Java, Ruby, and PHP code.

Comparison

The following table provides a comparison of the different features of the online IDEs mentioned:

| IDE | Cost | Languages | Version Control | Deployment |
| --- | --- | --- | --- | --- |
| JS Fiddle | Free | HTML, JavaScript, CSS | None | None |
| Python Fiddle | Free | Python | None | None |
| ideone | Free | 20+ Languages | None | None |
| Eclipse Orion | Free | HTML, JavaScript, CSS, Java | Git | None |
| Cloud9 IDE | Free (Public), $15/mo (Private) | HTML, JavaScript, CSS, Ruby, PHP | Git, Mercurial | Joyent, Heroku |
| eXo Cloud IDE | Free | HTML, JavaScript, CSS, Ruby, Java, PHP | Git | CloudBees, CloudFoundry, Heroku, OpenShift |

Conclusion

While I don't see the online IDEs taking over for most developers' day-to-day tasks, I think that they are a welcome addition to any developer's toolset. Web developers will benefit the most at the moment, as most of the tools are geared towards HTML, JavaScript, and CSS development. Seeing as how most software is moving into "the cloud" and those are the technologies that are driving new web-based applications, that's not a bad thing. The pastebin style tools are useful for quickly testing new concepts and don't require a lot of the overhead of a full-blown IDE.

Git seems to be the clear winner when it comes to version control support in the online IDEs. GitHub in particular is favored (it is a great service so I can see why). Only one IDE offered support for Mercurial, and Subversion lovers are out of luck going forward into the online world.

Personally, robust code completion and refactoring tools are a must in my world, and support for those features is definitely lacking in the current generation of online IDEs.  I think that as the tools mature support for those features will increase. I could see that several years down the line there will be whole software teams that develop and deploy all of their code right from their web browser.

2011-08-17

Dealing With Management Disdain for Refactoring and Unit Testing

Refactoring and unit testing seem to carry a stigma with upper management. From my experience, most managers see no value in performing either of those tasks. Refactoring (rewriting existing code) and writing unit tests take time, which is time that could be used to add valuable features for the customer. While this is true, I would like to touch on some of my personal experiences with refactoring and unit testing that have proven that there is value in those tasks.

The Golden Book of Refactoring

First I'd like to cover refactoring. Years ago I read Martin Fowler's excellent book Refactoring: Improving the Design of Existing Code. This is probably the most important book I've read in my programming career. The book covers the process of identifying code to refactor (through the use of "code smells" that are rules of thumb to identify potentially bad code), the different refactoring techniques that can be applied, and how to actually apply the refactoring techniques. While around 50% of the book is obsolete (most modern IDEs provide built-in support for performing the refactoring techniques, which is a large part of the book), the rest of the book is pure gold and still worth the read just to learn how to identify bad code and know which refactoring technique to apply to fix it. After I read this book I completely changed how I view coding. I found myself seeing potential refactorings before I even wrote the code, and I changed the design before even getting started so the code wouldn't need to be rewritten in the future. As soon as I read it I started tearing apart our baseline at work. I found all sorts of places that needed cleaning up. Every month my line of code count was in the negative, which management wasn't too pleased with as that was our main metric at the time (LOCC is a horrible metric).

The Fallacy of "Not Writing New Code is Not Being Productive"

Early on in my refactoring experience I started getting resistance to changing the code base. It was the standard argument about how I could be working on new features instead of rewriting old ones. So why did I persist with my refactoring arguments? Even more importantly, why did management see the light? Well, the project was a Swing GUI with a lot of screens. Most of those screens were very similar or made of similar components. In the original design when I came on, most of that similarity came from copy and pasted code or someone's independent work to make a similar screen. The average time for a team member to create a screen from scratch was about one week.

Duplicated code is one of those code smells I mentioned earlier. I managed to create a framework of superclasses and extracted classes that formed commonly used components. It took me a little over a week to perform all of the refactoring. In that same time, I could have written one screen and added value to our product! What a waste! Well, it turns out not. After the refactoring was complete, the time to create a new screen using the new base classes and refactored components dropped to one screen per day. So for one screen's worth of effort one week, you got five screens worth of effort the next. I also refactored the styles and sizes of screen components into a set of utilities, so that when all screens used that utility they would have a uniform look and feel, which was something that was missing before.

That's not to say that the payoff will always be that big. This was an extreme case, but it makes my point about refactoring having value very well.
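
The superclass extraction described above might look roughly like the following sketch. It is not the original project code: the class names (BaseScreen, CustomerScreen) are hypothetical, and the Swing components are replaced with plain strings so the example stays self-contained. What it shows is the shape of the refactoring, where the shared skeleton lives in one base class and each screen supplies only what is unique to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: holds the steps every screen used to copy and paste.
abstract class BaseScreen {
    // Template method: the shared skeleton is written exactly once.
    final List<String> build() {
        List<String> parts = new ArrayList<>();
        parts.add("header: " + title());
        addFields(parts); // the only step that differs per screen
        parts.add("buttons: OK, Cancel");
        return parts;
    }

    abstract String title();

    abstract void addFields(List<String> parts);
}

// A new screen now only describes what is unique to it.
class CustomerScreen extends BaseScreen {
    @Override
    String title() {
        return "Customer";
    }

    @Override
    void addFields(List<String> parts) {
        parts.add("field: name");
        parts.add("field: email");
    }
}

class ScreenDemo {
    public static void main(String[] args) {
        for (String part : new CustomerScreen().build()) {
            System.out.println(part);
        }
    }
}
```

Adding another screen means writing one more small subclass, which is where the drop from a week per screen to a day per screen would come from.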

You Might Break Something!

I recently completed a major refactoring on a set of EJBs for another project. There were about 8-10 EJBs that each consisted of a single method that was between ~200 and ~500 lines long. The code was a nested mess, and once again it was an example of copy-and-paste happiness. Once again my new boss was totally against the idea of rewriting the code, mostly because of the possibility of me breaking something. His favorite phrase when I mention refactoring is "Don't break anything." So this time I added another best practice to the mix: unit testing. For every block of code that I refactored out of these massive EJBs I created a unit test to ensure that the block worked as expected. Now not only was the code cleaner and better organized, but it was now tested as well.

Testing Takes Too Long and Is More Code to Maintain

I tend to hear this one not just from management but from other team members. On another project separate from the two mentioned above, my team implemented unit testing late in the game. The benefits were still there though. Can you guess the results? We found bugs (which is what unit testing is designed to do), some of which had been in the system for ages. We also tended to break code less, since we ran the unit tests every build to regression test our work. Every time we did discover a bug, we wrote a unit test to expose it and then fixed it. We were lucky on this project that management was supportive of our decision to write unit tests (mostly after we completed another major refactoring effort that improved productivity dramatically).

Unit testing also provides another benefit: improved documentation. If a unit test is well written it can often uncover gaps in the documentation (what happens if I pass null for this parameter? what if I use a negative number for this one?). Many times I ended up rewriting the Javadocs to better reflect what the method was doing after the unit tests exposed its true behavior.
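
As a small illustration of tests driving better Javadocs, consider the sketch below. The repeat method and its rules for null and negative input are invented for this example, and plain checks stand in for a test framework such as JUnit so the code runs on its own. Each check corresponds to one sentence in the Javadoc, which is the kind of documentation gap the tests tend to expose.

```java
class DocumentedBehaviorSketch {
    /**
     * Repeats text the given number of times.
     *
     * @param text  the text to repeat; null is treated as an empty string
     * @param times how many copies to produce; must not be negative
     * @return the repeated text, never null
     * @throws IllegalArgumentException if times is negative
     */
    static String repeat(String text, int times) {
        if (times < 0) {
            throw new IllegalArgumentException("times must not be negative: " + times);
        }
        if (text == null) {
            return "";
        }
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < times; i++) {
            result.append(text);
        }
        return result.toString();
    }

    // Fails loudly when a documented behavior does not hold.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        check(repeat("ab", 2).equals("abab"), "normal case");
        check(repeat(null, 3).equals(""), "null is treated as empty");
        check(repeat("ab", 0).equals(""), "zero copies is empty");
        try {
            repeat("ab", -1);
            check(false, "negative count should be rejected");
        } catch (IllegalArgumentException expected) {
            // Documented behavior: negative counts are an error.
        }
        System.out.println("all checks passed");
    }
}
```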

How I've Dealt with Management

So how have I dealt with management pushback? Well, on my first program I had already started the refactoring without permission. Once our team started reviewing the code and realizing how much was changing I started to get heat about it. So I explained my vision about what I was trying to accomplish and how the new code base would be easier to extend and create screens. I quoted a bunch of stuff from Martin Fowler's refactoring book about the problems with the code. I had a plan for my refactoring and by explaining it and the benefits I was able to get buy in from the rest of the team and management. If you plan on making large sweeping changes to code, explain your changes first and explain their benefits (quote Martin if you have to).

The second way I've dealt with it is to do it in minor steps. As I work on a new feature that is similar to an older one, I clean up a bit of the older one and reuse what I can. On the project that added the unit testing the refactorings were done very incrementally, but over time the payoff was huge in productivity gains for adding new features. The general idea is to leave the code you touch better than you found it. Extract out a method or class here and there that can be reused, or rename or document something that is unclear. Add a unit test for both the older feature and the new one. Little changes over time can add up without causing too many ripples.

Conclusion

So in conclusion, while unit testing and refactoring do take up time that could be spent developing features, they also provide benefits by making software more robust and easier to understand, maintain, and extend, which can make adding those new features much easier going forward. From my experience, refactoring often (but not always) leads to productivity gains over time. Feel free to use my examples next time you have to convince your manager that they are good practices and not wastes of time.

2011-08-14

How to Fix a Broken Software Patent System

Oracle vs. Google
Apple vs. HTC
Lodsys vs. App Developers
Microsoft vs. Barnes & Noble


Recent headlines are filled with cases of patent battles between technology and industry titans. It seems like more and more patent cases (and especially software patent cases) make the news every day. NPR recently did a fantastic piece in their This American Life program called When Patents Attack. It is definitely worth the time to listen to or read the transcript.

As a software engineer, I have to agree with many of my colleagues in the field that the patent system is broken. I have actually witnessed it firsthand to a certain extent. Several years ago I worked on a project for what my team thought at the time was a novel idea. After we had a working prototype, our company of course wanted us to look into patenting the idea. So I was tasked with doing some patent research. What I found kind of surprised me. It turns out, our idea wasn't novel. In fact, it had already been patented three times! Two of the patents were from recent years, but one of them was over a decade old. Ours wasn't a new idea, it was just one that hadn't been capitalized on yet.

Now the real question is how did the exact same idea get patented three times in the first place. Well, having read the patent applications in full (and they were quite lengthy) it was kind of easy to understand. Each application was filled with technical terms and legal mumbo-jumbo that made the applications nearly unreadable. Many of the claims in one patent used synonyms for terms in the other. When you boiled it down to layman's terms though, the patents were identical. Heck, with enough big words obfuscating what you really mean, you can patent anything (the When Patents Attack story mentions Patent 6,080,436 titled "Bread Refreshing Method", which is better known to the rest of the world as "toast").

In truth, finding the patents that matched our idea took some searching with combinations of terms similar to the ones we used to describe our "novel idea." I knew which terms to use because I was familiar with the subject matter. At the United States Patent and Trademark Office (USPTO), the reviewers of patent applications probably are not as familiar with the field and will be less likely to make the connection.

Something else I found while Googling for the patents were articles on the subject of our "novel idea." Some of these predated the first patent, which had been filed more than a decade earlier. Technically, these should have been considered as "prior art" since they described the same concept before the first patent application was filed. In all reality, none of the patents should ever have been valid. It is easy to understand how the first patent could have been granted. Back when it was issued, the USPTO probably didn't have the search capabilities to identify works of prior art (before the days of Google), but the subsequent patent applications should have turned up this prior art when they were being reviewed.

The point is, there are numerous patents that exist that are either identical or similar enough to be considered the same, and many of them probably aren't valid because they weren't novel ideas. These patents slipped through the cracks, but they have been granted and are now viable weapons.

Weapons?

The NPR piece describes an arms race among patent holders. Apple and Google and Microsoft are amassing caches of patents to use "for defensive purposes." (updated: such as Google's announcement today to purchase Motorola and its 17,000+ patents). The NPR piece compares it to the term "Mutually Assured Destruction," which in nuclear terms means "if you nuke me then I'll nuke you and we'll both be destroyed." In patent terms it translates to "if you sue me because I infringe on your patent then I'll sue you because you infringe on mine." The idea is that if you hold enough patents that your opponent infringes on then they can't sue you because you can counter-sue them.

That's not a healthy strategy for innovation and the patent system. One, it means that many patents are being cached not for their innovation value but instead for their litigation value. Sadly, the patents that are worth the most are really the least innovative ones: far reaching, overly vague patents that cover general concepts. Two, it means that smaller players such as small businesses and startups can't participate. Back to the nuclear analogy, it means that a country like San Marino (the startup of countries) better not try to build a weapon (product that may infringe a patent), or better pray it doesn't get noticed, because any of the bigger countries could take it off the map at any point.

There is one more problem with this patent approach: patent trolls (i.e. non-practicing entities). Patent trolls collect patents and use them for litigation or licensing, but do not build any products themselves, which means all of their patents are offensive. Google can threaten Apple with a patent that Apple infringes on if Apple tries to sue Google over one of Apple's patents that Google infringes on. In the end, it could just be a stalemate. That doesn't work with the patent trolls. If a patent troll sues Google for patent infringement, there can be no countersuit: the troll makes no products, so it can't be infringing any of the patents Google holds.

Some people are calling for software patents to be abolished again (software could originally not be patented until a 1998 ruling, just in time for the original web gold rush). I think that once the software patent genie is out of the bottle, there's no turning back the clock to undo what was done. There are too many big established companies with too many lobbyists to let that happen. So the best we can do is reform the broken system and fix the problem going forward. 

Part of the problem with the patent system seems to be that the USPTO is overworked and underfunded. A recent proposal called the "America Invents Act" helps to push some reform through the Senate. One of the benefits is that there will be additional USPTO offices and that the USPTO will get to set its own fees and keep all of those fees (previously it did not set fees and only received a portion of them).

With that in mind, I have a few suggestions for how these fees should be set and how the patent system should be reformed.

First, raise the barrier to entry. It currently costs as low as $110 to file for a patent (non-refundable whether the patent is issued or not), and $755 if the patent is issued. Raise those values to $1,000 to file (non-refundable still) and another $1,000 if the patent is issued. This would probably cause the number of bogus or far-fetched patents to be lowered and improve the quality of the patents that are submitted. This would mean more money for the USPTO and hopefully fewer patents to review (leading to more thorough and better reviews).

Second, reduce the duration of the patents (at least for software and business process patents). Patents currently last 20 years from the date of filing. When the patent system was first formed, technology (like the cotton gin) didn't advance at nearly the rate that it does now. A 20 year monopoly in the software field is well beyond the useful lifetime of most software products. Five years, or at most seven, should be more than sufficient, if not still excessive for software patents. Can you imagine if Apple refused to license their multi-touch patents for 20 years? Only the iPhone would be able to pinch zoom for that long? That number needs to go down.

Third, raise the maintenance fees and make them annual. Currently, at 3 1/2 years it costs $490 to renew a patent. At 7 1/2 years that number goes up to $1,240, and at 11 1/2 it is $2,055. I propose that a more exponential scale be used.

Year 1: $1,000
Year 2: $10,000
Year 3: $50,000
Year 4: $100,000
Year 5: $500,000
Year 6+: $1,000,000

There would be numerous benefits to this approach. First, the smaller values initially would allow startups a few years to cement the patent and get something off the ground. Second, patents with lower value and less utility would probably not be renewed after a few years, making the technology available to other companies sooner. Companies would probably only pay the upkeep on their most valuable patents and the ones that differentiate them from their competition. Third, the patent trolls will have a hell of a time paying for the upkeep on the thousands of patents they are hoarding. Unfortunately, they would probably still pay for the broadest reaching patents they can use to sue with, but it would probably cause them to reduce their patent portfolios and their threat to small companies. Fourth...that would be one well funded USPTO! With that kind of money they could hire field experts to examine patents and have multiple reviewers for patent applications. The best possible solution would be to apply the new fees to existing patents as soon as the change takes effect, based on each patent's age (i.e. next year a patent that is 6+ years old would cost $1M to renew, while one that is only 2 years old would cost $10,000).
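
To put numbers on the proposed schedule, here is a small sketch that totals the fees over a patent's life. The class and method names are made up for illustration; the dollar amounts are the ones listed above, and the "current" total simply adds the three maintenance fees quoted earlier.

```java
class PatentFeeSketch {
    // Proposed annual maintenance fees in dollars. Index 0 is year 1;
    // year 6 and every later year cost the last entry.
    static final long[] PROPOSED_FEES = {1_000, 10_000, 50_000, 100_000, 500_000, 1_000_000};

    // Fee due in a single year under the proposed schedule (years start at 1).
    static long proposedFee(int year) {
        if (year < 1) {
            throw new IllegalArgumentException("year must be at least 1");
        }
        int index = Math.min(year, PROPOSED_FEES.length) - 1;
        return PROPOSED_FEES[index];
    }

    // Total paid to keep a patent alive through the given year.
    static long proposedTotal(int years) {
        long total = 0;
        for (int year = 1; year <= years; year++) {
            total += proposedFee(year);
        }
        return total;
    }

    // Sum of the three current maintenance fees quoted in the text.
    static long currentTotal() {
        return 490 + 1_240 + 2_055;
    }

    public static void main(String[] args) {
        System.out.println("Current fees, full term: $" + currentTotal());
        System.out.println("Proposed, through year 5: $" + proposedTotal(5));
        System.out.println("Proposed, through year 7: $" + proposedTotal(7));
    }
}
```

Under these figures, keeping a patent for five years would cost $661,000 and seven years would cost $2,661,000, compared with $3,785 in maintenance fees over the full term today.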

In addition to the pricing changes, the USPTO should provide an easy system for third parties to challenge patents. The America Invents Act actually is looking to implement this by providing a one year post-grant review period where third parties can submit prior art and challenge a patent. There are multiple benefits to this as well. One, it gives competitors a chance to defend themselves against a patent by striking it down early. Additionally, if the fees are upped as I mentioned above, then a lot fewer bogus patents will likely be filed. If an idea is not really new or radical and is likely to be shot down in the post-grant phase, the filer will be less likely to file the patent at all since they will lose out on the non-refundable application fee. This too would likely improve the overall quality of patents that are granted. The hard part for the USPTO is making this fair for both parties, as large companies could easily file endless reviews against startup competitors and put pressure on them with already granted patents past the review period. The system will need to be affordable so that startups can defend themselves and file reviews as well.

What do you think?

2011-08-09

So You Want to be a Programmer

So you want to be a programmer?

That is great! Programming can be very rewarding.

First of all, I'd like to touch on why I am writing this. I see a lot of blog articles with titles like "10 Things Every Programmer Should Know." When I go to the post, I see a bulleted list of things, only five of which I agree with. There are a lot more things that I think programmers should know. This article is intended for the newbies out there, but I think some experienced programmers may get something out of it too.

So you want to be a programmer?

My first question to you is simple: why? Are you doing it because you love computers? Are you doing it because you want to know how software works? Are you doing it because you have this cool idea for an app for your phone? Are you doing it for the money?

Only three of those are good reasons to consider programming. Software engineering and the medical professions are the top growth industries at the moment and also offer some of the best salaries, which is great for those of us in those professions. However, if you don't have any interest in how software works or in helping people, those are two professions you need to steer clear of. The last thing a programming team needs is someone who hates coding, and the last thing a hospital needs is someone who hates working with patients. While programming can be very rewarding, it can be a lot of hard and frustrating work as well.

Personally, in my first years of college I was a Psychology major. I played with computers, tried different operating systems, and modded games as hobbies. I also worked as a tutor in the campus computer lab. Just before I graduated with my associate's degree my psychology professor sat me down and recommended I change my major when I go to a university. It's not that I wasn't good at my psychology coursework, but he knew I was better at computers, so he recommended I go into computer science. That was probably the best counseling I ever received at school.

That's enough about me. The point is, you need to be passionate about programming and computers if you want to be really good in the field. You also have to be really passionate about learning. Compared to other fields, the lifespan of programming languages, tools, frameworks, and development methodologies is similar to that of gnats. If you're not constantly learning, you become a dinosaur pretty quickly. Luckily, reading blogs (like this one) is a good way to stay up to date, and there are lots of resources and tutorials on the internet (Google is your friend).

So what is it that you really need to know to be a good programmer? Quite a lot, actually. First, it probably takes learning a programming language.

Learn a Programming Language

So which programming language should you learn first? It really depends on what you want to develop. Want to develop games for the Xbox 360? You should probably learn C# or C++ then. Want to develop enterprise-level back-end services? Want to develop Android phone applications? You'll probably want to learn Java. Want to develop a quick and dirty content-based web site? You'll want to look into Ruby on Rails. Want to develop iPhone applications? Objective-C is the language for you. Want to develop interactive web pages? The holy trinity of HTML, JavaScript, and CSS will be required.

It turns out learning a language is usually pretty easy. Once you have the concepts of one language down it is not hard to pick up another one (usually). The hard part is learning the libraries of another language. Libraries provide the meat of using a language. Some languages, such as JavaScript, provide very minimal libraries. Others, like Java, have thousands of classes and methods in their standard library. Fortunately, for every language, there's probably only a subset of the libraries that you will be using. One of the best ways to learn a language is to read source code from an existing project. Search Google Code or GitHub for projects related to your interests and read the code for those projects.

In addition to a bread-and-butter programming language such as the ones mentioned above, you'll probably eventually want to learn a scripting language such as Perl, Python, Ruby, or Groovy as well. Scripting languages are good for small tasks and for performing maintenance, and in many cases for writing full applications (especially for the web: Ruby on Rails is a popular framework for running a website, and Groovy has an equivalent called Grails). For instance, say your program has a configuration file and the format of that file needs to be completely changed. You could manually change every single instance of the file (which would take a lot of time), or you could write an application that would change the file to the new format (which still may take a while to write). The other option is to write a quick script to make the change.
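To give a feel for the "quick script" option, here is a minimal sketch in Python. The old format (plain key=value lines), the new format (JSON), and the setting names are all assumptions made up for illustration; a real conversion would depend on your actual file formats.

```python
import json


def convert_config(old_text):
    """Turn 'key = value' lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for line in old_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical contents of an old-style configuration file.
old_config = """
# legacy settings
host = example.org
port = 8080
"""

# The same settings, written out in the (assumed) new JSON format.
new_config = json.dumps(convert_config(old_config), indent=2, sort_keys=True)
print(new_config)
```

Pointing a script like this at every copy of the file is usually far quicker than editing each one by hand.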

Learn how to Document Code

Another key point when learning a programming language is learning how to document code for that language. Javadoc (for Java), XML documentation comments (C#), and Doxygen (C++) are a few ways that you can write documentation for your code so that it can be generated into human-readable documents such as web pages. Documentation is almost as important as the code itself, especially if you are working in a team environment. While most programmers hate documenting, well-documented code reduces confusion for people who didn't write the code and might not know why it is doing something (or for you, many years down the road, when you no longer remember why you did something).
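As a rough illustration, here is what a documented function can look like in Python, where the counterpart to the tools above is the docstring. The function itself is a made-up example, and the Args/Returns/Raises layout is just one common convention, not the only one.

```python
def monthly_payment(principal, annual_rate, months):
    """Return the fixed monthly payment for a loan.

    Args:
        principal: Amount borrowed, in dollars.
        annual_rate: Yearly interest rate as a fraction (0.06 means 6%).
        months: Number of monthly payments; must be greater than zero.

    Returns:
        The payment per month as a float.

    Raises:
        ValueError: If months is not positive.
    """
    if months <= 0:
        raise ValueError("months must be positive")
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        # No interest: spread the principal evenly. The annuity formula
        # below would divide by zero in this case.
        return principal / months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)


# The docstring travels with the function and can be read at runtime.
print(monthly_payment.__doc__)
```

Notice that the docstring describes the contract (units, valid inputs, failure cases) rather than restating the code, and the one inline comment explains why a special case exists.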

Learn how to use an IDE

Once you know the language, your next step is to learn the development environment for it. Modern software is usually developed in an IDE (Integrated Development Environment). For instance, as a Java programmer, I spend most of my day coding in an IDE called Eclipse. For Java there are other options like NetBeans as well. If you plan on writing C# applications, you'll most likely use Microsoft Visual Studio. If you're looking to write iPhone apps, then you'll be using Xcode on a Mac. No matter what IDE you use, make sure you learn how to use it well. IDEs offer features like code completion, which shows you what commands are available as you write code (great if you're first learning, so you don't have to look up everything on the internet all of the time or wade through gigantic documents). IDEs can also color code your code so you can tell which words are reserved words, variables, function calls, and so on.

As well as learning an IDE, it is a good idea to have a good handle on a basic text editor (usually something better than Notepad though). For Windows, this could be Notepad++, TextPad, or e. For the Mac, the most recommended text editor is TextMate. If you're on Linux, welcome to the holy war between vi and emacs (I'm personally a vi guy). These editors are great for when you have to edit config files, HTML or XML documents, or make small code changes.

Learn a Version Control System

Another very important tool that you will need to learn how to use is a version control system. Version control allows you to save snapshots of your programming work at various stages in time. That way, if you break something, you can always revert it. The current top version control systems are Subversion, Mercurial, and Git. Mercurial and Git are very similar, but Subversion can be quite different from the two. Git seems to be the rising star and the one you should probably pay the most attention to. There are numerous sites that will host your code for free (as long as it is publicly accessible) such as Google Code, Bitbucket, and GitHub.

Learn XML

So now that you know you need to learn a programming language and have picked one and its IDE, there are a few technologies that pretty much go hand-in-hand no matter what language you are learning. The best example of this is probably XML (Extensible Markup Language). XML is the lingua franca of the web. It is the basis for XHTML, is used in configuration files, web services, data files, and even in some databases. No matter what language you choose, at some point you will most likely have to interact with XML documents. It is a good idea to at least understand the basics of XML. Another technology called JSON (JavaScript Object Notation) is making inroads on a lot of what XML is used for, so that's another one you should probably look into as well.
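To make the comparison concrete, here is a small sketch in Python that reads a made-up XML document and writes the same data back out as JSON. The element names and values are assumptions for illustration only.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical XML snippet describing a server.
xml_text = """
<server>
  <host>example.org</host>
  <port>8080</port>
</server>
""".strip()

root = ET.fromstring(xml_text)

# Each child element of <server> becomes one key/value pair.
settings = {child.tag: child.text for child in root}

# The same data expressed as JSON.
json_text = json.dumps(settings, sort_keys=True)
print(json_text)
```

Both formats carry the same information here; JSON is simply more compact, which is part of why it is gaining ground.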

Learn Regular Expressions

Regular expressions are another feature that spans all programming languages, and one you will probably have to use at some point. Regular expressions provide a syntax used for searching text, replacing text, and ensuring that text is in a specific format. For instance, say that a user types in a phone number or credit card number and you need to validate that it is formatted properly (has all of the right digits, parentheses, and dashes in the right places). The easiest way to do this is to validate the text the user entered against a regular expression representing a valid phone number or credit card number. Regular expressions are also useful for extracting values out of text, such as pulling just the area code from a phone number.
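Here is a minimal sketch of the phone number example in Python. It assumes one specific format, (555) 123-4567, purely for illustration; real-world phone validation has to cope with many more variations.

```python
import re

# Assumed format for this sketch: (555) 123-4567
# The parentheses around \d{3} etc. are capture groups used for extraction.
PHONE_PATTERN = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")


def is_valid_phone(text):
    """Return True only if the whole string matches the assumed format."""
    return PHONE_PATTERN.fullmatch(text) is not None


def area_code(text):
    """Return the three-digit area code, or None if the text is not valid."""
    match = PHONE_PATTERN.fullmatch(text)
    return match.group(1) if match else None


print(is_valid_phone("(555) 123-4567"))
print(is_valid_phone("555-123-4567"))
print(area_code("(555) 123-4567"))
```

The same pattern does double duty: it validates the format and, through its capture groups, pulls out the pieces you care about.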

Learn SQL

Database interaction is probably the next important thing you will need to learn. At some point in your career as a programmer you will most likely have to interact with a database. You may not be the one to set it up (there's usually "the database guy" on the team that does that), but you will need to learn how to query it and update it. Most databases today are "relational databases", and luckily there is a standardized query language called SQL (Structured Query Language) used for these interactions. While you might not need to know all of the details of how databases work, at a minimum you should know the basics of SQL (SELECT, UPDATE, etc.). Looking forward, a new breed of database is emerging that doesn't use SQL (and is thus referred to as a NoSQL database). These databases don't have a standard query mechanism yet, and often use JSON for storing data.
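For a taste of those basics, here is a short sketch using the SQLite database that ships with Python. The table, column names, and data are invented for the example, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

# An in-memory database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")

# Hypothetical table for the example.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# INSERT a row; the ? placeholders keep the values separate from the SQL text.
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Ada", "ada@example.org"))

# UPDATE the row we just added.
conn.execute("UPDATE users SET email = ? WHERE name = ?", ("ada@example.com", "Ada"))

# SELECT it back to see the change.
rows = conn.execute("SELECT name, email FROM users WHERE name = ?", ("Ada",)).fetchall()
print(rows)

conn.close()
```

The SQL statements themselves would look much the same against most relational databases; only the connection code tends to differ.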

Learn Web Technologies

Even if you're not working directly with the web, a lot of technology and applications are moving to the web. It will be a good idea going forward to learn the basic web technologies (HTML, JavaScript, and CSS). HTML (Hypertext Markup Language) is a markup language used to describe the structure of a web page. JavaScript is a programming language that controls the behavior of a page, and CSS (Cascading Style Sheets) defines the style and look and feel of the page. If you will be doing any serious web page work, expect to have to learn additional libraries (such as jQuery or Prototype) if you want your page to have any cool effects.

Learn a Unit Testing Framework

Another set of technologies you should learn is the xUnit-style unit testing framework that goes with your language of choice. Unit tests are pieces of software that are designed to test units of other pieces of software. Writing test cases for your code helps ensure that the code being tested is well designed and catches bugs before they reach your users. If you're a Java developer, you'll want to learn how the JUnit library works. For C#, it is called NUnit, and C++ developers will probably use CppUnit.
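The frameworks all follow the same basic shape. As a sketch, here is a tiny test case written with unittest, the equivalent module in Python's standard library. The word_count function is a made-up example of a "unit" to test.

```python
import unittest


def word_count(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    def test_counts_words(self):
        self.assertEqual(word_count("so you want to program"), 5)

    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)


# Load and run the tests explicitly so the outcome is available as a value.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Each test method checks one small behavior, so when something breaks later the failing test name tells you roughly where to look.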

Conclusion

So that's just a taste of some of the things you will need to learn to be a good programmer. That doesn't even touch on the numerous design philosophies or development cycles that are also involved, but it should be enough to get you started. Pick up a few books on those subjects. Read some blogs. Read some code on GitHub and Google Code. Pick a few small projects for yourself to push your skills. Good luck!