A Refactoring Story in Four Parts

I recently tweeted about how people new to programming, or companies that look for candidates and give them programming tests, don't understand that the first code one writes is just a draft. It can take several edits (so-called "refactors") to tease out a good, workable, testable, and ultimately simple solution. So here's a story of how I recently refactored my code 3 times before arriving at a final solution. This happened over the span of a day as I thought about and shaped the code into a format that can be easily tested and understood by others. The code is from a private project, so I am only presenting snippets here and some of the package names and variables have been changed for privacy.

I start at Version 0, which is that I just wanted to get code written that compiled and fleshed out the business logic.

Version 0

package main

import (
  "log"
  "os"
)

type CliEnv struct {
  CustomerId        string
  DynamoDBTableName string
  DomainName        string
}

func main() {
  env := CliEnv{
    CustomerId:        os.Getenv("CUSTOMER_ID"),
    DynamoDBTableName: os.Getenv("DDB_TABLE_NAME"),
    DomainName:        os.Getenv("DOMAIN_NAME"),
  }

  if err := svc.Process(env); err != nil {
    log.Fatalln(err)
  }
}

This version just does the bare minimum and doesn't have any error-checking, unless that job is punted to the svc.Process() method (which it isn't). So the first refactor is that the CliEnv struct contents need to be validated.

Let's do that in Version 1.

Version 1

... previous code omitted ...

func (e CliEnv) Validate() error {
  if e.CustomerId == "" {
    return errors.New("Missing CustomerId")
  }

  if e.DynamoDBTableName == "" {
    return errors.New("Missing DynamoDBTableName")
  }

  if e.DomainName == "" {
    return errors.New("Missing DomainName")
  }

  return nil
}

This is fairly straightforward for validating the a struct's contents, but it has a few problems with it:

The last issue was what bothered me the most because I like to be informed of all the issues during validation, not just the first. The other issues are minor and can be solved at any time. But being that I like to tackle easy things first, let's fix the small issues.

Version 2 lifts up all magic strings from the various parts of the file and pulls them into global variables so they can be named correctly and tests can reference the specific error that was thrown instead of performing string matching.

Version 2

... previous code omitted ...

import (
  "errors"
  "fmt"
  "log"
  "os"
)

const (
  EnvCustomerId string        = "CUSTOMER_ID"
  EnvDomainName string        = "DOMAIN_NAME"
  EnvDynamoDBTableName string = "DDB_TABLE_NAME"
)

var (
  ErrMissingCustomerId        = errors.New(fmt.Sprintf("Missing env var: %s", EnvCustomerId))
  ErrMissingDomainName        = errors.New(fmt.Sprintf("Missing env var: %s", EnvDomainName))
  ErrMissingDynamoDBTableName = errors.New(fmt.Sprintf("Missing env var: %s", EnvDynamoDBTableName))
)

type CliEnv struct { ... }

func main() {
  env := CliEnv{
    CustomerId:        os.Getenv(EnvCustomerId),
    DynamoDBTableName: os.Getenv(EnvDynamoDBTableName),
    DomainName:        os.Getenv(EnvDomainName),
  }

  if err := svc.Process(env); err != nil {
    log.Fatalln(err)
  }
}

func (e CliEnv) Validate() error {
  if e.CustomerId == "" {
    return ErrMissingCustomerId
  }

  if e.DynamoDBTableName == "" {
    return ErrMissingDynamoDBTableName
  }

  if e.DomainName == "" {
    return ErrMissingDomainName
  }

  return nil
}

This change removes any syncing issue future developers will need to do between the name of the environment variables used in the os.Getenv() method call and the use of them in the ErrMissing... declaration. Also, the magic strings are gone and specific errors can be checked during testing.

Now let's tackle the problem with the Validate() method only returning one validation error at a time.

Version 3

... previous code omitted ...

func (e CliEnv) Validate() error {
  var miss []string

  if e.CustomerId == "" {
    miss = append(miss, EnvCustomerId)
  }

  if e.DynamoDBTableName == "" {
    miss = append(miss, EnvDynamoDBTableName)
  }

  if e.DomainName == "" {
    miss = append(miss, EnvDomainName)
  }

  if len(miss) > 0 {
    msg := fmt.Sprintf("Missing env var: %s", strings.Join(miss, ", "))
    return errors.New(msg)
  }

  return nil
}

This version will now return all the validation errors on the CliEnv struct, but my refactoring was fairly naive in its implementation. I decided to use strings, which meant I had to append to a list of strings for each validation error found, then perform a count on that list to see if an error should be produced. I lost the ability to use the ErrMissing... variables I declared previously, which reduces the effectiveness of my test cases, and this whole implementaton left me feeling like I'm starting to leak the abstraction of what a validation error really is since I've reduced my code to manipulating strings. And I've accidentally introduced another magic string for the resulting error message, which I absolutely do not want since I am validating structs all over the codebase and this magic string and logic will be duplicated multiple times.

Then it hit me. I remember Rob Pike mentioning in his Go Proverbs talk that "errors are values" and that Go programmers should use them just like any other variable. I realized that I had been writing my Go code like Java, Python, and Ruby where errors (exceptions) are a condition that breaks the flow of the program into an error state. Thus, I decided to take Rob's advice and create a container to hold the validation errors. The result is Version 4 and it combines the best of Version 2 and Version 3.

Version 4

package main

import (
  "errors"
  "fmt"
  "log"
  "os"
)

const (
  EnvCustomerId string        = "CUSTOMER_ID"
  EnvDomainName string        = "DOMAIN_NAME"
  EnvDynamoDBTableName string = "DDB_TABLE_NAME"

  errValidationMsg string = "Validation error: "
)

var (
  ErrMissingCustomerId        = errors.New(fmt.Sprintf("Missing env var: %s", EnvCustomerId))
  ErrMissingDomainName        = errors.New(fmt.Sprintf("Missing env var: %s", EnvDomainName))
  ErrMissingDynamoDBTableName = errors.New(fmt.Sprintf("Missing env var: %s", EnvDynamoDBTableName))
)

type CliEnv struct { ... }

func main() {
  env := CliEnv{
    CustomerId:        os.Getenv(EnvCustomerId),
    DynamoDBTableName: os.Getenv(EnvDynamoDBTableName),
    DomainName:        os.Getenv(EnvDomainName),
  }

  if err := svc.Process(env); err != nil {
    log.Fatalln(err)
  }
}

func (e CliEnv) Validate() error {
  var v ValidationError

  if e.CustomerId == "" {
    v.Add(ErrMissingCustomerId)
  }

  if e.DynamoDBTableName == "" {
    v.Add(ErrMissingDynamoDBTableName)
  }

  if e.DomainName == "" {
    v.Add(ErrMissingDomainName)
  }

  return v.Result()
}

type ValidationError struct {
  errors []error
}

func (v ValidationError) Error() string {
  msg := []string{errValidationMsg}
  for _, i := range v.errors {
    msg = append(msg, i.Error())
  }

  return strings.Join(msg, "\n")
}

func (v ValidationError) Has(e error) bool {
  for _, i := range v.errors {
    if i == e {
      return true
    }
  }

  return false
}

func (v *ValidationError) Add(e error) {
  if v == nil {
    return
  }

  v.errors = append(v.errors, e)
}

func (v ValidationError) Result() error {
  if len(v.errors) > 0 {
    return v
  }

  return nil
}

In this version I created a ValidationError struct and made it contain the right methods to be used by the error interface, thus I can return this struct when my method returns error. The conditional in Validate() about whether to display the error or not has been removed from the caller through the use of the Result() method, because it is a detail of the ValidationError struct to decide whether a validation error has occurred, not the caller. Lastly, the creation of the Has() method ensures that test cases can lookup which errors occurred in the ValidationError struct to check that the right errors are being caught.

Summary

One thing to realize about refactoring is that it is a subjective exercise, but one that every programmer has to do regardless of their experience level. I still write terrible first drafts of code, but over the course of hours or days I refine and refactor the code into something I am proud to show to others. But subjectiveness is the key here.

One person may like constants strewn about, others may not. So if you read this article and disagreed with some of my edits, that's okay because you have your way of doing things. And maybe I missed something along the way. That's where peer review comes in; another person's perspective and experience can help make code even better. However, the first perspective you need is your own and you can do some of the initial refactorings yourself.

Happy coding!