A Refactoring Story in Four Parts
I recently tweeted about how people new to programming, or companies that look for candidates and give them programming tests, don't understand that the first code one writes is just a draft. It can take several edits (so-called "refactors") to tease out a good, workable, testable, and ultimately simple solution. So here's a story of how I recently refactored my code 3 times before arriving at a final solution. This happened over the span of a day as I thought about and shaped the code into a format that can be easily tested and understood by others. The code is from a private project, so I am only presenting snippets here and some of the package names and variables have been changed for privacy.
I start at Version 0, which is that I just wanted to get code written that compiled and fleshed out the business logic.
Version 0
package main
import (
"log"
"os"
)
type CliEnv struct {
CustomerId string
DynamoDBTableName string
DomainName string
}
func main() {
env := CliEnv{
CustomerId: os.Getenv("CUSTOMER_ID"),
DynamoDBTableName: os.Getenv("DDB_TABLE_NAME"),
DomainName: os.Getenv("DOMAIN_NAME"),
}
if err := svc.Process(env); err != nil {
log.Fatalln(err)
}
}
This version just does the bare minimum and doesn't have any
error-checking, unless that job is punted to the svc.Process()
method
(which it isn't). So the first refactor is that the CliEnv
struct
contents need to be validated.
Let's do that in Version 1.
Version 1
... previous code omitted ...
func (e CliEnv) Validate() error {
if e.CustomerId == "" {
return errors.New("Missing CustomerId")
}
if e.DynamoDBTableName == "" {
return errors.New("Missing DynamoDBTableName")
}
if e.DomainName == "" {
return errors.New("Missing DomainName")
}
return nil
}
This is fairly straightforward for validating the a struct's contents, but it has a few problems with it:
- it feels like a stream of conscious
- it has magic strings that are hard to reach; and worst of all
- it will only report one error back to the caller at a time.
The last issue was what bothered me the most because I like to be informed of all the issues during validation, not just the first. The other issues are minor and can be solved at any time. But being that I like to tackle easy things first, let's fix the small issues.
Version 2 lifts up all magic strings from the various parts of the file and pulls them into global variables so they can be named correctly and tests can reference the specific error that was thrown instead of performing string matching.
Version 2
... previous code omitted ...
import (
"errors"
"fmt"
"log"
"os"
)
const (
EnvCustomerId string = "CUSTOMER_ID"
EnvDomainName string = "DOMAIN_NAME"
EnvDynamoDBTableName string = "DDB_TABLE_NAME"
)
var (
ErrMissingCustomerId = errors.New(fmt.Sprintf("Missing env var: %s", EnvCustomerId))
ErrMissingDomainName = errors.New(fmt.Sprintf("Missing env var: %s", EnvDomainName))
ErrMissingDynamoDBTableName = errors.New(fmt.Sprintf("Missing env var: %s", EnvDynamoDBTableName))
)
type CliEnv struct { ... }
func main() {
env := CliEnv{
CustomerId: os.Getenv(EnvCustomerId),
DynamoDBTableName: os.Getenv(EnvDynamoDBTableName),
DomainName: os.Getenv(EnvDomainName),
}
if err := svc.Process(env); err != nil {
log.Fatalln(err)
}
}
func (e CliEnv) Validate() error {
if e.CustomerId == "" {
return ErrMissingCustomerId
}
if e.DynamoDBTableName == "" {
return ErrMissingDynamoDBTableName
}
if e.DomainName == "" {
return ErrMissingDomainName
}
return nil
}
This change removes any syncing issue future developers will need to do
between the name of the environment variables used in the os.Getenv()
method call and the use of them in the ErrMissing...
declaration. Also,
the magic strings are gone and specific errors can be checked during
testing.
Now let's tackle the problem with the Validate()
method only returning
one validation error at a time.
Version 3
... previous code omitted ...
func (e CliEnv) Validate() error {
var miss []string
if e.CustomerId == "" {
miss = append(miss, EnvCustomerId)
}
if e.DynamoDBTableName == "" {
miss = append(miss, EnvDynamoDBTableName)
}
if e.DomainName == "" {
miss = append(miss, EnvDomainName)
}
if len(miss) > 0 {
msg := fmt.Sprintf("Missing env var: %s", strings.Join(miss, ", "))
return errors.New(msg)
}
return nil
}
This version will now return all the validation errors on the CliEnv
struct, but my refactoring was fairly naive in its implementation. I
decided to use strings, which meant I had to append to a list of strings
for each validation error found, then perform a count on that list to
see if an error should be produced. I lost the ability to use the
ErrMissing...
variables I declared previously, which reduces the
effectiveness of my test cases, and this whole implementaton left me
feeling like I'm starting to leak the abstraction of what a validation
error really is since I've reduced my code to manipulating strings. And
I've accidentally introduced another magic string for the resulting error
message, which I absolutely do not want since I am validating structs all
over the codebase and this magic string and logic will be duplicated
multiple times.
Then it hit me. I remember Rob Pike mentioning in his Go Proverbs talk that "errors are values" and that Go programmers should use them just like any other variable. I realized that I had been writing my Go code like Java, Python, and Ruby where errors (exceptions) are a condition that breaks the flow of the program into an error state. Thus, I decided to take Rob's advice and create a container to hold the validation errors. The result is Version 4 and it combines the best of Version 2 and Version 3.
Version 4
package main
import (
"errors"
"fmt"
"log"
"os"
)
const (
EnvCustomerId string = "CUSTOMER_ID"
EnvDomainName string = "DOMAIN_NAME"
EnvDynamoDBTableName string = "DDB_TABLE_NAME"
errValidationMsg string = "Validation error: "
)
var (
ErrMissingCustomerId = errors.New(fmt.Sprintf("Missing env var: %s", EnvCustomerId))
ErrMissingDomainName = errors.New(fmt.Sprintf("Missing env var: %s", EnvDomainName))
ErrMissingDynamoDBTableName = errors.New(fmt.Sprintf("Missing env var: %s", EnvDynamoDBTableName))
)
type CliEnv struct { ... }
func main() {
env := CliEnv{
CustomerId: os.Getenv(EnvCustomerId),
DynamoDBTableName: os.Getenv(EnvDynamoDBTableName),
DomainName: os.Getenv(EnvDomainName),
}
if err := svc.Process(env); err != nil {
log.Fatalln(err)
}
}
func (e CliEnv) Validate() error {
var v ValidationError
if e.CustomerId == "" {
v.Add(ErrMissingCustomerId)
}
if e.DynamoDBTableName == "" {
v.Add(ErrMissingDynamoDBTableName)
}
if e.DomainName == "" {
v.Add(ErrMissingDomainName)
}
return v.Result()
}
type ValidationError struct {
errors []error
}
func (v ValidationError) Error() string {
msg := []string{errValidationMsg}
for _, i := range v.errors {
msg = append(msg, i.Error())
}
return strings.Join(msg, "\n")
}
func (v ValidationError) Has(e error) bool {
for _, i := range v.errors {
if i == e {
return true
}
}
return false
}
func (v *ValidationError) Add(e error) {
if v == nil {
return
}
v.errors = append(v.errors, e)
}
func (v ValidationError) Result() error {
if len(v.errors) > 0 {
return v
}
return nil
}
In this version I created a ValidationError
struct and made it contain
the right methods to be used by the error
interface, thus I can return
this struct when my method returns error
. The conditional in
Validate()
about whether to display the error or not has been removed
from the caller through the use of the Result()
method, because it is
a detail of the ValidationError struct to decide whether a validation error
has occurred, not the caller. Lastly, the creation of the Has()
method
ensures that test cases can lookup which errors occurred in the
ValidationError struct to check that the right errors are being caught.
Summary
One thing to realize about refactoring is that it is a subjective exercise, but one that every programmer has to do regardless of their experience level. I still write terrible first drafts of code, but over the course of hours or days I refine and refactor the code into something I am proud to show to others. But subjectiveness is the key here.
One person may like constants strewn about, others may not. So if you read this article and disagreed with some of my edits, that's okay because you have your way of doing things. And maybe I missed something along the way. That's where peer review comes in; another person's perspective and experience can help make code even better. However, the first perspective you need is your own and you can do some of the initial refactorings yourself.
Happy coding!