Technical Advice Needing a Home
I just completed a set of interviews with a company for a technical lead position.[1] At the last interview I got a bit quiet towards the end, which I explained to the interviewer was because my brain had spun up and I was starting to think about ways to fix some of the inefficiencies I had heard about in the previous interviews. In each interview I made notes along the way about how I'd fix the issues, and I feel it is a tragedy to not allow part or all of this information to be used by someone.[2]
Part of being a technical lead is to champion best practices, spot problems before they happen, and bring all team members into the same consistent vision. This job combines elements of architecture, software development, security, testing, and management into one package.
Here is a list of solutions to some of the inefficiencies I saw at the company. It is in no particular order and has been cleansed of any PII. Implementing all of these ideas would take about 6 to 12 months of solid work, so with other deliverables that are already promised it may stretch to 1 to 2 years. However, each of these is largely independent of the others, so they can be done in parallel or phased in. Enjoy!
The Monolithic Application

After having the application architecture explained to me, I can reasonably assume it is classed as a monolithic application. This is a bad thing, and it will cause problems sooner rather than later.
Monoliths are bad at scaling with new functionality. All of the UI and business logic is contained within the same code, so changing one small piece of the UI means you must re-check the business logic components because that code may have been touched. The only way to avoid testing everything, while keeping the same architecture, would be to perform code diffs before every deployment, and nobody likes looking at those.
My advice is to identify the functional components of the application and break them up into services, creating a service-oriented architecture. If you have a component that takes care of account information, move it into an internal back-office service that can be called via REST. When a change is made to the accounts system that has no upstream impact, the deployment is simple, its testing can be compartmentalized, and the risk is mitigated. It is even possible to have zero-downtime deployments when deploying the right back-end services.
Creating a service-oriented architecture also helps with scaling issues. Using the example above, if the accounts component gets more traffic than expected (it is a central piece, after all), it is easy to scale that one service horizontally across more machines to handle the load. In a monolithic application, horizontal scaling means scaling every component at once, which may not be what is needed.[3]
If horizontal scaling is not the issue, then you can take advantage of caching and put a Varnish server in front of your back-end service. Its job is to watch for requests for repeated data and keep those requests from hitting the backend/database. I have personally seen request times go from hundreds of milliseconds to single-digit milliseconds with this simple caching layer.[4]
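As a sketch of how little configuration this takes (the backend address and TTL below are placeholders, not values from the company's stack), a minimal Varnish 4 VCL might look like:

```vcl
vcl 4.0;

# The application server Varnish sits in front of (placeholder address).
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Cache cacheable responses for 60 seconds; repeated requests for the
    # same data are then served from memory and never reach the backend.
    set beresp.ttl = 60s;
}
```

The hard part is not this file; it is deciding which responses are safe to cache and how the application invalidates them.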
Unit Testing

I was told that unit testing was no longer performed because the tests became too cumbersome to write and maintain. To be honest, I'm dumbfounded at this revelation. My advice here is to STOP what work is being done and start writing unit tests. I have seen a $400M system go down because the unit tests were basically commented-out Java tests (i.e. no tests at all).[5] If this can happen to a system that large, with a dedicated QA team, it can also happen to a small codebase.
There should be a large set of unit tests that run automatically. To do this you can use a tool called Guard and add plugins for your language of choice (e.g. PHP developers can use guard-phpunit). This tool watches for file changes on the developer's local machine and runs the unit tests automatically. As this happens on every file change, it is important to keep the entire test suite to a sub-second runtime. Sounds crazy? Use mocks and stubs and it can be done.
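As a sketch (assuming a PHP project laid out with `src/` and `tests/` directories, which are placeholders for the real layout, and the guard and guard-phpunit gems installed), the Guardfile can be as small as:

```ruby
# Guardfile: re-run the PHPUnit suite whenever a source or test file changes.
guard :phpunit, tests_path: "tests", cli: "--colors" do
  watch(%r{^src/.+\.php$}) { "tests" }  # any source change runs the full suite
  watch(%r{^tests/.+Test\.php$})        # a changed test file re-runs itself
end
```

Running `guard` in the project root then gives every developer continuous feedback without anyone remembering to run the tests.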
The way to sell unit testing to developers is to get them to see unit tests as the first line of defense for their code. The QA team does not exist to be the development team's safety net; it acts as the development team's customer. Each piece of code should be unit tested locally before it is committed, and then again after it is in the repository. There are no exceptions.
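To show how mocking keeps a unit test in memory and fast enough for a sub-second suite, here is a minimal sketch in Ruby (the `AccountService` class and its repository interface are hypothetical, invented for illustration, not code from the company):

```ruby
require "minitest/autorun"

# Hypothetical service that would normally read balances from a database
# through an injected repository object.
class AccountService
  def initialize(repo)
    @repo = repo
  end

  def balance(account_id)
    row = @repo.find(account_id)
    row ? row[:balance] : 0
  end
end

class AccountServiceTest < Minitest::Test
  def test_balance_without_touching_a_database
    # The repository is mocked, so no connection is opened and the test
    # runs in microseconds -- thousands of these still finish sub-second.
    repo = Minitest::Mock.new
    repo.expect(:find, { balance: 42 }, [7])
    assert_equal 42, AccountService.new(repo).balance(7)
    repo.verify
  end
end
```

The same shape works in any language: inject the slow dependency, substitute a mock in the test, and assert on behaviour.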
Once code makes it into the code repository's main branch (`master`, etc.), it is no longer in the development team's hands. That repository should be seen as publicly posted and usable by anyone. This means the code can now break other people's systems and should be treated as such. To mitigate this risk, install a system like Jenkins to run the automated tests whenever it notices a change in the repository. If an error is found, an alert is sent to the development team and all work stops until a fix is made.
If developers still refuse to unit test the code, the solution is simple: FIRE THEM. They are not developers. They are cowboys.
Integration Testing

The integration testing framework used is called FIT, and it is the most obscure testing framework I've seen. While it seems to be working as expected, I worry about its future as the codebase grows. FIT requires testers to write HTML files by hand, but I've seen better efficiency when testers know how to write small programs and use proper testing frameworks like Cucumber.[6] Heck, FIT doesn't even appear until the end of page 2 when searching for "fit" on Google.
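For contrast with hand-written HTML tables, a Cucumber test is a plain-text feature file like this sketch (the feature and its steps are invented for illustration; the matching step definitions would be short Ruby methods):

```gherkin
Feature: Account lookup
  Scenario: Viewing an existing account
    Given an account named "ACME" exists
    When I request the page for account "ACME"
    Then I should see the account name "ACME"
```

Testers write the sentences; developers write the small step programs behind them once, and both sides can read the result.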
Since the product is a Web application, I also advise using Selenium in headless mode to automate clicking through the application. Functional and unit testing are good, but things get weird and non-deterministic once a Web UI is involved.
Configuration Management

On the systems side of the company, there is a move towards configuration management. I completely applaud this approach. I've seen too many servers that sit in the same pool yet each carry minute changes that make them act in random ways.
However, I dislike the choice of CFEngine as the preferred tool. I feel it is behind the times, and there are now better, easier-to-use frameworks available for configuration management. I suggest looking at Puppet, Chef, or Ansible. My personal favourite is Ansible because it works over a plain SSH connection and makes configuration changes in a simple format. Puppet is written in a weird Ruby dialect, and its evented nature (resources notifying and subscribing to each other) makes the instructions hard to read over time. As well, Puppet and Chef require a daemon on each machine, and if that daemon dies you are hooped and have to log in manually. That's a fail to me.
Another side to this technology choice is the ease of hiring someone who knows the chosen configuration tool. Most devops people don't know CFEngine, but they do know Puppet and Chef (Ansible is newer, but gaining popularity).
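As an illustration of the simple format (the host group, package, and service names here are placeholders), an Ansible playbook is plain YAML pushed over SSH with no agent on the target machine:

```yaml
# site.yml -- run with: ansible-playbook -i inventory site.yml
- hosts: webservers
  become: true
  tasks:
    - name: Ensure ntp is installed
      package:
        name: ntp
        state: present

    - name: Ensure ntp is running and starts on boot
      service:
        name: ntp
        state: started
        enabled: true
```

Anyone on the team can read this and know exactly what state every machine in the pool should be in.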
Taming CSS Code
It was mentioned that the product's CSS files were growing too large to maintain, and changes in the UI were breaking the CSS (and vice versa). My solution is to use SASS, which allows you to modularize your CSS code into manageable pieces.
Switching to SASS from CSS is fairly trivial. For the most part, you can copy and paste the CSS code into a `.scss` file, run the SASS compiler, and it will output valid CSS back to you (unchanged, obviously). From there, you can redefine your compilation process, refactor into modules, and reduce repeated code.
Redefining your compilation process is easy with SASS, but it does introduce a separate step in your development and deployment process. That is unavoidable, but it works so well that you shouldn't need to worry about it once everyone is trained up. Simply install SASS, then ask it to watch a particular directory for changes to `.scss` files and compile the changed files into `.css` files in another directory.
```shell
$ gem install sass
$ sass --watch path/to/scss:path/to/css
```
Refactoring your CSS into modules is easy because SASS supports `@import "file";`, just as in CSS, except it imports the `file.scss` file at compile time. In this way the monolithic CSS file can be broken up into components, per-page files, or whatever you like, while the compiled output remains a single CSS file instead of many small files being served.
You can reduce repeated code by moving it into a SASS mixin, which can then be called from any CSS selector like a function. Mixins even accept arguments in case small modifications are needed.
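A small sketch of both ideas together (the file and selector names are invented): a mixin defined in a partial, imported into the main stylesheet, and called with and without an argument.

```scss
// _buttons.scss (a partial, pulled in below via @import)
@mixin rounded($radius: 4px) {
  border-radius: $radius;
}

// main.scss
@import "buttons";

.button-primary {
  @include rounded;        // uses the 4px default
}

.button-hero {
  @include rounded(12px);  // argument overrides the default
}
```

Each repeated rule you fold into a mixin is one fewer place for the CSS and the UI to drift apart.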
For more information about what SASS can do, please refer to the documentation.
SVN and Better Version Control
The use of SVN at the company is probably more legacy than anything else. I have no problem with seeing SVN; it is a very easy tool for developers to use and wrap their heads around. However, it does pose some problems, because every action on the repository modifies the main company repository, and that is risky. One anecdote I heard was that if a piece of code isn't finished by 5pm, it is not committed, because the change could break the build and block the development team in the morning until the developer who broke it arrives.[7] What?!? This problem was solved over 10 years ago.
To solve this you can use developer branches, one for each feature being worked on. At the end of the day, all code is committed to the branch (working, preferably) and nothing is left on the local machine. In the morning, the developer finishes/fixes the code on the developer branch and then merges it into the main branch. Voila. No issues. But I do have one problem.
Aside from SVN being notoriously bad at merging, when branches are created they must be created on the central repository. In this way you are changing the main repository, which could create corruption or do all manner of nasty things. I prefer the GitHub approach, where everyone has their own repository (locally and also on the Git server). When features are finished and need to be incorporated into the main company codebase, a pull request is issued and the code is accepted into the company codebase. An audit trail is formed, and the central (company) repository is never at risk of corruption.
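The branch-then-merge workflow above can be sketched end to end against a throwaway Git repository (the branch, file, and author names are placeholders):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"
git commit -q --allow-empty -m "initial"
base=$(git symbolic-ref --short HEAD)          # main branch, whatever its name

git checkout -q -b feature/accounts-cleanup    # per-feature developer branch
echo "work in progress" > accounts.txt
git add accounts.txt
git commit -q -m "End of day checkpoint"       # nothing lives only on the laptop

git checkout -q "$base"                        # next morning: finish and merge
git merge -q --no-ff -m "Merge feature/accounts-cleanup" feature/accounts-cleanup
last_commit=$(git log --oneline -n 1)
echo "$last_commit"
```

The overnight checkpoint commit is the whole point: a developer who is sick the next morning blocks nobody, because the branch is already on the server.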
Local Developer Databases
I understand that the requirement to use Oracle is an external constraint, but it worries me that a developer does not have a local database instance to hack on. Since it is Oracle, I know that each instance is costly, but the solution currently used is definitely broken.
The current solution is to have a shared development database to which all developers connect. That means that as your development team grows, the probability of two developers colliding on the same resource (be it a table, row, or cell value) climbs rapidly, much like the birthday problem. And just think how dirty the data in that development database is going to be without regular refreshes.
One solution is to create a separate database for each developer. On Oracle, this is known as a "schema". The developers should also have an automated way of wiping their schema and refreshing it with new data from Production (anonymized, of course).
The other solution is to use a different database on development machines. This means the SQL must be written in a database-agnostic manner, but at least developers can run something like PostgreSQL locally and then verify that everything runs fine in the QA environment (which runs against Oracle). However, if Oracle hints are being used (and they should be avoided like the plague[8]), this will increase the complexity of testing performance fixes between environments.
Hiring "Rockstars"

So much has been said in the past 5 years about how the IT industry is being overrun by people looking for "gurus", "rockstars", "ninjas" and "A players". One has to see beyond such hype and look at the type of applicant being presented. Are you hiring based on people's egos, or do you want someone who genuinely cares about your company and is technically capable of moving it forward?
Those companies that hire "rockstars" may not like all the egos in the room. Or perhaps it all works out perfectly, but hiring based on a label given to someone is still a terrible thing to do. It also signals to other candidates that if they are not "A" material, then they are "B" or "C" material and, ipso facto, bad. Lastly, statistics tells us that performance settles into a bell curve: some people naturally rise to the top, some drop to the bottom, and most stay around the middle. In other words, even if you hired only "A Players", some would be more "A" than others.
The trait that most senior people (those who have been in the industry a while) carry is the realization that the more they learn, the less they know. I can look at a systems architecture diagram and spot numerous failure points, but only because I have seen those components fail. Before I had that experience, everything about an N-tier architecture looked easy. An overabundance of confidence is your first red flag.
Scrum and Agile
Don't worry so much about labeling your software development processes. Even one of the prolific creators of Agile recently came out against the label.[9] Every company's processes are going to be different, but the key is to find your mistakes, learn from them, and then adapt so that those mistakes do not return. Easy peasy.
If a company says they are specifically looking for a Certified ScrumMaster®, then please let them know that all it takes is $1,000 and 2 days of your time. Seriously, go look up the course schedule and fees. It is considerably more difficult and costly to learn effective software development practices, regardless of whether you are managing or implementing code. Instead, if you find someone who can tell you how a piece of code will fail before they type it on a keyboard, HIRE THAT PERSON NOW.
Footnotes

[1] After four interviews, ten (!) interviewers, providing code samples, and going through reference checks, the company abruptly rejected all candidates and restarted the candidate search (which is very costly for all parties and a waste of a month of time). I naively thought I had it locked in, so I'm completely stunned at the way things turned out. It could have been hubris on my part, or maybe not, but I've since fallen back to Earth (to be honest, I'm still picking up pieces of me embedded in the pavement). Heh.
[2] When you get rejected by a company, they cut all communication with you immediately. Heck, even sending a quick "thank you" note will not elicit a response. I find this such an odd practice, and I don't like it. I prefer to make friends and, if our two puzzle pieces don't happen to fit perfectly, that's fine, but the friendship and knowledge-sharing should remain.
[3] For instance, code that talks to a database will often use a connection pool to reuse connections and keep their number constant (2 servers, each with a pool of 100 connections, equals 200 open database connections). If you horizontally scale code that uses a connection pool, you are multiplying the pools and therefore the connections to the database. I have seen applications overload a database because nobody remembered to re-tune the connection pool limits after new servers were added. How a database fails when connections exhaust its memory is non-deterministic.
[4] Yes, proper cache invalidation is key, and it can be difficult, but by the point where caching is needed one should already be thinking about how the application layer can properly invalidate it.
[5] The test methods were present in the codebase, but the body of each test was simply `//assertTrue(true);`. Why? Because the company had tools to check that tests existed, but did not check test coverage. So the developer who wrote this was able to get his code accepted without writing any tests.
[6] I'm using Cucumber as an example. It is not the right fit for every organization, but I have seen some places use it very well and gain a lot of benefit from it. You can't beat Cucumber if your goal is to write tests as English sentences.
[7] Just like with unit testing, every decision in life is about assessing risk and then acting on it. The fact that you once drove at 200 km/h without crashing doesn't mean doing so carries a low risk of a crash. Likewise, leaving uncommitted code on a local computer overnight doesn't seem risky until the morning someone doesn't make it into the office.
[8] Oracle hints should be avoided because they are black magic. You are specifically telling the database engine's query planner, "I know you want to execute the query in this manner, but do this instead." Working out performance issues between environments with differently sized data sets becomes next to impossible once you start using hints. Also, hinting can be like a drug: you get addicted to the power you wield over the system and, as always, power corrupts.
[9] Ha! I was right all along! I endured years of naysayers and derisive comments about my worth as a software developer when I expressed my views on the bastardization of Agile practices.