Saturday, 1 February 2014

Paper Cuts and Broken Windows

It is the little things that matter


We are often reminded of the importance of keeping focus, delivering a minimum viable product, following lean startup principles, only investing in what returns business value and so on. Whilst all of these are true, to some extent they are also false.

Delivering business value and concentrating on one task are virtues I do insist on; however, we should not abandon all other tasks and common sense. Moreover, it is actually very important that we also do the low-priority and low-ROI tasks. Why? Because of Paper Cuts and Broken Windows.

Note: This may apply to other professions and to aspects of life in general, but here I am only thinking of IT development projects.


TL;DR: Fix your bugs ASAP, evolve systems continually, keep code clean and take pride in it.



The problem: Paper Cuts


Many little bugs that together hurt a lot.

A paper cut is a simple bug or an outstanding minor feature that is deemed not important enough to fix or implement right now. Some may have the best intentions of implementing it soon or eventually, but we all know from experience that it will most likely never happen.

Over time these paper cuts add up. Soon everything you touch is already full of bugs, painful procedural steps or legacy code, and in the end your team and project may edge close to death by a thousand paper cuts, or at least feel like it.


The problem: Broken Windows


If something is already broken, it does not matter if you break something else

Similar to paper cuts, broken windows describes a system, project or code base that has enough things wrong with it that its current maintainers no longer feel any pride in it. It is then easy not to fix things and to ignore bugs, poor code, etc.

If a continuous integration server reports a broken build or a Nagios alert fires for that system, people are more concerned with getting rid of the notification than with fixing the root cause.

In an application suffering from broken windows, new features will not be built to the best possible standard or refactored into cleaner, more maintainable code. They will not be fully tested and may even introduce a few known bugs and limitations that are then ignored.

Also, if some epics are never completely finished, it may leave the impression that you do not need to finish stories and epics properly, and so broken windows of missing features become accepted practice.



Combined problem


Essentially, paper cuts and broken windows are about the same thing. Paper cuts are about how much the problems hurt you by slowing you down and making people reluctant to touch anything. Broken windows are about the lack of pride and willpower to make an application better because it already feels partially broken.

Worse still, if your customers start to have the same paper cuts and broken windows experience, they will simply stop using the application, or even stop dealing with your company.



Derivative problem: The costly rewrite


A system suffering from Paper Cuts & Broken Windows will eventually be so despised by its developers that they will campaign for it to be rewritten from scratch, which leads to a long period of delivering no business value, just pure expense. See my post: Do Not Rewrite.

And if the processes stay the same, the new system will quickly suffer from paper cuts and broken windows as well.



Solutions?


There is no single right way to prevent this. Eventually it is inevitable that a system will suffer from paper cuts and broken windows. However, there are several ways to minimise the risk of it happening quickly, to raise morale so that it spreads less, and to delay the rot significantly, ideally until the system has been in use for so long that little business value is left and it can be shut down.



Prevent death by paper cuts


Fix bugs ASAP


You cannot fix every bug and implement every feature; you would be chasing an impossible perfection that would cost an unlimited amount of time and money.

However, the cost (in time and therefore money) of fixing a bug increases by orders of magnitude the longer you leave it, because of context switching, environment setup, re-learning the data model and logic flow, knowledge sharing, etc. Fixing a bug straight away, thanks to a very short feedback loop, is invaluable; fixing it the same day or the next day takes a little more time, but not much. Any longer than that and it becomes real, exponentially costly tech debt.

So try to fix bugs straight away, or at least whilst still on the same specific task. It will make life so much more comfortable. Pushing out buggy features should be deplored.

Minor tasks and bugs that are still skipped, or found at a later date, will still cause paper cuts. A thorough automated and manual QA procedure will reduce them, but they will still occur.


Fix old bugs continually


However, I have found that continually picking up a few of these leftover bugs and tasks slowly reduces their number on the backlog and therefore helps prevent too many paper cuts.

When you finish a large, complex story, try to pick up one or two bugs and minor tech debt items before starting the next big story. If you finish a feature an hour before lunch, a meeting or the end of the day, then instead of half-starting another story, fix a quick bug. This continuous self-healing, done in time that would otherwise mostly be lost, will be invaluable over time. It is a beneficial minor procrastination.

If the bug or feature turns out to be a much larger task than you have time for now, simply add that information to your issue tracker, so that at least you have done some backlog grooming that makes future planning easier. It may then be a valid candidate for one of the few paper cuts that are never fixed, but at least now with more data.


Development principles


Another obvious way of preventing paper cuts is never to introduce them in the first place. A bug-free system does not exist, but you can reduce the frequency and impact of bugs by applying good procedures, architecture and code style.

Design systems as simply as possible and write clean code, i.e. follow the KISS and SOLID principles, so that a method, class, application or system does only one thing and is very easy to understand. This significantly reduces the risk of introducing unwanted secondary effects and makes the system much easier to maintain.

Applying functional programming techniques, such as avoiding mutable objects and removing state where it is not needed, will also reduce the risk of unwanted secondary effects and of architecture issues when scaling a system.
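As a small illustration of the immutability point, here is a minimal Python sketch. The `Order` type and `apply_discount` function are hypothetical names chosen for the example; the technique is simply a frozen dataclass plus a pure function that returns a new value instead of changing the one it was given.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Order:
    # Hypothetical domain object; frozen=True makes instances immutable.
    order_id: str
    total: float


def apply_discount(order: Order, percent: float) -> Order:
    # Pure function: builds a new Order rather than mutating the input,
    # so any other code holding `order` still sees the original values.
    return replace(order, total=round(order.total * (1 - percent / 100), 2))


original = Order("A-1", 100.0)
discounted = apply_discount(original, 10)
print(original.total, discounted.total)

try:
    original.total = 0.0  # an accidental in-place change is rejected
except FrozenInstanceError:
    print("Order is immutable")
```

Because nothing can change `original` behind another caller's back, a whole class of "who modified this?" bugs simply cannot occur.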

Designing systems and features using TDD reduces the risk of bugs, ensures test coverage and avoids implementing unnecessary features.
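A minimal sketch of the TDD rhythm in Python's standard `unittest`, assuming a hypothetical `slugify` helper that turns a post title into a URL fragment. The tests are written first and describe only the behaviour that is needed; the implementation then does just enough to pass them, which is what keeps unnecessary features out.

```python
import re
import unittest


# Step 1 (red): write the tests first, describing only the behaviour we need.
class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Paper Cuts"), "paper-cuts")

    def test_collapses_punctuation_and_trims(self):
        self.assertEqual(slugify("  Broken Windows!  "), "broken-windows")


# Step 2 (green): the simplest implementation that makes those tests pass.
def slugify(title: str) -> str:
    # Replace each run of non-alphanumeric characters with a single hyphen,
    # then strip any leading or trailing hyphens.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Step 3 (refactor): tidy up with the tests as a safety net, rerunning them.
if __name__ == "__main__":
    unittest.main(exit=False)
```

Anything `slugify` does not yet handle (Unicode titles, maximum length and so on) would only be added once a new failing test asks for it.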

Introducing continuous integration delivers a very quick feedback loop, so secondary effects are found whilst the developers are still aware of the context. Limiting the time spent on unintegrated feature branches is also advisable (i.e. hours at most, never days).


Prevent lack of pride due to Broken Windows


Reducing paper cuts will prevent the broken windows feeling. Applying the principles mentioned above (KISS, SOLID, immutability, TDD, CI, etc.) will reduce the risk of broken windows, and by insisting on good code standards and sensible processes, people will take more pride in their work.

But broken windows will happen, due to unavoidable technology evolution, staff turnover, lack of knowledge transfer or just plain mistakes.



How do you fix these “windows” and when?


However, windows can be fixed, and as more and more of them are fixed, people take pride in their systems and their work again.

Embrace automation and continuous delivery. If most of the process of maintaining a project is automated, the barrier to spending the time and energy to fix a broken window will be very low, and people are much more likely to do it voluntarily.

I wrote a post about a project I was on where we turned most of what we routinely did into one-button clicks on our CI server. The return on that investment was great: most tasks were no longer a chore, many bottlenecks were removed, the risk of process typos or forgotten steps was minimised, and delivery time for business value features was reduced immensely.

A good practice is to treat broken window issues the same way as bugs, i.e. pick them up and fix them as soon as possible, or just after finishing another story.

Another practice is to fix issues as part of a story, especially if the story touches the same areas. If, for example, deploys often fail on a server, then the next time you deploy to that server, fix it properly. Do not carry on with your head in the sand.

With these practices you will slowly mend most of your broken windows.


Little-known time sinks


It is the little-used and little-known applications and features that become major time sinks when changes are needed, and as such they are also the parts the team will avoid fixing. Try to prevent this by continually evolving architectures and killing features.



Evolve systems


Structure your architecture as modularised projects and components, either from the start or bit by bit for an existing system. That way you can refactor and improve one small part at a time, without committing to long, costly rewrite periods of no business value, as detailed in Do Not Rewrite.

Do not leave old systems to rot with outdated technologies and an ever-shrinking pool of possible maintainers. Keep evolving them, especially their integration points, for example by applying a Micro Service Architecture and/or the Strangler Application pattern.
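To make the Strangler Application idea a little more concrete, here is a minimal in-process Python sketch. In a real system the facade would usually be a reverse proxy or API gateway in front of HTTP services; the handler functions, the `StranglerFacade` class and the `/invoices` path are all hypothetical stand-ins. Requests whose path prefix has been migrated go to the new service, and everything else still falls through to the legacy application.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_app(path: str) -> str:
    # Hypothetical stand-in for the old monolith.
    return f"legacy handled {path}"


def new_invoice_service(path: str) -> str:
    # Hypothetical stand-in for a newly extracted service.
    return f"invoice-service handled {path}"


class StranglerFacade:
    """Routes a request to a new service if its path prefix has been migrated,
    otherwise to the legacy application."""

    def __init__(self, legacy: Handler) -> None:
        self.legacy = legacy
        self.routes: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        # Each migration moves one small slice of traffic off the legacy system.
        self.routes[prefix] = handler

    def handle(self, path: str) -> str:
        # Longest matching prefix wins, so more specific migrations take priority.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                return self.routes[prefix](path)
        return self.legacy(path)


facade = StranglerFacade(legacy_app)
print(facade.handle("/invoices/42"))  # legacy handled /invoices/42
facade.migrate("/invoices", new_invoice_service)
print(facade.handle("/invoices/42"))  # invoice-service handled /invoices/42
print(facade.handle("/customers/7"))  # legacy handled /customers/7
```

Each call to `migrate` is a small, shippable step that delivers value on its own, and once the legacy handler no longer receives any traffic it can be killed outright.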



Trim the fat, kill features


If an existing feature is no longer really needed, be quick to kill it. Euthanising unimportant or historic features makes maintenance much easier, reduces risk and speeds up the delivery of new business value features.

Kill an old application entirely if nearly all its original features have already been migrated away or removed.



Bandwidth


To be able to do all this, you need the time to do it.

You need a strong tech lead who insists that the team fixes bugs and removes tech debt continuously, and who can protect the team if it is pressured to cut corners.

You need a smart, non-blinkered product owner and/or project manager who understands the long-term value of not doing only the short-term features: a PO/PM who values only the velocity of delivered features, and accepts that some bandwidth is also spent on tech debt to achieve it. The PO/PM should not micromanage tasks and tech debt, only priorities at the epic and story level.

Henrik Kniberg’s book Lean from the Trenches describes valuing feature count above any other metric, and leaving it to the team, and no one else, to decide whether to tackle tech debt and how much.



Summary


To reiterate the TL;DR at the top: Fix your bugs ASAP, evolve systems continually, keep code clean and take pride in it.