An aim for a development team (and company) is to achieve one-button deploys of its applications to all environments.
Achieving that involves several solutions, which together bring a number of direct and indirect benefits for the company, such as:
- Quicker releases of new features
- Frequent releases of new features
- Quick and painless rollback of unwanted releases
- Reliable release process
- Confidence in testing in different environments
- Reduced risk in deployments
- Predictable roadmaps with more stable velocities
- Less context switching for ops and developers
- Developer time focused on delivering features instead of on processes
I joined my current company as a Tech Lead last year. Since then my team, or more accurately our fantastic Ops/DevOps, Tools and other teams, have made great strides towards one-button deploys. We are more or less there, and the company is increasingly rewarded with the benefits mentioned above. We have achieved this by doing the following since last year:
Move to Git and internally hosted repository server
Subversion was once great, but no longer. We used git-svn for a while, but eventually moved all applications' source code fully to Git.
With the addition of first Stash and then GitLab we removed a lot of bottlenecks, restrictions and frustrations. Code could be referenced easily without having to check it out. We could collaborate via pull requests on code bases we were less comfortable with, and review code when needed. Forking and creating new projects became instant. Project discovery and general code sharing became very easy.
External configuration and one binary
The main applications in my team used to use Maven profiles to build separate binaries for each environment. This was because the configuration was baked in via Maven filters.
This is not great for QA sign-off, as you basically have a different binary in each environment. It also slows down deployment, as you have to rebuild the binary all the time. It definitely makes it difficult to roll back releases or investigate older releases, as you have to rebuild an older version from scratch. And it makes the process very fragile, as each machine might build the binary differently (different JDK, broken local Maven repositories, etc).
Moving to external configuration meant we had one binary that gets promoted through all environments. That one binary is uploaded to the repository manager (initially Nexus, later Artifactory) and is then downloaded as part of every deploy job.
Changing configuration is now purely a configuration change; the binary itself does not change.
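As a minimal sketch of the idea (in Python rather than the Java our applications are written in), the same code can read its settings from a file that lives outside the binary. The `APP_CONFIG_PATH` variable, the JSON format and the `db_url` key are assumptions made for the illustration, not how our applications actually do it.

```python
import json
import os
import tempfile

# Hypothetical variable name: it tells the application where its
# environment-specific configuration file lives.
CONFIG_PATH_VAR = "APP_CONFIG_PATH"


def load_config(environ=os.environ):
    """Read settings from a file outside the binary, chosen at start-up."""
    path = environ.get(CONFIG_PATH_VAR)
    if path is None:
        raise RuntimeError(f"{CONFIG_PATH_VAR} is not set")
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def write_example_config(settings):
    """Demo helper: write settings to a temporary JSON file, return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as handle:
        json.dump(settings, handle)
        return handle.name


if __name__ == "__main__":
    # Two environments, two files, one unchanged piece of code.
    qa_file = write_example_config({"db_url": "postgresql://qa-db/app"})
    prod_file = write_example_config({"db_url": "postgresql://prod-db/app"})
    print(load_config({CONFIG_PATH_VAR: qa_file})["db_url"])
    print(load_config({CONFIG_PATH_VAR: prod_file})["db_url"])
```

The point of the sketch is that promoting a release only changes which file the variable points at; the artifact that QA signed off is byte-for-byte the one that reaches production.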
Configuration in source control and rolled out with Puppet
With external configuration in place we enhanced reliability by adding the configuration files to source control. This also made configuration changes very easy: push the changes for an environment to Git and they are automatically synced and rolled out via Puppet. This removed a very annoying and typo-prone bottleneck.
Production configuration changes are not activated automatically but are rolled out via a single command.
Process automation, Teamcityfy everything...
We aimed to automate as many processes as possible. Our continuous integration server, TeamCity, has a number of automated builds and a whole range of manual jobs.
The automated builds test and build a binary on every check-in, and deploy it automatically to the development servers when the build succeeds. The manual jobs include:
- one button jobs to deploy that binary to other environments
- one button to tag a release
- one button to push release binaries to environments
- one button to create bugfix branch
- one button to rebuild databases
- one button to migrate database schemas
- one button to restart servers
- one button to test 3rd party APIs and environments
- one button to run acceptance tests
- one button to smoke test environments
- one button to trigger load tests (Gatling)
We used to run Fabric directly for deploy tasks, but Fabric is now triggered from within TeamCity jobs. Deployinator was nice, but we phased it out in favour of doing all deploys for every environment in TeamCity.
Test separation
There was already an extensive JUnit test suite in the applications my team was responsible for. However, many of the tests were integration tests masquerading as unit tests, which slowed down development and feedback loops.
We relabelled the unit tests that required frameworks (Spring, Hibernate, etc) or databases as integration tests instead. That reduced our feedback loop time.
We added separate verification builds for just the integration tests. We separated out the 3rd party tests and migrated them to the 3rd party libraries. We added standalone acceptance and smoke tests using Cucumber, Specs2 and Selenium.
Test data
We wrote command line and web applications with Node.js and Spray that quickly create test data, as well as tools that review test data. This sped up both development testing and proper QA testing.
3rd party mock applications
Mocking out 3rd party systems, or even some internal systems, entirely by creating test harness applications in integration environments that pretend to be those systems has sped up our QA process a lot.
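To give a rough idea of what such a test harness can look like, here is a small sketch using only the Python standard library. The payment API path and the canned response are invented for the example; our real harnesses are separate applications and are not shown here.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned responses for an imaginary 3rd party payment API.
CANNED = {
    "/payments/123": {"id": "123", "status": "AUTHORISED"},
}


class StubHandler(BaseHTTPRequestHandler):
    """Answers GET requests with canned JSON instead of calling the real system."""

    def do_GET(self):
        body = CANNED.get(self.path)
        status = 200 if body is not None else 404
        payload = json.dumps(
            body if body is not None else {"error": "not found"}
        ).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_stub(port=0):
    """Start the stub on a background thread; port 0 picks any free port."""
    server = HTTPServer(("127.0.0.1", port), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path):
    """Call the stub the way the application under test would call the 3rd party."""
    url = f"http://127.0.0.1:{server.server_address[1]}{path}"
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return json.loads(response.read())
```

In an integration environment the application would simply be configured with the stub's base URL instead of the real 3rd party URL, which ties in nicely with external configuration.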
Removed restrictions
Restricting who can commit, run certain jobs or see certain data might sound like a sensible option, but in reality it slows down collaboration. We decided we just trust the majority more than we distrust a minority.
A great benefit was that we opened up all Git repositories to everyone. Anyone can commit, although people less involved in a project prefer to use Pull requests.
TeamCity jobs can be run by anyone. There is an audit trail, but there has never been an issue of anyone running a job they should not have. It is a great help that, when people are in meetings or similar, anyone can push code out to the next environment, for example.
Production data for some applications (due to PCI and CRB restrictions), production credentials and some production jobs for our critical applications are still restricted, but we try to minimise this as well.
Feature toggles
Whilst it is good practice to keep toggles to a minimum, adding feature toggles that can be overridden via external configuration has been a good change. We can now quickly disable broken new features. We can also dark release or canary release a feature, then enable it via Git and Puppet so that it becomes available to all.
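A toggle that can be overridden from external configuration can be sketched roughly like this. The toggle names, the `feature.` key prefix and the percentage-based canary check are assumptions for the illustration, not our actual implementation.

```python
import zlib

# Defaults ship inside the binary; the names are hypothetical.
DEFAULT_TOGGLES = {"new_checkout": False, "recommendations": True}


def parse_bool(text):
    """Interpret a configuration string as a boolean."""
    return text.strip().lower() in ("true", "1", "yes", "on")


def resolve_toggles(defaults, external):
    """Overlay external 'feature.<name>' settings on the built-in defaults."""
    toggles = dict(defaults)
    for key, value in external.items():
        if key.startswith("feature."):
            name = key[len("feature."):]
            if name in toggles:  # ignore toggles the binary does not know
                toggles[name] = parse_bool(value)
    return toggles


def in_canary(user_id, percentage):
    """Deterministically place a user in the first `percentage` of 100 buckets."""
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100
    return bucket < percentage


if __name__ == "__main__":
    external = {"feature.new_checkout": "true", "db_url": "ignored here"}
    print(resolve_toggles(DEFAULT_TOGGLES, external))
```

Because the override lives in the external configuration, switching a broken feature off is a configuration push and needs no new build.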
Content migration tools
Instead of applying content directly and manually as SQL scripts, or via scp/ftp/rsync, we started writing tools in Play! that help create data sets of new content and then promote those through the environments, cross-checking which environments the data is already in.
Adding scripts that interact with 3rd party portals was also helpful.
This avoids typos, avoids forgetting to run a script in an environment, and greatly speeds up data migration.
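The cross-checking part can be illustrated with a small sketch. The environment names, the data set ids and the in-memory `deployed` mapping are all hypothetical; the real tools are Play! applications that query each environment.

```python
# Order in which content is promoted; the names are hypothetical.
ENVIRONMENTS = ["dev", "qa", "staging", "production"]


def promotion_report(dataset_id, deployed):
    """Return, per environment, whether the data set is already present."""
    return {env: dataset_id in deployed.get(env, set()) for env in ENVIRONMENTS}


def next_environment(dataset_id, deployed):
    """Return the first environment still missing the data set, or None."""
    for env in ENVIRONMENTS:
        if dataset_id not in deployed.get(env, set()):
            return env
    return None


if __name__ == "__main__":
    deployed = {
        "dev": {"spring-campaign", "faq-update"},
        "qa": {"spring-campaign"},
    }
    print(promotion_report("spring-campaign", deployed))
    print(next_environment("spring-campaign", deployed))
```

A report like this makes it obvious when a data set was skipped in one environment, which is exactly the mistake that is easy to make with hand-run scripts.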
Database migration
Many of our applications use NoSQL solutions such as Cassandra and Redis, as some of our flows have to handle millions of interactions, but the core data is still mostly in PostgreSQL.
Whilst we use Flyway and DBDeploy to migrate those database schemas and stub data in some environments, we do not use them all the way to production. This is one area we need to improve.
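For readers unfamiliar with versioned migrations, the core idea can be sketched as follows. This is not Flyway's code, only an illustration: it assumes simple integer versions in Flyway-style file names such as `V2__add_orders.sql`, and the file names themselves are made up.

```python
import re

# Flyway-style name: V<version>__<description>.sql (integer versions only here).
MIGRATION_NAME = re.compile(r"^V(\d+)__(.+)\.sql$")


def pending_migrations(filenames, applied_versions):
    """Return the migration files not yet applied, in ascending version order."""
    found = []
    for name in filenames:
        match = MIGRATION_NAME.match(name)
        if match is None:
            continue  # not a migration script
        version = int(match.group(1))
        if version not in applied_versions:
            found.append((version, name))
    return [name for _, name in sorted(found)]


if __name__ == "__main__":
    files = ["V3__add_index.sql", "V1__create_users.sql", "V2__add_orders.sql"]
    # Pretend this environment has only had version 1 applied so far.
    print(pending_migrations(files, applied_versions={1}))
```

Since each environment records which versions it has applied, the same set of scripts can be run against every environment and only the missing ones are executed.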
Environment creation
One important element, which our Ops team is just starting to roll out, is an internal PaaS/IaaS solution: one button to create a new VM environment, or one button to create a database, an AWS SQS queue, etc. Further enhancements, such as one button to clone an existing VM or database, will be nice.
No button
Obviously there were other enhancements that are not really related to "One button", such as phasing out old legacy Java applications in favour of newer Scala applications by applying the Strangler Application pattern, replacing Quartz-based batch jobs with Akka and Camel, measuring the effect of new features with A/B testing, monitoring application metrics with Graphite, analysing logs via Logstash, etc.
Company profitability
All of these combined have made certain parts of our development and release process very easy and quick. I am sure we have covertly increased the profitability of our company, as we can now release quicker and more frequently, with fewer broken releases and fewer bugs in general.
Future