8 Meaningful DevOps Metrics You Should Trust
Recently we described DevOps and the tools you can use to optimise business processes in your company. Today I am going to explain how to measure the success of the DevOps approach: the metrics you can trust when evaluating how well DevOps works in your company. Keep in mind that improving any of the following metrics helps you release faster and build products your clients will love. What could be more important to any business?
Lead time
In simple terms, lead time means time-to-market. It consists of process time (the time spent making changes) and queue time (the time changes spend waiting in a queue). It is a key DevOps metric, measured from the start of work until the change reaches production. The shorter the time-to-market, the higher your chance of offering relevant products to the market.
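As a sketch, lead time can be modelled as the sum of its two components described above (the function name and the figures below are illustrative, not from the article):

```python
from datetime import timedelta

def lead_time(process_time, queue_time):
    """Lead time (time-to-market) = process time + queue time."""
    return process_time + queue_time

# Illustrative figures: 3 days of active work plus 2 days waiting in queues
total = lead_time(timedelta(days=3), timedelta(days=2))
print(total.days)  # 5
```

Tracking both components separately shows whether the bottleneck is the work itself or the waiting between stages.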
Deployment frequency
Simply put, deployment frequency means how often a new version of your product is released to the client, or how often you can ship bug fixes. There are situations where developers write a lot of code, but the changes only reach clients several months later. The main problem with this approach is the lack of meaningful feedback from the client and the risk that the changes will no longer be relevant to them. With frequent deployments you get client feedback faster, quickly deliver the necessary features and switch off irrelevant ones. Otherwise, you build your product not according to your customers' needs but based on your own perceptions.
For example, consider the outdated “waterfall” methodology: a company follows a development plan for one or two years and only delivers the product to the client at the end. By that point it is unclear whether the client ever needed a product with those characteristics, or whether so much time has passed that the product is no longer relevant. Infrequent deployments degrade the feedback loop and, potentially, the quality of the product. That’s why checking how regularly you deploy is a valuable DevOps metric. It can help you find bottlenecks and check how the structure of your DevOps team performs.
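One simple way to compute this metric is to count releases over an observation window; a minimal sketch with made-up dates:

```python
from datetime import date

def deployments_per_week(deploy_dates, period_days):
    """Average number of deployments per week over the observed period."""
    return len(deploy_dates) / (period_days / 7)

# Illustrative data: four weekly releases over a four-week period
deploys = [date(2023, 1, 2), date(2023, 1, 9), date(2023, 1, 16), date(2023, 1, 23)]
print(deployments_per_week(deploys, period_days=28))  # 1.0
```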
Deployment size
If your deployments are small (you have few modifications to apply), it is easier to test and release the code and to track how many stories, feature requests and bug fixes are being deployed. By making your deployments smaller you reduce the implementation time: the less code you need to deploy, the easier the process. In every team, several developers may work on different features that affect different parts of the system, and changes made by one developer can overwrite changes made by another. When a release is big, nobody can guarantee its stability, because you never know where a problem may appear. That’s why it is easier to make small releases and keep everything under control. Shipping only bug fixes or micro-features is the ideal and most secure scenario. Imagine how high the risk of missing errors is when testers have to verify a big release.
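Tracking the stories, feature requests and bug fixes in a release can be as simple as tagging each change and counting; a hypothetical changelog (the tags and data are illustrative):

```python
from collections import Counter

# Hypothetical changelog for one release, each change tagged by type
changes = ["bug-fix", "feature", "bug-fix", "story", "bug-fix"]

deployment_size = len(changes)   # total number of changes in the release
by_type = Counter(changes)       # breakdown: stories, features, bug fixes
print(deployment_size, by_type["bug-fix"])  # 5 3
```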
Deployment time
Your deployment time shouldn’t feel like an eternity. When a company builds a product, it is important to deploy new versions regularly. Without automation, these tasks fall on the shoulders of the system administrator, whose time is usually very expensive. A developer may ask for a new release to be deployed, which should only take 30 minutes, but if the system administrator needs to deploy four releases per day, that adds up to 2 business hours. You can automate this process with tools such as Ansible, GitLab and Jenkins, which carry out these tasks quickly and without human intervention. Tracking this metric will help you identify where you spend the most time and choose more efficient tools. All deployments should be automated, which in turn saves time and the nerves of your system administrators.
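The arithmetic behind the example above, spelled out:

```python
# Figures from the text: a manual release takes about 30 minutes,
# and the administrator deploys four releases per day
minutes_per_release = 30
releases_per_day = 4

daily_hours = minutes_per_release * releases_per_day / 60
print(daily_hours)  # 2.0 business hours per day spent on manual deployments
```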
Automated tests pass percentage
Every released version of the program may include quite a lot of changes, and the main idea of DevOps is to release versions more frequently and automate the release process. Any change should be tested. Manual testing is time-consuming and inefficient, so autotests are a great alternative. Autotests check the basic functionality against a checklist of the most important things that should work in the program. Normally the checklist includes 50-100 tests, and the percentage of these tests that pass shows the stability of the program. It is very important to run the autotests after every deployment to check stability, and to roll back if need be.
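The metric itself is straightforward to compute; a minimal sketch with an illustrative run (the figures are made up, within the 50-100-test range mentioned above):

```python
def pass_percentage(passed, total):
    """Share of automated tests that passed, as a percentage."""
    if total == 0:
        raise ValueError("no tests were run")
    return 100.0 * passed / total

# Illustrative run: 76 of 80 checklist tests passed
print(pass_percentage(76, 80))  # 95.0
```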
Error rate
After the application is deployed, it is very important that the server responds correctly. The error rate can be monitored continuously with tools like Zabbix or Prometheus; detected errors can be stored in a database and displayed on Grafana dashboards. An increased error rate is often caused by traffic spikes and can indicate that your server is slowing down.
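In its simplest form, the error rate is the share of failed responses in a sample; a sketch with hypothetical HTTP status codes (real setups would query a monitoring system such as Prometheus instead):

```python
# Hypothetical sample of HTTP status codes collected by a monitoring agent
responses = [200, 200, 500, 200, 404, 200, 200, 503, 200, 200]

server_errors = sum(1 for code in responses if code >= 500)
error_rate = server_errors / len(responses)
print(error_rate)  # 0.2
```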
Mean time to detection (MTTD)
This metric shows the average time it takes to detect an error. Imagine that after you deploy a new version of the program, a critical error appears, but it is only noticed two hours after it happened: you have lost two hours. For a high-quality service, it is essential to track this metric so that errors are identified and fixed quickly.
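MTTD is simply the average of the detection delays across incidents; a minimal sketch with hypothetical timestamps:

```python
from datetime import datetime, timedelta

def mean_time_to_detection(incidents):
    """Average delay between when an error occurred and when it was detected."""
    delays = [detected - occurred for occurred, detected in incidents]
    return sum(delays, timedelta()) / len(delays)

# Hypothetical incidents: (occurred, detected)
incidents = [
    (datetime(2023, 5, 1, 10, 0), datetime(2023, 5, 1, 12, 0)),  # noticed 2h later
    (datetime(2023, 5, 2, 9, 0),  datetime(2023, 5, 2, 10, 0)),  # noticed 1h later
]
print(mean_time_to_detection(incidents))  # 1:30:00
```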
Mean time to recovery (MTTR)
While creating a product, failures are inevitable. If something goes wrong, you need to recover or roll back the system. How long does it take to return to the original working state? The answer to that question is the mean time to recovery metric, normally measured in business hours. Docker and the microservices concept can help your business achieve a shorter MTTR.
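MTTR is computed the same way as MTTD, but over failure-to-recovery durations; a sketch with hypothetical outages:

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(outages):
    """Average duration from failure until the system is working again."""
    durations = [restored - failed for failed, restored in outages]
    return sum(durations, timedelta()) / len(durations)

# Hypothetical outages: (failure occurred, service restored)
outages = [
    (datetime(2023, 6, 1, 14, 0), datetime(2023, 6, 1, 15, 0)),  # 1 hour
    (datetime(2023, 6, 8, 9, 0),  datetime(2023, 6, 8, 12, 0)),  # 3 hours
]
print(mean_time_to_recovery(outages))  # 2:00:00
```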