Put your metrics where your tests are
TL;DR
An experiment showed that I spend about 20% of my coding time writing tests, but experience tells me that this still saves time compared to testing manually.
The extra time is well worth it in the environment I work in, but are there ways to use the metrics to improve?
There is a way to measure the time spent coding automated tests vs. features.
Sometimes tests can be a sore subject. I am a very devoted advocate of tests in the development life cycle. Others, not so much. To me, a good test suite means reliable code, a safety net during refactoring, and fewer errors in production. To others, it means slower development, frustration with failing tests, and a more complex development life cycle, since it adds one more step to the development flow.
In the summer of 2015, after a discussion with a coworker on this topic, a question popped into my head: how much does my devotion to tests really take away from coding product features? I wanted numbers. So I decided to take a journey that would show me how much time I spend writing feature code vs. writing and maintaining tests. I had never done this before, so it would be a good self-analysis.
Methodology:
Tools:
I modified the Vim plugin BufTimer to periodically save a report of the time spent in each file (in each buffer, really, but the difference doesn’t matter for this discussion).
Assumptions:
To make it more manageable, I only considered Ruby code and ignored other things that I often work on (configuration files, JavaScript, CSS).
I only counted the time spent editing code. Actually running tests, testing the product in the browser, etc. cannot be measured reliably (yet). I’m open to ideas on how to do that, since that would be a real way to measure how tests affect productivity.
Changes made to feature code to accommodate testing could not be distinguished from other changes. I considered this minor, but it may be a more significant piece for those new to testing.
Precision
BufTimer is not perfect at measuring the amount of time spent editing files, but it doesn’t discriminate against any type of file, so any error should affect test and feature files alike.
Calculations
I configured BufTimer to write report files named by Vim PID and date.
I created a simple Ruby script that tallies up all the numbers https://github.com/ilyakatz/perfectforloop/blob/master/test_time.rb and saves them into a CSV file.
Numbers are grouped by Vim session, which could last a few minutes or several days.
I started collecting data on Sept 4, 2015 and finished the first experiment on Oct 8, 2015, so I have a full month’s worth of data. I plan to keep it running to refine the numbers. So, stay tuned!
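The tally step above can be sketched roughly as follows. This is a minimal illustration and not the linked test_time.rb script: the report format (one "seconds, tab, path" line per buffer), the path rules for deciding what counts as a test file, and the names classify, tally and sessions_to_csv are all assumptions made for the example.

```ruby
require "csv"

# ASSUMPTION: a file counts as a test if it lives under spec/ or test/,
# or its name ends in _spec.rb or _test.rb. Adjust to your project layout.
TEST_PATH = %r{(\A|/)(spec|test)/|_(spec|test)\.rb\z}

# Returns :test, :feature, or nil for files that are ignored (not Ruby).
def classify(path)
  return nil unless path.end_with?(".rb")
  path.match?(TEST_PATH) ? :test : :feature
end

# Sums seconds per category for one session's report.
# ASSUMPTION: each report line is "<seconds>\t<path>" (hypothetical format,
# not necessarily what BufTimer writes).
def tally(report_text)
  totals = { test: 0, feature: 0 }
  report_text.each_line do |line|
    seconds, path = line.chomp.split("\t", 2)
    next if path.nil?
    kind = classify(path)
    totals[kind] += seconds.to_i if kind
  end
  totals
end

# Builds CSV text with one row per session label (e.g. PID plus date).
def sessions_to_csv(sessions)
  CSV.generate do |csv|
    csv << %w[session test_seconds feature_seconds]
    sessions.each do |label, report_text|
      totals = tally(report_text)
      csv << [label, totals[:test], totals[:feature]]
    end
  end
end

# Made-up sample report, for illustration only.
sample = "120\tapp/models/user.rb\n30\tspec/models/user_spec.rb\n45\tconfig/app.yml\n"
puts sessions_to_csv({ "pid1234-2015-09-04" => sample })
```

Keeping one row per session preserves the grouping described above, so sessions can still be pooled or compared later.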
Results:
My experiments produced the following graph:
Full table with numbers is available here
https://docs.google.com/spreadsheets/d/1MPbcMozRhtANmHx04SSlVSb0g6JNfatG8kbJHEngS7E/edit?usp=sharing
When I tallied up all the numbers, it showed that on average I spend roughly 20% of my coding time writing tests. This means that one fifth of my coding time is spent writing code that is never seen by our users and is a liability that doesn’t bring in any revenue. I believe this number is well worth the investment but ...
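For anyone repeating the experiment, here is one way such a percentage could be computed from per-session totals. It is only a sketch: the test_share helper and the numbers are made up for illustration, and pooling seconds across sessions is an assumption rather than a description of how the 20% figure was derived.

```ruby
# Percentage of tracked editing time spent in test files, pooled over all
# sessions so that long sessions weigh more than short ones.
def test_share(sessions)
  test = sessions.sum { |s| s[:test] }
  total = test + sessions.sum { |s| s[:feature] }
  return 0.0 if total.zero?
  (100.0 * test / total).round(1)
end

# Made-up seconds for illustration, not the measured data.
sessions = [
  { test: 600, feature: 2400 }, # long session, 20% tests
  { test: 300, feature: 300 }   # short session, 50% tests
]
puts test_share(sessions) # prints 25.0
```

Averaging the per-session percentages instead would give 35% for the same data, so it is worth stating which of the two "on average" refers to when comparing numbers with someone else.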
We got the numbers, now what?
While the numbers themselves are interesting, the next step is a bit more difficult. What is the lesson? There will be a sister blog post that talks about the pros and cons of testing, some of which are more obvious than others. But I think there are a few takeaways from this experiment:
20% spent on tests is well worth the investment (stay tuned for next post)
I will continue running this experiment to look at trends.
There may come a point of diminishing returns. So far, I have not come up with a way to determine what number would make me reconsider the necessity of tests.
It is important to note that not all applications, or all parts of the same application, deserve the same amount of testing. This experiment did not distinguish between them.
I would love to get your feedback on what your tests cost you, so I can compare and maybe come up with some more concrete goals, whether that means writing more tests or fewer.
I consider this experiment very much the beginning of a brand new journey. This is the first time I’ve seen any quantitative analysis of testing, much less done one myself. I would love to see this integrated more into developers’ toolkits as we continue to find better and more efficient ways to do what we enjoy!
Acknowledgements:
Thanks to
gh:chrisbra for being very helpful in adding additional functionality to BufTimer that I needed for this project.
Naoum Naoum for editing and keeping me honest









