From 0 to 10000X - Testing Automation + Continuous Integration for Any Environment
Posted By: Kobi Kisos on April 29, 2014
How we built a testing-as-a-service infrastructure for our team of developers using Git, QuickBuild, JClouds, Cloudify, Logstash, TestNG, Tomcat, and MySQL
System testing is one of the most tedious, time-consuming and error-prone procedures in software development, even more so when developing and testing distributed middleware products. At the same time, a robust continuous integration system relies heavily on a completely automated testing environment.
In this post I’m going to take a deep dive into the testing automation our developers use to test even their most complicated code in a fully distributed, production-like environment, running more than 10K tests a day. Using this framework, our developers can test their code at scale (in some cases up to hundreds of nodes) at the click of a button, across AWS, Rackspace, HP OpenStack and our on-prem environment.
Testing Automation as Our Continuous Integration Enabler
One of the key factors that enables continuous integration is automatic testing. If you want to test complicated scenarios, you need automatic tests that can run all the time and give you quick feedback. This is ultimately the main goal of continuous integration - a quick feedback loop.
To achieve this level of testing we chose to run our tests in the cloud using Cloudify. This is kind of like drinking our own merlot - using our products in production to test their production readiness. Exercising the product while it runs the tests affords us even further testing of its capabilities (but that’s just an added bonus).
Using Cloudify for testing automation is really not very different from running any app in the cloud. We wrote a service that knows how to run our tests, and this service uses the TestNG testing framework. We chose TestNG because our developers already know it well from unit testing, which lowers the barrier to entry. This ties into our efforts toward tool unification and developer portability. It also lets our developers run and debug the tests within the IDE on their own laptops, which removes much of the complexity normally involved in system testing.
Cloudify lets us install many instances of the same service in parallel, and in this way we can run our spectrum of tests in parallel as well. So, for example, if we have 1000 tests to run, we can distribute them among 100 service instances, where each service instance runs only 10 tests.
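The arithmetic of spreading 1000 tests over 100 service instances can be sketched as below. This is only an illustration under assumptions: the `TestPartitioner` class, its `partition` method and the round-robin strategy are hypothetical and are not taken from the actual framework, which may assign tests to instances differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a list of test names into one batch per service instance. */
final class TestPartitioner {

    /** Round-robin assignment, so batch sizes differ by at most one test. */
    static List<List<String>> partition(List<String> tests, int instances) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            batches.add(new ArrayList<>());
        }
        // Test i goes to instance (i mod instances).
        for (int i = 0; i < tests.size(); i++) {
            batches.get(i % instances).add(tests.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Placeholder test names; a real run would use actual test identifiers.
        List<String> tests = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            tests.add("test" + i);
        }
        List<List<String>> batches = partition(tests, 100);
        System.out.println(batches.size() + " batches, first has "
                + batches.get(0).size() + " tests");
    }
}
```

Round-robin keeps the batches balanced even when the number of tests does not divide evenly by the number of instances, which matters because the slowest instance determines when the whole run finishes.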
Just as a side note, these are all system tests, not unit tests. By default the system is distributed across a number of machines that contain the many different integrated software components, so each test actually exercises all of these components.
This parallel system allows us to run a few suites simultaneously. Each suite is a logical group that tests a different feature or set of features, and each suite is divided among a few service instances. This makes it possible to run a larger number of tests in parallel. This is our method for testing automation, and it’s all available on GitHub.
How We Do It
Our build servers are on all the time and they build, compile and package our products.
We also have two additional dedicated machines on EC2 for the testing framework: one runs Elasticsearch, Logstash, and Redis, and the other runs Tomcat and MySQL. A third machine hosts the Cloudify Manager, which orchestrates the testing services.
Once a build completes, the testing begins. The build server installs the testing services. After these are installed, Cloudify starts the machines and then the service instances, and at this point the test clients start to run.
There are a few possible test flows and scenarios. The first is running a test on a single machine - that is, the client and the server run on the same machine, and in our case also on the same cloud. This is the simplest one.
A more complicated scenario is running the client tests on one cloud and the tested service on another cloud. For example, when we want to test our HP cloud driver, we run the client service on EC2 and start the tested service on another cloud, in this case HP Cloud. This is a pretty complicated flow.
In this way we test clouds such as HP Cloud, Rackspace, Exoscale, EC2, Softlayer, and Softlayer bare metal. This does not include testing of Cloudify’s BYON functionality, which is tested in our own local lab.
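One way to picture these flows is as pairs of (client cloud, tested cloud). The sketch below is a minimal illustration only: the `TestFlow` class and its `flowsFrom` method are hypothetical names invented for this example and do not come from Cloudify or from the framework itself. It assumes the client always runs on EC2, as in the HP Cloud example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one test flow: where the client runs and which cloud is under test. */
final class TestFlow {
    final String clientCloud;
    final String targetCloud;

    TestFlow(String clientCloud, String targetCloud) {
        this.clientCloud = clientCloud;
        this.targetCloud = targetCloud;
    }

    /** True when the client and the tested service run on different clouds. */
    boolean isCrossCloud() {
        return !clientCloud.equals(targetCloud);
    }

    /** Builds one flow per target cloud, always running the client on the given cloud. */
    static List<TestFlow> flowsFrom(String clientCloud, List<String> targetClouds) {
        List<TestFlow> flows = new ArrayList<>();
        for (String target : targetClouds) {
            flows.add(new TestFlow(clientCloud, target));
        }
        return flows;
    }

    public static void main(String[] args) {
        // Cloud names are taken from the text; the data structure is an assumption.
        List<String> targets = List.of("EC2", "HP Cloud", "Rackspace", "Exoscale", "Softlayer");
        for (TestFlow flow : flowsFrom("EC2", targets)) {
            System.out.println(flow.clientCloud + " -> " + flow.targetCloud
                    + (flow.isCrossCloud() ? " (cross-cloud)" : " (same cloud)"));
        }
    }
}
```

Keeping the client cloud separate from the tested cloud makes the simple same-cloud flow just a special case of the cross-cloud one.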
All the logs are then sent to the logstash server, so we can have all of the runtime logs and use them as needed.
When all of the service instances have finished running their tests, we send the team the results, which include the logs from the Logstash server, and we also keep the results in MySQL. We have a dashboard on display in our Scrum room with the results of all the suites, so our team is always up to date.
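The per-suite view on a dashboard like this boils down to grouping individual test results by suite. The following is a rough sketch under assumptions: the `SuiteSummary` and `TestResult` classes, the field names and the sample suite names are all hypothetical, and the real dashboard reads its data from MySQL rather than from an in-memory list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical aggregation of test results into per-suite failure counts. */
final class SuiteSummary {

    /** Hypothetical result row: one executed test and its outcome. */
    static final class TestResult {
        final String suite;
        final String test;
        final boolean passed;

        TestResult(String suite, String test, boolean passed) {
            this.suite = suite;
            this.test = test;
            this.passed = passed;
        }
    }

    /** Counts failed tests per suite, keeping suites in first-seen order. */
    static Map<String, Integer> failuresBySuite(List<TestResult> results) {
        Map<String, Integer> failures = new LinkedHashMap<>();
        for (TestResult r : results) {
            // Passing tests add 0 so that fully green suites still appear in the summary.
            failures.merge(r.suite, r.passed ? 0 : 1, Integer::sum);
        }
        return failures;
    }

    public static void main(String[] args) {
        // Sample data for illustration only.
        List<TestResult> results = List.of(
                new TestResult("recovery", "restartManager", true),
                new TestResult("recovery", "failoverAgent", false),
                new TestResult("security", "loginRequired", true));
        failuresBySuite(results).forEach((suite, failed) ->
                System.out.println(suite + ": " + failed + " failed"));
    }
}
```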
Just as an aside, apropos our previous post on test-driven development (AKA behavior-driven development), this level of testing differs, as it deals with the entire system. TDD deals with unit tests - and just for context, these unit tests run every half hour for each product/component.
These testing processes have helped us greatly improve our developers’ code. In the last two years, we have rewritten our application from scratch, which is a very complicated undertaking. Since we work in agile Scrum, we have a two-week sprint. Each day the developers check the dashboard, and if a result shows a regression they fix it immediately. At the end of the sprint, every new feature is considered beta, and every other bug is a “show stopper.”
Eight years ago all our tests ran once or twice in a release. Five years ago - once a day.
Today all our tests run pretty much all the time, which gives us the quick feedback we need for our continuous integration processes and bug fixes. That leads to better code, which is the ultimate goal.