Testing Cloudify Auto-Scaling Rules
Closed Loop Feedback Test
Cloudify is an open source PaaS software stack that automates deployment, monitoring, and fault detection of applications running in the cloud. It can automatically add instances when a monitored statistic exceeds a certain threshold. The implementation of these automatic scaling rules is a closed-loop control system, which requires careful testing.
The diagram below shows the test load generator and the closed-loop "system under test". The controller (the scaling rules) monitors the throughput of each web server. When throughput exceeds a certain threshold, a new web server instance is started and the throughput per instance drops back below the threshold.
+--------------+
|     Test     |
|      +       |
+------+-------+
       |
  http | requests
       V
+--------------+    +--------------+
|    Tomcat    |    |  Throughput  |
|  Instance(s) +--->|     JMX      |
|              |    |   Monitor    |
+--------------+    +-------+------+
       ^                    |
       | add                |
       | instance           |
+------+-----+              |
|  Scaling   |              |
|   Rule     |<-------------+
|            |
+------------+
Here is the JMX plugin, configured to expose the total number of web requests (per instance):
https://gist.github.com/2788838
The scaling rule uses a 20-second sliding window to convert the total number of requests (X) into throughput (delta X divided by delta T) and compares the result against a predefined threshold.
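To make the delta X over delta T calculation concrete, here is a minimal Python sketch. It is not the recipe from the gist: the class name `SlidingWindowThroughput`, the helper `exceeds_threshold`, the sample data, and the threshold of 15 are all hypothetical and chosen only for illustration.

```python
from collections import deque


class SlidingWindowThroughput:
    """Turns an ever-increasing request counter into requests per second."""

    def __init__(self, window_seconds=20.0):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp_seconds, total_requests)

    def add_sample(self, timestamp, total_requests):
        self.samples.append((timestamp, total_requests))
        # Drop samples that are older than the sliding window.
        while self.samples and self.samples[0][0] < timestamp - self.window_seconds:
            self.samples.popleft()

    def throughput(self):
        # At least two samples are needed to compute a rate.
        if len(self.samples) < 2:
            return 0.0
        (t0, x0), (t1, x1) = self.samples[0], self.samples[-1]
        if t1 == t0:
            return 0.0
        return (x1 - x0) / (t1 - t0)  # delta X divided by delta T


def exceeds_threshold(monitor, threshold):
    """Hypothetical stand-in for the comparison done by the scaling rule."""
    return monitor.throughput() > threshold


monitor = SlidingWindowThroughput(window_seconds=20.0)
for t, total in [(0, 0), (5, 50), (10, 100), (15, 150), (20, 200)]:
    monitor.add_sample(t, total)
print(monitor.throughput())            # 10.0 requests/second
print(exceeds_threshold(monitor, 15))  # False
```

Because the counter only ever grows, the rate depends solely on the oldest and newest samples inside the window, so a short burst is smoothed over the full 20 seconds.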
https://gist.github.com/2788840
The closed loop test starts with "zero traffic", changes to "constant traffic", and then goes back to "zero traffic":
1. Start without any web traffic.
2. Wait until the minimum number of instances is running.
3. Increase web traffic to a predefined level of requests per second.
4. Wait until the expected number of instances is running.
5. Wait a little more and make sure there are no add/remove instance fluctuations.
6. Stop the web traffic.
7. Wait until the minimum number of instances is running.
The test waits until the expected number of instances is reached (step 4) and stays there for a certain period of time (step 5). During that time we must verify that the scale out is performed without fluctuations. An unwanted fluctuation occurs when the controller adds an instance and then removes it without any change in the input (that is, under stable HTTP traffic).
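The fluctuation check can be illustrated with a toy model of the closed loop. The sketch below is plain Python, not Cloudify code: the functions `simulate_closed_loop` and `has_fluctuation`, the high/low thresholds, and the traffic level of 25 are assumptions made up for this example. The model assumes the load is split evenly across instances.

```python
def simulate_closed_loop(load_per_tick, high, low, min_instances=1, max_instances=10):
    """Returns the instance count after each tick of a toy closed loop."""
    instances = min_instances
    history = []
    for load in load_per_tick:
        # Feedback: adding instances lowers the value the controller sees.
        per_instance = load / instances
        if per_instance > high and instances < max_instances:
            instances += 1
        elif per_instance < low and instances > min_instances:
            instances -= 1
        history.append(instances)
    return history


def has_fluctuation(history):
    """True if an instance is added and then removed (or the reverse)."""
    deltas = [b - a for a, b in zip(history, history[1:]) if b != a]
    return any(d1 * d2 < 0 for d1, d2 in zip(deltas, deltas[1:]))


constant_traffic = [25] * 10

# Thresholds far apart: the loop settles at 3 instances and stays there.
stable = simulate_closed_loop(constant_traffic, high=10, low=4)
print(stable[-1], has_fluctuation(stable))  # 3 False

# Thresholds too close together: the controller keeps adding and removing.
unstable = simulate_closed_loop(constant_traffic, high=10, low=9)
print(has_fluctuation(unstable))  # True
```

In the second run, three instances bring the per-instance value below the low threshold while two instances push it above the high threshold, which is exactly the kind of oscillation step 5 is meant to catch.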
In more advanced test scenarios we may want to monitor resources such as number of busy threads, or CPU usage. This would require a more sophisticated HTTP load generator, which is usually used in stress/performance testing.
The problem with developing a closed loop feedback system is that you cannot test the controller (scaling rules) in an isolated environment. Every decision the controller makes affects the output of the system, which in turn affects the controller. The way to deal with that is to "open" the loop (a non-feedback controller). The controller still makes a decision, but the result does not affect the monitored data feeding the controller.
+--------------+
|     Test     |
|      +       |
+------+-------+
       |
   set | value
       V
+--------------+    +--------------+
|     Stub     |    |    value     |
|  Instance(s) |    |   monitor    |
|              |    |              |
+--------------+    +------+-------+
       ^                   |
       | add               |
       | instance          |
+------+-----+             |
|  Scaling   |             |
|   Rule     |<------------+
|            |
+------------+
Here is a little Cloudify recipe trick. Each Cloudify instance stores the recipe as a POJO in memory, which allows adding new properties. In this case we add a long value which mocks the web server throughput.
https://gist.github.com/2788844
This recipe allows the test to remotely inject the monitored values that the service exposes to the scaling rules controller. This mock value is not affected by the scaling rules decisions and does not require any actual web server instance running.
Here is what an open loop controller test looks like:
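1. Wait for the minimum number of instances.
2. Set the monitored value to "$highthreshold+1".
3. Wait for the maximum number of instances.
4. Set the monitored value back to a low value (below the scale-in threshold).
5. Wait for the minimum number of instances.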
Notice that the "wait for the maximum number of instances" step expects the scale out to be performed again and again until the maximum number of instances is reached. This is because there is no closed loop feedback. No matter how many web instances the scaling rules start, they always see the same monitored value (greater than the high threshold).
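The same behavior can be shown with a small open loop model. This is a Python sketch under stated assumptions, not the Cloudify recipe: `simulate_open_loop`, the thresholds (high 10, low 4), and the maximum of 4 instances are hypothetical, and the low threshold in particular is assumed here so that the model can scale back in.

```python
def simulate_open_loop(injected_values, high, low, min_instances=1, max_instances=4):
    """Returns the instance count after each tick when the value is injected."""
    instances = min_instances
    history = []
    for value in injected_values:
        # No feedback: the controller sees exactly what the test injected,
        # regardless of how many instances are currently running.
        if value > high and instances < max_instances:
            instances += 1
        elif value < low and instances > min_instances:
            instances -= 1
        history.append(instances)
    return history


high, low = 10, 4
# Inject "high + 1" for a while, then a value below the low threshold.
injected = [high + 1] * 5 + [low - 1] * 5
history = simulate_open_loop(injected, high, low)
print(history)  # [2, 3, 4, 4, 4, 3, 2, 1, 1, 1]
```

With a constant injected value above the high threshold, the count climbs on every tick until it hits the maximum, and only a change in the injected value brings it back down.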