Sunday, July 22, 2012

Autoscaling Algorithm used in WSO2 Elastic Load Balancer 2.0




This algorithm was developed by Afkham Azeez, Director of Architecture at WSO2 Inc., and is called the “request in-flight based autoscaling” algorithm.
We autoscale based on a particular service domain. Say we have the following configuration specified for the service domain that we are going to autoscale, in the loadbalancer.conf file of the WSO2 Elastic Load Balancer:

queue_length_per_node 3;
rounds_to_average 2;
A few points to keep in mind:

  • The autoscaling task runs every “t” milliseconds (which you can specify in loadbalancer.conf).
  • For each service domain we keep a vector (say “requestTokenListLengths”), which has a size of “rounds_to_average”.
  • For each service domain we keep a map (say “requestTokens”), where an entry represents a request token id and its time stamp.
  • For each incoming request (to the load balancer), we generate a unique token id and add it to the “requestTokens” map of that particular service domain, along with the current time stamp.
  • For each outgoing request, we remove the corresponding token id from the “requestTokens” map of the corresponding service domain.
  • Further, if a message has reached the “message expiry time”, we remove the respective tokens from the “requestTokens” map. (A minimal code sketch of this bookkeeping follows the list.)
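To make this bookkeeping concrete, here is a minimal sketch in Java of how such per-domain structures could be maintained. This is my own illustration, not the actual WSO2 ELB source; the class and method names are hypothetical.

    import java.util.LinkedList;
    import java.util.List;
    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative sketch of the per-service-domain bookkeeping described
    // above; names are hypothetical, not the actual WSO2 ELB classes.
    class ServiceDomainContext {
        // token id -> time stamp of the corresponding incoming request
        private final Map<String, Long> requestTokens = new ConcurrentHashMap<>();
        // sizes of "requestTokens" sampled at each autoscaler task run
        private final List<Integer> requestTokenListLengths = new LinkedList<>();

        // called for each incoming request to the load balancer
        String addRequestToken() {
            String tokenId = UUID.randomUUID().toString();
            requestTokens.put(tokenId, System.currentTimeMillis());
            return tokenId;
        }

        // called for each outgoing request
        void removeRequestToken(String tokenId) {
            requestTokens.remove(tokenId);
        }

        // drop tokens whose messages have reached the message expiry time
        void expireRequestTokens(long messageExpiryMillis) {
            long now = System.currentTimeMillis();
            requestTokens.values().removeIf(ts -> now - ts > messageExpiryMillis);
        }

        // called by the autoscaler task: sample the current in-flight count,
        // keeping at most rounds_to_average samples (oldest dropped first)
        void recordTokenCount(int roundsToAverage) {
            if (requestTokenListLengths.size() >= roundsToAverage) {
                requestTokenListLengths.remove(0);
            }
            requestTokenListLengths.add(requestTokens.size());
        }
    }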

Algorithm:

In each task execution, for a particular service domain:

  • We add the size of the “requestTokens” map to the “requestTokenListLengths” vector. If the vector has already reached the size of “rounds_to_average”, we remove its first (oldest) entry before adding the new one.

  • We take a scaling decision only when the “requestTokenListLengths” vector has reached the size of “rounds_to_average”.

  • If the above condition is satisfied, we calculate the average requests in flight by dividing the sum of the entries in the “requestTokenListLengths” vector by the size of the vector.

  • Then we calculate the request capacity that the current instances of this service domain can handle, by multiplying the number of running instances by “queue_length_per_node”.

  • Now we are in a position to determine whether we need to scale up the system. For that we check whether the calculated “average requests in flight” is greater than the “handleable request capacity”; if so, we scale up. Before scaling up we perform a few more checks, such as whether we have already reached the “maximum number of instances” specified in the loadbalancer.conf file for this particular domain, and whether there are any instances in the pending state.

  • Then we calculate the request capacity that one less than the current number of running instances could handle, by multiplying “(running instances - 1)” by “queue_length_per_node”. Next we check whether this value is greater than the average requests in flight; if so, we scale down. Before scaling down we make sure that we maintain the minimum instance count of this service domain. (The full decision logic is sketched in code right after this list.)
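Putting these steps together, the decision per task run boils down to two comparisons: scale up when the average in-flight requests l exceeds n * queue_length_per_node, and scale down when (n - 1) * queue_length_per_node can still cover l. Below is a minimal, self-contained Java sketch of this logic; again, it is my own illustration of the algorithm as described, not the actual ELB implementation.

    import java.util.LinkedList;

    // Illustrative sketch of one autoscaler task execution for a single
    // service domain; not the actual WSO2 ELB implementation.
    class AutoscaleDecision {
        private final LinkedList<Integer> requestTokenListLengths = new LinkedList<>();
        private final int queueLengthPerNode; // queue_length_per_node
        private final int roundsToAverage;    // rounds_to_average
        private final int minAppInstances;    // min_app_instances
        private final int maxAppInstances;    // max_app_instances

        AutoscaleDecision(int queueLengthPerNode, int roundsToAverage,
                          int minAppInstances, int maxAppInstances) {
            this.queueLengthPerNode = queueLengthPerNode;
            this.roundsToAverage = roundsToAverage;
            this.minAppInstances = minAppInstances;
            this.maxAppInstances = maxAppInstances;
        }

        /** Returns +1 to scale up, -1 to scale down, 0 to do nothing. */
        int runTask(int currentRequestTokens, int runningInstances, int pendingInstances) {
            // keep at most rounds_to_average samples, dropping the oldest first
            if (requestTokenListLengths.size() >= roundsToAverage) {
                requestTokenListLengths.removeFirst();
            }
            requestTokenListLengths.add(currentRequestTokens);

            // no decision until the vector is full
            if (requestTokenListLengths.size() < roundsToAverage) {
                return 0;
            }

            double average = requestTokenListLengths.stream()
                    .mapToInt(Integer::intValue).average().orElse(0);

            // scale up: current capacity (n * q) cannot handle the average load,
            // no instance is already pending, and the maximum is not yet reached
            if (average > runningInstances * queueLengthPerNode) {
                if (pendingInstances == 0 && runningInstances < maxAppInstances) {
                    return +1;
                }
                return 0;
            }
            // scale down: one fewer instance ((n - 1) * q) can still handle
            // the load, and we stay at or above the minimum instance count
            if (average < (runningInstances - 1) * queueLengthPerNode
                    && runningInstances > minAppInstances) {
                return -1;
            }
            return 0;
        }
    }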

Let's look at an example scenario.

Task iteration       | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
“requestTokens” size | 0 | 0 | 5 | 7 | 4 | 5 | 3 | 1 | 0

Let's say for service domain “X” you have specified the following configuration:

min_app_instances 0;
max_app_instances 5;
queue_length_per_node 3;
rounds_to_average 2;

Initially, pendingInstances = 0 and runningInstances = 0.

Iteration 1:

Vector: [0]

Vector is not full → we cannot take a scaling decision

Iteration 2:

Vector: [0, 0]

Vector is full → we can take a scaling decision
Average requests in flight → 0
Running Instances → 0
→ No scaling happens

Iteration 3:

Vector: [0, 5]

Vector is full → we can take a scaling decision
Average requests in flight (l) → 2.5
Running Instances (n) → 0
queue_length_per_node → 3
→ 2.5 > 0*3 and pendingInstances = 0 → scale up! → pendingInstances++

Iteration 4:

Vector: [5, 7]

Vector is full → we can take a scaling decision
Average requests in flight (l) → 6
Running Instances (n) → 0
queue_length_per_node → 3
→ 6 > 0*3, but pendingInstances = 1 → we don't scale up!

Iteration 5:

Vector: [7, 4]

(The instance that was pending has now joined as a running instance.)
Vector is full → we can take a scaling decision
Average requests in flight (l) → 5.5
Running Instances (n) → 1
queue_length_per_node → 3
→ 5.5 > 1*3 and pendingInstances = 0 → scale up! → pendingInstances++

Iteration 6:

Vector: [4, 5]

Vector is full → we can take a scaling decision
Average requests in flight (l) → 4.5
Running Instances (n) → 2
queue_length_per_node → 3
→ 4.5 < 2*3 → we do not scale up!
→ 4.5 > 1*3 → we do not scale down, since we cannot handle the current load with one fewer running instance!

Iteration 7:

Vector: [5, 3]

Vector is full → we can take a scaling decision
Average requests in flight (l) → 4
Running Instances (n) → 2
queue_length_per_node → 3
→ 4 < 2*3 → we do not scale up!
→ 4 > 1*3 → we do not scale down, since we cannot handle the current load with one fewer running instance!

Iteration 8:

Vector: [3, 1]

Vector is full → we can take a scaling decision
Average requests in flight (l) → 2
Running Instances (n) → 2
queue_length_per_node → 3
→ 2 < 2*3 → we do not scale up!
→ 2 < 1*3 → scale down, since the load has gone down and we can manage the current load with one fewer instance!

Iteration 9:

Vector: [1, 0]

Vector is full → we can take a scaling decision
Average requests in flight (l) → 0.5
Running Instances (n) → 1
queue_length_per_node → 3
→ 0.5 < 1*3 → we do not scale up!
→ 0.5 > 0*3 → we do not scale down, since we cannot handle the current load with one fewer running instance!
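To tie the walkthrough together, the driver below feeds the sampled “requestTokens” sizes and the per-iteration instance counts from the iterations above into the AutoscaleDecision sketch from earlier (assume both classes are compiled together). When a pending instance actually becomes a running one is decided by the cloud environment, so those counts are supplied as inputs here rather than computed:

    // Driver for the example scenario above, reusing the AutoscaleDecision
    // sketch; per-iteration instance counts are taken from the walkthrough.
    public class AutoscaleExample {
        public static void main(String[] args) {
            int[] tokenCounts = {0, 0, 5, 7, 4, 5, 3, 1, 0}; // "requestTokens" size
            int[] running     = {0, 0, 0, 0, 1, 2, 2, 2, 1}; // runningInstances
            int[] pending     = {0, 0, 0, 1, 0, 0, 0, 0, 0}; // pendingInstances

            // queue_length_per_node 3, rounds_to_average 2,
            // min_app_instances 0, max_app_instances 5
            AutoscaleDecision decision = new AutoscaleDecision(3, 2, 0, 5);
            for (int i = 0; i < tokenCounts.length; i++) {
                int d = decision.runTask(tokenCounts[i], running[i], pending[i]);
                System.out.printf("Iteration %d: %s%n", i + 1,
                        d > 0 ? "scale up" : d < 0 ? "scale down" : "no scaling");
            }
        }
    }

Running this prints scale-ups at iterations 3 and 5, a scale-down at iteration 8, and no scaling elsewhere, matching the walkthrough.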

So, as you can see, in a production environment it is critical to pay close attention to the two properties queue_length_per_node and rounds_to_average. Poorly calibrated values for these properties can lead to frequent, unnecessary scale-ups and scale-downs.

1 comment:

Nirmal Fernando said...

This algorithm has an improved version, and I will write an article on that soon.