Friday, January 18, 2013

Scale up early... scale down slowly...





In a distributed system, scalability is the ability to expand or contract the resource pool. A system can be scaled in two ways: horizontally and vertically. This article is concerned with horizontal scaling, which means adding more nodes to a clustered distributed system.

In this article, you will learn the auto-scaling algorithm used in WSO2 Elastic Load Balancer, a few tips to keep in mind when calibrating the auto-scaling decision-making variables, and a brief explanation of a sample scenario.

What is auto-scaling?


When there is a sudden peak of requests coming to an application, we should ideally increase the amount of resources provided for that application. Auto-scaling is a solution to this problem. In an auto-scaling enabled system, the system itself detects such peaks and starts up new server instances to cater to the demand, without any manual intervention.

With the rise of the Cloud, we can now easily start new instances and terminate existing ones at any given moment, which makes auto-scaling possible in a Cloud environment.

Where does this autoscale decision making task reside?

The ‘autoscaling decision making’ task currently resides in WSO2 Elastic Load Balancer. The default implementation is “org.wso2.carbon.mediator.autoscale.lbautoscale.task.ServiceRequestsInFlightAutoscaler”.

What is the basis for autoscaling?

The current default implementation (ServiceRequestsInFlightAutoscaler) uses the number of requests in-flight as the basis for making autoscaling decisions. The default algorithm follows the paradigm “scale up early and scale down slowly”.

What are the decision making variables?

There are a few of them, and all of the vital ones are configurable in the loadbalancer.conf file. (Sample configuration files are provided at the end of this document.)
  1. autoscaler_task_interval (t) - time period between two iterations of the ‘autoscaling decision making’ task. When configuring this value, consider the time that a service instance takes to join ELB. This is in milliseconds and the default value is 30000ms.
  1. max_requests_per_second (Rps) - number of requests a service instance can withstand per second. It is recommended that you calibrate this value for each service instance, and possibly for different scenarios as well. An ideal way to estimate this value is to load test a similar service instance. Default value is 100.
  1. rounds_to_average (r) - an autoscaling decision will be made only after this many iterations of the ‘autoscaling decision making’ task. Default value is 10.
  1. alarming_upper_rate (AUR) - without waiting until a service instance reaches its maximum request capacity (alarming_upper_rate = 1), we scale the system up when it reaches the request capacity that corresponds to alarming_upper_rate. This value should be 0 < AUR <= 1.
  1. alarming_lower_rate (ALR) - lower bound of the alarming rate, which hints that we can consider scaling the system down. This value should be 0 < ALR <= 1.
  1. scale_down_factor (SDF) - this factor is needed to make the scaling down process slow. We need to scale down slowly to reduce the chance of scaling down due to a false-positive event. This value should be 0 < SDF <= 1.

How does the number of requests in-flight get calculated?

We keep track of the requests that come to Elastic Load Balancer (ELB) for the various service clusters. For each incoming request, we add a token against the relevant service cluster, and when the message leaves ELB or expires, we remove the corresponding token.
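
The token bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the ELB's actual code: the class name `InFlightTracker`, its method names, and the per-cluster dictionary of token timestamps are all assumptions made for this example. Only the idea of adding a token per incoming request and removing it when the message leaves or expires comes from the text; the expiry value mirrors `message_expiry_time` from the sample configuration.

```python
import itertools


class InFlightTracker:
    """Hypothetical sketch of per-cluster in-flight request tokens."""

    def __init__(self, message_expiry_ms=60000):
        self.message_expiry_ms = message_expiry_ms
        self._tokens = {}  # cluster name -> {token id: arrival time in ms}
        self._ids = itertools.count(1)

    def request_arrived(self, cluster, now_ms):
        """Add a token for a request entering the load balancer."""
        token = next(self._ids)
        self._tokens.setdefault(cluster, {})[token] = now_ms
        return token

    def request_left(self, cluster, token):
        """Remove the token once the message has left the load balancer."""
        self._tokens.get(cluster, {}).pop(token, None)

    def expire(self, now_ms):
        """Drop tokens that have been held for the expiry time or longer."""
        for tokens in self._tokens.values():
            expired = [t for t, arrived in tokens.items()
                       if now_ms - arrived >= self.message_expiry_ms]
            for t in expired:
                del tokens[t]

    def in_flight(self, cluster):
        """Number of requests currently in-flight for a cluster."""
        return len(self._tokens.get(cluster, {}))


tracker = InFlightTracker(message_expiry_ms=60000)
first = tracker.request_arrived("appserver", now_ms=0)
second = tracker.request_arrived("appserver", now_ms=1000)
tracker.request_left("appserver", first)   # the first message left the ELB
tracker.expire(now_ms=61000)               # the second message has expired
```

After these calls, `tracker.in_flight("appserver")` is back to zero, since one token was removed on departure and the other on expiry.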

What are the decision making functions?

We always respect the minimum and maximum number of instances configured for a service cluster. The system always maintains at least the minimum number of service instances and never scales beyond the maximum.
We calculate,
average requests in-flight for a particular service cluster (avg) =
total number of requests in-flight * (1/r)
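
As a rough sketch of this averaging step: the example below assumes that "total number of requests in-flight" means the sum of the in-flight counts sampled over the last r task iterations, and that a sliding window is used. Both are interpretations rather than confirmed details of the ELB, and `InFlightAverager` is a hypothetical helper name.

```python
from collections import deque


class InFlightAverager:
    """Hypothetical helper: averages in-flight samples over r task rounds."""

    def __init__(self, rounds_to_average):
        self.r = rounds_to_average
        # Keep only the most recent r samples (assumed sliding window).
        self._samples = deque(maxlen=rounds_to_average)

    def record(self, in_flight):
        """Store the in-flight count observed in one task iteration."""
        self._samples.append(in_flight)

    def ready(self):
        """A decision is made only once r samples have been collected."""
        return len(self._samples) == self.r

    def average(self):
        # avg = total number of requests in-flight * (1/r)
        return sum(self._samples) * (1 / self.r)


# rounds_to_average = 2, as in the sample configuration
averager = InFlightAverager(rounds_to_average=2)
averager.record(90)
averager.record(130)
avg = averager.average()
```

With samples of 90 and 130 and r = 2, the average comes to 110 requests in-flight.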

Scaling up....

number of maximum requests that a service instance can withstand over an autoscaler task interval (maxRpt) =
(Rps) * (t/1000) * (AUR)
then, we decide to scale up, if,
avg > maxRpt * (number of running instances of this service cluster)
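
The scale-up check can be written out as a small sketch. The function names are made up for this example; the formulas are the ones given above, and the numbers come from the sample configuration at the end of this article (Rps = 5, t = 30000 ms, AUR = 0.7).

```python
def max_rpt(rps, task_interval_ms, aur):
    """maxRpt = Rps * (t / 1000) * AUR"""
    return rps * (task_interval_ms / 1000) * aur


def should_scale_up(avg, running_instances, rps, task_interval_ms, aur):
    """True when avg exceeds the alarming capacity of the running instances."""
    return avg > max_rpt(rps, task_interval_ms, aur) * running_instances


# Sample configuration values: Rps = 5, t = 30000 ms, AUR = 0.7
threshold = max_rpt(5, 30000, 0.7)

# One running instance and an average of 110 requests in-flight
scale_up_with_one = should_scale_up(110, 1, 5, 30000, 0.7)
# The same load spread over two running instances
scale_up_with_two = should_scale_up(110, 2, 5, 30000, 0.7)
```

Here maxRpt works out to about 105, so an average of 110 triggers a scale-up with one running instance but not with two.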

Scaling down....

imaginary lower bound value (minRpt) =
(Rps) * (t/1000) * (ALR) * (SDF)
then, we decide to scale down, if,
avg < minRpt * (number of running instances of this service cluster - 1)
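
A matching sketch for the scale-down check follows. The function names and the explicit minimum-instances guard are assumptions for illustration; the formula is the one given above, and the numbers come from the sample configuration (Rps = 5, t = 30000 ms, ALR = 0.2, SDF = 0.25).

```python
def min_rpt(rps, task_interval_ms, alr, sdf):
    """minRpt = Rps * (t / 1000) * ALR * SDF"""
    return rps * (task_interval_ms / 1000) * alr * sdf


def should_scale_down(avg, running_instances, rps, task_interval_ms,
                      alr, sdf, min_instances):
    """True when one instance fewer could still handle the load comfortably."""
    if running_instances <= min_instances:
        return False  # never drop below the configured minimum
    lower_bound = min_rpt(rps, task_interval_ms, alr, sdf)
    return avg < lower_bound * (running_instances - 1)


# Sample configuration values: Rps = 5, t = 30000 ms, ALR = 0.2, SDF = 0.25
lower_bound = min_rpt(5, 30000, 0.2, 0.25)

# Two running instances, minimum of one
quiet = should_scale_down(5, 2, 5, 30000, 0.2, 0.25, min_instances=1)
busy = should_scale_down(10, 2, 5, 30000, 0.2, 0.25, min_instances=1)
```

Here minRpt works out to about 7.5, which is far below the scale-up threshold for the same settings. That gap is what makes the system scale up early but scale down slowly: an average of 5 allows a scale-down from two instances, while an average of 10 does not.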

Can I plug my own implementation?

You can write your own Java implementation that implements the org.apache.synapse.task.Task and org.apache.synapse.ManagedLifecycle interfaces. Package the implementation class in an OSGi bundle and deploy it in WSO2 ELB. Then, point to that class from the loadbalancer section of the {ELB_HOME}/repository/conf/loadbalancer.conf file as follows.
loadbalancer {
…....
# autoscaling decision making task
autoscaler_task  org.wso2.carbon.mediator.autoscale.lbautoscale.task.ServiceRequestsInFlightAutoscaler;
…...
}

Sample configuration files

Properties defined in the defaults section.

loadbalancer {
        # minimum number of load balancer instances
        instances               1;
        # whether autoscaling should be enabled or not.
        enable_autoscaler   true;
        #please use this whenever url-mapping is used through LB.
        #size_of_cache                  100;
        # autoscaling decision making task
        autoscaler_task org.wso2.carbon.mediator.autoscale.lbautoscale.task.ServiceRequestsInFlightAutoscaler;
        # End point reference of the Autoscaler Service
        autoscaler_service_epr ;
        # interval between two task executions in milliseconds
        autoscaler_task_interval 30000;
        # after an instance boots up, the task waits at most this long for the server to start up
        server_startup_delay 60000; #default will be 60000ms
        # session time out
        session_timeout 90000;
        # enable fail over
        fail_over true;
}
# services' details which are fronted by this WSO2 Elastic Load Balancer
services {
        # default parameter values to be used in all services
        defaults {
            # minimum number of service instances required. WSO2 ELB will make sure that this many instances
            # are maintained in the system at all times, but only when autoscaling is enabled.
            min_app_instances           1;
            # maximum number of service instances that will be load balanced by this ELB.
            max_app_instances           3;
            max_requests_per_second   5;
            rounds_to_average           2;
            alarming_upper_rate 0.7;
            alarming_lower_rate 0.2;
            scale_down_factor 0.25;
            message_expiry_time         60000;
        }
        appserver {
            hosts          appserver.cloud-test.wso2.com;
            domains   {
                3.appserver.domain {
                    tenant_range        *;
                    min_app_instances           0;
                }
            }
        }
}

Properties defined within the service element

loadbalancer {
        # minimum number of load balancer instances
        instances               1;
        # whether autoscaling should be enabled or not.
        enable_autoscaler   true;
        #please use this whenever url-mapping is used through LB.
        #size_of_cache                  100;
        # autoscaling decision making task
        autoscaler_task org.wso2.carbon.mediator.autoscale.lbautoscale.task.ServiceRequestsInFlightAutoscaler;
        # End point reference of the Autoscaler Service
        autoscaler_service_epr ;
        # interval between two task executions in milliseconds
        autoscaler_task_interval 30000;
        # after an instance boots up, the task waits at most this long for the server to start up
        server_startup_delay 60000; #default will be 60000ms
        # session time out
        session_timeout 90000;
        # enable fail over
        fail_over true;
}
# services' details which are fronted by this WSO2 Elastic Load Balancer
services {
        # default parameter values to be used in all services
        defaults {
            # minimum number of service instances required. WSO2 ELB will make sure that this many instances
            # are maintained in the system at all times, but only when autoscaling is enabled.
            min_app_instances           1;
            # maximum number of service instances that will be load balanced by this ELB.
            max_app_instances           3;
            max_requests_per_second   5;
            rounds_to_average           2;
            alarming_upper_rate 0.7;
            alarming_lower_rate 0.2;
            scale_down_factor 0.25;
            message_expiry_time         60000;
        }
        appserver {
            hosts          appserver.cloud-test.wso2.com;
            domains   {
                3.appserver.domain {
                    tenant_range        *;
                    min_app_instances           0;
                    max_requests_per_second   5;
                    alarming_upper_rate 0.6;
                    alarming_lower_rate 0.1;
                }
            }
        }
}
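
To illustrate how the two levels of properties in this second sample might combine, here is a small sketch. It assumes that values set inside a domain element take precedence over those in the defaults section, which the comment "default parameter values to be used in all services" suggests but which is not spelled out here. The function name `effective_settings` is hypothetical.

```python
# Values copied from the defaults section of the sample configuration.
defaults = {
    "min_app_instances": 1,
    "max_app_instances": 3,
    "max_requests_per_second": 5,
    "rounds_to_average": 2,
    "alarming_upper_rate": 0.7,
    "alarming_lower_rate": 0.2,
    "scale_down_factor": 0.25,
    "message_expiry_time": 60000,
}

# Values set inside the 3.appserver.domain element.
domain_overrides = {
    "min_app_instances": 0,
    "max_requests_per_second": 5,
    "alarming_upper_rate": 0.6,
    "alarming_lower_rate": 0.1,
}


def effective_settings(defaults, overrides):
    """Assumed precedence: domain-level values win over the defaults."""
    merged = dict(defaults)   # copy so the defaults stay untouched
    merged.update(overrides)
    return merged


settings = effective_settings(defaults, domain_overrides)
```

Under that assumption, 3.appserver.domain would run with alarming_upper_rate 0.6 and alarming_lower_rate 0.1, while still inheriting scale_down_factor 0.25 and rounds_to_average 2 from the defaults.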