Extend autoscaling
Overview
This article explains how scaling works for hosted Extend apps and how to configure autoscaling based on average target CPU utilization in the AccelByte Gaming Services (AGS) Admin Portal.
Hosted Extend service autoscaling strategy
Hosted Extend services use horizontal scaling to match demand. Horizontal scaling adjusts the number of running Extend app replicas. The number of replicas is scaled up immediately in response to demand, with no delay or stabilization period. The CPU utilization percentage of each replica (including its internal components) is used as the proxy for demand. The Extend controller keeps the average CPU utilization across all replicas of a hosted Extend service close to the configured target CPU utilization percentage, which is 50% by default, by using the following algorithm:
desiredAppReplicas = ceil[currentAppReplica * ( currentAvgCPUUtilizationPercentage / targetCPUUtilizationPercentage )]
The following examples show how the controller decides the number of replicas to run, given the current and target CPU utilization at a point in time.
For example, with a scale-out scenario, given that:
- The current average CPU utilization of the App replica is 90% (the replicas are highly utilized above target utilization).
- There are 2 App replicas currently running.
- With targetCPUUtilizationPercentage set to 50%.
Calculating with the given parameters:
desiredAppReplicas = ceil[2 * ( 90% / 50% )]
desiredAppReplicas = ceil[3.6] = 4
The desired number of app replicas will be 4.
A second scale-out scenario, this time with a lower target, given that:
- The current average CPU utilization of the App replica is 90% (the replicas are highly utilized above target utilization).
- There are 2 App replicas currently running.
- With targetCPUUtilizationPercentage set to 30%.
Calculating with the given parameters:
desiredAppReplicas = ceil[2 * ( 90% / 30% )]
desiredAppReplicas = ceil[6] = 6
The desired number of app replicas will be 6. With the same current CPU usage as in the first example, the desired replica count is higher because the target CPU utilization is lower (30% instead of 50%). Keeping each replica at a lower utilization requires more replicas to handle the same load.
Another example, with a scale-in scenario, given that:
- The current average CPU utilization of the app replicas is 20% (the app replicas are underutilized below target utilization).
- There are 4 app replicas currently running.
- With targetCPUUtilizationPercentage set to 50%.
Calculating with the given parameters:
desiredAppReplicas = ceil[4 * ( 20% / 50% )]
desiredAppReplicas = ceil[1.6] = 2
The desired number of app replicas will be 2.
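The three calculations above can be reproduced with a few lines of Python. This is a minimal sketch of the formula only: the function name `desired_app_replicas` and its parameter names are illustrative, and any replica limits or other safeguards the real Extend controller may apply are not modeled.

```python
import math


def desired_app_replicas(current_replicas: int,
                         current_avg_cpu_pct: float,
                         target_cpu_pct: float = 50.0) -> int:
    """Apply the scaling formula from this article to one observation.

    Illustrative helper, not part of any AGS SDK or API.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_cpu_pct <= 0:
        raise ValueError("target_cpu_pct must be positive")
    return math.ceil(current_replicas * (current_avg_cpu_pct / target_cpu_pct))


print(desired_app_replicas(2, 90, 50))  # scale-out example, 50% target
print(desired_app_replicas(2, 90, 30))  # scale-out example, 30% target
print(desired_app_replicas(4, 20, 50))  # scale-in example
```

The three calls correspond, in order, to the two scale-out examples and the scale-in example.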
Configure autoscaling average target CPU utilization
The targetCPUUtilizationPercentage parameter is configurable from the Admin Portal to meet the needs of different Extend app use cases and implementations. It is the desired percentage of CPU usage: the cluster adjusts the number of replicas to bring the average CPU utilization closer to this target.
The default targetCPUUtilizationPercentage for an Extend app in the Admin Portal is 50%. The valid range is 30% (minimum) to 90% (maximum).
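If you manage this setting from your own tooling, a small pre-check against the documented range can catch mistakes early. The sketch below is hypothetical: the constant and function names are assumptions for illustration, and it does not represent how the Admin Portal itself validates the value.

```python
# Values taken from this article; names are illustrative only.
DEFAULT_TARGET_CPU_PCT = 50
MIN_TARGET_CPU_PCT = 30
MAX_TARGET_CPU_PCT = 90


def validate_target_cpu_pct(value=None):
    """Return the target to use, or raise if it is outside 30-90.

    None means "not set", in which case the documented default applies.
    """
    if value is None:
        return DEFAULT_TARGET_CPU_PCT
    if not MIN_TARGET_CPU_PCT <= value <= MAX_TARGET_CPU_PCT:
        raise ValueError(
            f"target CPU utilization must be between {MIN_TARGET_CPU_PCT}% "
            f"and {MAX_TARGET_CPU_PCT}%, got {value}%"
        )
    return value


print(validate_target_cpu_pct())    # falls back to the default
print(validate_target_cpu_pct(70))  # accepted as-is
```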
To configure autoscaling based on the average target CPU utilization of an Extend app, follow these steps:
1. On the AGS Admin Portal sidebar, go to Extend and select the Extend app menu that contains your app.
2. From the list of Extend apps, click the name of the Extend app you want to update to open its details page.
3. At the top right of the details page, click the settings button (gear icon). The Settings page appears.
4. On the Settings page of the Extend app, set Target CPU Utilization (%) in the Auto Scaling Policy section as needed.
5. Click Save to apply your changes.
Warning: If the Extend app you're configuring is currently running, it restarts automatically after you save so that the changes take effect. Otherwise, redeploy the app to apply the changes.
General guide to decide average target CPU utilization
This section helps developers decide on a target CPU utilization for autoscaling based on application needs and behavior.
Requirements for target CPU utilization can differ widely between apps, so it is crucial to test and monitor the performance of your Extend app and adjust autoscaling accordingly.
Here are some considerations for choosing a target CPU utilization for different Extend app use cases:
- Understand application behavior by assessing traffic patterns:
  - Bursty traffic: frequent spikes in load.
  - Steady traffic: consistent workload over time.
- Start with a baseline
  - The default starting point is a 50% target CPU utilization for every Extend app.
  - Adjust based on traffic. As general guidance:
    - Bursty traffic: 30-50%, for headroom during spikes.
    - Steady traffic: 60-80%, for resource efficiency.
- Consider scaling latency
  - Scaling up takes time: there is a delay before new replicas are running and ready to receive traffic.
  - A lower target CPU utilization (for example, 30-50%) provides buffer capacity to absorb a spike before new replicas are ready.
- Ensure accurate CPU requests
  - HPA (Horizontal Pod Autoscaler) calculations depend on CPU requests (see the page on configuring CPU requests).
  - If CPU requests are too high, utilization relative to the request will appear lower than it should, causing under-scaling.
  - If CPU requests are too low, utilization relative to the request will appear higher than it should, causing over-scaling.
  - Note: Setting the right CPU requests might require testing, monitoring, and reassessing Extend app performance over time.
- Test and refine
  - Depending on your requirements, you might want to perform load testing to simulate peak traffic and observe autoscaling behavior.
  - Monitor over time and adjust the target CPU utilization based on observed performance and stability.
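To see why CPU requests and the chosen target matter, the sketch below computes utilization as usage divided by the CPU request and feeds it into the scaling formula from this article. The workload numbers (2 replicas, 300m of actual usage each, a 50% target) and the function names are hypothetical. The `headroom_factor` helper additionally assumes that the CPU request can be treated as the replica's capacity, which is a simplification.

```python
import math


def cpu_utilization_pct(usage_m: float, request_m: float) -> float:
    """CPU utilization as a percentage of the CPU request (both in millicores)."""
    if request_m <= 0:
        raise ValueError("request_m must be positive")
    return 100.0 * usage_m / request_m


def desired_replicas(current: int, utilization_pct: float, target_pct: float) -> int:
    """Scaling formula from this article."""
    return math.ceil(current * (utilization_pct / target_pct))


def headroom_factor(target_pct: float) -> float:
    """How many times per-replica load can grow from the target level
    before usage reaches 100% of the CPU request (simplifying assumption)."""
    return 100.0 / target_pct


# Hypothetical workload: same real usage, three different CPU requests.
for request_m in (200, 400, 1000):
    pct = cpu_utilization_pct(300, request_m)
    print(f"request={request_m}m utilization={pct:.0f}% "
          f"desired={desired_replicas(2, pct, 50)}")

# Lower targets leave more room for a spike while new replicas start.
for target in (30, 50, 80):
    print(f"target={target}% headroom={headroom_factor(target):.2f}x")
```

With identical real usage, the 200m request drives the replica count up to 6, while the 1000m request leaves it at 2. This is the over-scaling and under-scaling effect described in the list above.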
General Example Scenario:
Application Observations:
- CPU requests: 500m per replica
- Normal load: around 150-250m
- Peak load: up to 500m
Recommendation:
- Set CPU requests to 400m, which covers most of the normal workload.
- Set the target CPU utilization to 60%.
- Utilization threshold: 400m * 60% = 240m
- This ensures that scaling occurs before replicas hit their limits and maintains application performance under normal load.
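The arithmetic in this scenario can be checked with a short script. It is a sketch that uses only the numbers from the scenario above. The function names are illustrative, and the three load points (150m, 250m, 500m) are picked from the observed ranges for demonstration.

```python
def utilization_threshold_m(request_m: float, target_pct: float) -> float:
    """Average per-replica usage (millicores) at which the target is reached."""
    return request_m * target_pct / 100.0


def utilization_pct(usage_m: float, request_m: float) -> float:
    """CPU utilization as a percentage of the CPU request."""
    return 100.0 * usage_m / request_m


REQUEST_M = 400   # recommended CPU request from the scenario
TARGET_PCT = 60   # recommended target CPU utilization

threshold = utilization_threshold_m(REQUEST_M, TARGET_PCT)
print(f"threshold = {threshold:.0f}m")

for label, usage_m in (("normal (low)", 150), ("normal (high)", 250), ("peak", 500)):
    pct = utilization_pct(usage_m, REQUEST_M)
    side = "above" if usage_m > threshold else "at or below"
    print(f"{label}: {usage_m}m = {pct:.1f}% of request, {side} the threshold")
```

Load at the upper end of the normal range (250m) sits slightly above the 240m threshold, so a sustained load at that level can already trigger a scale-out.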