Game Health dashboards

Last updated on November 18, 2024
AGS Private Cloud

Only Private Cloud customers can access the Game Health dashboards in Grafana Cloud.

Overview

The AccelByte Gaming Services (AGS) Game Health dashboards in Grafana Cloud offer a clear view of important metrics that affect game performance. These dashboards gather data from various services, providing real-time insights into key areas such as concurrent users (CCU), server health, and matchmaking activity.

This unified solution helps game developers monitor and optimize performance, ensuring smooth gameplay and a great player experience. The dashboard features tailored views for different services, including IAM, Lobby, and Matchmaking, giving a comprehensive overview of game health.

info

To learn how to access and view Grafana Cloud dashboards, refer to the Access Grafana Cloud and View dashboards pages.

Game Health dashboards

The Game Health dashboards provided by the AGS and Grafana Cloud integration are as follows:

Game Health Overview Dashboard

The Game Health Overview Dashboard provides a comprehensive snapshot of the game's performance and player engagement. By surfacing trends and potential issues, it supports data-driven decisions that improve the gameplay experience and guide further optimization.

The Game Health Overview Dashboard has three sections: Login and Lobby Overview, Matchmaking and Session Overview, and Other Service Activity Overview.

Login and Lobby Overview section

This section shows metrics about your game's logins and lobby connections, presented in the following panels:

| Panel | Description |
| --- | --- |
| Current CCU | The current number of concurrent users (CCU) actively connected to the game. |
| CCU Change per Minute | The change in the number of concurrent users compared to one minute prior. |
| CCU Per Platform | The number of concurrent users currently connected to the game, tagged by platform (e.g., PSN, Xbox, Steam). |
| CCU by Closest Configured Region | The number of concurrent users currently connected to the game, based on their proximity to the nearest game server region. A “To be determined” tag indicates that the region information is still being processed on the client side as QoS measurements and lobby connections occur simultaneously. |
| Login Activity Count by Platform | The number of login attempts for each platform, based on the chosen game namespace. |
| Login Success Rate by Platform | The percentage of successful logins, calculated by dividing the number of successful attempts by the total login attempts for each platform. |
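
The calculations described for Login Success Rate by Platform and CCU Change per Minute can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the dashboards are implemented: the function names, the input shape (pairs of platform and outcome), and the reading of "CCU change" as a simple difference are all assumptions.

```python
from collections import defaultdict


def login_success_rate_by_platform(attempts):
    """Return {platform: success percentage}.

    `attempts` is a hypothetical iterable of (platform, succeeded) pairs.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for platform, succeeded in attempts:
        totals[platform] += 1
        if succeeded:
            successes[platform] += 1
    # Successful attempts divided by total attempts, per platform.
    return {p: 100.0 * successes[p] / totals[p] for p in totals}


def ccu_change_per_minute(ccu_now, ccu_one_minute_ago):
    """Assumed definition: current CCU minus the CCU one minute earlier."""
    return ccu_now - ccu_one_minute_ago


sample = [
    ("steam", True), ("steam", True), ("steam", False), ("steam", True),
    ("psn", True), ("psn", False),
]
rates = login_success_rate_by_platform(sample)
print(rates)                              # {'steam': 75.0, 'psn': 50.0}
print(ccu_change_per_minute(1250, 1200))  # 50
```

A negative result from `ccu_change_per_minute` would mean more players left than joined during that minute.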

Matchmaking and Session Overview section

This section shows metrics related to the matchmaking events and sessions in your game, presented in the following panels:

| Panel | Description |
| --- | --- |
| Current Total Players In Match | The total number of players currently engaged in active matches across all game modes. |
| Total Players by Matchpool | The number of active players currently participating in each specific match pool. |
| Acquiring DS Wait Time | Measures the time, in seconds, that players wait to acquire a dedicated server (DS), helping to identify potential delays in server allocation. |
| Average Time to Match | The average time (in seconds) it takes for players to be matched with others. This is measured across all matchmaking attempts. |
| Matchmaking Success Rate | The average percentage of players successfully matched for a game session compared to the total number of matchmaking attempts. |
| Graceless/Abnormal Lobby Disconnects | The number of unexpected disconnections of players from the Lobby service. |

Other Service Activity Overview section

This section shows metrics related to the performance and reliability of various services, such as Platform, Social, and Cloud Save, presented in the following panels:

| Panel | Description |
| --- | --- |
| All Services Availability / Success Rate | The overall percentage of successful service requests across all systems. This reflects the health and availability of all game services. |
| P95 Latency | The 95th percentile of latency, meaning 95% of requests are completed within this time frame. |
| P99 Latency | The 99th percentile of latency, meaning 99% of requests are completed within this time frame. |
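
To make the percentile definitions concrete, here is a minimal sketch that computes P95 and P99 from a list of raw latency samples using the nearest-rank method. The dashboards may well derive percentiles differently (for example, from pre-aggregated histograms), so treat the function name and the method as assumptions chosen for illustration.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Smallest sample such that at least `pct` percent of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank of the percentile within the sorted samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds (20 samples, two slow outliers).
latencies_ms = [12, 15, 11, 250, 14, 13, 18, 16, 900, 17,
                19, 20, 21, 22, 23, 24, 25, 26, 27, 28]

p95 = percentile_nearest_rank(latencies_ms, 95)
p99 = percentile_nearest_rank(latencies_ms, 99)
print(p95, p99)  # 250 900
```

In this sample most requests finish in under 30 ms, yet P95 and P99 expose the two slow outliers that an average would blur.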

IAM Service Dashboard

The IAM Service Dashboard enables quick identification of trends and issues affecting user authorization and authentication through transparent metrics. Tracking key data such as Login Success Rates and 3rd-Party Token Validation insights enables you to proactively address problems and enhance user experience.

The IAM Service Dashboard has two sections: Overview and Resilience and Load Monitoring.

Overview section

This section shows metrics related to login activities and third-party token validation from various platforms, presented in the following panels:

| Panel | Description |
| --- | --- |
| Login Activity Count by Platform | The number of login attempts for each platform based on the chosen game namespace. |
| Login Success Rate by Platform | The percentage of successful logins, calculated by dividing the number of successful attempts by the total login attempts for each platform. |
| 3rd Party Token Validation Errors (PSN, Xbox, Epic, Steam) | The number of token validation errors from the AGS IAM Service to various third-party platforms, such as PSN, Xbox, Epic, and Steam. |
| 3rd Party Token Validation p95 Latency (PSN, Xbox, Epic, Steam) | The longest response time for 95% of players logging in with a third-party platform, measured from the AGS IAM service to the platform. This shows how quickly most players are logging into the game on each platform. |
| 3rd Party Token Validation p99 Latency (PSN, Xbox, Epic, Steam) | The longest response time for 99% of players logging in with a third-party platform, measured from the AGS IAM service to the platform. This shows how quickly nearly all players are logging into the game on each platform. |

Resilience and Load Monitoring section

This section shows metrics related to the overall performance and reliability of the AGS IAM service, focusing on its ability to handle requests efficiently and maintain stability under load, presented in the following panels:

| Panel | Description |
| --- | --- |
| Requests per second (RPS) | Measures how many requests the IAM service handles each second, providing insight into its overall workload. |
| 4xx Error Rate | Shows how often players encounter issues due to invalid requests (e.g., accessing unavailable content). |
| 5xx Error Rate | Tracks the frequency of server errors that can prevent players from connecting to the game. |
| Proxy P95 Latency | Shows the longest response time for 95% of players, indicating how quickly most players are getting their requests processed. |
| Proxy P99 Latency | Highlights the slowest response times for 99% of players, helping identify any issues affecting a small subset of users. |
| Service Restarts | Counts how often the IAM service restarts, which may indicate stability issues that could disrupt player experiences. |
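
The RPS and error-rate panels, which appear in the Resilience and Load Monitoring section of each service dashboard, can be illustrated with the sketch below. It assumes an error rate is the percentage of all requests in a time window that returned a 4xx or 5xx status; the dashboards may instead chart errors per second, and the function name and input shape are hypothetical.

```python
def summarize_requests(status_codes, window_seconds):
    """Summarize HTTP status codes observed during one time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = len(status_codes)
    client_errors = sum(1 for code in status_codes if 400 <= code <= 499)
    server_errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return {
        # Requests handled per second over the window.
        "rps": total / window_seconds,
        # Assumed definition: errors as a percentage of all requests.
        "error_rate_4xx": 100.0 * client_errors / total if total else 0.0,
        "error_rate_5xx": 100.0 * server_errors / total if total else 0.0,
    }


# Hypothetical 10-second window: 94 successes, 4 client errors, 2 server errors.
codes = [200] * 94 + [404] * 4 + [500] * 2
summary = summarize_requests(codes, window_seconds=10)
print(summary)  # {'rps': 10.0, 'error_rate_4xx': 4.0, 'error_rate_5xx': 2.0}
```

Separating 4xx from 5xx matters when reading these panels: a 4xx spike usually points to invalid client requests, while a 5xx spike points to the service itself.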

Lobby Service Dashboard

This dashboard tracks key metrics for the Lobby service, providing insights into real-time concurrent users (CCU) connected to the service, lobby connection health, and the success of lobby notifications and third-party friend synchronization.

The Lobby Service Dashboard has two sections: Overview and Resilience and Load Monitoring.

Overview section

This section shows metrics related to concurrent users and connectivity with the AGS Lobby service within the game, presented in the following panels:

| Panel | Description |
| --- | --- |
| Current CCU | The current number of concurrent users (CCU) actively connected to the game at this moment. |
| CCU Rate/Min | The rate at which new users are joining or disconnecting from the game per minute. |
| CCU Per Platform | The number of concurrent users currently connected to the game, broken down by platform (e.g., PSN, Xbox, Steam). |
| CCU by Closest Configured Region | Shows the total number of concurrent users based on their proximity to the nearest game server region, with 'To be determined' indicating that the region information is still being processed on the client side as QoS measurements and lobby connections occur simultaneously. |
| Graceless/Abnormal Lobby Disconnect | The number of unexpected disconnections of players from the lobby service. |
| Get Notification Success Rate | Displays the success rate of lobby notifications, ensuring proper communication between users and the service. |

Resilience and Load Monitoring section

This section shows metrics related to the overall performance and reliability of the AGS Lobby service, focusing on its ability to handle requests efficiently and maintain stability under load, presented in the following panels:

| Panel | Description |
| --- | --- |
| Requests per second (RPS) | Measures how many requests the Lobby service handles each second, giving an idea of its overall workload. |
| 4xx Error Rate | Shows how often players encounter issues due to invalid requests (e.g., accessing unavailable content). |
| 5xx Error Rate | Tracks the frequency of server errors that can prevent players from connecting to the game. |
| Proxy P95 Latency | Shows the longest response time for 95% of players, indicating how quickly most players are getting their requests processed. |
| Proxy P99 Latency | Highlights the slowest response times for 99% of players, helping identify any issues affecting a few users. |
| Service Restarts | Counts how often the Lobby service restarts, which may indicate stability issues that could disrupt player experiences. |

Matchmaking Service Dashboard

This dashboard provides key insights into the performance of the matchmaking service, helping developers monitor player activity, matchmaking efficiency, and wait times.

The Matchmaking Service Dashboard has two sections: Overview and Resilience and Load Monitoring.

Overview section

This section shows metrics related to matchmaking engagement and performance, presented in the following panels:

| Panel | Description |
| --- | --- |
| Current Total Players In Match | The total number of players currently engaged in active matches across all game modes. |
| Players by Matchpool | The number of active players currently participating in each specific match pool. |
| Matchmaking Success Rate | The percentage of players successfully matched for a game session compared to the total matchmaking attempts. |
| Average Time to Match in Seconds | The average time it takes for players to be matched with others, measured in seconds across all matchmaking attempts. |

Resilience and Load Monitoring section

This section shows metrics related to the overall performance and reliability of the AGS Matchmaking service, focusing on its ability to handle requests efficiently and maintain stability under load, presented in the following panels:

| Panel | Description |
| --- | --- |
| Requests per second (RPS) | Measures how many requests the Matchmaking service handles each second, giving an idea of its overall workload. |
| 4xx Error Rate | Shows how often players encounter issues due to invalid requests (e.g., accessing unavailable content). |
| 5xx Error Rate | Tracks the frequency of server errors that can prevent players from connecting to the game. |
| Proxy P95 Latency | Shows the longest response time for 95% of players, indicating how quickly most players are getting their requests processed. |
| Proxy P99 Latency | Highlights the slowest response times for 99% of players, helping identify any issues affecting a few users. |
| Service Restarts | Counts how often the Matchmaking service restarts, which may indicate stability issues that could disrupt player experiences. |

AMS Service Dashboard

This dashboard provides visibility into the availability and performance of dedicated game servers, focusing on server count and wait times for acquiring servers.

The AMS Service Dashboard has two sections: Overview and Resilience and Load Monitoring.

Overview section

This section shows metrics related to the availability and performance of dedicated game servers (DS), presented in the following panels:

| Panel | Description |
| --- | --- |
| DS Count | Displays the total number of active dedicated servers available to handle game sessions. |
| Acquiring DS Wait Time | Measures the time players wait to acquire a dedicated server (DS), helping to identify potential delays in server allocation. |

Resilience and Load Monitoring section

This section shows metrics related to the overall performance and reliability of the AMS service, focusing on its ability to handle requests efficiently and maintain stability under load, presented in the following panels:

| Panel | Description |
| --- | --- |
| Requests per second (RPS) | Measures how many requests the AMS service handles each second, giving an idea of its overall workload. |
| 4xx Error Rate | Shows how often players encounter issues due to invalid requests (e.g., accessing unavailable content). |
| 5xx Error Rate | Tracks the frequency of server errors that can prevent players from connecting to the game. |
| Proxy P95 Latency | Shows the longest response time for 95% of players, indicating how quickly most players are getting their requests processed. |
| Proxy P99 Latency | Highlights the slowest response times for 99% of players, helping identify any issues affecting a few users. |
| Service Restarts | Counts how often the AMS service restarts, which may indicate stability issues that could disrupt player experiences. |

Cloud Save Service Dashboard

This dashboard monitors the performance of the cloud save service, focusing on data payload sizes and the success rate of save and update operations.

The Cloud Save Service Dashboard has two sections: Overview and Resilience and Load Monitoring.

Overview section

This section shows metrics related to the performance and efficiency of Cloud Save operations, presented in the following panels:

| Panel | Description |
| --- | --- |
| Payload Size Distribution | Displays the distribution of data payload sizes to help optimize storage and transmission efficiency. JSON records over 1 MB can negatively impact performance because of the additional space required for formatting and escaping characters. For larger payloads, using Binary Cloud Save is recommended for better performance and efficiency. For more details, refer to the official documentation. |
| Create and Update Success Rate | Tracks the percentage of successful create and update operations on Public Game Records to ensure reliable cloud save functionality. |
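
The note about JSON records over 1 MB can be checked before a record is saved. The sketch below measures the serialized size of a record and shows how escaping inflates it. It is an illustration only: the helper names are hypothetical, and treating 1 MB as 1,048,576 bytes is an assumption, since the source does not state the exact byte threshold.

```python
import json

# Assumption: 1 MB is treated as 1024 * 1024 bytes for this sketch.
ONE_MB = 1024 * 1024


def json_payload_size_bytes(record):
    """Size of the record once serialized to JSON and encoded as UTF-8."""
    return len(json.dumps(record).encode("utf-8"))


def exceeds_json_limit(record, limit=ONE_MB):
    """True if the serialized record is larger than the assumed limit."""
    return json_payload_size_bytes(record) > limit


small = {"level": 3, "name": "player-one"}
large = {"blob": "x" * (2 * ONE_MB)}

print(exceeds_json_limit(small))  # False
print(exceeds_json_limit(large))  # True

# Escaping overhead: each double quote becomes \" when serialized,
# so 10 raw characters turn into 20, plus the 2 surrounding quotes.
print(len(json.dumps('"' * 10)))  # 22
```

A record that fails this check is a candidate for Binary Cloud Save, as the panel description recommends.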

Resilience and Load Monitoring section

This section shows metrics related to the overall performance and reliability of the AGS Cloud Save service, focusing on its ability to handle requests efficiently and maintain stability under load, presented in the following panels:

| Panel | Description |
| --- | --- |
| Requests per second (RPS) | Measures how many requests the Cloud Save service handles each second, giving an idea of its overall workload. |
| 4xx Error Rate | Shows how often players encounter issues due to invalid requests (e.g., accessing unavailable content). |
| 5xx Error Rate | Tracks the frequency of server errors that can prevent players from connecting to or enjoying the game. |
| Proxy P95 Latency | Shows the longest response time for 95% of players, indicating how quickly most players are getting their requests processed. |
| Proxy P99 Latency | Highlights the slowest response times for 99% of players, helping identify any issues affecting a few users. |
| Service Restarts | Counts how often the Cloud Save service restarts, which may indicate stability issues that could disrupt player experiences. |

E-commerce Service Dashboard

This dashboard provides insights into the performance of the e-commerce service, focusing on transaction success and DLC synchronization efficiency.

The E-commerce Service Dashboard has two sections: Overview and Resilience and Load Monitoring.

Overview section

This section shows metrics related to player purchases, including in-app purchases and downloadable content (DLC) synchronization, presented in the following panels:

| Panel | Description |
| --- | --- |
| In-app Purchase Success Rate | Tracks the percentage of successful transactions to ensure smooth and reliable purchases. |
| DLC Sync Success Rate | Measures the success rate of 3rd Party Store DLC synchronization, ensuring that purchased content is correctly delivered to users. |

Resilience and Load Monitoring section

This section shows metrics related to the overall performance and reliability of the AGS Platform service, focusing on its ability to handle requests efficiently and maintain stability under load, presented in the following panels:

| Panel | Description |
| --- | --- |
| Requests per second (RPS) | Measures how many requests the Platform service handles each second, giving an idea of its overall workload. |
| 4xx Error Rate | Shows how often players encounter issues due to invalid requests (e.g., accessing unavailable content). |
| 5xx Error Rate | Tracks the frequency of server errors that can prevent players from connecting to or enjoying the game. |
| Proxy P95 Latency | Shows the longest response time for 95% of players, indicating how quickly most players are getting their requests processed. |
| Proxy P99 Latency | Highlights the slowest response times for 99% of players, helping identify any issues affecting a few users. |
| Service Restarts | Counts how often the Platform service restarts, which may indicate stability issues that could disrupt player experiences. |