Extend app performance dashboards

Last updated on October 24, 2024

Introduction

AccelByte Gaming Services (AGS) Extend Observability gives you access to Grafana Cloud dashboards that provide a high-level view of your Extend apps' performance metrics. With these dashboards, you can quickly see how each of your Extend apps is running without reading through detailed logs and metrics, which helps you catch problems earlier and troubleshoot more efficiently.

This article covers how to access the dashboards, what information they show, and some potential use cases for that information.

Prerequisites

To use the AGS Extend Observability dashboards, you need to have created at least one AGS Extend app (Override, Service Extension, or Event Handler), configured it, and integrated it with AGS.

Access the dashboards

To access your Extend app dashboards, follow the steps in Introduction to Observability. Once you are logged in to Grafana Cloud, do the following:

  1. Click Dashboards on the sidebar.

  2. In the search bar, search for "Extend dashboard". The available Extend dashboards will appear in the list.

  3. Click the dashboard you want to view:

All Extend Apps Performance dashboard

The All Extend Apps Performance dashboard contains sections of panels that show performance metrics for the AGS Extend apps across all your game namespaces. To view the data of Extend apps from a specific game namespace, use the Game Namespace filter at the top of the dashboard. You can also apply default and custom filters, such as app ID, name, and scenario, to refine the data displayed on the dashboard.

note

Make sure to also specify the time period of the data you want to view. The time period setting is at the top-right side of the dashboard and is set to the last 30 minutes.

The All Extend Apps Performance dashboard contains the following panels:

Overview section

This section's panels show the following general information and basic performance metrics of all your Extend apps:

Total Game Namespace panel

Shows the total number of your game namespaces with Extend apps that have actively deployed images.

Total App panel

Shows the total number of Extend apps with actively deployed images.

App Information panel

Lists all the Extend apps with actively deployed images, showing each app's ID, name, source game namespace, and Extend scenario (or app type).

Deployment Duration panel

Lists the deployments of Extend apps, showing the date, time, and duration of each deployment.

Failed Deployment (count) panel

Shows the top three Extend apps with the highest count of failed image deployments.

Timeout Deployment (count) panel

Shows the top three Extend apps with the highest count of image deployment attempts that took longer than the deployment timeout duration limit would allow.

CPU Usage (service) panel

Displays a graph showing the amount of CPU usage within the specified time range. Hover over the graph to see specific usage and time details.

Memory Usage (service) panel

Displays a graph showing the amount of memory usage of the Extend apps. Hover over the graph to see specific usage and time details.

Replica Count & Limit panel

Displays a graph showing the number of replicas in the Extend apps with actively deployed images.

Total Service Error Logs panel

Displays a graph showing the number of service error logs generated from Extend apps with actively deployed images.

Service Error Logs panel

Lists the last 20 service error logs generated from all Extend apps with actively deployed images.

Overridable feature section

This section's panels show performance information specific to the Extend Override apps.

Received Rate panel

The rate of requests received by AGS from the Extend Override apps.

Response Rate per Status Code panel

The rate of responses made by AGS to the Extend Override apps, categorized by gRPC status code.

Response Latency panel

A graph showing the delay in milliseconds between when a request is received by AGS from the Override apps and when a response is sent by AGS.

Event Handler section

This section's panels show performance information specific to the Event Handler apps.

Record Read Total panel

A graph showing the number of Kafka Connect events that were listened for by the Event Handler apps.

Records Consumed Rate panel

A graph showing the number of Kafka Connect events that were listened for and consumed by the Event Handler apps.

Records Lag panel

The number of events that failed to be handled upon request.

Service Extension section

This section's panels show performance information specific to the Service Extension apps.

Request Success Rate/5m (Linkerd) panel

A graph showing the rate of successful HTTP requests per five minutes from the Service Extension apps to the AGS backend.

Response Latency panel

A graph showing the delay in milliseconds between when an HTTP request is received by the AGS backend from the Service Extension apps and when AGS sends a response.

5xx/5m (Linkerd) panel

Number of completed HTTP responses with a status code between 500-599 (server error responses), indicating a temporary issue with the AGS backend.

Event Handler dashboard

The Event Handler (EH) dashboard contains sections of panels that show performance metrics for your AGS Extend Event Handler app.

To view the data of a specific Event Handler app, set the environment and game namespace at the top of the dashboard. Then, from the App dropdown, select the app you want to view. The dashboard displays the data for your selected app.

See below for the panels in each section and a description of the information they convey.

EH Overview section

This section's panels show the app's general information and basic performance metrics.

See below for this section's specific panel information.

EH App Information panel

General app information.

  • App ID: The ID for this app, used in code.
  • App: The name of the app.
  • Game Namespace: What game namespace this app is under.
  • Extend Scenario: The type of AGS Extend app (event-handler, function-override, or service-extension).

EH App Creation Duration panel

How long in minutes it took for the selected app to be created.

EH Deployment Duration panel

Information related to the selected app's image deployments. Each item represents a new image version deployed.

  • deployment_time: The date and time the image was deployed.
  • deployment_id: The ID for the deployed image.
  • deployment_duration: How long in seconds it took for this image to deploy.

tip

You can use the information on this panel to troubleshoot image deployment problems. For example, if performance started degrading when a specific image was deployed, or the deployment_duration of a specific image took longer than other deployments, it could be worth investigating if there's something wrong with that specific image. If you can't find anything wrong with the image, you can reach out to AccelByte support for help.
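
As a rough sketch of the comparison described in this tip, assume you have copied the panel's rows into a script by hand. The sample values, the slow_deployments helper, and its factor threshold are hypothetical and are not AGS defaults. The sketch flags any deployment whose deployment_duration is well above the median of the others:

```python
from statistics import median

# Hypothetical rows copied by hand from the Deployment Duration panel.
# The field names match the panel's columns; the values are made up.
deployments = [
    {"deployment_id": "d-001", "deployment_time": "2024-10-01 09:00", "deployment_duration": 42},
    {"deployment_id": "d-002", "deployment_time": "2024-10-03 14:30", "deployment_duration": 45},
    {"deployment_id": "d-003", "deployment_time": "2024-10-07 11:15", "deployment_duration": 130},
    {"deployment_id": "d-004", "deployment_time": "2024-10-09 16:45", "deployment_duration": 40},
]


def slow_deployments(rows, factor=2.0):
    """Return rows whose duration exceeds `factor` times the median duration."""
    if not rows:
        return []
    typical = median(r["deployment_duration"] for r in rows)
    return [r for r in rows if r["deployment_duration"] > factor * typical]


flagged_ids = [r["deployment_id"] for r in slow_deployments(deployments)]
print(flagged_ids)
```

The median is used rather than the mean so that a single very slow deployment does not raise the baseline it is compared against.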

EH Failed Deployment (count) panel

The number of image deployment attempts that were unsuccessful.

tip

A high number of failed deployment attempts could indicate something is wrong with your image deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

EH Timeout Deployment (count) panel

The number of image deployment attempts that took longer than the deployment timeout duration limit would allow.

tip

If there is a pattern of images taking too long to deploy, there could be inefficient logic in your deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

EH Replica Count & Limit panel

The number of replicas that have been created for the selected Extend app, and the max number of replicas that can be created. Replicas increase the resources available for the app. A new replica is created when this app's CPU or memory utilization exceeds 80%.

tip

If your replica count is reaching the replica limit, you can reach out to AccelByte support to discuss increasing your replica limit, or to get help in decreasing your resource utilization.
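
The scaling rule above can be sketched as a small check. This is only an illustration under stated assumptions: the 80% threshold comes from the panel description, while the replica_outlook function, its labels, and the idea of passing utilization as a fraction are hypothetical and do not reflect how AGS implements scaling.

```python
# From the panel description: a new replica is created when CPU or memory
# utilization exceeds 80%.
SCALE_UP_THRESHOLD = 0.80


def replica_outlook(cpu_util, mem_util, replica_count, replica_limit):
    """Classify an app's state from values read off the dashboard.

    cpu_util and mem_util are fractions between 0 and 1 (assumption).
    Returns "ok", "scaling", or "saturated" (hypothetical labels).
    """
    wants_scale_up = cpu_util > SCALE_UP_THRESHOLD or mem_util > SCALE_UP_THRESHOLD
    at_limit = replica_count >= replica_limit
    if wants_scale_up and at_limit:
        # More resources are needed, but no further replica can be created.
        return "saturated"
    if wants_scale_up:
        return "scaling"
    return "ok"


print(replica_outlook(cpu_util=0.85, mem_util=0.40, replica_count=3, replica_limit=3))
```

A "saturated" result corresponds to the situation this tip describes: utilization is high and the replica count has already reached the limit.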

EH Container OOMKilled per Replica panel

Out of Memory Killed (OOMKilled). A list of replicas that have been killed or stopped due to running out of memory.

EH Service Container CPU Usage panel

A graph showing the amount of CPU usage within the time frame selected. Hover over the graph to see specific usage and time details.

note

This graph only shows the CPU usage for the selected app. It is not an aggregate of CPU usage with other apps and services.

EH Service Container Memory Usage panel

A graph showing the amount of memory usage within the time frame selected. Hover over the graph to see specific usage and time details.

note

This graph only shows the memory usage for the selected app. It is not an aggregate of memory usage with other apps and services.

EH Total Service Error Logs panel

A graph showing the number of error logs generated within the time frame selected. Hover over a time to see how many logs were generated at that time.

tip

You can use this to see when more errors started occurring. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

EH Service Error Logs panel

The last 20 error logs that were generated.

tip

This will show you the most recent error logs generated, which can help you resolve current issues. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

EH Kafka Connect section

This section's panels show information related to Kafka Connect and the consumption of events.

See below for this section's specific panel information.

EH Partition Count panel

A graph showing the number of topic partitions (or event sources) assigned to this app at a given time within the selected time frame.

EH Record Read Total panel

A graph showing the number of Kafka Connect events that were listened for by this app at a given time.

tip

You can use this information to see what times your app receives a lot of requests, aiding in troubleshooting and error prevention.

EH Records Consumed Rate panel

A graph showing the number of Kafka Connect events that were listened for and consumed by this app at a given time.

tip

A consumption rate that is inconsistent with the record reads could indicate a problem with logic or insufficient resources. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

EH Records Lag panel

The number of events that were unable to be handled upon request.

tip

A trend of records experiencing lag could indicate insufficient resources to handle the demand for the requests to your app. You can reach out to AccelByte support to discuss allocating more resources to the app.
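
One way to tell a trend from a brief spike is to check whether the lag keeps rising across consecutive readings. The sketch below assumes you have noted lag values from the panel at regular intervals. The function names and the min_rises threshold are hypothetical choices for illustration.

```python
def trailing_rises(samples):
    """Count how many consecutive increases end at the latest sample."""
    rises = 0
    for i in range(len(samples) - 1, 0, -1):
        if samples[i] > samples[i - 1]:
            rises += 1
        else:
            break
    return rises


def lag_is_trending_up(samples, min_rises=3):
    """True if the lag rose at least `min_rises` times in a row up to now."""
    return trailing_rises(samples) >= min_rises


# Hypothetical lag readings, oldest first.
steady_growth = [0, 0, 5, 12, 30, 55]
short_spikes = [10, 0, 8, 0, 3]

print(lag_is_trending_up(steady_growth))
print(lag_is_trending_up(short_spikes))
```

Lag that rises and then returns to zero usually means the app caught up on its own, while a run of uninterrupted increases is the pattern worth raising with support.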

Overridable Features dashboard

The Overridable Features (OF) dashboard contains sections of panels that show performance metrics for your AGS Extend Override app.

To view the data of a specific Extend Override app, set the environment and game namespace at the top of the dashboard. Then, from the App dropdown, select the app you want to view. The dashboard displays the data for your selected app.

See below for the panels in each section and a description of the information they convey.

OF Overview section

This section's panels show the app's general information and basic performance metrics.

See below for this section's specific panel information.

OF App Information panel

General app information.

  • App ID: The ID for this app, used in code.
  • App: The name of the app.
  • Game Namespace: What game namespace this app is under.
  • Extend Scenario: The type of AGS Extend app (event-handler, function-override, or service-extension).

OF App Creation Duration panel

How long in minutes it took for the selected app to be created.

OF Deployment Duration panel

Information related to the selected app's image deployments. Each item represents a new image version deployed.

  • deployment_time: The date and time the image was deployed.
  • deployment_id: The ID for the deployed image.
  • deployment_duration: How long in seconds it took for this image to deploy.

tip

You can use the information on this panel to troubleshoot image deployment problems. For example, if performance started degrading when a specific image was deployed, or the deployment_duration of a specific image took longer than other deployments, it could be worth investigating if there's something wrong with that specific image. If you can't find anything wrong with the image, you can reach out to AccelByte support for help.

OF Failed Deployment (count) panel

The number of image deployment attempts that were unsuccessful.

tip

A high number of failed deployment attempts could indicate something is wrong with your image deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

OF Timeout Deployment (count) panel

The number of image deployment attempts that took longer than the deployment timeout duration limit would allow.

tip

If there is a pattern of images taking too long to deploy, there could be inefficient logic in your deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

OF Replica Status panel

Shows the statuses of the replicas for the selected app.

OF Replica Count & Limit panel

The number of replicas that have been created for the selected Extend app, and the max number of replicas that can be created. Replicas increase the resources available for the app. A new replica is created when this app's CPU or memory utilization exceeds 80%.

tip

If your replica count is reaching the replica limit, you can reach out to AccelByte support to discuss increasing your replica limit, or to get help in decreasing your resource utilization.

OF Container OOMKilled per Replica panel

Out of Memory Killed (OOMKilled). A list of replicas that have been killed or stopped due to running out of memory.

OF Service Container CPU Usage panel

A graph showing the amount of CPU usage within the time frame selected. Hover over the graph to see specific usage and time details.

note

This graph only shows the CPU usage for the selected app. It is not an aggregate of CPU usage with other apps and services.

OF Service Container Memory Usage panel

A graph showing the amount of memory usage within the time frame selected. Hover over the graph to see specific usage and time details.

note

This graph only shows the memory usage for the selected app. It is not an aggregate of memory usage with other apps and services.

OF Total Service Error Logs panel

A graph showing the number of error logs generated within the time frame selected. Hover over a time to see how many logs were generated at that time.

tip

You can use this to see when more errors started occurring. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

OF Service Error Logs panel

The last 20 error logs that were generated.

tip

This will show you the most recent error logs generated, which can help you resolve current issues. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

OF gRPC section

This section's panels show information related to gRPC requests and responses.

See below for this section's specific panel information.

OF Response Rate panel

The number of responses made by AGS to the requests received from the selected Extend app.

tip

If there is a discrepancy between the received rate and the response rate, it could indicate a temporary issue with the AGS backend. If this occurs, reach out to AccelByte support for more information.
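
To make "discrepancy" concrete, you could compare the two rates as a relative gap. This is a minimal sketch: the rate_gap and has_discrepancy helpers and the 5% tolerance are hypothetical, and the input values are assumed to be read from the Received Rate and Response Rate panels for the same time window.

```python
def rate_gap(received_rate, response_rate):
    """Relative gap between the two rates, as a fraction of the received rate."""
    if received_rate == 0:
        # No requests received: any responses at all are unexpected.
        return 0.0 if response_rate == 0 else float("inf")
    return abs(received_rate - response_rate) / received_rate


def has_discrepancy(received_rate, response_rate, tolerance=0.05):
    """True if the rates differ by more than `tolerance` (hypothetical 5%)."""
    return rate_gap(received_rate, response_rate) > tolerance


print(has_discrepancy(100.0, 99.0))
print(has_discrepancy(100.0, 80.0))
```

A small gap is normal because requests in flight at the edge of the window are counted on one side only, so the tolerance should be tuned to your own traffic.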

OF Response Rate per Status Code panel

The rate of responses made by AGS to the selected Extend app, categorized by gRPC status code.

tip

You can use this number to compare the number of requests you anticipate your Extend app to make with the number AGS receives by status code. If there is a discrepancy, you can investigate the issue. If you're unable to determine the cause, you can reach out to AccelByte support for help.

OF Response Latency panel

A graph showing the delay in milliseconds between when a request is received by AGS from the selected Extend app and when a response is sent by AGS.

tip

A large delay in response time may indicate a temporary slowdown of the AGS backend. If this occurs, reach out to AccelByte support for more information.

OF Received Rate panel

The rate of requests received by AGS from the selected Extend app.

tip

You can use this number to compare the number of requests you anticipate your Extend app to make with the number AGS receives. If there is a discrepancy, you can investigate the issue. If you're unable to determine the cause, you can reach out to AccelByte support for help.

Service Extension dashboard

The Service Extension (SE) dashboard contains sections of panels that show performance metrics for your AGS Extend Service Extension app.

To view the data of a specific Service Extension app, set the environment and game namespace at the top of the dashboard. Then, from the App dropdown, select the app you want to view. The dashboard displays the data for your selected app.

See below for the panels in each section and a description of the information they convey.

SE Overview section

This section's panels show the app's general information and basic performance metrics.

See below for this section's specific panel information.

SE App Information panel

General app information.

  • App ID: The ID for this app, used in code.
  • App: The name of the app.
  • Game Namespace: What game namespace this app is under.
  • Extend Scenario: The type of AGS Extend app (event-handler, function-override, or service-extension).

SE App Creation Duration panel

How long in minutes it took for the selected app to be created.

SE Deployment Duration panel

Information related to the selected app's image deployments. Each item represents a new image version deployed.

  • deployment_time: The date and time the image was deployed.
  • deployment_id: The ID for the deployed image.
  • deployment_duration: How long in seconds it took for this image to deploy.

tip

You can use the information on this panel to troubleshoot image deployment problems. For example, if performance started degrading when a specific image was deployed, or the deployment_duration of a specific image took longer than other deployments, it could be worth investigating if there's something wrong with that specific image. If you can't find anything wrong with the image, you can reach out to AccelByte support for help.

SE Failed Deployment (count) panel

The number of image deployment attempts that were unsuccessful.

tip

A high number of failed deployment attempts could indicate something is wrong with your image deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

SE Timeout Deployment (count) panel

The number of image deployment attempts that took longer than the deployment timeout duration limit would allow.

tip

If there is a pattern of images taking too long to deploy, there could be inefficient logic in your deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

SE Replica Status panel

Shows the statuses of the replicas for the selected app.

SE Replica Count & Limit panel

The number of replicas that have been created for the selected Extend app, and the max number of replicas that can be created. Replicas increase the resources available for the app. A new replica is created when this app's CPU or memory utilization exceeds 80%.

tip

If your replica count is reaching the replica limit, you can reach out to AccelByte support to discuss increasing your replica limit, or to get help in decreasing your resource utilization.

SE Container OOMKilled per Replica panel

Out of Memory Killed (OOMKilled). A list of replicas that have been killed or stopped due to running out of memory.

SE Service Container CPU Usage panel

A graph showing the amount of CPU usage within the time frame selected. Hover over the graph to see specific usage and time details.

note

This graph only shows the CPU usage for the selected app. It is not an aggregate of CPU usage with other apps and services.

SE Service Container Memory Usage panel

A graph showing the amount of memory usage within the time frame selected. Hover over the graph to see specific usage and time details.

note

This graph only shows the memory usage for the selected app. It is not an aggregate of memory usage with other apps and services.

SE Total Service Error Logs panel

A graph showing the number of error logs generated within the time frame selected. Hover over a time to see how many logs were generated at that time.

tip

You can use this to see when more errors started occurring. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

SE Service Error Logs panel

The last 20 error logs that were generated.

tip

This will show you the most recent error logs generated, which can help you resolve current issues. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

SE HTTP section

This section's panels show metrics related to HTTP requests and responses.

See below for this section's specific panel information.

SE Request Success Rate/5m (Linkerd) panel

A graph showing the rate of successful HTTP requests per five minutes from the selected Extend app to the AGS backend.

SE Response Latency panel

A graph showing the delay in milliseconds between when an HTTP request is received by the AGS backend from the selected Extend app and when AGS sends a response.

tip

A large delay in response time may indicate a temporary slowdown of the AGS backend. If this occurs, reach out to AccelByte support for more information.

SE 2xx/5m (Linkerd) panel

Number of completed HTTP responses with a status code between 200-299 (successfully completed).

SE 4xx/5m (Linkerd) panel

Number of completed HTTP responses with a status code between 400-499 (client error responses), indicating an issue with your selected Extend app.

tip

These errors indicate an issue with the client (your Extend app). Troubleshoot your app to see what might be causing the error. If you're unable to determine the cause, reach out to AccelByte support for help.

SE 5xx/5m (Linkerd) panel

Number of completed HTTP responses with a status code between 500-599 (server error responses), indicating a temporary issue with the AGS backend.

tip

These errors indicate a temporary issue with the AGS backend. If this occurs, reach out to AccelByte support for more information.
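
The 2xx, 4xx, and 5xx panels group responses by the first digit of the HTTP status code. The following sketch shows that grouping on a hypothetical list of status codes, for example taken from your own app logs. Treating only 2xx responses as successful is an assumption made for illustration, and the success rate that Linkerd reports may be defined differently.

```python
from collections import Counter


def status_class(code):
    """Map an HTTP status code to its class, for example 503 -> '5xx'."""
    return f"{code // 100}xx"


def summarize(codes):
    """Count responses per status class and compute the share of 2xx responses."""
    counts = Counter(status_class(c) for c in codes)
    total = sum(counts.values())
    # Assumption: only 2xx responses are counted as successful here.
    success_rate = counts["2xx"] / total if total else 0.0
    return counts, success_rate


# Hypothetical status codes observed in one five-minute window.
codes = [200, 200, 204, 404, 500, 503, 200, 429]
counts, success_rate = summarize(codes)
print(dict(counts), success_rate)
```

Reading the classes side by side helps decide where to look first: a rise in 4xx points at the requests your app sends, while a rise in 5xx points at the backend.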