Extend app performance dashboards

Last updated on July 15, 2024

Introduction

AccelByte Gaming Services (AGS) Extend Observability gives you access to Grafana Cloud dashboards that provide a high-level view of your Extend apps' performance metrics. With these dashboards, you can quickly see how each of your Extend apps is running without reading through detailed logs and metrics, which helps you catch problems earlier and troubleshoot more efficiently.

This article covers how to access the dashboards, what information they show, and some potential use cases for that information.

Prerequisites

To use the AGS Extend Observability dashboards, you need to have created at least one AGS Extend app (Override, Service Extension, or Event Handler), configured it, and integrated it with AGS.

Access the dashboards

To access your Extend app dashboards, you can use the steps within Introduction to Observability. Once logged in to Grafana Cloud, do the following:

  1. Click Dashboards on the sidebar.
  2. In the search bar, search for "Extend dashboard". The available Extend dashboards will appear in the list.
  3. Click your desired dashboard in the list based on your app type (Override, Service Extension, or Event Handler).

Select your app

After opening one of the dashboards, do the following to see the performance metrics for your Extend app:

  1. At the top of the dashboard page, select the appropriate Game Namespace where your app is located from the dropdown.
  2. Select the desired App from the dropdown.

Note

If your game namespace or Extend app does not appear in the dropdowns at the top of the dashboard page, ensure that both are configured correctly and that the app is under the correct game namespace.

Event Handler dashboard

The Event Handler (EH) dashboard contains sections of panels that show performance metrics for your AGS Extend Event Handler app. See below for the panels in each section and a description of the information they convey.

EH Overview section

This section's panels show the app's general information and basic performance metrics.

See below for this section's specific panel information.

EH App Information panel

General app information.

  • App ID: The ID for this app, used in code.
  • App: The name of the app.
  • Game Namespace: What game namespace this app is under.
  • Extend Scenario: The type of AGS Extend app (event-handler, function-override, or service-extension)

EH App Creation Duration panel

How long in minutes it took for the selected app to be created.

EH Deployment Duration panel

Information related to the selected app's image deployments. Each item represents a new image version deployed.

  • deployment_time: The date and time the image was deployed.
  • deployment_id: The ID for the deployed image.
  • deployment_duration: How long in seconds it took for this image to deploy.

Tip

You can use the information on this panel to troubleshoot image deployment problems. For example, if performance started degrading when a specific image was deployed, or a specific image's deployment_duration was longer than that of other deployments, it could be worth investigating whether something is wrong with that image. If you can't find anything wrong with the image, you can reach out to AccelByte support for help.
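As a rough illustration of comparing one deployment's duration against the others, the following Python sketch flags deployments that took noticeably longer than the median. It is not part of AGS or Grafana: the function name `flag_slow_deployments`, the `factor` threshold, and the sample rows are all hypothetical, and the rows are assumed to have been copied by hand from the panel using its deployment_time, deployment_id, and deployment_duration fields.

```python
from statistics import median


def flag_slow_deployments(deployments, factor=2.0):
    """Return IDs of deployments that took longer than factor x the median.

    `factor` is an arbitrary illustrative threshold, not an AGS setting.
    """
    durations = [d["deployment_duration"] for d in deployments]
    if not durations:
        return []
    baseline = median(durations)
    return [
        d["deployment_id"]
        for d in deployments
        if d["deployment_duration"] > factor * baseline
    ]


# Hypothetical rows, as they might be read off the panel (durations in seconds).
rows = [
    {"deployment_time": "2024-07-01 10:00", "deployment_id": "img-a", "deployment_duration": 40},
    {"deployment_time": "2024-07-03 09:30", "deployment_id": "img-b", "deployment_duration": 45},
    {"deployment_time": "2024-07-08 14:15", "deployment_id": "img-c", "deployment_duration": 130},
]

print(flag_slow_deployments(rows))
```

Using the median rather than the mean keeps a single very slow deployment from raising the baseline it is compared against.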

EH Failed Deployment (count) panel

The number of image deployment attempts that were unsuccessful.

Tip

A high number of failed deployment attempts could indicate something wrong with your image deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

EH Timeout Deployment (count) panel

The number of image deployment attempts that exceeded the deployment timeout limit.

Tip

If there is a pattern of images taking too long to deploy, there could be inefficient logic in your deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

EH Pod Count & Limit panel

The number of pods that have been created for the selected Extend app, and the max number of pods that can be created. Pods increase the resources available for the app. A new pod is created when this app's CPU or memory utilization exceeds 80%.

Tip

If your pod count is reaching the pod limit, you can reach out to AccelByte support to discuss increasing your pod limit, or to get help in decreasing your resource utilization.
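The scaling rule described for this panel can be sketched in a few lines of Python. This is only a reading aid under stated assumptions: the 80% threshold comes from the panel description, while the function names and the status strings are hypothetical and do not reflect how AGS actually implements autoscaling.

```python
SCALE_OUT_THRESHOLD = 80.0  # percent, taken from the panel description


def wants_new_pod(cpu_percent, memory_percent):
    """True when CPU or memory utilization exceeds the threshold."""
    return cpu_percent > SCALE_OUT_THRESHOLD or memory_percent > SCALE_OUT_THRESHOLD


def scaling_status(cpu_percent, memory_percent, pod_count, pod_limit):
    """Summarize what the panel values suggest (labels are illustrative)."""
    if not wants_new_pod(cpu_percent, memory_percent):
        return "ok"
    if pod_count < pod_limit:
        return "scale-out expected"
    # Utilization is high but no further pods can be created.
    return "at pod limit"


print(scaling_status(85.0, 40.0, pod_count=2, pod_limit=5))
print(scaling_status(40.0, 95.0, pod_count=5, pod_limit=5))
```

The "at pod limit" case is the one worth raising with AccelByte support, because utilization is high and no additional pods can be added.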

EH Container OOMKilled per Pod panel

A list of pods that have been killed or stopped because they ran out of memory (Out of Memory Killed, or OOMKilled).

EH Total Service Error Logs panel

A graph showing the number of error logs generated in the past 12 hours. Hover over a time to see how many logs were generated at that time.

Tip

You can use this to see when more errors started occurring. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

EH Service Error Logs panel

The last 20 error logs that were generated.

Tip

This will show you the most recent error logs generated, which can help you resolve current issues. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

EH Kafka Connect section

This section's panels show information related to Kafka Connect and the consumption of events.

See below for this section's specific panel information.

EH Partition Count panel

A graph showing the number of topic partitions (or event sources) assigned to this app at a given time.

EH Record Read Total panel

A graph showing the number of Kafka Connect events that were listened for by this app at a given time.

Tip

You can use this information to see what times your app receives a lot of requests, aiding in troubleshooting and error prevention.

EH Records Consumed Rate panel

A graph showing the number of Kafka Connect events that were listened for and consumed by this app at a given time.

Tip

A consumption rate that is inconsistent with the record reads could indicate a problem with logic or insufficient resources. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.
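One way to make "inconsistent with the record reads" concrete is to compare the two series interval by interval. The Python sketch below assumes you have exported both panels as counts sampled over the same intervals. That is a simplification, and the function name `find_consumption_gaps`, the `tolerance` value, and the sample numbers are hypothetical.

```python
def find_consumption_gaps(read_counts, consumed_counts, tolerance=0.1):
    """Return indexes of intervals where consumption lags reads.

    An interval is flagged when the consumed count falls short of the read
    count by more than `tolerance` (a fraction of the read count). The
    default of 10% is an arbitrary illustrative value.
    """
    gaps = []
    for index, (read, consumed) in enumerate(zip(read_counts, consumed_counts)):
        if read == 0:
            continue  # nothing was read, so there is nothing to compare
        shortfall = (read - consumed) / read
        if shortfall > tolerance:
            gaps.append(index)
    return gaps


# Hypothetical samples taken at the same four points in time.
read = [100, 120, 300, 310]
consumed = [100, 118, 200, 150]

print(find_consumption_gaps(read, consumed))
```

A run of consecutive flagged intervals is more meaningful than a single one, since a brief dip can simply reflect sampling differences between the two panels.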

EH Records Lag panel

The number of events that were unable to be handled upon request.

Tip

A trend of records experiencing lag could indicate insufficient resources to handle the demand for the requests to your app. You can reach out to AccelByte support to discuss allocating more resources to the app.

Overridable Features dashboard

The Overridable Features (OF) dashboard contains sections of panels that show performance metrics for your AGS Extend Override app. See below for the panels in each section and a description of the information they convey.

OF Overview section

This section's panels show the app's general information and basic performance metrics.

See below for this section's specific panel information.

OF App Information panel

General app information.

  • App ID: The ID for this app, used in code.
  • App: The name of the app.
  • Game Namespace: What game namespace this app is under.
  • Extend Scenario: The type of AGS Extend app (event-handler, function-override, or service-extension)

OF App Creation Duration panel

How long in minutes it took for the selected app to be created.

OF Deployment Duration panel

Information related to the selected app's image deployments. Each item represents a new image version deployed.

  • deployment_time: The date and time the image was deployed.
  • deployment_id: The ID for the deployed image.
  • deployment_duration: How long in seconds it took for this image to deploy.

Tip

You can use the information on this panel to troubleshoot image deployment problems. For example, if performance started degrading when a specific image was deployed, or a specific image's deployment_duration was longer than that of other deployments, it could be worth investigating whether something is wrong with that image. If you can't find anything wrong with the image, you can reach out to AccelByte support for help.

OF Failed Deployment (count) panel

The number of image deployment attempts that were unsuccessful.

Tip

A high number of failed deployment attempts could indicate something wrong with your image deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

OF Timeout Deployment (count) panel

The number of image deployment attempts that exceeded the deployment timeout limit.

Tip

If there is a pattern of images taking too long to deploy, there could be inefficient logic in your deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

OF Pod Count & Limit panel

The number of pods that have been created for the selected Extend app, and the max number of pods that can be created. Pods increase the resources available for the app. A new pod is created when this app's CPU or memory utilization exceeds 80%.

Tip

If your pod count is reaching the pod limit, you can reach out to AccelByte support to discuss increasing your pod limit, or to get help in decreasing your resource utilization.

OF Container OOMKilled per Pod panel

A list of pods that have been killed or stopped because they ran out of memory (Out of Memory Killed, or OOMKilled).

OF Total Service Error Logs panel

A graph showing the number of error logs generated in the past 12 hours. Hover over a time to see how many logs were generated at that time.

Tip

You can use this to see when more errors started occurring. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

OF Service Error Logs panel

The last 20 error logs that were generated.

Tip

This will show you the most recent error logs generated, which can help you resolve current issues. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

OF gRPC section

This section's panels show information related to gRPC requests and responses.

See below for this section's specific panel information.

OF Response Rate panel

The number of responses made by AGS to the requests received by the selected Extend app.

Tip

If there is a discrepancy between the received rate and the response rate, it could indicate a temporary issue with the AGS backend. If this occurs, reach out to AccelByte support for more information.

OF Response Rate per Status Code panel

The rate of responses made by AGS to the selected Extend app, categorized by gRPC status code.

Tip

You can use this panel to compare the number of requests you anticipate your Extend app to make with the number AGS receives, broken down by status code. If there is a discrepancy, you can investigate the issue. If you're unable to determine the cause, you can reach out to AccelByte support for help.
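To show how a per-status-code breakdown can be summarized, the Python sketch below computes the share of responses that did not return OK. It assumes the panel labels its series with the standard gRPC status code names (OK, UNAVAILABLE, DEADLINE_EXCEEDED, and so on), which may not match your dashboard exactly. The function name `non_ok_share` and the sample rates are hypothetical.

```python
def non_ok_share(rate_by_status):
    """Return the fraction of the total response rate that is not OK.

    `rate_by_status` maps a gRPC status code name to its response rate.
    """
    total = sum(rate_by_status.values())
    if total == 0:
        return 0.0
    return (total - rate_by_status.get("OK", 0.0)) / total


# Hypothetical rates (responses per second) read off the panel.
rates = {"OK": 95.0, "UNAVAILABLE": 3.0, "DEADLINE_EXCEEDED": 2.0}

print(f"{non_ok_share(rates):.1%} of responses were not OK")
```

Tracking this fraction over time, rather than the raw counts, makes it easier to tell a real increase in errors from an increase in overall traffic.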

OF Response Latency panel

A graph showing the delay in milliseconds between when a request is received by AGS from the selected Extend app and when a response is sent by AGS.

Tip

A large delay in response time may indicate a temporary slowdown of the AGS backend. If this occurs, reach out to AccelByte support for more information.

OF Received Rate panel

The rate of requests received by AGS from the selected Extend app.

Tip

You can use this number to compare the number of requests you anticipate your Extend app to make with the number AGS receives. If there is a discrepancy, you can investigate the issue. If you're unable to determine the cause, you can reach out to AccelByte support for help.

Service Extension dashboard

The Service Extension (SE) dashboard contains sections of panels that show performance metrics for your AGS Extend Service Extension app. See below for the panels in each section and a description of the information they convey.

SE Overview section

This section's panels show the app's general information and basic performance metrics.

See below for this section's specific panel information.

SE App Information panel

General app information.

  • App ID: The ID for this app, used in code.
  • App: The name of the app.
  • Game Namespace: What game namespace this app is under.
  • Extend Scenario: The type of AGS Extend app (event-handler, function-override, or service-extension)

SE App Creation Duration panel

How long in minutes it took for the selected app to be created.

SE Deployment Duration panel

Information related to the selected app's image deployments. Each item represents a new image version deployed.

  • deployment_time: The date and time the image was deployed.
  • deployment_id: The ID for the deployed image.
  • deployment_duration: How long in seconds it took for this image to deploy.

Tip

You can use the information on this panel to troubleshoot image deployment problems. For example, if performance started degrading when a specific image was deployed, or a specific image's deployment_duration was longer than that of other deployments, it could be worth investigating whether something is wrong with that image. If you can't find anything wrong with the image, you can reach out to AccelByte support for help.

SE Failed Deployment (count) panel

The number of image deployment attempts that were unsuccessful.

Tip

A high number of failed deployment attempts could indicate something wrong with your image deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

SE Timeout Deployment (count) panel

The number of image deployment attempts that exceeded the deployment timeout limit.

Tip

If there is a pattern of images taking too long to deploy, there could be inefficient logic in your deployment process. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

SE Pod Count & Limit panel

The number of pods that have been created for the selected Extend app, and the max number of pods that can be created. Pods increase the resources available for the app. A new pod is created when this app's CPU or memory utilization exceeds 80%.

Tip

If your pod count is reaching the pod limit, you can reach out to AccelByte support to discuss increasing your pod limit, or to get help in decreasing your resource utilization.

SE Container OOMKilled per Pod panel

A list of pods that have been killed or stopped because they ran out of memory (Out of Memory Killed, or OOMKilled).

SE Total Service Error Logs panel

A graph showing the number of error logs generated in the past 12 hours. Hover over a time to see how many logs were generated at that time.

Tip

You can use this to see when more errors started occurring. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

SE Service Error Logs panel

The last 20 error logs that were generated.

Tip

This will show you the most recent error logs generated, which can help you resolve current issues. If you're unable to find the source of the problem, you can reach out to AccelByte support for help.

SE HTTP section

This section's panels show metrics related to HTTP requests and responses.

See below for this section's specific panel information.

SE Request Success Rate/5m (Linkerd) panel

A graph showing the rate of successful HTTP requests per five minutes from the selected Extend app to the AGS backend.

SE Response Latency panel

A graph showing the delay in milliseconds between when an HTTP request is received by the AGS backend from the selected Extend app and when AGS sends a response.

Tip

A large delay in response time may indicate a temporary slowdown of the AGS backend. If this occurs, reach out to AccelByte support for more information.

SE 2xx/5m (Linkerd) panel

Number of completed HTTP responses with a status code between 200-299 (successfully completed).

SE 4xx/5m (Linkerd) panel

Number of completed HTTP responses with a status code between 400-499 (client error responses), indicating an issue with your selected Extend app.

Tip

These errors indicate an issue with the client (your Extend app). Troubleshoot your app to see what might be causing the error. If you're unable to determine the cause, reach out to AccelByte support for help.

SE 5xx/5m (Linkerd) panel

Number of completed HTTP responses with a status code between 500-599 (server error responses), indicating a temporary issue with the AGS backend.

Tip

These errors indicate a temporary issue with the AGS backend. If this occurs, reach out to AccelByte support for more information.
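The 2xx, 4xx, and 5xx panels group responses by the hundreds digit of the HTTP status code. The Python sketch below shows that grouping on a list of status codes. It is illustrative only: the function names and the sample codes are hypothetical, and it does not reproduce how Linkerd or the dashboard computes its figures.

```python
from collections import Counter


def status_class(code):
    """Map an HTTP status code to its class label, for example 404 -> '4xx'."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return f"{code // 100}xx"


def count_by_class(codes):
    """Count responses per status class."""
    return Counter(status_class(code) for code in codes)


# Hypothetical status codes observed within one five-minute window.
codes = [200, 201, 404, 500, 503, 204]
counts = count_by_class(codes)

print(dict(counts))
```

In this sample, the single 4xx response points to the Extend app as the client, while the two 5xx responses point to the server side.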