
Introduction to the Data Connector Service

Last updated on March 11, 2025
note

The Data Connector service is only available on the AccelByte Intelligence Service (AIS). Refer to Analytics Data Connector if you want to use the AGS version.

Overview

The AccelByte Intelligence Service includes the Data Connector service, a data streaming solution that transfers service telemetry or custom telemetry to your chosen data warehouse. The following destinations are supported:

  • Amazon S3: Scalable object storage provided by Amazon Web Services.
  • Amazon Redshift: A data warehouse product that is part of the Amazon Web Services cloud platform.
  • Snowflake: A cloud-hosted relational database that enables you to build data warehouses on demand.

In this section, you will learn about the key concepts of the Data Connector service, supported features, limitations, and best practices to utilize the service.

Shared Cloud Multi-Tenant Limitations

When using the Data Connector service in a shared cloud multi-tenant environment, the following limitations apply:

  1. Maximum Connectors: Limited to 2 connectors per game namespace.

  2. Flush Configuration: Custom flush configurations are not available. Default settings are:

    • Flush Interval: 1 minute
    • Flush Size: 300 events
    • Flush Memory: 500 KB
  3. S3 Path Format: Custom S3 path format configuration is not available. The default format is:

    {eventType}/realm/{realm}/namespaces/{namespace}/topics/{topic}/year={yyyy}/month={MM}/day={dd}/hour={hh}/minute={mm}/{topic}-{namespace}.json
  4. Redshift and Snowflake Restrictions: Only the single table model is supported, with no column flattening.

note

The above limitations apply only to shared cloud multi-tenant environments.

Key concepts

To make your integration as seamless as possible, it is important to understand some of the key concepts that are used in the design of this service.

Event types

Two types of events can be streamed to the data warehouse:

  1. Service Telemetry: System-generated events from AccelByte services.
  2. Custom Telemetry: Custom events sent from game clients.

Flush configuration

  1. Flush Interval: The maximum time, in minutes, between flushes to the data warehouse. The flush interval range is between 1 and 5 minutes.
  2. Flush Size: The maximum number of events that can be buffered before the data is streamed to the data warehouse. The flush size range is between 100 and 1,000 events.
  3. Flush Memory: The maximum memory, in kilobytes, used to buffer the data before it is streamed to the data warehouse. The flush memory range is between 100 and 1,000 KB.

Data is streamed as soon as the first of these conditions is reached, whether that is the Flush Interval, the Flush Size, or the Flush Memory.
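As an illustration only, the Python sketch below models how the three thresholds interact. It is a simplified model, not the actual service implementation; the class name is hypothetical and the constants use the shared cloud defaults listed earlier.

import time

# Minimal sketch of the flush rules described above: a batch is flushed as
# soon as ANY one of the three thresholds is reached.
FLUSH_INTERVAL_SECONDS = 60       # Flush Interval: 1 minute (shared cloud default)
FLUSH_SIZE_EVENTS = 300           # Flush Size: 300 events (shared cloud default)
FLUSH_MEMORY_BYTES = 500 * 1024   # Flush Memory: 500 KB (shared cloud default)

class FlushBuffer:
    def __init__(self):
        self.events = []
        self.bytes_buffered = 0
        self.last_flush = time.monotonic()

    def add(self, event_json: str) -> bool:
        """Buffer one event and report whether a flush should happen now."""
        self.events.append(event_json)
        self.bytes_buffered += len(event_json.encode("utf-8"))
        return self.should_flush()

    def should_flush(self) -> bool:
        interval_reached = time.monotonic() - self.last_flush >= FLUSH_INTERVAL_SECONDS
        size_reached = len(self.events) >= FLUSH_SIZE_EVENTS
        memory_reached = self.bytes_buffered >= FLUSH_MEMORY_BYTES
        return interval_reached or size_reached or memory_reached

    def flush(self):
        """Hand back the buffered batch and reset the counters."""
        batch, self.events, self.bytes_buffered = self.events, [], 0
        self.last_flush = time.monotonic()
        return batch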

S3 path format configuration

When configuring your Amazon S3 destination, you will need to specify the S3 path where the streamed data will be stored. The S3 path includes both the bucket name and the directory structure within the bucket.

The Data Connector service lets you customize this path using placeholders such as timestamps, event types, and other variables.

For detailed instructions and examples on configuring the S3 path, please refer to our S3 path format configuration guide.
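As a rough illustration of how the placeholders resolve, the sketch below substitutes hypothetical values into the default format shown in the shared cloud limitations above; the eventType value and the timestamp are examples only.

from datetime import datetime, timezone

# Default S3 path format from the shared cloud limitations above, resolved
# with hypothetical example values for each placeholder.
DEFAULT_PATH_FORMAT = (
    "{eventType}/realm/{realm}/namespaces/{namespace}/topics/{topic}"
    "/year={yyyy}/month={MM}/day={dd}/hour={hh}/minute={mm}"
    "/{topic}-{namespace}.json"
)

now = datetime(2023, 7, 20, 3, 30, tzinfo=timezone.utc)
resolved = DEFAULT_PATH_FORMAT.format(
    eventType="gametelemetry",   # hypothetical event type value
    realm="dev",
    namespace="lightfantastic",
    topic="gameEnded",
    yyyy=f"{now:%Y}", MM=f"{now:%m}", dd=f"{now:%d}",
    hh=f"{now:%H}", mm=f"{now:%M}",
)
print(resolved)
# gametelemetry/realm/dev/namespaces/lightfantastic/topics/gameEnded
#   /year=2023/month=07/day=20/hour=03/minute=30/gameEnded-lightfantastic.json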

Redshift and Snowflake table model

When setting up data streaming from AccelByte services or game telemetry services to Redshift or Snowflake using Connector, you have the option to choose between two types of table models: Single and Mapping.

1. Single table model

In the single table model, all events are inserted into one table based on the event type. This approach is particularly useful for scenarios where you want to consolidate data into a single table for easier analysis and reporting.

Example topics:

  • analytics_game_telemetry.dev.lightfantastic.gameStarted
  • analytics_game_telemetry.dev.lightfantastic.gameEnded

Expected schema and table:

  • public.game_telemetry_dev

2. Mapping table model

In the mapping table model, events are inserted into multiple tables based on the topics. This enables you to have separate tables for each topic, allowing for a more granular organization of your data.

This approach is suitable when you want to maintain distinct tables for different types of events.

Example topics:

  • analytics_game_telemetry.dev.lightfantastic.gameStarted
  • analytics_game_telemetry.dev.lightfantastic.gameEnded

Expected schema and tables:

  • lightfantastic.gameStarted
  • lightfantastic.gameEnded

Redshift and Snowflake table name format

The Table Name Format determines how table names are created in Redshift and Snowflake when using the mapping table model. There are two types of table name formats: Topic and Event.

Example topic format:

  • analytics_game_telemetry_dev_lightfantastic_gameStarted

Example event format:

  • gameStarted
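As a quick illustration, the sketch below derives both formats from a topic string. The rules are inferred from the examples above rather than from an official specification: the topic format appears to replace the dots with underscores, and the event format keeps only the final segment.

topic = "analytics_game_telemetry.dev.lightfantastic.gameStarted"

topic_format_name = topic.replace(".", "_")
event_format_name = topic.rsplit(".", 1)[-1]

print(topic_format_name)  # analytics_game_telemetry_dev_lightfantastic_gameStarted
print(event_format_name)  # gameStarted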

Redshift and Snowflake column flatten

When configuring the Connector for Redshift or Snowflake, you have the option to enable or disable the Column Flatten feature. This feature determines how events are inserted into your database tables based on their structure.

Disable column flattening

When the column flatten feature is disabled, all of the event properties are inserted into a single column named event. The entire event payload is stored as a JSON object within this column.

Example event:

{
  "EventNamespace": "lightfantastic",
  "EventTimestamp": "2023-07-20T03:30:00.036483Z",
  "EventId": "d110582c54804a29ab1d95650ca4c644",
  "Payload": {
    "winning": true,
    "hero": "Captain America",
    "kill": 9,
    "network": 912.27,
    "item": [
      {
        "name": "vibranium shield",
        "defense": 10,
        "attack": 1
      },
      {
        "name": "mjolnir hammer",
        "defense": 1,
        "attack": 9
      }
    ]
  },
  "EventName": "gameEnded"
}

Expected column:

  • event: {"EventNamespace":"lightfantastic","EventTimestamp":"2023-07-20T03:30:00.036483Z","EventId":"d110582c54804a29ab1d95650ca4c644","Payload":{"winning":true,"hero":"Captain America","kill":9,"network":912.27,"item":[{"name":"vibranium shield","defense":10,"attack":1},{"name":"mjolnir hammer","defense":1,"attack":9}]},"EventName":"gameEnded"}

Enable column flattening

When you enable the column flatten feature, each event's properties will be inserted into separate columns in the database table. This offers a more granular and structured representation of the data.

Here's how the example event above would be inserted into columns:

Expected columns:

  • eventid: d110582c54804a29ab1d95650ca4c644
  • eventnamespace: lightfantastic
  • eventtimestamp: 2023-07-20T03:30:00.036483Z
  • eventname: gameEnded
  • payload: {"winning":true,"hero":"Captain America","kill":9,"network":912.27,"item":[{"name":"vibranium shield","defense":10,"attack":1},{"name":"mjolnir hammer","defense":1,"attack":9}]}
note

The column flatten feature cannot be applied to the single table model.
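To make the difference concrete, here is a minimal sketch (not the service's actual implementation) of how the same example event could map to a row in each mode:

import json

# Example event from above, with the payload trimmed for brevity.
event = {
    "EventNamespace": "lightfantastic",
    "EventTimestamp": "2023-07-20T03:30:00.036483Z",
    "EventId": "d110582c54804a29ab1d95650ca4c644",
    "Payload": {"winning": True, "hero": "Captain America", "kill": 9},
    "EventName": "gameEnded",
}

# Column flatten disabled: the whole event is stored as a JSON object in a
# single column named event.
row_unflattened = {"event": json.dumps(event)}

# Column flatten enabled: each top-level property becomes its own column,
# and the nested Payload remains a JSON object in its own column.
row_flattened = {
    "eventid": event["EventId"],
    "eventnamespace": event["EventNamespace"],
    "eventtimestamp": event["EventTimestamp"],
    "eventname": event["EventName"],
    "payload": json.dumps(event["Payload"]),
}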

Filtering

The Data Connector service offers advanced filtering capabilities that allow clients to selectively stream data. Clients can specify the namespaces they wish to stream from service telemetry or custom telemetry. This ensures that only relevant data is transferred to the chosen data warehouses, optimizing storage and reducing unnecessary data transfer.

Security

S3 bucket policy

To enhance security compliance, you should implement an appropriate S3 bucket policy for the destination bucket where data will be stored. The bucket policy should restrict access to authorized users and systems only.

To learn how to implement the S3 bucket policy, refer to the S3 Bucket Policy Configuration guide.

Server-side encryption with AWS Key Management Service (SSE-KMS)

The Data Connector service supports server-side encryption with AWS KMS keys (SSE-KMS) for enhanced data security when streaming to S3 buckets. This feature provides additional encryption control for your data at rest.

Enable SSE-KMS for your S3 Bucket

  1. Contact your AccelByte Account Manager and provide the KMS key ID.

  2. Ensure your KMS key policy grants these permissions:

    • kms:Decrypt
    • kms:GenerateDataKey
  3. In the AWS Management Console, update the Key Management Service (KMS) key policy to include the data connector IAM Role.

    {
      "Version": "2012-10-17",
      "Id": "key-default-1",
      "Statement": [
        {
          "Sid": "Enable IAM User Permissions",
          "Effect": "Allow",
          "Principal": {
            "AWS": [
              "arn:aws:iam::xxxxxxxxxxx:root",
              "arn:aws:iam::xxxxxxxxxxx:role/irsa_analytics_connector"
            ]
          },
          "Action": "kms:*",
          "Resource": "*"
        }
      ]
    }
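If you prefer to apply this key policy programmatically rather than through the console, a boto3 sketch along the following lines could be used; the key ID is a placeholder, and the account IDs and role name are copied from the example policy above.

import json

import boto3

# Hypothetical sketch: applying the key policy above with boto3 instead of
# the AWS Management Console.
kms = boto3.client("kms")

key_policy = {
    "Version": "2012-10-17",
    "Id": "key-default-1",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::xxxxxxxxxxx:root",
                    "arn:aws:iam::xxxxxxxxxxx:role/irsa_analytics_connector",
                ]
            },
            "Action": "kms:*",
            "Resource": "*",
        }
    ],
}

kms.put_key_policy(
    KeyId="your-kms-key-id",  # placeholder: the KMS key ID you share with AccelByte
    PolicyName="default",     # KMS accepts only the "default" policy name
    Policy=json.dumps(key_policy),
)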

Redshift IAM role authentication

Redshift IAM role authentication provides a secure way to establish connections between Connector and your Amazon Redshift cluster using AWS IAM roles. This approach allows you to grant temporary access to Connector to execute specific actions on your Redshift resources.

For detailed instructions on implementing the Redshift IAM role authentication, please refer to our Redshift IAM Role Authentication Configuration.

Snowflake key pair authentication & key pair rotation

When setting up Snowflake as a destination for your data warehouse with Connector, you have the option to use key pair authentication for enhanced security. This method involves the use of public and private key pairs to establish secure connections between Connector and your Snowflake environment. Additionally, it's crucial to consider regular key pair rotation as a best practice to maintain a high level of security.

Key pair authentication

Key pair authentication introduces an additional layer of security to your data streaming setup. In this method, a key pair consisting of a public key and a private key is generated.

The public key is registered with your Snowflake account, while the private key is securely stored within the Data Connector service.

When Connector initiates a connection, it signs the authentication request with the private key, and Snowflake verifies the signature against the registered public key.

Key pair rotation

For robust security maintenance, implementing a regular key pair rotation strategy is vital. This involves generating new public and private key pairs at scheduled intervals. By periodically updating the keys used for authentication, you substantially reduce the risk of unauthorized access in case any keys are compromised.
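As an illustration of the rotation step, the sketch below uses the Python cryptography package to generate a new 2048-bit RSA key pair. The file names are hypothetical; registering the new public key with your Snowflake user (for example via ALTER USER ... SET RSA_PUBLIC_KEY_2) and updating the Connector's stored private key follow the flow described above and in Snowflake's key rotation documentation.

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# Generate a fresh 2048-bit RSA key pair for the rotation.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Private key (PKCS#8 PEM): store securely and update the Connector
# configuration with it. Left unencrypted here for brevity; consider an
# encrypted private key in practice.
private_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

# Public key (PEM): register this with your Snowflake user, for example via
# ALTER USER ... SET RSA_PUBLIC_KEY_2 = '<key body>'.
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

with open("rsa_key.p8", "wb") as f:   # hypothetical file names
    f.write(private_pem)
with open("rsa_key.pub", "wb") as f:
    f.write(public_pem)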

Limitations

Editing the Connector configuration

Once the Connector configuration is successfully created, the event type (source) and data warehouse (destination) configurations cannot be edited. It's important to ensure correct configurations during the initial setup.

Pausing the Connector

The events are temporarily stored in Kafka for one to seven days, depending on the Kafka cluster configuration. If the Connector is paused for longer than the data retention policy, there is a risk of data loss.

Data Duplication

The connector service prioritizes data delivery, which may occasionally lead to duplicate events in your storage system (S3, Snowflake, or Redshift).

Each event is identified by a unique id field, and duplicate events will have the same id. These duplicates may occur due to network issues or system restarts.

To maintain data integrity, it's important to handle these duplicates on your end.
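For example, a minimal deduplication sketch along these lines could be applied to the streamed JSON before loading it for analysis. It assumes the unique identifier appears as the EventId field, as in the example event earlier on this page; in Redshift or Snowflake, a ROW_NUMBER() window over that identifier achieves the same effect.

import json

def deduplicate_events(raw_lines):
    """Drop duplicate events, keeping the first occurrence of each ID.

    Assumes each line holds one JSON event and that the unique identifier
    appears as the EventId field, as in the example event on this page.
    """
    seen_ids = set()
    unique_events = []
    for line in raw_lines:
        event = json.loads(line)
        event_id = event.get("EventId")
        if event_id in seen_ids:
            continue  # duplicate delivery; skip it
        seen_ids.add(event_id)
        unique_events.append(event)
    return unique_events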

Best practices

The Data Connector service is best suited for scenarios in which you want to seamlessly stream data from AccelByte services or game telemetry services to data warehouses such as Amazon S3, Snowflake, and Amazon Redshift.

This is especially beneficial for consolidating and analyzing data for reporting, analytics, and business intelligence purposes.