
Introduction to the Data Connector Service

Last updated on September 24, 2024
note

This Data Connector service is available only on the AccelByte Intelligence Service (AIS). If you want to use the AGS version, refer to Analytics Data Connector.

Overview

AccelByte Intelligence Service includes the Data Connector service, a data streaming solution designed to transfer service telemetry or custom telemetry seamlessly and efficiently to your chosen data warehouse. The following destinations are supported:

  • Amazon S3: Scalable object storage provided by Amazon Web Services.
  • Amazon Redshift: Data warehouse product that forms part of the larger cloud-computing platform Amazon Web Services.
  • Snowflake: Cloud-hosted, relational database that enables you to build data warehouses on demand.
  • GameSight: The marketing platform for video game publishers to build community, engage influencers, and measure their digital marketing performance.

In this section, you will learn about the key concepts of the Data Connector service, supported features, limitations, and best practices to utilize the service.

Key concepts

To make your integration as seamless as possible, it is important to understand some of the key concepts that are used in the design of this service.

Event types

Two types of events can be streamed to the data warehouse:

  1. Service Telemetry: system-generated events from AccelByte services.
  2. Custom Telemetry: custom telemetry events that are sent from game clients.

Flush configuration

  1. Flush Interval: The maximum time to wait before accumulated data is streamed to the data warehouse. The flush interval ranges from one to 15 minutes.
  2. Flush Size: The maximum number of events to accumulate before they are streamed to the data warehouse. The flush size ranges from 100 to 1,000 events.

The data is streamed as soon as either the Flush Interval or the Flush Size condition is reached, whichever comes first.
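
For example, with a flush interval of five minutes and a flush size of 500, a batch is written as soon as 500 events have accumulated, or five minutes after the previous flush, whichever happens first. (The numbers here are only illustrative.)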

S3 path format configuration

When configuring your Amazon S3 destination, you will need to specify the S3 path where the streamed data will be stored. The S3 path includes both the bucket name and the directory structure within the bucket.

The Data Connector service allows the path to be customized using placeholders such as timestamps, event types, or other variables, as illustrated below.
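
For example, a path template along the lines of your-bucket/game-telemetry/{eventType}/{yyyy}/{MM}/{dd}/ would group the streamed files by event type and date. The placeholder names here are illustrative; the exact placeholder syntax is covered in the guide linked below.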

For detailed instructions and examples on configuring the S3 path, please refer to our S3 path format configuration guide.

Redshift and Snowflake table model

When setting up data streaming from AccelByte services or game telemetry services to Redshift or Snowflake using the Connector, you can choose between two table models: Single and Mapping.

1. Single table model

In the single table model, all events are inserted into one table based on the event type. This approach is particularly useful for scenarios where you want to consolidate data into a single table for easier analysis and reporting. If you choose the Single Table Model, the events will be streamed into a table in the public schema.

Example topics:

  • analytics_game_telemetry.dev.lightfantastic.gameStarted
  • analytics_game_telemetry.dev.lightfantastic.gameEnded

Expected schema and table:

  • public.game_telemetry_dev

2. Mapping table model

In the mapping table model, events are inserted into multiple tables based on the topics. This enables you to have separate tables for each topic, allowing for a more granular organization of your data.

This approach is suitable when you want to maintain distinct tables for different types of events.

Example topics:

  • analytics_game_telemetry.dev.lightfantastic.gameStarted
  • analytics_game_telemetry.dev.lightfantastic.gameEnded

Expected schema and tables:

  • lightfantastic.gameStarted
  • lightfantastic.gameEnded
note

If you opt for the single table model, all events will be streamed into the public schema. This provides a straightforward way to manage and query your data in Redshift or Snowflake.

On the other hand, the mapping table model allows you to organize your data into separate tables based on the topics, offering a more structured approach to data storage and retrieval.
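
As a rough sketch of how the two models differ at query time, using the example schema and table names above (identifier casing and quoting may vary by warehouse):

-- Single table model: all event types share one table in the public schema
SELECT COUNT(*) FROM public.game_telemetry_dev;

-- Mapping table model: each topic has its own table
SELECT COUNT(*) FROM lightfantastic.gameEnded;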

Redshift and Snowflake column flatten

When configuring the Connector for Redshift or Snowflake, you have the option to enable or disable the Column Flatten feature. This feature determines how events are inserted into your database tables based on their structure.

Disable column flattening

When the column flatten feature is disabled, all event properties are inserted into a single column named events. The entire event payload is stored as a JSON object within this column.

Example event:

{
  "EventNamespace": "lightfantastic",
  "EventTimestamp": "2023-07-20T03:30:00.036483Z",
  "EventId": "d110582c54804a29ab1d95650ca4c644",
  "Payload": {
    "winning": true,
    "hero": "Captain America",
    "kill": 9,
    "network": 912.27,
    "item": [
      {
        "name": "vibranium shield",
        "defense": 10,
        "attack": 1
      },
      {
        "name": "mjolnir hammer",
        "defense": 1,
        "attack": 9
      }
    ]
  },
  "EventName": "gameEnded"
}

Expected column:

events
{"EventNamespace":"lightfantastic","EventTimestamp":"2023-07-20T03:30:00.036483Z","EventId":"d110582c54804a29ab1d95650ca4c644","Payload":{"winning":true,"hero":"Captain America","kill":9,"network":912.27,"item":[{"name":"vibranium shield","defense":10,"attack":1},{"name":"mjolnir hammer","defense":1,"attack":9}]},"EventName":"gameEnded"}

Enable column flattening

When you enable the column flatten feature, each event's properties will be inserted into separate columns in the database table. This offers a more granular and structured representation of the data.

Here's how the example event above would be inserted into columns:

Expected columns and values:

  • eventid: d110582c54804a29ab1d95650ca4c644
  • eventnamespace: lightfantastic
  • eventtimestamp: 2023-07-20T03:30:00.036483Z
  • eventname: gameEnded
  • payload_item: [{"defense":10,"attack":1,"name":"vibranium shield"},{"defense":1,"attack":9,"name":"mjolnir hammer"}]
  • payload_kill: 9
  • payload_winning: true
  • payload_network: 912.27
  • payload_hero: Captain America
note

The column flatten feature cannot be applied to the single table model, because each event type may have a different payload structure, which could result in a very large number of columns.
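
With flattening enabled, fields can be queried directly as columns. A minimal sketch, using the flattened column names from the example above and a mapping-model table (flattening requires the mapping table model):

-- Query flattened event columns directly
SELECT payload_hero, payload_kill
FROM lightfantastic.gameEnded
WHERE payload_winning = true;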

GameSight API Key

To stream data from AIS to GameSight, you need a GameSight API Key. To obtain one, refer to this documentation.

Filtering

The Data Connector service offers advanced filtering capabilities that allow clients to selectively stream data. Clients can specify the particular namespaces and Kafka topics they wish to stream from service telemetry or custom telemetry services. This ensures that only relevant data is transferred to the chosen data warehouses, optimizing storage and reducing unnecessary data transfer.
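
For example, you could choose to stream only the analytics_game_telemetry.dev.lightfantastic.gameEnded topic for the lightfantastic namespace and leave all other topics out of the destination warehouse.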

Security

S3 bucket policy

To enhance security compliance, you should implement an appropriate S3 bucket policy for the destination bucket where data will be stored. The bucket policy should restrict access to authorized users and systems only.

For detailed instructions on implementing the S3 bucket policy, please refer to our S3 Bucket Policy Configuration.

Redshift IAM role authentication

Redshift IAM role authentication provides a secure way to establish connections between the Connector and your Amazon Redshift cluster using AWS IAM roles. This approach allows you to grant the Connector temporary access to execute specific actions on your Redshift resources.

For detailed instructions on implementing the Redshift IAM role authentication, please refer to our Redshift IAM Role Authentication Configuration.

Snowflake key pair authentication & key pair rotation

When setting up Snowflake as a destination for your data warehouse with Connector, you have the option to use key pair authentication for enhanced security. This method involves the use of public and private key pairs to establish secure connections between Connector and your Snowflake environment. Additionally, it's crucial to consider regular key pair rotation as a best practice to maintain a high level of security.

Key pair authentication

Key pair authentication introduces an additional layer of security to your data streaming setup. In this method, an RSA key pair is generated: a public key and a private key.

The public key is registered with your Snowflake account, while the private key is securely stored within the Data Connector service.

When the Connector initiates a connection, it authenticates by proving possession of the private key, and Snowflake verifies this against the registered public key.

Key pair rotation

For robust security maintenance, implementing a regular key pair rotation strategy is vital. This involves generating new public and private key pairs at scheduled intervals. By periodically updating the keys used for authentication, you substantially reduce the risk of unauthorized access in case any keys are compromised.
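
As an illustration of what registration and rotation can look like on the Snowflake side (the user name and key values below are placeholders; Snowflake's RSA_PUBLIC_KEY and RSA_PUBLIC_KEY_2 user properties allow keys to be rotated without downtime):

-- Register the public key for the user the Connector authenticates as
ALTER USER connector_user SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';

-- Rotate: register the new key in the second slot, switch the Connector
-- to the new private key, then remove the old key
ALTER USER connector_user SET RSA_PUBLIC_KEY_2 = 'MIIBIjANBgkqh...';
ALTER USER connector_user UNSET RSA_PUBLIC_KEY;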

Limitations

Editing the Connector configuration

Once the Connector configuration is successfully created, the event type (source) and data warehouse (destination) configurations cannot be edited. It's important to ensure correct configurations during the initial setup.

Pausing the Connector

The events are temporarily stored in Kafka for one to seven days, depending on the Kafka cluster configuration. If the Connector is paused for longer than the data retention policy, there is a risk of data loss.

Data Duplication

The connector service prioritizes data delivery, which may occasionally lead to duplicate events in your storage system (S3, Snowflake, or Redshift).

Each event is identified by a unique id field, and duplicate events will have the same id. These duplicates may occur due to network issues or system restarts.

To maintain data integrity, it's important to handle these duplicates on your end.
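
One common way to handle duplicates, sketched below for the flattened mapping-model example above (table and column names are illustrative), is to keep a single row per event id when building downstream tables:

-- Keep one row per event id, preferring the earliest record
CREATE TABLE lightfantastic_game_ended_dedup AS
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY eventid ORDER BY eventtimestamp) AS rn
  FROM lightfantastic.gameEnded
) AS t
WHERE rn = 1;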

Best practices

The Data Connector service is best used when you want to seamlessly stream data from AccelByte services or game telemetry services to data warehouses such as Amazon S3, Snowflake, and Amazon Redshift.

This is especially beneficial for consolidating and analyzing data for reporting, analytics, and business intelligence purposes.