Guide to Transferring Data from AWS S3 to Snowflake

In today’s data-driven landscape, seamless integration between platforms like AWS S3 and Snowflake is essential for efficient data storage, management, and analysis. AWS S3 serves as a reliable storage solution, while Snowflake is widely used for data processing and analytics. This guide will walk you through the steps for transferring data from AWS S3 to Snowflake, covering the setup of IAM policies, storage integrations, stages, tables, and Snowflake pipes for automated data ingestion.

Step 1: Setting Up AWS IAM Policies

To grant the necessary permissions for accessing S3 resources, you’ll start by creating an AWS IAM policy. The following sample policy grants permission to read, write, and delete objects in your S3 bucket and to list its contents (role assumption is configured separately in the role’s trust policy, covered in Step 4):

JSON
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::your-s3-bucket-name/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::your-s3-bucket-name",
            "Condition": {
                "StringLike": {
                    "s3:prefix": "*"
                }
            }
        }
    ]
}

Step 2: Create and Assign IAM Role

Next, create a new IAM role in AWS and attach the policy above. This is the role Snowflake will assume to access your S3 bucket. Once created, copy the role’s ARN; you will use it to replace roleArnPlaceholder in the storage integration in Step 3.
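When creating the role, AWS requires a trust policy up front, but Snowflake’s IAM user ARN isn’t known until Step 4. A common approach is to start with a temporary placeholder trust policy and overwrite it later. A minimal sketch, assuming your own AWS account ID as the interim principal and 0000 as a dummy External ID (both placeholders are replaced in Step 4):

JSON
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::your-account-id:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "0000"
                }
            }
        }
    ]
}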

Step 3: Create a Storage Integration in Snowflake

Storage integrations let Snowflake authenticate to external locations like S3 through the IAM role, without storing AWS credentials in Snowflake itself. Here’s how to create a storage integration in Snowflake:

SQL
CREATE OR REPLACE STORAGE INTEGRATION AwsData_Integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'roleArnPlaceholder'
STORAGE_ALLOWED_LOCATIONS = ('s3://your-s3-bucket-location');

Remember to replace roleArnPlaceholder with the ARN of the IAM role created in the previous step.
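Storage integrations are account-level objects, typically created by ACCOUNTADMIN. If a different role will create stages against this integration, grant it usage first; a minimal sketch, assuming a SYSADMIN role:

SQL
GRANT USAGE ON INTEGRATION AwsData_Integration TO ROLE SYSADMIN;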

Step 4: Establish Trust Using an External ID in AWS

To secure the Snowflake-to-S3 connection, configure an External ID in the AWS IAM role’s trust relationship. Retrieve this ID by describing the integration in Snowflake:

SQL
DESC INTEGRATION AwsData_Integration;

In the output, note the values of the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID properties, then add them to the IAM role’s trust policy in AWS as follows:

JSON
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "your-aws-user-arn"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "externalIdValue"
                }
            }
        }
    ]
}

Replace your-aws-user-arn with the STORAGE_AWS_IAM_USER_ARN value and externalIdValue with the STORAGE_AWS_EXTERNAL_ID value from the DESC INTEGRATION output.

Step 5: Configure a Stage in Snowflake

Stages in Snowflake serve as intermediary storage locations for data loading from external sources. Configure a stage using the storage integration created earlier, specifying the URL of your S3 bucket and file format:

SQL
CREATE OR REPLACE STAGE AwsData_Stage
STORAGE_INTEGRATION = AwsData_Integration
URL = 's3://your-s3-bucket-location'
FILE_FORMAT = Database.Schema.FileFormat;
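The stage references a named file format, which must exist before you run the statement above. A minimal sketch, assuming the staged files are JSON (consistent with the VARIANT-style parsing in Step 7); Database.Schema.FileFormat is a placeholder name:

SQL
CREATE OR REPLACE FILE FORMAT Database.Schema.FileFormat
TYPE = JSON;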

Grant the necessary usage permissions on this stage:

SQL
GRANT USAGE ON STAGE AwsData_Stage TO ROLE ACCOUNTADMIN;
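Before wiring up automated ingestion, you can sanity-check the integration by querying the staged files directly; this is an optional verification step:

SQL
SELECT $1 FROM @AwsData_Stage LIMIT 10;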

Step 6: Create a Table in Snowflake

Now, define a Snowflake table to store the data transferred from S3. Structure the table to match the schema of your incoming data:

SQL
CREATE OR REPLACE TABLE TableForSnowflake (
    Id INTEGER NOT NULL PRIMARY KEY,
    Data1 BIGINT NOT NULL,
    Data2 VARCHAR(255) NOT NULL,
    Data3 VARIANT NOT NULL
);

Step 7: Set Up Data Pipes for Continuous Data Ingestion

To enable continuous data ingestion from the S3 stage, set up a Snowflake pipe with AUTO_INGEST enabled. The pipe automatically copies new files from the stage into your table as they land in the bucket.

SQL
CREATE OR REPLACE PIPE Database.Schema.DataPipe AUTO_INGEST = TRUE AS
COPY INTO TableForSnowflake (
    Id,
    Data1,
    Data2,
    Data3    
)
FROM (
    SELECT
        $1:Id::INTEGER AS Id,
        $1:Data1::BIGINT AS Data1,
        $1:Data2::VARCHAR(255) AS Data2,
        $1:Data3::VARIANT AS Data3
    FROM @AwsData_Stage
);
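AUTO_INGEST works by listening for S3 event notifications. After creating the pipe, run SHOW PIPES and copy the SQS queue ARN from the notification_channel column, then configure an event notification on your S3 bucket that sends s3:ObjectCreated events to that queue. You can also backfill files that were already staged and check the pipe’s state:

SQL
SHOW PIPES;

-- Load files that landed before the pipe existed
ALTER PIPE Database.Schema.DataPipe REFRESH;

-- Inspect the pipe's execution state
SELECT SYSTEM$PIPE_STATUS('Database.Schema.DataPipe');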

Summary

Following these steps sets up a secure, automated data pipeline from AWS S3 to Snowflake. By combining IAM policies, a storage integration, a stage, a target table, and a pipe, you can streamline data transfers and keep your Snowflake tables continuously up to date as new files arrive in S3, supporting near-real-time analysis across both platforms.
