In today’s data-driven landscape, seamless integration between platforms like AWS S3 and Snowflake is essential for efficient data storage, management, and analysis. AWS S3 serves as a reliable storage solution, while Snowflake is widely used for data processing and analytics. This guide will walk you through the steps for transferring data from AWS S3 to Snowflake, covering the setup of IAM policies, storage integrations, stages, tables, and Snowflake pipes for automated data ingestion.
Step 1: Setting Up AWS IAM Policies
To grant Snowflake the permissions it needs on your S3 bucket, start by creating an AWS IAM policy. The following policy allows reading, writing, listing, and deleting objects in your bucket (replace your-s3-bucket-name throughout):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::your-s3-bucket-name/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::your-s3-bucket-name",
      "Condition": {
        "StringLike": {
          "s3:prefix": ["*"]
        }
      }
    }
  ]
}
Note that the object-level actions apply to the objects in the bucket (the /* resource), while s3:ListBucket and s3:GetBucketLocation apply to the bucket itself. No sts:AssumeRole permission belongs in this policy; role trust is configured separately in Step 4.
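If you prefer generating the policy rather than hand-editing JSON, here is a small Python sketch; the bucket name passed in is a placeholder. It scopes object-level actions to the bucket's objects and the list actions to the bucket itself:

```python
import json

def make_snowflake_s3_policy(bucket: str) -> dict:
    """Render the IAM policy above for a given bucket name.

    Object-level actions are scoped to the bucket's objects
    (the /* resource); listing applies to the bucket itself.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                ],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": ["*"]}},
            },
        ],
    }

# "your-s3-bucket-name" is a placeholder; substitute your own bucket.
print(json.dumps(make_snowflake_s3_policy("your-s3-bucket-name"), indent=2))
```

The printed JSON can be pasted directly into the policy editor in the AWS console.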
Step 2: Create and Assign IAM Role
Next, create a new IAM role in AWS and attach the policy above to it. This is the role Snowflake will assume to access your S3 bucket. Once the role is created, copy its ARN; you will substitute it for roleArnPlaceholder when creating the storage integration in the next step.
Step 3: Create a Storage Integration in Snowflake
Storage integrations in Snowflake allow for smooth connectivity to external data sources like S3. Here’s how to create a storage integration in Snowflake:
CREATE OR REPLACE STORAGE INTEGRATION AwsData_Integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'roleArnPlaceholder'
STORAGE_ALLOWED_LOCATIONS = ('s3://your-s3-bucket-location');
Remember to replace roleArnPlaceholder with the ARN of the IAM role created in the previous step.
Step 4: Establish Trust Using an External ID in AWS
To secure the Snowflake-to-S3 connection, configure the IAM role's trust relationship in AWS with the External ID that Snowflake generates. Retrieve it, along with the IAM user ARN that Snowflake uses, by describing the integration:
DESC INTEGRATION AwsData_Integration;
Then, add the External ID to the trust policy in AWS as follows:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "snowflake-iam-user-arn"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "externalIdValue"
        }
      }
    }
  ]
}
Replace snowflake-iam-user-arn and externalIdValue with the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values returned by the DESC INTEGRATION command.
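As with the permissions policy, the trust document can be templated in Python. The ARN and External ID below are made-up placeholders; replace them with the values returned by DESC INTEGRATION in Snowflake:

```python
import json

# Hypothetical values; copy the real ones from the
# STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID rows of the
# DESC INTEGRATION output in Snowflake.
SNOWFLAKE_USER_ARN = "arn:aws:iam::123456789012:user/example-snowflake-user"
EXTERNAL_ID = "AB12345_SFCRole=2_ExampleExternalId="

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": SNOWFLAKE_USER_ARN},
            "Action": "sts:AssumeRole",
            # The External ID check prevents other Snowflake accounts
            # from assuming this role (the "confused deputy" problem).
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

# Paste this JSON into the role's "Trust relationships" tab in AWS.
print(json.dumps(trust_policy, indent=2))
```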
Step 5: Configure a Stage in Snowflake
Stages in Snowflake serve as intermediary storage locations for data loading from external sources. Configure a stage using the storage integration created earlier, specifying the URL of your S3 bucket and file format:
CREATE OR REPLACE STAGE AwsData_Stage
STORAGE_INTEGRATION = AwsData_Integration
URL = 's3://your-s3-bucket-location'
FILE_FORMAT = (FORMAT_NAME = 'Database.Schema.FileFormat');
Grant usage on this stage to the role that will load data (shown here with ACCOUNTADMIN; substitute your own role):
GRANT USAGE ON STAGE AwsData_Stage TO ROLE ACCOUNTADMIN;
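The stage definition references a named file format (Database.Schema.FileFormat) that must already exist. Assuming the incoming files are JSON, a minimal file format might look like this; adjust the type and options to match your data:

```sql
CREATE OR REPLACE FILE FORMAT Database.Schema.FileFormat
  TYPE = JSON
  STRIP_OUTER_ARRAY = TRUE;
```

Once the stage exists, running LIST @AwsData_Stage; is a quick way to confirm that Snowflake can see the files in your bucket.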
Step 6: Create a Table in Snowflake
Now, define a Snowflake table to store the data transferred from S3. Structure the table to match the schema of your incoming data:
CREATE OR REPLACE TABLE TableForSnowflake (
Id INTEGER NOT NULL PRIMARY KEY,
Data1 BIGINT NOT NULL,
Data2 VARCHAR(255) NOT NULL,
Data3 VARIANT NOT NULL
);
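For reference, here is a sketch of sample source data matching this schema, written as newline-delimited JSON; the field names mirror the table columns, and the values are invented:

```python
import json

# Hypothetical sample rows matching the TableForSnowflake columns.
rows = [
    {"Id": 1, "Data1": 1000, "Data2": "first record", "Data3": {"nested": True}},
    {"Id": 2, "Data1": 2000, "Data2": "second record", "Data3": ["a", "b"]},
]

# Write one JSON object per line (NDJSON); this is the shape the
# pipe's $1:<field>::<type> projections in Step 7 expect per record.
with open("sample_data.ndjson", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```

Uploading a file like this to your bucket is a simple way to test the pipeline end to end.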
Step 7: Set Up Data Pipes for Continuous Data Ingestion
To enable continuous data ingestion from the S3 stage, set up a Snowflake pipe. This configuration allows data to flow from the stage into your table automatically.
CREATE OR REPLACE PIPE Database.Schema.DataPipe AUTO_INGEST = TRUE AS
COPY INTO TableForSnowflake (
Id,
Data1,
Data2,
Data3
)
FROM (
SELECT
$1:Id::INTEGER AS Id,
$1:Data1::BIGINT AS Data1,
$1:Data2::VARCHAR(255) AS Data2,
$1:Data3::VARIANT AS Data3
FROM @AwsData_Stage
);
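Note that AUTO_INGEST relies on S3 event notifications: after creating the pipe, run SHOW PIPES and copy the notification_channel value (an SQS queue ARN), then configure your bucket to send s3:ObjectCreated events to that queue. You can then check that the pipe is running:

```sql
SHOW PIPES;
SELECT SYSTEM$PIPE_STATUS('Database.Schema.DataPipe');
```

SYSTEM$PIPE_STATUS returns a JSON document whose executionState field should read RUNNING once the pipe is healthy.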
Summary
Following these steps gives you a secure, automated data pipeline from AWS S3 to Snowflake. With IAM policies, a storage integration, a stage, a target table, and a pipe in place, new files landing in S3 flow into Snowflake automatically, streamlining data transfers and supporting near-real-time insights for data-driven decision-making.