Snowflake & AWS Secure Integration Framework via Integration Objects

Shreyas MS
Jul 9, 2023 · 9 min read


Snowflake, as we all know, is a data warehouse built for the cloud. It allows users to read data from and write data to cloud storage, so secure integration between the two platforms is essential. On the AWS side, we should always apply least-privilege permissions and follow security best practices.

Users need programmatic access if they want to interact with AWS outside of the AWS Management Console. Programmatic access normally requires credentials such as AWS access keys (an access key ID and a secret access key), and hardcoding those keys in a program to establish the connection is not a good practice.

In this blog we will discuss how to create a secure framework that allows Snowflake to integrate seamlessly and securely with AWS.

To ingest data stored in AWS S3 into Snowflake, you can use either direct COPY INTO statements or Snowpipe for automation. Both options require a Snowflake object called an external stage to connect Snowflake to AWS.

This is where the “Storage Integration” object in Snowflake comes into the picture, and Snowflake recommends using it for authentication instead of AWS secret keys. Some key advantages of using it are:

  • A single storage integration can support multiple external stages.
  • Easy, available, and fast connections between data stores.
  • Secure connections without the need for AWS credentials (secret keys).

Pre-requisites

This article assumes that the following objects are already configured:

An AWS and a Snowflake account.

A Snowflake user with the ACCOUNTADMIN role, or another role that has been granted the CREATE INTEGRATION privilege.

Let's set up the environment to achieve the integration between Snowflake and AWS by configuring the required objects and resources in both Snowflake and AWS.

The steps needed to configure this:
1. Create the IAM Policy in AWS.
2. Create the IAM Role in AWS.
3. Create the S3 bucket in AWS & Update the AWS IAM Policy.
4. Create a Storage Integration object on Snowflake.
5. Enable the Trust Relationship in AWS IAM Role.

  1. Create the IAM Policy in AWS

The following step-by-step instructions describe how to configure access permissions for Snowflake in your AWS Management Console using an IAM policy.
Log into the AWS Management Console, navigate to Identity & Access Management (IAM), choose Policies from the left-hand navigation pane and click on the Create Policy button.

— Navigate to the JSON tab and enter the JSON code below in the editor window, then click Next, and Next again, to move to the review window.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<bucket>",
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "<prefix>/*"
          ]
        }
      }
    }
  ]
}

— In the review window, provide a meaningful policy name and click Create Policy.

So we have completed Step 1 and will move on to Step 2, i.e. creating the IAM role.

2. Create the IAM Role in AWS

In this step we create an IAM role with the trusted entity type set to “AWS account”; this role will later be used to connect with Snowflake.
— In the AWS Console, navigate to Identity & Access Management (IAM), choose Roles from the left-hand navigation pane, then click the Create role button.
— Select AWS account as the trusted entity type.
— Select This account (****) as the AWS account.
— Select the Require external ID option and enter a dummy ID such as “12345”. Later, we will modify the trust relationship and specify the external ID for the Snowflake stage. An external ID is required to grant a third party (i.e. Snowflake) access to your AWS resources (i.e. S3).

— Click Next and locate the policy (in my case “Snowflake-Access-Policy”) which we created in Step 1.

— Click Next to reach the review window, provide a role name, and create the role.
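For reference, when you choose AWS account as the trusted entity and require an external ID, the console generates an initial trust policy roughly like the sketch below; the account ID placeholder is your own account and the external ID is still the dummy value, both of which we will revisit in Step 5.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<your_account_id>:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "12345"
        }
      }
    }
  ]
}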

So we have completed Steps 1 and 2; now we will move on to Step 3, i.e. creating an S3 bucket in AWS.

3. Create the S3 Bucket in AWS & Update the AWS IAM Policy

In this step we create a standard bucket using the AWS S3 service.
— In the AWS console, navigate to the S3 window and click Create bucket.
— Below is a snapshot of the S3 console where we create the bucket. For our use case I have kept all the default settings except the AWS Region, which I set to “US East (N. Virginia) us-east-1” because my Snowflake account is hosted in that region.

— Once the S3 bucket is created, I uploaded sample CSV files containing sample data. This can be any data we want to load into Snowflake database tables; in my case the CSV files contain data in tabular format.
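Just for illustration, a hypothetical REGIONS.csv could look like the snippet below: a header row followed by comma-separated records, which is the layout the file format object later in this post expects. Your own files can of course differ.

REGION_ID,REGION_NAME
1,Europe
2,Americas
3,Asia
4,Middle East and Africa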

— After creating the AWS S3 bucket, we have to update the IAM policy JSON with the S3 bucket details, so that Snowflake can access only that specific bucket and perform controlled operations on it. We discussed this in Step 1 while creating the IAM policy.
— Log into the AWS Management Console → navigate to Identity & Access Management (IAM) → choose Policies from the left-hand navigation pane → locate the policy created in Step 1 and click on it.
— In the Permissions tab, click the Edit Policy button and modify the JSON code (highlighted in yellow) by providing the actual AWS S3 bucket name we created earlier (in my case it is “snowflake-s3-bucket-5710”).
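As a rough sketch of that edit (not the exact policy from the screenshot), the two Resource placeholders would be replaced with the real bucket name, and <prefix> with the folder path inside the bucket, or a wildcard if the whole bucket should be accessible:

"Resource": "arn:aws:s3:::snowflake-s3-bucket-5710/*"     (first statement)
"Resource": "arn:aws:s3:::snowflake-s3-bucket-5710"       (second statement)
"s3:prefix": ["*"]                                        (condition, if no folder prefix is used)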

— So we have successfully created the bucket and uploaded the sample files in Step 3. Now we will move on to Step 4, creating a storage integration object in Snowflake.

4. Create a Storage Integration Object on Snowflake

A storage integration is a Snowflake object that stores a generated Identity and Access Management (IAM) entity for your external cloud storage. It allows users to avoid supplying credentials when creating stages or when loading or unloading data.
— Below is the syntax for the storage integration object:

CREATE STORAGE INTEGRATION <integration_name>
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = '<iam_role>'
STORAGE_ALLOWED_LOCATIONS = ('s3://<bucket>/<path>/', 's3://<bucket>/<path>/') ;

Before creating the storage integration object, we need to obtain some information to plug into it, namely the STORAGE_AWS_ROLE_ARN and STORAGE_ALLOWED_LOCATIONS values. Both can be obtained from the AWS console.
— Log into the AWS Management Console → navigate to Identity & Access Management (IAM) → choose Roles from the left-hand navigation pane → locate the role created in Step 2 and click on it.
— In the IAM role window we can find the role ARN (highlighted in yellow); copy it and use it as the STORAGE_AWS_ROLE_ARN value in the storage integration syntax.

Similarly, get the AWS S3 bucket name:
— In the AWS Management Console → navigate to the S3 window → choose Buckets from the left-hand navigation pane → click on the S3 bucket created in Step 3 and use its name for the STORAGE_ALLOWED_LOCATIONS value in the storage integration syntax.

After mapping these values to the storage integration parameters, the statement looks something like the example below.
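As a sketch with my values plugged in (the account ID and role name below are placeholders for the role ARN copied from IAM; your values will differ), the statement could look like this:

CREATE STORAGE INTEGRATION aws_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
-- placeholder role ARN; use the ARN copied from the IAM role created in Step 2
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/Snowflake-Access-Role'
-- the bucket created in Step 3
STORAGE_ALLOWED_LOCATIONS = ('s3://snowflake-s3-bucket-5710/');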

So we are done with creating the storage integration object. After this, we need to enable the trust relationship in the AWS IAM role to establish the connection.

5. Enable Trust Relationship in AWS IAM Role

Run the command below in Snowflake to retrieve the ARN of the AWS IAM user that was created automatically for your Snowflake account, along with the external ID.

 desc integration aws_int;
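Among the properties returned, the two we need are shown below; the values here are placeholders and yours will look different:

STORAGE_AWS_IAM_USER_ARN = arn:aws:iam::098765432109:user/xxxx-s-xxxx   (Snowflake-generated IAM user, placeholder)
STORAGE_AWS_EXTERNAL_ID  = ABC12345_SFCRole=2_xxxxxxxxxxxxxxxx=         (external ID, placeholder)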

Now we will map this information (highlighted in yellow) into the AWS IAM role which we created in Step 2.
— Log into the AWS Management Console → navigate to Identity & Access Management (IAM) window.
— Choose Roles from the left-hand navigation pane → Locate the Role which was created in the Step 2 and click on it.
— Navigate to Trust relationships tab. → Click the Edit trust policy button.

Change the values as described below and update the role's trust policy. These are the values we copied from Snowflake in the step above.

AWS → STORAGE_AWS_IAM_USER_ARN
ExternalId → STORAGE_AWS_EXTERNAL_ID

Note: While creating the role in Step 2 we provided our own account ID as the AWS account and a dummy value as the external ID. Now we edit those values and provide the Snowflake IAM user ARN and external ID respectively, so that the connection between Snowflake and AWS can be established.

At the end, the trust policy looks something like the sketch below.
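A minimal sketch of the final trust policy, using placeholder values for the Snowflake IAM user ARN and external ID (substitute the actual values from the DESC INTEGRATION output):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::098765432109:user/xxxx-s-xxxx"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "ABC12345_SFCRole=2_xxxxxxxxxxxxxxxx="
        }
      }
    }
  ]
}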

After completing Steps 1 to 5 we are all set, and we can start using the storage integration object to load and unload data between Snowflake and AWS S3. In the next section we will walk through an example: creating Snowflake tables and loading data into them using the storage integration we created.

Load the data from the files available in the S3 bucket (external stage).

  1. Create the file format object in Snowflake as shown below. Remember, I uploaded tabular data in the form of CSV files to the S3 bucket in Step 3, so I have chosen a CSV format here.
CREATE OR REPLACE file format AWS_S3_CSV_Fileformat
type = csv
field_delimiter = ','
skip_header = 1
null_if = ('NULL','null')
empty_field_as_null = TRUE
FIELD_OPTIONALLY_ENCLOSED_BY='"';

2. Create the named external stage using the storage integration object and the AWS S3 bucket which we created earlier.

CREATE or replace STAGE my_s3_stage_retail
STORAGE_INTEGRATION = aws_int
URL = 's3://snowflake-s3-bucket-5710/'
FILE_FORMAT = AWS_S3_CSV_Fileformat;

3. After creating the named external stage, we can run the script below to list all the CSV files available in the AWS S3 bucket.

list @my_s3_stage_retail;

Note: If the policy configuration or the trust relationship with the AWS role was not set up properly, we will end up with an access denied error (shown below). If you are getting this error, please validate those steps once again.

4. Once we are able to list all the files in AWS S3 without any error, use the commands below to load the data from the S3 files into standard Snowflake tables.

Note: Before loading the data into the Snowflake tables, we need to have the table structures created in Snowflake. The code to create the tables is available in this GitHub link.
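As an illustration only (the actual DDL is in the GitHub link, and your column definitions may differ), the target tables could be defined roughly like this:

-- hypothetical structures for the sample CSV data; adjust to match your files
CREATE OR REPLACE TABLE REGIONS (
REGION_ID NUMBER,
REGION_NAME VARCHAR(50)
);

CREATE OR REPLACE TABLE COUNTRIES (
COUNTRY_ID VARCHAR(2),
COUNTRY_NAME VARCHAR(60),
REGION_ID NUMBER
);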

COPY INTO REGIONS FROM @my_s3_stage_retail pattern ='REGIONS.*' FORCE = TRUE;
COPY INTO COUNTRIES FROM @my_s3_stage_retail pattern ='COUNTRIES.*' FORCE = TRUE;
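A quick sanity check after the load; the row counts will depend on your files:

SELECT COUNT(*) FROM REGIONS;
SELECT COUNT(*) FROM COUNTRIES;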

We have successfully loaded the data into the Snowflake tables from the CSV files available in the AWS S3 bucket (external stage).

To Summarize:

When we are dealing with cloud-based platforms, security is the most important aspect and we should adhere to security best practices. By using the Snowflake storage integration object, we don't have to embed or share secret keys to establish connections to external cloud storage.

Thank you for reading. I hope this blog helps you understand the Snowflake integration process with AWS S3.

All the scripts used in this blog are available at this GitHub Link.

