Snowflake - External Data Loading


Snowflake supports cloud storage on the client side as well. This means clients can keep data in their own clouds and load it into Snowflake by referring to its location. At present, Snowflake supports three clouds – AWS S3, Microsoft Azure, and Google Cloud Platform storage locations. These are known as External Stages. Snowflake also provides Snowflake-managed stages, which are known as Internal Stages.

External stages are client-side cloud locations, whereas internal stages are used when the user works with files from their local system directory.

To load data from an external cloud, the following setup is required −

An existing database and table in Snowflake to load the data into.
An external stage created in Snowflake that points to the cloud location.
A file format that describes the structure of the files to be loaded.

In this chapter, we will discuss how to set up these requirements and load the data into tables.
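Before loading, the target database and table must already exist. A minimal sketch is shown below; the TEST_USER table and its columns are only assumptions used for illustration −

CREATE DATABASE IF NOT EXISTS TEST_DB;
CREATE SCHEMA IF NOT EXISTS TEST_DB.TEST_SCHEMA_1;

-- Hypothetical target table; adjust the columns to match the file being loaded
CREATE TABLE IF NOT EXISTS TEST_DB.TEST_SCHEMA_1.TEST_USER (
   ID NUMBER,
   NAME VARCHAR,
   EMAIL VARCHAR
);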

Using UI

To create an external stage, follow the instructions shown below −

Login into Snowflake. Click Databases present at the top ribbon. In the database view, click the database name TEST_DB. Now, click the Stages tab and then click the Create button present at the top, as shown in the following screenshot −

External Storage

It pops up the Create Stage dialog box. Select amazon|s3 in the list and click the Next button as shown below −

Create Stage Dialog Box

It will go to the next screen where the user should enter the following details −

Stage Name and Schema Name
URL of the S3 bucket location
AWS Key ID and AWS Secret Key
Encryption Master Key, if the files are encrypted

After providing these details, click the Finish button. The following screenshot describes the above steps −

Create Stage

The user can see the newly created external stage in the view panel.

Using SQL

Creating the external stage using SQL is very easy. Just run the following query, providing details such as the stage name, URL, AWS key ID, AWS secret key, and master key; it will create the stage.

CREATE STAGE "TEST_DB"."TEST_SCHEMA_1".Ext_S3_stage URL = 's3://***/*****'
CREDENTIALS = (AWS_KEY_ID = '*********' AWS_SECRET_KEY = '********') ENCRYPTION = (MASTER_KEY = '******');
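To confirm that the stage was created, the standard SHOW and DESC commands can be used; the names below match the stage created above −

SHOW STAGES IN SCHEMA "TEST_DB"."TEST_SCHEMA_1";
DESC STAGE "TEST_DB"."TEST_SCHEMA_1".Ext_S3_stage;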

A file format defines the structure of the file uploaded into S3. If the file structure doesn't match the table structure, the load will fail.

Using UI

To create a file format, follow the instructions given below −

Login into Snowflake. Click Databases present at the top ribbon. In the database view, click the database name TEST_DB. Now, click the File Format tab and then click the Create button present at the top. It will pop up the Create File Format dialog box. Enter the following details −

Name and Schema Name of the file format
Format Type (for example, CSV)
Column separator and Row separator
Number of header lines to skip, if the file contains a header row

Other options can be left as they are. Click the Finish button after entering the details. The following screenshot displays the above details −

File Format Process

The user will be able to see the created file format in the view panel.

Using SQL

Creating the file format using SQL is very easy. Just run the following query, providing all the details as shown below.

CREATE FILE FORMAT "TEST_DB"."TEST_SCHEMA_1".ext_csv TYPE = 'CSV' COMPRESSION = 'AUTO' 
FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n' SKIP_HEADER = 0 FIELD_OPTIONALLY_ENCLOSED_BY = 
'NONE' TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = 'NONE' 
ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
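Similarly, the created file format can be verified with the standard SHOW and DESC commands −

SHOW FILE FORMATS IN SCHEMA "TEST_DB"."TEST_SCHEMA_1";
DESC FILE FORMAT "TEST_DB"."TEST_SCHEMA_1".ext_csv;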

Load data from S3

So far, we have set up all the required parameters like the stage, file format, and database. Now the data can be loaded from S3 into the table.

The user can run the following query to see all the files present in the given stage −

LS @<external_stage_name>
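For example, with the stage created earlier −

LS @TEST_DB.TEST_SCHEMA_1.Ext_S3_stage;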

Now, to load the data, run the following query −

Syntax

COPY INTO <database_name>.<schema_name>.<table_name>
FROM @<database_name>.<schema_name>.<ext_stage_name>
FILES=('<file_name>')
FILE_FORMAT=(FORMAT_NAME=<database_name>.<schema_name>.<file_format_name>);

Example

COPY INTO test_db.test_schema_1.TEST_USER
FROM @test_db.test_schema_1.EXT_STAGE
FILES=('data.csv')
FILE_FORMAT=(FORMAT_NAME=test_db.test_schema_1.CSV);
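Before the actual load, the user can optionally check the file against the table without loading it, using the standard VALIDATION_MODE option of COPY INTO. The sketch below reuses the names from the example above −

COPY INTO test_db.test_schema_1.TEST_USER
FROM @test_db.test_schema_1.EXT_STAGE
FILES=('data.csv')
FILE_FORMAT=(FORMAT_NAME=test_db.test_schema_1.CSV)
VALIDATION_MODE = RETURN_ERRORS;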

After running the above query, the user can verify the data in the table by running the following simple query −

SELECT COUNT(*) FROM test_db.test_schema_1.TEST_USER;

If the user wants to load all the files present in the external stage, there is no need to pass "FILES=('<file_name>')", as shown in the sketch below.
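The optional PATTERN clause, a standard COPY option, can additionally restrict the load to files whose names match a regular expression −

COPY INTO test_db.test_schema_1.TEST_USER
FROM @test_db.test_schema_1.EXT_STAGE
PATTERN='.*[.]csv'
FILE_FORMAT=(FORMAT_NAME=test_db.test_schema_1.CSV);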