Load Data from Amazon S3 using COPY

tip

Expected time: 5 minutes ⏱

Databend's COPY command can read object files (CSV or Parquet format) from Amazon S3 buckets. To understand Amazon S3's core concepts, please read the Amazon S3 documentation.

Before you begin

  • AWS Account: This Quickstart uses Amazon S3 and requires an AWS account's access key ID and secret access key.
  • Databend: You will connect to the database and use COPY to pull data from your Amazon S3 bucket. Please see How to deploy Databend.

Part 1: Creating an Amazon S3 Bucket and Adding a File

  1. On your local machine, create a text file with the following CSV contents and name it books.csv:
books.csv
Transaction Processing,Jim Gray,1992
Readings in Database Systems,Michael Stonebraker,2004

In this CSV file, the field delimiter is , and the record delimiter is \n.

  2. In S3, create a bucket and upload books.csv to the bucket.

Part 2: Creating a Database and Table, and Loading Data using COPY

Connect to Databend (for example, with a MySQL-compatible client) and create the target database and table:

create database book_db;
use book_db;

create table books
(
    title VARCHAR(255),
    author VARCHAR(255),
    date VARCHAR(255)
);
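
Optionally, you can confirm the table definition before loading. A minimal check (assuming a MySQL-compatible client session, where DESC should work against Databend) is:

desc books;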

Now that the database and table have been created, you can load the data. In Part 1 of this Quickstart, you uploaded the books.csv file to your bucket. To load it with COPY, you will need the following information:

  • The S3 URI of the data path (s3://bucket/to/path/), such as: s3://databend-bohu/data/
  • Your AWS account’s access keys, such as:
    • Access Key ID: your-access-key-id
    • Secret Access Key: your-secret-access-key

Using this URI and keys, execute the following statement, replacing the placeholder values with your own:

copy into books
from 's3://databend-bohu/data/'
credentials=(aws_key_id='<your-access-key-id>' aws_secret_key='<your-secret-access-key>')
pattern ='.*[.]csv'
file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 0);

Now, let's query the table to make sure the data has actually loaded:

mysql> select * from books;
+------------------------------+----------------------+-------+
| title                        | author               | date  |
+------------------------------+----------------------+-------+
| Transaction Processing       | Jim Gray             | 1992  |
| Readings in Database Systems | Michael Stonebraker  | 2004  |
+------------------------------+----------------------+-------+
tip

If the file(s) are large and you want to check whether the file format parses correctly, you can use the SIZE_LIMIT option:


copy into books
from 's3://databend-bohu/data/'
credentials=(aws_key_id='<your-access-key-id>' aws_secret_key='<your-secret-access-key>')
pattern ='.*[.]csv'
file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 0)
size_limit = 1; -- only load 1 row
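
Once this test load succeeds, keep in mind that the single test row is already in the table. A minimal way to start over before the full load (a sketch; it assumes you prefer an empty table to deduplicating later, and that TRUNCATE TABLE is available in your Databend version) is:

-- Remove the row loaded during the SIZE_LIMIT test
truncate table books;

Then re-run the COPY statement above without the size_limit option to load all rows.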