This is Snowflake specific; You can create a table definition directly from the file you want to ingest by inferring the file structure then building a template to create the table:
CREATE OR REPLACE TABLE RAW.SAMPLE_TABLE USING TEMPLATE
(
SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
FROM TABLE(
INFER_SCHEMA(
LOCATION => @SANDBOX_STAGE/sample_data.parquet,
FILE_FORMAT => 'SAMPLE_PARQUET_FORMAT'
)
)
);
Looks simple enough but lets take a closer look at what we are actually doing. A few keywords might stand out to you: USING TEMPLATE
, ARRAY_AGG, OBJECT_CONSTRUCT
, and INFER_SCHEMA. Lets see what each of those do and how they can work together.
CREATE OR REPLACE TABLE RAW.SAMPLE_TABLE(
id int
name varchar
region varchar
)
What does INFER_SCHEMA
do? It does what it sounds like, it detect the file schema for semi-structured data. The file needs to be housed in staged Snowflake area and the appropriate file format for the file needs to be defined
-- Create a file format that sets the file type as Parquet.
CREATE FILE FORMAT my_parquet_format
TYPE = parquet;
-- Query the INFER_SCHEMA function.
SELECT *
FROM TABLE(
INFER_SCHEMA(
LOCATION=>'@mystage'
, FILE_FORMAT=>'my_parquet_format'
)
);
+-------------+---------+----------+---------------------+--------------------------+----------+
| COLUMN_NAME | TYPE | NULLABLE | EXPRESSION | FILENAMES | ORDER_ID |
|-------------+---------+----------+---------------------+--------------------------|----------+
| continent | TEXT | True | $1:continent::TEXT | geography/cities.parquet | 0 |
| country | VARIANT | True | $1:country::VARIANT | geography/cities.parquet | 1 |
| COUNTRY | VARIANT | True | $1:COUNTRY::VARIANT | geography/cities.parquet | 2 |
+-------------+---------+----------+---------------------+--------------------------+----------+
What about OBJECT_CONTRUCT
? Builds an object meaning whatever it read in it create object representation of that data.
SELECT OBJECT_CONSTRUCT('a',1,'b','BBBB', 'c',null);
+----------------------------------------------+
| OBJECT_CONSTRUCT('A',1,'B','BBBB', 'C',NULL) |
|----------------------------------------------|
| { |
| "a": 1, |
| "b": "BBBB" |
| } |
+----------------------------------------------+
And ARRAY_AGG
? Can combine a result set into an array
SELECT O_ORDERKEY AS order_keys
FROM orders
WHERE O_TOTALPRICE > 450000
ORDER BY O_ORDERKEY;
+------------+
| ORDER_KEYS |
|------------|
| 41445 |
| 55937 |
| 67781 |
| 80550 |
| 95808 |
| 101700 |
| 103136 |
+------------+
How does combining the work? This is where USING TEMPLATE
comes into play. This is special syntax to make use of the INFER_SCHEMA
function
CREATE OR REPLACE TABLE RAW.SAMPLE_TABLE USING TEMPLATE
(
SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
FROM TABLE(
INFER_SCHEMA(
LOCATION => @SANDBOX_STAGE/sample_data.parquet,
- [ ] FILE_FORMAT => 'SAMPLE_PARQUET_FORMAT'
)
)
);