Working with parquet queries
In Azure Synapse Analytics dedicated SQL pools and Analytics Platform System, PolyBase
can consume a maximum of 33,000 files per folder when running 32 concurrent PolyBase
queries. This maximum number includes both files and subfolders in each HDFS folder. If
the degree of concurrency is less than 32, a user can run PolyBase queries against folders
in HDFS that contain more than 33,000 files. We recommend that users of Hadoop and
PolyBase keep file paths short and use no more than 30,000 files per HDFS folder. When
too many files are referenced, a JVM out-of-memory exception occurs.
In serverless SQL pools, external tables can’t be created in a location where you currently
have data. To reuse a location that has been used to store data, the location must be
manually deleted on ADLS. For more limitations and best practices, see Filter optimization
best practices.
In Azure Synapse Analytics dedicated SQL pools and Analytics Platform System, when
CREATE EXTERNAL TABLE AS SELECT selects from an RCFile, the column values in the RCFile
must not contain the pipe (|) character.
SET ROWCOUNT (Transact-SQL) has no effect on CREATE EXTERNAL TABLE AS SELECT. To achieve a
similar behavior, use TOP (Transact-SQL).
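As a minimal sketch of limiting the exported rows with TOP, the statement below caps the result at 1,000 rows. The table names (ext.TopSales, dbo.Sales), the data source (MyAzureStorage), the file format (MyParquetFormat), and the LOCATION path are hypothetical placeholders, and the external data source and file format are assumed to exist already.

```sql
-- Hypothetical object names; replace them with your own.
-- Assumes MyAzureStorage and MyParquetFormat were created beforehand.
CREATE EXTERNAL TABLE ext.TopSales
WITH (
    LOCATION = '/export/top_sales/',   -- hypothetical, unused folder
    DATA_SOURCE = MyAzureStorage,
    FILE_FORMAT = MyParquetFormat
)
AS
-- TOP limits the exported rows; SET ROWCOUNT would be ignored here.
SELECT TOP (1000) *
FROM dbo.Sales;
```

Without an ORDER BY clause, which 1,000 rows are returned is not guaranteed.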
Review Naming and Referencing Containers, Blobs, and Metadata for limitations on file names.
The following characters present in data can cause errors, including rejected records, with
CREATE EXTERNAL TABLE AS SELECT to Parquet files.
In Azure Synapse Analytics and Analytics Platform System, this also applies to ORC files.
|
" (quotation mark character)
\r\n
\r
\n
To use CREATE EXTERNAL TABLE AS SELECT with data containing these characters, you must first
run the CREATE EXTERNAL TABLE AS SELECT statement to export the data to delimited text files,
which you can then convert to Parquet or ORC by using an external tool.
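Before exporting to Parquet or ORC, it can help to check whether a source column contains any of the problem characters (pipe, quotation mark, carriage return, line feed). The query below is one possible check, not a prescribed method. The table dbo.Customers and the column Notes are hypothetical names used only for illustration.

```sql
-- Hypothetical table and column; adapt to the data you plan to export.
-- Counts rows whose Notes value contains |, ", CR (CHAR(13)), or LF (CHAR(10)).
SELECT COUNT(*) AS rows_with_problem_chars
FROM dbo.Customers
WHERE CHARINDEX('|', Notes) > 0
   OR CHARINDEX('"', Notes) > 0
   OR CHARINDEX(CHAR(13), Notes) > 0
   OR CHARINDEX(CHAR(10), Notes) > 0;
```

A nonzero count suggests exporting to delimited text first and converting with an external tool, as described above. Checking for CHAR(13) and CHAR(10) separately also covers the \r\n sequence.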