● Partitioned tables - The most common type of partitioning is based on a TIMESTAMP
or DATE column. Data is written to a partition based on the date value in that column.
Queries can specify predicate filters on this partitioning column to reduce the
amount of data scanned.
❖ Use the date or timestamp column that appears most frequently in query
filters as the partition column.
❖ The partition column should also distribute data evenly across partitions. Make
sure it has enough distinct values that no single partition holds most of the data.
❖ Also note that the maximum number of partitions per partitioned table is 4,000.
❖ Legacy SQL is not supported for querying partitioned tables or for writing
query results to them.
● Sharded tables - You can also shard tables using a time-based naming
approach such as [PREFIX]_YYYYMMDD and use UNION ALL to combine the shards when selecting data.
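The sharded naming approach can be sketched as follows. This is a minimal illustration only: it builds the SQL text in Python, and the prefix `mydataset.events` and the helper names are hypothetical, not taken from any real project.

```python
from datetime import date, timedelta

# Hypothetical shard prefix, used only for illustration.
PREFIX = "mydataset.events"


def shard_name(day: date, prefix: str = PREFIX) -> str:
    """Return the table name for one day, following [PREFIX]_YYYYMMDD."""
    return f"{prefix}_{day.strftime('%Y%m%d')}"


def union_all_sql(start: date, end: date, prefix: str = PREFIX) -> str:
    """Build one SELECT per daily shard in start..end (inclusive),
    combined with UNION ALL. Returns an empty string if end < start."""
    days = (end - start).days + 1
    selects = [
        f"SELECT * FROM {shard_name(start + timedelta(days=i), prefix)}"
        for i in range(days)
    ]
    return "\nUNION ALL\n".join(selects)


if __name__ == "__main__":
    print(union_all_sql(date(2020, 1, 30), date(2020, 2, 1)))
```

Note that the query text grows with the number of days selected, which is one practical cost of sharding by date.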
Generally, partitioned tables perform better than tables sharded by date. However, if you
have a specific use case that requires multiple tables, you can use sharded tables.
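For comparison, the partitioned approach might look like the sketch below. It assumes a hypothetical table `mydataset.events` with a DATE column `event_date`, builds the standard SQL as Python strings, and includes a small check of a date range against the 4,000-partition limit mentioned earlier. All names here are illustrative assumptions.

```python
from datetime import date

# Hypothetical names, used only for illustration.
TABLE = "mydataset.events"
PARTITION_COLUMN = "event_date"
MAX_PARTITIONS = 4000  # per-table limit quoted in the text


def create_table_sql(table: str = TABLE, column: str = PARTITION_COLUMN) -> str:
    """Standard SQL DDL for a table partitioned on a DATE column."""
    return (
        f"CREATE TABLE {table} (event_id STRING, {column} DATE) "
        f"PARTITION BY {column}"
    )


def filtered_query_sql(start: date, end: date,
                       table: str = TABLE,
                       column: str = PARTITION_COLUMN) -> str:
    """Query with a predicate on the partition column, so only the
    partitions for start..end need to be scanned."""
    return (
        f"SELECT event_id FROM {table} "
        f"WHERE {column} BETWEEN DATE '{start.isoformat()}' "
        f"AND DATE '{end.isoformat()}'"
    )


def daily_partition_count(start: date, end: date) -> int:
    """Number of daily partitions needed to cover start..end inclusive."""
    return (end - start).days + 1


if __name__ == "__main__":
    print(create_table_sql())
    print(filtered_query_sql(date(2020, 1, 1), date(2020, 1, 7)))
    # Eleven full years of daily partitions exceed the quoted limit.
    print(daily_partition_count(date(2010, 1, 1), date(2020, 12, 31)) <= MAX_PARTITIONS)
```

Unlike the sharded form, the query stays the same size regardless of the date range; only the filter values change.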
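Ingestion-time partitioned tables can be tricky if you need to insert data again as part of a
bug fix, because re-inserted rows are assigned to a partition by load time rather than by their
original date unless you target the partition explicitly. You can read a detailed comparison here.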
Best Practices for High Performance ETL To BigQuery