场景介绍
在大数据导入场景下,使用默认方式导入数据较慢,本文介绍如何使用 unlogged table 特性提高数据导入速度。
风险提示
unlogged table 使用场景和存在的风险,参考文档[1]
- 可以接受数据丢失的风险,数据可以从其他源进行导入。
- unlogged table 通常用于中间结果,频繁变更的会话数据,或者是数据导入场景
下文来自官方文档,参考文档[2]: If specified, the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well.
运行环境
- PostgreSQL 12.9
- 数据文件大小:4GB,27450499 行
操作步骤
创建表
// 创建普通表
CREATE TABLE land_registry_price_paid_uk_1 (
TRANSACTION uuid,
price NUMERIC,
transfer_date date,
postcode text,
property_type CHAR ( 1 ),
newly_built boolean,
duration CHAR ( 1 ),
paon text,
saon text,
street text,
locality text,
city text,
district text,
county text,
ppd_category_type CHAR ( 1 ),
record_status CHAR ( 1 )
);
// 创建 unlogged table
CREATE unlogged TABLE land_registry_price_paid_uk_2 (
TRANSACTION uuid,
price NUMERIC,
transfer_date date,
postcode text,
property_type CHAR ( 1 ),
newly_built boolean,
duration CHAR ( 1 ),
paon text,
saon text,
street text,
locality text,
city text,
district text,
county text,
ppd_category_type CHAR ( 1 ),
record_status CHAR ( 1 )
);
查看表属性:
// relpersistence
*p = permanent table/sequence, u = unlogged table/sequence, t = temporary table/sequence
postgres=# SELECT relname, relpersistence FROM pg_class WHERE relname like 'land_registry_price_paid_uk_%';
relname | relpersistence
-------------------------------+----------------
land_registry_price_paid_uk_2 | u
land_registry_price_paid_uk_1 | p
将数据导入普通表
将数据导入普通表,耗时 111s 左右。
rudonx=> \timing
Timing is on.
postgres=# \copy public.land_registry_price_paid_uk_1 from '/home/postgres/pp-complete.csv' with (format csv, encoding 'win1252', header false, null '', quote '"', force_null (postcode, saon, paon, street, locality, city, district));
COPY 27450499
Time: 167954.899 ms (02:47.955)
将数据导入 unlogged table
将数据导入到 unlogged table,耗时 95s 左右。
postgres=# \copy public.land_registry_price_paid_uk_2 from '/home/postgres/pp-complete.csv' with (format csv, encoding 'win1252', header false, null '', quote '"', force_null (postcode, saon, paon, street, locality, city, district));
COPY 27450499
Time: 95480.829 ms (01:35.481)
其他操作
在 PostgreSQL 中,支持普通表和 unlogged table 相互转化
rudonx=> alter table <table_name> set logged;
rudonx=> alter table <table_name> set unlogged;
需要注意的是,转化的过程会阻塞在表上的读写,建议在业务低峰期进行操作
-[ RECORD 1 ]---+------------------------------
clock_timestamp | 2023-11-23 13:53:28.340714+08
relname | land_registry_price_paid_uk_2
locktype | relation
relation | 35427
pid | 420765
mode | AccessExclusiveLock
granted | t
参考文档