使用unlogged table 提高数据导入性能

数据库关系型数据库技术服务知识库
场景介绍

在大数据导入场景下,使用默认方式导入数据较慢,本文介绍如何使用 unlogged table 特性提高数据导入速度。

风险提示

unlogged table 使用场景和存在的风险,参考文档[1]

  1. 可以接受数据丢失的风险,数据可以从其他源进行导入。
  2. unlogged table 通常用于中间结果,频繁变更的会话数据,或者是数据导入场景

下文来自官方文档,参考文档[2]: If specified, the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well.

运行环境

  • PostgreSQL 12.9
  • 数据文件大小:4GB,27450499 行
操作步骤

创建表

// 创建普通表
CREATE TABLE land_registry_price_paid_uk_1 (
        TRANSACTION uuid,
        price NUMERIC,
        transfer_date date,
        postcode text,
        property_type CHAR ( 1 ),
        newly_built boolean,
        duration CHAR ( 1 ),
        paon text,
        saon text,
        street text,
        locality text,
        city text,
        district text,
        county text,
        ppd_category_type CHAR ( 1 ),
        record_status CHAR ( 1 )
);

// 创建 unlogged table
CREATE unlogged TABLE land_registry_price_paid_uk_2 (
        TRANSACTION uuid,
        price NUMERIC,
        transfer_date date,
        postcode text,
        property_type CHAR ( 1 ),
        newly_built boolean,
        duration CHAR ( 1 ),
        paon text,
        saon text,
        street text,
        locality text,
        city text,
        district text,
        county text,
        ppd_category_type CHAR ( 1 ),
        record_status CHAR ( 1 )
);

查看表属性:

// relpersistence
*p = permanent table/sequence, u = unlogged table/sequence, t = temporary table/sequence

postgres=# SELECT relname, relpersistence FROM pg_class WHERE relname like  'land_registry_price_paid_uk_%';
            relname            | relpersistence 
-------------------------------+----------------
 land_registry_price_paid_uk_2 | u
 land_registry_price_paid_uk_1 | p

将数据导入普通表

将数据导入普通表,耗时 111s 左右。

rudonx=> \timing
Timing is on.

postgres=# \copy public.land_registry_price_paid_uk_1 from '/home/postgres/pp-complete.csv' with (format csv, encoding 'win1252', header false, null '', quote '"', force_null (postcode, saon, paon, street, locality, city, district));
COPY 27450499
Time: 167954.899 ms (02:47.955)

将数据导入 unlogged table

将数据导入到 unlogged table,耗时 95s 左右。

postgres=# \copy public.land_registry_price_paid_uk_2 from '/home/postgres/pp-complete.csv' with (format csv, encoding 'win1252', header false, null '', quote '"', force_null (postcode, saon, paon, street, locality, city, district));
COPY 27450499
Time: 95480.829 ms (01:35.481)
其他操作

在 PostgreSQL 中,支持普通表和 unlogged table 相互转化

rudonx=> alter table <table_name> set logged;

rudonx=> alter table <table_name> set unlogged;

需要注意的是,转化的过程会阻塞在表上的读写,建议在业务低峰期进行操作

-[ RECORD 1 ]---+------------------------------
clock_timestamp | 2023-11-23 13:53:28.340714+08
relname         | land_registry_price_paid_uk_2
locktype        | relation
relation        | 35427
pid             | 420765
mode            | AccessExclusiveLock
granted         | t
参考文档
0
0
0
0
关于作者

文章

0

获赞

0

收藏

0

所属团队号:
相关资源
基于火山引擎 EMR 构建企业级数据湖仓
火山引擎 EMR 是一款云原生开源大数据平台,提供主流的开源大数据引擎,加持了字节跳动内部的优化、海量数据处理的最佳实践。本次演讲将为大家介绍火山引擎 EMR 的架构及核心特性,如何基于开源架构构建企业级数据湖仓,同时向大家介绍火山 EMR 产品的未来规划。
相关产品
评论
未登录
看完啦,登录分享一下感受吧~
暂无评论