使用 pgcopydb 工具迁移数据

数据库关系型数据库技术服务知识库
场景介绍

在数据迁移场景中,如果对实时性要求不高,可以采用离线的方式来进行迁移,pg_dump 以及 pg_restore 是不错的工具,pg_dump 提供了并行复制的方式,可以极大的提高备份恢复效率。遗憾的是,要使用并行,必须使用 Directory 的格式。在这种格式下,无法使用数据不落地的方式来进行迁移,即 pg_dump | pg_restore。

pgcopydb 基于并行复制原理,进行并行备份导出,数据不落地并行恢复数据,使用 concurrently 模式并行创建索引,极大的提高了备份恢复效率。

来自官方文档相关介绍

pgcopydb is a tool that automates running pg_dump | pg_restore between two running Postgres servers. To make a copy of a database to another server as quickly as possible, one would like to use the parallel options of pg_dump and still be able to stream the data to as many pg_restore jobs.

The idea would be to use pg_dump --jobs=N --format=directory postgres://user@source/dbname | pg_restore --jobs=N --format=directory -d postgres://user@target/dbname in a way. This command line can't be made to work, unfortunately, because pg_dump --format=directory writes to local files and directories first, and then later pg_restore --format=directory can be used to read from those files again.

Given that, pgcopydb then uses pg_dump and pg_restore for the schema parts of the process, and implements its own data copying multi-process streaming parts. Also, pgcopydb bypasses pg_restore index building and drives that internally so that all indexes may be built concurrently.

操作步骤

导入环境变量

导入环境变量,指明需要进行备份的源库和恢复的目标库

root@rudonx:/tmp# export PGCOPYDB_SOURCE_PGURI="postgres://rudonx:xxxx@192.168.xx.xx:5432/demo"
root@rudonx:/tmp# export PGCOPYDB_TARGET_PGURI="postgres://rudonx:xxxx@postgres358xxxx.rds-pg.ivolces.com/demo"

进行备份恢复

root@rudonx:/tmp# pgcopydb clone --table-jobs 4 --index-jobs 2
22:41:31.096 7927 INFO   Running pgcopydb version 0.15-1.pgdg120+1 from "/usr/bin/pgcopydb"
22:41:31.096 7927 INFO   [SOURCE] Copying database from "postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:41:31.096 7927 INFO   [TARGET] Copying database into "postgres://rudonx@postgres358xxxx.rds-pg.ivolces.com/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:41:31.121 7927 INFO   Using work dir "/tmp/pgcopydb"
22:41:31.129 7927 INFO   Exported snapshot "00000007-0001E35A-1" from the source database
22:41:31.132 7929 INFO   STEP 1: fetch source database tables, indexes, and sequences
22:41:31.318 7929 INFO   Fetched information for 8 tables (including 0 tables split in 0 partitions total), with an estimated total of 21 million tuples and 1513 MB on-disk
22:41:31.349 7929 INFO   Fetched information for 11 indexes (supporting 11 constraints)
22:41:31.352 7929 INFO   Fetching information for 1 sequences
22:41:31.360 7929 INFO   Fetched information for 2 extensions
22:41:31.425 7929 INFO   Found 0 indexes (supporting 0 constraints) in the target database
22:41:31.429 7929 INFO   STEP 2: dump the source database schema (pre/post data)
22:41:31.430 7929 INFO    /usr/bin/pg_dump -Fc --snapshot 00000007-0001E35A-1 --section pre-data --file /tmp/pgcopydb/schema/pre.dump 'postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
22:41:31.541 7929 INFO    /usr/bin/pg_dump -Fc --snapshot 00000007-0001E35A-1 --section post-data --file /tmp/pgcopydb/schema/post.dump 'postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
22:41:31.647 7929 INFO   STEP 3: restore the pre-data section to the target database
22:41:31.655 7929 INFO   ALTER DATABASE demo SET "search_path" TO 'bookings', 'public';
22:41:31.683 7929 INFO    /usr/bin/pg_restore --dbname 'postgres://rudonx@postgres358xxxx.rds-pg.ivolces.com/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --jobs 2 --use-list /tmp/pgcopydb/schema/pre-filtered.list /tmp/pgcopydb/schema/pre.dump
22:41:31.843 7938 INFO   STEP 4: starting 4 table-data COPY processes
22:41:31.849 7941 INFO   STEP 8: starting 4 VACUUM processes
22:41:31.852 7939 INFO   STEP 6: starting 2 CREATE INDEX processes
22:41:31.852 7939 INFO   STEP 7: constraints are built by the CREATE INDEX processes
22:41:31.857 7929 INFO   Skipping large objects: none found.
22:41:31.859 7929 INFO   STEP 9: reset sequences values
22:41:31.861 7952 INFO   Reset sequences values on the target database
22:42:20.591 7929 INFO   STEP 10: restore the post-data section to the target database
22:42:20.618 7929 INFO    /usr/bin/pg_restore --dbname 'postgres://rudonx@postgres358xxxx.rds-pg.ivolces.com/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --jobs 2 --use-list /tmp/pgcopydb/schema/post-filtered.list /tmp/pgcopydb/schema/post.dump
22:42:34.416 7929 INFO   All step are now done,  1m03s elapsed
22:42:34.416 7929 INFO   Printing summary for 8 tables and 11 indexes

在最后会显示统计信息,指明整个操作的耗时: 图片

比对源和目标表结构

root@rudonx:/tmp# pgcopydb compare schema
22:47:44.425 8113 INFO   Running pgcopydb version 0.15-1.pgdg120+1 from "/usr/bin/pgcopydb"
22:47:44.451 8113 INFO   Using work dir "/tmp/pgcopydb"
22:47:44.452 8113 INFO   SOURCE: Connecting to "postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:47:44.628 8113 INFO   Fetched information for 8 tables (including 0 tables split in 0 partitions total), with an estimated total of 21 million tuples and 1513 MB on-disk
22:47:44.658 8113 INFO   Fetched information for 11 indexes (supporting 11 constraints)
22:47:44.661 8113 INFO   Fetching information for 1 sequences
22:47:44.668 8113 INFO   TARGET: Connecting to "postgres://rudonx@postgres358xxxx.rds-pg.ivolces.com/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:47:44.846 8113 INFO   Fetched information for 8 tables (including 0 tables split in 0 partitions total), with an estimated total of 21 million tuples and 1513 MB on-disk
22:47:44.875 8113 INFO   Fetched information for 11 indexes (supporting 11 constraints)
22:47:44.878 8113 INFO   Fetching information for 1 sequences
22:47:44.883 8113 INFO   [SOURCE] table: 8, index: 11, constraint: 11, sequence: 1
22:47:44.883 8113 INFO   [TARGET] table: 8, index: 11, constraint: 11, sequence: 1
22:47:44.885 8113 INFO   pgcopydb schema inspection is successful

比对源端和目标端数据

比对源和目标端行数和 checksum 值,具体执行 SQL 语句如下:

2024-04-17 00:12:37.245 CST [13652] LOG:  00000: duration: 7090.996 ms  statement: select count(1) as cnt, md5(format('%s-%s', sum(hashtext((ticket_no, flight_id, fare_conditions, amount)::text)::bigint),      count(1)))::uuid as chksum from only bookings.ticket_flights

数据比对是相对耗时的操作,在较小的实例上,负载上升较为明显

root@rudonx:/tmp# pgcopydb compare data
22:48:09.724 8115 INFO   Running pgcopydb version 0.15-1.pgdg120+1 from "/usr/bin/pgcopydb"
22:48:09.750 8115 INFO   Using work dir "/tmp/pgcopydb"
22:48:09.751 8115 INFO   SOURCE: Connecting to "postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:48:09.751 8115 INFO   A previous run has run through completion
22:48:09.752 8115 INFO   Re-using catalog caches
22:48:09.753 8115 INFO   Starting 4 table compare processes
                    Table Name | ! |                      Source Checksum |                      Target Checksum 
-------------------------------+---+--------------------------------------+------------------------------------- 
       bookings.ticket_flights |   |  1f2aea59-c683-53eb-8338-975f7bfb813 |  1f2aea59-c683-53eb-8338-975f7bfb813 
      bookings.boarding_passes |   |  c21e7333-5fa4-6ecf-4423-04dc5413eb5 |  c21e7333-5fa4-6ecf-4423-04dc5413eb5 
              bookings.tickets |   |  dbe64644-6c9c-b6fa-4c9c-c789b22d1e2 |  dbe64644-6c9c-b6fa-4c9c-c789b22d1e2 
             bookings.bookings |   |  e319094d-34fe-62bc-b23b-19265b31ecd |  e319094d-34fe-62bc-b23b-19265b31ecd 
              bookings.flights |   |  ec0c496e-bdbe-383a-e593-2070b4736c7 |  ec0c496e-bdbe-383a-e593-2070b4736c7 
                bookings.seats |   |  19948e9d-24e8-8573-0257-7a6ae59646a |  19948e9d-24e8-8573-0257-7a6ae59646a 
        bookings.airports_data |   |  043603e5-3a2e-71f8-1eab-2917421e8c5 |  043603e5-3a2e-71f8-1eab-2917421e8c5 
       bookings.aircrafts_data |   |  3cb46192-afde-60fc-29d0-3b014f0fa46 |  3cb46192-afde-60fc-29d0-3b014f0fa46 
参考文档

pgcopydb 提供了丰富的功能,还支持进行 CDC 复制,更多功能,可以参考如下文档:

0
0
0
0
评论
未登录
看完啦,登录分享一下感受吧~
暂无评论