In data migration scenarios where real-time requirements are relaxed, an offline migration works well, and pg_dump together with pg_restore are good tools for it. pg_dump supports parallel dumps, which can greatly speed up backup and restore. Unfortunately, parallelism requires the directory format, and with that format the data must land on disk first, so it cannot be streamed straight from one server to the other with pg_dump | pg_restore.
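For reference, the conventional offline path with the directory format has to write the dump to disk between the two steps; a minimal sketch (the connection strings and the dump directory are placeholders) looks like this:

# step 1: parallel dump into a local directory (data is written to disk)
pg_dump --jobs=4 --format=directory --file=/tmp/demo_dump "postgres://user@source/dbname"
# step 2: parallel restore from that directory into the target
pg_restore --jobs=4 --dbname="postgres://user@target/dbname" /tmp/demo_dump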
pgcopydb builds on the same parallel-copy idea: it exports the data in parallel, restores it in parallel without landing it on disk, and builds indexes concurrently across multiple CREATE INDEX processes, which greatly improves backup and restore efficiency.
The official documentation describes it as follows:
pgcopydb is a tool that automates running

pg_dump | pg_restore

between two running Postgres servers. To make a copy of a database to another server as quickly as possible, one would like to use the parallel options of pg_dump and still be able to stream the data to as many pg_restore jobs. The idea would be to use

pg_dump --jobs=N --format=directory postgres://user@source/dbname | pg_restore --jobs=N --format=directory -d postgres://user@target/dbname

in a way. This command line can't be made to work, unfortunately, because pg_dump --format=directory writes to local files and directories first, and then later pg_restore --format=directory can be used to read from those files again.

Given that, pgcopydb then uses pg_dump and pg_restore for the schema parts of the process, and implements its own data copying multi-process streaming parts. Also, pgcopydb bypasses pg_restore index building and drives that internally so that all indexes may be built concurrently.
Set the environment variables
Export the environment variables that identify the source database to copy from and the target database to restore into:
root@rudonx:/tmp# export PGCOPYDB_SOURCE_PGURI="postgres://rudonx:xxxx@192.168.xx.xx:5432/demo"
root@rudonx:/tmp# export PGCOPYDB_TARGET_PGURI="postgres://rudonx:xxxx@postgres358xxxx.rds-pg.ivolces.com/demo"
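Before kicking off the copy, it is worth confirming that both connection strings work. A minimal check, assuming psql is available on the machine running pgcopydb:

psql "$PGCOPYDB_SOURCE_PGURI" -c "select version();"
psql "$PGCOPYDB_TARGET_PGURI" -c "select version();"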
Run the backup and restore
root@rudonx:/tmp# pgcopydb clone --table-jobs 4 --index-jobs 2
22:41:31.096 7927 INFO Running pgcopydb version 0.15-1.pgdg120+1 from "/usr/bin/pgcopydb"
22:41:31.096 7927 INFO [SOURCE] Copying database from "postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:41:31.096 7927 INFO [TARGET] Copying database into "postgres://rudonx@postgres358xxxx.rds-pg.ivolces.com/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:41:31.121 7927 INFO Using work dir "/tmp/pgcopydb"
22:41:31.129 7927 INFO Exported snapshot "00000007-0001E35A-1" from the source database
22:41:31.132 7929 INFO STEP 1: fetch source database tables, indexes, and sequences
22:41:31.318 7929 INFO Fetched information for 8 tables (including 0 tables split in 0 partitions total), with an estimated total of 21 million tuples and 1513 MB on-disk
22:41:31.349 7929 INFO Fetched information for 11 indexes (supporting 11 constraints)
22:41:31.352 7929 INFO Fetching information for 1 sequences
22:41:31.360 7929 INFO Fetched information for 2 extensions
22:41:31.425 7929 INFO Found 0 indexes (supporting 0 constraints) in the target database
22:41:31.429 7929 INFO STEP 2: dump the source database schema (pre/post data)
22:41:31.430 7929 INFO /usr/bin/pg_dump -Fc --snapshot 00000007-0001E35A-1 --section pre-data --file /tmp/pgcopydb/schema/pre.dump 'postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
22:41:31.541 7929 INFO /usr/bin/pg_dump -Fc --snapshot 00000007-0001E35A-1 --section post-data --file /tmp/pgcopydb/schema/post.dump 'postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
22:41:31.647 7929 INFO STEP 3: restore the pre-data section to the target database
22:41:31.655 7929 INFO ALTER DATABASE demo SET "search_path" TO 'bookings', 'public';
22:41:31.683 7929 INFO /usr/bin/pg_restore --dbname 'postgres://rudonx@postgres358xxxx.rds-pg.ivolces.com/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --jobs 2 --use-list /tmp/pgcopydb/schema/pre-filtered.list /tmp/pgcopydb/schema/pre.dump
22:41:31.843 7938 INFO STEP 4: starting 4 table-data COPY processes
22:41:31.849 7941 INFO STEP 8: starting 4 VACUUM processes
22:41:31.852 7939 INFO STEP 6: starting 2 CREATE INDEX processes
22:41:31.852 7939 INFO STEP 7: constraints are built by the CREATE INDEX processes
22:41:31.857 7929 INFO Skipping large objects: none found.
22:41:31.859 7929 INFO STEP 9: reset sequences values
22:41:31.861 7952 INFO Reset sequences values on the target database
22:42:20.591 7929 INFO STEP 10: restore the post-data section to the target database
22:42:20.618 7929 INFO /usr/bin/pg_restore --dbname 'postgres://rudonx@postgres358xxxx.rds-pg.ivolces.com/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --jobs 2 --use-list /tmp/pgcopydb/schema/post-filtered.list /tmp/pgcopydb/schema/post.dump
22:42:34.416 7929 INFO All step are now done, 1m03s elapsed
22:42:34.416 7929 INFO Printing summary for 8 tables and 11 indexes
At the end, summary statistics are printed showing how long the whole operation took.
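If a run is interrupted or needs to be redone, pgcopydb has options for that. The commands below are a sketch based on the documented flags for this pgcopydb release (--drop-if-exists, --restart, --resume, --not-consistent); check pgcopydb clone --help on your version before relying on them:

# redo the copy from scratch, dropping objects that already exist on the target
pgcopydb clone --table-jobs 4 --index-jobs 2 --drop-if-exists --restart
# or resume an interrupted run, reusing the work files under /tmp/pgcopydb
pgcopydb clone --table-jobs 4 --index-jobs 2 --resume --not-consistent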
Compare the source and target schemas
root@rudonx:/tmp# pgcopydb compare schema
22:47:44.425 8113 INFO Running pgcopydb version 0.15-1.pgdg120+1 from "/usr/bin/pgcopydb"
22:47:44.451 8113 INFO Using work dir "/tmp/pgcopydb"
22:47:44.452 8113 INFO SOURCE: Connecting to "postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:47:44.628 8113 INFO Fetched information for 8 tables (including 0 tables split in 0 partitions total), with an estimated total of 21 million tuples and 1513 MB on-disk
22:47:44.658 8113 INFO Fetched information for 11 indexes (supporting 11 constraints)
22:47:44.661 8113 INFO Fetching information for 1 sequences
22:47:44.668 8113 INFO TARGET: Connecting to "postgres://rudonx@postgres358xxxx.rds-pg.ivolces.com/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:47:44.846 8113 INFO Fetched information for 8 tables (including 0 tables split in 0 partitions total), with an estimated total of 21 million tuples and 1513 MB on-disk
22:47:44.875 8113 INFO Fetched information for 11 indexes (supporting 11 constraints)
22:47:44.878 8113 INFO Fetching information for 1 sequences
22:47:44.883 8113 INFO [SOURCE] table: 8, index: 11, constraint: 11, sequence: 1
22:47:44.883 8113 INFO [TARGET] table: 8, index: 11, constraint: 11, sequence: 1
22:47:44.885 8113 INFO pgcopydb schema inspection is successful
Compare the source and target data
pgcopydb compares the row count and a checksum for each table on the source and target. The SQL it runs looks like the following statement, captured from the Postgres log:
2024-04-17 00:12:37.245 CST [13652] LOG: 00000: duration: 7090.996 ms statement: select count(1) as cnt, md5(format('%s-%s', sum(hashtext((ticket_no, flight_id, fare_conditions, amount)::text)::bigint), count(1)))::uuid as chksum from only bookings.ticket_flights
Data comparison is a relatively expensive operation; on smaller instances the increase in load is quite noticeable.
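To gauge the cost before comparing everything, you can run the same kind of statement by hand against a single table on both sides; this sketch simply reuses the query above for bookings.ticket_flights:

psql "$PGCOPYDB_SOURCE_PGURI" -c "select count(1) as cnt, md5(format('%s-%s', sum(hashtext((ticket_no, flight_id, fare_conditions, amount)::text)::bigint), count(1)))::uuid as chksum from only bookings.ticket_flights"
# run the same statement with $PGCOPYDB_TARGET_PGURI and compare the two results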
root@rudonx:/tmp# pgcopydb compare data
22:48:09.724 8115 INFO Running pgcopydb version 0.15-1.pgdg120+1 from "/usr/bin/pgcopydb"
22:48:09.750 8115 INFO Using work dir "/tmp/pgcopydb"
22:48:09.751 8115 INFO SOURCE: Connecting to "postgres://rudonx@192.168.xx.xx:5432/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
22:48:09.751 8115 INFO A previous run has run through completion
22:48:09.752 8115 INFO Re-using catalog caches
22:48:09.753 8115 INFO Starting 4 table compare processes
Table Name | ! | Source Checksum | Target Checksum
-------------------------------+---+--------------------------------------+-------------------------------------
bookings.ticket_flights | | 1f2aea59-c683-53eb-8338-975f7bfb813 | 1f2aea59-c683-53eb-8338-975f7bfb813
bookings.boarding_passes | | c21e7333-5fa4-6ecf-4423-04dc5413eb5 | c21e7333-5fa4-6ecf-4423-04dc5413eb5
bookings.tickets | | dbe64644-6c9c-b6fa-4c9c-c789b22d1e2 | dbe64644-6c9c-b6fa-4c9c-c789b22d1e2
bookings.bookings | | e319094d-34fe-62bc-b23b-19265b31ecd | e319094d-34fe-62bc-b23b-19265b31ecd
bookings.flights | | ec0c496e-bdbe-383a-e593-2070b4736c7 | ec0c496e-bdbe-383a-e593-2070b4736c7
bookings.seats | | 19948e9d-24e8-8573-0257-7a6ae59646a | 19948e9d-24e8-8573-0257-7a6ae59646a
bookings.airports_data | | 043603e5-3a2e-71f8-1eab-2917421e8c5 | 043603e5-3a2e-71f8-1eab-2917421e8c5
bookings.aircrafts_data | | 3cb46192-afde-60fc-29d0-3b014f0fa46 | 3cb46192-afde-60fc-29d0-3b014f0fa46
pgcopydb offers many more features, including CDC (change data capture) replication that keeps the target in sync after the initial copy; see the official pgcopydb documentation for details.
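As a rough sketch of that CDC mode (assuming the source database has wal_level = logical and a free replication slot; the full workflow is described in the pgcopydb documentation):

# initial copy followed by continuous change data capture via logical decoding
pgcopydb clone --follow --table-jobs 4 --index-jobs 2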