简单探究pg_resetwal的作用

pg_resetwal作用

pg_resetwal主要作用就是重置wal日志和控制文件的一些信息,在出现wal日志、控制文件损坏的时候就可能需要使用到pg_resetwal。在官网中特别强调了pg_resetwal具有一定的风险,应该作为数据修复最后的手段,并且在使用pg_resetwal之后需要立刻对数据库进行dump操作,重新初始化导入数据之后在使用。

pg_resetwal做了什么

pg_resetwal会对控制文件做修改,各个参数对控制文件的影响字段如下:

通过一个试验去理解pg_resetwal对控制文件的修改,理解验证pg_resetwal是怎样读取到幽灵元组的。

a) 首先准备一些测试数据

vicdb=# insert into test values (200);
INSERT 0 1
vicdb=# insert into test values (201);
INSERT 0 1
vicdb=# insert into test values (202);
INSERT 0 1
vicdb=# select xmin,id from test;
xmin | id
---------+-----
1265019 | 1
1265021 | 3
1265020 | 100
1265021 | 100
1265022 | 100
1265047 | 200
1265048 | 201
1265049 | 202

在test表中,id<200的数据是之前就存在表里的,现在从里面插入了3条数据,分别为200,201,202.

b) 删除部分数据

vicdb=# delete from test where id<200;
DELETE 5
vicdb=# insert into test values (203);
INSERT 0 1
vicdb=# select xmin,id from test;
xmin | id
---------+-----
1265047 | 200
1265048 | 201
1265049 | 202
1265051 | 203

在这个操作中,我删除了id<200的所有数据。需要注意的是,pg里面对元组的删除并不是真的删除了,只是因为可见性规则的问题让那些被删除的数据变得不可见了,数据依然存在与数据文件中

c) 查看当前系统的状态

[postgres@vm002 db1]$ pg_controldata  | grep check
Latest checkpoint location: 0/9F0006C0
Latest checkpoint's REDO location: 0/9F000688
Latest checkpoint's REDO WAL file: 00000009000000000000009F
Latest checkpoint's TimeLineID: 9
Latest checkpoint's PrevTimeLineID: 9
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1265047
Latest checkpoint's NextOID: 49210
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 726
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1265047
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Thu 01 Sep 2022 03:00:11 PM CST
Data page checksum version: 0

pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+---------------------------------
0/9F000C40 | 00000009000000000000009F | (00000009000000000000009F,3136)

从上可知,0/9F000688之前的数据已经全部写盘,00000009000000000000009F文件之前的wal文件已经可以删除(之后的不能)。

d) 删除pg_wal里面的wal日志,强制关机

[postgres@vm002 pg_wal]$ rm -rf 000000090000000000000*
[postgres@vm002 db1]$ pg_ctl stop -m immediate
waiting for server to shut down.... done
server stopped

e) 已经无法正常启动了

[postgres@vm002 db1]$ pg_controldata 
pg_control version number: 1300
Catalog version number: 202107181
Database system identifier: 7135267332592588601
Database cluster state: in production
pg_control last modified: Thu 01 Sep 2022 03:00:11 PM CST
Latest checkpoint location: 0/9F0006C0
Latest checkpoint's REDO location: 0/9F000688
Latest checkpoint's REDO WAL file: 00000009000000000000009F
Latest checkpoint's TimeLineID: 9
Latest checkpoint's PrevTimeLineID: 9
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1265047
Latest checkpoint's NextOID: 49210
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 726
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1265047
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Thu 01 Sep 2022 03:00:11 PM CST
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 200
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: 9fdca12ba32064b7f560c9bc3503291d0ff3fc7019eda6589a6447f1265ce198

很显然,因为Database cluster state:in production,所以数据库认为pg是异常关闭的,所以在启动的时候需要做实例恢复,根据控制文件里面的信息它需要用到0/9F000688之后的所有wal用于实例恢复,也就是包含00000009000000000000009F之后的所有wal文件。很遗憾,这些数据已经没有了。

d) dump一下wal日志,看看里面有什么内容

 postgres@vm002 pg_wal]$ pg_waldump 00000009000000000000009F 
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/9F000028, prev 0/00000000, desc: CHECKPOINT_SHUTDOWN redo 0/9F000028; tli 9; prev tli 9; fpw true; xid 0:1265022; oid 49210; multi 1; offset 0; oldest xid 726 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1265023; shutdown
rmgr: XLOG len (rec/tot): 54/ 54, tx: 0, lsn: 0/9F0000A0, prev 0/9F000028, desc: PARAMETER_CHANGE max_connections=200 max_worker_processes=8 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=64 wal_level=replica wal_log_hints=off track_commit_timestamp=off
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F0000D8, prev 0/9F0000A0, desc: RUNNING_XACTS nextXid 1265022 latestCompletedXid 1265021 oldestRunningXid 1265022
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265022, lsn: 0/9F000110, prev 0/9F0000D8, desc: COMMIT 2022-09-01 10:38:36.035312 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F000138, prev 0/9F000110, desc: RUNNING_XACTS nextXid 1265023 latestCompletedXid 1265022 oldestRunningXid 1265023
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265023, lsn: 0/9F000170, prev 0/9F000138, desc: COMMIT 2022-09-01 10:38:37.457959 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265024, lsn: 0/9F000198, prev 0/9F000170, desc: COMMIT 2022-09-01 10:38:38.125193 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265025, lsn: 0/9F0001C0, prev 0/9F000198, desc: COMMIT 2022-09-01 10:38:38.501912 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265026, lsn: 0/9F0001E8, prev 0/9F0001C0, desc: COMMIT 2022-09-01 10:38:38.844197 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265027, lsn: 0/9F000210, prev 0/9F0001E8, desc: COMMIT 2022-09-01 10:38:39.087947 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265028, lsn: 0/9F000238, prev 0/9F000210, desc: COMMIT 2022-09-01 10:38:39.329163 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265029, lsn: 0/9F000260, prev 0/9F000238, desc: COMMIT 2022-09-01 10:38:39.539805 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265030, lsn: 0/9F000288, prev 0/9F000260, desc: COMMIT 2022-09-01 10:38:39.744425 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265031, lsn: 0/9F0002B0, prev 0/9F000288, desc: COMMIT 2022-09-01 10:38:40.118526 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265032, lsn: 0/9F0002D8, prev 0/9F0002B0, desc: COMMIT 2022-09-01 10:38:40.585174 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265033, lsn: 0/9F000300, prev 0/9F0002D8, desc: COMMIT 2022-09-01 10:38:41.456662 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265034, lsn: 0/9F000328, prev 0/9F000300, desc: COMMIT 2022-09-01 10:38:41.983325 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265035, lsn: 0/9F000350, prev 0/9F000328, desc: COMMIT 2022-09-01 10:38:49.257192 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265036, lsn: 0/9F000378, prev 0/9F000350, desc: COMMIT 2022-09-01 10:38:49.656845 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265037, lsn: 0/9F0003A0, prev 0/9F000378, desc: COMMIT 2022-09-01 10:38:49.999686 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265038, lsn: 0/9F0003C8, prev 0/9F0003A0, desc: COMMIT 2022-09-01 10:38:50.245227 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265039, lsn: 0/9F0003F0, prev 0/9F0003C8, desc: COMMIT 2022-09-01 10:38:50.493068 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 000688, desc: CHECKPOINT_ONLINE redo 0/9F000688; tli 9; prev tli 9; fpw true; xid 0:1265047; oid 49210; multi 1; offset 0; oldest xid 726 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1265047; online
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F000738, prev 1265040, lsn: 0/9F000418, prev 0/9F0003F0, desc: COMMIT 2022-09-01 10:38:50.707671 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265041, lsn: 0/9F000440, prev 0/9F000418, desc: COMMIT 2022-09-01 10:38:50.912370 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265042, lsn: 0/9F000468, prev 0/9F000440, desc: COMMIT 2022-09-01 10:38:51.083189 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265043, lsn: 0/9F000490, prev 0/9F000468, desc: COMMIT 2022-09-01 10:38:51.329881 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265044, lsn: 0/9F0004B8, prev 0/9F000490, desc: COMMIT 2022-09-01 10:38:51.511165 CST
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265045, lsn: 0/9F0004E0, prev 0/9F0004B8, desc: COMMIT 2022-09-01 10:38:51.865979 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F000508, prev 0/9F0004E0, desc: RUNNING_XACTS nextXid 1265046 latestCompletedXid 1265045 oldestRunningXid 1265046
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F000540, prev 0/9F000508, desc: RUNNING_XACTS nextXid 1265046 latestCompletedXid 1265045 oldestRunningXid 1265046
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/9F000578, prev 0/9F000540, desc: CHECKPOINT_ONLINE redo 0/9F000540; tli 9; prev tli 9; fpw true; xid 0:1265046; oid 49210; multi 1; offset 0; oldest xid 726 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1265046; online
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F0005F0, prev 0/9F000578, desc: RUNNING_XACTS nextXid 1265046 latestCompletedXid 1265045 oldestRunningXid 1265046
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265046, lsn: 0/9F000628, prev 0/9F0005F0, desc: COMMIT 2022-09-01 14:56:36.643104 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F000650, prev 0/9F000628, desc: RUNNING_XACTS nextXid 1265047 latestCompletedXid 1265046 oldestRunningXid 1265047

rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F000688, prev 0/9F000650, desc: RUNNING_XACTS nextXid 1265047 latestCompletedXid 1265046 oldestRunningXid 1265047
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/9F0006C0, prev 0/9F0/9F0006C0, desc: RUNNING_XACTS nextXid 1265047 latestCompletedXid 1265046 oldestRunningXid 1265047
rmgr: Heap len (rec/tot): 54/ 330, tx: 1265047, lsn: 0/9F000770, prev 0/9F000738, desc: INSERT off 7 flags 0x00, blkref #0: rel 1663/16385/49207 blk 0 FPW
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265047, lsn: 0/9F0008C0, prev 0/9F000770, desc: COMMIT 2022-09-01 15:21:43.829679 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F0008E8, prev 0/9F0008C0, desc: RUNNING_XACTS nextXid 1265048 latestCompletedXid 1265047 oldestRunningXid 1265048
rmgr: Heap len (rec/tot): 59/ 59, tx: 1265048, lsn: 0/9F000920, prev 0/9F0008E8, desc: INSERT off 8 flags 0x00, blkref #0: rel 1663/16385/49207 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265048, lsn: 0/9F000960, prev 0/9F000920, desc: COMMIT 2022-09-01 15:21:48.268932 CST
rmgr: Heap len (rec/tot): 59/ 59, tx: 1265049, lsn: 0/9F000988, prev 0/9F000960, desc: INSERT off 9 flags 0x00, blkref #0: rel 1663/16385/49207 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265049, lsn: 0/9F0009C8, prev 0/9F000988, desc: COMMIT 2022-09-01 15:21:51.682943 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F0009F0, prev 0/9F0009C8, desc: RUNNING_XACTS nextXid 1265050 latestCompletedXid 1265049 oldestRunningXid 1265050
rmgr: Heap len (rec/tot): 54/ 54, tx: 1265050, lsn: 0/9F000A28, prev 0/9F0009F0, desc: DELETE off 1 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16385/49207 blk 0
rmgr: Heap len (rec/tot): 54/ 54, tx: 1265050, lsn: 0/9F000A60, prev 0/9F000A28, desc: DELETE off 3 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16385/49207 blk 0
rmgr: Heap len (rec/tot): 54/ 54, tx: 1265050, lsn: 0/9F000A98, prev 0/9F000A60, desc: DELETE off 4 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16385/49207 blk 0
rmgr: Heap len (rec/tot): 54/ 54, tx: 1265050, lsn: 0/9F000AD0, prev 0/9F000A98, desc: DELETE off 5 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16385/49207 blk 0
rmgr: Heap len (rec/tot): 54/ 54, tx: 1265050, lsn: 0/9F000B08, prev 0/9F000AD0, desc: DELETE off 6 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16385/49207 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265050, lsn: 0/9F000B40, prev 0/9F000B08, desc: COMMIT 2022-09-01 15:22:23.116160 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F000B68, prev 0/9F000B40, desc: RUNNING_XACTS nextXid 1265051 latestCompletedXid 1265050 oldestRunningXid 1265051
rmgr: Heap len (rec/tot): 59/ 59, tx: 1265051, lsn: 0/9F000BA0, prev 0/9F000B68, desc: INSERT off 10 flags 0x00, blkref #0: rel 1663/16385/49207 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1265051, lsn: 0/9F000BE0, prev 0/9F000BA0, desc: COMMIT 2022-09-01 15:22:31.884786 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/9F000C08, prev 0/9F000BE0, desc: RUNNING_XACTS nextXid 1265052 latestCompletedXid 1265051 oldestRunningXid 1265052
pg_waldump: fatal: error in WAL record at 0/9F000C08: invalid record length at 0/9F000C40: wanted 24, got 0
[postgres@vm002 pg_wal]$

里面有很多,重点关系如下内容:

24行:开始发起checkpoint,目标位置0/9F000688

38行:完成检查点,对应controldata中的Latest checkpoint location: 0/9F0006C0

38行以下的变更,也就是0/9F000688之后的是在内存中的数据,没有写盘,强制关闭之后需要wal做恢复,但是wal被删除,所以会丢失。

第40,43,45的记录分别对应id=200,201,202的数据插入,48~52的delete就是对应删除id<200的那5条数据

第55行的插入就对应id=203的数据插入

d) 使用pg_resetwal修改控制文件,对比一下

[postgres@vm002 db1]$ pg_resetwal -f -x 1265050 -D /pg_data/db1/
Write-ahead log reset

左边为数据库正常情况下的控制信息,右边为修改之后的信息

简单探究pg_resetwal的作用

变化1:Database cluster state的状态由in production-->shut down,启动的时候不需要做实例恢复了

变化2:Latest checkpoint's REDO WAL file的值也变化了,因为pg_resetwal会删除pg_wal里面的wal,然后根据一系列的规则重新生成出一个wal文件

变化3:Latest checkpoint location,Latest checkpoint's REDO location:都变了,个人的理解是pg_resetwal的目的就是拉平LSN,让数据库认为数据是一致的可以正常启动(实际上可能不一致,这也就是pg_resetwal危险的原因)

变化4:Latest checkpoint's NextXID的值变成了我指定的1265050。因为修改之前的pg_controldata不是宕机那一刹的信息,是我在早些时候获取的到信息,所以左图中的Latest checkpoint's NextXID只有0:1265047。

变化5:wal_level setting:的值变成了minimal。很奇怪吧,我的理解是执行过pg_resetwal的控制文件是新的,wal_level_setting的默认就minimal。在启动数据库之后,会自动修改

启动数据库之后会发现有修改参数的操作

[postgres@vm002 pg_wal]$ pg_waldump 0000000900000000000000A0 
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/A0000028, prev 0/00000000, desc: CHECKPOINT_SHUTDOWN redo 0/A0000028; tli 9; prev tli 9; fpw true; xid 0:1265050; oid 49210; multi 1; offset 0; oldest xid 726 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1265047; shutdown
rmgr: XLOG len (rec/tot): 54/ 54, tx: 0, lsn: 0/A00000A0, prev 0/A0000028, desc: PARAMETER_CHANGE max_connections=200 max_worker_processes=8 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=64 wal_level=replica wal_log_hints=off track_commit_timestamp=off
pg_waldump: fatal: error in WAL record at 0/A00000A0: invalid record length at 0/A00000D8: wanted 24, got 0
[postgres@vm002 pg_wal]$

验证数据:

vicdb=# select txid_current();
txid_current
--------------
1265069
(1 row)

vicdb=# select xmin,xmax,* from test;
xmin | xmax | id
---------+------+-----
1265019 | 0 | 1
1265021 | 0 | 3
1265020 | 0 | 100
1265021 | 0 | 100
1265022 | 0 | 100
(5 rows)

就算我把txid拉倒1265069,也发现数据就是这么多了。为什么?查看之前的pg_waldump数据,检查点的位置为0/9F000688,那么这个点后续的变更也就全部丢失了。

应用场景

  • pg_wal数据丢失,数据库异常关闭的时候
  • 误删了数据,可以使用pg_resetwal回到过去,利用可见性规则读取数据。个人觉得有点oracle闪回查询的意思,只不过oracle的闪回查询利用的undo里面的镜像数据。pg_resetwal的原理是利用pg数据库的可见性规则。