postgresql/lightdb数据文件丢失和损坏恢复–系统表

文件删除

rm -rf 1255 # pg_proc

select * from pg_class [order by oid]; ##不报错

select * from pg_class where oid = 1255; ## 报错

ERROR: could not open file "base/16397/1255": No such file or directory

zjh@test=# \d
ERROR: could not open file "base/16397/1255": No such file or directory

touch空文件

[zjh@hs-10-20-30-193 16397]$ ll | grep 1255
-rw------- 1 zjh zjh 24576 Aug 6 15:36 1255_fsm
-rw------- 1 zjh zjh 8192 Aug 6 15:36 1255_vm
[zjh@hs-10-20-30-193 16397]$ touch 1255
[zjh@hs-10-20-30-193 16397]$ ll | grep 1255
-rw-rw-r-- 1 zjh zjh 0 Sep 13 11:43 1255
-rw------- 1 zjh zjh 24576 Aug 6 15:36 1255_fsm
-rw------- 1 zjh zjh 8192 Aug 6 15:36 1255_vm

zjh@test=#      
select * from pg_class where oid=1255;
ERROR: could not read block 3 in file "base/16397/1255": read only 0 of 8192 bytes
zjh@test=# \d
ERROR: could not read block 3 in file "base/16397/1255": read only 0 of 8192 bytes
zjh@test=# select * from pg_proc; -- 因为系统表通过BKI初始化,所以定义存在
oid | proname | pronamespace | proowner | prolang | procost | prorows | provariadic | prosupport | prokind | prosecdef | proleakproof | proisstrict | proretset | provolatile | proparallel | pronargs | pronargdefa
ults | prorettype | proaccess | proargtypes | proallargtypes | proargmodes | proargnames | proargdefaults | protrftypes | prosrc | probin | proconfig | proacl
-----+---------+--------------+----------+---------+---------+---------+-------------+------------+---------+-----------+--------------+-------------+-----------+-------------+-------------+----------+------------
-----+------------+-----------+-------------+----------------+-------------+-------------+----------------+-------------+--------+--------+-----------+--------
(0

文件内容损坏(模拟裂页 partial-write)

zjh@test=# select * from pg_proc limit 247;
ERROR: cache lookup failed for type 0
zjh@test=# select * from pg_proc limit 246;
oid | proname | pronamespace | proowner | prolang | procost | prorows | provariadic | prosupport | prokind | prosecdef | proleakproof | proisstrict | proretset | provolatile | proparallel |
pronargs | pronargdefaults | prorettype | proaccess | proargtypes | proallargtypes | proargmodes | proargnames | proargdefaults | protrftypes | prosrc | probin | proconfig | proacl
------+---------------------+--------------+----------+---------+---------+---------+-------------+---------------------+---------+-----------+--------------+-------------+-----------+-------------+-------------+-
---------+-----------------+------------+-----------+----------------------+----------------+-------------+-------------+----------------+-------------+---------------------+--------+-----------+--------
1242 | boolin | 11 | 10 | 12 | 1 | 0 | 0 | - | f | f | f | t | f | i | s |
1 | 0 | 16 | n | 2275 | | | | | | boolin | | |
1243 | boolout | 11 | 10 | 12 | 1 | 0 | 0 | - | f | f | f | t | f | i | s |
1 | 0 | 2275 | n | 16 | | | | | | boolout | | |
5,40) |  293 | float8eq            |           11 |       10 |      12 |       1 |       0 |           0 | -                   | f       | f         | t            | t           | f         | i           | s    
| 2 | 0 | 16 | n | 701 701 | | | | | | float8eq | | |
(5,41) | 294 | float8ne | 11 | 10 | 12 | 1 | 0 | 0 | - | f | f | t | t | f | i | s
| 2 | 0 | 16 | n | 701 701 | | | | | | float8ne | | |
(6,3) | 297 | float8gt | 11 | 10 | 12 | 1 | 0 | 0 | - | f | f | t | t | f | i | s
| 2 | 0 | 16 | n | 701 701 | | | | | | float8gt | | |
(6,4) | 298 | float8ge | 11 | 10 | 12 | 1 | 0 | 0 | - | f | f | t | t | f | i | s
| 2 | 0 | 16 | n | 701 701 | | | | | | float8ge | | |
(248 rows)

zjh@test=# select ctid,a.* from pg_proc a where ctid='(6,2)' limit 248;
ERROR: cache lookup failed for type 0

  然后根据ctid过滤掉损坏的记录,然后lt_dump/lt_restore重建库即可。

  zjh@test=# select ctid,a.* from pg_proc a where ctid!='(6,2)';

  对于系统表,一般都是通过DDL导入的,所以可以考虑开启log_statement = 'ddl'记录所有执行过的DDL。这样可以尽量减少DDL丢失。当然,如果对库的依赖较少,可以从别的库把数据拷贝回来。如下:

[zjh@hs-10-20-30-193 16397]$ ll | grep 1255
-rw-rw-r-- 1 zjh zjh 679936 Sep 13 11:58 1255
-rw------- 1 zjh zjh 24576 Aug 6 15:36 1255_fsm
-rw------- 1 zjh zjh 8192 Aug 6 15:36 1255_vm
[zjh@hs-10-20-30-193 16397]$ cd ../
1/ 13591/ 13592/ 16397/ ltsql_tmp/
[zjh@hs-10-20-30-193 16397]$ cd ../13592/
[zjh@hs-10-20-30-193 13592]$ ll | grep 1255
-rw------- 1 zjh zjh 704512 Sep 11 00:13 1255
-rw------- 1 zjh zjh 24576 Sep 10 22:43 1255_fsm
-rw------- 1 zjh zjh 8192 Sep 10 17:46 1255_vm
[zjh@hs-10-20-30-193 13592]$ cd ../13591/
[zjh@hs-10-20-30-193 13591]$ ll | grep 1255
-rw------- 1 zjh zjh 679936 Aug 6 15:36 1255
-rw------- 1 zjh zjh 24576 Aug 6 15:36 1255_fsm
-rw------- 1 zjh zjh 8192 Aug 6 15:36