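While helping a customer recover a database recently, I ran into a rare case of archived log corruption.
During recovery from the archived logs, the database raised an error reporting that an archived log was corrupt: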
***
Corrupt block seq: 37288 blocknum=1.
Bad header found during deleting archived log
Data in bad block - seq:810559520. bno:170473264. time:707406346
beg:21280 cks:21061
calculated check value: 9226
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
***
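The message is fairly detailed: the header of archived log sequence 37288 is corrupt (block 1 fails its check, and every reread returns the same bad data), so the log cannot be read.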
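When the alert log is long, lines like the ones above can be summarized mechanically. The following is a minimal Python sketch, assuming the "Reread of ..." message format is exactly as shown in the excerpt; `REREAD` and `summarize_rereads` are hypothetical names introduced here for illustration, not part of any Oracle tool.

```python
import re
from collections import Counter

# Pattern for the "Reread of ..." lines, copied from the excerpt above.
# The format is an assumption based on this one alert log.
REREAD = re.compile(
    r"Reread of seq=(\d+), blocknum=(\d+), file=(\S+), found same corrupt data"
)

def summarize_rereads(alert_text):
    """Count failed rereads per (sequence, block number, file path)."""
    hits = Counter()
    for line in alert_text.splitlines():
        m = REREAD.search(line)
        if m:
            key = (int(m.group(1)), int(m.group(2)), m.group(3))
            hits[key] += 1
    return hits
```

Passing the text of the alert log to `summarize_rereads` gives one counter entry per damaged block, which makes it easy to see whether a single archived log or several are affected.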
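A quick question: if you ran into an error like this, how would you approach it?
Even with this archived log corrupt, there are still ways to skip over it and keep trying to apply the remaining logs. But the customer's data is important and no inconsistency can be tolerated, so the only option here was to give up part of the data and have the front-end application resubmit it. The business could accommodate that, so this was not a big problem.
So the real question is: why did the log become corrupt, and how?
The first thing I did was look at what the log file actually contains, using the simplest possible command to dump its printable content: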
strings arch_1_37288_632509987.dbf > log.txt
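Checking the generated file, log.txt, revealed the problem.
A large amount of trace file content had been written into the archived log file, and the beginning of the file is the complete content of one trace file.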
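The same check can be scripted so that it reports where in the file the trace content starts. This is a minimal Python sketch, assuming that a line of the form "Dump file <path>.trc" marks the start of an embedded trace file, as it does in this case; `TRACE_HEADER` and `find_trace_headers` are hypothetical names introduced for illustration.

```python
import mmap
import re

# A trace file in this case begins with "Dump file /.../name.trc".
# Using that line as a marker is an assumption of this sketch.
TRACE_HEADER = re.compile(rb"Dump file (/[\x21-\x7e]+?\.trc)")

def find_trace_headers(path):
    """Return (byte offset, trace file path) for each trace header in `path`.

    The file is memory-mapped read-only, so it is never modified.
    Note that mmap raises ValueError for an empty file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            return [
                (m.start(), m.group(1).decode("ascii"))
                for m in TRACE_HEADER.finditer(data)
            ]
```

For example, `find_trace_headers("arch_1_37288_632509987.dbf")` would list every embedded trace header together with its byte offset. The offsets show which parts of the archived log were overwritten, which `strings` alone does not reveal.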
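This is something I had never encountered before: while the operating system was writing out trace files, it wrongly overwrote an archived log file that already existed, and the archived log ended up corrupt. Quite remarkable, and unlike anything I had seen.
My final judgment is that the fault lies in how the operating system handled the write: space that already belonged to an existing file was still treated as writable, which led to a write conflict. When a problem like this shows up, the hardware should be checked immediately to see whether a hardware fault caused such a serious anomaly.
The beginning of log.txt is shown below: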
Dump file /ADMIN/bdump/erp_p007_19216.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /DBMS/erp/erpdb/10g
eygle.com
2.6.9-34.ELhugemem
#1 SMP Fri Feb 24 17:04:34 EST 2006
i686
Instance name: erp
Redo thread mounted by this instance: 1
Oracle process number: 22
Unix process pid: 19216, image: oracle@eygle.com (P007)
*** SERVICE NAME:() 2010-11-10 10:37:26.247
*** SESSION ID:(2184.1) 2010-11-10 10:37:26.247
*** 2010-11-10 10:37:26.247
KCRP: blocks claimed = 61, eliminated = 0
----- Recovery Hash Table Statistics ---------
Hash table buckets = 32768
Longest hash chain = 1
Average hash chain = 61/61 = 1.0
Max compares per lookup = 0
Avg compares per lookup = 0/61 = 0.0
----------------------------------------------
----- Recovery Hash Table Statistics ---------
Hash table buckets = 32768
Longest hash chain = 1
Average hash chain = 61/61 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 1426/1426 = 1.0
----------------------------------------------
\GPAYMENTdxn
AP_CHECKS
Q(xn
.1=N
\Gxn
.1=N
^0e
^0e!
^0e"
^0e#
^0e$
^0e%
^0e&
^0e'
eygle.com!/
^0e(
^0e)
^0e*
^0e+
^0e+
^0e&
^ij1
R0:b
Q(xn
PaymentsN
a'VND
Userxn
AP_INVOICE_PAYMENTS
105273
5406105305-20101020-003
3001CASH CLEARING
CREATED
Dump file /ADMIN/bdump/erp_p002_19206.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /DBMS/erp/erpdb/10g
Linux
eygle.com
2.6.9-34.ELhugemem
#1 SMP Fri Feb 24 17:04:34 EST 2006
i686
Instance name: erp
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 19206, image: oracle@eygle.com (P002)
*** SERVICE NAME:() 2010-11-10 10:37:26.263
*** SESSION ID:(2187.1) 2010-11-10 10:37:26.263
*** 2010-11-10 10:37:26.263
Reposted from: http://blogread.cn/it/article/3277