Fix Zpool Permanent Errors:-
The backend storage was powered off for storage maintenance.
After the storage came back online and the system was powered on, the pool below came up with an error.
Error logs:-
[root@proddb ~]# zpool status -v
  pool: production
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 44h14m with 0 errors on Sat May 19 07:07:03 2018
config:

        NAME                             STATE     READ WRITE CKSUM
        act_per_pool000                  ONLINE       0     0     0
          pci-0000:03:00.0-scsi-0:0:2:0  ONLINE       0     0     0
          pci-0000:03:00.0-scsi-0:0:3:0  ONLINE       0     0     0
          pci-0000:03:00.0-scsi-0:0:5:0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        production/sda:<0x1q>
        production/sdb:<0x1c>
        production/sdc:<0x1d>
        production/sdd:<0x1e>
Explanation :
The error is telling you that the inodes <0x1q>, <0x1c>, <0x1d> and <0x1e> are corrupt. Deleting a file breaks the filename->inode mapping, so zpool can no longer print a path and reports only the inode number. Either something still has the file open, or the metadata just needs to be cleaned up, which a scrub should do.
If a scrub does not clear the error, you need to get down and dirty with zdb, which is not publicly documented by Oracle (and is poorly documented elsewhere). At that point something more fundamental is probably wrong.
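When the list of damaged entries is long, it can help to pull out just the dataset names and object numbers. The following is a minimal sketch, assuming the entries have the dataset:<0x...> shape shown in the log above. The function name list_damaged_objects is made up for illustration and is not part of ZFS.

```shell
# Hypothetical helper (not a ZFS command): reads `zpool status -v` output on
# stdin and prints "dataset object" for each entry shaped like dataset:<0xNN>.
# Entries that are plain file paths are ignored.
list_damaged_objects() {
    sed -n 's/^[[:space:]]*\([^[:space:]:][^[:space:]]*\):<\(0x[^>]*\)>[[:space:]]*$/\1 \2/p'
}
```

It would be used as: zpool status -v production | list_damaged_objects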
Resolution :
Scrub the pool so that ZFS takes another look at the data and updates its error records.
To start the scrub, use the command below.
# zpool scrub production
The scrub will take about 2 hours to complete, but it is not strictly necessary to wait that long. zpool seems to run a check just before a scrub terminates, so even if the scrub is stopped right away, the check still runs and the pool is marked as clean. Keep in mind that a canceled scrub has only checked part of the pool.
To stop the scrub, use the command below.
# zpool scrub -s production
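To see whether a scrub is still running, was canceled, or ran to completion, the scan line of the status output can be classified. This is a rough sketch that assumes the scan line wording seen in this post ("scrub repaired", "scrub canceled") plus the usual "scrub in progress" text. The function name scrub_state and the words it prints are invented for this example.

```shell
# Hypothetical helper (not a ZFS command): reads `zpool status` output on
# stdin and prints one word describing the first "scan:" line:
# in_progress, canceled, finished, or unknown.
scrub_state() {
    case "$(sed -n '/^[[:space:]]*scan:/{p;q;}')" in
        *"scrub in progress"*) echo in_progress ;;
        *"scrub canceled"*)    echo canceled ;;
        *"scrub repaired"*)    echo finished ;;
        *)                     echo unknown ;;
    esac
}
```

It would be used as: zpool status production | scrub_state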
Results :
After canceling the scrub, check the pool status again. The pool is now reported as clean and healthy.
[root@proddb ~]# zpool status -v
  pool: production
 state: ONLINE
  scan: scrub canceled on Mon May 21 14:47:13 2018
config:

        NAME                             STATE     READ WRITE CKSUM
        act_per_pool000                  ONLINE       0     0     0
          pci-0000:03:00.0-scsi-0:0:2:0  ONLINE       0     0     0
          pci-0000:03:00.0-scsi-0:0:3:0  ONLINE       0     0     0
          pci-0000:03:00.0-scsi-0:0:5:0  ONLINE       0     0     0

errors: No known data errors
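The same check can be scripted so that the result does not depend on reading the output by eye. The following is a small sketch that only looks for the "No known data errors" line shown above. The function name pool_is_clean is hypothetical.

```shell
# Hypothetical helper (not a ZFS command): reads `zpool status -v` output on
# stdin and succeeds only if the errors line reads "No known data errors".
pool_is_clean() {
    grep -q '^[[:space:]]*errors: No known data errors[[:space:]]*$'
}
```

It would be used as: zpool status -v production | pool_is_clean && echo "pool is clean"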