Tuesday 22 May 2018

How to resolve the ZPOOL error "Permanent errors have been detected in the following files"


Fix Zpool Permanent Errors:-


The backend storage was powered off for storage maintenance.

After the storage came back online and the system was powered on, the pool below came up with an error.

Error logs:-

[root@proddb ~]# zpool status -v


pool: production
state: ONLINE 
status: One or more devices has experienced an error resulting in data 
corruption. Applications may be affected. 
action: Restore the file in question if possible. Otherwise restore the 
entire pool from backup. 
see: http://zfsonlinux.org/msg/ZFS-8000-8A 
scan: scrub repaired 0 in 44h14m with 0 errors on Sat May 19 07:07:03 2018

config: 

NAME STATE READ WRITE CKSUM 

act_per_pool000 ONLINE 0 0 0 
pci-0000:03:00.0-scsi-0:0:2:0 ONLINE 0 0 0 
pci-0000:03:00.0-scsi-0:0:3:0 ONLINE 0 0 0 
pci-0000:03:00.0-scsi-0:0:5:0 ONLINE 0 0 0 

errors: Permanent errors have been detected in the following files: 

production/sda:<0x1q> 
production/sdb:<0x1c> 
production/sdc:<0x1d> 
production/sdd:<0x1e> 



Explanation :

That error is telling you that inodes (ZFS object numbers) <0x1q>, <0x1c>, <0x1d> and <0x1e> are corrupt. Deleting the file broke the filename->inode mapping, so it's just reporting the inode now instead of a path. Either something still has the file open, or the metadata just needs to be cleaned up (which a scrub should do).

If a scrub won't clear the error, you need to get down and dirty with zdb, which is not publicly documented by Oracle (and poorly documented elsewhere). At any rate, needing zdb probably indicates something more fundamentally wrong.
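To see exactly which entries the pool is complaining about, the list under "errors:" can be pulled out of the status output. The following is only a rough sketch: the function name list_permanent_errors is made up for this post, and it assumes the status text has the same layout as the log shown above.

```shell
# Hypothetical helper: read `zpool status -v` output on stdin and print
# each entry listed after the "errors: Permanent errors ..." line.
list_permanent_errors() {
    awk '
        # The list of affected entries starts after this line.
        /^[ \t]*errors: Permanent errors/ { in_list = 1; next }
        # Print every non-empty line of the list, trimmed of padding.
        in_list && NF > 0 {
            sub(/^[ \t]+/, "")
            sub(/[ \t]+$/, "")
            print
        }
    '
}
```

It would be used as: zpool status -v production | list_permanent_errors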



Resolution :


Scrub the pool and wait for the scrub to complete. zpool needs to take another look at the pool and update its records.

A zpool scrub will do that. To start the scrub, use the command below.

# zpool scrub production

On this pool the scrub will take 2 hours to complete. It is not strictly necessary to wait all that time. zpool seems to run a check just before the scrub terminates, so if the scrub is stopped right after it starts, it still does the check and marks the pool as clean.

To stop the scrub, use the command below.

# zpool scrub -s production
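The start-then-stop sequence above can be wrapped in a small function. This is a minimal sketch, not a tested production script: the function name scrub_and_cancel and the ZPOOL and SCRUB_DELAY variables are assumptions added here for illustration, and the 5 second default delay is arbitrary. Only the zpool scrub, zpool scrub -s and zpool status -v commands come from this post.

```shell
# Hypothetical wrapper: start a scrub on a pool, cancel it after a short
# delay, then print the pool status.
#   ZPOOL        command to run instead of zpool (assumed override)
#   SCRUB_DELAY  seconds to wait before cancelling (assumed, default 5)
scrub_and_cancel() {
    pool=$1
    if [ -z "$pool" ]; then
        echo "usage: scrub_and_cancel POOL" >&2
        return 2
    fi

    # Start the scrub; give up if it cannot be started.
    "${ZPOOL:-zpool}" scrub "$pool" || return 1

    sleep "${SCRUB_DELAY:-5}"

    # Cancel the scrub. A failure here is reported but not treated as
    # fatal, since the scrub may already have finished on a small pool.
    "${ZPOOL:-zpool}" scrub -s "$pool" ||
        echo "could not cancel scrub on $pool (it may have finished)" >&2

    # Show the result so the errors: line can be checked.
    "${ZPOOL:-zpool}" status -v "$pool"
}
```

It would be called as: scrub_and_cancel production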



Results :

After canceling the scrub, check the pool status again. The pool is now clean and healthy.

[root@proddb ~]# zpool status -v
pool: production
state: ONLINE 
scan: scrub canceled on Mon May 21 14:47:13 2018


config: 


NAME STATE READ WRITE CKSUM 


act_per_pool000 ONLINE 0 0 0 

pci-0000:03:00.0-scsi-0:0:2:0 ONLINE 0 0 0 
pci-0000:03:00.0-scsi-0:0:3:0 ONLINE 0 0 0 
pci-0000:03:00.0-scsi-0:0:5:0 ONLINE 0 0 0 


errors: No known data errors 
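The same check can be scripted instead of read by eye. The sketch below simply looks for the "errors: No known data errors" line shown in the output above. The function name pool_is_clean is made up for this post, and matching on that exact message is an assumption about the status text rather than a documented interface.

```shell
# Hypothetical helper: read `zpool status` output on stdin and succeed
# (exit status 0) only if it contains the "No known data errors" line.
pool_is_clean() {
    grep -q '^[[:space:]]*errors: No known data errors'
}
```

It would be used as: zpool status -v production | pool_is_clean && echo "pool is clean"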
