[BlueOnyx:24970] Re: EXT4-FS error.
Arie Ceelie
BlueOnyx at Ceelie.info
Fri Jun 11 13:56:07 -05 2021
Michael,
There is no need for fdisk; I know the drive configuration. The disks are attached to a MegaRAID controller. SDA is the system/boot disk, a MegaRAID RAID 1 volume made up of two disks. SDB and SDC are the two disks that make up MD0.
I checked them all with MRM, which reports every drive as optimal. I also checked the RAID status in Webmin: the array is clean.
I looked up the DIDs and checked each disk with smartctl -a -d megaraid,DID /dev/sdX
SDC:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           41
  3 Spin_Up_Time            0x0027   173   169   021    Pre-fail  Always   -           4325
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           62
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always   -           75968
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           58
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           51
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always   -           10
194 Temperature_Celsius     0x0022   120   107   000    Old_age   Always   -           27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
SDB:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           16
  3 Spin_Up_Time            0x0027   173   170   021    Pre-fail  Always   -           2341
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           39
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   022   022   000    Old_age   Always   -           57667
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           39
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           34
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always   -           1872
194 Temperature_Celsius     0x0022   120   114   000    Old_age   Always   -           23
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline  -           0
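For anyone repeating the per-disk check behind a MegaRAID controller, the smartctl invocation used above can be assembled per disk with a small helper. This is only a sketch: the function name and the DID-to-device mapping are made up for illustration, and the real DIDs have to be looked up on the controller first. It only prints the commands, it does not run them.

```python
from typing import Dict, List


def smartctl_megaraid_cmd(device_id: int, block_device: str) -> List[str]:
    """Build the argv for querying one physical disk behind a MegaRAID controller."""
    return ["smartctl", "-a", "-d", f"megaraid,{device_id}", block_device]


# Hypothetical DID -> block device mapping, NOT taken from the server in this thread.
EXAMPLE_DISKS: Dict[int, str] = {4: "/dev/sdb", 5: "/dev/sdc"}

if __name__ == "__main__":
    for did, dev in EXAMPLE_DISKS.items():
        # Print the command line so it can be reviewed before running it as root.
        print(" ".join(smartctl_megaraid_cmd(did, dev)))
```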
Some raw read errors show up, but the disks are ancient. They came with the server when I bought it. One disk failed soon afterwards, so I replaced the two system/boot disks.
I guess they are EOL.
No problem though; the MD0 array is only there for overnight backups. I will be replacing the drives someday soon. Thanks for your help!
From: Michael Stauber
Sent: Friday 11 June 2021 19:32
To: blueonyx at mail.blueonyx.it
Subject: [BlueOnyx:24969] Re: EXT4-FS error.
Hi Arie,
> Secure log:
> Jun 9 23:53:01 nuserver kernel: EXT4-fs error (device dm-0):
> ext4_ext_remove_space:2976: inode #1836090: comm rm: pblk 7374840 bad
> header/extent: invalid magic - magic 2, entries 0, max 0(0), depth 0(0)
> Jun 9 23:53:01 nuserver kernel: EXT4-fs error (device dm-0) in
> ext4_ext_truncate:4688: IO failure
> Jun 10 23:54:41 nuserver kernel: EXT4-fs (dm-0): error count since last
> fsck: 2
> Jun 10 23:54:41 nuserver kernel: EXT4-fs (dm-0): initial error at time
> 1623275581: ext4_ext_remove_space:2976: inode 1836090
> Jun 10 23:54:41 nuserver kernel: EXT4-fs (dm-0): last error at time
> 1623275581: ext4_ext_truncate:4688: inode 1836090
Plugging the relevant part of the error message ("bad header/extent:
invalid magic") into a search engine like this ...
https://duckduckgo.com/?t=ffab&q=bad+header%2Fextent%3A+invalid+magic
... yields some answers.
The error messages listed above mean that the file system has been
corrupted. Running fsck, or more adequately e2fsck, to repair it might
help, but it also could be an indicator that the hard disk(s) in
question have hardware issues.
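As a side note, the "initial error at time 1623275581" value in the quoted log is a Unix timestamp (seconds since 1970-01-01 UTC), so it can be converted to check that it lines up with the first syslog entry. A minimal sketch follows; the helper name is made up, and the UTC+2 offset is an assumption about the server's local time zone, not something stated in the log.

```python
from datetime import datetime, timedelta, timezone


def kernel_error_time(epoch_seconds: int, utc_offset_hours: int = 0) -> datetime:
    """Convert an ext4 'error at time' value into a timezone-aware datetime."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz=tz)


# Timestamp copied from the "initial error at time" log line.
print(kernel_error_time(1623275581).isoformat())     # 2021-06-09T21:53:01+00:00
# Assumed offset of UTC+2 for the server's local time.
print(kernel_error_time(1623275581, 2).isoformat())  # 2021-06-09T23:53:01+02:00
```

With the assumed offset, the result matches the "Jun 9 23:53:01" syslog line, so both recorded errors appear to stem from the same rm run.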
When I see stuff like that I usually take a look at what the disk health
monitor says.
Check with "fdisk -l" to see what your disks are named and how many
there are. Usually the names start with something like /dev/sda. Then
use "smartctl" to poll the health state of each:
smartctl -a /dev/sda
There is a section in the output of that which looks similar to the text
block below, although for the purpose of this message I cut out a few
irrelevant columns to prevent word wrap:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     Always   -           0
  2 Throughput_Performance  Offline  -           70
  3 Spin_Up_Time            Always   -           271 (Average 302)
  4 Start_Stop_Count        Always   -           204
  5 Reallocated_Sector_Ct   Always   -           0
  7 Seek_Error_Rate         Always   -           0
  8 Seek_Time_Performance   Offline  -           33
  9 Power_On_Hours          Always   -           44589
 10 Spin_Retry_Count        Always   -           0
 12 Power_Cycle_Count       Always   -           204
192 Power-Off_Retract_Count Always   -           823
193 Load_Cycle_Count        Always   -           823
194 Temperature_Celsius     Always   -           40 (Min/Max 23/61)
196 Reallocated_Event_Count Always   -           0
197 Current_Pending_Sector  Always   -           0
198 Offline_Uncorrectable   Offline  -           0
199 UDMA_CRC_Error_Count    Always   -           0
The interesting parts as far as errors go are:
Raw_Read_Error_Rate
Reallocated_Sector_Ct
Seek_Error_Rate
Reallocated_Event_Count
Current_Pending_Sector
Offline_Uncorrectable
UDMA_CRC_Error_Count
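Scanning for those attributes can be automated. The sketch below assumes the full ten-column attribute table that "smartctl -a" prints (not the trimmed version shown above) and reports which of the listed attributes have a non-zero RAW_VALUE. The function names are made up for illustration. Raw values are vendor specific, so a non-zero number is a hint to look closer rather than proof of a failing disk.

```python
from typing import Dict

# The error-related attributes named in the list above.
ERROR_ATTRIBUTES = {
    "Raw_Read_Error_Rate",
    "Reallocated_Sector_Ct",
    "Seek_Error_Rate",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Map each attribute name to the leading integer of its RAW_VALUE column."""
    raw_values: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if raw.isdigit():
            raw_values[name] = int(raw)
    return raw_values


def nonzero_error_attributes(text: str) -> Dict[str, int]:
    """Return only the error-related attributes whose raw value is not zero."""
    values = parse_smart_attributes(text)
    return {n: v for n, v in values.items() if n in ERROR_ATTRIBUTES and v != 0}


# A few rows in the full smartctl layout, used here purely as sample input.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 41
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
  9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 75968
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
"""

print(nonzero_error_attributes(SAMPLE))  # {'Raw_Read_Error_Rate': 41}
```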
All in all, the disk whose output I showed above is surprisingly free of
errors, although "Power_On_Hours" shows it has been running for 44589
hours, which is slightly more than five years. I prefer to swap out
disks after 4-5 years of usage, so this one will be replaced soonish,
even though it has behaved very well so far.
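The conversion from "Power_On_Hours" to years is simple arithmetic and can be applied to any of the disks in this thread. A minimal sketch, assuming 365-day years and using a made-up helper name:

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap days


def power_on_years(hours: int) -> float:
    """Convert a Power_On_Hours raw value into years of continuous operation."""
    return hours / HOURS_PER_YEAR


# Raw Power_On_Hours values taken from the smartctl outputs in this thread.
for label, hours in [("example disk", 44589), ("SDB", 57667), ("SDC", 75968)]:
    print(f"{label}: {hours} h = {power_on_years(hours):.2f} years")
# example disk: 44589 h = 5.09 years
# SDB: 57667 h = 6.58 years
# SDC: 75968 h = 8.67 years
```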
If a drive has critical (recent) errors, "smartctl" might also report
this in other parts of its lengthy output, so it's worth studying all of
it and giving it some thought.
--
With best regards
Michael Stauber
_______________________________________________
Blueonyx mailing list
Blueonyx at mail.blueonyx.it
http://mail.blueonyx.it/mailman/listinfo/blueonyx