[BlueOnyx:24970] Re: EXT4-FS error.

Fri Jun 11 13:56:07 -05 2021

Michael,

Fdisk isn't necessary, I know the drive configuration. The disks are on a MegaRAID controller. SDA is the system/boot disk, a MegaRAID RAID 1 volume made up of two disks. SDB and SDC are the two disks that make up MD0.
I checked them all with MRM, and they all report as optimal. I also checked the RAID status in Webmin: the array is clean.
I looked up the device IDs (DIDs) and checked each disk with smartctl -a -d megaraid,DID /dev/sdX

SDC:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       41
  3 Spin_Up_Time            0x0027   173   169   021    Pre-fail  Always       -       4325
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       62
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       75968
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       58
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       51
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       10
194 Temperature_Celsius     0x0022   120   107   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

SDB:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       16
  3 Spin_Up_Time            0x0027   173   170   021    Pre-fail  Always       -       2341
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       39
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   022   022   000    Old_age   Always       -       57667
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       39
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       34
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1872
194 Temperature_Celsius     0x0022   120   114   000    Old_age   Always       -       23
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
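
The per-disk check that produced the tables above could be scripted roughly as follows. This is only a sketch: the function name check_megaraid_disks is made up for illustration, and the device IDs in the usage line are placeholders, not the IDs from this server.

```shell
# Sketch: dump SMART data for each physical disk behind a MegaRAID controller.
# The block device and the device IDs are passed as arguments; the values in
# the usage line at the bottom are placeholders (assumption).
check_megaraid_disks() {
    dev="$1"
    shift
    for did in "$@"; do
        echo "=== $dev, megaraid device ID $did ==="
        # -d megaraid,N addresses physical disk N; -a prints all SMART info
        smartctl -a -d "megaraid,$did" "$dev"
    done
}

# Usage (hypothetical IDs): check_megaraid_disks /dev/sdb 4 5
```

Passing the IDs as arguments keeps the loop independent of how the controller happens to number the disks.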

Some raw read errors turned up, but the disks are ancient. They came with the server when I bought it. I soon had a disk failure, so I replaced the two system/boot disks.
I guess these are EOL.

No problem though, the MD0 array is just there for overnight backups. I will be replacing the drives sometime soon. Thanks for your help!

From: Michael Stauber
Sent: Friday, June 11, 2021 19:32
To: blueonyx at mail.blueonyx.it
Subject: [BlueOnyx:24969] Re: EXT4-FS error.

Hi Arie,

> Secure log:
> Jun  9 23:53:01 nuserver kernel: EXT4-fs error (device dm-0):
> ext4_ext_remove_space:2976: inode #1836090: comm rm: pblk 7374840 bad
> header/extent: invalid magic - magic 2, entries 0, max 0(0), depth 0(0)
> Jun  9 23:53:01 nuserver kernel: EXT4-fs error (device dm-0) in
> ext4_ext_truncate:4688: IO failure
> Jun 10 23:54:41 nuserver kernel: EXT4-fs (dm-0): error count since last
> fsck: 2
> Jun 10 23:54:41 nuserver kernel: EXT4-fs (dm-0): initial error at time
> 1623275581: ext4_ext_remove_space:2976: inode 1836090
> Jun 10 23:54:41 nuserver kernel: EXT4-fs (dm-0): last error at time
> 1623275581: ext4_ext_truncate:4688: inode 1836090
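
The "initial error at time" and "last error at time" values in the log above are seconds since the Unix epoch. A small sketch for turning them into a readable date, assuming GNU date (BSD date uses -r SECONDS instead of -d @SECONDS); the helper name epoch_to_utc is made up for illustration:

```shell
# Convert epoch seconds from the kernel message into a UTC timestamp.
# Assumes GNU date; the "@" prefix tells -d to read seconds since the epoch.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 1623275581    # prints 2021-06-09 21:53:01 UTC
```

That is exactly two hours behind the "Jun  9 23:53:01" syslog timestamp, which would fit a server clock running at UTC+2. That offset is an inference from the two timestamps, not something the log states.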

Plugging the relevant part of the error message ("bad header/extent:
invalid magic") into a search engine like this ...

https://duckduckgo.com/?t=ffab&q=bad+header%2Fextent%3A+invalid+magic

... yields some answers.

The error messages listed above mean that the file system has been
corrupted. Running fsck, or more precisely e2fsck, to repair it might
help, but the errors could also be an indicator that the hard disk(s)
in question have hardware issues.
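
As a cautious first step, a read-only check might look like the sketch below. The function name check_ext4_readonly and the device path in the usage lines are placeholders; the log only names the device as dm-0, so the actual path has to be looked up on the server (for example with lsblk).

```shell
# Sketch: read-only consistency check of an ext4 filesystem.
# -f forces a check even if the filesystem is marked clean;
# -n opens it read-only and answers "no" to every repair prompt.
check_ext4_readonly() {
    e2fsck -f -n "$1"
}

# Usage (device path is a placeholder, not taken from the log):
#   umount /dev/mapper/EXAMPLE
#   check_ext4_readonly /dev/mapper/EXAMPLE
# An actual repair would drop -n and should only run on an unmounted filesystem.
```

Running with -n first shows what e2fsck would change without touching the disk, which seems prudent when the underlying hardware is itself under suspicion.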

When I see stuff like that I usually take a look at what the disk health
monitor says.

Check with "fdisk -l" to see what your disks are named and how many
there are. Usually the names start with something like /dev/sda. Then use
"smartctl" to poll the health state of each:

smartctl -a /dev/sda

There is a section in the output of that which looks similar to the text
block below, although for the purpose of this message I cut out a few
irrelevant columns to prevent word wrap:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     Always       -       0
  2 Throughput_Performance  Offline      -       70
  3 Spin_Up_Time            Always       -       271 (Average 302)
  4 Start_Stop_Count        Always       -       204
  5 Reallocated_Sector_Ct   Always       -       0
  7 Seek_Error_Rate         Always       -       0
  8 Seek_Time_Performance   Offline      -       33
  9 Power_On_Hours          Always       -       44589
 10 Spin_Retry_Count        Always       -       0
 12 Power_Cycle_Count       Always       -       204
192 Power-Off_Retract_Count Always       -       823
193 Load_Cycle_Count        Always       -       823
194 Temperature_Celsius     Always       -       40 (Min/Max 23/61)
196 Reallocated_Event_Count Always       -       0
197 Current_Pending_Sector  Always       -       0
198 Offline_Uncorrectable   Offline      -       0
199 UDMA_CRC_Error_Count    Always       -       0

The interesting parts as far as errors go are:

Raw_Read_Error_Rate
Reallocated_Sector_Ct
Seek_Error_Rate
Reallocated_Event_Count
Current_Pending_Sector
Offline_Uncorrectable
UDMA_CRC_Error_Count
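
To pick just these attributes out of a full attribute table, a filter along the following lines could be used. It is a sketch that assumes the untrimmed ten-column layout printed by "smartctl -A", where RAW_VALUE is the tenth field; the function name smart_nonzero_errors is made up for illustration.

```shell
# Sketch: read a full "smartctl -A" attribute table on stdin and print the
# error-related attributes whose RAW_VALUE (10th column) is not zero.
smart_nonzero_errors() {
    awk '($2 ~ /^(Raw_Read_Error_Rate|Reallocated_Sector_Ct|Seek_Error_Rate|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/) && ($10 + 0 != 0) { print $2, $10 }'
}

# Example with two rows in the full ten-column layout:
smart_nonzero_errors <<'EOF'
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       41
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
EOF
# prints: Raw_Read_Error_Rate 41
```

On a live system the input would come from something like "smartctl -A /dev/sda | smart_nonzero_errors". A non-zero raw value is not automatically a failure: how the raw number is encoded is vendor specific, and what smartctl flags in WHEN_FAILED is the normalized VALUE dropping to THRESH or below.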

All in all the disk whose output I showed above is surprisingly free of
errors, although "Power_On_Hours" shows it has been running for 44589
hours, which is slightly more than five years. I prefer to swap out
disks after 4-5 years of usage, so this one will be replaced soonish,
even if it did behave very well so far.
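
The hours-to-years arithmetic is easy to repeat for any drive. A small sketch, assuming a 365-day year (8760 hours); the helper name hours_to_years is made up for illustration:

```shell
# Convert a Power_On_Hours raw value into years of continuous operation.
# 8760 = 365 days * 24 hours; LC_ALL=C keeps "." as the decimal separator.
hours_to_years() {
    LC_ALL=C awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 8760 }'
}

hours_to_years 44589    # prints 5.1
hours_to_years 75968    # prints 8.7
hours_to_years 57667    # prints 6.6
```

The first value is the disk discussed above; the other two are the Power_On_Hours raw values of the two MD0 disks as they appear in the reply at the top of this post.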

If a drive has critical (recent) errors, "smartctl" might also report
this in other parts of its lengthy output, so it's worth studying all of
it and giving it some thought.
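
If the full output is too long to scan comfortably, the overall health verdict and the drive's error log can also be requested on their own. A minimal sketch; the wrapper name smart_health_summary is made up for illustration:

```shell
# Sketch: show only the overall health self-assessment (-H) and the
# drive's logged errors (-l error). Extra arguments are passed through,
# so a "-d megaraid,N" selector can be added for disks behind a controller.
smart_health_summary() {
    smartctl -H -l error "$@"
}

# Usage:
#   smart_health_summary /dev/sda
#   smart_health_summary -d megaraid,N /dev/sdX    (N is the device ID)
```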

-- 
With best regards

Michael Stauber
_______________________________________________
Blueonyx mailing list
Blueonyx at mail.blueonyx.it
http://mail.blueonyx.it/mailman/listinfo/blueonyx
