[BlueOnyx:24969] Re: EXT4-FS error.

Fri Jun 11 12:27:07 -05 2021

Hi Arie,

> Secure log:
> Jun  9 23:53:01 nuserver kernel: EXT4-fs error (device dm-0):
> ext4_ext_remove_space:2976: inode #1836090: comm rm: pblk 7374840 bad
> header/extent: invalid magic - magic 2, entries 0, max 0(0), depth 0(0)
> Jun  9 23:53:01 nuserver kernel: EXT4-fs error (device dm-0) in
> ext4_ext_truncate:4688: IO failure
> Jun 10 23:54:41 nuserver kernel: EXT4-fs (dm-0): error count since last
> fsck: 2
> Jun 10 23:54:41 nuserver kernel: EXT4-fs (dm-0): initial error at time
> 1623275581: ext4_ext_remove_space:2976: inode 1836090
> Jun 10 23:54:41 nuserver kernel: EXT4-fs (dm-0): last error at time
> 1623275581: ext4_ext_truncate:4688: inode 1836090

Plugging the relevant part of the error message ("bad header/extent:
invalid magic") into a search engine like this ...

https://duckduckgo.com/?t=ffab&q=bad+header%2Fextent%3A+invalid+magic

... yields some answers.

The error messages listed above mean that the file system has been
corrupted. Running fsck, or more precisely e2fsck, to repair it might
help, but the corruption could also be an indicator that the hard
disk(s) in question have hardware issues.
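
As a rough sketch of how one might follow up on those kernel messages:
the "initial error at time" value is a Unix timestamp, and a forced
read-only e2fsck pass shows what a repair would touch without changing
anything. The function names are made up for illustration, GNU date is
assumed, and /dev/dm-0 is only a guess based on the "dm-0" in the log
("ls -l /dev/mapper" shows which volume that really is on your box):

```shell
#!/bin/sh
# Convert a Unix timestamp from the kernel log into readable UTC time.
# Assumes GNU date (the "-d @EPOCH" syntax).
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Forced, read-only check of an ext2/3/4 file system:
#   -f  check even if the file system is marked clean
#   -n  open read-only and answer "no" to every repair question
# The result is only reliable if the file system is not mounted.
fsck_dry_run() {
    e2fsck -fn "$1"
}

# Example (hypothetical device path, run as root, ideally from rescue media):
#   epoch_to_utc 1623275581
#   fsck_dry_run /dev/dm-0
```

For 1623275581 this prints 2021-06-09 21:53:01 (UTC), which would line
up with the 23:53:01 log entry if the server clock runs on UTC+2. An
actual repair run (without "-n") needs the file system unmounted.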

When I see stuff like that I usually take a look at what the disk health
monitor says.

Check with "fdisk -l" to see what your disks are named and how many
there are. Usually the names start with something like /dev/sda. Then
use "smartctl" to poll the health state of each:

smartctl -a /dev/sda

There is a section in the output of that which looks similar to the text
block below, although for the purpose of this message I cut out a few
irrelevant columns to prevent word wrap:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     Always       -       0
  2 Throughput_Performance  Offline      -       70
  3 Spin_Up_Time            Always       -       271 (Average 302)
  4 Start_Stop_Count        Always       -       204
  5 Reallocated_Sector_Ct   Always       -       0
  7 Seek_Error_Rate         Always       -       0
  8 Seek_Time_Performance   Offline      -       33
  9 Power_On_Hours          Always       -       44589
 10 Spin_Retry_Count        Always       -       0
 12 Power_Cycle_Count       Always       -       204
192 Power-Off_Retract_Count Always       -       823
193 Load_Cycle_Count        Always       -       823
194 Temperature_Celsius     Always       -       40 (Min/Max 23/61)
196 Reallocated_Event_Count Always       -       0
197 Current_Pending_Sector  Always       -       0
198 Offline_Uncorrectable   Offline      -       0
199 UDMA_CRC_Error_Count    Always       -       0

The interesting parts as far as errors go are:

Raw_Read_Error_Rate
Reallocated_Sector_Ct
Seek_Error_Rate
Reallocated_Event_Count
Current_Pending_Sector
Offline_Uncorrectable
UDMA_CRC_Error_Count
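
To avoid picking these out by eye, something along the following lines
could filter them from the attribute table. This is only a sketch: the
function name is hypothetical, and it assumes the usual untrimmed
10-column layout of "smartctl -A", where RAW_VALUE starts in column 10
(the table above has columns removed, so it would not parse as shown):

```shell
#!/bin/sh
# Read "smartctl -A" output on stdin and print the error-related
# attributes with their raw values, marking any value above zero.
# Assumes the standard 10-column table (RAW_VALUE in column 10).
smart_error_attrs() {
    awk '
        BEGIN {
            # Attribute names taken from the list in this message.
            n = split("Raw_Read_Error_Rate Reallocated_Sector_Ct " \
                      "Seek_Error_Rate Reallocated_Event_Count " \
                      "Current_Pending_Sector Offline_Uncorrectable " \
                      "UDMA_CRC_Error_Count", names, " ")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        ($2 in want) {
            flag = ($10 + 0 > 0) ? "  <-- non-zero" : ""
            printf "%-24s %s%s\n", $2, $10, flag
        }
    '
}

# Example (as root):
#   smartctl -A /dev/sda | smart_error_attrs
```

A non-zero value is a hint, not a verdict. Some drives report large raw
numbers for Raw_Read_Error_Rate and Seek_Error_Rate even when healthy,
so the reallocated and pending sector counts tend to be the more
telling ones.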

All in all the disk whose output I showed above is surprisingly free of
errors, although "Power_On_Hours" shows it has been running for 44589
hours, which is slightly more than five years. I prefer to swap out
disks after 4-5 years of usage, so this one will be replaced soonish,
even though it has behaved very well so far.
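
The hours-to-years conversion is simple enough to do in the shell. A
small sketch, assuming 8760 hours (365 days) per year and a made-up
function name:

```shell
#!/bin/sh
# Convert a Power_On_Hours raw value into years of continuous operation.
# Assumes 8760 hours per year; LC_ALL=C keeps "." as the decimal point.
hours_to_years() {
    LC_ALL=C awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 8760 }'
}

# Example:
#   hours_to_years 44589
```

For the 44589 hours above this gives 5.1.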

If a drive has critical (recent) errors, "smartctl" might also report
this in other parts of its lengthy output, so it's worth studying all of
it and giving it some thought.
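
If the full "-a" output is too much at once, smartctl can also print
those other parts individually. A minimal sketch, with a hypothetical
wrapper name and /dev/sda only as an example device:

```shell
#!/bin/sh
# Print the parts of the SMART report that most directly show trouble.
smart_summary() {
    smartctl -H "$1"           # overall health self-assessment
    smartctl -l error "$1"     # errors the drive itself has logged
    smartctl -l selftest "$1"  # results of earlier self-tests
}

# Example (as root):
#   smart_summary /dev/sda
```

An empty error log and a passing health assessment are reassuring, but
they do not rule out a failing disk, so the attribute table is still
worth a look.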

-- 
With best regards

Michael Stauber