If a server process is blocking on accept(), wouldn’t it count as hung until a remote client connects? or do only certain operations count?
static void check_hung_task(struct task_struct *t, unsigned long timeout) https://github.com/torvalds/linux/blob/9f16d5e6f220661f73b36...
static void check_hung_uninterruptible_tasks(unsigned long timeout) https://github.com/torvalds/linux/blob/9f16d5e6f220661f73b36...
Is this saying that two kinds of tasks are counted: regular tasks that haven't been scheduled in two minutes, and truly uninterruptible tasks (not idle, and not killable despite being marked uninterruptible) that haven't been woken up in two minutes?
The zombie process state is a normal transient state for all exiting processes where the only remaining function of the process is as a container for the exiting process's id and exit status; they go away once the parent process calls some flavor of the "wait" system call to collect the exit status. A pileup of zombies indicates a userspace bug: a negligent parent process that isn't collecting the exit status in a timely manner.
INFO: task btrfs:103945 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Until eventually:

Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
So I'm looking forward to getting an actual count of how often this happens without needing to babysit the warning suppressions and count the incidents myself.

On my home media server, however, I'm using ZFS in a RAID array, with regular scrubs and snapshots. ZFS has many features (RAID, scrubs, CoW, snapshots, etc.) that you just don't get on ext4, and unlike btrfs, ZFS seems to have a great reputation for reliability with all of those features.
I haven't missed out on any zfs or btrfs features. Yes, I know about their benefits, and no, I don't care if a few bits flip here or there over time.
Note also that modern filesystems do a lot of background work that doesn't strictly need to be done immediately for correctness.
(of course, it also seems common for people to completely disregard btrfs's well-documented "this feature is unreliable, don't use it" warnings, then complain that they have problems, without mentioning that they ignored the warnings until everyone is halfway through the thread)
The only problems I've encountered in all my years of using btrfs are:
* when (all copies of) a file bitrots on disk, you can't read it at all, rather than being able to copy the mostly-correct file and see if you can hand-correct it into something usable
* if you enable new compression algorithms on your btrfs volume, you can't read your data from old kernels (often on liveusb recovery disks)
* fsync is slow. Like, really really slow. And package managers designed for shitty CoW-less filesystems use fsync a lot.
In my case, I don't think this machine ever commits more than around 5GB of its 32GB available memory, so I doubt it's that.
> it also seems common for people to completely disregard the well-documented "this feature is unreliable, don't use it" warnings that btrfs has
Now, that I am definitely doing. I won't give up raid6 until it eats all my data for a fourth time.
That's really the issue at heart, because I've seen these on zfs as well... but you'd think the filesystem would report some progress to keep bumping the timer so it doesn't start spamming dmesg. /shrug