I'm observing some strange kernel behavior since updating to F43. Memory usage steadily increases until the system starts OOMing (and other badness) and I need to reboot. One time when I caught this, I tried shutting down essentially all services and daemons, and yet a significant fraction of memory was still not released, despite no visible process (or tmpfs mount that I could see) using significant RAM. Unless I missed something hiding in plain sight, this suggests there's a kernel-side memory leak.
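(Not part of the original report, just to make the cross-check concrete: a minimal Python sketch that sums the RSS of every visible process and compares it against the "used" memory derived from /proc/meminfo. A large, growing gap with no process or tmpfs mount to blame is the kind of discrepancy described above. Note that RSS double-counts shared memory, so if anything the per-process total overestimates userspace usage.)

```python
#!/usr/bin/env python3
"""Cross-check: does the RSS of all visible processes account for the memory
the kernel says is in use? A big unaccounted chunk that keeps growing, with
no tmpfs holding it either, points at kernel-side allocations."""
import os

def meminfo_kb():
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            fields[key] = int(rest.strip().split()[0])  # values are in kB
    return fields

def total_rss_kb():
    total = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        total += int(line.split()[1])  # kB
                        break
        except OSError:
            pass  # process exited or is inaccessible; skip it
    return total

m = meminfo_kb()
used = m["MemTotal"] - m["MemFree"] - m["Cached"] - m["Buffers"] - m["SReclaimable"]
rss = total_rss_kb()
print(f"Used per meminfo math: {used / 1024:10.1f} MiB")
print(f"Sum of process RSS:    {rss / 1024:10.1f} MiB")
print(f"Unaccounted for:       {(used - rss) / 1024:10.1f} MiB")
```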
This is a j474 server, so there is no UI or significant GPU usage (the GPU device is not normally open, and I don't think anything would try to open it transiently, though I'm not entirely sure).
Note that the machine had been stable for a long time; there is a clear change in behavior on 3/12, when I updated to F43. On 3/27 I added swap (zswap, replacing zram), but that didn't really help.
Kernel versions:
- Prior to the regression: 6.14.8-400.asahi.fc42
- After the regression: 6.18.15-400.asahi.fc43
- As of last night: 6.19.13-400.asahi.fc43
I will report back on whether the new upgrade fixes anything, though it'll take a couple of weeks to become evident in the graphs.
Poking around Node Exporter stats, these are the most relevant ones:
So Active + Inactive memory (which is basically userspace memory) remains relatively steady throughout. However, there is a very clear change in behavior after the regression date. Before, MemAvailable closely tracked Cached. After, the kernel counts the leaked RAM as MemAvailable, but it is not cached (and it is definitely not actually available: I OOMed the system while testing this with essentially all userspace shut down and a high MemAvailable reading). The leaked RAM is also reflected in Committed_AS.
For reference, the math used for "memory free" in the first screenshot is MemTotal - MemFree - (Cached + Buffers + SReclaimable). That is essentially MemTotal minus a manual MemAvailable-style estimate (free memory plus reclaimable caches), and it diverges from the MemAvailable value the kernel itself reports.
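(Again, not from the original post: a hedged sketch of that computation straight from /proc/meminfo, to make the comparison against the kernel's own MemAvailable and Committed_AS easy to reproduce.)

```python
#!/usr/bin/env python3
"""Compare the kernel's reported MemAvailable against the manual
MemAvailable-style estimate described above (all values from /proc/meminfo)."""

def read_meminfo():
    # Lines look like "MemTotal:       16311100 kB"; values are in kB.
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            fields[key] = int(rest.strip().split()[0])
    return fields

m = read_meminfo()

# Manual estimate: free memory plus reclaimable caches.
manual_available = m["MemFree"] + m["Cached"] + m["Buffers"] + m["SReclaimable"]
# The "memory free" graph math: total minus the manual estimate,
# i.e. memory that neither free pages nor caches account for.
unaccounted = m["MemTotal"] - manual_available

for label, kb in [
    ("MemTotal", m["MemTotal"]),
    ("MemAvailable (kernel)", m["MemAvailable"]),
    ("Manual estimate", manual_available),
    ("MemTotal - manual estimate", unaccounted),
    ("Committed_AS", m["Committed_AS"]),
]:
    print(f"{label:28s} {kb / 1024:10.1f} MiB")
```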
This machine is running a bunch of stuff: on the kernel side, perhaps most notably, a couple of CephFS mounts, plus a number of daemons, services, and periodic timers, so it's hard to pin down what the trigger could be.