Kubeadapt Status

History

Apr 2026-Jul 2026

July

No incidents reported

June

No incidents reported

May

Thu, May 21

Degraded performance for external API tenants

4:17 PM

This incident has been resolved. All services are operating normally.

Summary: A failing physical disk under one of our Ceph OSDs caused slow I/O operations on our Postgres primary. The slow operations held database connections open longer than usual, which saturated the connection pool. Once the pool was saturated, any read or write request that could not acquire a connection within our timeout window received an HTTP 500 error from the Public API. During the period of degraded storage performance, two Postgres read replicas also accumulated WAL replay inconsistencies that prevented them from catching up automatically.
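For readers who want to see the failure mode in miniature, the sketch below shows how slow queries that hold connections can exhaust a bounded pool, so that later requests time out while waiting and are answered with a 500. It is an illustration only and does not reflect the actual Public API implementation: `ConnectionPool`, `PoolTimeout`, `handle_request`, the pool size, and the timeout value are all hypothetical.

```python
import threading
from contextlib import contextmanager


class PoolTimeout(Exception):
    """Raised when no connection frees up within the acquire timeout."""


class ConnectionPool:
    """Hypothetical bounded pool; a semaphore stands in for real connections."""

    def __init__(self, size: int, acquire_timeout: float) -> None:
        self._slots = threading.BoundedSemaphore(size)
        self._acquire_timeout = acquire_timeout

    @contextmanager
    def connection(self):
        # Wait up to acquire_timeout seconds for a free slot.
        if not self._slots.acquire(timeout=self._acquire_timeout):
            raise PoolTimeout("connection pool saturated")
        try:
            yield object()  # stand-in for a real database connection
        finally:
            self._slots.release()


def handle_request(pool: ConnectionPool, query) -> int:
    """Return an HTTP-style status: 200 on success, 500 if the pool is saturated."""
    try:
        with pool.connection() as conn:
            query(conn)
        return 200
    except PoolTimeout:
        return 500


if __name__ == "__main__":
    pool = ConnectionPool(size=2, acquire_timeout=0.05)
    storage_recovered = threading.Event()
    holding = threading.Barrier(3)  # two slow queries plus the main thread

    def slow_query(conn) -> None:
        holding.wait()            # signal that this query holds a connection
        storage_recovered.wait()  # blocked on slow I/O until storage recovers

    workers = [
        threading.Thread(target=handle_request, args=(pool, slow_query))
        for _ in range(2)
    ]
    for w in workers:
        w.start()
    holding.wait()  # both connections are now held by slow queries

    print(handle_request(pool, lambda conn: None))  # pool saturated
    storage_recovered.set()
    for w in workers:
        w.join()
    print(handle_request(pool, lambda conn: None))  # pool healthy again
```

In this toy model the pool recovers as soon as the slow queries finish, which mirrors why removing the source of slow I/O restores normal request handling.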

We marked the affected OSD out of the Ceph cluster, allowed the cluster to rebalance to the remaining healthy OSDs, and then rebuilt the two affected Postgres replicas from the primary to restore full replication health.

We are conducting an internal review to improve our detection of slow OSD operations and replication lag. A more detailed post-mortem will be shared with affected customers within 48 hours.

April

No incidents reported