24 February 2026
Introduction
Dear Valued Amino Customer,
This incident report is to inform you that on 18 February 2026, Amino identified a service disruption affecting the Engage management webpage and API service. The incident was caused by disk exhaustion on a Kubernetes node hosting the application, which resulted in the management interface becoming temporarily unavailable. STB management and TR-069 services remained operational throughout the event. The issue was resolved in the early hours of 19 February 2026, and corrective measures have been implemented to prevent recurrence.
Incident
On 18 February 2026 at approximately 15:30 (UTC+0), the Engage server management webpage became inaccessible, returning an HTTP 503 (Service Unavailable) error. The issue persisted until 19 February 2026 at approximately 02:00 (UTC+0), when the management page was restored and normal operation resumed.
Cause
Our investigation concluded that the outage was caused by disk exhaustion on the Kubernetes node hosting the application. The container runtime storage volume reached 100% utilisation, leaving no available disk space. The application detected this condition and performed a graceful shutdown, which took the management interface offline. Further analysis is ongoing to determine the underlying cause of the excessive storage consumption.
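For illustration only, the sketch below shows one way a service might watch its storage volume and shut down gracefully once the disk is full. The data path, polling interval, and exit behaviour are assumptions chosen for the example; this is not Amino's actual implementation.

```python
# Illustrative sketch only; DATA_PATH and the polling interval are
# hypothetical values, not Amino's actual configuration.
import shutil
import sys
import time

DATA_PATH = "/var/lib/app"      # hypothetical application data volume
CHECK_INTERVAL_SECONDS = 30     # hypothetical polling interval


def disk_is_full(path: str) -> bool:
    """Return True when the volume backing `path` has no free space left."""
    usage = shutil.disk_usage(path)
    return usage.free == 0


def main() -> None:
    while True:
        if disk_is_full(DATA_PATH):
            # Log the condition and exit with a nonzero status so the
            # orchestrator records the failure and can reschedule the pod.
            print("Disk exhausted; performing graceful shutdown", file=sys.stderr)
            sys.exit(1)
        time.sleep(CHECK_INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```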
Impact
The Engage management webpage (UI) and the Engage API service were impacted during the incident. STB management and TR-069 services remained operational, and the server continued processing STB requests using preconfigured settings.
Resolution
A backup was taken, the faulty storage volume was removed, and a new one was provisioned. The management service was then redeployed on the new volume, restoring the affected services to normal operation.
To prevent recurrence, monitoring alerts will be enabled for storage utilisation on the Kubernetes nodes to provide proactive warnings and allow maintenance to be performed before similar issues arise. Further investigation will also be conducted to better understand storage consumption patterns on the application nodes and to implement long-term preventive measures.
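As a rough illustration of the kind of alerting described above, the sketch below checks a node's container storage path against a warning threshold. The path, threshold, and alert delivery are assumptions made for the example and do not reflect Amino's monitoring configuration.

```python
# Illustrative sketch; the storage path and threshold are assumptions.
import shutil

NODE_STORAGE_PATH = "/var/lib/containerd"  # hypothetical container runtime volume
WARN_THRESHOLD = 0.80                      # hypothetical: warn at 80% utilisation


def utilisation(path: str) -> float:
    """Fraction of the volume backing `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check() -> None:
    level = utilisation(NODE_STORAGE_PATH)
    if level >= WARN_THRESHOLD:
        # In a real deployment this would fire a monitoring alert or page
        # an operator; here it simply prints a warning.
        print(f"WARNING: {NODE_STORAGE_PATH} at {level:.0%} utilisation")


if __name__ == "__main__":
    check()
```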
Follow-up Actions
Enable storage utilisation monitoring and alerting on all Kubernetes nodes to provide early warning before disk usage reaches critical levels.
Review and optimise container image management and garbage collection policies to prevent unnecessary accumulation of container layers.
Evaluate and adjust storage capacity sizing for Kubernetes nodes, where required.
Conduct a detailed analysis of storage consumption patterns on application nodes to identify abnormal growth or inefficiencies (an illustrative starting point is sketched after this list).
Establish a periodic maintenance review process to ensure adequate disk capacity and prevent recurrence of similar incidents.
Enhance the team coverage model to strengthen monitoring oversight and maintain support continuity during staff absences.
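For the storage-consumption analysis item above, a simple starting point is to rank the largest directories on a node's storage volume. The sketch below is a generic example under an assumed root path; it is not the tooling Amino uses for its investigation.

```python
# Illustrative sketch; the root path and report depth are assumptions.
import os

ROOT = "/var/lib"  # hypothetical volume root to analyse
TOP_N = 10         # number of directories to report


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda e: None):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file removed or unreadable; skip it
    return total


def main() -> None:
    entries = [os.path.join(ROOT, d) for d in os.listdir(ROOT)]
    dirs = [p for p in entries if os.path.isdir(p)]
    sizes = sorted(((directory_size(p), p) for p in dirs), reverse=True)
    for size, path in sizes[:TOP_N]:
        print(f"{size / 1024**3:8.2f} GiB  {path}")


if __name__ == "__main__":
    main()
```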