Orange - Partial Outage

Incident Report for PEAK 15 Systems

Postmortem

Summary

On February 19 at approximately 4:30 AM PST, users experienced intermittent issues loading embedded content within the application. The issue was resolved by 6:10 AM PST after our Engineering team identified the affected server and restarted both servers to restore service.

All systems are now operating normally.

What Happened

Customer support tickets alerted our team to inconsistent behavior affecting embedded content. Investigation determined that the issue was limited to one server that was not consistently delivering certain application components. Because traffic continued to be distributed across both servers, some requests succeeded while others failed, which produced intermittent failures rather than a full outage.

Impact

During the incident window, some users may have experienced:

  • Embedded content not loading in certain areas of the application
  • Pages appearing partially complete

No other functionality, customer data, integrations, or security systems were affected.

Timeline (PST)

  • 4:30 AM — Issue escalated to Engineering and Advanced Support after customer reports
  • 5:00 AM — Investigation began
  • 5:50 AM — Issue identified as isolated to one server
  • 5:50–6:00 AM — Both servers restarted to ensure stability
  • 6:10 AM — System checks confirmed normal operation

Resolution

To restore consistent service, our team restarted both servers to clear the condition affecting one node and verified that all components were loading properly across the environment. Monitoring confirmed stable operation before the incident was closed.

Root Cause

One server entered a partially degraded state that prevented it from reliably delivering certain embedded application components. While the server appeared healthy from an infrastructure standpoint, a required internal process was not functioning correctly, which caused intermittent failures depending on which server handled a request.

This incident presented symptoms similar to those of a prior event but originated in a different system layer.

Preventive Actions

We are continuing work on the following improvements:

  • Enhanced monitoring to better detect issues affecting individual servers
  • Additional validation checks following system restarts or maintenance
  • Improved diagnostics to speed identification of node-specific issues

We previously identified the need for improved per-server monitoring following an earlier incident and have been evaluating solutions. While this capability has not yet been fully implemented, it remains a priority, and this event reinforces its importance.

We apologize for any inconvenience this may have caused and appreciate your patience while our team worked to resolve the issue. If you continue to experience any problems, please contact our support team.

Posted Feb 19, 2026 - 10:10 PST

Resolved

Today at approximately 4:30 AM PST, users experienced intermittent outages affecting iframes across the system. The issue persisted until approximately 6:00 AM PST, when our team successfully implemented a resolution.
Posted Feb 19, 2026 - 04:30 PST