Red - System Down

Incident Report for PEAK 15 Systems

Postmortem

Incident Summary
On June 30, 2025 at 11:30 PM PT, PEAK 15 Systems experienced a temporary outage. Customer applications were unavailable for about 26 minutes. Full service returned by 11:56 PM PT.

What Happened

  • 11:30 PM PT: Our monitoring system alerted us that services were down.
  • 11:32 PM PT: We published an update on our status page to let customers know we were investigating.
  • 11:45 PM PT: Our team discovered that our main database server had lost its network connection and was unable to process requests.
  • 11:52 PM PT: We reconnected the server and restarted the database service.
  • 11:56 PM PT: We confirmed all applications were back online and updated the status page.

Customer Impact

  • Services were not accessible for up to 26 minutes.
  • No customer data was lost or corrupted.

Root Cause
Our main database server lost its network connection, and the backup server did not take over automatically because automatic switch‑over had not yet been fully configured.

How We Fixed It

  1. Reconnected the database server to our system.
  2. Restarted the database service.
  3. Verified that all customer applications were working.

Preventative Measures

  • Automatic Switch‑Over: Finish configuring automatic switch‑overs so a backup server can take over without delay.
  • Better Alerts: Add new alerts to catch connection problems sooner.
  • Regular Drills: Run routine tests to make sure automatic switch‑overs work as expected.
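
For readers who want a concrete picture of the measures above, here is a minimal sketch in Python of how a connection check could feed both an earlier alert and an automatic switch‑over decision. It is an illustration only and does not describe our production tooling: the names FailoverMonitor and can_connect, the thresholds, and the timeout are all assumptions made for this example.

```python
import socket
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Counts consecutive failed checks and says what to do next.

    Both thresholds are hypothetical values chosen for illustration.
    """
    alert_after: int = 2      # consecutive failures before alerting
    failover_after: int = 3   # consecutive failures before switching over
    failures: int = 0

    def record(self, healthy: bool) -> str:
        """Record one check result and return the suggested action."""
        if healthy:
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures >= self.failover_after:
            return "failover"
        if self.failures >= self.alert_after:
            return "alert"
        return "retry"


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A drill could feed simulated results into record() instead of live checks, for example three failures in a row, and confirm that the sequence "retry", "alert", "failover" comes back before anything is changed in production.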

Next Steps

  • Complete the setup for automatic switch‑overs.
  • Update our internal procedures and train the team.
  • Schedule quarterly tests and share the results with stakeholders.

Timeline (PT)

  • 11:30 PM: Outage detected
  • 11:32 PM: Status page updated
  • 11:45 PM: Issue identified
  • 11:52 PM: Database server reconnected and restarted
  • 11:56 PM: Confirmed full recovery

Thank you for your patience as we work to improve our system’s reliability.

Posted Jul 08, 2025 - 14:27 PDT

Resolved

The incident is now resolved.
Posted Jul 01, 2025 - 07:32 PDT

Monitoring

Our engineering team found a network issue affecting the database servers. At this time, the connection has been restored and we are actively monitoring for further incidents.
Posted Jul 01, 2025 - 00:00 PDT

Investigating

We are receiving alerts of a system outage. Our team is actively investigating and will provide updates as they become available.
Posted Jun 30, 2025 - 23:33 PDT
This incident affected: PEAK 15 Systems (Beacon and Webforms, PEAK 15 Systems Application and Iframes, Stripe, Cybersource, Sign On Page).