Incident Summary
On June 30, 2025, at 11:30 PM PT, PEAK 15 Systems experienced an outage after our main database server lost its connection. Customer applications were unavailable for about 26 minutes, and we confirmed full service at 11:56 PM PT.
What Happened
- 11:30 PM PT: Our monitoring system alerted us that services were down.
- 11:32 PM PT: We published an update on our status page to let customers know we were investigating.
- 11:45 PM PT: Our team discovered that our main database server had lost connection and was unable to process requests.
- 11:52 PM PT: We reconnected the server and restarted the database service.
- 11:56 PM PT: We confirmed all applications were back online and updated the status page.
Customer Impact
- Services were not accessible for up to 26 minutes.
- No customer data was lost or corrupted.
Root Cause
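Our main database server lost its connection and could not process requests. Automatic switch-over to a backup server was not yet fully configured, so the backup did not take over and service stayed down until we reconnected the server manually.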
How We Fixed It
- Reconnected the database server to our system.
- Restarted the database service.
- Verified that all customer applications were working.
Preventative Measures
- Automatic Failover: Finish configuring automatic switch‑overs so a backup server takes over without delay if the main database server becomes unavailable.
- Better Alerts: Add new alerts to catch connection problems sooner.
- Regular Drills: Run routine tests to make sure automatic switch‑overs work as expected.
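As a rough illustration of how the alerting and switch-over measures above could fit together, here is a minimal Python sketch. It is not our production tooling: the class name `FailoverMonitor`, the `check_primary` and `promote_backup` hooks, and the thresholds are all hypothetical, chosen only to show the idea of alerting on repeated failed connection checks and then switching to the backup.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverMonitor:
    """Counts consecutive failed health checks on the main database server."""

    check_primary: Callable[[], bool]    # hypothetical: True when the main server responds
    promote_backup: Callable[[], None]   # hypothetical: switches traffic to the backup
    alert_after: int = 2                 # assumed threshold: failed checks before alerting
    failover_after: int = 3              # assumed threshold: failed checks before switching
    failures: int = 0
    failed_over: bool = False
    events: List[str] = field(default_factory=list)

    def tick(self) -> None:
        """Run one health check and act on the result."""
        if self.failed_over:
            return  # already running on the backup; nothing more to do
        if self.check_primary():
            self.failures = 0  # a healthy check resets the counter
            return
        self.failures += 1
        if self.failures == self.alert_after:
            self.events.append("alert")
        if self.failures >= self.failover_after:
            self.promote_backup()
            self.failed_over = True
            self.events.append("failover")
```

A routine drill could then feed the monitor a simulated outage (for example, a `check_primary` that returns `False` several times in a row) and confirm that `events` ends with `"failover"`.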
Next Steps
- Complete the setup for automatic switch‑overs.
- Update our internal procedures and train the team.
- Schedule quarterly tests and share the results with stakeholders.
Timeline (PT)
- 11:30 PM: Outage detected
- 11:32 PM: Status page updated
- 11:45 PM: Issue identified
- 11:52 PM: Database server reconnected and restarted
- 11:56 PM: Confirmed full recovery
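For readers who want to check the durations, the intervals can be worked out from the timeline above. The short Python sketch below is only an illustration; the dictionary keys and the helper names `at` and `minutes_between` are made up for this example.

```python
from datetime import datetime

DAY = "2025-06-30"  # date of the incident, from the summary


def at(clock: str) -> datetime:
    """Parse a 12-hour clock time such as '11:30 PM' on the incident date."""
    return datetime.strptime(f"{DAY} {clock}", "%Y-%m-%d %I:%M %p")


# Times copied from the timeline (all PT).
timeline = {
    "detected": at("11:30 PM"),
    "status_page_updated": at("11:32 PM"),
    "identified": at("11:45 PM"),
    "database_restarted": at("11:52 PM"),
    "recovery_confirmed": at("11:56 PM"),
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two timeline events."""
    delta = timeline[end] - timeline[start]
    return int(delta.total_seconds() // 60)


print(minutes_between("detected", "status_page_updated"))  # 2
print(minutes_between("detected", "identified"))           # 15
print(minutes_between("detected", "database_restarted"))   # 22
print(minutes_between("detected", "recovery_confirmed"))   # 26
```

Most of the 26 minutes (15 of them) passed between detection and identifying the cause, which is the gap that earlier connection alerts and automatic switch-overs are meant to close.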
Thank you for your patience as we work to improve our system’s reliability.