SupportPal Website Downtime (14/09/2017)


Earlier today, supportpal.com suffered downtime, which unfortunately caused a temporary loss of service for a small number of customers (~3.5%). We sincerely apologise to those affected by this incident. In this report, we look at what went wrong and the steps we’re taking to reduce the effect of such incidents in the future.

Event

  • supportpal.com became inaccessible via HTTP/HTTPS and email.
  • The licensing server remained accessible but was unable to read license data from supportpal.com and consequently reported licenses as invalid.

Cause

We traced the downtime back to an automatic update carried out by the cPanel cron job. The update upgraded the server to CentOS 7.4, which caused the firewalld process to restart. The cPanel firewalld service definition on the server was incompatible with CentOS 7.4, and the server became inaccessible as a result (cPanel issue reference: CPANEL-15104).

Steps taken

While server monitoring had been set up, its alerting was not effective enough to notify us of the incident outside working hours. We have already moved to a new, more effective monitoring system.
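As an illustration only, an external check for this kind of failure could look something like the sketch below. It is not the monitoring system we moved to. The port list and the function names are assumptions made for the example; the idea is simply to test, from a machine outside the server, the services that became unreachable (HTTP, HTTPS and email) and to report any that fail.

```python
import socket
from typing import Dict, List, Tuple

# Hypothetical check list: the services that became unreachable in this incident.
CHECKS: List[Tuple[str, int]] = [("http", 80), ("https", 443), ("smtp", 25)]


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def failed_checks(results: Dict[str, bool]) -> List[str]:
    """Return the names of the checks that failed, in sorted order."""
    return sorted(name for name, ok in results.items() if not ok)


def run_checks(host: str) -> List[str]:
    """Probe every port in CHECKS and return the names of those that failed."""
    results = {name: port_open(host, port) for name, port in CHECKS}
    return failed_checks(results)
```

A scheduler on a separate machine could call run_checks("supportpal.com") every minute and send any non-empty result to a channel that is watched around the clock. Running the check from outside the server matters here, because a firewall fault leaves the server itself running and unaware that it cannot be reached.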

The use of local licenses significantly reduced the number of customers affected by this incident. However, we are actively exploring new ways to improve the licensing process and further reduce the effect of such incidents in the future.
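To show the general idea rather than our actual implementation, here is a minimal sketch of a license check that treats "could not reach the license source" differently from "license is invalid". Every name in it (LicenseServerUnreachable, CachedLicense, check_license) and the seven-day grace period are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional, Tuple


class LicenseServerUnreachable(Exception):
    """Raised by the (hypothetical) remote lookup when no answer was obtained."""


@dataclass
class CachedLicense:
    valid: bool
    checked_at: datetime  # time of the last successful remote check


def check_license(
    fetch_remote: Callable[[], bool],
    cached: Optional[CachedLicense],
    now: datetime,
    grace: timedelta = timedelta(days=7),  # assumed grace period
) -> Tuple[bool, Optional[CachedLicense]]:
    """Return (is_valid, cache_to_store) for one license check."""
    try:
        valid = fetch_remote()
    except LicenseServerUnreachable:
        # An outage is not evidence that the license is invalid, so fall
        # back to the last known good answer while it is recent enough.
        if cached is not None and cached.valid and now - cached.checked_at <= grace:
            return True, cached
        return False, cached
    # A real answer was received: trust it and refresh the cache.
    return valid, CachedLicense(valid=valid, checked_at=now)
```

With a check of this shape, an outage shorter than the grace period would not invalidate a license that was confirmed valid recently, while a definite "invalid" answer from the server would still take effect immediately.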
