Missing Emails Restored

We are happy to announce that we have completed recovery of the missing emails from earlier this week. As we reported earlier on our blog, an incident caused some emails from a period of over 20 hours to disappear.

Immediately afterwards, we initiated data recovery steps, and within a day we were able to recover the data and begin restoring emails into users' accounts. On Saturday, we finished restoring emails to the last impacted accounts.

Our goal is to maintain 100% data availability, and we apologize to those users who weren’t able to access some emails for a couple of days while we worked on the recovery. We have taken a number of steps to avoid a repeat of this problem and have strengthened our standard operating procedures (SOP) to include even more safeguards.

Technical Details

The root cause was found to be a Linux service called monit, which automatically restarts services when it detects that they have crashed or are otherwise not running.

In our SOP, the first step for most procedures is to shut down monit. However, when one of our new engineers went to perform some changes on Monday, this was not done. The database changes we were doing on Monday required the database server to be shut down for a period of time, and the commands to do this were indeed issued. However, since monit was still running, the database server was automatically turned back on unbeknownst to the engineer. As a result, changes were made on a running database leading to data corruption.
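The kind of safeguard described above can be sketched as a pre-flight check that refuses to proceed while the supervisor or the database is still running. This is only an illustration, not our actual tooling: the function names are hypothetical, the database process name `mysqld` is an assumption (the post does not name the database), and the check assumes a Linux host with `pgrep` installed.

```python
import subprocess
from typing import Callable, List, Sequence


def process_running(name: str) -> bool:
    """Return True if a process with exactly this name exists (via pgrep -x)."""
    result = subprocess.run(
        ["pgrep", "-x", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # pgrep exits with 0 when at least one process matched.
    return result.returncode == 0


def still_running(
    must_be_stopped: Sequence[str],
    is_running: Callable[[str], bool] = process_running,
) -> List[str]:
    """Return the subset of the named processes that are still running."""
    return [name for name in must_be_stopped if is_running(name)]


def assert_safe_to_modify(
    must_be_stopped: Sequence[str],
    is_running: Callable[[str], bool] = process_running,
) -> None:
    """Raise instead of letting maintenance continue on a live system."""
    offenders = still_running(must_be_stopped, is_running)
    if offenders:
        raise RuntimeError(
            "Refusing to continue; still running: " + ", ".join(offenders)
        )


if __name__ == "__main__":
    # Hypothetical process names: the supervisor first, then the database.
    assert_safe_to_modify(["monit", "mysqld"])
    print("Supervisor and database are stopped; safe to proceed.")
```

Running a check like this immediately before the change, and not only after issuing the shutdown command, matters here because a supervisor can bring the database back up in between.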

While it is easy to lay blame on an individual engineer for not following the SOP, there are also organizational deficiencies that allowed this lapse to occur. The team as a whole is under immense time pressure to work quickly and support more users, so shortcuts were tolerated. This was generally OK because the core developers understood the system very well and knew with certainty which steps could be skipped without risk. However, we also inadvertently created an environment for new employees where the SOP was treated as a guideline and not as a set of rules that had to be followed to the letter.

To remedy this situation, we have now enacted new regulations where changes on the production systems can only be made with the approval of ALL core developers. Furthermore, SOP shortcuts will no longer be tolerated, regardless of who is making the change.

These changes will inevitably slow down our development and scaling process slightly, but as a group, our core priorities are security and reliability, and these must come before all other considerations. We would like to thank everybody (especially those still on the waiting list) for their understanding.

As a side note, we are also actively looking to grow our team so we can develop ProtonMail faster. If you or somebody you know is interested, please check out our current job openings.

About the Author

Andy Yen

Andy is the Co-Founder of ProtonMail. He is a longtime advocate of privacy rights and has spoken around the world about online privacy issues. Previously, Andy was a research scientist at CERN and has a PhD in Particle Physics from Harvard University. You can watch his TED talk online to learn more about ProtonMail's mission.

7 comments on “Missing Emails Restored”

  • Hi Andy
    I think you are doing a great job over there. We should all appreciate your privacy and safety concerns. Tuning while still in beta will require some more experimentation. Hopefully we shall all have a sound, reliable, and safe system in a few weeks. Nice work, keep it up!

  • Experience is a good teacher, but a bad experience is a better teacher. There’s never time to do it right the first time, but there is always money to do it over. Aphorisms from a career in nuclear power, where strict procedural compliance is everything.

  • Keep up the great work your team is doing. ProtonMail is still in beta and I keep that in mind whenever I use the service. I expect it will hit a few snags along the way, but as the project matures I look forward to migrating entirely to ProtonMail. I am impressed with the immediate attention taken to recover the data, your commitment to find the source of the problem and take steps to prevent similar incidents, and the fact that you are forthcoming about what happened. I only wish I had the knowledge and expertise to contribute to such a project, but I will spread the word and encourage people to sign up. Thank you for your efforts!

  • Dear Andy, Jason and Wei

    Any chance you could write a weekly or bi-weekly update on progress and the current expected ETA for the GM release of the ProtonMail webmail service in this News section please?

    I’m aware that in early August you mentioned it would be about 1-2 months away. I’ve been wondering if this is still the case.

    Looking forward to it!

      • Looking forward to seeing it. I’m hoping you’re able to bring in custom domains sooner rather than later as well. That’s the only thing holding me back from completely switching over from FastMail.
