Many of you might have noticed that ProtonMail had a brief scheduled downtime last week. That was actually the first step of a major infrastructure upgrade that we have just completed. Thanks to the support from our crowdfunding contributors and around-the-clock work of our team, ProtonMail today is more secure and reliable than it has ever been, even with the huge number of additional users we have recently invited from the waiting list.
For those users who have been on our waiting list for several months, the wait will soon be over: our new infrastructure will allow us to support almost everybody, and we will be sending out invitations over the next month! It has taken us this long to get to this point because building an email architecture that is secure, scalable, and reliable is no easy task. In this post, we describe some of the work the ProtonMail team has done over the past couple of months to keep your data safe.
Hardware and Network
ProtonMail’s infrastructure scaling is complicated by the fact that we run our own servers, which means we also need to build in redundancy at the hardware and network level, greatly increasing the required effort. Fortunately, members of our team have built and managed large-scale systems at CERN and are able to draw on that experience.
Because ProtonMail’s encryption is zero access and we do not have the ability to read our users’ encrypted data, in some ways it does not matter where we store that data. However, as we have seen in the past, third parties simply cannot be trusted to safeguard online privacy and freedom. The ONLY way to ensure the highest level of data security and uptime is to have full control over the server hardware and network. This is why, despite the added difficulty and complexity, we go a step further and host ProtonMail only on hardware that we physically own and control within Switzerland.
All of our servers feature fully encrypted disks, and we use RAID arrays with high redundancy for our storage. The redundancy even extends to the way we power our servers. Within each datacenter, no more than half of our servers are connected to any single power unit, so the failure of an upstream power unit cannot take all servers offline.
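As a rough illustration of the power layout described above (not our actual tooling), the idea can be sketched in a few lines of Python. The feed names, server names, and both helper functions are hypothetical, and the sketch assumes each server draws from exactly one of two upstream power units.

```python
def assign_power_feeds(servers, feeds=("feed-A", "feed-B")):
    """Alternate servers across power feeds so each feed carries an even share."""
    return {server: feeds[i % len(feeds)] for i, server in enumerate(servers)}


def survivors(assignment, failed_feed):
    """Return the servers that still have power after one feed fails."""
    return [server for server, feed in assignment.items() if feed != failed_feed]


if __name__ == "__main__":
    # Hypothetical rack of six servers split across two feeds.
    rack = [f"srv{i}" for i in range(6)]
    layout = assign_power_feeds(rack)
    for feed in ("feed-A", "feed-B"):
        print(feed, "fails ->", survivors(layout, feed))
```

With an even split, losing either feed leaves half of the rack running, which is the property the layout is meant to guarantee.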
While we have excellent redundancy within our main datacenter, ProtonMail began building out a second datacenter this summer to ensure even higher reliability. Today, ProtonMail’s hardware infrastructure is spread across two datacenters in Switzerland so that a catastrophic disaster at one datacenter will not lead to data loss. In a follow-up post, we will talk more about ProtonMail’s datacenters.
The diagram below gives a high level overview of ProtonMail’s latest architecture after last week’s upgrade. The overarching design philosophy is to eliminate as many single points of failure as possible in order to make ProtonMail the most reliable encrypted email service ever built.
As ProtonMail’s userbase grew, we rapidly exceeded the capacity of a single server, which made it necessary to load balance across multiple servers. Our load balancing system splits the load among multiple web and mail servers and also provides instant failover in the event of a web or mail server crash.
All ProtonMail servers (web servers included) exclusively run open source software and are Linux based. Our architecture allows additional web servers to be added without downtime. Furthermore, any individual web server can be taken offline without impacting users. This gives full redundancy in the event of a web server failure, and also allows us to take machines offline at any time to perform security updates.
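To make the load balancing and failover behaviour described above more concrete, here is a minimal sketch in Python. It is an illustration only and not the software we run: the `RoundRobinBalancer` class, its method names, and the server names are all hypothetical, and a simple round-robin policy is assumed.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that skips servers marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)      # pool order defines rotation order
        self.healthy = set(self.servers)  # servers currently eligible for traffic
        self._cursor = 0

    def add(self, server):
        """Add a server to the pool while the balancer keeps serving."""
        if server not in self.servers:
            self.servers.append(server)
        self.healthy.add(server)

    def mark_down(self, server):
        """Take a server out of rotation (crash or planned maintenance)."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a known server to rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, or raise if none is left."""
        for _ in range(len(self.servers)):
            server = self.servers[self._cursor % len(self.servers)]
            self._cursor += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2", "web3"])
    lb.mark_down("web2")                  # e.g. taken offline for security updates
    print([lb.pick() for _ in range(4)])  # traffic only reaches web1 and web3
```

Requests keep flowing to the remaining servers while one is down, and a new server can join the pool through `add` without restarting anything. A production load balancer would also run active health checks rather than rely on someone calling `mark_down`.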
ProtonMail’s mail infrastructure is also fully redundant and any mail server can fail without impacting inbound or outbound mail deliverability. Our mail software architecture also allows us to buffer mail on the mail servers. This means in the event of a database failure, mail servers can save incoming messages until the database servers come back online so a database failure will not lead to the loss of incoming messages.
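The buffering idea can be sketched as follows. This is a simplified illustration under stated assumptions, not our mail software: `BufferingMailServer`, `FakeDatabase`, and `DatabaseUnavailable` are hypothetical names, and the buffer is held in memory here, whereas a real mail server would spool messages to disk.

```python
from collections import deque


class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) store when the database cannot be reached."""


class FakeDatabase:
    """Stand-in database that can be switched offline for the demo."""

    def __init__(self):
        self.online = True
        self.rows = []

    def save(self, message):
        if not self.online:
            raise DatabaseUnavailable("database offline")
        self.rows.append(message)


class BufferingMailServer:
    def __init__(self, store):
        self.store = store      # callable that persists one message
        self.pending = deque()  # messages waiting for the database

    def receive(self, message):
        """Accept a message, queue it locally, then try to deliver the queue."""
        self.pending.append(message)
        self.flush()

    def flush(self):
        """Deliver queued messages in arrival order; stop at the first failure."""
        delivered = 0
        while self.pending:
            try:
                self.store(self.pending[0])
            except DatabaseUnavailable:
                break           # keep the message and retry on the next flush
            self.pending.popleft()
            delivered += 1
        return delivered


if __name__ == "__main__":
    db = FakeDatabase()
    server = BufferingMailServer(db.save)
    db.online = False
    server.receive("message 1")  # accepted and buffered, not lost
    db.online = True
    server.flush()               # delivered once the database is back
    print(db.rows)
```

A message is only removed from the buffer after the store call succeeds, so an outage in the middle of delivery leaves it queued for the next attempt.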
We use a cluster of database servers to store encrypted user messages. We have multiple SQL servers with automatic failover, which allows us to lose SQL servers without system downtime. The data servers are clustered so that individual data servers can be lost without leading to data loss or downtime.
As an additional layer of security, we have a backup data cluster which replicates from the master cluster in real time so in the event of a catastrophic failure of the primary cluster, we can switch to the backup with minimal data loss.
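Why "minimal" rather than "zero" data loss? The sketch below illustrates the trade-off. It is a toy model and not our database setup: `ReplicatedStore` and its methods are hypothetical, and replication is modelled as an asynchronous copy step that can lag slightly behind the primary.

```python
class ReplicatedStore:
    """Toy primary/backup pair with asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.backup = {}
        self._unreplicated = []  # writes applied to the primary but not yet copied
        self.active = "primary"

    def write(self, key, value):
        if self.active == "primary":
            self.primary[key] = value
            self._unreplicated.append((key, value))
        else:
            self.backup[key] = value

    def replicate(self):
        """Copy outstanding writes to the backup; return how many were copied."""
        copied = len(self._unreplicated)
        for key, value in self._unreplicated:
            self.backup[key] = value
        self._unreplicated.clear()
        return copied

    def fail_over(self):
        """Switch to the backup; return keys whose latest write never reached it."""
        lost = [key for key, _ in self._unreplicated]
        self._unreplicated.clear()
        self.active = "backup"
        return lost

    def read(self, key):
        data = self.primary if self.active == "primary" else self.backup
        return data.get(key)


if __name__ == "__main__":
    store = ReplicatedStore()
    store.write("msg-1", "ciphertext-1")
    store.replicate()
    store.write("msg-2", "ciphertext-2")  # primary fails before this is copied
    print("lost on failover:", store.fail_over())
    print("msg-1 still readable:", store.read("msg-1"))
```

The shorter the replication lag, the fewer writes sit in the unreplicated window when the primary fails, which is why replicating in real time keeps the potential loss small.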
For added security against DNS attacks and better control over our domain, ProtonMail also runs our own DNS infrastructure which is distributed between our two datacenters for redundancy. Our DNS root zone is managed by SWITCH which administers .ch domain names on behalf of the Swiss Federal Office of Communications (OFCOM).
ProtonMail uses a sophisticated monitoring system, also distributed between our two datacenters, to monitor the health of our hardware and to detect potential network intrusions or abnormalities.
When ProtonMail was first opened to the public back in May, our architecture at that time was run on just two servers (a primary and a backup) and was rapidly overloaded by users from around the world. Our current architecture is a huge advancement from that and would not have been possible without many months of hard work from our team and the support of our crowdfunding contributors.
There is still much infrastructure work to be done, and we will continue to add improvements on two main fronts. First, we will keep pushing to eliminate single points of failure to reduce the risk of downtime. Second, we will work on bringing more components of the internet infrastructure needed to run ProtonMail under our direct control to improve privacy and reliability. We recently took a step in this direction by joining Réseaux IP Européens NCC and becoming a Local Internet Registry which serves ProtonMail exclusively. As you can see, we are far from done and 2015 will certainly be a busy year!