As many of you have noticed, spam filtering performance at ProtonMail has improved dramatically in recent weeks, thanks to a series of updates we recently deployed. In this three-part series of posts, we will discuss many of the spam challenges ProtonMail faces and explain in detail how to fight spam in the end-to-end encrypted world. ProtonMail’s spam challenges can be summarized broadly into three categories:
- Incoming Spam
- Outgoing Spam
- Internal Spam
Incoming spam is spam sent to ProtonMail from third-party email providers, for example Hotmail. Outgoing spam is spammers using ProtonMail to send spam to third-party email providers. Finally, internal spam is ProtonMail accounts being used to spam other ProtonMail accounts. Each of these poses very different risks and challenges. For an encrypted email service provider with over 1 million users, spam is a continuous battle and one of the toughest challenges to overcome.
In this blog post, we will be discussing incoming spam filtering. In future blog posts, we will cover the challenges of preventing ProtonMail from being used by spammers and the challenges of doing spam filtering with end-to-end encrypted emails which we cannot read.
Incoming spam is not dangerous, but it can be a major inconvenience for users. It doesn’t just clog up user inboxes; it can also cause performance problems if not handled efficiently, due to the sheer volume of spam emails that sometimes arrives at the ProtonMail servers.
Emails that come from third-party email providers obviously cannot be delivered with end-to-end encryption, but upon reaching our mail servers, we encrypt them with the recipient’s public key before saving the messages. All this is done in memory so that by the time anything is permanently stored to disk, the email is already unreadable to us. This gives us a very limited window to perform spam filtering on incoming messages.
When an incoming message is received, it goes through the following filtering steps. The goal is to use less computationally expensive methods first to reject as much spam as possible before more expensive methods are used.
1. First, the IP address of the incoming SMTP server is checked against spam blacklists which contain IP addresses of servers we have previously received spam from. If we receive a hit, the message is rejected.
2. Secondly, the message is passed through our customized Bayesian filters, which mark suspicious messages as spam.
3. Next, we generate checksums of incoming messages and check them against a database of known spam messages. If there is a match, we mark the message as spam. The checksums are computed in such a way that they are also effective against mutating spam emails.
4. Afterwards, we also apply a few other anti-spam techniques which we cannot detail here for security reasons (see below).
5. Finally, user-specific spam rules are applied. This will apply user-specified whitelists and blacklists to avoid false positives, or catch more spam messages.
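The cheap-first ordering of these steps can be sketched roughly as follows. This is a minimal illustration, not ProtonMail's implementation: every name in it (`Message`, `UserRules`, `classify`, the sample data) is hypothetical, a simple word-ratio heuristic stands in for the Bayesian filters, and a SHA-256 hash over a normalized body stands in for the mutation-tolerant checksums. Step 4 is left out because it is not described here.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    sender_ip: str
    sender: str
    body: str


@dataclass
class UserRules:
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)


# Hypothetical sample data; a real system would query maintained databases.
IP_BLACKLIST = {"203.0.113.7"}
SPAM_WORDS = {"lottery", "winner", "prize"}


def normalized_checksum(body: str) -> str:
    """Hash the body after dropping case, digits, and punctuation, so that
    trivially mutated copies of one message share the same checksum."""
    canonical = re.sub(r"[^a-z]+", " ", body.lower()).strip()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


KNOWN_SPAM_CHECKSUMS = {
    normalized_checksum("You are winner number 1234! Claim your prize"),
}


def looks_spammy(body: str, threshold: float = 0.3) -> bool:
    """Assumed stand-in for a Bayesian filter: share of known spam words."""
    words = re.findall(r"[a-z]+", body.lower())
    if not words:
        return False
    hits = sum(1 for word in words if word in SPAM_WORDS)
    return hits / len(words) >= threshold


def classify(msg: Message, rules: UserRules) -> str:
    # Step 1: cheapest check first; a blacklisted IP is rejected outright.
    if msg.sender_ip in IP_BLACKLIST:
        return "reject"
    # Steps 2 and 3: content scoring, then checksum lookup.
    is_spam = (
        looks_spammy(msg.body)
        or normalized_checksum(msg.body) in KNOWN_SPAM_CHECKSUMS
    )
    # Step 5: user-specific lists override the generic verdict.
    if msg.sender in rules.whitelist:
        return "inbox"
    if msg.sender in rules.blacklist:
        return "spam"
    return "spam" if is_spam else "inbox"
```

In this sketch a message from a blacklisted IP never reaches the content checks, which is the point of running the cheapest test first. The user lists are applied last so that they can correct both false positives and false negatives from the earlier stages.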
Over the past few months, we have optimized and improved many of the above components to achieve a 500% improvement in spam detection. In doing this, we learned a few lessons:
Security through Obscurity
Generally speaking, security through obscurity is not recommended. This is why ProtonMail is open source, and we have all of our front-end code open to inspection from the community. Security through obscurity is the antithesis of open source, and relies on the notion that the security of a system can be improved if attackers do not know how the system works. Generally, this is a bad approach. It is better to have a system so secure that even if attackers know how it works, they cannot bypass it. This is certainly the case with the PGP email encryption that ProtonMail utilizes.
However, one case where security through obscurity DOES work is fighting spammers. This is particularly the case when it comes to fighting outgoing spam. Fighting spam is like trying to hit a moving target: it requires constant adjustment and tuning, especially since the distinction between spam and non-spam messages can be unclear at times.
There simply isn’t any foolproof method for defeating spam. Thus, if spammers don’t know how we are blocking their messages, it makes it much more difficult for them to find a workaround. This is why we cannot publish detailed specs of how our spam filters work. It also means we cannot open source our backend server configs which contain our spam filter settings.
In terms of privacy and trust, there is little advantage in open sourcing the server configs, because even if the configs were released, there would be no way to guarantee that the released config is the one actually running on the server side. On the other hand, releasing the backend configs would let spammers know exactly what they need to do to bypass our spam filters, which would put the entire ProtonMail community at risk.
Personalized Spam Filtering
Before designing our spam filtering system, we looked through months of spam reports from the community. What we quickly learned is that every user has a different definition of spam. What you consider to be spam won’t be the same as what your neighbor considers to be spam. Thus, it is impossible to define a single ruleset that works for everybody. This pushed us in the direction of personalized rulesets.
Today, every single ProtonMail account comes with its own spam filter settings which are unique to that account. When you mark messages as spam or not-spam, the filter will dynamically adjust to take into account your personal preferences. You can also view and modify your personal spam filter settings. ProtonMail also accounts for whether an email came from one of your contacts or not. If it comes from a contact, it is allowed through the spam filter.
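To make the idea of per-account filtering concrete, here is a minimal sketch of a filter that learns from one user's spam and not-spam marks and always lets contacts through. It is an assumption-laden illustration rather than a description of ProtonMail's filter: the class `PersonalSpamFilter`, its methods, and the choice of a naive Bayes score with add-one smoothing are all hypothetical.

```python
import math
import re
from collections import Counter


class PersonalSpamFilter:
    """One instance per account, trained only on that account's feedback."""

    def __init__(self, contacts=()):
        self.contacts = set(contacts)
        self.spam_words = Counter()
        self.ham_words = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z']+", text.lower())

    def mark(self, text, is_spam):
        """Record the user's verdict on a message."""
        if is_spam:
            self.spam_words.update(self._tokens(text))
            self.spam_msgs += 1
        else:
            self.ham_words.update(self._tokens(text))
            self.ham_msgs += 1

    def is_spam(self, sender, text):
        # Mail from a contact is always allowed through.
        if sender in self.contacts:
            return False
        # Without feedback of both kinds there is nothing to compare.
        if self.spam_msgs == 0 or self.ham_msgs == 0:
            return False
        vocab = len(set(self.spam_words) | set(self.ham_words))
        spam_total = sum(self.spam_words.values())
        ham_total = sum(self.ham_words.values())
        # Log-odds of spam versus not-spam, with add-one smoothing.
        score = math.log(self.spam_msgs / self.ham_msgs)
        for token in self._tokens(text):
            p_spam = (self.spam_words[token] + 1) / (spam_total + vocab)
            p_ham = (self.ham_words[token] + 1) / (ham_total + vocab)
            score += math.log(p_spam / p_ham)
        return score > 0
```

Two accounts that give opposite feedback on the same newsletter end up with opposite verdicts on the next issue, which is the behavior a single global ruleset cannot provide.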
The Best Spam Filter is You
ProtonMail has a comprehensive multi-tiered protection system to prevent spam from entering your inbox, but actually you are the best protection against spam. The vast majority of spam can be avoided by simply not giving your ProtonMail email address to unscrupulous websites which then resell that information to spammers. To learn more about how to avoid receiving spam in the first place, read our guide to avoiding spam.