Email Filtering for Programmers

By Enzo Calamia

Since last year, I have been using imapfilter to manage my email automatically.

imapfilter is an email filtering engine driven by rules that the user writes in Lua. In practice, it is a headless IMAP client that executes a script.

I run it as a regular cron job every 15 minutes, so my inbox never stays disorganized for long.
It is possible to "host" imapfilter on a public server that is always up, but I prefer not to. The script contains IMAP credentials, so I found it wiser to run it on my Raspberry Pi at home, which is always up too.

It replaced my desktop mail agent (Evolution), so I no longer need to keep it running around the clock on a power-hungry office machine.

Writing the Rules

On a UNIX system, the script should be located at ~/.imapfilter/config.lua by default.

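The script is executed from start to end, so rules apply in the order they are written. If two rules match the same email, the one that appears first acts on it first; a message that it moves or deletes is no longer in the folder when the later rule runs.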

First, the initialization of the IMAP client:

my_mailbox = IMAP {
    server = 'imap.example.com',
    username = 'name@example.com',
    password = 'xxxxxx',
    ssl = 'tls1',
}
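Since the script holds credentials, one option is to keep the password out of the file itself. The following is a minimal sketch, assuming the password is exported in an environment variable before imapfilter starts; the variable name IMAP_PASSWORD is made up for this example, and os.getenv is plain Lua rather than anything specific to imapfilter:

```lua
-- Assumption: IMAP_PASSWORD is a hypothetical environment variable
-- that must be set before imapfilter is launched.
local password = os.getenv('IMAP_PASSWORD')
if not password then
    error('IMAP_PASSWORD is not set')
end

-- Same account definition as above, minus the hard-coded password.
my_mailbox = IMAP {
    server = 'imap.example.com',
    username = 'name@example.com',
    password = password,
    ssl = 'tls1',
}
```

Note that cron jobs do not inherit the environment of an interactive shell, so the variable would have to be set in the crontab or in a small wrapper script.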

Then, I select the messages, according to the rules:

messages = my_mailbox['INBOX']:contain_from('j.doe@example.com')

messages will contain all the messages in the INBOX folder (the default root folder) that match the rule :contain_from('j.doe@example.com'). Filters can be combined with the * (AND) and + (OR) operators. Since the rules are written in a real programming language, they can be arbitrarily complex.

Matching messages from Amazon (sender address containing @amazon.com) that are older than 7 days:

messages =
    my_mailbox['INBOX']:contain_from('@amazon.com') *
    my_mailbox['INBOX']:is_older(7)

Matching messages from a mailing list, either via a specific header or via the recipient address:

messages =
    my_mailbox['INBOX']:contain_field('List-Id', 'frnog.frnog.org') +
    my_mailbox['INBOX']:contain_to('frnog.org')
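Because the configuration is ordinary Lua, combinations like these can also be wrapped in functions and loops. Here is a small sketch, assuming the my_mailbox account defined earlier; the helper name old_mail_from, the sender domains, and the Archive folder are all invented for illustration:

```lua
-- Hypothetical helper: INBOX messages from `sender` older than `days` days.
local function old_mail_from(sender, days)
    local inbox = my_mailbox['INBOX']
    return inbox:contain_from(sender) * inbox:is_older(days)
end

-- Made-up sender list; build the union (+) of the helper's results.
local senders = { '@shop.example.com', '@news.example.org' }
local stale = old_mail_from(senders[1], 30)
for i = 2, #senders do
    stale = stale + old_mail_from(senders[i], 30)
end

-- 'Archive' is an assumed folder name.
stale:move_messages(my_mailbox['Archive'])
```

Adding a sender then means adding one string to the list instead of writing another rule.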

Once selected, applying actions on messages is straightforward.

Moving the messages to another folder:

messages:move_messages(my_mailbox['my_other_folder'])

Deleting the messages:

messages:delete_messages()

Marking the messages as important (highlighted by most mail agents):

messages:mark_flagged()

Setting as read:

messages:mark_seen()
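Selection and action can be put together into complete rules, keeping in mind that the script runs from top to bottom. The following sketch assumes the my_mailbox account defined earlier; the subject keyword, sender domain, and Newsletters folder are invented for the example:

```lua
-- Rule 1 runs first: matching messages leave the INBOX.
local newsletters = my_mailbox['INBOX']:contain_subject('newsletter')
newsletters:move_messages(my_mailbox['Newsletters'])

-- Rule 2 searches the INBOX again, so it only sees what rule 1 left behind.
-- A newsletter sent from this domain has already been moved and is not flagged.
local from_shop = my_mailbox['INBOX']:contain_from('@shop.example.com')
from_shop:mark_flagged()
```

Swapping the two rules would flag those newsletters before moving them, so the order of the rules is part of the logic.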

Things get interesting with complex rules. The following is a real example from my rules.

messages = (
        my_mailbox['INBOX']:contain_from('no-reply-aws@amazon.com') *
        my_mailbox['INBOX']:contain_subject('Billing Statement')
    ) + (
        my_mailbox['INBOX']:contain_from('noreply@online.net') *
        my_mailbox['INBOX']:contain_subject('facture')
    ) + (
        my_mailbox['INBOX']:contain_from('no-reply@digitalocean.com') *
        my_mailbox['INBOX']:contain_subject('invoice')
    ) + (
        my_mailbox['INBOX']:contain_from('billing-noreply@scaleway.com')
    ) + (
        my_mailbox['INBOX']:contain_from('uber.com') *
        my_mailbox['INBOX']:contain_subject('Votre course')
    ) + (
        my_mailbox['INBOX']:contain_from('github.com') *
        my_mailbox['INBOX']:contain_subject('Receipt')
    )
messages:move_messages(my_mailbox['Invoices'])

That set of rules determines whether an email is an invoice.
Matching the sender alone is not always enough. I need to separate marketing emails (which I delete) from invoice emails (which I archive in a dedicated folder), hence the additional subject checks.

The + Trick

Did you know that many email services give you a near-infinite number of email addresses, without setting up any alias?

Just append +[something] to the username before the @. For example, on a service that supports it, the following addresses all deliver to the same mailbox:

me@example.com
me+marketing@example.com
me+newsletter@example.com
me+amazon@example.com
me+test@example.com

Using a dedicated address for each service or type of service makes filtering trivial.

With :contain_to('+[something]'), it is easy to select all the relevant messages:

messages = my_mailbox['INBOX']:contain_to('+newsletter')
messages:move_messages(my_mailbox['Newsletters_I_dont_read'])

messages = my_mailbox['INBOX']:contain_to('+amazon')
messages:move_messages(my_mailbox['Amazon'])

messages = my_mailbox['INBOX']:contain_to('+marketing')
messages:move_messages(my_mailbox['Marketing'])

messages = my_mailbox['INBOX']:contain_to('+spam')
messages:delete_messages()

messages = my_mailbox['INBOX']:contain_to('+test')
messages:move_messages(my_mailbox['Dev_Tests'])

...
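Since every one of these rules has the same shape, they can be driven from a table. This is only a sketch, assuming the my_mailbox account defined earlier and the same tags and folder names as in the rules above; the table name folders_by_tag is invented:

```lua
-- Hypothetical mapping from + tag to destination folder.
local folders_by_tag = {
    newsletter = 'Newsletters_I_dont_read',
    amazon     = 'Amazon',
    marketing  = 'Marketing',
    test       = 'Dev_Tests',
}

-- pairs() iterates in no particular order, which is fine here because
-- each tag selects a different set of messages.
for tag, folder in pairs(folders_by_tag) do
    local messages = my_mailbox['INBOX']:contain_to('+' .. tag)
    messages:move_messages(my_mailbox[folder])
end

-- Deletion is a different action, so it stays a separate rule.
my_mailbox['INBOX']:contain_to('+spam'):delete_messages()
```

Supporting a new tag is then a one-line change to the table.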

Unfortunately, few people know about the + trick. It is a convention (often called plus addressing or subaddressing) rather than a requirement of the SMTP standard, so support depends on the provider, but most major email providers and mail servers offer it. As far as I recall, it works with big providers such as Gmail and with standard mail servers such as Postfix.

Happy inbox organizing!