Update to DNS hosting

We’ve rolled out a change to our DNS hosting to switch our backend from tinydns to powerdns. We tried this change once before, but ran into problems and had to roll back. After more development work and testing, we believe we’ve fixed all these issues, and so have moved to powerdns again.

This change should initially be invisible to users, and things should continue to work as they did before. In the long term, it will allow us to support more features and faster updates to DNS.

Posted in Technical. Comments Off

Undo and other new features

Yesterday, we rolled out a number of new features and improvements to our
new webmail interface. Here’s a quick run down of what’s new:

  • Undo. Accidentally moved a message, deleted a
    contact, marked something unread etc? No problem. Your last action can
    now be undone; just click the “Undo” link in the confirmation message.
    Or, if you’re a keyboard user, hit ‘z’. Note, you can’t undo sending a
    message.
  • The pinned status of a message is now shown at the top of the
    conversation read screen, so you can see it even if the message is
    collapsed.
  • Security options and logs are now grouped together in their own
    section under “Account”, to make it easier to manage the security of
    your account. This includes changing your password, seeing (and remotely
    logging out) any existing sessions, and creating alternative
    logins.
  • The mailbox screen now shows an icon next to messages that have been
    replied to. With conversations enabled, this shows if the most recent
    message in the folder for that conversation has been replied to.
  • The “More” menu at the top right of each expanded message now has a
    “Reply to Sender” option if the message was sent to a mailing list, and
    an “Edit as New” option for all messages.
  • The unread count is now shown first in the title of a page, so you
    can still see it even if the tab cuts the title short.
  • Better support for non-conversations mode. Now faster and fully
    non-conversational: replies to messages are no longer threaded with the
    message being replied to.

And, of course, several more minor refinements and bug fixes.

Posted in News. Comments Off

Inter-tab communication using local storage

A few weeks ago we launched our new webmail service for all users at FastMail. Once it was being used by a wider audience, we of course received reports of a few edge cases our testing hadn’t uncovered. One of the more interesting issues came from this use case: our user liked to scroll down his inbox, opening each email he wanted to read in a new tab in the background. Then he would go through the tabs, closing each one as he was done with it. So far, so good. Except that in Chrome, his browser of choice, as soon as about 5 tabs were open, the rest failed to load, and the earlier ones then started having communication errors as well.

A quick bit of research and testing yielded the problem: Chrome limits itself to a maximum of 6 concurrent connections to a single origin across the whole browser. Each tab was loading a full instance of the mail application, which meant it was creating an EventSource object and connection to our push server, to be notified of new deliveries (see this previous post for how that works). Since these connections are permanent (that’s the whole idea!), opening lots of tabs quickly used up all the available connections, with none left to fetch any actual data. To the user, this appeared as “Could not connect to server” error messages.
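To make the arithmetic concrete, here is a minimal sketch. The function name is made up for illustration, and it assumes each open tab holds exactly one permanent push connection; the limit of 6 is the figure quoted above.

```javascript
// Hypothetical helper: how many connections are left for fetching data when
// every open tab holds one permanent push connection.
function freeConnections ( openTabs, limit ) {
    return Math.max( 0, limit - openTabs );
}

var CHROME_LIMIT = 6; // per-origin limit quoted in the text

console.log( freeConnections( 5, CHROME_LIMIT ) ); // 1
console.log( freeConnections( 6, CHROME_LIMIT ) ); // 0
```

With five tabs open there is a single connection left for everything else; the sixth tab uses that one up too.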

The solution to this problem was not immediately obvious. Ideally, we would like to maintain a single push connection and share it between the tabs, but there’s no API for getting a reference to other tabs or windows in the browser, even if they’re pointed to the same domain. Then I remembered that setting a property on local storage triggers a “storage” event on the window object of every open tab with the same origin. This, I realised, could be used to synchronise behaviour across tabs.

The concept is fairly simple. Only one tab keeps a push connection; we call this the master tab. When it receives a push event, it broadcasts it by setting the event as a property on local storage called “broadcast”. When a tab receives the storage event for this key, it reads the JSON-encoded event object from local storage and processes it as though it had been received via an EventSource object.

The tricky part comes in coordinating between the tabs which one should be master. The master tab also sets a value called “ping” on local storage roughly every 30 seconds to the current time stamp. When a tab first loads it checks for this value; if it is more than 45 seconds old it presumes there is no current master, so it becomes master. Otherwise, it becomes a slave. However, whilst it is a slave, it continuously monitors for storage events with a key of “ping”, and if it hasn’t heard a ping within a period of roughly 45 seconds, it takes over as master. This switches control to another tab when the master tab closes. On browsers supporting the “unload” event we can make the changeover happen pretty much instantly, by setting the “ping” value to 0 in local storage when the tab is closed.

This all works very well, but there’s one problem remaining: race conditions. There is no API for taking out an explicit lock on local storage, so the spec advocates the use of a per-origin mutex which would be acquired by scripts once they try to access the storage, and then released when the script finishes. Not all browsers have adopted this. The Chrome developers, for example, have decided the performance penalty is too great. Therefore, in some browsers, it is possible for scripts in different tabs to interleave such that, for example, each tries to take master at the same time, then each notices another has taken it so none end up as master! The solution we have adopted is to add a random component to the delay between pings and waiting for pings. This makes it unlikely that two tabs will both attempt to take master at the same time. Of course this can still happen, but should it do so, the random variation in each new master sending out a ping should ensure that one is quickly turned back to a slave. It will be eventually consistent, which is good enough for our purposes.
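The randomised delays can be sketched in isolation. The constants here are the ones used in the WindowController code in this post; the function names and the injectable `random` argument are hypothetical, added only so the extremes can be checked.

```javascript
// Delay constants copied from the WindowController code in this post; the
// function names and the injectable `random` argument are hypothetical.
function masterPingDelay ( random ) {
    return 20000 + ~~( random() * 10000 );
}

function slaveTakeoverDelay ( random ) {
    return 35000 + ~~( random() * 20000 );
}

// Math.random() returns a value in [0, 1), so these are the extremes.
function lowest () { return 0; }
function highest () { return 0.999999; }

var slowestPing = masterPingDelay( highest );       // just under 30 seconds
var fastestTakeover = slaveTakeoverDelay( lowest ); // exactly 35 seconds

console.log( slowestPing < fastestTakeover ); // true
```

Because the slowest possible ping still arrives before the fastest possible takeover, a healthy master should not normally be displaced; the randomness only spreads out the moments at which competing tabs act.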

In case this is of use to anyone else, here’s the code we use (rewritten slightly to use pure JS rather than be based on our library code). It’s also available as a gist on github. You can try it out on this test page; just open the page in several windows or tabs, then close the master and see the control pass to another. You can also broadcast a message from any tab to the other tabs.

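function WindowController () {
    var now = Date.now(),
        ping = 0;
    // Reading local storage can throw (e.g. if storage is disabled), in which
    // case we treat it as though no master has ever pinged.
    try {
        ping = +localStorage.getItem( 'ping' ) || 0;
    } catch ( error ) {}
    // No ping in the last 45 seconds: assume there is no master and take over.
    if ( now - ping > 45000 ) {
        this.becomeMaster();
    } else {
        this.loseMaster();
    }
    // Passing the object itself as the listener makes the browser call
    // this.handleEvent for each event.
    window.addEventListener( 'storage', this, false );
    window.addEventListener( 'unload', this, false );
}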

WindowController.prototype.isMaster = false;
WindowController.prototype.destroy = function () {
    if ( this.isMaster ) {
        try {
            localStorage.setItem( 'ping', 0 );
        } catch ( error ) {}
    }
    window.removeEventListener( 'storage', this, false );
    window.removeEventListener( 'unload', this, false );
};

WindowController.prototype.handleEvent = function ( event ) {
    if ( event.type === 'unload' ) {
        this.destroy();
    } else {
        var type = event.key,
            ping = 0,
            data;
        if ( type === 'ping' ) {
            try {
                ping = +localStorage.getItem( 'ping' ) || 0;
            } catch ( error ) {}
            if ( ping ) {
                this.loseMaster();
            } else {
                // We add a random delay to try to avoid the race condition in
                // Chrome, which doesn't take out a mutex on local storage. It's
                // imperfect, but will eventually work out.
                clearTimeout( this._ping );
                this._ping = setTimeout(
                    this.becomeMaster.bind( this ),
                    ~~( Math.random() * 1000 )
                );
            }
        } else if ( type === 'broadcast' ) {
            try {
                data = JSON.parse(
                    localStorage.getItem( 'broadcast' )
                );
                this[ data.type ]( data.event );
            } catch ( error ) {}
        }
    }
};

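WindowController.prototype.becomeMaster = function () {
    // Announce (or re-announce) to the other tabs that a master is alive.
    try {
        localStorage.setItem( 'ping', Date.now() );
    } catch ( error ) {}

    // Schedule the next ping for 20-30 seconds from now.
    clearTimeout( this._ping );
    this._ping = setTimeout( this.becomeMaster.bind( this ),
        20000 + ~~( Math.random() * 10000 ) );

    // Only notify when the role actually changes, not on every ping.
    var wasMaster = this.isMaster;
    this.isMaster = true;
    if ( !wasMaster ) {
        this.masterDidChange();
    }
};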

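WindowController.prototype.loseMaster = function () {
    // Take over as master if no ping is heard for 35-55 seconds. Each ping
    // from the master calls loseMaster again, which resets this timer.
    clearTimeout( this._ping );
    this._ping = setTimeout( this.becomeMaster.bind( this ),
        35000 + ~~( Math.random() * 20000 ) );

    var wasMaster = this.isMaster;
    this.isMaster = false;
    if ( wasMaster ) {
        this.masterDidChange();
    }
};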

WindowController.prototype.masterDidChange = function () {};

WindowController.prototype.broadcast = function ( type, event ) {
    try {
        localStorage.setItem( 'broadcast',
            JSON.stringify({
                type: type,
                event: event
            })
        );
    } catch ( error ) {}
};
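
One caveat worth noting: a storage event is only dispatched when the stored value actually changes, so broadcasting two identical events in a row would go unnoticed by the other tabs. A possible workaround, sketched here with hypothetical helper names that are not part of the code above, is to include a field that differs on every call.

```javascript
// Hypothetical helpers, not part of the original WindowController.
var broadcastSeq = 0;

function encodeBroadcast ( type, event ) {
    broadcastSeq += 1;
    return JSON.stringify({
        type: type,
        event: event,
        // Differs on every call, so the stored string is never repeated.
        seq: broadcastSeq + ':' + Date.now()
    });
}

// Returns the decoded object, or null if the value is missing or malformed.
function decodeBroadcast ( value ) {
    try {
        var data = JSON.parse( value );
        return data && typeof data.type === 'string' ? data : null;
    } catch ( error ) {
        return null;
    }
}
```

The broadcast method could store the result of encodeBroadcast instead of calling JSON.stringify directly, and handleEvent could read it back with decodeBroadcast.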

Posted in Technical. Comments Off

The technology behind the classic and new interfaces

I recently wrote a postmortem for our old interface. Now I want to explain how the addition of a modern interface alongside our classic interface is different.

In short, classic is here to stay.

For all that the interface has looked similar over the past few years, it’s had many changes under the hood.

Much of the interface is fully internationalised, both in classic and new. The code is all shared with the My Opera Mail product, where multiple language support is a core requirement.

It works much more nicely on small devices, in particular with Opera Mini.

Where possible, changes for the new interface have been rewritten as shared “library code” and integrated into both interfaces simultaneously. Some things (search, for example) work differently. But most core logic, and of course all low level mail routing and storage, are fully shared across our infrastructure.

This all adds up to a pile of “invisible” work we have done to make maintenance easier in future. Even the new search uses the same query builder library, so back-porting fully cross-folder search capability to classic will be achievable if there is demand.

Unlike the old interface, which was a completely separate copy of the code and grew stale over the years, there was never a “fork” (as it’s called in software development) for the new interface.

Indeed, you may have noticed that many screens on the new interface are really just “rebranded” classic. It’s the same HTML code as the classic desktop and mobile screens, with a different title bar. When you go back to the Mail or Address tabs, it reloads the JavaScript and hands control back. This was a deliberate decision to speed up the areas of our site where people spend 99% of their time (statistic taken from logs, not made up) without duplicating the rarely used screens. The client-side mailbox screen uses less bandwidth and is more responsive than the classic mechanism of downloading an entire HTML page on every click.

When we say “supported indefinitely” it really does mean that we have no plans to remove classic. There’s no internal timeline in our heads. The core technology is used by both interfaces, and we’re updating them together.

Finally, I want to address concerns about continued IMAP access.

We have invested heavily in improvements to the Cyrus IMAP server, through both my own work and that of Greg Banks in the Australian office (who was hired to work full time on Cyrus, and is doing an awesome job).

Our new conversation features are built directly into Cyrus, and fully integrated into its replication system. Other features like storing previews and undelete information along with messages have been created by adding support to Cyrus for the standard message annotations described in RFC5257, contributing that work back to the community.

You can read more about the Cyrus project at http://cyrusimap.org/. This reliable and standards compliant server is the core of our technology stack. We’re not moving away from IMAP, even as we extend the server to support our specific use-cases.

You can read (or even download and play with) the exact code that runs on our servers from
http://github.com/brong/cyrus-imapd/ – our production systems run on the “fastmail” branch.

Bron.

Posted in Technical. Comments Off

Changes to delete behaviour with conversations

Over the past week we have changed how deletion works in our new modern interface. In this blog post I will explain what those changes were, and some actions we have taken to ensure no emails are accidentally lost.

This is a technical blog post, so it contains a moderate level of technical detail.

I will address how our backup and disaster recovery system works, and how we used it this week to recover emails which we suspected to be accidentally deleted.

Some background

Last week we rolled out the new conversations-enabled interface. However, we discovered we had under-estimated the impact of conversations on users’ existing workflow.

In particular, many users did not realise that when they selected a single item in a folder, it represented the entire conversation (all related messages, including those the user had sent).

When they pressed ‘Delete’ with one or more conversations selected, it deleted all messages in those conversations, including messages in other folders. For example, deleting a conversation in Inbox could also delete messages from “Sent Items” and “Important – Keep”.

We have altered the ‘Delete’ action to be safer in these ways:

  1. in a folder: only messages which are in the same folder are deleted from the selected conversations.
  2. when viewing search results: only messages which match the search query are deleted; messages which are in the conversations but outside the search are not.
  3. when an action will cause more than one message to be deleted from a conversation, a warning message is shown to describe what will happen. The user must explicitly disable this warning if they don’t want to see it again.
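
The three rules above can be sketched as a filter plus a check. This is only an illustration: the message shape ({ id, cid, folder }), the function names and the isInView callback are all assumptions, not our production code.

```javascript
// Hypothetical message shape: { id, cid, folder }.
// isInView decides whether a message belongs to the current view: the same
// folder (rule 1) or a match for the search query (rule 2).
function messagesToDelete ( messages, selectedCids, isInView ) {
    return messages.filter( function ( message ) {
        return selectedCids.indexOf( message.cid ) !== -1 &&
            isInView( message );
    });
}

// Rule 3: warn if any single conversation would lose more than one message.
function needsWarning ( toDelete ) {
    var perConversation = {};
    return toDelete.some( function ( message ) {
        perConversation[ message.cid ] =
            ( perConversation[ message.cid ] || 0 ) + 1;
        return perConversation[ message.cid ] > 1;
    });
}

var messages = [
    { id: 1, cid: 'A', folder: 'Inbox' },
    { id: 2, cid: 'A', folder: 'Sent Items' },
    { id: 3, cid: 'A', folder: 'Inbox' },
    { id: 4, cid: 'B', folder: 'Inbox' }
];
function inInbox ( message ) { return message.folder === 'Inbox'; }

var doomed = messagesToDelete( messages, [ 'A' ], inInbox );
console.log( doomed.map( function ( m ) { return m.id; } ) ); // [ 1, 3 ]
console.log( needsWarning( doomed ) ); // true
```

The message in “Sent Items” survives even though it belongs to the selected conversation, and the user is warned because two messages from that conversation are about to go.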

What about the time before these changes?

Rather than leaving users to hunt for which emails were affected, we wrote a tool to data-mine our mail server logs. We log every create and delete of emails, along with enough data to identify which ones were “Delete to Trash”. We can also identify if the action came from an IMAP client or the web interface.

We found emails which could have been accidentally deleted using the following algorithm:

  1. the action came from the web interface.
  2. more than one message from the same conversation was deleted within 10 seconds.

All the emails which matched these criteria were restored back into the folders they were originally deleted from, with a custom keyword added. This makes it easy for users to find them again. Every affected user has been emailed with instructions on how to identify the restored emails.

How we restored data

When you delete an email on the FastMail servers, it isn’t immediately removed from disk, even if you manually expunge via IMAP. We do this:

  1. to guarantee that our “Restore from backup” feature can always find all your emails, even if they were delivered and then deleted in between backup runs.
  2. to make deletes appear faster to users.
  3. to reduce the load on our IMAP servers. Removing files is actually one of the slowest operations you can run on a modern filesystem.

So we actually batch up all deletes and run them once per week at the least busy time for our servers – Saturday night in the USA. It’s the weekend everywhere in the world then.

We also never remove email files within one week after deletion, so that our “Restore” feature can work as advertised.
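
Taken together, a weekly batch and a one-week minimum mean a deleted file stays on disk for somewhere between one and two weeks. Here is a minimal sketch of that selection rule, assuming a hypothetical list of records with a deletedAt timestamp in milliseconds; it is not the actual cleanup job.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000;
var WEEK_MS = 7 * DAY_MS;

// Hypothetical record shape: { name, deletedAt } with deletedAt in ms.
// Only files deleted at least a full week ago are eligible for removal.
function filesToRemove ( files, now ) {
    return files.filter( function ( file ) {
        return now - file.deletedAt >= WEEK_MS;
    });
}

var files = [
    { name: 'a', deletedAt: 0 },
    { name: 'b', deletedAt: 3 * DAY_MS }
];

// First weekly run: only 'a' is old enough. 'b' waits for the next run,
// by which time it has been on disk for 11 days.
console.log( filesToRemove( files, 7 * DAY_MS ).length );  // 1
console.log( filesToRemove( files, 14 * DAY_MS ).length ); // 2
```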

This is, of course, in addition to the safety provided by replication to an offsite datacentre, and daily backups to a different server running a different operating system.

Immediate response

As soon as we realised we may have to restore emails, we disabled the automated weekend cleanup job, and started collecting data from our servers.

Discoverability

The problem is that it is hard to know that an email is not there unless you actively look for it. We could disable cleanup temporarily, but not forever. Our turnover is about 2% of total email volume per week, so the disks would fill up if we never deleted anything ever again.

We decided the safe way forwards was to undo every deletion which had even the slightest chance of being by accident.

That way, if no action is taken, a few extra emails sit on disk gathering dust. It’s possible at any later time to discover them and clean them up. There is no requirement to act quickly.

We take your privacy very seriously. No contents of emails were accessed during this task, and each user’s account was processed separately to ensure there was no risk of disclosing data. You can read more about our privacy policy here: https://www.fastmail.fm/help/overview_privacy.html.

Data collection

We log every single time a message is added to or expunged from any folder on our backend servers. We collected an initial dataset of nearly 30 million “Delete to Trash” events from the log files.

The next step was identifying which of these were a single action involving more than one message from the same conversation. Every message was tagged with a session identifier and timestamp as well as the folder and IMAP “UID” which uniquely identifies it, but we were not logging the CID (conversation identifier). We do now, but that doesn’t help with log lines from the past!

Finding the CID involved writing custom code to read the index file on disk (which still contains the deleted record) and extract the CID field for every deleted message.

Finally of course, there was processing the logs for every single connection from the web servers over that time frame and finding which deletes were related to each other. There’s nothing in the log to show that it’s the same command, so we applied a heuristic of “within 10 seconds” to account for the outside case of a busy server and large folders being processed.
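
The heuristic can be sketched roughly as follows. The event shape, the field names and the choice to group by session are assumptions made for illustration; the real tool worked on our log files and is not reproduced here.

```javascript
// Hypothetical event shape:
// { session, source: 'web' | 'imap', time (seconds), folder, uid, cid }
function findSuspectDeletes ( events, windowSeconds ) {
    var groups = {};
    events.forEach( function ( event ) {
        // Only actions from the web interface are considered.
        if ( event.source !== 'web' ) { return; }
        var key = event.session + '\u0000' + event.cid;
        ( groups[ key ] || ( groups[ key ] = [] ) ).push( event );
    });

    var suspects = [];
    Object.keys( groups ).forEach( function ( key ) {
        var group = groups[ key ].slice().sort( function ( a, b ) {
            return a.time - b.time;
        });
        // Flag a delete if another delete from the same conversation
        // happened within the window just before or just after it.
        group.forEach( function ( event, i ) {
            var prev = group[ i - 1 ],
                next = group[ i + 1 ];
            if ( ( prev && event.time - prev.time <= windowSeconds ) ||
                 ( next && next.time - event.time <= windowSeconds ) ) {
                suspects.push( event );
            }
        });
    });
    return suspects;
}

var events = [
    { session: 's1', source: 'web', time: 100, folder: 'Inbox', uid: 1, cid: 'A' },
    { session: 's1', source: 'web', time: 105, folder: 'Sent Items', uid: 2, cid: 'A' },
    { session: 's1', source: 'web', time: 200, folder: 'Inbox', uid: 3, cid: 'A' },
    { session: 's1', source: 'web', time: 101, folder: 'Inbox', uid: 4, cid: 'B' },
    { session: 's2', source: 'imap', time: 100, folder: 'Inbox', uid: 5, cid: 'C' },
    { session: 's2', source: 'imap', time: 101, folder: 'Inbox', uid: 6, cid: 'C' }
];

var suspects = findSuspectDeletes( events, 10 );
console.log( suspects.map( function ( e ) { return e.uid; } ) ); // [ 1, 2 ]
```

Only the first two deletes are flagged: the third is from the same conversation but far outside the window, the fourth is alone in its conversation, and the last two came from an IMAP client.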

Restoring messages

We use the Cyrus IMAP server. One of the utilities included is called ‘unexpunge’, and it can be used to recover deleted emails. This is different from our usual restore command, which extracts messages from various sources and appends them to a new temporary folder.

In this case we want to restore messages permanently, so unexpunge is the right tool… except – we want to tag every message with a keyword, and we want it to be reliable. Finding the messages afterwards is messy. We chose to add a new feature to unexpunge, setting a user-defined keyword on each message as it is restored. It is robust, and there’s no gap where messages appear without the keyword.

The chosen keyword is RESTORED-20121107. Our web interface already supports global keyword search with “flag:$name”, so the email to users includes a pre-generated URL which will perform a global search on all that user’s folders for messages which were restored.

Restores are in progress now. Once they are completed, thousands of users will have some messages restored. This is almost certain to include messages which were intended to be deleted, but we must err on the side of safety here.

We have built a very robust infrastructure because of our strong commitment to data safety. These restores are in line with this commitment. It is easier to delete unwanted messages again than to recover messages which no longer exist.

Posted in Technical. Comments Off

New interface and login screens rolled out

Summary

Almost a year ago we rolled out a new webmail interface for users to test on our beta server. During the last year we have made significant updates, improvements and tweaks, and today we are launching this as the main interface for our users, along with a redesigned homepage and website at www.fastmail.fm.

Features

The main improvements the new interface brings are:

  1. Speed — Opening and dealing with email is now much faster, with many actions happening instantly, thanks to a full redesign with modern technologies that allow for pre-fetching and caching.
  2. Conversations — Emails on the same topic can be grouped together into conversations (even across folders), so you can see the back and forth history of messages, replies and forwards. Of course, this can be turned off if you like to deal with all messages individually.
  3. Push updates — New email deliveries are pushed straight to your Inbox, so they appear instantly without you needing to refresh.
  4. Archiving — To make dealing with your Inbox easier, the new default action is to Archive messages. This allows you to quickly move messages you have dealt with out of your Inbox, but still keep them for searching and referencing at a later date.
  5. Simplicity — FastMail has always provided power, but in some cases that has caused extra complexity where it is not needed. We have reduced the complexity of many of our configuration screens to make common tasks easier. For example, setting up automated retrieval of email from an external account now requires just a username and password in most cases; the rest is automatically determined.

We’ve put together a short (2 minute) video showing how all these improvements (and more) come together to create a great experience for our users.

Existing interface

While we believe that this new interface is a huge improvement, if you would like to continue using the existing interface, you can do so by checking the “Use classic interface” checkbox on the login screen (click the More link on the login screen to see it), or by logging in via https://classic.fastmail.fm. Older browsers that do not have the required support for modern web technologies used by the new interface, such as Internet Explorer 6 or 7, will also continue to get the existing interface automatically.

If you are thinking of continuing to use the existing interface because of some particular feature you like, then also consider that an equivalent might be available in the new interface as well. For instance, if you really don’t like conversations, you can disable them: on the Settings –> Preferences pane, select the “Show every message separately” radio button. Similarly, if you don’t like the preview on every message, you can also turn that off on the same page. Unfortunately you can’t show a preview for only unread emails; it’s either all or none with the new interface.

Help

We have not yet updated our help documentation; we are currently working on that and hope to have it done soon. We do not believe this is a major impediment to users using the new interface as most of the features are highly discoverable as needed.

Posted in News. Comments Off

Changes to FastMail service levels

Summary:

  •  Guest accounts discontinued for new signups (existing accounts remain)
  •  New personal service level: Premier, which is the same as Enhanced, but with increased storage and included SMS credits.
  •  60-day free trials for all service levels

Details:

After FastMail became a part of Opera Software, our technology was used to build the free My Opera Mail service. This has proven very popular, to the extent that we now have considerably more free @myopera.com accounts than free @fastmail.fm accounts.

We’ve therefore decided to consolidate FastMail as a premium brand with only paying accounts. This will allow us to continue to offer the configurability that FastMail users have come to expect, and also to improve our customer support for FastMail users. Users that want to sign up a free account should go to the My Opera Mail service.

To this end, we are discontinuing sign-ups and downgrades to the Guest service level. Existing Guest accounts will continue to work. We have no plans to cancel active Guest accounts. However, if a Guest account is deactivated because it has not been used for 120 days, then it will not be reactivated.

To give people a chance to try out FastMail before committing funds, we are introducing a 60-day free trial for our paid service levels. You may have already noticed this if you have visited our sign-up pages recently.

In addition, we are also introducing another personal service level: Premier. This will have all the features of the Enhanced level, including Priority support, but with 60 GB mail storage, 30 GB file storage and 400 SMS for $119.95/year. The features of Enhanced have not changed in any way.

 

Posted in News. Comments Off