New Folders screen and features released

We’ve just rolled out our new folder management screen to all our production servers. You can access it either via the Options screen then the Folders link, or from the (Edit/Refresh) link at the top of the folder tree on the Mailbox screen immediately after logging in.

The new screen makes it much easier to create sub-folders of other folders, to move and rename multiple folders, and includes many new features that can be controlled for each folder.

  • Empty link – Add an [Empty] link to this folder on the Mailbox screen to allow easy deletion of all its messages
  • Spam learning – Teach your personal spam filter that new messages in this folder are spam/non-spam. The folder is scanned only once a day
  • Auto-purge – When messages in this folder meet the selected criteria they will automatically be PERMANENTLY deleted, unless they have been flagged
  • Hide folder – Hide this folder from the folder list on the mailbox screen. This does not affect IMAP clients or IMAP subscriptions

Additionally it’s much easier to edit multiple folders at once, including the ability to set many or only a few specific features/properties on a large set of folders at a time.

If you have any comments or encounter any problems with the new screen, please post to our forum thread here.

Posted in News. Comments Off

ReiserFS bugs, 32-bit vs 64-bit kernels, cache vs inode memory

A while back we bought some nice new IBM Xeon based servers as IMAP servers. Because email is an IO intensive application, we bought each of these machines with 12G of RAM so we could do as much caching as possible. Our previous machines with 8G of RAM showed that quite a lot of that RAM was eaten up as “active/application” RAM, leaving only about 1G to 2G of memory for caching, so we thought that by getting 12G we’d be leaving a lot more RAM available for caching.

So it was quite surprising when after a few days of running, we saw memory stats like this:

             total       used       free     buffers     cached
Mem:      12466848   12419764      47084      463564    1550232
-/+ buffers/cache:   10405968    2060880
Swap:      2048276      69828    1978448

All 12G of memory was being used, but only 1.5G was for caching; the other 10G+ was “active/application” memory again. How was it that a 12G server doing less work, with fewer running processes than a server with 8G, was using more memory?
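To make the arithmetic behind those numbers explicit, here is a minimal Python sketch that parses the “Mem:” line of the output above. It assumes the five-column layout shown in this post (total, used, free, buffers, cached, all in KiB); the function name is just illustrative. Note that it counts buffers plus cached as cache, so it reports slightly more than the 1.5G in the “cached” column alone.

```python
def summarise_free(output: str) -> dict:
    """Summarise the 'Mem:' line of `free` output (values in KiB).

    Assumes the column order shown in this post:
    total, used, free, buffers, cached.
    """
    for line in output.splitlines():
        if line.startswith("Mem:"):
            total, used, free, buffers, cached = (int(x) for x in line.split()[1:6])
            break
    else:
        raise ValueError("no 'Mem:' line found")

    cache = buffers + cached
    return {
        "total_kib": total,
        "cache_kib": cache,
        # Memory that is in use but is neither buffers nor page cache.
        "apps_kib": used - cache,
        "cache_fraction": cache / total,
    }


# The figures from the first listing in this post.
SAMPLE = """\
             total       used       free     buffers     cached
Mem:      12466848   12419764      47084      463564    1550232
-/+ buffers/cache:   10405968    2060880
Swap:      2048276      69828    1978448
"""

summary = summarise_free(SAMPLE)
```

The computed “apps” figure should agree with the “used” value on the “-/+ buffers/cache” line, which is a handy sanity check on the parsing.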

Well, after some debugging work with Andrew Morton and Chris Mason, we found that the memory was being used due to a bug in ReiserFS: if you use data=journal with particular workloads, it leaks “zero-refcount pages on the LRU”. As Chris noted, “The fastmail guys find all the fun bugs” (see here for an example of a previous bug that only our workload seemed to be hitting regularly).

So after Chris produced a patch, we tried it out. The good news was that it appeared to fix the leak: “active/application” memory no longer grew to take up 10G+. The bad news was that the cache memory didn’t grow beyond 1.5G or so either, leaving us with about 10G of free memory not being used by anything!

Some more investigation suggested that the machine was running out of “low memory”, which is the only memory that can be used for inode caching, and that when inodes were reclaimed, the page cache for those inodes was reclaimed as well. Most systems don’t need to cache huge numbers of inodes, but cyrus (our IMAP server) stores each email on disk as a separate file. A quick calculation suggests that this one server alone had >50 million files on it. The continuous access to lots of separate small files was filling low memory with inode items, so older inodes quickly fell out of the cache and took their associated cache memory with them.
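One way to watch for this kind of low memory pressure is to look at the LowTotal and LowFree fields that 32-bit highmem kernels report in /proc/meminfo. The Python sketch below is only an illustration, not the tooling we used: the function name is hypothetical and the sample numbers are made up.

```python
def lowmem_usage(meminfo_text: str):
    """Return low memory figures from /proc/meminfo-style text.

    Returns None when LowTotal/LowFree are absent, as on a 64-bit
    kernel, which has no low/high memory split.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # values are in kB

    if "LowTotal" not in fields or "LowFree" not in fields:
        return None

    total, free = fields["LowTotal"], fields["LowFree"]
    return {
        "low_total_kb": total,
        "low_free_kb": free,
        "low_used_fraction": (total - free) / total,
    }


# Made-up example values, NOT taken from our servers.
EXAMPLE = """\
MemTotal:     12466848 kB
LowTotal:       800000 kB
LowFree:        200000 kB
"""

usage = lowmem_usage(EXAMPLE)
```

On a live machine you would pass in the contents of /proc/meminfo instead of the example text.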

Andrew suggested we try running a full 64-bit kernel, because that removes the low memory limitations that the 32-bit kernel introduces. Now these new servers support the x86-64 64-bit computing extensions, but because the rest of our existing IMAP servers don’t, we are running 32-bit kernels with PAE enabled to address memory > 4G. Previously this had never appeared to be a problem; it just worked.

We decided to try a 64-bit kernel on the new machines, and once we did that, everything came together nicely. With the ReiserFS patch stopping the memory leak, and the 64-bit kernel removing the inode caching limitation, we can now use the full 12G in these machines for caching inodes and disk data. The result has been a nice decrease in system load average, as considerably more indexes and emails are now kept hot in the memory cache on these machines.

             total       used       free     buffers     cached
Mem:      12295924   12229816      66108     1503084    7841212
-/+ buffers/cache:    2885520    9410404
Swap:      2048276     121500    1926776

Some graphs really help illustrate this as well.

You can see on the left the 32-bit kernel before the patch, where all memory is listed as “apps”. In the middle is the 32-bit kernel with the ReiserFS patch, where most memory is left as “unused”. The spike in the middle was caused by some tests on a single 10G file, which did cause the cache memory to be used. The right-hand side shows the ReiserFS patch together with a 64-bit kernel, where most memory is now being used for the cache.

Here you can see how before the 64-bit kernel, it wasn’t possible to have more than 100,000-200,000 inodes in the inode cache at a time. After the 64-bit kernel, the inode cache can easily grow up to almost 2 million items with no problems.
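If you want to watch inode counts on your own machine, one option is /proc/sys/fs/inode-nr, which holds two numbers: the inodes the kernel has allocated and how many of those are free. This Python sketch is an assumption about how you might sample it, not necessarily how the graph above was produced, and the function names are illustrative.

```python
def inodes_in_use(inode_nr_text: str) -> int:
    """Parse the contents of /proc/sys/fs/inode-nr.

    The file contains 'nr_inodes nr_free_inodes'; the difference is
    the number of inodes currently in use.
    """
    nr_inodes, nr_free_inodes = (int(x) for x in inode_nr_text.split()[:2])
    return nr_inodes - nr_free_inodes


def read_inodes_in_use(path: str = "/proc/sys/fs/inode-nr") -> int:
    """Read the live value on a Linux machine."""
    with open(path) as fh:
        return inodes_in_use(fh.read())
```

Calling read_inodes_in_use() periodically and plotting the result gives a rough picture of how the inode cache grows and shrinks.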

Kernel performance optimisation can often seem a bit of a dark art. There are lots of potential bottleneck areas (network, IO, memory, CPU, scheduler, all sorts of different caches, etc) and a number of knobs to change, including some not really even documented (eg lowmem_reserve_ratio), and things can change from one kernel version to the next. On top of that, when you run into bugs that other people either haven’t run into, or don’t actually realise they’re running into, it can be an interesting/frustrating process to investigate and dig to find out what’s actually going on.

In this case, it was nice to find a solution and be able to make the most of the new servers we bought.

Update (25-Sep-07): Someone suggested we try altering the value of vfs_cache_pressure in /proc/sys/vm/. According to the kernel documentation, vfs_cache_pressure:

Controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure = 100 the kernel will attempt to reclaim dentries and inodes at a “fair” rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

I tried lowering this, and it does help (cache goes from 1.5G up to 2.5G, and maximum inodes in use goes from < 100,000 up to about 250,000), but that’s still nowhere near what the 64-bit kernel achieves by eliminating the low memory zone altogether.
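For anyone wanting to experiment with the same knob, here is a small Python sketch that reads and changes vfs_cache_pressure through /proc/sys/vm/. The helper names are hypothetical, writing the real file needs root, and the value 50 in the usage note is only an example, not the value tried above.

```python
from pathlib import Path

VFS_CACHE_PRESSURE = Path("/proc/sys/vm/vfs_cache_pressure")


def get_cache_pressure(path: Path = VFS_CACHE_PRESSURE) -> int:
    """Return the current vfs_cache_pressure value (default is 100)."""
    return int(path.read_text().strip())


def set_cache_pressure(value: int, path: Path = VFS_CACHE_PRESSURE) -> int:
    """Write a new value and return the previous one so it can be restored.

    Values below 100 make the kernel prefer to keep dentry and inode
    caches; values above 100 make it prefer to reclaim them.
    """
    if value < 0:
        raise ValueError("vfs_cache_pressure must not be negative")
    previous = get_cache_pressure(path)
    path.write_text(f"{value}\n")
    return previous
```

For example, previous = set_cache_pressure(50) lowers the pressure, and set_cache_pressure(previous) puts it back. The change does not survive a reboot.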

Posted in Technical. Comments Off

New photo gallery in beta testing

Following Neil Jenkins’ great work implementing a new photo gallery mockup (mockup discussion forum thread), I’ve now implemented the actual code to generate these for users’ real photo galleries.

Currently it’s only on the beta server, so you have to use a special syntax to access it. Basically if your current photo gallery URL is:

http://myphotos.myname.fastmail.fm/albumname/

Then to access it via the beta server you use:

http://www.fastmail.fm/userdirsb/myphotos.myname.fastmail.fm/albumname/

As an example, try this link which is a real link to a photo gallery that works:

http://www.fastmail.fm/userdirsb/testphotos.robm.fastmail.fm/

Using the above link syntax should show any existing photo galleries you have using the new style. For more discussion and to provide feedback about this new photo gallery format, please use this forum thread.
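If you have several galleries, the URL rewrite described above is easy to script. This is a minimal Python sketch of that mapping; the function name is illustrative, and it simply follows the two example URLs in this post.

```python
from urllib.parse import urlsplit

# Beta server prefix, as given in this post.
BETA_PREFIX = "http://www.fastmail.fm/userdirsb/"


def beta_gallery_url(gallery_url: str) -> str:
    """Map a normal photo gallery URL to its beta server equivalent."""
    parts = urlsplit(gallery_url)
    if not parts.netloc:
        raise ValueError(f"not an absolute URL: {gallery_url!r}")
    # The original hostname becomes the first path component.
    return f"{BETA_PREFIX}{parts.netloc}{parts.path or '/'}"


print(beta_gallery_url("http://myphotos.myname.fastmail.fm/albumname/"))
```

This prints the beta form of the example gallery URL shown earlier in the post.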

Posted in Technical. Comments Off

Firewall rules on all servers updated

Bron has just updated the firewall rules on all our servers. The net result of this is that users should see no changes at all.

However, if in the last 3 hours you’ve found that some SMTP port, web proxy or IMAP connection mechanism you were using has suddenly stopped working, please report the details in this forum thread and we’ll look into it.

Posted in Technical. Comments Off

CSS and Javascript files now compressed

Ever since we started FastMail, we’ve been using server side compression to compress and serve HTML pages, first with mod_accel and more recently nginx. Using this significantly improves the download speed of HTML pages and makes the site noticeably faster for end users.

However for a long time we haven’t compressed JavaScript or stylesheet (CSS) data. The main reason for this has been bugs over the years that have made this unreliable, most notably these bugs and others in Internet Explorer.

http://support.microsoft.com/kb/823386

http://support.microsoft.com/kb/327286

Since more and more sites are now compressing these files, and IE6 is being replaced rapidly by IE7 and Firefox, I’ve decided that it’s time again to turn on compression for these files and see how they fare. I’ll update this post if there are significant problems and I have to turn it off, but hopefully we can now leave this on as a standard feature, saving a bit more time on page loading.
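To get a feel for why this is worth doing, here is a rough Python sketch that measures how much gzip shrinks a stylesheet-like payload. It is only an illustration: the sample CSS is made up, the function name is hypothetical, and the real compression happens in the web server rather than in code like this.

```python
import gzip


def gzip_saving(payload: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by gzip-compressing payload."""
    if not payload:
        return 0.0
    compressed = gzip.compress(payload, compresslevel=level)
    return 1 - len(compressed) / len(payload)


# Made-up, repetitive stylesheet text standing in for a real CSS file.
sample_css = b"".join(
    b".folder-%d { margin: 0; padding: 2px 4px; font-family: sans-serif; }\n" % i
    for i in range(200)
)

saving = gzip_saving(sample_css)
print(f"gzip saves about {saving:.0%} on this sample")
```

CSS and JavaScript tend to be repetitive text, which is the kind of data gzip handles well, though the exact saving depends on the file.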

Posted in Technical. Comments Off

WAP server moved to a new machine

We’ve moved the WAP interface to a different machine. Hopefully the failover will be painless; I’ve already done some testing and all the features still seem to work.

Posted in Technical. Comments Off