The details of the various mount options for the ext3 filesystem are fairly well documented, but as with many things in the Unix world, knowledge is far easier to come by than wisdom. That’s a pithy way of saying that I had to do some digging to find recommendations, as opposed to explanations. So here are my recommendations for ext3 users (which encompasses the majority of the Linux-using world, as far as I can tell).
First of all, do yourself a favor and disable atime updates, using the noatime mount option. This yields a huge performance boost.
This is done by adding noatime to the appropriate lines in /etc/fstab (do it once for each ext3 filesystem that’s listed), in the fourth column, which probably says defaults now.
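As a rough sketch of that edit, the snippet below appends noatime to the fourth field of every ext3 entry. The function name add_noatime is made up for this illustration, and the sample line assumes a root filesystem on /dev/xvda1. It reads from standard input and writes to standard output, so it never touches the real /etc/fstab; note that awk collapses column alignment to single spaces on the lines it changes.

```shell
# Hypothetical helper: reads an fstab on stdin and prints it with noatime
# appended to the options (4th field) of every ext3 entry.
add_noatime() {
  awk '
    /^[[:space:]]*#/ || NF < 4 { print; next }   # keep comments and blank lines as-is
    $3 == "ext3" && $4 !~ /(^|,)noatime(,|$)/ { $4 = $4 ",noatime" }
    { print }
  '
}

# Try it on a sample line rather than on the real /etc/fstab.
printf '%s\n' '/dev/xvda1 / ext3 defaults 0 1' | add_noatime
```

Review the output before copying anything into /etc/fstab, and keep a backup of the original file.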
To make this change to a live, running filesystem, remount the drive (adjust this so that the right disk device is specified at the end of the line):
sudo mount -o noatime,nodiratime,remount,rw /dev/xvda1
(My understanding is that noatime implies nodiratime, but I added it anyway just in case.)
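To confirm which options are actually in effect after a remount, one option is to read them back from /proc/mounts. This is a minimal sketch: the function name mount_opts and its optional second argument (an alternate mounts file, handy for trying it out) are assumptions made for this example.

```shell
# Hypothetical helper: print the options in effect for a mount point.
# $1 = mount point (default /), $2 = mounts file (default /proc/mounts).
mount_opts() {
  # Field 2 is the mount point, field 4 the option list. If a mount point
  # appears more than once, the last entry (the most recent mount) wins.
  awk -v mp="${1:-/}" '$2 == mp { opts = $4 } END { print opts }' "${2:-/proc/mounts}"
}

mount_opts /
```

If noatime shows up in the printed list, the remount took effect.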
atime is a relative of the well-known file modification (mtime) and change (ctime) timestamps, but it tracks access to file data. That means that if you read one byte from a file, even if it’s cached in RAM, you’re also triggering a write to that file’s inode, so that its atime can be updated. (If you want to slap your forehead now in disbelief, be my guest.) And if you read a ton of little files (which happens rather often in the Unix world), that means a ton of writes to update all of their inodes. You don’t want that, right?
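To see this behavior on your own system, a small experiment along these lines may help. It is only a sketch: the function name show_atime_change is made up, it relies on GNU stat (the -c option), and the result depends entirely on how the filesystem holding the test file is mounted.

```shell
# Hypothetical helper: report a file's atime before and after reading it.
# $1 = directory on the filesystem to test (default /tmp).
show_atime_change() {
  f=$(mktemp "${1:-/tmp}/atime-test.XXXXXX") || return 1
  echo "hello" > "$f"
  before=$(stat -c '%X' "$f")   # atime in seconds since the epoch (GNU stat)
  sleep 1                       # so that an updated timestamp is visible
  cat "$f" > /dev/null          # read the file data
  after=$(stat -c '%X' "$f")
  echo "atime before=$before after=$after"
  rm -f "$f"
}

show_atime_change
```

Pass a directory on the ext3 filesystem you care about. With noatime the two values should match; with full atime updates the second should be later.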
But do you need it? Almost certainly not. It’s required by the POSIX standard, and the need for it to be present and turned on is well debated by people more knowledgeable about this in this thread from the Linux kernel mailing list. The summary of their argument is that it’s the kernel’s job to remain standards compliant, and only the distributor or user has enough information to know that they don’t care about that part of the standard and can safely disable it. I can understand that point of view.
Well, I did the reading, and you can safely disable it, unless you’re using mutt. If you’re using mutt, or if you’re just nervous about disabling something that somebody somewhere says you might maybe need someday, then disable atime for every filesystem that doesn’t have your mail spool on it, and use the relatime mode on that drive. (relatime is a clever hack that simulates atime behavior while skipping the disk write in certain cases, such as when the file hasn’t been modified since it was last read.)
“It depends” is not very satisfying, so an easy rule of thumb would be to use data=journal if you really, really want to ensure the durability of your data, and data=ordered if you can tolerate a teeny tiny chance of data corruption.
I measured all three journaling modes by running time sudo rsnapshot hourly on a VPS that backed up VPSs on the same physical server to a dedicated backup disk. In other words, the source was on the same physical server as the destination but they were on different disks.
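A measurement like this can be repeated with a small timing loop. The sketch below is not the exact procedure used here (that was plain time sudo rsnapshot hourly, run by hand); the function name time_runs is invented for the example, and it only resolves whole seconds.

```shell
# Hypothetical helper: run a command N times and print wall-clock seconds per run.
# Usage: time_runs N command [args...]
time_runs() {
  n=$1; shift
  i=1
  while [ "$i" -le "$n" ]; do
    start=$(date +%s)
    "$@" > /dev/null 2>&1       # discard the command's own output
    end=$(date +%s)
    echo "run $i: $((end - start))s"
    i=$((i + 1))
  done
}

# Stand-in command; substitute the backup command you want to measure.
time_runs 3 sleep 1
```

Running each configuration several times matters because the first run tends to behave differently from later ones, as the numbers below show.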
rsnapshot uses hard links to share file data across backup sets, so backing up an unchanged directory twice takes hardly any additional space compared to backing it up once. But it does need to do a bunch of disk reads and writes to make all the linked directory entries when it does this, so there is a fair amount of I/O involved: more than what rsync would need to just update a local directory to match the remote directory, but far less than what would be needed to make a separate copy of every file for each backup.
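The hard-link trick can be illustrated with a few commands. This is a sketch of the general technique, not rsnapshot’s actual code; the directory names hourly.0 and hourly.1 are only illustrative, and it assumes GNU cp and stat.

```shell
# Build a tiny "previous snapshot" in a scratch directory.
demo=$(mktemp -d)
mkdir "$demo/hourly.1"
echo "unchanged data" > "$demo/hourly.1/file.txt"

# Make a new snapshot whose files are hard links to the old ones:
# new directory entries are written, but no file data is copied.
cp -al "$demo/hourly.1" "$demo/hourly.0"

# Both names should point at the same inode, with a link count of 2.
ino_old=$(stat -c '%i' "$demo/hourly.1/file.txt")
ino_new=$(stat -c '%i' "$demo/hourly.0/file.txt")
links=$(stat -c '%h' "$demo/hourly.0/file.txt")
echo "old inode=$ino_old new inode=$ino_new link count=$links"

rm -rf "$demo"
```

Every file in the snapshot needs its own new directory entry, which is where the many small writes come from.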
In abstract terms, the I/O for this backup process involves a lot of small reads and writes, and a very small number of medium or large writes for changed files. All of these occur as fast as the disk can service them, and the disk is quiet aside from this activity.
Here’s what I measured (in three test runs per journal type):
| Journal Type | Real Time |
|---|---|
| data=journal | 2m05s, 2m57s, 2m51s |
| data=writeback | 2m03s, 1m18s, 1m22s |
| data=ordered | 2m12s, 1m30s, 1m20s |
For this application, data=journal takes roughly twice as long as the others in the second and third runs (the first runs all came in at about two minutes), while data=ordered runs just as fast as data=writeback while providing some additional protection.
So data=writeback is useless in my case, and the fact that data=ordered is the default makes sense. You get almost the same level of data protection as with data=journal, but with the performance of data=writeback. Different I/O patterns will give different results, but I suspect that the pattern I tested with is the most common in real server usage. (Note that in ext3’s v1 journal format, data=journal was the only journal behavior.)
My inclination is to stick with the default setting, even using data=ordered on database servers, since the database is doing its own higher-level journaling in the form of a transaction log. I’m basing this recommendation on this detail from the Gentoo article:
When appending data to files, data=ordered mode provides all of the integrity guarantees offered by ext3’s full data journaling mode. However, if part of a file is being overwritten and the system crashes, it’s possible that the region being written will contain a combination of original blocks interspersed with updated blocks.
Since a database transaction log is generally appended to rather than overwritten, my understanding is that it will protect against the above scenario in which data=ordered can cause a mix of old and new data. The database’s data files may have a mix of old and new data, but the transaction log would not show that the transaction has been completed yet, so it would be re-run during recovery and the remaining old data would be removed. I think.
The usage pattern where data that you really care about is overwritten regularly (as opposed to logs, which simply append) is rare in my experience, except in the case of database servers which are covered by their own logs as I just mentioned. So I don’t know of a particular application type that demands the full data journaling mode.
Anyway, I recommend against data=writeback altogether, unless you don’t mind some data corruption if there’s a power failure. The speed gain I measured isn’t worth the risk, in my opinion.