Friday, July 4, 2014

migrating backuppc from ext4 to glusterfs with xfs

The servers at the company I currently work for are backed up using BackupPC. It does its job nicely, the interface is reasonable and we hardly ever have problems with it. But as time went by the number of servers grew, and so did the number of files. We now have over 500,000,000 files in the BackupPC pool, and the backup totaled over 13TB, although after some cleanup we are now at around 11TB. When we hit 13TB our monitoring system went off, because we had created the filesystem at around 14TB. So we set out to expand it, only to find out that we couldn't do much. The maximum size of an ext4 filesystem is 16TiB, which is about 17.5TB with 4k blocks. The backups are stored on a 10-disk RAID-6 volume of 19.8TB (18TiB). That means we were wasting about 2TB of space by using ext4, and we couldn't expand our backup volume beyond that at all. That was problem 1.
We had two backup servers, one on-site and one off-site, so that if something happened in the primary datacenter, such as a fire, we would still have a reasonably recent backup available to restore to new servers, the cloud, or whatever. All data from the primary backup server is synced to the remote machine using rsync over SSH over our office internet connection. That connection used to be a cable modem capable of around 60mbit. Yet we found out that we had a nightly change set of about 2TB worth of data. Syncing that across a 60mbit connection takes about 4 days, and that's assuming we could achieve a constant 60mbit. But the connection is shared with our employees, who need to do things like Skype calls, browse the web and send email. That's problem 2.
We noticed that even though there was 60mbit to go around, we were not achieving that throughput even when we wanted to. Analysis in our monitoring system showed that the data source was utilizing 100% of the available disk IOs. The reason is that we were using rsync to transfer the data between the source and the destination. Rsync is a great tool to sync files and even full directory trees, because it calculates deltas and only transfers the changes. But doing this over SSH (the default these days) for a directory tree that holds over 500 million files is crazy. That wouldn't be too bad by itself, but since BackupPC uses hardlinks to save space, rsync's memory usage would explode. It would consume every resource the system had, and eventually the system would run out of memory and kill rsync just to stay alive. That's problem 3. We added memory to try and fix this.
But it did not improve the speed, because calculating the MD5 hashes for this many small files is expensive. And many small files cause many small IOs, which can't all be grouped together by the OS or the RAID controller. The write penalty of RAID-6 hurt a lot as well: every small random write turns into three reads and three writes to update the data block and both parity blocks. So there we have problem 4: way too many IOPS were needed.

We first fixed problem 2, by upgrading our connection to the datacenter to a 500mbit fiber connection. That increased the available bandwidth enough to send the full 2TB change set within a day. We could even sync the full server contents within a day.

Problems 1 and 4 were harder to fix. To fix problem 4 we need to switch to block-based backups/transfers, which allow the IO to be sequential and to be done in much larger blocks, drastically improving throughput. But problem 1 is more urgent: we must have more space to store the backups.

We ordered a server with 48 disk slots and created a ZFS filesystem on it. We only filled half of the slots with 4TB disks and created a 50TB ZFS volume, which we host at the off-site location. ZFS gives us somewhat slower reads and writes, but with this many disks the throughput will still be much higher than what we have now, and much higher than what we can push across the fiber link, even if we were to upgrade that connection to more bandwidth. This allows us to store a number of block-based backups from the datacenter, so that if something goes wrong with the backups in the datacenter, which are synced to the off-site location, we can still revert to older versions of the backup.
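
For reference, a pool like that is created with a single zpool create command. Managed from Puppet (which we use elsewhere), a minimal sketch could look like the snippet below; the pool name, device names and the raidz2 layout are illustrative assumptions, not the actual layout of our 50TB pool.

  # Sketch only: create a ZFS pool for the off-site backup store.
  # Pool name, devices and raidz2 layout are assumptions, not our real config.
  exec { 'create backup zpool':
    command => 'zpool create -m /backup backup raidz2 sdb sdc sdd sde sdf sdg',
    path    => ['/sbin', '/usr/sbin', '/bin', '/usr/bin'],
    unless  => 'zpool list backup',
  }

The unless guard keeps the resource idempotent: zpool list exits non-zero as long as the pool does not exist, so the pool is only created once.
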
The new off-site server also allows us to take the existing off-site backup server out of production and move it to the datacenter, where we can set it up as a GlusterFS brick. Since GlusterFS bricks use XFS as their filesystem, the size limit moves from ext4's 16TiB to somewhere in the exbibyte range. I hope we reach that much data someday, but it will probably be many years before we do ;) Using GlusterFS also allows us to scale our storage in the datacenter much more easily, both performance-wise and size-wise: we can simply add extra 'bricks' when we need to.
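
Preparing such a brick mostly comes down to formatting a device with XFS and mounting it. A minimal Puppet sketch of that step, assuming a hypothetical device and mount point (512-byte inodes are commonly recommended for Gluster bricks):

  # Sketch only: format and mount an XFS filesystem to serve as a Gluster brick.
  # The device and mount point are hypothetical.
  package { 'xfsprogs':
    ensure => installed,
  }

  exec { 'format brick':
    command => 'mkfs.xfs -i size=512 /dev/sdb1',
    path    => ['/sbin', '/usr/sbin', '/bin', '/usr/bin'],
    unless  => 'blkid -t TYPE=xfs /dev/sdb1',
    require => Package['xfsprogs'],
  }

  file { '/export/brick1':
    ensure => directory,
  }

  mount { '/export/brick1':
    ensure  => mounted,
    device  => '/dev/sdb1',
    fstype  => 'xfs',
    options => 'inode64,noatime',
    require => [Exec['format brick'], File['/export/brick1']],
  }
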
Unfortunately GlusterFS needs a minimum of 2 bricks before it can start, and only 1 brick is available at this time. So first we need to migrate the data from our existing datacenter backup server to the ZFS machine and verify that it arrived intact. Remember the problem (4) with rsync and this many files? The standard way of transferring a BackupPC pool (with BackupPC_tarPCCopy) did not work for us due to errors. Instead we were able to transfer the pool using a tool created by a member of the BackupPC community: BackupPC_CopyPcPool.pl

We are now at the point where we will soon mount the copied data that lives on the remote ZFS server on our datacenter backup server, to test whether the pool transferred correctly and is still usable. If that works we can turn the datacenter backup node into a GlusterFS brick, install BackupPC on one of the GlusterFS nodes and mount the cluster volume. We then copy the data from the ZFS machine back to the datacenter and turn the backups back on. That solves problem 1.
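
Once both bricks exist, creating the volume and mounting it on the BackupPC node is only a few gluster commands. A hedged Puppet sketch of that last step, with made-up hostnames (bkp1, bkp2), brick paths, volume name and BackupPC pool directory; it assumes the peers have already been probed and the fuse client is available:

  # Sketch only: create and start a 2-brick replicated volume, then mount it
  # where BackupPC keeps its pool. All names and paths below are hypothetical.
  package { 'glusterfs-fuse':    # 'glusterfs-client' on Debian-based systems
    ensure => installed,
  }

  exec { 'create backuppc volume':
    command => 'gluster volume create backuppc replica 2 bkp1:/export/brick1/backuppc bkp2:/export/brick1/backuppc',
    path    => ['/sbin', '/usr/sbin', '/bin', '/usr/bin'],
    unless  => 'gluster volume info backuppc',
  }

  exec { 'start backuppc volume':
    command => 'gluster volume start backuppc',
    path    => ['/sbin', '/usr/sbin', '/bin', '/usr/bin'],
    unless  => 'gluster volume status backuppc',
    require => Exec['create backuppc volume'],
  }

  mount { '/var/lib/backuppc':
    ensure  => mounted,
    device  => 'bkp1:/backuppc',
    fstype  => 'glusterfs',
    options => 'defaults,_netdev',
    require => [Package['glusterfs-fuse'], Exec['start backuppc volume']],
  }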

For problem 4 we are looking into replacing rsync with a block-based backup solution. We tested AppAssure, but it would crash due to the number of files and hardlinks. Dell is working on improving their software, but they say the problems are mostly caused by our use of ext4. We'll retry AppAssure once we've completed the switch to GlusterFS/XFS.
For the longer term the question arises whether BackupPC is really enterprise-ready. We've found there are no tools to guarantee the consistency of the backup pool and very few tools to fix problems. Work is being done on BackupPC 4, which may or may not improve the situation. For creating the primary backup of our servers we'll also be looking into other open-source solutions such as Bacula and Amanda. Perhaps they are better at this job.

[update 2016-04-24: I've written a followup here: http://shootingbits.blogspot.com/2016/04/why-i-moved-away-from-backuppc-for.html]

Thursday, July 3, 2014

puppet-mongodb and the story of the missing provider

So I found myself setting up a MongoDB cluster with Puppet. It turns out Puppet Labs has its own mongodb module, so I started using that. The documentation told me to declare the classes in the node definition like this:

  class { '::mongodb::globals':
    manage_package_repo => true,
  } ->
  class { '::mongodb::server':
    auth    => false,
    ensure  => present,
    rest    => true,
    dbpath  => '/e/data/mongodb',
    logpath => '/e/logs/mongodb/mongodb.log',
    bind_ip => ['127.0.0.1', 'x.x.x.x'],
  } ->
  mongodb_database { 'feed':
    ensure  => present,
    tries   => 10,
    require => Class['mongodb::server'],
  }

That installed mongodb-org-server and started it. Great. But every Puppet run would give me this message:
Error: Could not find a suitable provider for mongodb_database


And no matter what I tried, I could not get this fixed. Googling and reading the Puppet docs did not help. There were no issues or forks of the puppetlabs mongodb module that seemed to address this. In the end I asked the Puppet IRC channel for help, and it quickly turned out that in order to create a database, the module needs the mongo command. That command is provided (by installing the mongodb-org-tools RPM) through the mongodb::client class, which I did not have declared, because the documentation did not say I needed to. After adding it, the node definition became:

  class { '::mongodb::globals':
    manage_package_repo => true,
  } ->
  class { '::mongodb::server':
    auth    => false,
    ensure  => present,
    rest    => true,
    dbpath  => '/e/data/mongodb',
    logpath => '/e/logs/mongodb/mongodb.log',
    bind_ip => ['127.0.0.1', 'x.x.x.x'],
  } ->
  class { 'mongodb::client': } ->
  mongodb_database { 'feed':
    ensure  => present,
    tries   => 10,
    require => Class['mongodb::server'],
  }

And then things started working.