Thursday, April 16, 2009

Koha Disaster Recovery

presented by Clay Fouts

Clay's primary job is to maintain the centralized repository for the LibLime-hosted Koha installations.

Disaster is inevitable: people will make mistakes, hardware will fail, and malicious intruders can gain access. What is important is understanding how to minimize downtime, reduce the frequency of failures, and avoid losing important data.

How is data stored in Koha?
  • Koha source files and related configuration files
  • Perl dependencies
  • Zebra index content
  • MySQL database contents
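
As a rough sketch, the source and configuration files can be captured with a plain archive; the paths below are assumptions for a typical install and will vary by site (the MySQL and Zebra data are covered separately below):

    # Archive the Koha source tree and its configuration files
    tar czf /backup/koha-files-$(date +%F).tar.gz /usr/share/koha /etc/koha
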
There are a number of trade-offs to weigh when planning for disaster recovery: speed, expense, reliability, and flexibility. For example, it is possible to achieve 99.99% uptime, but supporting that level of availability costs a great deal of money because you need multiple redundant servers with RAID storage and built-in automatic failover.

Storage Media
Disk (hard drive) - Fastest way to move data, but not very portable and still expensive compared to other options.
DVD/CD - Extremely cheap and portable, but doesn't hold much data.
Tape - Portable and holds more data than CD/DVD, though not as much as most hard drives. Very expensive.
Cloud - Effectively unlimited capacity and redundant, so data loss is unlikely, and you don't have to maintain another piece of hardware. The limiting factor is network bandwidth.

Simply making backups is NOT sufficient; you need to make sure the backups actually work and that you can restore completely from them. Verify that the media doesn't degrade and that it is both secure and accessible. If you store the data off site, will it be accessible in the middle of the night when you need to do a recovery?
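
One way to sanity-check a logical backup is to restore it into a scratch database and spot-check a few tables; a rough sketch (the dump file name is a placeholder for whatever your backup job produces):

    # Load the dump into a throwaway database and spot-check a couple of row counts
    mysql -e 'CREATE DATABASE koha_restore_test'
    mysql koha_restore_test < /backup/koha-$(date +%F).sql
    mysql koha_restore_test -e 'SELECT COUNT(*) FROM borrowers; SELECT COUNT(*) FROM issues'
    mysql -e 'DROP DATABASE koha_restore_test'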

Where to Store the data?
Onsite - Fast, cheap, and easy to access, but if the building burns, you lose everything.
Offsite - You still have backups if your building burns, but access is difficult, expensive, and slow if you ever need them.
Cloud - Can be the best of both if your network supports it. Most providers keep redundant copies at multiple facilities spread across the world.

The optimal strategy is some kind of combination: for example, keep backups onsite for quick restores, but also store the data offsite in case of a disaster.
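
A common way to get the onsite copies offsite as well is a nightly rsync to a remote machine; a sketch, assuming a local /backup directory and an offsite host you control:

    # Push last night's backups to an offsite machine over SSH
    rsync -az /backup/ backup-user@offsite.example.org:/srv/koha-backups/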

MySQL Data Backups (most important)
The most sensitive data, and impossible to rebuild manually because you don't know who has checked out what, who owes what fines, etc. It should be backed up frequently using multiple methods. Clay recommends using MySQL's binary logs as part of the backup strategy.
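
The binary log is enabled in my.cnf; a minimal sketch (the server-id, log path, and retention period are assumptions):

    # /etc/mysql/my.cnf, [mysqld] section -- turn on the binary log
    [mysqld]
    server-id        = 1
    log-bin          = /var/log/mysql/mysql-bin
    expire_logs_days = 14
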
Logical - using tools like mysqldump. Provides the ability to rebuild a database from raw SQL statements. Very portable; the dump can even be moved to another platform. Slow to back up and restore, and the database is inaccessible during the backup operation.
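
A typical nightly dump might look like the sketch below; the user, database name, and the --single-transaction option (which reduces locking on InnoDB tables) are assumptions to adapt to your setup:

    # Logical backup: dump the whole koha database as SQL
    mysqldump -u koha_backup -p --single-transaction koha > /backup/koha-$(date +%F).sql
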
Binary - using tools like cp, dd, or LVM snapshots. Faster than logical backups; it still blocks access during the backup, but for less time, and LVM snapshots can be taken almost instantaneously. You need to script it so that the database is quiesced while the copy is taken, because the tools don't do that inherently and you will otherwise get corrupted data. Binary files and disk partitions are much larger than logical SQL dumps, and the backups are harder to verify and less portable.
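
A crude sketch of an LVM-snapshot backup, assuming the MySQL data directory sits on a logical volume named /dev/vg0/mysql (all names, sizes, and mount points are placeholders, and a production script should coordinate the read lock more robustly than the sleep trick used here):

    # Hold a global read lock in a background session while the snapshot is created
    mysql -e "FLUSH TABLES WITH READ LOCK; FLUSH LOGS; SELECT SLEEP(30)" &
    sleep 5                                   # give the lock time to be acquired
    lvcreate --snapshot --size 2G --name mysql-snap /dev/vg0/mysql
    wait                                      # locking session ends, lock is released
    # Mount the snapshot read-only and archive the data files from it
    mount -o ro /dev/vg0/mysql-snap /mnt/mysql-snap
    tar czf /backup/mysql-binary-$(date +%F).tar.gz -C /mnt/mysql-snap .
    umount /mnt/mysql-snap
    lvremove -f /dev/vg0/mysql-snap
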
Mirroring - using master/slave replication. Fastest because it is ongoing, and the replication server can be used as a hot spare in case the primary goes down. It has other uses as well: for example, reports can be generated against the slave so they don't degrade the performance of the master. It also combines well with logical and binary backups, because you can take those backups from the slave and never lock up the master. Replication can introduce inconsistencies (timestamps and the like); to minimize them, make sure the master server is not overloaded and keep network latency down.
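
A sketch of the slave side and of taking the nightly dump from it; every host name, credential, and log coordinate below is a placeholder (the slave also needs its own unique server-id, and the master must have the binary log enabled):

    # Point the slave at the master (coordinates come from SHOW MASTER STATUS on the master)
    mysql -e "CHANGE MASTER TO
                MASTER_HOST='db-master.example.org',
                MASTER_USER='repl',
                MASTER_PASSWORD='secret',
                MASTER_LOG_FILE='mysql-bin.000042',
                MASTER_LOG_POS=107;
              START SLAVE;"

    # Take the dump from the slave so the master is never blocked
    mysql -e 'STOP SLAVE SQL_THREAD'
    mysqldump --single-transaction koha > /backup/koha-slave-$(date +%F).sql
    mysql -e 'START SLAVE SQL_THREAD'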

Code base backup
Useful if you have customized the code, since a stock install can simply be reinstalled from the upstream release.
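
One lightweight approach is to keep a pristine copy of the upstream release and record how your install differs from it; the paths and version below are placeholders:

    # Record local customizations as a patch against a pristine copy of the release
    diff -ruN /usr/src/koha-3.00.00-stock /usr/share/koha > /backup/koha-local-patches-$(date +%F).diff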

Zebra Index backup
The index is huge but not unique data: nothing is lost if it isn't backed up, but rebuilding it will add to the recovery time, potentially hours if you have a lot of records. If you do back it up, make sure nothing is being added to the index while the backup is running.
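
If you opt to rebuild instead of restore, Koha's rebuild_zebra.pl script does a full reindex; a sketch, with the script location, library path, and config path all assumptions that vary by version and install layout:

    # Full rebuild of the biblio and authority indexes (can take hours on a large catalog)
    export KOHA_CONF=/etc/koha/koha-conf.xml
    export PERL5LIB=/usr/share/koha/lib
    perl /usr/share/koha/misc/migration_tools/rebuild_zebra.pl -b -a -r -v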
