Class FailoverFileReplicationManager


public final class FailoverFileReplicationManager extends Object
Handles the replication of data for the failover and backup system.

In failover mode, only one replication directory is maintained and it is updated in-place. No space saving techniques are applied. A data index is never used in failover mode.

In backup mode, multiple directories (one per date) are maintained. Also, regular files are hard linked between directories (and optionally the data index).

Compression may be enabled, which will compare chunks of files by MD5 hash and GZIP compress the upload stream. This reduces network traffic at the cost of additional CPU cycles. The trade-off is generally worthwhile, but when the network is significantly faster than the processor, it may be better to turn off compression.

The compression also assumes that if the MD5 matches in the same chunk location, then the chunk has not changed. To be absolutely sure there is no hash collision, compression must be disabled.
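The per-chunk comparison described above can be sketched roughly as follows. This is only an illustration, not the actual protocol or implementation: the class name, method names, and the in-memory byte arrays are hypothetical, and the chunk size is passed in as a parameter.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of FailoverFileReplicationManager.
final class ChunkCompareSketch {

  private ChunkCompareSketch() {}

  /** Computes the MD5 digest of data[off, off + len). */
  static byte[] md5(byte[] data, int off, int len) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      md.update(data, off, len);
      return md.digest();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required to be available on every Java platform
      throw new IllegalStateException(e);
    }
  }

  /**
   * Returns the indexes of chunks in newData that must be sent: chunks whose
   * MD5 differs from the chunk at the same location in oldData, or that have
   * no same-sized counterpart in oldData.
   */
  static List<Integer> changedChunks(byte[] oldData, byte[] newData, int chunkSize) {
    List<Integer> changed = new ArrayList<>();
    for (int off = 0, index = 0; off < newData.length; off += chunkSize, index++) {
      int newLen = Math.min(chunkSize, newData.length - off);
      int oldLen = Math.min(chunkSize, Math.max(0, oldData.length - off));
      // A matching MD5 at the same location is assumed to mean "unchanged"
      boolean same = oldLen == newLen
          && Arrays.equals(md5(oldData, off, oldLen), md5(newData, off, newLen));
      if (!same) {
        changed.add(index);
      }
    }
    return changed;
  }
}
```

Only the chunks reported as changed would need to be transferred. A hash collision at the same chunk location would go undetected, which is the risk noted above.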

Files are compared only by modified time and file length. If both these are the same, the file is assumed to be the same. There is currently no facility for full checksumming or copying of data.

When the data index is enabled, the underlying filesystem must have the capabilities of ext4 or better (support 2^16 sub directories and over DataIndex.FILESYSTEM_MAX_LINK_COUNT links to a file).

To minimize the number of metadata updates, old backup trees are recycled and used as the starting point for new backups. This dramatically improves the throughput in the normal case where most things do not change.

The data index may be turned on and off for a partition. Newly created backup directory trees will use the format currently set, but will also recognize the existing data in either format.

When data index is off, each file is simply stored, in its entirety, directly in-place. If the file contents (possibly assumed only by length and modified time) and all attributes (ownership, permission, times, ...) match another backup directory, the files will be hard linked together to save space.
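A minimal sketch of the "assume same, then hard link" step follows. It is an assumption-laden illustration only: the class and method names are hypothetical, it compares just length and modified time (the attribute checks for ownership, permissions, and so on are omitted), and the delete-then-link sequence is not atomic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of FailoverFileReplicationManager.
final class HardLinkSketch {

  private HardLinkSketch() {}

  /**
   * If target has the same length and modified time as existing, replaces
   * target with a hard link to existing and returns true. Otherwise leaves
   * target untouched and returns false.
   */
  static boolean linkIfAssumedSame(Path existing, Path target) throws IOException {
    boolean assumedSame =
        Files.size(existing) == Files.size(target)
        && Files.getLastModifiedTime(existing).toMillis()
            == Files.getLastModifiedTime(target).toMillis();
    if (!assumedSame) {
      return false;
    }
    // Files.createLink(link, existing) requires that the link does not exist yet
    Files.delete(target);
    Files.createLink(target, existing);
    return true;
  }
}
```

After a successful call, both paths refer to the same data on disk, so the file's content is stored only once across the two backup directories.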

When the data index is enabled, each file is handled one of three ways:

  1. If the file is empty, it is stored directly in place, without using the data index. Also, empty files are never hard linked because no space is saved by doing so.
  2. If the filename is less than MAX_NODIR_FILENAME in length:
    1. The file is represented by an empty surrogate in its original location
    2. A series of hard linked data chunks with the original filename as prefix
  3. If the filename is too long:
    1. A directory is put in place of the filename
    2. An empty surrogate named "<A<O<SURROGATE>O>A>" is created
    3. A series of hard linked data chunks
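The three-way decision above can be summarized in a short sketch. The enum, class, and method names are hypothetical, and because the value of MAX_NODIR_FILENAME is not stated here, it is passed in as a parameter rather than guessed.

```java
// Hypothetical helper for illustration; not part of FailoverFileReplicationManager.
final class DataIndexLayoutSketch {

  private DataIndexLayoutSketch() {}

  /** The three storage layouts described for data-index mode. */
  enum Layout {
    /** Empty file stored directly in place, never hard linked. */
    DIRECT,
    /** Empty surrogate in place, plus chunks prefixed with the filename. */
    SURROGATE_AND_PREFIXED_CHUNKS,
    /** Directory in place of the file, holding a surrogate plus chunks. */
    DIRECTORY_WITH_SURROGATE_AND_CHUNKS
  }

  /**
   * Chooses a layout from the file length and filename length.
   * maxNodirFilename stands in for MAX_NODIR_FILENAME, whose real value
   * is not given in this document.
   */
  static Layout chooseLayout(long fileLength, String filename, int maxNodirFilename) {
    if (fileLength == 0) {
      return Layout.DIRECT;
    }
    if (filename.length() < maxNodirFilename) {
      return Layout.SURROGATE_AND_PREFIXED_CHUNKS;
    }
    return Layout.DIRECTORY_WITH_SURROGATE_AND_CHUNKS;
  }
}
```

Note that the empty-file check comes first, so an empty file with a very long name is still stored directly in place.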

A surrogate file contains all the ownership and mode information and, in the future, will represent the hard link relationships in the original source tree.

During an expansion process, the surrogate might not be empty as data is put back in place. The restore processes resume where they left off, even when interrupted.

Data indexes are verified once per day, and a quick verification is also performed on start-up.

depending on the length of the filename.

16 TiB = 2 ^ (10 + 10 + 10 + 10 + 4) = 2 ^ 44
Each chunk is up to 1 MiB: 2 ^ 20
Maximum number of chunks per file: 2 ^ (44 - 20) = 2 ^ 24

TODO: filename<A<O<S>O>A>...
TODO: Can't have any regular filename from client with <A<O<CHUNK>O>A> pattern.
TODO: Can't have any regular file exactly named "<A<O<SURROGATE>O>A>"
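The chunk arithmetic above can be checked with a few lines of Java. The class and constant names are hypothetical and chosen only for this illustration; the values are the 16 TiB and 1 MiB figures given above.

```java
// Hypothetical helper for illustration; not part of FailoverFileReplicationManager.
final class ChunkMathSketch {

  private ChunkMathSketch() {}

  /** 16 TiB = 2^44 bytes. */
  static final long MAX_FILE_SIZE = 1L << 44;

  /** Each chunk is up to 1 MiB = 2^20 bytes. */
  static final long CHUNK_SIZE = 1L << 20;

  /** Maximum number of chunks per file: 2^(44 - 20) = 2^24. */
  static final long MAX_CHUNKS = MAX_FILE_SIZE / CHUNK_SIZE;

  /** Number of chunks needed for a file of the given length (last chunk may be partial). */
  static long chunkCount(long fileLength) {
    if (fileLength < 0 || fileLength > MAX_FILE_SIZE) {
      throw new IllegalArgumentException("fileLength out of range: " + fileLength);
    }
    return (fileLength + CHUNK_SIZE - 1) / CHUNK_SIZE;
  }
}
```

For example, a file of 1 MiB plus one byte needs two chunks, and a file of exactly 16 TiB needs 2 ^ 24 = 16,777,216 chunks.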

TODO: Handle hard links (pertinence space savings), and also meet expectations. Our ParallelPack/ParallelUnpack are a good reference.

TODO: Need to do mysqldump and postgresql dump on preBackup

TODO: Use LVM snapshots within the client layer

TODO: Support chunking from either data set: current file or in linkToRoot, also possibly try to guess the last temp file? This would allow to not have to resend all data when a chunked transfer is interrupted. This would have the cost of additional reads and MD5 CPU, so may not be worth it in the general case; any way to detect when it is worth it, such as a certain number of chunks transferred?

TODO: Support sparse files. In simplest form, use RandomAccessFile to write new files, and detect sequences of zeros, possibly only when 4k aligned, and use seek instead of writing the zeros. Could also build the zero detection into the protocol, which would put more of the work on the client and remove the need for MD5 and compression of the zeros, at least in the case of full 1 MiB chunks of zeros.
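The simplest form of the sparse-file idea in this TODO might look like the sketch below. It is not existing functionality: the class and method names are hypothetical, blocks are aligned relative to the start of the data (so the file pointer is assumed to start on a 4 KiB boundary), and whether skipped regions actually become holes, and read back as zeros, depends on the operating system and filesystem.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper for illustration; not part of FailoverFileReplicationManager.
final class SparseWriteSketch {

  private SparseWriteSketch() {}

  /** Block size used for zero detection (4 KiB). */
  static final int BLOCK = 4096;

  /** Returns true when every byte in buf[off, off + len) is zero. */
  static boolean isAllZero(byte[] buf, int off, int len) {
    for (int i = off; i < off + len; i++) {
      if (buf[i] != 0) {
        return false;
      }
    }
    return true;
  }

  /**
   * Writes data at the current file pointer, seeking over full all-zero
   * 4 KiB blocks instead of writing them.
   *
   * @return the number of bytes actually written
   */
  static long writeSparse(RandomAccessFile raf, byte[] data) throws IOException {
    long written = 0;
    for (int off = 0; off < data.length; off += BLOCK) {
      int len = Math.min(BLOCK, data.length - off);
      if (len == BLOCK && isAllZero(data, off, len)) {
        // Skip the zeros; the gap is typically left as a hole by the filesystem
        raf.seek(raf.getFilePointer() + len);
      } else {
        raf.write(data, off, len);
        written += len;
      }
    }
    // Seeking past the end does not extend the file, so fix up the length
    // when the data ended with skipped blocks
    if (raf.length() < raf.getFilePointer()) {
      raf.setLength(raf.getFilePointer());
    }
    return written;
  }
}
```

The return value makes it easy to see how many bytes were skipped: for 4 KiB of data, 4 KiB of zeros, and 100 trailing bytes, only 4196 bytes are written.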

AO Industries, Inc.
  • Method Details

    • checkPath

      public static void checkPath(String path) throws IOException
      Checks a path for sanity.
      1. Must not be empty
      2. Must start with '/'
      3. Must not contain null character
      4. Must not contain empty path element "//"
      5. Must not end with '/', unless it is the root "/" itself
      6. Must not contain "/../"
      7. Must not end with "/.."
      8. Must not contain "/./"
      9. Must not end with "/."
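The nine rules listed for checkPath can be expressed as a short sequence of checks. The sketch below is a reconstruction from the rule list only, not the actual source of checkPath; the class name, the isValid convenience method, and the exception messages are hypothetical.

```java
import java.io.IOException;

// Hypothetical reconstruction for illustration; not the real checkPath source.
final class PathCheckSketch {

  private PathCheckSketch() {}

  /** Applies the nine listed sanity rules in order, throwing on the first violation. */
  static void checkPath(String path) throws IOException {
    if (path.isEmpty()) {
      throw new IOException("Path is empty");
    }
    if (path.charAt(0) != '/') {
      throw new IOException("Path does not start with '/'");
    }
    if (path.indexOf('\0') != -1) {
      throw new IOException("Path contains null character");
    }
    if (path.contains("//")) {
      throw new IOException("Path contains empty path element \"//\"");
    }
    if (path.length() > 1 && path.endsWith("/")) {
      throw new IOException("Path ends with '/'");
    }
    if (path.contains("/../")) {
      throw new IOException("Path contains \"/../\"");
    }
    if (path.endsWith("/..")) {
      throw new IOException("Path ends with \"/..\"");
    }
    if (path.contains("/./")) {
      throw new IOException("Path contains \"/./\"");
    }
    if (path.endsWith("/.")) {
      throw new IOException("Path ends with \"/.\"");
    }
  }

  /** Convenience wrapper returning a boolean instead of throwing. */
  static boolean isValid(String path) {
    try {
      checkPath(path);
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

Under these rules, "/" and "/a/b" pass, while "/a/", "/a//b", and "/a/../b" are rejected. Names that merely begin with dots, such as "/a/..b", are still accepted because only whole "." and ".." path elements are excluded.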
    • checkSymlinkTarget

      public static void checkSymlinkTarget(String target) throws IOException
      Checks a symlink target for sanity.
      1. Must not be empty
      2. Must not contain null character
    • getActivity

      public static FailoverFileReplicationManager.Activity getActivity(Integer failoverFileReplicationPkey)
    • failoverServer

      public static void failoverServer(Socket socket, StreamableInput rawIn, StreamableOutput out, AoservDaemonProtocol.Version protocolVersion, int failoverFileReplicationPkey, String fromServer, boolean useCompression, short retention, String backupPartition, short fromServerYear, short fromServerMonth, short fromServerDay, List<Server.Name> replicatedMysqlServers, List<String> replicatedMysqlMinorVersions, int quotaGid) throws IOException, SQLException
      Receives incoming data for a failover replication. The critical information, such as the directory to store to, has been provided by the master server because we can't trust the sending server.
      backupPartition - the full path to the root of the backup partition, without any hostnames, packages, or names
      quotaGid - the quota_gid or -1 for no quotas
    • start

      public static void start() throws IOException, SQLException