Multi-data center support #27

@misterbisson

Data center awareness

WordPress + HyperDB supports running in multiple data centers. The HyperDB config includes comments on how to configure it for data center awareness:

/**
 * Network topology / Datacenter awareness
 *
 * When your databases are located in separate physical locations there is
 * typically an advantage to connecting to a nearby server instead of a more
 * distant one. The read and write parameters can be used to place servers into
 * logical groups of more or less preferred connections. Lower numbers indicate
 * greater preference.
 *
 * This configuration instructs HyperDB to try reading from one of the local
 * slaves at random. If that slave is unreachable or refuses the connection,
 * the other slave will be tried, followed by the master, and finally the
 * remote slaves in random order.
 * Local slave 1:   'write' => 0, 'read' => 1,
 * Local slave 2:   'write' => 0, 'read' => 1,
 * Local master:    'write' => 1, 'read' => 2,
 * Remote slave 1:  'write' => 0, 'read' => 3,
 * Remote slave 2:  'write' => 0, 'read' => 3,
 *
 * In the other datacenter, the master would be remote. We would take that into
 * account while deciding where to send reads. Writes would always be sent to
 * the master, regardless of proximity.
 * Local slave 1:   'write' => 0, 'read' => 1,
 * Local slave 2:   'write' => 0, 'read' => 1,
 * Remote slave 1:  'write' => 0, 'read' => 2,
 * Remote slave 2:  'write' => 0, 'read' => 2,
 * Remote master:   'write' => 1, 'read' => 3,
 *
 * There are many ways to achieve different configurations in different
 * locations. You can deploy different config files. You can write code to
 * discover the web server's location, such as by inspecting $_SERVER or
 * php_uname(), and compute the read/write parameters accordingly. An example
 * appears later in this file using the legacy function add_db_server().
 */
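The comment's suggestion to compute the read/write parameters from the web server's location can be sketched roughly as follows. This is a language-neutral illustration in Python rather than HyperDB's PHP config. The `Server` type, the data center names, and both function names are hypothetical, and the attempt-order logic is only a reading of the comment above, not HyperDB's actual implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical description of one MySQL server."""
    name: str
    datacenter: str
    is_primary: bool = False


def read_write_params(servers, local_dc):
    """Compute 'read'/'write' values for each server as seen from local_dc.

    Lower read numbers are preferred. Only the primary accepts writes.
    """
    primary_is_local = any(
        s.is_primary and s.datacenter == local_dc for s in servers
    )
    params = {}
    for s in servers:
        is_local = s.datacenter == local_dc
        if s.is_primary:
            # A local primary ranks after local replicas; a remote primary
            # is the last resort for reads.
            read = 2 if is_local else 3
        elif is_local:
            read = 1
        else:
            # Remote replicas rank after a local primary, but ahead of a
            # remote one.
            read = 3 if primary_is_local else 2
        params[s.name] = {"write": 1 if s.is_primary else 0, "read": read}
    return params


def read_attempt_order(params, rng=random):
    """Order servers by read preference, shuffling within each group."""
    groups = {}
    for name, p in params.items():
        groups.setdefault(p["read"], []).append(name)
    order = []
    for rank in sorted(groups):
        members = list(groups[rank])
        rng.shuffle(members)
        order.extend(members)
    return order
```

Calling `read_write_params(servers, local_dc)` once per data center reproduces the two tables in the comment, so a single config file could serve both locations if the local data center name is discovered at runtime.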

MySQL is not the only service that needs data center awareness, though:

  • Nginx connects to WordPress
    • Nginx should probably not bother connecting to WordPress instances in other data centers, but if there are no local WP instances...
  • WordPress connects to Memcached, MySQL, and NFS
    • Making requests to Memcached across the WAN is probably slower than requesting from a local MySQL replica, but creating separate Memcached pools in each data center creates consistency problems (Facebook's mcrouter support for replicated pools claims to solve that, but I've never used it personally)
    • Awareness of MySQL primary and replica topology is critical, but it's OK if the primary is in a remote data center and the replica is local (WP+HyperDB will read its writes, so replication delay is not a problem). It should probably prefer local replicas, but if there are none locally...
    • NFS over the WAN would be very slow; even if it were tolerable, it's not supported in RFD26 and probably not wise
  • Memcached does not connect to anything else
  • MySQL replicas connect to the MySQL primary
  • NFS does not connect to anything else
    • It could be backed up to an object store or replicated across multiple volumes (see https://syncthing.net for an example), but those introduce consistency questions if both sides are writing

Given the current implementation, it might be necessary to ignore performance issues with Memcached and NFS transactions over the WAN. However, a better implementation would:

  1. Resolve cross-data center Memcached questions. This could involve implementing Facebook's mcrouter and replicated pools or ditching Memcached for Couchbase, which provides a Memcached-compatible interface with cross-data center replication
  2. Resolve cross-data center NFS questions. Object storage could be used as an exclusive alternative to filesystem storage, eliminating the need for NFS. It's possible that https://syncthing.net could provide sufficiently fast replication and sufficiently good conflict resolution. It's also possible that Nginx could be configured to force all HTTP POST requests to WP instances in a primary data center, to substantially reduce the risk of conflicts due to slow replication across the WAN. That would require that Nginx instances in the non-primary data center be able to connect to WP instances in the primary DC.
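The idea of forcing POST requests to the primary data center amounts to a routing decision per request. A minimal sketch of that decision, assuming hypothetical data center names and treating only POST as a write (as in the text above), might look like this. It illustrates the logic only and is not an Nginx configuration.

```python
# Methods routed to the primary data center. The text above mentions only
# POST; adding other writing methods (PUT, DELETE) would be an assumption.
WRITE_METHODS = {"POST"}


def target_datacenter(method, local_dc, primary_dc):
    """Pick the data center whose WP instances should handle the request."""
    if method.upper() in WRITE_METHODS:
        return primary_dc
    return local_dc
```

For example, `target_datacenter("POST", "dc-b", "dc-a")` selects `dc-a`, while a GET arriving in `dc-b` stays in `dc-b`.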

Requirements for full active-active data center support

Story: The application will be deployed in data centers in two different regions connected by a WAN. Browsers may reach either data center with approximately equal frequency. Operators will specify one data center for the primary database instance, and the application will route requests internally to the correct primary instance in the correct DC.

  • A VPN that connects the private networks of the two data centers
  • Routes on each host so it can connect to the other data center
  • Data center awareness in how to reach upstreams (see discussion above)

Questions to answer:

  1. What happens if the replica DC is partitioned from the end user client?
  2. What happens if the replica DC is partitioned from the primary DC?
  3. What happens if the primary DC is partitioned from the end user client?
  4. What happens if the primary DC is partitioned from the replica DC?

Requirements for a standby data center

Story: We need a minimal footprint of the application running in a remote data center so that we can quickly recover if the primary data center fails. The replica data center does not handle any end-user requests under normal use, and there is no provision for automatic failover. This approach seeks to reduce challenges by eliminating activity in the replica data center that would cause frustration due to slow performance of requests over the WAN or inconsistency due to writes in separate DCs (Memcached and NFS).

  • A VPN that connects the private networks of the two data centers
  • Routes on each host so it can connect to the other data center

incomplete issue
