Wednesday, September 02, 2015

How MySQL-Sandbox is tested, and tests MySQL in the process

MySQL-Sandbox is a great tool for testing a new release, and in fact this is what I do when a new MySQL tarball becomes available. I don't think many people are aware of the full testing capabilities of the sandbox, though.

When you think about testing, you may just think of creating a sandbox with the new tarball, and then hammering it with your pet procedure. That works, of course, as the main purpose of MySQL-Sandbox is to allow you to do just that. There is, however, a full test suite that can tell you in a short while if your tarball is compatible with the past or not.

This procedure is quite strict. It has happened several times that I caught a bug in a new release of MySQL, or Percona Server, or MariaDB, just by running this suite.

How MySQL-Sandbox gets tested

Before describing how to test, I would like to show what I do. When a new version of MySQL-Sandbox is ready (and I happen to have time, because, hey! I also have a day job!) I subject it to the full test suite, which is about 500 different tests (the number may vary depending on the operating system and the MySQL version being tested). Then I repeat the full test for every version that I have stored in my test machines. That is, from version 5.0 to 5.7, passing through Percona Server and MariaDB forks, I test about a dozen versions:

5.0.96
5.1.73
5.5.44
5.6.25
5.7.5
5.7.7
5.7.8
5.7.9
mariadb 10.0.20
mariadb 10.1.6
mariadb 5.5.40
percona server 5.5.43
percona server 5.6.25

The above versions change when new releases are available. I repeat the same tests on two different machines, covering OS X and Linux, for a grand total of ≈ 12,000 tests before I upload the tarball to CPAN.

What does the MySQL-Sandbox test suite do

There are 20 files in the ./t directory, each one starting a set of thematic tests:

  • t/01_modules.t tests the MySQL Sandbox module
  • t/02_test_binaries.t tests the completeness of MySQL-Sandbox deployment
  • t/03_test_sandbox.t is the longest and most comprehensive test in the suite. It installs every type of sandbox, and runs basic functional tests for each. It also includes an expensive test for sandbox installation with pattern recognition. In total, it runs for more than 10 minutes. This is where new versions may fail if they are not fully compatible with previous ones.
  • t/04_test_sbtool.t is a test for the companion tool, which can do many operations, including installing and testing hierarchical replication.
  • t/05_test_smoke.t tests a sandbox with the basic functionalities that were considered important to test manually when I was working at MySQL. Tired of testing it manually, I scripted the procedure. It may not be so vital now, but it does not hurt to run it.
  • t/06_test_user_defined.t is a demo test for the simplest user-defined tests implemented with the sandbox
  • t/07_test_user_defined.t does the same as the above, but testing replication instead of a single server
  • t/08_test_single_port_checking.t tests that we can install a sandbox multiple times, with automatic port choice
  • t/09_test_multiple_port_checking.t tests group sandboxes for the above feature
  • t/10_check_start_restart.t tests that we can restart sandboxed servers with options set on-the-fly
  • t/11_replication_parameters.t tests the installation of replication groups with user parameters that affect one or more nodes.
  • t/12_custom_user_pwd.t tests the ability to define your own passwords instead of using the defaults
  • t/13_innodb_plugin_install.t tests the installation of the InnoDB plugin (it runs only for version 5.1)
  • t/14_semi_synch_plugin_install.t tests the installation and functioning of the semi-synchronous plugin (requires version 5.5+)
  • t/15_user_privileges.t tests that the sandbox users have the privileges that were expected
  • t/16_replication_options.t similar to test #11, but testing a different set of options
  • t/17_replication_flow.t tests that regular and circular replication groups can transfer data as expected.
  • t/18_force_creation.t tests that we can create a sandbox by overwriting an existing one
  • t/19_replication_gtid.t tests replication with GTID
  • t/20_add_option.t tests the ability to restart a server with a permanent new option.

How to test a new tarball with MySQL-Sandbox

You do not need to run the full test against every version. That is done to make sure that MySQL-Sandbox has no regressions, and works as expected with all the versions. But if your purpose is to make sure that your new tarball is ready to be used, a single pass of the test suite will do. Here are the steps:

  1. Get the tarball. For this demonstration, we will use the latest Percona Server: Percona-Server-5.6.25-rel73.1-Linux.x86_64.ssl100.tar.gz
  2. We will need to download the MySQL-Sandbox code. It is not enough to have it installed, as we will need visibility and access to the test files.
    $ git clone https://github.com/datacharmer/mysql-sandbox.git
    $ cd mysql-sandbox
    
  3. We need the tarball to be extracted in the default directory ($HOME/opt/mysql):
    $ make_sandbox --export_binaries --add_prefix=Perc /path/to/Percona-Server-5.6.25-rel73.1-Linux.x86_64.ssl100.tar.gz
    
    This step moves the tarball under $HOME/opt/mysql, creates a directory named Perc5.6.25, and installs a sandbox from that new directory.
  4. The previous step is only needed for two things: creating the directory in the right place, and making sure the tarball can be installed. Sometimes this step fails because of some surprising incompatibility. At this point, we can remove the new sandbox:
    $ sbtool -o delete -s ~/sandboxes/msb_Perc5_6_25
    
  5. Now we are ready, and we can start the test:
    $ perl Makefile.PL
    $ make
    $ export TEST_VERSION=Perc5.6.25
    $ make test
    

This will run for quite a while. Depending on the power of your server, it can take from 20 to 40 minutes.

$ make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/01_modules.t ...................... ok
t/02_test_binaries.t ................ ok
t/03_test_sandbox.t ................. Testing <5.6.26>. Please wait. This will take a few minutes
t/03_test_sandbox.t ................. ok
t/04_test_sbtool.t .................. ok
t/05_test_smoke.t ................... ok
t/06_test_user_defined.t ............ ok
t/07_test_user_defined.t ............ ok
t/08_test_single_port_checking.t .... ok
t/09_test_multiple_port_checking.t .. ok
t/10_check_start_restart.t .......... ok
t/11_replication_parameters.t ....... ok
t/12_custom_user_pwd.t .............. ok
t/13_innodb_plugin_install.t ........ # Skipping version 5.6.26 for this test. It is not in the required range (5.1.6 - 5.1.99)
t/13_innodb_plugin_install.t ........ ok
t/14_semi_synch_plugin_install.t .... ok
t/15_user_privileges.t .............. ok
t/16_replication_options.t .......... ok
t/17_replication_flow.t ............. ok
t/18_force_creation.t ............... ok
t/19_replication_gtid.t ............. ok
t/20_add_option.t ................... ok
All tests successful.
Files=20, Tests=495, 1560 wallclock secs ( 0.22 usr  0.03 sys + 480.89 cusr 73.55 csys = 554.69 CPU)
Result: PASS

CAVEATS: When you run this test, you must be aware of three things:

  1. If you have sandboxes running in $HOME/sandboxes, they will be stopped when the test starts. This is a necessity to avoid clashes, as the test needs to start every possible combination.
  2. If your sandboxes were running in a non-standard location, you need to stop them manually, as they may make the test fail. The test will not care if the main server runs (on port 3306). It will not clash with it, and it will not attempt to shut it down.
  3. If you stop the test during the execution, you may end up with orphaned servers, which are installed under ./t/test_sb. If that happens, you will need to stop and then remove them manually.
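A minimal manual cleanup could look like this. It assumes the default layout described above, and that group sandboxes have a stop_all script instead of stop:

$ for s in t/test_sb/*/stop t/test_sb/*/stop_all; do [ -x "$s" ] && "$s"; done
$ rm -rf t/test_sb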

Running only a subset of the tests

If you want to test only something specific, you can do that by invoking one of the tests mentioned above. For example, if you want to test only basic replication, you can do this:

$ perl Makefile.PL
$ make
$ export TEST_VERSION=Perc5.6.25
$ prove -b -v t/17_replication_flow.t
t/17_replication_flow.t ..
1..24
installing and starting master
installing slave 1
installing slave 2
starting slave 1
. sandbox server started
starting slave 2
. sandbox server started
initializing slave 1
initializing slave 2
replication directory installed in $HOME/sandboxes/repl_deployment
ok - Replication directory created
# Master log: mysql-bin.000001 - Position: 3667 - Rows: 2
# Testing slave #1
ok - Slave #1 acknowledged reception of transactions from master
ok - Slave #1 IO thread is running
ok - Slave #1 SQL thread is running
ok - Table t1 found on slave #1
ok - Table t1 has 2 rows on #1
# Testing slave #2
ok - Slave #2 acknowledged reception of transactions from master
ok - Slave #2 IO thread is running
ok - Slave #2 SQL thread is running
ok - Table t1 found on slave #2
ok - Table t1 has 2 rows on #2
# TESTS :    10
# FAILED:     0 (  0.0%)
# PASSED:    10 (100.0%)
ok - Replication test was successful
# executing "stop" on $HOME/sandboxes/repl_deployment
executing "stop" on slave 1
executing "stop" on slave 2
executing "stop" on master
# executing "clear" on $HOME/sandboxes/repl_deployment
executing "clear" on slave 1
executing "clear" on slave 2
executing "clear" on master
sandbox at <$HOME/sandboxes/repl_deployment> has been removed
ok - Regular replication directory repl_deployment removed
installing node 1
installing node 2
installing node 3
# server: 1:
# server: 2:
# server: 3:
# server: 1:
# server: 2:
# server: 3:
# server: 1:
# server: 2:
# server: 3:
Circular replication activated
group directory installed in $HOME/sandboxes/repl_deployment
ok - circular replication installed
# Master log: mysql-bin.000001 - Position: 1126 - Rows: 2
# Testing slave #1
ok - Slave #1 IO thread is running
ok - Slave #1 SQL thread is running
ok - Table t1 found on slave #1
ok - Table t1 has 2 rows on #1
# Testing slave #2
ok - Slave #2 IO thread is running
ok - Slave #2 SQL thread is running
ok - Table t1 found on slave #2
ok - Table t1 has 2 rows on #2
# TESTS :     8
# FAILED:     0 (  0.0%)
# PASSED:     8 (100.0%)
ok - Replication test was successful
# executing "stop" on $HOME/sandboxes/repl_deployment
# server: 1:
# server: 2:
# server: 3:
executing "stop" on node 1
executing "stop" on node 2
executing "stop" on node 3
# executing "clear" on $HOME/sandboxes/repl_deployment
# server: 1:
# server: 2:
# server: 3:
executing "clear" on node 1
executing "clear" on node 2
executing "clear" on node 3
sandbox at <$HOME/sandboxes/repl_deployment> has been removed
ok - Circular replication directory repl_deployment removed
ok
All tests successful.
Files=1, Tests=24, 69 wallclock secs ( 0.05 usr  0.01 sys +  3.24 cusr  2.57 csys =  5.87 CPU)
Result: PASS

You can repeat the procedure for every file ./t/*.t

Writing your own tests

As a parting thought, let me mention again that you can create your own user-defined tests using the sandbox simple hooks.

Here is a sample user-defined test that you can run using test_sandbox.

There are two kinds of tests: 'shell' and 'sql'. The test type is defined by a keyword followed by a colon.

The 'shell' test requires:

  • a 'command', which is passed to a shell.
  • The 'expected' label is a string that you expect to find within the shell output. If you don't expect anything, you can just say "expected = OK", meaning that you will be satisfied with a ZERO exit code reported by the operating system.
  • The 'msg' is the description of the test that is shown to you when the test runs.


shell:
command  = make_sandbox $TEST_VERSION -- --no_confirm --sandbox_directory=msb_XXXX
expected = sandbox server started
msg      = sandbox creation

The 'sql' test requires:

  • a 'path', which is the place where the test engine expects to find a 'use' script.
  • The 'query' is passed to the above mentioned script and the output is captured for further processing.
  • The 'expected' parameter is a string that you want to find in the query output.
  • The 'msg' parameter is like the one used with the 'shell' test.

sql:
path    = $SANDBOX_HOME/msb_XXXX
query   = select 10 * 10
expected = 100
msg      = checking database response

All strings starting with a $ are expanded to their corresponding environment variables. For example, if $SANDBOX_HOME is /home/sb/tests, the line

command = $SANDBOX_HOME/msb_5_1_30/stop

will expand to

command = /home/sb/tests/msb_5_1_30/stop

It is a good idea to finish every test with a cleanup. Here, we simply stop the server:


shell:
command  = $SANDBOX_HOME/msb_XXXX/stop
expected = OK
msg      = stopped

To run this example, you have two options:

  • Run it directly with test_sandbox:
    $ test_sandbox --versions=Perc5.6.25 --tests=user --user_test=name_of_your_file.sb
    
  • Or create a harness like the ones in the test suite. See, for example, t/06_test_user_defined.t, which invokes the test proper, check_single_server.sb.
    $ export TEST_VERSION=Perc5.6.25
    $ cat t/06_test_user_defined.t
    use lib './t';
    use Test_Helper;
    test_sandbox( 'test_sandbox --user_test=./t/check_single_server.sb', 3);
    
    In this code, the '3' after the test name is the number of tests expected to run.
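Once the harness is in place, you can run it on its own with prove, just as shown earlier for the replication flow test:

$ export TEST_VERSION=Perc5.6.25
$ prove -b t/06_test_user_defined.t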

If this paradigm is too simple (and I know that sometimes it is), you can write your own plugins in Perl, using the ones in the suite as examples, e.g.:

$ cat t/08_test_single_port_checking.t
use lib './t';
use Test_Helper;
test_sandbox( 'test_sandbox --tests=user --user_test=./t/single_port_checking.sb.pl', 6);

The Perl plugins require, of course, some knowledge of Perl, but they allow greater flexibility in creating your own checks.

Monday, August 31, 2015

MySQL replication in action - Part 5 - parallel appliers

Previous episodes:

Parallel replication overview

One of the main grievances of replication users is that, while a well tuned master server can handle thousands of concurrent operations, an equally tuned slave is constrained to work on a single thread. In Figure 1, we see the schematics of this paradigm. Multiple operations on the master are executed simultaneously and saved to the binary log. The slave IO thread copies the binary log events to a local relay log, and on that log the SQL thread executes the events against the slave database. When the master is very active, chances are that the slave lags behind, causing hatred and nightmares to the DBAs.

Figure 1 - Single applier

Until 2011, there was little remedy to this problem. Then we saw the first implementation of parallel apply, provided by Tungsten Replicator. The parallelism is schema-based, and provides a 5x improvement in slave performance.

Shortly after that, the same schema-based parallel applier algorithm was implemented, with the same performance gain, in MySQL 5.6 (Figure 2). This implementation is valuable for organizations that have data split by schema: companies that have adopted sharding and keep their logical chunks in different schemas, or companies that run multi-tenant services, where splitting data by schema is a necessity. This solution does not meet the needs of all users, though. Many systems are based on a single, very active schema, perhaps with a few ancillary ones carrying minimal traffic. For those users, parallel replication by schema is useless.

Figure 2 - Parallel applier

In more recent times, we have seen three more implementations:

  1. In MariaDB 10, parallel appliers are based on information generated by the master when using group commit;
  2. In MySQL 5.7, a new parallelism mode was introduced, based on the events' logical clock: events that carry the same timestamp are safe to execute in parallel;
  3. In MariaDB 10.1, there is an optimistic mode that breaks the commit order guaranteed by the group-commit algorithm, and allows higher performance.

We are not going to measure the performance of all these methods. Having benchmarked a few of them, I know by experience that you can get enormous performance gains, but, depending on the data traffic, you can also get smaller and smaller advantages, and in some cases you may even slow down the slave further. The performance depends on the data distribution, the pattern of the data, the type and amount of indexes, and some unknown factors that sometimes baffle both developers and DBAs. Thus, we will focus our examination on the manageability of parallel appliers. What is important to note here is that all implementations introduce, in addition to a larger number of appliers, much more complexity in operations: we need to know, and eventually monitor, much more than we do in single-threaded replication. We'll see in a moment how the new features rise to this challenge.

Setting up Parallel replication

Compared to what we had to do for multi-source, the setup is not hard: there is one variable that we need to change. Its name differs between MySQL 5.6/5.7 and MariaDB 10. As with multi-source, we can set up the functionality using one of the mysql-replication-samples scripts on GitHub.

Setup in MySQL 5.7

When we run the script install_parallel.sh, what happens is that the slave gets stopped, the parallel workers variable is increased, and then the slave restarts. A look at show processlist in the slave tells us that there are now 10 processes waiting for work.
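If you prefer to do it by hand, the equivalent steps are roughly these. This is a sketch run from the slave's sandbox directory (the path is only an example, and 10 is the value shown in the output below):

$ cd $HOME/sandboxes/rsandbox_5_7_8/node1
$ ./use -e "STOP SLAVE"
$ ./use -e "SET GLOBAL slave_parallel_workers=10"
$ ./use -e "START SLAVE"
$ ./use -e "SHOW VARIABLES LIKE 'slave_parallel_workers'"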

+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| slave_parallel_workers | 10    |
+------------------------+-------+
+----+-------------+-----------+------+---------+------+---------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time | State                                 | Info             |
+----+-------------+-----------+------+---------+------+---------------------------------------+------------------+
|  9 | system user |           | NULL | Connect |    0 | Waiting for master to send event      | NULL             |
| 10 | system user |           | NULL | Connect |    0 | Reading event from the relay log      | NULL             |
| 11 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 12 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 13 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 14 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 15 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 16 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 17 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 18 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 19 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 20 | system user |           | NULL | Connect |    0 | Waiting for an event from Coordinator | NULL             |
| 22 | msandbox    | localhost | NULL | Query   |    0 | starting                              | show processlist |
+----+-------------+-----------+------+---------+------+---------------------------------------+------------------+

Setup in MariaDB 10

Same story in MariaDB 10. Stop the slave, change the variable, restart the slave. The process list shows willing workers.
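The manual equivalent only differs in the variable name (again a sketch, run from the slave's sandbox directory):

$ ./use -e "STOP SLAVE"
$ ./use -e "SET GLOBAL slave_parallel_threads=10"
$ ./use -e "START SLAVE"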

+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| slave_parallel_threads | 10    |
+------------------------+-------+
+----+-------------+-----------+------+---------+------+------------------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                              | Info             | Progress |
+----+-------------+-----------+------+---------+------+------------------------------------+------------------+----------+
| 10 | system user |           | NULL | Connect |    0 | Waiting for master to send event   | NULL             |    0.000 |
| 11 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 12 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 13 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 14 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 15 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 16 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 17 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 18 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 19 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 20 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread   | NULL             |    0.000 |
| 21 | system user |           | NULL | Connect |    0 | Slave has read all relay log; wait | NULL             |    0.000 |
| 23 | msandbox    | localhost | NULL | Query   |    0 | init                               | show processlist |    0.000 |
+----+-------------+-----------+------+---------+------+------------------------------------+------------------+----------+

The above output has been shortened for easier display. The truncated state for thread 21 is the usual 'Slave has read all relay log' message.

Monitoring parallel replication

After the setup, the interesting operations can start. We need to know what we can get from the monitoring facilities.

To monitor, however, we need some action to look at. In the same mysql-replication-samples repository there are a few scripts that can generate load for multiple databases (multi_inserts.sh) or for many tables in the same database (multi_inserts_one_db.sh). We will use the first one in MySQL 5.7 to check the default algorithm (parallelism by schema), and the second one for MariaDB, again to test its default algorithm, which can run parallel appliers in the same schema. As I mentioned before, we are not comparing performance here: we just want to see what we can detect when parallel replication is running.

The scripts are nothing fancy. They generate lots of concurrent inserts in many tables. This is enough to keep all the parallel workers busy. You can use your favorite stress test instead of these simple scripts.
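If you prefer to roll your own, something along these lines is enough to keep several schemas busy. This is only a stand-in for multi_inserts.sh, with made-up schema and table names, to be run from the master's sandbox directory:

for db in db1 db2 db3 db4
do
    ./use -e "create schema if not exists $db"
    ./use -e "create table if not exists $db.t1 (i int not null auto_increment primary key)"
    # one background client per schema gives the slave concurrent streams to apply
    ( for n in $(seq 5000); do ./use -e "insert into $db.t1 values ()"; done ) &
done
wait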

Monitoring in MySQL 5.7

Let's start with MySQL 5.7, where we have two tables dedicated to parallel replication monitoring. The first one, in the mysql schema, can show the progress of work on binary logs and relay logs. There is no mention of GTIDs, and there is no mention of databases. For a system where parallelism can happen by schema, this looks like insufficient design. Another thing worth noticing is the field Checkpoint_group_bitmap, whose purpose I don't know, but which is very prominent when you select the table contents, because this column is filled with non-printable characters. For this reason, we will select just a few columns, to see the basic progress.

slave1 [localhost] {msandbox} (mysql) > desc slave_worker_info;
+----------------------------+---------------------+------+-----+---------+-------+
| Field                      | Type                | Null | Key | Default | Extra |
+----------------------------+---------------------+------+-----+---------+-------+
| Id                         | int(10) unsigned    | NO   | PRI | NULL    |       |
| Relay_log_name             | text                | NO   |     | NULL    |       |
| Relay_log_pos              | bigint(20) unsigned | NO   |     | NULL    |       |
| Master_log_name            | text                | NO   |     | NULL    |       |
| Master_log_pos             | bigint(20) unsigned | NO   |     | NULL    |       |
| Checkpoint_relay_log_name  | text                | NO   |     | NULL    |       |
| Checkpoint_relay_log_pos   | bigint(20) unsigned | NO   |     | NULL    |       |
| Checkpoint_master_log_name | text                | NO   |     | NULL    |       |
| Checkpoint_master_log_pos  | bigint(20) unsigned | NO   |     | NULL    |       |
| Checkpoint_seqno           | int(10) unsigned    | NO   |     | NULL    |       |
| Checkpoint_group_size      | int(10) unsigned    | NO   |     | NULL    |       |
| Checkpoint_group_bitmap    | blob                | NO   |     | NULL    |       |
| Channel_name               | char(64)            | NO   | PRI | NULL    |       |
+----------------------------+---------------------+------+-----+---------+-------+
13 rows in set (0.00 sec)
slave1 [localhost] {msandbox} (mysql) > select ID,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from slave_worker_info;
+----+----------------------+---------------+------------------+----------------+
| ID | Relay_log_name       | Relay_log_pos | Master_log_name  | Master_log_pos |
+----+----------------------+---------------+------------------+----------------+
|  1 | ./mysql-relay.000003 |      33029530 | mysql-bin.000002 |       33029317 |
|  2 | ./mysql-relay.000003 |      33066066 | mysql-bin.000002 |       33065853 |
|  3 | ./mysql-relay.000003 |      33111090 | mysql-bin.000002 |       33110877 |
|  4 | ./mysql-relay.000003 |      33107225 | mysql-bin.000002 |       33107012 |
|  5 | ./mysql-relay.000003 |      33059630 | mysql-bin.000002 |       33059417 |
|  6 | ./mysql-relay.000003 |      33056541 | mysql-bin.000002 |       33056328 |
|  7 | ./mysql-relay.000003 |      33086391 | mysql-bin.000002 |       33086178 |
|  8 | ./mysql-relay.000003 |      33018457 | mysql-bin.000002 |       33018244 |
|  9 | ./mysql-relay.000003 |      33052167 | mysql-bin.000002 |       33051954 |
| 10 | ./mysql-relay.000003 |      33044450 | mysql-bin.000002 |       33044237 |
+----+----------------------+---------------+------------------+----------------+
10 rows in set (0.00 sec)

We can see that workers are using different positions of the relay logs, which correspond to different positions of the binary logs. However, there are no timestamps, so it is hard to know which workers are working more.
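Lacking timestamps, a rough workaround is to compare each worker's position with the most advanced one. This is just a sketch, and it is only meaningful while all workers are in the same Master_log_name:

$ ./use -e "select Id, Master_log_name, Master_log_pos,
    (select max(Master_log_pos) from mysql.slave_worker_info) - Master_log_pos as bytes_behind_top
    from mysql.slave_worker_info order by bytes_behind_top"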

Next, we look at the table dedicated to parallel replication in performance_schema.


slave1 [localhost] {msandbox} (mysql) > use performance_schema
Database changed

slave1 [localhost] {msandbox} (performance_schema) > desc replication_applier_status_by_worker;
+-----------------------+---------------------+------+-----+-------------------+-----------------------------+
| Field                 | Type                | Null | Key | Default           | Extra                       |
+-----------------------+---------------------+------+-----+-------------------+-----------------------------+
| CHANNEL_NAME          | char(64)            | NO   |     | NULL              |                             |
| WORKER_ID             | bigint(20) unsigned | NO   |     | NULL              |                             |
| THREAD_ID             | bigint(20) unsigned | YES  |     | NULL              |                             |
| SERVICE_STATE         | enum('ON','OFF')    | NO   |     | NULL              |                             |
| LAST_SEEN_TRANSACTION | char(57)            | NO   |     | NULL              |                             |
| LAST_ERROR_NUMBER     | int(11)             | NO   |     | NULL              |                             |
| LAST_ERROR_MESSAGE    | varchar(1024)       | NO   |     | NULL              |                             |
| LAST_ERROR_TIMESTAMP  | timestamp           | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-----------------------+---------------------+------+-----+-------------------+-----------------------------+

slave1 [localhost] {msandbox} (performance_schema) > select WORKER_ID, THREAD_ID , SERVICE_STATE , LAST_SEEN_TRANSACTION , LAST_ERROR_MESSAGE  from replication_applier_status_by_worker;
+-----------+-----------+---------------+---------------------------------------------+--------------------+
| WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION                       | LAST_ERROR_MESSAGE |
+-----------+-----------+---------------+---------------------------------------------+--------------------+
|         1 |        40 | ON            | 00013253-1111-1111-1111-111111111111:133621 |                    |
|         2 |        41 | ON            | 00013253-1111-1111-1111-111111111111:133699 |                    |
|         3 |        42 | ON            | 00013253-1111-1111-1111-111111111111:133826 |                    |
|         4 |        43 | ON            | 00013253-1111-1111-1111-111111111111:133919 |                    |
|         5 |        44 | ON            | 00013253-1111-1111-1111-111111111111:133686 |                    |
|         6 |        45 | ON            | 00013253-1111-1111-1111-111111111111:133688 |                    |
|         7 |        46 | ON            | 00013253-1111-1111-1111-111111111111:133770 |                    |
|         8 |        47 | ON            | 00013253-1111-1111-1111-111111111111:133494 |                    |
|         9 |        48 | ON            | 00013253-1111-1111-1111-111111111111:133731 |                    |
|        10 |        49 | ON            | 00013253-1111-1111-1111-111111111111:133763 |                    |
+-----------+-----------+---------------+---------------------------------------------+--------------------+
10 rows in set (0.00 sec)

There is a timestamp in this table, but only for errors; regular operations don't get one. Here, too, we are missing information about the database and about the time of extraction and apply, and we don't know what exactly the worker is doing: the column LAST_SEEN_TRANSACTION refers to received GTIDs, not executed ones.

Finally, we have a look at the output of SHOW SLAVE STATUS. In the past episodes, this is where we had the most comprehensive set of information. But with parallel replication we are in for a surprise.


slave1 [localhost] {msandbox} (performance_schema) > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 13253
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 38675840
               Relay_Log_File: mysql-relay.000003
                Relay_Log_Pos: 35124216
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 35124003
              Relay_Log_Space: 38676469
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 30
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 00013253-1111-1111-1111-111111111111
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Reading event from the relay log
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00013253-1111-1111-1111-111111111111:1-150420
            Executed_Gtid_Set: 00013253-1111-1111-1111-111111111111:1-136616:136618-136624:136626-136632:
136634-136635:136637-136640:136642-136644:136646-136648:136650-136670:136672-136681:136683-136687:
136689-136703:136706-136716:136718-136720:136722-136723:136726-136727:136730:136734-136735:
136737:136739-136741:136743-136744:136746-136747:136749:136752-136753:136755:136757:136762-136763:
136765-136766:136768-136771:136773-136774:136777:136783:136785:136787-136792:136794-136795:136797:
136801:136806:136809:136812-136814:136817:136821:136824:136826:136833:136835-136837:136841-136843:
136846:136852:136862-136864:136871:136874:136878:136881:136884:136897:136901:136908:136913:136915:
136926:136928:136940:136948:136951:136968:136986:136989:136992:136995
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
1 row in set (0.00 sec)

The output of Executed_Gtid_Set is actually one line; I split it to make it fit in the page. But even after it is split, I have trouble understanding what it is showing, and how this monster piece of information can be useful. Want to see something more horrible than this? Try parallel replication with multi-source.
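One way to get something readable out of these sets is GTID_SUBTRACT, which can at least tell us what has been received but not yet executed. A sketch, assuming a single replication channel:

$ ./use -e "select gtid_subtract(
    (select received_transaction_set from performance_schema.replication_connection_status),
    @@global.gtid_executed) as still_to_apply\G"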

Monitoring with MariaDB 10

In MariaDB there is only one table where we can follow progress. Unlike the one in MySQL 5.6/5.7, it does not have a fixed number of rows. This is the same table where operations with GTID are stored in single-threaded replication. The rows are added or pruned according to the traffic in the various channels.


slave1 [localhost] {msandbox} (mysql) > select * from gtid_slave_pos;
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|         0 |  28569 |         1 |  28569 |
|         0 |  28570 |         1 |  28570 |
+-----------+--------+-----------+--------+
2 rows in set (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > select * from gtid_slave_pos;
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|         0 |  42786 |         1 |  42786 |
|         0 |  42787 |         1 |  42787 |
|         0 |  42788 |         1 |  42788 |
|         0 |  42789 |         1 |  42789 |
+-----------+--------+-----------+--------+
4 rows in set (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > select * from gtid_slave_pos;
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|         0 |  46807 |         1 |  46807 |
|         0 |  46808 |         1 |  46808 |
+-----------+--------+-----------+--------+
2 rows in set (0.00 sec)

The information here is slim. As we noticed in the previous articles, we get little or no visibility into the internal operations. Here we have ten appliers working concurrently, but there is little evidence that this is happening. Much like 'show processlist', the GTID table shows only a glimpse of a brief moment. Sometimes we find 10 rows showing activity, sometimes just two. It does not mean that the server is idle; it's just that we don't catch its activity, the same as when we try to monitor using the process list: with a busy server, catching a view of many concurrent queries is a matter of luck.
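Besides the table, MariaDB exposes the same kind of position through system variables, which at least give a stable answer about where the SQL threads have arrived. A quick check from the slave's sandbox directory:

$ ./use -e "select @@gtid_slave_pos, @@gtid_current_pos, @@gtid_binlog_pos\G"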

Let's have a look at SHOW SLAVE STATUS. The good news is that we don't have the horrible crowd seen in MySQL 5.7. The bad news is that we have just as much information as we would get with a single thread. And let's not forget that the only info about GTID in the slave status is related to the IO thread. For the SQL threads (in this case, we have 10) we need to look at the table above or at the GTID variables.


slave1 [localhost] {msandbox} ((none)) > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 25030
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 958300
               Relay_Log_File: mysql-relay.000002
                Relay_Log_Pos: 956223
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 958300
              Relay_Log_Space: 956516
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 0-1-5826
1 row in set (0.00 sec)

Small improvements

About two years ago I wrote a post comparing parallel replication in MySQL 5.6 and Tungsten Replicator. Most of what was said there is still valid, but I want to acknowledge a small improvement. When an error happened in parallel replication, the message shown in my article said:

Last_SQL_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. 
Default database: 'test'. Query: 'insert into t1 values (1)'

In this context, when working with 10 threads, finding where the problem happens may be tough. But now the same situation results in this more helpful error message:

Last_SQL_Error: Worker 8 failed executing transaction '00021891-1111-1111-1111-111111111111:90860' 
at master log mysql-bin.000002, end_log_pos 23636368; Error 'Duplicate entry '340' for key 'PRIMARY'' on query. 
Default database: 'db1'. Query: 'insert into t7 values (340,  null)'

Here we get the thread ID (Worker 8), the GTID (yay!) and, next to it, the binary log name and position that were being processed. This proves that the worker thread knows everything that is useful. Now the next step will be adding the same information to the appropriate monitoring table.

Summing up

Of all the new features, parallel replication is probably the most needed, and the one that falls shortest in terms of usability. MySQL 5.7 offers what looks at first sight like a large amount of monitoring data, but in the end it is not enough to run operations confidently.

In MariaDB, the lack of monitoring data is so acute that I would be very reluctant to use or recommend it for production.

It's a start. I hope that both teams will pause a bit in their frenzy to add more features, and will strengthen the existing ones instead.

What's next: Percona Live in Amsterdam!

This is the last episode in the series. There is much more to say about replication features, old and new, but the focus of these articles was on monitoring capabilities, and we've seen enough.

I am speaking at Percona Live Amsterdam 2015 on September 22, and I will cover all these topics with examples.

I would like to remind everyone that there is a public project on GitHub with mysql-replication-samples. I am sure plenty of other scripts could be added. Participation is welcome!

Tuesday, August 25, 2015

New MySQL Sandbox 3.1 - GitHub, and usability

I have three pieces of information to share about MySQL::Sandbox:

  • Version 3.1.0 has migrated from Launchpad to GitHub
  • This version is released under the Apache license. Both these changes are meant to improve and promote cooperation on the project.
  • There is an important change related to usability. When using replication with MySQL::Sandbox and MySQL 5.6+, the server UUIDs become more readable (see below).

First, some words on the location change. About two years ago, I started planning a rewrite of MySQL::Sandbox. Then came some unexpected changes, which involved moving home to a different continent twice within twelve months. The project was neglected, but I haven't dismissed it. While I wait for the rewrite to start, I wanted to remove the obstacles to rapid development, and I decided to transfer the current codebase to GitHub. This will allow me to use only one RCS instead of three (my team has abandoned svn too [1]).

Apart from the changes described in this post, there is little difference in the code published on GitHub.

Now, to the usability. In my recent series of advanced replication features, I complained often about GTIDs being hard to tell apart. Hers is an example from MySQL replication in action - Part 3: all-masters P2P topology. Look at the server identifiers, and see if you can tell at first glance where does the largest transaction set come from:

$ for N in 1 2 3 4 ; do ./n$N -e 'select @@server_id, @@server_uuid; select @@global.gtid_executed\G'; done
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 18fd3be0-4119-11e5-97cd-24acf2bbd1e4 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3,
1e629814-4119-11e5-85cf-aac6e218d3d8:1-119,
226e3350-4119-11e5-8242-de985f123dfc:1-3,
270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         102 | 1e629814-4119-11e5-85cf-aac6e218d3d8 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3,
1e629814-4119-11e5-85cf-aac6e218d3d8:1-119,
226e3350-4119-11e5-8242-de985f123dfc:1-3,
270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         103 | 226e3350-4119-11e5-8242-de985f123dfc |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3,
1e629814-4119-11e5-85cf-aac6e218d3d8:1-119,
226e3350-4119-11e5-8242-de985f123dfc:1-3,
270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         104 | 270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3,
1e629814-4119-11e5-85cf-aac6e218d3d8:1-119,
226e3350-4119-11e5-8242-de985f123dfc:1-3,
270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c:1-3

I decided that I could do something about it. I use MySQL::Sandbox for my testing, and in these conditions there is little risk of servers clashing. The architecture of MySQL::Sandbox ensures that servers within a group of sandboxes keep to themselves. I modified the software so that when a server is created with a server-id, the server-uuid is modified (with an unsupported-and-do-not-try-this-in-production hack). I know: it defies the purpose of having unique identifiers, but frankly, I care more about being able to understand what is going on than about my server-uuid being the same as one in a different cluster.

The way it works is simple: when a server is created and has a server-id (i.e. we can safely assume that its purpose is to be used in replication), its server-UUID is changed to a new string made of the port number and the server-id repeated many times. If the server ID is between 101 and 109 (what MySQL::Sandbox usually assigns to groups of sandboxes), it is simplified by subtracting 100, resulting in IDs that are still hard to pronounce, but that can be identified visually at a glance (such as 00008480-1111-1111-1111-111111111111).

If the server ID is something more complex, then MySQL::Sandbox uses the port number to create the last part as well. For example 00005708-5708-5708-5708-000000005708.
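This is not the actual MySQL::Sandbox code, just a small shell rendering of the scheme described above, to make the format explicit:

port=8480
server_id=101
if [ "$server_id" -ge 101 ] && [ "$server_id" -le 109 ]
then
    d=$(( server_id - 100 ))    # 101..109 become the single digits 1..9
    uuid=$(printf '%08d-%s-%s-%s-%s' "$port" \
        "$d$d$d$d" "$d$d$d$d" "$d$d$d$d" "$d$d$d$d$d$d$d$d$d$d$d$d")
else
    uuid=$(printf '%08d-%04d-%04d-%04d-%012d' "$port" "$port" "$port" "$port" "$port")
fi
echo $uuid    # 00008480-1111-1111-1111-111111111111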

Here is the same scenario shown in the article, but using the newest MySQL Sandbox version. Now server #2 is more easily identified as the source of the largest transaction group.

$ for N in 1 2 3 4 ; do ./n$N -e 'select @@server_id, @@server_uuid; select @@global.gtid_executed\G'; done
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 00008480-1111-1111-1111-111111111111 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-119,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         102 | 00008481-2222-2222-2222-222222222222 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-119,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         103 | 00008482-3333-3333-3333-333333333333 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-119,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         104 | 00008483-4444-4444-4444-444444444444 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-119,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3

Should you decide that you don't want MySQL::Sandbox to do this transformation, you can skip it by setting the variable KEEP_UUID before running any sandbox installation command.
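For example (I am assuming that any non-empty value is enough to trigger the skip):

$ export KEEP_UUID=1
$ make_sandbox 5.7.8 -- --no_confirm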

Another change that was long overdue is the behavior of the './clear' script within the sandboxes. With the introduction of MySQL 5.7, I faced the problem of having InnoDB tables in places where there hadn't been any before (i.e. the 'mysql' database). As a solution, I used to make a copy of that database with mysqldump right after the installation, and then use it to re-create the database after a cleanup. This solution doesn't work, as it leads to more problems than benefits. So I changed the behavior: there is no mysqldump backup, and there is no removal of InnoDB files. Moreover, since the sys schema was introduced, it is not safe to truncate 'mysql.proc', as that would also make the sys schema ineffective. Now, if a cleanup that just removes non-system schemas is enough for you, go for the './clear' script. If your testing has messed up stored routines, then you are better off with a reinstallation (which takes just a few seconds anyway).

And one more thing: This release of MySQL::Sandbox is ready for MySQL 5.7.9. The MySQL team at Oracle has graciously provided a preview build to test some early bug fixes, and MySQL::Sandbox 3.1.00 works smoothly with it.




[1] For those who are new to my blog, let me make clear that MySQL::Sandbox is my personal project, and my company has no relation to or influence on it.

Monday, August 24, 2015

Tungsten Replicator moved to GitHub with Apache license

It had been in the making for a long time. Google announced that Google Code would be closing, and since then the Continuent team has been hard at work to handle the transition. You can guess it: this operation would have been quicker if it had been done by the small company we were one year ago, but being part of a large corporation introduces some constraints that have affected our schedule.

However, our wish has always been, and still is, to keep Tungsten Replicator as an open source product, with full functionalities and with the full benefits that the open source development model offers.

Today, Tungsten Replicator is available on GitHub as vmware/tungsten-replicator, and it is wearing new clothes. It is not GPL anymore. In an effort to facilitate contributions, its license was changed to Apache 2.0.

Feature-wise, there is little difference from the previous release of 4.0. Mainly, we have cleaned up the code and moved out the pieces that no longer fit:

  1. Bristlecone was removed from the package. It is used only for testing, and it will be released separately. There is no need to duplicate it into every Tungsten tarball.
  2. The cookbook recipes have been retired. These scripts were created when the installer was still in its infancy and we had little documentation; back then, it was convenient to have wrappers for the common installation operations. Using the manual, it is now pretty easy to install master/slave, fan-in, and multi-master topologies. The biggest reason for removing the cookbook, though, is that it was only useful for MySQL replication. If you need heterogeneous deployments, the cookbook was an obstacle rather than a help.
  3. Some files were shuffled within the deployment tree. The ./tungsten-replicator/scripts directory was merged with ./tungsten-replicator/bin, the applier templates were moved from samples to a dedicated path, and we also did some other similar cleanup.

Although it has changed location and license, this is not a "release." If you compile the code, it will come up as 4.1, but it is still a work in progress. As in the previous repository, we tag the code with the next version, and keep working on it until it is deemed ready for release. The latest production release (4.0.1) is still available from the old location.

The code is available on GitHub, which makes collaboration much simpler than the previous repository. Take advantage of it: fork it, and help make the best replication tool even better!

MySQL usability issues for replication

In my latest series of advanced replication features, I came across several usability issues, which I would like to recap here. For each section of this list of requests, I make a wish list, with some general comments.

INSTALLATION

As the maintainer of MySQL Sandbox, a tool that aims to facilitate the installation of MySQL for testing, I am particularly sensitive to annoyances during installation. I have covered the matter extensively in recent weeks and many times before. My point is that the logging of the installation must be useful for the DBA, not for the MySQL developers.

  • Make a big distinction between [Info] and [Warning] messages. Warnings should only appear when there is some risk. Using a warning to tell the user that the InnoDB log file was created is nonsense.
  • … and then make [Info] messages suppressible. Currently, I can suppress warnings, but then I would also suppress the real warnings, in addition to the fake ones.
  • Make mysqld --initialize use the correct syntax for timestamps, instead of issuing a warning about the obsolete syntax that was used. (See In search of cleanliness: the elusive quiet installation for a full explanation.)

MULTI-SOURCE

Here we have two main problems. The first one is that the introduction of multi-source channels has left some new ways of shooting yourself in the foot. By mixing the new explicitly named channels with the hidden one (which is still used in regular replication!) named "" (the empty string), we may start to see a new series of blunders in replication.

The second set of problems comes from weak integration between existing features and the new one. There are operation modes, such as semi-synchronous, delayed, and parallel replication that were designed with a single data stream in mind, and that lack the capability of being tuned for different channels.

  • Make it impossible to use the unnamed channel and the named ones at the same time. Currently only a very weak protection exists against mixing named and unnamed channels. Also, the syntax for channel '' should result in an error when named channels are defined.
  • Integrate features that were designed for single sources to work better with multi source: semi-sync replication, parallel applier, delayed replication.

GTID

The ugliness of the GTID implementation comes from the assumption that users should value the uniqueness of the server ID more than ease of maintenance. This leads to the choice of UUIDs as server identifiers, and to tying such identifiers to GTIDs. Personally, I don't care if my server has the same ID as one used in a deployment next door or on another continent. I care about being able to recognize my servers easily and group them according to my needs. Moreover, the server-UUID is only used for a few functions, while replication accuracy still depends on the old server-IDs.

That said, even if GTIDs were easier to read, they still suffer from lack of usability in many ways, as was documented in an earlier article.

  • The server-UUID should be human friendly. It should be something pronounceable, easy to read, and easy to tell apart from others. Valid candidates for better server identifiers are simple numbers (like server-id) or alphanumeric text (such as 'alpha', 'mynode101', 'London-42').
  • It should be possible to define the server-UUID (or its human-friendly replacement) without hacking the auto.cnf file.
  • The GTID shown in monitoring commands (show master/slave status, select @@global.gtid_executed, various performance_schema tables) should show the precise transaction number, not the whole transaction set, i.e. server-uuid:203, not server-uuid:1-203.
  • Enable GTID by default.
  • Remove restrictions to GTID usage.
  • Make an easy procedure to skip unwanted events, instead of injecting empty transactions (the current workaround is sketched below).
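For reference, this is the empty-transaction workaround that the last wish refers to, run on the slave (the GTID value is only an example):

$ ./use -e "STOP SLAVE;
    SET GTID_NEXT='00008480-1111-1111-1111-111111111111:42';
    BEGIN; COMMIT;
    SET GTID_NEXT='AUTOMATIC';
    START SLAVE"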

CHANGE MASTER TO

The shortcomings of GTID are also reflected on the main command for replication setup: CHANGE MASTER TO. Some of the shortcomings are listed below. Perhaps there is something more that escapes me now.

  • Allow CHANGE MASTER TO to use GTID to set the replication starting point. Currently you can only do it with binlog name + position;
  • Allow CHANGE MASTER TO to use server-uuid (or their human friendly replacement) as argument for IGNORE_SERVER_IDS=…

MASTER STATUS

The usefulness of MASTER STATUS is reduced by the inability to combine its output with anything else. Since the status is only available as a SHOW statement, we can only use its output through an external procedure, not in SQL.

Most damning, though, is the grouping of GTIDs in SHOW MASTER STATUS under multi-source replication, where determining what was created in the current server requires multiple operations.

  • Its output should be available in a table
  • When using multi-source, there should be a distinction between GTID generated in the current server and the ones that were received through replication

SLAVE STATUS

In part I of the replication analysis, we have seen that the only place where we have complete information is SHOW SLAVE STATUS, while all the replication tables in mysql and performance_schema together still lack some of the elements that this ancient command provides. There is much that can be done to improve the monitoring tables.

  • Show ALL the information from SHOW SLAVE STATUS in performance_schema tables.
  • In multi-source, show executed GTID for the channel that is being listed. Currently, it shows all GTIDs for every channel.

CRASH-SAFE tables in mysql

This is related to the previous section. Some info that is available in SHOW SLAVE STATUS is still missing from the crash-safe tables. And some more data, which is available to the replication threads, should be logged as well.

  • Show received and executed GTID alongside binlog and relay log info
  • Add schema info (in which schema the event was applied)

SLAVE WORKER TABLE

Continuing the grievance about SHOW SLAVE STATUS, the monitoring table dedicated to parallel replication could give more info. When an error occurs during parallel replication, the error message shows the thread identifier, the GTID, and the log where the problem has arisen. Why can't the same info be in the tables as well?

  • Add GTID info
  • Add schema info
  • Remove the Checkpoint_group_bitmap field (which shows as gibberish) or make it displayable.

performance_schema TABLES

Some of the above wishes apply to the performance_schema tables as well. Information known to the server is not made available for monitoring.

  • Show both received and executed GTIDs for all replication_% tables that display progress.
  • Enhance replication tables to include all information related to replication.

WISHES FOR FUTURE FEATURES

As MySQL 5.7 approaches GA, we can start worrying about the new features that are coming. I think that the data dictionary will be one of the features with the biggest impact on usability.

What concerns me is that some of the features of a data dictionary will clash with what is already implemented in information_schema and performance_schema. I'd like to see, as early as possible, an integration path for these features. I don't care if in this version we need to adjust our queries to the new structure, provided that there will be only one source of information and that the change provides real benefits.

A further wish for the future: when introducing changes related to security, make them substantial rather than noisy. Adding warnings to commonly used commands often results in unnecessary work for DBAs and developers.

Finally, and coming back to the main focus of this post, when thinking of replication I invite the MySQL team to innovate outside the box. There are many fields where replication can be enhanced and made more powerful. Here is a simple list:

  1. Make replication open to heterogeneous sources and destinations. Replicating from and to NoSQL entities is becoming more important by the day.
  2. Make replication more agile by adopting and promoting an open standard for binary logs.
  3. Replication monitoring is now based on a single server. We need internal features that make it easier to create monitoring dashboards that can see the whole cluster at once. Adding the appropriate monitoring tables and giving slaves the ability to communicate their status back to the masters (which would then broadcast it to the other slaves) would immensely improve the replication process and the ability to recover from a failure.

Wednesday, August 19, 2015

MySQL replication in action - Part 4 - star and hybrid topologies

Previous episodes:

Introducing the star topology

With all-masters P2P topologies, we have seen a way of deploying a cluster where all nodes are masters, with better efficiency and stability than ring topologies. That method comes at the price of a complex setup, which requires N*(N-1) connections for an N-node cluster.

We can achieve the same result as a P2P all-masters topology by trading some robustness for a much simpler set of connections. In a star topology (Figure 1) all nodes are masters, but they do not connect to each other directly. There is a special node, named the hub, which receives the changes produced by each endpoint and spreads them to the others.


Figure 1 - A star topology

Compared to ring replication, a star topology avoids the dangerous depth of processing of N-1 (see Part 3), where the last node requires N-1 acts of replication to be updated. In fact, the depth of processing of a star topology can be either 1, if the data is produced in the hub, or 2, if it is produced in the endpoints (see Figures 2 and 3).


Figure 2 - A star topology depth of processing from hub

Figure 3 - A star topology depth of processing from endpoints

You can think of a star topology as a set of ring topologies strung together.


Figure 4 - A star topology is like many rings together

The way it works is simple. All endpoint nodes are configured as we have seen for fan-in and P2P topologies. The hub node, instead, has an extra option: it enables log-slave-updates. With this addition, the hub is able to receive data from any master, and every endpoint can then pull the data from the hub.
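As a hedged sketch, this is roughly what the hub ends up with in its configuration file once the installation script below has run; the file name is the one MySQL Sandbox uses, and the options match the script output.

# my.sandbox.cnf on the hub (node3)
[mysqld]
master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
enforce-gtid-consistency
log-slave-updates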

The advantage is clear: while an all-masters P2P topology needs 12 connections for a 4-node deployment, a star topology only requires 6. To add a 5th node to the P2P topology, you would need 8 more connections (4 in the new node, and one in each of the previous nodes), for a total of 20. In a star topology, you would only add 2 connections (one from the new endpoint to the hub, and one from the hub to the new endpoint), for a total of 8.

If your deployment requires many nodes and all need to be masters, a star topology is a good candidate.

There is, of course, a price to pay. In addition to the risk of bad performance (which I have discussed at length in Multiple masters : attraction to the stars), a star topology has a single point of failure (SPOF), and you must be ready to deal with it. We will see a full example, but for now you need to know that, in order to successfully promote an endpoint to the role of hub, you need to enable log-slave-updates in one or more endpoints that you want to be hub candidates.

Installing a star topology

The procedure to install a star topology in MySQL 5.7 requires several steps for each endpoint:

  1. in the endpoint:
    • create a channel named hub-to-$nodename, where $nodename is the name of the endpoint;
    • start slave for channel 'hub-to-$nodename'
  2. in the hub:
    • create a channel named $nodename-to-hub
    • start slave for channel '$nodename-to-hub'

You must also remember to enable log-slave-updates in the hub (and in one or more endpoints that you want to set as candidates for hub replacement in case of failure).

The procedure for MariaDB 10 is the same, with the syntax differences that we have noted in the previous articles.

To see a practical example, we will use another script from mysql-replication-samples. Using a group of 5 sandboxes, we will create a system with 4 endpoints and one hub.

$ ./set_star_topology.sh 5.7.8 mysql
installing node 1
installing node 2
installing node 3
installing node 4
installing node 5
group directory installed in $HOME/sandboxes/multi_msb_5_7_8
# server: 1:
# server: 2:
# server: 3:
# server: 4:
# server: 5:
# option 'master-info-repository=table' added to node1 configuration file
# option 'relay-log-info-repository=table' added to node1 configuration file
# option 'gtid_mode=ON' added to node1 configuration file
# option 'enforce-gtid-consistency' added to node1 configuration file
. sandbox server started
# option 'master-info-repository=table' added to node2 configuration file
# option 'relay-log-info-repository=table' added to node2 configuration file
# option 'gtid_mode=ON' added to node2 configuration file
# option 'enforce-gtid-consistency' added to node2 configuration file
. sandbox server started
# option 'master-info-repository=table' added to node3 configuration file
# option 'relay-log-info-repository=table' added to node3 configuration file
# option 'gtid_mode=ON' added to node3 configuration file
# option 'enforce-gtid-consistency' added to node3 configuration file
# option 'log-slave-updates' added to node3 configuration file
. sandbox server started
# option 'master-info-repository=table' added to node4 configuration file
# option 'relay-log-info-repository=table' added to node4 configuration file
# option 'gtid_mode=ON' added to node4 configuration file
# option 'enforce-gtid-consistency' added to node4 configuration file
. sandbox server started
# option 'master-info-repository=table' added to node5 configuration file
# option 'relay-log-info-repository=table' added to node5 configuration file
# option 'gtid_mode=ON' added to node5 configuration file
# option 'enforce-gtid-consistency' added to node5 configuration file
. sandbox server started
# HUB node3 port: 8381
# node node1 port: 8379
./node1/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8381, MASTER_AUTO_POSITION=1 for channel 'hub_node1'"
./node3/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8379, MASTER_AUTO_POSITION=1 for channel 'node1_hub'"
./node3/use -e "START SLAVE FOR CHANNEL 'node1_hub'"
./node1/use -e "START SLAVE FOR CHANNEL 'hub_node1'"
# node node2 port: 8380
./node2/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8381, MASTER_AUTO_POSITION=1 for channel 'hub_node2'"
./node3/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8380, MASTER_AUTO_POSITION=1 for channel 'node2_hub'"
./node3/use -e "START SLAVE FOR CHANNEL 'node2_hub'"
./node2/use -e "START SLAVE FOR CHANNEL 'hub_node2'"
# node node4 port: 8382
./node4/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8381, MASTER_AUTO_POSITION=1 for channel 'hub_node4'"
./node3/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8382, MASTER_AUTO_POSITION=1 for channel 'node4_hub'"
./node3/use -e "START SLAVE FOR CHANNEL 'node4_hub'"
./node4/use -e "START SLAVE FOR CHANNEL 'hub_node4'"
# node node5 port: 8383
./node5/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8381, MASTER_AUTO_POSITION=1 for channel 'hub_node5'"
./node3/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8383, MASTER_AUTO_POSITION=1 for channel 'node5_hub'"
./node3/use -e "START SLAVE FOR CHANNEL 'node5_hub'"
./node5/use -e "START SLAVE FOR CHANNEL 'hub_node5'"

We see operations similar to the ones used for other multi-source topologies. Notice that for the hub we also set log-slave-updates. We can use the same script used for the all-masters topology to see if this deployment works as expected. Before doing that, though, we will enable log-slave-updates in one of the endpoints (node4), which will be our hub candidate when we try a failover.

$ ./node4/add_option log-slave-updates
# option 'log-slave-updates' added to configuration file
. sandbox server started

Note: you may need to wait a few seconds for the slave channels between the hub and the candidate to resume operations.

After that is done, we can test the deployment:

$ ./test_all_masters_replication.sh
# NODE node1 created table test_node1
# NODE node2 created table test_node2
# NODE node3 created table test_node3
# NODE node4 created table test_node4
# NODE node5 created table test_node5
# Data in all nodes
101
1 101 8379 node1 2015-08-14 22:21:02
1 102 8380 node2 2015-08-14 22:21:02
1 103 8381 node3 2015-08-14 22:21:02
1 104 8382 node4 2015-08-14 22:21:02
1 105 8383 node5 2015-08-14 22:21:02
102
1 101 8379 node1 2015-08-14 22:21:02
1 102 8380 node2 2015-08-14 22:21:02
1 103 8381 node3 2015-08-14 22:21:02
1 104 8382 node4 2015-08-14 22:21:02
1 105 8383 node5 2015-08-14 22:21:02
103
1 101 8379 node1 2015-08-14 22:21:02
1 102 8380 node2 2015-08-14 22:21:02
1 103 8381 node3 2015-08-14 22:21:02
1 104 8382 node4 2015-08-14 22:21:02
1 105 8383 node5 2015-08-14 22:21:02
104
1 101 8379 node1 2015-08-14 22:21:02
1 102 8380 node2 2015-08-14 22:21:02
1 103 8381 node3 2015-08-14 22:21:02
1 104 8382 node4 2015-08-14 22:21:02
1 105 8383 node5 2015-08-14 22:21:02
105
1 101 8379 node1 2015-08-14 22:21:02
1 102 8380 node2 2015-08-14 22:21:02
1 103 8381 node3 2015-08-14 22:21:02
1 104 8382 node4 2015-08-14 22:21:02
1 105 8383 node5 2015-08-14 22:21:02

Now, let's have a look at the GTIDs:

$ for N in 1 2 3 4 5; do ./n$N -e 'select @@server_id, @@server_uuid; select @@global.gtid_executed\G'; done
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 519dd7f0-42c4-11e5-8995-590303071a70 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 519dd7f0-42c4-11e5-8995-590303071a70:1-3,
5615010a-42c4-11e5-873f-9904485cf6bc:1-3,
5a8809f8-42c4-11e5-9c60-f36f50446736:1-3,
5ecdf2ca-42c4-11e5-b861-a22aea18b64b:1-3,
630bab66-42c4-11e5-8807-bb8c8c1ce62c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         102 | 5615010a-42c4-11e5-873f-9904485cf6bc |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 519dd7f0-42c4-11e5-8995-590303071a70:1-3,
5615010a-42c4-11e5-873f-9904485cf6bc:1-3,
5a8809f8-42c4-11e5-9c60-f36f50446736:1-3,
5ecdf2ca-42c4-11e5-b861-a22aea18b64b:1-3,
630bab66-42c4-11e5-8807-bb8c8c1ce62c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         103 | 5a8809f8-42c4-11e5-9c60-f36f50446736 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 519dd7f0-42c4-11e5-8995-590303071a70:1-3,
5615010a-42c4-11e5-873f-9904485cf6bc:1-3,
5a8809f8-42c4-11e5-9c60-f36f50446736:1-3,
5ecdf2ca-42c4-11e5-b861-a22aea18b64b:1-3,
630bab66-42c4-11e5-8807-bb8c8c1ce62c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         104 | 5ecdf2ca-42c4-11e5-b861-a22aea18b64b |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 519dd7f0-42c4-11e5-8995-590303071a70:1-3,
5615010a-42c4-11e5-873f-9904485cf6bc:1-3,
5a8809f8-42c4-11e5-9c60-f36f50446736:1-3,
5ecdf2ca-42c4-11e5-b861-a22aea18b64b:1-3,
630bab66-42c4-11e5-8807-bb8c8c1ce62c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         105 | 630bab66-42c4-11e5-8807-bb8c8c1ce62c |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 519dd7f0-42c4-11e5-8995-590303071a70:1-3,
5615010a-42c4-11e5-873f-9904485cf6bc:1-3,
5a8809f8-42c4-11e5-9c60-f36f50446736:1-3,
5ecdf2ca-42c4-11e5-b861-a22aea18b64b:1-3,
630bab66-42c4-11e5-8807-bb8c8c1ce62c:1-3

Here's the interesting thing, where the power of GTID shows its full potential. Although each endpoint is only connected to the hub, it gets all the GTIDs from the other nodes. This makes possible the hub replacement procedure that we will see in a moment.

To show what the monitoring looks like in a star topology, we could run a full SHOW SLAVE STATUS for each node, but that would take too much space. Basically, you will see the same output that we have seen for all-masters, with a big difference: endpoints have only one channel, while the hub has 4. Let's see just a simplified account:

$ ./use_all 'SHOW SLAVE STATUS\G' | grep 'server:\|Running:\|Channel_Name:'
# server: 1:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                 Channel_Name: hub_node1
# server: 2:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                 Channel_Name: hub_node2
# server: 3:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                 Channel_Name: node1_hub
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                 Channel_Name: node2_hub
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                 Channel_Name: node4_hub
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                 Channel_Name: node5_hub
# server: 4:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                 Channel_Name: hub_node4
# server: 5:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                 Channel_Name: hub_node5

So, you have a crowded situation in the hub and a simpler one in the endpoints. There is nothing new in the monitoring tables compared to what we've seen in the past articles, so let's finish this chapter by seeing what happens with MariaDB 10.

$ ./set_star_topology.sh ma10.0.20 mariadb
installing node 1
installing node 2
installing node 3
installing node 4
installing node 5
group directory installed in $HOME/sandboxes/multi_msb_ma10_0_20
# server: 1:
# server: 2:
# server: 3:
# server: 4:
# server: 5:
# option 'gtid_domain_id=1010' added to node1 configuration file
. sandbox server started
# option 'gtid_domain_id=1020' added to node2 configuration file
. sandbox server started
# option 'log-slave-updates' added to node3 configuration file
# option 'gtid_domain_id=1030' added to node3 configuration file
. sandbox server started
# option 'gtid_domain_id=1040' added to node4 configuration file
. sandbox server started
# option 'gtid_domain_id=1050' added to node5 configuration file
. sandbox server started
# HUB node3 port: 19023
# node node1 port: 19021
./node1/use -e "CHANGE MASTER 'hub_node1' TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=19023, MASTER_USE_GTID=current_pos "
./node3/use -e "CHANGE MASTER 'node1_hub' TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=19021, MASTER_USE_GTID=current_pos "
./node3/use -e "START SLAVE 'node1_hub' "
./node1/use -e "START SLAVE 'hub_node1' "
# node node2 port: 19022
./node2/use -e "CHANGE MASTER 'hub_node2' TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=19023, MASTER_USE_GTID=current_pos "
./node3/use -e "CHANGE MASTER 'node2_hub' TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=19022, MASTER_USE_GTID=current_pos "
./node3/use -e "START SLAVE 'node2_hub' "
./node2/use -e "START SLAVE 'hub_node2' "
# node node4 port: 19024
./node4/use -e "CHANGE MASTER 'hub_node4' TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=19023, MASTER_USE_GTID=current_pos "
./node3/use -e "CHANGE MASTER 'node4_hub' TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=19024, MASTER_USE_GTID=current_pos "
./node3/use -e "START SLAVE 'node4_hub' "
./node4/use -e "START SLAVE 'hub_node4' "
# node node5 port: 19025
./node5/use -e "CHANGE MASTER 'hub_node5' TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=19023, MASTER_USE_GTID=current_pos "
./node3/use -e "CHANGE MASTER 'node5_hub' TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=19025, MASTER_USE_GTID=current_pos "
./node3/use -e "START SLAVE 'node5_hub' "
./node5/use -e "START SLAVE 'hub_node5' "

The test script gives the same outcome that we've seen for MySQL 5.7. The status of the slave connections is only slightly different, due to the implementation:

$ ./use_all 'SHOW ALL SLAVES STATUS\G' | grep 'server:\|Running:\|Connection_name:'
# server: 1:
              Connection_name: hub_node1
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
# server: 2:
              Connection_name: hub_node2
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
# server: 3:
              Connection_name: node1_hub
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Connection_name: node2_hub
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Connection_name: node4_hub
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Connection_name: node5_hub
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
# server: 4:
              Connection_name: hub_node4
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
# server: 5:
              Connection_name: hub_node5
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

What may be surprising is the content of the crash-safe table.

$ for N in 1 2 3 4 5; do ./n$N -e 'select @@server_id, @@gtid_domain_id; select * from mysql.gtid_slave_pos' ; done
+-------------+------------------+
| @@server_id | @@gtid_domain_id |
+-------------+------------------+
|         101 |             1010 |
+-------------+------------------+
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|      1010 |      2 |       101 |      2 |
|      1010 |      3 |       101 |      3 |
|      1020 |      5 |       102 |      2 |
|      1020 |      6 |       102 |      3 |
|      1030 |      8 |       103 |      2 |
|      1030 |      9 |       103 |      3 |
|      1040 |     11 |       104 |      2 |
|      1040 |     12 |       104 |      3 |
|      1050 |     14 |       105 |      2 |
|      1050 |     15 |       105 |      3 |
+-----------+--------+-----------+--------+
+-------------+------------------+
| @@server_id | @@gtid_domain_id |
+-------------+------------------+
|         102 |             1020 |
+-------------+------------------+
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|      1010 |      2 |       101 |      2 |
|      1010 |      3 |       101 |      3 |
|      1020 |      5 |       102 |      2 |
|      1020 |      6 |       102 |      3 |
|      1030 |      8 |       103 |      2 |
|      1030 |      9 |       103 |      3 |
|      1040 |     11 |       104 |      2 |
|      1040 |     12 |       104 |      3 |
|      1050 |     14 |       105 |      2 |
|      1050 |     15 |       105 |      3 |
+-----------+--------+-----------+--------+
+-------------+------------------+
| @@server_id | @@gtid_domain_id |
+-------------+------------------+
|         103 |             1030 |
+-------------+------------------+
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|      1010 |      2 |       101 |      2 |
|      1010 |      3 |       101 |      3 |
|      1020 |      5 |       102 |      2 |
|      1020 |      6 |       102 |      3 |
|      1040 |      8 |       104 |      2 |
|      1040 |      9 |       104 |      3 |
|      1050 |     11 |       105 |      2 |
|      1050 |     12 |       105 |      3 |
+-----------+--------+-----------+--------+
+-------------+------------------+
| @@server_id | @@gtid_domain_id |
+-------------+------------------+
|         104 |             1040 |
+-------------+------------------+
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|      1010 |      2 |       101 |      2 |
|      1010 |      3 |       101 |      3 |
|      1020 |      5 |       102 |      2 |
|      1020 |      6 |       102 |      3 |
|      1030 |      8 |       103 |      2 |
|      1030 |      9 |       103 |      3 |
|      1040 |     11 |       104 |      2 |
|      1040 |     12 |       104 |      3 |
|      1050 |     14 |       105 |      2 |
|      1050 |     15 |       105 |      3 |
+-----------+--------+-----------+--------+
+-------------+------------------+
| @@server_id | @@gtid_domain_id |
+-------------+------------------+
|         105 |             1050 |
+-------------+------------------+
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|      1010 |      2 |       101 |      2 |
|      1010 |      3 |       101 |      3 |
|      1020 |      5 |       102 |      2 |
|      1020 |      6 |       102 |      3 |
|      1030 |      8 |       103 |      2 |
|      1030 |      9 |       103 |      3 |
|      1040 |     11 |       104 |      2 |
|      1040 |     12 |       104 |      3 |
|      1050 |     14 |       105 |      2 |
|      1050 |     15 |       105 |      3 |
+-----------+--------+-----------+--------+

Here we see that each endpoint shows the GTIDs of the data received from the other nodes, plus the data produced in the endpoint itself. However, the hub only shows the data received from the endpoints, and not the data that was created in the hub. This bizarre situation is due to log-slave-updates being enabled in that node only. The issue was explained in a maria-developers thread, but it feels like a bug to me.
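A hedged way to cross-check this on the hub is to compare MariaDB's GTID position variables: gtid_slave_pos only tracks what arrived through replication, while gtid_binlog_pos also includes the transactions generated locally.

$ ./node3/use -e "SELECT @@gtid_slave_pos; SELECT @@gtid_binlog_pos; SELECT @@gtid_current_pos"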

Replacing the hub

Since the hub is the SPOF in a star topology, before attempting to install one we should at least have an idea of how to replace the hub in case of failure. In our test deployment, we have a candidate ready to replace the hub in case of need. Let's apply some load to the system by inserting data simultaneously in all endpoints, and then kill the hub while the load is running.
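Here is a minimal sketch of such a test, assuming a test schema exists on all nodes and using the sandbox send_kill script to simulate the crash (table names and the size of the load are just examples).

# create a load table on each endpoint and insert rows in the background
$ for N in 1 2 4 5; do ./node$N/use -e "CREATE TABLE test.load_node$N (id int, node int)"; done
$ for N in 1 2 4 5; do
    ( for X in $(seq 1 1000); do ./node$N/use -e "INSERT INTO test.load_node$N VALUES ($X, $N)"; done ) &
  done

# while the inserts are running, kill the hub (node3)
$ ./node3/send_kill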

With the hub gone, we will see that the endpoints no longer receive each other's updates:

$ for N in 1 2 3 4 5; do ./n$N -e 'select @@server_id, @@server_uuid; select @@global.gtid_executed\G'; done
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 3d9dec68-42ce-11e5-86be-185460e24dd9 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 3d9dec68-42ce-11e5-86be-185460e24dd9:1-1163,
41918672-42ce-11e5-ae5e-27905d218c5f:1-236,
45f1991e-42ce-11e5-a853-1f71d8e7014b:1-3,
4a72292c-42ce-11e5-9b80-e17cc877fb87:1-82,
4ef58dae-42ce-11e5-9981-28967a9fad86:1-75
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         102 | 41918672-42ce-11e5-ae5e-27905d218c5f |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 3d9dec68-42ce-11e5-86be-185460e24dd9:1-92,
41918672-42ce-11e5-ae5e-27905d218c5f:1-1163,
45f1991e-42ce-11e5-a853-1f71d8e7014b:1-3,
4a72292c-42ce-11e5-9b80-e17cc877fb87:1-82,
4ef58dae-42ce-11e5-9981-28967a9fad86:1-75
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         104 | 4a72292c-42ce-11e5-9b80-e17cc877fb87 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 3d9dec68-42ce-11e5-86be-185460e24dd9:1-92,
41918672-42ce-11e5-ae5e-27905d218c5f:1-236,
45f1991e-42ce-11e5-a853-1f71d8e7014b:1-3,
4a72292c-42ce-11e5-9b80-e17cc877fb87:1-1163,
4ef58dae-42ce-11e5-9981-28967a9fad86:1-75
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         105 | 4ef58dae-42ce-11e5-9981-28967a9fad86 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 3d9dec68-42ce-11e5-86be-185460e24dd9:1-92,
41918672-42ce-11e5-ae5e-27905d218c5f:1-236,
45f1991e-42ce-11e5-a853-1f71d8e7014b:1-3,
4a72292c-42ce-11e5-9b80-e17cc877fb87:1-82,
4ef58dae-42ce-11e5-9981-28967a9fad86:1-1163

You can see that each node shows a large transaction number for its own UUID (the position reached by the endpoint's own workload), while for the other nodes it only has small numbers. Now we can promote node4 to be the new hub. To do so, we need to remove the connections between the dead hub and the endpoints, and replace them with new ones pointing to node4.

./node4/use -e "STOP  SLAVE FOR CHANNEL 'hub_node4'"
./node4/use -e "STOP  SLAVE "
./node4/use -e "RESET  SLAVE FOR CHANNEL 'hub_node4'"
./node4/use -e "RESET  SLAVE "
# HUB node4 port: 8382
# node node1 port: 8379
./node1/use -e "STOP  SLAVE FOR CHANNEL 'hub_node1'"
./node1/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8382, MASTER_AUTO_POSITION=1 for channel 'hub_node1'"
./node4/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8379, MASTER_AUTO_POSITION=1 for channel 'node1_hub'"
./node4/use -e "START SLAVE FOR CHANNEL 'node1_hub'"
./node1/use -e "START SLAVE FOR CHANNEL 'hub_node1'"
# node node2 port: 8380
./node2/use -e "STOP  SLAVE FOR CHANNEL 'hub_node2'"
./node2/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8382, MASTER_AUTO_POSITION=1 for channel 'hub_node2'"
./node4/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8380, MASTER_AUTO_POSITION=1 for channel 'node2_hub'"
./node4/use -e "START SLAVE FOR CHANNEL 'node2_hub'"
./node2/use -e "START SLAVE FOR CHANNEL 'hub_node2'"
# node node5 port: 8383
./node5/use -e "STOP  SLAVE FOR CHANNEL 'hub_node5'"
./node5/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8382, MASTER_AUTO_POSITION=1 for channel 'hub_node5'"
./node4/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8383, MASTER_AUTO_POSITION=1 for channel 'node5_hub'"
./node4/use -e "START SLAVE FOR CHANNEL 'node5_hub'"
./node5/use -e "START SLAVE FOR CHANNEL 'hub_node5'"

After about one minute, we poll the system again, and we see that all the endpoints have caught up with the rest.

$ for N in 1 2 3 4 5; do ./n$N -e 'select @@server_id, @@server_uuid; select @@global.gtid_executed\G'; done
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 3d9dec68-42ce-11e5-86be-185460e24dd9 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 3d9dec68-42ce-11e5-86be-185460e24dd9:1-1163,
41918672-42ce-11e5-ae5e-27905d218c5f:1-1163,
45f1991e-42ce-11e5-a853-1f71d8e7014b:1-3,
4a72292c-42ce-11e5-9b80-e17cc877fb87:1-1163,
4ef58dae-42ce-11e5-9981-28967a9fad86:1-1163
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         102 | 41918672-42ce-11e5-ae5e-27905d218c5f |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 3d9dec68-42ce-11e5-86be-185460e24dd9:1-1163,
41918672-42ce-11e5-ae5e-27905d218c5f:1-1163,
45f1991e-42ce-11e5-a853-1f71d8e7014b:1-3,
4a72292c-42ce-11e5-9b80-e17cc877fb87:1-1163,
4ef58dae-42ce-11e5-9981-28967a9fad86:1-1163
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         104 | 4a72292c-42ce-11e5-9b80-e17cc877fb87 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 3d9dec68-42ce-11e5-86be-185460e24dd9:1-1163,
41918672-42ce-11e5-ae5e-27905d218c5f:1-1163,
45f1991e-42ce-11e5-a853-1f71d8e7014b:1-3,
4a72292c-42ce-11e5-9b80-e17cc877fb87:1-1163,
4ef58dae-42ce-11e5-9981-28967a9fad86:1-1163
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         105 | 4ef58dae-42ce-11e5-9981-28967a9fad86 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 3d9dec68-42ce-11e5-86be-185460e24dd9:1-1163,
41918672-42ce-11e5-ae5e-27905d218c5f:1-1163,
45f1991e-42ce-11e5-a853-1f71d8e7014b:1-3,
4a72292c-42ce-11e5-9b80-e17cc877fb87:1-1163,
4ef58dae-42ce-11e5-9981-28967a9fad86:1-1163

The small sequence that is still in the middle (45f1991e-42ce-11e5-a853-1f71d8e7014b:1-3) is the one belonging to the failed hub.

Hybrid topologies

What we have seen so far are clean-cut topologies, where all our servers fit into a simple definition, be it master-slave, fan-in, all-masters, or star. But with the elements that we have examined, we can create topologies that are a mix of the simple ones.

The ingredients to create your own customized deployments are the following:

  • GTID: don't leave home without it! When dealing with complex topologies, knowing where your data comes from is paramount.
  • Point-to-point connections, the ones introduced by the multi-source enhancements in MySQL 5.7 and MariaDB 10: you should never mix them with old-fashioned connections, the ones with nameless channels (see MySQL replication in action - Part 2 - Fan-in topology).
  • log-slave-updates: by using this property wisely, you will create your own hubs when needed (a quick sanity check is sketched after this list).
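Before wiring nodes together, it may help to verify these ingredients everywhere; here is a hedged sketch for the MySQL 5.7 case, using the group sandbox helper.

$ ./use_all 'SELECT @@server_id, @@gtid_mode, @@log_slave_updates'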

As an example of what we can do, we will enhance a star topology to include two P2P masters and two regular slaves.


Figure 5 - A hybrid topology including 1 star, 2 P2P, and 2 simple slaves

In Figure 5, node #4 is connected with node #5 in P2P all-masters. Neither node has log-slave-updates enabled. From node #4 we also have a simple slave (node #6).

We have a mirror deployment with nodes #2, #7, and #8, but this time the connecting node has log-slave-updates enabled.

If you want to try this at home, you can use a proof-of-concept from mysql-replication-samples. Here is the relevant chunk from that script:

./node1/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8381, MASTER_AUTO_POSITION=1 for channel 'hub_node1'"
./node3/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8379, MASTER_AUTO_POSITION=1 for channel 'node1_hub'"
./node3/use -e "START SLAVE FOR CHANNEL 'node1_hub'"
./node1/use -e "START SLAVE FOR CHANNEL 'hub_node1'"
# node node2 port: 8380
./node2/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8381, MASTER_AUTO_POSITION=1 for channel 'hub_node2'"
./node3/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8380, MASTER_AUTO_POSITION=1 for channel 'node2_hub'"
./node3/use -e "START SLAVE FOR CHANNEL 'node2_hub'"
./node2/use -e "START SLAVE FOR CHANNEL 'hub_node2'"
# node node4 port: 8382
./node4/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8381, MASTER_AUTO_POSITION=1 for channel 'hub_node4'"
./node3/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8382, MASTER_AUTO_POSITION=1 for channel 'node4_hub'"
./node3/use -e "START SLAVE FOR CHANNEL 'node4_hub'"
./node4/use -e "START SLAVE FOR CHANNEL 'hub_node4'"
# node node5 port: 8383
./node5/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8382, MASTER_AUTO_POSITION=1 for channel 'node4_node5'"
./node4/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8383, MASTER_AUTO_POSITION=1 for channel 'node5_node4'"
./node4/use -e "START SLAVE FOR CHANNEL 'node5_node4'"
./node5/use -e "START SLAVE FOR CHANNEL 'node4_node5'"

# node node6 port: 8384
./node6/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8382, MASTER_AUTO_POSITION=1 for channel 'node4_node6'"
./node6/use -e "START SLAVE FOR CHANNEL 'node4_node6'"

# node node7 port: 8385
./node7/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8380, MASTER_AUTO_POSITION=1 for channel 'node2_node7'"
./node2/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8385, MASTER_AUTO_POSITION=1 for channel 'node7_node2'"
./node7/use -e "START SLAVE FOR CHANNEL 'node2_node7'"
./node2/use -e "START SLAVE FOR CHANNEL 'node7_node2'"

# node node8 port: 8386
./node8/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8380, MASTER_AUTO_POSITION=1 for channel 'node2_node8'"
./node8/use -e "START SLAVE FOR CHANNEL 'node2_node8'"

To test the deployment, we create one table in every master (all nodes except #6 and #8) and then check what each node has received from replication:

# server: 1:
t1  # (created)
t2  # host2-host3-host1
t3  # host3-host1
t4  # host4-host3-host1
t7  # host7-host2-host3-host1
# server: 2:
t1  # host1-host3-host2
t2  # (created)
t3  # host3-host2
t4  # host4-host3-host2
t7  # host7-host2
# server: 3:
t1  # host1-host3
t2  # host2-host3
t3  # (created)
t4  # host4-host3
t7  # host7-host2-host3
# server: 4:
t1  # host1-host3-host4
t2  # host2-host3-host4
t3  # host3-host4
t4  # (created)
t5  # host5-host4
t7  # host7-host2-host3-host4
# server: 5:
t4  # host4-host5
t5  # (created)
# server: 6:
t4  # host4
# server: 7:
t1  # host1-host3-host2-host7
t2  # host2-host7
t3  # host3-host2-host7
t4  # host4-host3-host2-host7
t7  # (created)
# server: 8:
t1 # host1-host3-host2-host8
t2 # host2-host3-host2-host8
t3 # host3-host2-host8
t4 # host4-host3-host2-host8
t7 # host7-host2-host8

Most interestingly, in node #5, which is in P2P with node #4 but without log-slave-updates, we only find what was produced in node #4. We get a similar outcome in node #6.

Conversely, in nodes #7 and #8 we get everything from every master, except from node #5, because node #5 lacks the pass-through of log-slave-updates in node #4, its connection point to the cluster.


Figure 6 - A hybrid topology including 3 P2P, and 2 simple slaves

In a similar way, we can extend an all-masters P2P topology to have 2 more nodes in master-to-master replication with one node (Figure 6). The outcome of this deployment will be the same as seen for Figure 5, except that this topology requires more connections.


Figure 7 - A hybrid topology including 2 P2P clusters joined by a node.

Using the same principle, we can join two clusters by enabling log-slave-updates in one node of each and putting these enhanced nodes in P2P replication. In Figure 7 you see how you may join two all-masters P2P topologies.


Figure 8 - A hybrid topology including 2 star clusters joined by their hubs.

Joining two star topologies does not require any extra operations except running CHANGE MASTER TO, because the hubs are already enabled for dispatching events.
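As a hedged sketch (directory names, ports, and channel names are invented for illustration), the whole join amounts to one channel in each direction between the two hubs:

# on hub A (port 8381), pull from hub B (port 9381)
./hub_a/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=9381, MASTER_AUTO_POSITION=1 FOR CHANNEL 'hubB_hubA'"
./hub_a/use -e "START SLAVE FOR CHANNEL 'hubB_hubA'"

# on hub B (port 9381), pull from hub A (port 8381)
./hub_b/use -e "CHANGE MASTER TO master_host='127.0.0.1', master_user='rsandbox', master_password='rsandbox', master_port=8381, MASTER_AUTO_POSITION=1 FOR CHANNEL 'hubA_hubB'"
./hub_b/use -e "START SLAVE FOR CHANNEL 'hubA_hubB'"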


Figure 9 - A hybrid topology including 1 P2P cluster and a star cluster joined by a node.

Finally, you can join two different topologies by linking one node from each, as in Figure 9.

Summing up

Star topologies are an interesting alternative to more robust deployments. They require more maintenance effort in case of a failure, but they are agile and easy to expand.

Setting up a star topology helps us understand the building blocks of complex topologies. Armed with this knowledge, we can create hybrid topologies that fit our needs.

What's next

We have explored the most interesting aspects of single and multiple master deployments. There are many more enhancements in replication, but the most striking one is the ability to run parallel appliers in the slave. This will be the focus of our next (and last) episode.
