Thursday, October 08, 2015

Sound advice for GTID, with caveats

During the PerconaLive conference in Amsterdam, I attended a session where I heard a good piece of advice about using GTID. It amounts to: look at SHOW SLAVE STATUS output, and if you see more than one line in the Executed_Gtid_Set field, this tells you immediately if someone has written on a slave database.
This is good advice. Let's dissect it. Here is what a regular slave looks like, when nobody has messed up with it:
*************************** 1. row ***************************
             Master_Server_Id: 1
                  Master_UUID: 00013454-1111-1111-1111-111111111111
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
           Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
            Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
                Auto_Position: 1
1 row in set (0.00 sec)
What you see here is a slave that has received transactions from a single source (Retrieved_Gtid_Set lists only one GTID set) and has applied data from a single source (also Executed_Gtid_Set shows a single item.)
Notice that this advice holds true even when the slave being considered is an intermediate one, i.e. a relay slave which is master of one or more slaves. Due to the nature of GTIDs, even though the intermediate slave is recording the transactions to its own binary log, the transaction identifier does not change. Thus you should see a clean set of transactions throughout the chain. For example, if you have another slave that is replicating from slave #2, you would see something like this:
*************************** 1. row ***************************
             Master_Server_Id: 102
                  Master_UUID: 00013456-3333-3333-3333-333333333333
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
           Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
            Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
                Auto_Position: 0

Monday, October 05, 2015

MySQL-Sandbox 3.1.01 - First release after the change

I have released MySQL-Sandbox 3.1.01, which is the first release after the move to GitHub. While the changes are not so spectacular (it's a minor release, with mostly bug fixes), I am pleased to see that the move has started producing collaboration. Two of the changes were provided by Daniƫl van Eeden and Mark Leith, who have scratched some of their own itches by providing useful patches.

All in all, this period of working with GitHub has been liberating. Although Bazaar plays with the same principles of git, it lacks most of the tools and the know-how which characterizes git. Add to this that also my team has moved Tungsten Replicator to Github, and with that I found myself all of a sudden free of old revision control systems, and master of my own time.

Back to MySQL-Sandbox: while its enhancements may not amount to much, it helped me to discover several bugs in MySQL, some of which were addressed and solved quickly. So, I have had a deeper relationship with the community, with the experience of being at both ends of the collaboration ops.

The last notable piece of news about this release is that it has been tested with the latest and greatest available: a preview of MySQL 5.7.9 and the latest MariaDB 10.1.6. With this, I hope to witness a GA release of either flavor that does not break MySQL-Sandbox. We'll see!

Monday, September 28, 2015

MySQL 5.7 : Playing with mysqlpump

MySQL 5.7 comes with a new backup tool, named mysqlpump, which is almost the same as mysqldump with the ability of extracting data in parallel threads.

I tried a little experiment. Using a server containing 11 databases, with a total of 300 tables and about 20 million rows (roughly ≈ 10GB,) I used both mysqldump and mysqlpump to get a backup.

mysqldump --all-databases  > dump.sql
mysqlpump --all-databases \
    --add-drop-database --add-drop-table --skip-watch-progress \
    --default-parallelism=10 \
    --parallel-schemas=db,db1,db2 \
    --parallel-schemas=db3,db4,db5 \
    --parallel-schemas=db6,db7,db8 \
    --parallel-schemas=db9,db10 > pump.sql

The backup with mysqldump took 3 minutes and 33 seconds. The one with mysqlpump took 2 minutes and 55 seconds (saving 38 seconds). This does not seem to be a great gain. I experimented with several values of default-parallelism and different grouping of databases, and also without any parameters at all, but I always get the same time.

If there is a different way of invoking mysqlpump to use parallelism better, I would like to know.

There are four interesting points about mysqlpump that users should know:

  • mysqlpump has options to include and exclude objects (databases, tables, routines, users) from the backup. This is a long awaited feature that will be welcome by many DBAs.
  • The option --no-data is called --skip-dump-rows. (Just in case you want to use the new tool alternate way of reproducing DDL. But be aware that there is at least one bug)
  • A backup created with mysqlpump can only be loaded into a database of the same name. This is due to the parallel work, which requires that the INSERT statements contain both the database and the table names. But it means that, unlike with mysqldump, you can't backup tables from database X and load them to database Y.
  • The most serious limitation of mysqlpump, which I have seen both in the manual and in a blog article is that, while the backup is parallelized, the restore is serialized. Both sources say to run "mysqlpump > file.sql" and "mysql < file.sql". What is the advantage of extracting data with N parallel threads if I then need to apply it with a single thread? I would have expected an option to create N files, which I can then load using several background tasks, or even better an option in the mysql client to handle parallel backup files. I may be missing something here. I will appreciate comments by more savvy users.

The idea is good. The tool still has some rough edges, but I am sure it can be improved.

Tuesday, September 15, 2015

Percona Live Amsterdam - September 21-23, 2015

PL EuropeLogo FullInv CMYK Final Horiz EMAIL

I am attending Percona Live Amsterdam 2015 on September 21-23, 2015.

I will be on stage three times:

My first talk is a topic that has ben among my favorites for long time: I published an article about it in 2001, and several more in the years to come.

The second one is a summary of what I have written recently about replication technologies.

The lightning talks are a collection of 5-minutes long talks that are presented by different speakers. For the first time, the LT are held in a separate room instead of being attached to one of the community events. It will be fun!

Percona has just released a mobile app for the conference for both iOS and Android. With it, it is possible to set a personalized schedule, follow the show more closely, and get in touch with other attendees. It is a very good addition!

There is much to watch at the conference, and I look forward to seeing the latest innovation in the field. I will miss some very interesting talks because they are at the same time as mine (!!) but I hope I will catch up with the speakers in the conference hall.

Monday, September 14, 2015

Improving Sakila database

The Sakila sample database was created almost 10 years ago, as a sample set of data for MySQL courses and examples.

The database was developed by MySQL employees, with substantial contributions form the community.

Recently, the database was updated to use some of the features in MySQL 5.7. As a result, we had two sets of samples, one to use with MySQL 5.0+, and one that only loads with MySQL 5.7.

I filed a feature request, offering a patch to use conditional schema and data changes, which was incorporated very quickly into the official release.

The current release, available within the MySQL docs, has conditional comments such as this:

/*!50610 ALTER TABLE film_text engine=InnoDB */ ;

Using these comments, we can enable specific features if the version is at least the one indicated in the comment. So, for example, we can use InnoDB tables with full-text indexes starting with version 5.6. The original table is MyISAM, but if the current version is at least 5.6.10 (that's the meaning of !50610) then the engine is changed to InnoDB.

>Similarly, there is a GEOMETRY column and SPATIAL key in the 'address' table, which are only enabled for MySQL 5.7.5+. A similar comment allows the loading of the relevant data only in MySQL 5.7.

Using these new files, you can install the Sakila database using any version of MySQL from 5.0 onwards, and it will always load correctly.

Monday, September 07, 2015

Sample employees database migrated to GitHub

It's migration time. There was another project that I use often and was still in Launchpad. The Sample Employees Database is now on GitHub, under the same license it had before (CC A-SA 3).
Figure 1 - Employees database
This database is interesting because it is not too small (like Sakila) and not too big. It has enough data to allow you to test in a non trivial way.

Wednesday, September 02, 2015

How MySQL-Sandbox is tested, and tests MySQL in the process

MySQL-Sandbox is a great tool for testing a new release, and in fact this is what I do when a new MySQL tarball becomes available. I don't think many people are aware of the full testing capabilities of the sandbox, though.
When you think about testing, you may just think of creating a sandbox with the new tarball, and then hammering it with your pet procedure. That works, of course, as the main purpose of MySQL-Sandbox is to allow you to do just that. There is, however, a full test suite that can tell you in a short while if your tarball is compatible with the past or not.
This procedure is quite strict. It has happened several times that I caught a bug in a new release of MySQL, or Percona Server, or MariaDB, just by running this suite.

Monday, August 31, 2015

MySQL replication in action - Part 5 - parallel appliers

Previous episodes:

Parallel replication overview

One of the main grievance of replication users is that, while a well tuned master server can handle thousands of concurrent operations, an equally tuned slave is constrained to work on a single thread. In Figure 1, we see the schematics of this paradigm. Multiple operations on the master are executed simultaneously and saved to the binary log. The slave IO thread copies the binary log events to a local log, and on such log the SQL thread executes the events on the slave database. When the master is very active, chances are that the slave lags behind, causing hatred and nightmares to the DBAs.
Single applier
Figure 1 - Single applier

Tuesday, August 25, 2015

New MySQL Sandbox 3.1 - GitHub, and usability

I have three pieces of information to share about MySQL::Sandbox:
  • Version 3.1.0 has migrated from Launchpad to GitHub
  • This version is released under the Apache license. Both these changes are meant to improve and promote cooperation on the project.
  • There is an important change related to usability. When using replication with MySQL::Sandbox and MySQL 5.6+, the server UUIDs become more readable (see below).
First, some words on the location changes. About two years ago, I started plans for a rewrite of MySQL::Sandbox. Then, I had some unexpected changes, which involved moving home to a different continent twice within twelve months. The project was neglected, but I haven't dismissed it. While I wait for the rewrite to start, I wanted to get rid of the obstacles for rapid development, and I decided to transfer the current codebase to GitHub. This will allow me to use only one RCS instead of three (My team has abandoned svn too 1).
Apart from the changes described in this post, there is little difference in the code published on GitHub.

Monday, August 24, 2015

Tungsten Replicator moved to GitHub with Apache license

It had been in the making for long time. Google announced that Google Code would be closing, and since then the Continuent team has been hard at work to handle the transition. You can guess it: this operation would have been quicker if it had been done by a small company like we were one year ago, but being part of a large corporation introduces some constraints that have affected our schedule.

However, our wish has always been, and still is, to keep Tungsten Replicator as an open source product, with full functionalities and with the full benefits that the open source development model offers.

Today, Tungsten Replicator is available on GitHub as vmware/tungsten-replicator, and it is wearing new clothes. It is not GPL anymore. In an effort to facilitate contributions, its license was changed to Apache 2.0.

Feature-wise, there is little difference from the previous release of 4.0. Mainly, we have cleaned up the code and moved out the pieces that no longer fit:

  1. Bristlecone was removed from the package. It is used only for testing, and it will be released separately. There is no need to duplicate it into every Tungsten tarball.
  2. The cookbook recipes have been retired. These scripts were created when the installer was still in its infancy and we had little documentation. Therefore, it was convenient to have wrappers for the common installation operations. Using the manual, it is pretty easy to install master/slave, fan-in, and multi-master topologies. The biggest reason for removing the cookbook, though, is that it was only useful for MySQL replication. If you need heterogenous deployments, the cookbook was an obstacle, rather than being helpful.
  3. Some files were shuffled within the deployment tree. The ./tungsten-replicator/scripts directory was merged with ./tungsten-replicator/bin, the applier templates were moved from samples to a dedicated path, and we also did some other similar cleanup.

Although it has changed location and license, this is not a "release." If you compile the code, it will come up as 4.1, but it is still work in progress. Same as what was happening in the previous repository, we tag the code with the next version, and start working on it until it is deemed ready for release. The latest release for production (4.0.1) is still available from the old directory.

The code is available on GitHub, which makes collaboration much simpler than the previous repository. Take advantage of it: fork it, and help make the best replication tool even better!

MySQL usability issues for replication

In my latest series of advanced replication features, I came across several usability issues, which I would like to recap here. For each section of this list of requests, I make a wish list, with some general comments.


As the maintainer of MySQL Sandbox, a tool that wants to facilitate the installation of MySQL for testing, I am particularly sensitive to the annoyances during installation. I have covered the matter extensively in recent weeks and many times before. My point is that the logging of the installation must be useful for the DBA, not to the MySQL developers.
  • Make a big distinction between [Info] and [Warning] messages. Warnings should only appear when there is some risk. Using a warning to tell the user that the InnoDB log file was created is a nonsense.
  • … and then make [Info] messages suppressible. Currently, I can suppress warnings, but then I would suppress also the real warnings, in addition to the fake ones.
  • Make mysqld –initialize use the correct syntax for timestamps, instead of issuing a warning about the obsolete syntax that was used. (See In search of cleanliness : the elusive quiet installation for a full explanation)


Here we have two main problems. The first one is that the introduction of multi source channels has left some new ways of shooting yourself in the foot. By mixing the new explicitly named channels with the hidden one (but still used in regular replication!) which is named "" (= the empty string), we may start to see a new series of blunders in replication.
The second set of problems comes from weak integration between existing features and the new one. There are operation modes, such as semi-synchronous, delayed, and parallel replication that were designed with a single data stream in mind, and that lack the capability of being tuned for different channels.
  • Make it impossible to use the unnamed channel and the named ones at the same time. Currently only a very weak protection exists against mixing named and unnamed channels. Also the syntax for channel ’’ should result in an error when named channels are defined.
  • Integrate features that were designed for single sources to work better with multi source: semi-sync replication, parallel applier, delayed replication.

Wednesday, August 19, 2015

MySQL replication in action - Part 4 - star and hybrid topologies

Previous episodes:

Introducing star topology.

In all-masters P2P topologies, we have seen that we have a way of deploying a topology where all nodes are masters, and achieve better efficiency and stability than ring topologies. That method comes at the price of a complex setup, which requires, for a N-node cluster, N*(N-1) connections.
We can achieve the same result as in a P2P all-masters topology by trading connections for stability. In a star topology (Figure 1) all nodes are masters, but they do not connect to each other directly. There is a special node, named hub, which receives the changes produced by each endpoint and spreads them to the others.
Topologies star
Figure 1 - A star topology

Monday, August 17, 2015

MySQL replication in action - Part 3: all-masters P2P topology

Previous episodes:

In the previous article, we saw the basics of establishing replication from multiple origins to the same destination. By extending that concept, we can deploy more complex topologies, such as the point-to-point (P2P) all-masters topology, a robust and fast way of moving data.

Introduction to P2P all-masters topology

A P2P (Point-to-point) topology is a kind of deployment where replication happens in a single step from the producer to the consumers. For example, in a master/slave topology, replication from the master (producer) reaches every slave (consumer) in one step. This is simple P2P replication. If we use a hierarchical deployment, where every slave that is connected to the master is also replicating to one or more slaves, we will have a 2-step replication (Figure 1). Similarly, in circular replication, we have as many steps as the number of nodes minus one (Figure 2.)
Hierarchical master slave processing Figure 1 - Hierarchical replication depth of processing

Friday, August 14, 2015

MySQL replication in action - Part 2 - Fan-in topology

Introduction: where we stand

Previous episodes:

In the latest releases of MySQL and MariaDB we have seen several replication improvements. One of the most exciting additions is the ability to enhance basic replication with multiple sources. Those who have used replication for a while should remember that one of the tenets of the “old” replication was that a slave couldn’t have more than one master. This was The Law and there was no escape ... until now. The only way to work around that prohibition was to use circular replication, also known as ring replication, where each node is slave of the previous node and master of the next one.
Circular replication

Wednesday, August 12, 2015

MySQL replication in action - Part 1: GTID & Co

In the theoretical part of this series, we have seen the basics of monitoring. In that article, though, we have barely mentioned the new tools available in MySQL 5.7 and MariaDB 10. Let’s start from something that has the potential of dramatically changing replication as we know it.

Crash-safe tables and Global transaction identifiers in MySQL 5.6 and 5.7

Global transaction identifiers (GTID) is a feature that has been in my wish list for long time, since the times I was working with the MySQL team. By the time I left Oracle, this feature was not even in the plans.
When MySQL 5.6 was first disclosed, the biggest improvement for replication was the introduction of crash-safe tables (see Status persistence in Monitoring 101.) There are two tables in the mysql database, named slave_master_info and slave_relay_log_info. At the beginning, these tables were using the MyISAM engine, thus defeating the purpose of making them crash-safe. In later versions, the developers decided to bite the bullet and create these tables with innodb from the beginning.
These two tables allow us to see the same information previously stored in the files and What makes these tables convenient is that they should survive a crash better than the standalone files.

Monday, August 10, 2015

MySQL::Sandbox 3.0.66 - improved usability and support for newest releases

The latest MySQL Sandbox, version 3.0.66 is out. It has a few new features (as always, when I find myself doing the same thing many times, I script it) and improved support for latest releases of MySQL. You can now install, among other versions, MySQL 5.7.8 and MariaDB 10.1.x

Some notable additions in this release are in the scripts that are created and customized for each sandbox. There are many of them and when one more arrives, it's easy to overlook it. So, here are the new arrivals.

Thursday, August 06, 2015

Changed defaults between MySQL 5.6 and 5.7

MySQL 5.7 comes with many changes. Some of them are better explained than others.

I wanted to see how many changes I could get by comparing SHOW VARIABLES in MySQL 5.6 and 5.7.
The most notable ones are:

  • binlog_format: the default is now ROW. This variable affects the format of the binary log, whether you use it as a backup complement or for replication, the change means bigger binary logs and possibly side effects.
  • binlog_error_action now defaults to ABORT_SERVER. If the server cannot write to the binary log, rather than continuing its work without logging, it shuts down. This could be a desirable course of action, but better be prepared for the eventuality.
  • innodb_strict_mode is enabled by default, which is probably a good thing, but it means that previously accepted events will now generate an error instead than a warning.
  • sql_mode is now STRICT by default. While many well prepared users will be pleased with this change, which was advocated as best practice by some DBAs, the practical outcome is that several exiting applications may break because of unclean input.
  • sync_binlog, which affects data safety but also server performance is enabled.

Wednesday, August 05, 2015

In search of cleanliness : the elusive quiet installation

UPDATE: Almost solved! See at the end.

A clean installation of a database server is one where everything goes according to the expectations. It used to be easy: you only had to do what the manual says, and, presto! you would see your database server installed and ready to use. If something went wrong, you got one or more error messages that informed you of what needs to be fixed.

Sometimes, rarely, it happened that you got also a warning message, telling you that while the installation was successful, you could improve it by fine tuning this and that. No big deal.

Gone are those times. A clean installation nowadays is a much harder exercise, if not impossible. Let’s give it a try using MySQL 5.7.7.

Tuesday, August 04, 2015

Yet another MySQL 5.7 silent change

When a new version of MySQL appears, the first source of information for the brave experimenter is a page in the manual named What is new in MySQL X.X, also known as MySQL in a nutshell. For MySQL 5.7, the in-a-nutshell page lists quite a lot of changes. In that page, the list of removed features is enough to send a chill down the spine of most any DBA. Some of the items in the deprecation section are removals in disguise, as they require immediate, rather than delayed, action to use the new version with existing application (SET PASSWORD comes to mind immediately.)

In all that abundance of changes, there is at least one that may cause huge discomfort to many users:

mysql [localhost] {msandbox} (information_schema) > select @@version;  
| @@version |  
| 5.7.8-rc  |  
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (information_schema) > select * from GLOBAL_STATUS;  
Empty set, 1 warning (0.00 sec)

mysql [localhost] {msandbox} (information_schema) > show warnings\G  
*************************** 1. row ***************************  
  Level: Warning  
   Code: 1287  
Message: 'INFORMATION_SCHEMA.GLOBAL_STATUS' is deprecated and will be removed in  
a future release. Please use performance_schema.global_status instead  
1 row in set (0.00 sec)

Monday, August 03, 2015

MySQL 5.7.8 : features, bugs and rumors

I’ve had a look at a preview release of MySQL 5.7.8, some time before it became available to the general public (perks and duties of an Oracle ACE) and I found a few interesting things among the release notes and the tarball itself:

  • There is a new tool named mysqlpump, which is intended as a replacement for mysqldump, with parallel processing, compression, progress watch, the long awaited ability of excluding databases or tables, and more.
  • The json functionality has been fished out from the labs and added to the main distribution.

I was initially pleased with this preview when it installed without errors in MySQL Sandbox, but my pleasure didn’t last long, as I soon stumbled upon a bug that affects replication. Until MySQL 5.7.7, a user with just REPLICATION SLAVE privileges can activate replication. In MySQL 5.7.8, though, replication fails with a baffling error:

slave1 [localhost] {msandbox} ((none)) > show slave status\G
            Last_IO_Errno: 1142
            Last_IO_Error: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: SELECT command denied to user 'rsandbox'@'localhost' for table 'global_variables'

Vote on Planet MySQL