It's a tricky problem. Even with my experience with regular expressions, I knew that using a script to extract statements related to a single table was going to be a nasty business.
However, I know of an entity that can filter by table name efficiently, and that's the MySQL replication system. So I suggested using replication to a sandbox with a replicate-wild-do-table statement to get the job done.
My friend was skeptical and did not want to go that way. I was busy writing an article for an Italian magazine and did not follow up immediately. But today, with the article safely in the editor's hands, I did a quick test, and guess what? It works!
Here is a step-by-step procedure to do it. I started with a server built with MySQL Sandbox, using MySQL 5.5.7. I used the employees test database to create a large enough binlog, and soon I had a database containing 160 MB of data and a binary log of about the same size.
Then I decided that I wanted to filter the binlog, to get only statements about the employees table. Thus, I issued this command:
The "-c" option transfers its argument to the sandbox configuration file.
$ make_sandbox 5.5.7 --sandbox_port=6000 \ --sandbox_directory=trans_repl \ -c log-slave-update \ -c replicate-wild-do-table=employees.employees \ -c log-bin=mysql-bin \ -c server-id=101
At the end of this operation, I had one server with the same version of the server that I had filled with the employee database. The server is ready to filter replicated streams, accepting only commands that affect the table 'employees' within the database 'employees'.
The second step was to create an empty database in the second server, with the Innodb tables converted to BlackHole (to avoid wasting unnecessary space).
Inside the first sandbox, I did this:
Combining the flexibility of the sandbox with some command line skills, the operation requires just one command.
$ ./my sqldump -B --no-data employees \ | perl -pe 's/=innodb/=blackhole/i' \ | ~/sandboxes/trans_repl/use
Before starting the replication, I needed to avoid re-creating the tables, or my blackhole trick would have been useless. So I looked at the binary log, and found where the CREATE TABLE statements ended:
$ ./my sqlbinlog ./data/mysql-bin.000001 |less
# at 3057
#101112 9:48:45 server id 1 end_log_pos 3364 Query thread_id=1 exec_time=0 error_code=0
INSERT INTO `departments` VALUES ('d001','Marketing'),('d002','Finance'),('d003','Human Resources'),('d004','Production'),('d005','Development'),('d006','Quality Management'),('d007','Sales'),('d008','Research'),('d009','Customer Service')/*!*/;
Armed with this knowledge, I logged in the second server and did the following:
After a few seconds, I issued a "SHOW SLAVE STATUS". All was OK.
FLUSH BINARY LOGS; # to get a binlog with only the statements that I need CHANGE MASTER TO master_host='127.0.0.1', master_port=5570, master_user='msandbox', master_password='msandbox', master_log_file='mysql-bin.000001', master_log_pos=3057; # this is the position after all the CREATE TABLE # statements in the master START SLAVE;
I flushed the logs again and inspected the second binary log. As expected, it contained only the statements related to the employees table.
Total cost of the operation: 5 minutes. Way quicker than writing this report!