Increase MySQL output to 80K rows/second in Pentaho Data Integration
- Tune MySQL for better performance on inserts
- Use the MySQL Bulk Loader step in PDI
- Write SQL statements to a file with PDI and read them with the mysql command-line binary
When I discussed this with one of my contacts at Basis06, it turned out they had faced a similar issue a while ago. He mentioned that speed can be boosted with a few simple JDBC connection settings:
useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true
[[UPDATE 10/2018: In some environments, especially under high network load, useServerPrepStmts=true is worth a try]]
These options should be entered in PDI on the connection: double-click the connection, go to Options, and set these values.
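For reference outside of PDI, the same options can also be appended straight to the JDBC URL. A minimal sketch in Java, where the host, database name, and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;

public class FastInsertConnection {
    public static void main(String[] args) throws Exception {
        // Placeholder host/database/credentials; the three options below
        // are the same ones set in the PDI connection dialog.
        String url = "jdbc:mysql://localhost:3306/test"
                + "?useServerPrepStmts=false"
                + "&rewriteBatchedStatements=true"
                + "&useCompression=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            System.out.println("Connected with batch-rewrite options enabled");
        }
    }
}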
Used together, useServerPrepStmts=false and rewriteBatchedStatements=true will “fake” batch inserts on the client. Specifically, the insert statements:
INSERT INTO t (c1,c2) VALUES ('One',1);
INSERT INTO t (c1,c2) VALUES ('Two',2);
INSERT INTO t (c1,c2) VALUES ('Three',3);
will be rewritten into:
INSERT INTO t (c1,c2) VALUES ('One',1),('Two',2),('Three',3);
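On the JDBC side this rewrite kicks in when the inserts are sent as a batch rather than one by one. A minimal sketch, reusing the table t from the example above and a Connection opened with the options shown earlier:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertDemo {
    static void batchInsert(Connection conn) throws SQLException {
        try (PreparedStatement ps =
                conn.prepareStatement("INSERT INTO t (c1,c2) VALUES (?,?)")) {
            String[] words = {"One", "Two", "Three"};
            for (int i = 0; i < words.length; i++) {
                ps.setString(1, words[i]);
                ps.setInt(2, i + 1); // c2 gets 1, 2, 3
                ps.addBatch();
            }
            // With rewriteBatchedStatements=true, Connector/J sends this batch
            // as one multi-row INSERT, like the rewritten statement above.
            ps.executeBatch();
        }
    }
}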
The third option, useCompression=true, compresses the traffic between the client and the MySQL server.
Finally, I increased the number of copies of the output step to 2, so that there are two threads inserting into the database.
All together, this increased the speed to around 84,000 rows a second! WOW!
Source: Julien Hofstede – Pentaho: Increase MySQL output to 80K rows/second in Pentaho Data Integration