Save 3 million rows best practice


#1

I want to save 3 million rows with saveMany into a MySQL database. This function is rarely used; it imports, converts, and saves data from a CSV file. I'm looking for tips on doing this efficiently. Should I call saveMany in chunks of, for example, 10,000 rows, or is it better to save everything in one call? Any other tips?


#2

You should definitely use smaller chunks. Depending on the size of the data, that might be 100 or 1,000 rows per chunk.
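
To illustrate the idea, here is a minimal sketch of chunked saving with the CakePHP ORM (recent 3.x), assuming the converted rows are already an array of field => value arrays in `$rows` and the target table is `Messages` (both placeholder names):

    use Cake\ORM\TableRegistry;

    $messages = TableRegistry::getTableLocator()->get('Messages');

    // Save in chunks of 1,000 so entities do not pile up in memory
    // and each saveMany() call stays reasonably small.
    foreach (array_chunk($rows, 1000) as $chunk) {
        $entities = $messages->newEntities($chunk);
        if (!$messages->saveMany($entities)) {
            // handle or log validation/save failures for this chunk
        }
    }

By default each saveMany() call wraps its entities in its own transaction, so a failing chunk only rolls back that chunk.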


#3

There was a workshop about this quite a while ago, but it should still be relevant. If you want to know how it works, you would need to watch it from the beginning.


#4

Hello, you could also do a bulk insert with LOAD DATA INFILE in MySQL.

You generate a CSV file in the webroot directory.

Then you can run a method like this:

    // Requires: use Cake\ORM\TableRegistry; and use Cake\Datasource\ConnectionManager;
    private function importAndResetFile()
    {
        TableRegistry::clear();
        ConnectionManager::alias('default', 'default');
        $connection = ConnectionManager::get('default');

        // Let MySQL bulk-load the generated CSV directly into the table.
        $connection->execute(
            "LOAD DATA INFILE 'c:/wamp64/www/yourproject/webroot/messages.csv' IGNORE
            INTO TABLE messages
            FIELDS TERMINATED BY ','
            ENCLOSED BY '\"'
            ESCAPED BY '\"'
            LINES TERMINATED BY '\r\n'"
        );
    }

#5

Good tip, but unfortunately I need to convert the data before saving it.


#6

My CSV might be half a million rows, but here's how I tackle it (a rough sketch of steps 1 and 2 follows the list).

  1. Create a shell that reads the file X lines at a time. I usually just end up with an array of strings (one per line of the CSV).

  2. Create a model method to accept the array and persist the data using the ORM.

  3. Take a sample of the CSV to build a test (both for the shell and a hard-coded array for the model test).

  4. Get my tests passing; then I can swap out the ORM for a prepared statement if needed, along with other optimisations, confident that my logic is correct.
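
For reference, here is a minimal sketch of steps 1 and 2 in CakePHP 3 style. The ImportShell class, the Messages table, the importBatch() method, and the batch size of 1,000 are placeholder names and choices of mine, not anything from the post above.

    // src/Shell/ImportShell.php -- hypothetical shell that reads the CSV in batches
    namespace App\Shell;

    use Cake\Console\Shell;

    class ImportShell extends Shell
    {
        public function main($path)
        {
            $this->loadModel('Messages');

            $handle = fopen($path, 'r');
            $header = fgetcsv($handle);
            $batch = [];

            // Read 1,000 lines at a time and hand each batch to the model method.
            while (($row = fgetcsv($handle)) !== false) {
                $batch[] = array_combine($header, $row);
                if (count($batch) >= 1000) {
                    $this->Messages->importBatch($batch);
                    $batch = [];
                }
            }
            if ($batch) {
                $this->Messages->importBatch($batch);
            }
            fclose($handle);
        }
    }

    // src/Model/Table/MessagesTable.php -- hypothetical model method (step 2)
    public function importBatch(array $rows)
    {
        // Convert the raw CSV rows here if needed, then persist them with the ORM.
        $entities = $this->newEntities($rows);

        return $this->saveMany($entities);
    }

Step 4 then becomes a matter of replacing the body of importBatch() with a prepared statement while the tests stay the same.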