Save 3 million rows best practice

I want to save 3 million rows into a MySQL database with saveMany. This function is rarely used; it imports, converts, and saves data from a CSV. I'm looking for tips on doing this efficiently. Should I call saveMany in batches of (for example) 10,000 rows, or is it better to save everything at once? Any other tips?

You should definitely use smaller chunks. Depending on the size of the data, that may be 100 or 1,000 rows.
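
For example, a chunked saveMany could look roughly like this (just a sketch; the Messages table and the $rows array built from your CSV are placeholders for your own data):

    // Assumes CakePHP's ORM; "Messages" and $rows are placeholders.
    use Cake\ORM\TableRegistry;

    $messages = TableRegistry::getTableLocator()->get('Messages');

    foreach (array_chunk($rows, 1000) as $chunk) {
        // newEntities() builds and validates one entity per row.
        $entities = $messages->newEntities($chunk);

        // saveMany() persists the whole chunk in a single transaction
        // and returns false if any row in the chunk fails.
        if (!$messages->saveMany($entities)) {
            // log or collect the failed chunk here
        }
    }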

There was a workshop about it quite a while ago, but it should still be relevant; if you want to know how it works, you would need to watch it from the beginning.

Hello, you could also do a bulk insert with LOAD DATA INFILE in MySQL.

You generate a CSV file in the webroot directory.

Then you can run a function like this:

    private function importAndResetFile()
    {
        // Reset the cached table instances and the connection alias,
        // then grab the default connection.
        TableRegistry::clear();
        ConnectionManager::alias('default', 'default');
        $connection = ConnectionManager::get('default');

        // Let MySQL bulk-load the CSV straight from the webroot.
        $connection->execute(
            "LOAD DATA INFILE 'c:/wamp64/www/yourproject/webroot/messages.csv' IGNORE
                INTO TABLE messages
                FIELDS TERMINATED BY ','
                ENCLOSED BY '\"'
                ESCAPED BY '\"'
                LINES TERMINATED BY '\r\n'");
    }

Good tip, if only I didn't need to convert the data before saving.

My CSV might be half a million rows, but here’s how I tackle it.

  1. Create a shell that reads the file X lines at a time. I usually just end up with an array of strings (one per line of the CSV).

  2. Create a model method that accepts the array and persists the data using the ORM (sketched below).

  3. Take a sample of the CSV to build tests (both for the shell and a hard-coded array for the model test).

  4. Get my tests passing; then I can swap out the ORM for a prepared statement if needed, as well as make other optimisations, having confidence my logic is correct.
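
A rough sketch of steps 1 and 2, assuming a CakePHP shell and a MessagesTable; the class, column names, batch size, and the import() method are placeholders, not the exact code I use:

    // src/Shell/ImportShell.php
    namespace App\Shell;

    use Cake\Console\Shell;

    class ImportShell extends Shell
    {
        public function main(string $path = 'webroot/messages.csv'): void
        {
            $this->loadModel('Messages');

            $handle = fopen($path, 'r');
            $batch = [];

            // Read the CSV X lines at a time and hand each batch to the table.
            while (($fields = fgetcsv($handle)) !== false) {
                // Convert the raw CSV line into the shape the table expects.
                $batch[] = ['sender' => $fields[0], 'body' => $fields[1]];

                if (count($batch) === 1000) {
                    $this->Messages->import($batch);
                    $batch = [];
                }
            }
            if ($batch) {
                $this->Messages->import($batch);
            }
            fclose($handle);
        }
    }

The model method is then a thin wrapper around the ORM, which is the piece you can later swap for a prepared statement once the tests pass:

    // In src/Model/Table/MessagesTable.php
    public function import(array $rows): bool
    {
        $entities = $this->newEntities($rows);

        return (bool)$this->saveMany($entities);
    }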