Save 3 million rows best practice


#1

I want to save 3 million rows with saveMany into a MySQL database. This function is rarely used; it imports, converts, and saves data from a CSV file. I'm looking for tips on doing this efficiently. Should I call saveMany in chunks of, for example, 10,000 rows, or is it better to save everything in one call? Any other tips?


#2

You should definitely use smaller chunks. Depending on the size of the data, that might be 100 or 1,000 rows per chunk.
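
To illustrate the idea, here is a minimal sketch of chunked saving with the CakePHP ORM (recent 3.x), assuming the converted rows are already an array of field => value arrays in `$rows` and the target table is `Messages` (both placeholder names):

    use Cake\ORM\TableRegistry;

    $messages = TableRegistry::getTableLocator()->get('Messages');

    // Save in chunks of 1,000 so entities do not pile up in memory
    // and each saveMany() call stays reasonably small.
    foreach (array_chunk($rows, 1000) as $chunk) {
        $entities = $messages->newEntities($chunk);
        if (!$messages->saveMany($entities)) {
            // handle or log validation/save failures for this chunk
        }
    }

By default each saveMany() call wraps its entities in its own transaction, so a failing chunk only rolls back that chunk.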


#3

There was a workshop about this quite a while ago, but it should still be relevant. If you want to know how it works, you would need to watch it from the beginning.


#4

Hello, you could also do a bulk insert with LOAD DATA INFILE in MySQL.

You generate a CSV file in the webroot directory.

Then you can run a method like this:

    // Requires: use Cake\ORM\TableRegistry; and use Cake\Datasource\ConnectionManager;
    private function importAndResetFile()
    {
        TableRegistry::clear();
        ConnectionManager::alias('default', 'default');
        $connection = ConnectionManager::get('default');

        // Let MySQL bulk-load the generated CSV directly into the table.
        $connection->execute(
            "LOAD DATA INFILE 'c:/wamp64/www/yourproject/webroot/messages.csv' IGNORE
            INTO TABLE messages
            FIELDS TERMINATED BY ','
            ENCLOSED BY '\"'
            ESCAPED BY '\"'
            LINES TERMINATED BY '\r\n'"
        );
    }

#5

Good tip, but unfortunately I need to convert the data before saving it.


#6

My CSV might be half a million rows, but here's how I tackle it (a rough sketch of steps 1 and 2 follows the list).

  1. Create a shell that reads the file X lines at a time. I usually just end up with an array of strings (one per line of the CSV).

  2. Create a model method to accept the array and persist the data using the ORM.

  3. Take a sample of the CSV to build a test (both for the shell and a hard-coded array for the model test).

  4. Get my tests passing; then I can swap out the ORM for a prepared statement if needed, along with other optimisations, confident that my logic is correct.
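
For reference, here is a minimal sketch of steps 1 and 2 in CakePHP 3 style. The ImportShell class, the Messages table, the importBatch() method, and the batch size of 1,000 are placeholder names and choices of mine, not anything from the post above.

    // src/Shell/ImportShell.php -- hypothetical shell that reads the CSV in batches
    namespace App\Shell;

    use Cake\Console\Shell;

    class ImportShell extends Shell
    {
        public function main($path)
        {
            $this->loadModel('Messages');

            $handle = fopen($path, 'r');
            $header = fgetcsv($handle);
            $batch = [];

            // Read 1,000 lines at a time and hand each batch to the model method.
            while (($row = fgetcsv($handle)) !== false) {
                $batch[] = array_combine($header, $row);
                if (count($batch) >= 1000) {
                    $this->Messages->importBatch($batch);
                    $batch = [];
                }
            }
            if ($batch) {
                $this->Messages->importBatch($batch);
            }
            fclose($handle);
        }
    }

    // src/Model/Table/MessagesTable.php -- hypothetical model method (step 2)
    public function importBatch(array $rows)
    {
        // Convert the raw CSV rows here if needed, then persist them with the ORM.
        $entities = $this->newEntities($rows);

        return $this->saveMany($entities);
    }

Step 4 then becomes a matter of replacing the body of importBatch() with a prepared statement while the tests stay the same.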