Optimization question - mass save and unique email

MarekDM · May 3, 2022, 5:53pm

Hello
Cake has an awesome isUnique rule. However, I’m often importing CSVs with a lot of columns per row (or data for more than one table), usually including an email, and I want to keep emails unique.

To speed things up I’m saving 50 rows at a time with saveManyOrFail, and it’s throwing an error on duplicates - cool.

But would it be better (performance-wise) to first run a SELECT query with the 50 emails about to be imported, drop the duplicates from the batch passed to saveMany, and disable the isUnique rule for that save, so that the save is clean and contains only new, unique entries?

dreamingmind · May 3, 2022, 8:09pm

I have no evidence either way, but my intuition says:

When there are high duplicate counts, filtering first would pay off handsomely.

For low collision counts you wouldn’t see any gains but also wouldn’t see significant losses.

If there were a lot of 50-row chunks to process, you would start to accumulate the cost of preparing the email query again and again, plus the cost of running that query. But this is balanced by the fact that you could eliminate the rule that watches for duplicates. You would also have the option of preparing a query once for reuse rather than depending on the Query class, thus bypassing the query-construction overhead (but locking yourself to your current database’s SQL dialect).
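To make the prepare-once idea concrete, here is a rough sketch using plain PDO rather than anything Cake-specific. The `users` table, the `email` column, and the helper names `prepareEmailLookup` and `existingEmails` are my assumptions for illustration, not part of your app. It relies on a fixed chunk size so the statement always has the same number of placeholders.

```php
<?php
declare(strict_types=1);

// Assumption: chunks are at most this many rows, matching the 50-row saves.
const CHUNK_SIZE = 50;

/**
 * Build the lookup statement once; it can then be executed for every chunk.
 * Table and column names (`users`, `email`) are hypothetical.
 */
function prepareEmailLookup(PDO $pdo, int $chunkSize = CHUNK_SIZE): PDOStatement
{
    $placeholders = implode(', ', array_fill(0, $chunkSize, '?'));

    return $pdo->prepare("SELECT email FROM users WHERE email IN ($placeholders)");
}

/**
 * Return the emails from $emails that already exist in the table.
 */
function existingEmails(PDOStatement $lookup, array $emails, int $chunkSize = CHUNK_SIZE): array
{
    if (count($emails) > $chunkSize) {
        throw new InvalidArgumentException('Chunk is larger than the prepared statement allows.');
    }

    // Pad a short final chunk with NULLs so the placeholder count always matches.
    // A NULL inside IN (...) never matches a row, so the padding is harmless.
    $params = array_pad(array_values($emails), $chunkSize, null);
    $lookup->execute($params);

    return $lookup->fetchAll(PDO::FETCH_COLUMN);
}
```

You would call `prepareEmailLookup()` once before the import loop and `existingEmails()` once per chunk. Whether this actually beats the query builder is something I would measure rather than assume.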

Doing the optimization you suggest feels like a good choice to me.
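For what it is worth, the filtering step itself could look something like the sketch below. It is framework-free so it is easy to test. The function name `splitNewAndDuplicate` and the `$findExisting` callback are hypothetical. Lowercasing and trimming emails before comparing is also my assumption, so drop it if your data should be compared exactly.

```php
<?php
declare(strict_types=1);

/**
 * Split one chunk of import rows into rows to save and rows to skip.
 *
 * @param array<int, array<string, mixed>> $rows Chunk of rows, each with an 'email' key.
 * @param callable $findExisting Hypothetical lookup: receives a list of emails and
 *                               returns those that already exist in the table.
 * @return array{0: array, 1: array} [rows to save, skipped rows]
 */
function splitNewAndDuplicate(array $rows, callable $findExisting): array
{
    // Assumption: emails are compared case-insensitively, ignoring surrounding spaces.
    $normalize = static fn (string $email): string => strtolower(trim($email));

    $emails = array_values(array_unique(array_map(
        static fn (array $row): string => $normalize((string)$row['email']),
        $rows
    )));

    // One lookup per chunk instead of one uniqueness check per row.
    $seen = array_fill_keys(array_map($normalize, $findExisting($emails)), true);

    $new = [];
    $skipped = [];
    foreach ($rows as $row) {
        $key = $normalize((string)$row['email']);
        if (isset($seen[$key])) {
            // Already in the table, or repeated earlier in this same chunk.
            $skipped[] = $row;
            continue;
        }
        $seen[$key] = true;
        $new[] = $row;
    }

    return [$new, $skipped];
}
```

In a Cake table the lookup could be something along the lines of `$this->Users->find()->select(['email'])->where(['email IN' => $emails])->all()->extract('email')->toList()` (with `Users` standing in for your table), and the rows in `$new` would then go to `saveManyOrFail($entities, ['checkRules' => false])`. Two caveats: `checkRules => false` skips all application rules for that save, not only isUnique, and as far as I know an `IN` condition with an empty list throws, so guard against an empty chunk.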
