I have a non unicode ‘string’, origin can be anything from a filename to a file_get_contents or a form submit.
Trying to save the new entity gives me: "Error: SQLSTATE[HY000]: General error: 1366 Incorrect string value: ‘\xF0\x9F\xBF\xBE\xF0\x9F…’ for column ‘unicode’ at row 1 "
How should I ‘sanitize’ my string?
I found (stackoverflow) this, but am not sure it’s the way to go as this modifies the data, can I escape the data somehow?:
preg_replace(‘/[\x00-\x1F\x80-\xFF]/’, ‘’, $value);
This demonstrates my issue:
MySQL table:
CREATE TABLE `unicode` (
`id` char(36) COLLATE utf8_unicode_ci NOT NULL,
`unicode` text COLLATE utf8_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Code that saves a record:
$obj = TableRegistry::get('Unicode')->newEntity([
'unicode' => ''
]);
TableRegistry::get('Unicode')->save($obj);
Note that my encoding for my datasource is set on utf8.
- Edit *
I also use postgres, and it seems postgres just saves these characters without raising any errors.
So it’s just a problem for mysql - i don’t really want to use the preg_replace as it removes characters.