Re: Sort using utf characters in mysql or php ? best solutions... [message #172828 is a reply to message #172818] |
Fri, 04 March 2011 08:53 |
alvaro.NOSPAMTHANX
Messages: 277 Registered: September 2010
Karma:
|
Senior Member |
|
|
El 04/03/2011 1:58, SM escribió/wrote:
> Using MySQL, i'm selecting a list of songs in spanish that i would
> like to sort. Here's a list of names that is returned by the query:
>
> ¡Decirevilla!
> Alhambra
> 123 pasitos
> África
> Arroz
> Decir
>
> The sorted list should look like this:
>
> 123 pasitos
> África
> Alhambra
> Arroz
> ¡Decirevilla!
> Decir
"Decir" should come before "¡Decirevilla!" but I get your point.
>
>
> After all of the research i've read, i've concluded that there is no
> reasonable way to achieve this using MySQL. I've tried collation,
> charset, etc... but there is no way the character ¡, ?, etc... can by
> sorted accordingly to my desired result. Even the Á is not sorted the
> way i want to...
These are two different issues. You can absolutely sort Spanish letters
at MySQL with the appropriate rules. You just need to assign a proper
collation to the column: utf8_spanish_ci
CREATE TABLE test (
word VARCHAR(50) NULL
)
COLLATE='utf8_spanish_ci'
ENGINE=InnoDB;
INSERT INTO test (word)
VALUES ('ae'), ('Ád'), ('Ac'), ('áb'), ('aa');
SELECT word FROM test ORDER BY word;
"word"
"aa"
"áb"
"Ac"
"Ád"
"ae"
Compare with:
CREATE TABLE test (
word VARCHAR(50) NULL
)
COLLATE='utf8_bin'
ENGINE=InnoDB;
INSERT INTO test (word)
VALUES ('ae'), ('Ád'), ('Ac'), ('áb'), ('aa');
SELECT word FROM test ORDER BY word;
"word"
"Ac"
"aa"
"ae"
"Ád"
"áb"
The second issue is that you want to ignore character that are not
letters or numbers. Well, just remove them from the string before
comparing. Unluckily, MySQL does not provide regexp based replacement.
So my advice is that you add a secondary column and populate it with PHP:
CREATE TABLE test (
word VARCHAR(50) NULL,
sort_order VARCHAR(50) NULL
)
COLLATE='utf8_spanish_ci'
ENGINE=InnoDB;
INSERT INTO test (word, sort_order) VALUES
('¡Decirevilla!', 'Decirevilla'),
('Alhambra', 'Alhambra'),
('123 pasitos', '123 pasitos'),
('África', 'África'),
('Arroz', 'Arroz'),
('Decir', 'Decir');
SELECT word FROM test ORDER BY sort_order;
"word"
"123 pasitos"
"África"
"Alhambra"
"Arroz"
"Decir"
"¡Decirevilla!"
In order to populate the secondary column, you can use several
approaches, from preg_replace()
(<http://es.php.net/manual/en/function.preg-replace.php>) to
internationalization functions
(<http://es.php.net/manual/en/book.intl.php>).
echo preg_replace('/\\W/iu', '', '¡Hola, mundo! ¿Cómo está usted?');
// Prints: HolamundoCómoestáusted
(adjust the regexp to your liking.)
> Question 1: Is this a reasonable conclusion?
>
> I believe the only way to achieve this is by passing the results to an
> array in php and then sort the array using a custom function... all
> this using the function usort (need to sort by value and i don't care
> about maintaning the key association).
This implies that a query that depends on sort order (e.g., "get first
20 items that start with 'B') will need to retrieve the complete table.
[...]
> Question 3: I have no idea how to create the normalize function to
> sort the names accordingly to my needs. How do i ignore certain
> characters (¡, ?, ', !, ¿) and how do i replace other characters with
> the natural equivalent (Á -> A, É -> E, etc..)
You can use iconv() for that but it's probably not necessary:
http://sourcecookbook.com/en/recipes/8/function-to-slugify-strings-in-php
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--
|
|
|