There are a few common settings that developers easily skip over by accepting the default options. This does not cause a problem immediately, but someday it will leave you scratching your head. Recently I faced exactly this kind of challenge when one of my SQL queries returned wrong results while comparing some Hindi phrases.
After hours of head-scratching I finally found that it was a MySQL collation problem. A collation is a set of rules that defines how to compare and sort character strings. Each collation in MySQL belongs to a single character set. Every character set has at least one collation, and most have two or more. A collation orders characters based on weights.
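To make the idea of weights concrete, here is a minimal sketch in Python. The weight table is a made-up toy, not MySQL's real one, and the names TOY_WEIGHTS, sort_key, collate_equal and collate_sorted are hypothetical. It only shows the principle: two strings compare as equal when their weight sequences match, and sorting follows the weights rather than the raw character codes.

```python
# Toy collation: every character maps to a weight, and strings are compared
# by their weight sequences. The table below is an illustrative assumption,
# not the weight table of any real MySQL collation.
TOY_WEIGHTS = {"a": 1, "A": 1, "b": 2, "B": 2, "c": 3, "C": 3}


def sort_key(text):
    """Return the list of weights for text; unknown characters sort last."""
    return [TOY_WEIGHTS.get(ch, 1000 + ord(ch)) for ch in text]


def collate_equal(left, right):
    """Two strings are equal under the collation if their weights match."""
    return sort_key(left) == sort_key(right)


def collate_sorted(words):
    """Sort words by their weight sequences instead of raw code points."""
    return sorted(words, key=sort_key)


print(collate_equal("abc", "ABC"))
print(collate_sorted(["b", "A", "c"]))
```

Because "a" and "A" share a weight in this toy table, it behaves like a case-insensitive collation (the _ci suffix in MySQL collation names).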
Now, when you create a MySQL database, there is a long list of collations to choose from. In phpMyAdmin the default selection is usually utf8_general_ci, which is what I had selected while developing my web application.
utf8_general_ci is a very simple collation, and on Unicode text a very broken one: it gives incorrect results on general Unicode text. So when I started storing Unicode (Hindi) content, my queries started returning wrong results, and I switched to utf8_unicode_ci.
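For an existing table, one way to make this switch is MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET statement. The sketch below only builds that statement as a string. The helper name build_convert_statement and the table name articles are hypothetical, and it assumes you want every text column of the table converted. Try it on a backup first.

```python
def build_convert_statement(table_name, charset="utf8", collation="utf8_unicode_ci"):
    """Build an ALTER TABLE statement that converts a table's collation.

    table_name is a hypothetical example value; backticks inside the name
    are doubled so the quoted identifier stays valid.
    """
    quoted = "`" + table_name.replace("`", "``") + "`"
    return "ALTER TABLE {} CONVERT TO CHARACTER SET {} COLLATE {};".format(
        quoted, charset, collation
    )


# "articles" is an assumed table name, used for illustration only.
print(build_convert_statement("articles"))
```

The resulting statement can then be run from the mysql client or the SQL tab in phpMyAdmin.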
utf8_general_ci is somewhat faster than utf8_unicode_ci, but less accurate for sorting and comparison. utf8_unicode_ci implements the Unicode Collation Algorithm, while utf8_general_ci takes shortcuts to save time. The language-specific utf8 collations (such as utf8_swedish_ci) contain additional language rules that make them the most accurate choice for sorting text in those languages. Most of the time I use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language.
You can read more about the Unicode character sets and their collations in the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html