Database Collations in Different RDMS

Understanding Database Collation


In a best case scenario, you may never notice your databases collation at all. But it can have a detrimental affect on your databases and user experience if not configured properly. Having collation knowledge ahead of time can help you make higher quality decisions when you are standing up additional database resources. Collation is a decision that you should make early in a new server configuration. Ideally inheriting the default server level collation to all underlying objects would be ideal. Only explicit deviations from that server level collation would need to be considered for individual use cases.


Breaking Down Collation


Collation can also cause different experiences with regards to querying your data out of your database. Probably the most influential piece of a collation format is the case sensitivity indicator. Lets examine the following collation setting : Latin1_General_CI_AI.


Lets break down the above collation string :


Latin1_General = Character encoding or character set used

CI = Case Insensitive, will dictate if query results will return based on column exact matching. I.E. if your case sensitive “Dog” <> “DoG”

AI = Accent Insensitive, will dictate if accents affect the query results I.E. “ü” equals to “u”.

WI = Width Insensitive, although not specified the lack of the WS in the format string dictates width insensitivity.

KS = Kanatype-sensitive, although not specified the lack of the KS in the format string dictates kanatype insensitivity.


A little more description for Width and Kanatatype.


Width Sensitive - Distinguishes between the two types of Japanese kana characters: Hiragana and Katakana. If this option is not selected, SQL Server considers Hiragana and Katakana characters to be equal for sorting purposes


Kanatype Sensitive - Distinguishes between a single-byte character and the same character when represented as a double-byte character. If this option is not selected, SQL Server considers the single-byte and double-byte representation of the same character to be identical for sorting purposes.


Applying Collation


Collation levels can be set at multiple levels, this is agnostic between RDMS but the the general theme is a hierarchy of priority. The hierarchy detailed is as follows : Server > Database > Table > Query, with the caveat of the possibility of schema level collation as well.


Model Results


Ian Fogelman

Ian Fogelman

My name is Ian Fogelman. I like to develop data driven solutions with SQL Server, Python, .NET and predictive analytics.