  1. <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>23.2. Collation Support</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="locale.html" title="23.1. Locale Support" /><link rel="next" href="multibyte.html" title="23.3. Character Set Support" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">23.2. Collation Support</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="locale.html" title="23.1. Locale Support">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="charset.html" title="Chapter 23. Localization">Up</a></td><th width="60%" align="center">Chapter 23. Localization</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="multibyte.html" title="23.3. Character Set Support">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="COLLATION"><div class="titlepage"><div><div><h2 class="title" style="clear: both">23.2. Collation Support</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="collation.html#id-1.6.10.4.4">23.2.1. Concepts</a></span></dt><dt><span class="sect2"><a href="collation.html#COLLATION-MANAGING">23.2.2. Managing Collations</a></span></dt></dl></div><a id="id-1.6.10.4.2" class="indexterm"></a><p>
  3. The collation feature allows specifying the sort order and character
  4. classification behavior of data per-column, or even per-operation.
  5. This alleviates the restriction that the
  6. <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code> settings
  7. of a database cannot be changed after its creation.
  8. </p><div class="sect2" id="id-1.6.10.4.4"><div class="titlepage"><div><div><h3 class="title">23.2.1. Concepts</h3></div></div></div><p>
  9. Conceptually, every expression of a collatable data type has a
  10. collation. (The built-in collatable data types are
  11. <code class="type">text</code>, <code class="type">varchar</code>, and <code class="type">char</code>.
  12. User-defined base types can also be marked collatable, and of course
  13. a domain over a collatable data type is collatable.) If the
  14. expression is a column reference, the collation of the expression is the
  15. defined collation of the column. If the expression is a constant, the
  16. collation is the default collation of the data type of the
  17. constant. The collation of a more complex expression is derived
  18. from the collations of its inputs, as described below.
  19. </p><p>
  20. The collation of an expression can be the <span class="quote">“<span class="quote">default</span>”</span>
  21. collation, which means the locale settings defined for the
  22. database. It is also possible for an expression's collation to be
  23. indeterminate. In such cases, ordering operations and other
  24. operations that need to know the collation will fail.
  25. </p><p>
  26. When the database system has to perform an ordering or a character
  27. classification, it uses the collation of the input expression. This
  28. happens, for example, with <code class="literal">ORDER BY</code> clauses
  29. and function or operator calls such as <code class="literal">&lt;</code>.
  30. The collation to apply for an <code class="literal">ORDER BY</code> clause
  31. is simply the collation of the sort key. The collation to apply for a
  32. function or operator call is derived from the arguments, as described
  33. below. In addition to comparison operators, collations are taken into
  34. account by functions that convert between lower and upper case
  35. letters, such as <code class="function">lower</code>, <code class="function">upper</code>, and
  36. <code class="function">initcap</code>; by pattern matching operators; and by
  37. <code class="function">to_char</code> and related functions.
  38. </p><p>
  39. For a function or operator call, the collation that is derived by
  40. examining the argument collations is used at run time for performing
  41. the specified operation. If the result of the function or operator
  42. call is of a collatable data type, the collation is also used at parse
  43. time as the defined collation of the function or operator expression,
  44. in case there is a surrounding expression that requires knowledge of
  45. its collation.
  46. </p><p>
  47. The <em class="firstterm">collation derivation</em> of an expression can be
  48. implicit or explicit. This distinction affects how collations are
  49. combined when multiple different collations appear in an
  50. expression. An explicit collation derivation occurs when a
  51. <code class="literal">COLLATE</code> clause is used; all other collation
  52. derivations are implicit. When multiple collations need to be
  53. combined, for example in a function call, the following rules are
  54. used:
  55. </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>
  56. If any input expression has an explicit collation derivation, then
  57. all explicitly derived collations among the input expressions must be
  58. the same, otherwise an error is raised. If any explicitly
  59. derived collation is present, that is the result of the
  60. collation combination.
  61. </p></li><li class="listitem"><p>
  62. Otherwise, all input expressions must have the same implicit
  63. collation derivation or the default collation. If any non-default
  64. collation is present, that is the result of the collation combination.
  65. Otherwise, the result is the default collation.
  66. </p></li><li class="listitem"><p>
  67. If there are conflicting non-default implicit collations among the
  68. input expressions, then the combination is deemed to have indeterminate
  69. collation. This is not an error condition unless the particular
  70. function being invoked requires knowledge of the collation it should
  71. apply. If it does, an error will be raised at run-time.
  72. </p></li></ol></div><p>
  73. For example, consider this table definition:
  74. </p><pre class="programlisting">
  75. CREATE TABLE test1 (
  76. a text COLLATE "de_DE",
  77. b text COLLATE "es_ES",
  78. ...
  79. );
  80. </pre><p>
  81. Then in
  82. </p><pre class="programlisting">
  83. SELECT a &lt; 'foo' FROM test1;
  84. </pre><p>
  85. the <code class="literal">&lt;</code> comparison is performed according to
  86. <code class="literal">de_DE</code> rules, because the expression combines an
  87. implicitly derived collation with the default collation. But in
  88. </p><pre class="programlisting">
  89. SELECT a &lt; ('foo' COLLATE "fr_FR") FROM test1;
  90. </pre><p>
  91. the comparison is performed using <code class="literal">fr_FR</code> rules,
  92. because the explicit collation derivation overrides the implicit one.
  93. Furthermore, given
  94. </p><pre class="programlisting">
  95. SELECT a &lt; b FROM test1;
  96. </pre><p>
  97. the parser cannot determine which collation to apply, since the
  98. <code class="structfield">a</code> and <code class="structfield">b</code> columns have conflicting
  99. implicit collations. Since the <code class="literal">&lt;</code> operator
  100. does need to know which collation to use, this will result in an
  101. error. The error can be resolved by attaching an explicit collation
  102. specifier to either input expression, thus:
  103. </p><pre class="programlisting">
  104. SELECT a &lt; b COLLATE "de_DE" FROM test1;
  105. </pre><p>
  106. or equivalently
  107. </p><pre class="programlisting">
  108. SELECT a COLLATE "de_DE" &lt; b FROM test1;
  109. </pre><p>
  110. On the other hand, the structurally similar case
  111. </p><pre class="programlisting">
  112. SELECT a || b FROM test1;
  113. </pre><p>
  114. does not result in an error, because the <code class="literal">||</code> operator
  115. does not care about collations: its result is the same regardless
  116. of the collation.
  117. </p><p>
  118. The collation assigned to a function or operator's combined input
  119. expressions is also considered to apply to the function or operator's
  120. result, if the function or operator delivers a result of a collatable
  121. data type. So, in
  122. </p><pre class="programlisting">
  123. SELECT * FROM test1 ORDER BY a || 'foo';
  124. </pre><p>
  125. the ordering will be done according to <code class="literal">de_DE</code> rules.
  126. But this query:
  127. </p><pre class="programlisting">
  128. SELECT * FROM test1 ORDER BY a || b;
  129. </pre><p>
  130. results in an error, because even though the <code class="literal">||</code> operator
  131. doesn't need to know a collation, the <code class="literal">ORDER BY</code> clause does.
  132. As before, the conflict can be resolved with an explicit collation
  133. specifier:
  134. </p><pre class="programlisting">
  135. SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
  136. </pre><p>
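The derivation rules in this section can be observed directly with the
<code class="function">pg_collation_for</code> function. The following is only a minimal sketch:
the table <code class="literal">coll_demo</code> is hypothetical, and the
<code class="literal">C</code> and <code class="literal">POSIX</code> collations are used in place of
<code class="literal">de_DE</code> and <code class="literal">es_ES</code> simply because they exist on
every platform.
</p><pre class="programlisting">
-- Hypothetical table with two columns that carry different implicit collations.
CREATE TABLE coll_demo (
    a text COLLATE "C",
    b text COLLATE "POSIX"
);
INSERT INTO coll_demo VALUES ('x', 'y');

-- Implicit collation of a column, and of an operator result derived from it.
SELECT pg_collation_for(a)          AS coll_a,
       pg_collation_for(a || 'foo') AS coll_concat
FROM coll_demo;

-- An explicit COLLATE clause overrides the implicit derivation.
SELECT pg_collation_for(a COLLATE "POSIX") AS coll_explicit FROM coll_demo;

-- Conflicting implicit collations: the next statement would raise an error ...
-- SELECT a &lt; b FROM coll_demo;
-- ... unless one input carries an explicit collation.
SELECT a &lt; b COLLATE "C" AS a_before_b FROM coll_demo;
</pre><p>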
  137. </p></div><div class="sect2" id="COLLATION-MANAGING"><div class="titlepage"><div><div><h3 class="title">23.2.2. Managing Collations</h3></div></div></div><p>
  138. A collation is an SQL schema object that maps an SQL name to locales
  139. provided by libraries installed in the operating system. A collation
  140. definition has a <em class="firstterm">provider</em> that specifies which
  141. library supplies the locale data. One standard provider name
  142. is <code class="literal">libc</code>, which uses the locales provided by the
  143. operating system C library. These are the locales that most tools
  144. provided by the operating system use. Another provider
  145. is <code class="literal">icu</code>, which uses the external
  146. ICU<a id="id-1.6.10.4.5.2.4" class="indexterm"></a> library. ICU locales can only be
  147. used if support for ICU was configured when PostgreSQL was built.
  148. </p><p>
  149. A collation object provided by <code class="literal">libc</code> maps to a
  150. combination of <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code>
  151. settings, as accepted by the <code class="literal">setlocale()</code> system library call. (As
  152. the name would suggest, the main purpose of a collation is to set
  153. <code class="symbol">LC_COLLATE</code>, which controls the sort order. But
  154. it is rarely necessary in practice to have an
  155. <code class="symbol">LC_CTYPE</code> setting that is different from
  156. <code class="symbol">LC_COLLATE</code>, so it is more convenient to collect
  157. these under one concept than to create another infrastructure for
  158. setting <code class="symbol">LC_CTYPE</code> per expression.) Also,
  159. a <code class="literal">libc</code> collation
  160. is tied to a character set encoding (see <a class="xref" href="multibyte.html" title="23.3. Character Set Support">Section 23.3</a>).
  161. The same collation name may exist for different encodings.
  162. </p><p>
  163. A collation object provided by <code class="literal">icu</code> maps to a named
  164. collator provided by the ICU library. ICU does not support
  165. separate <span class="quote">“<span class="quote">collate</span>”</span> and <span class="quote">“<span class="quote">ctype</span>”</span> settings, so
  166. they are always the same. Also, ICU collations are independent of the
  167. encoding, so there is always only one ICU collation of a given name in
  168. a database.
  169. </p><div class="sect3" id="id-1.6.10.4.5.5"><div class="titlepage"><div><div><h4 class="title">23.2.2.1. Standard Collations</h4></div></div></div><p>
  170. On all platforms, the collations named <code class="literal">default</code>,
  171. <code class="literal">C</code>, and <code class="literal">POSIX</code> are available. Additional
  172. collations may be available depending on operating system support.
  173. The <code class="literal">default</code> collation selects the <code class="symbol">LC_COLLATE</code>
  174. and <code class="symbol">LC_CTYPE</code> values specified at database creation time.
  175. The <code class="literal">C</code> and <code class="literal">POSIX</code> collations both specify
  176. <span class="quote">“<span class="quote">traditional C</span>”</span> behavior, in which only the ASCII letters
  177. <span class="quote">“<span class="quote"><code class="literal">A</code></span>”</span> through <span class="quote">“<span class="quote"><code class="literal">Z</code></span>”</span>
  178. are treated as letters, and sorting is done strictly by character
  179. code byte values.
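</p><p>
As a small illustration of sorting by byte values (a sketch with made-up
input values, not taken from the original text), the
<code class="literal">C</code> collation places all upper-case ASCII letters before
all lower-case ones:
</p><pre class="programlisting">
-- Sorts strictly by byte value: 'A' (65), 'B' (66), 'a' (97), 'b' (98).
SELECT v
FROM (VALUES ('a'), ('B'), ('A'), ('b')) AS t(v)
ORDER BY v COLLATE "C";
-- returns A, B, a, b
</pre><p>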
  180. </p><p>
  181. Additionally, the SQL standard collation name <code class="literal">ucs_basic</code>
  182. is available for encoding <code class="literal">UTF8</code>. It is equivalent
  183. to <code class="literal">C</code> and sorts by Unicode code point.
  184. </p></div><div class="sect3" id="id-1.6.10.4.5.6"><div class="titlepage"><div><div><h4 class="title">23.2.2.2. Predefined Collations</h4></div></div></div><p>
  185. If the operating system provides support for using multiple locales
  186. within a single program (<code class="function">newlocale</code> and related functions),
  187. or if support for ICU is configured,
  188. then when a database cluster is initialized, <code class="command">initdb</code>
  189. populates the system catalog <code class="literal">pg_collation</code> with
  190. collations based on all the locales it finds in the operating
  191. system at the time.
  192. </p><p>
  193. To inspect the currently available collations, use the query <code class="literal">SELECT
  194. * FROM pg_collation</code>, or the command <code class="command">\dOS+</code>
  195. in <span class="application">psql</span>.
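</p><p>
A narrower query is often easier to read than <code class="literal">SELECT *</code>.
The following sketch lists only the collations usable in the current database;
the column aliases are arbitrary, and the query assumes the
<code class="literal">pg_collation</code> columns as they exist in this release.
</p><pre class="programlisting">
SELECT n.nspname AS schema_name,
       c.collname,
       CASE c.collprovider
            WHEN 'd' THEN 'default'
            WHEN 'c' THEN 'libc'
            WHEN 'i' THEN 'icu'
       END AS provider,
       c.collisdeterministic,
       c.collcollate,
       c.collctype
FROM pg_collation c
JOIN pg_namespace n ON n.oid = c.collnamespace
-- collencoding = -1 means the collation works with any encoding.
WHERE c.collencoding IN (-1, (SELECT encoding
                              FROM pg_database
                              WHERE datname = current_database()))
ORDER BY provider, c.collname;
</pre><p>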
  196. </p><div class="sect4" id="id-1.6.10.4.5.6.4"><div class="titlepage"><div><div><h5 class="title">23.2.2.2.1. libc Collations</h5></div></div></div><p>
  197. For example, the operating system might
  198. provide a locale named <code class="literal">de_DE.utf8</code>.
  199. <code class="command">initdb</code> would then create a collation named
  200. <code class="literal">de_DE.utf8</code> for encoding <code class="literal">UTF8</code>
  201. that has both <code class="symbol">LC_COLLATE</code> and
  202. <code class="symbol">LC_CTYPE</code> set to <code class="literal">de_DE.utf8</code>.
  203. It will also create a collation with the <code class="literal">.utf8</code>
  204. tag stripped off the name. So you could also use the collation
  205. under the name <code class="literal">de_DE</code>, which is less cumbersome
  206. to write and makes the name less encoding-dependent. Note that,
  207. nevertheless, the initial set of collation names is
  208. platform-dependent.
  209. </p><p>
  210. The default set of collations provided by <code class="literal">libc</code> maps
  211. directly to the locales installed in the operating system, which can be
  212. listed using the command <code class="literal">locale -a</code>. In case
  213. a <code class="literal">libc</code> collation is needed that has different values
  214. for <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code>, or if new
  215. locales are installed in the operating system after the database system
  216. was initialized, then a new collation may be created using
  217. the <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a> command.
  218. New operating system locales can also be imported en masse using
  219. the <a class="link" href="functions-admin.html#FUNCTIONS-ADMIN-COLLATION" title="Table 9.91. Collation Management Functions"><code class="function">pg_import_system_collations()</code></a> function.
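</p><p>
Both cases might look like the following sketch. The collation name
<code class="literal">german_ctype_c_sort</code> is hypothetical, and the locale name
<code class="literal">de_DE.utf8</code> is an assumption that must match a locale
actually installed in the operating system.
</p><pre class="programlisting">
-- Byte-order sorting combined with German character classification.
CREATE COLLATION german_ctype_c_sort (
    provider   = libc,
    lc_collate = 'C',
    lc_ctype   = 'de_DE.utf8'   -- assumed locale name; check "locale -a"
);

-- Import operating system locales that were installed after initdb.
-- Returns the number of collation objects created; requires superuser.
SELECT pg_import_system_collations('pg_catalog');
</pre><p>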
  220. </p><p>
  221. Within any particular database, only collations that use that
  222. database's encoding are of interest. Other entries in
  223. <code class="literal">pg_collation</code> are ignored. Thus, a stripped collation
  224. name such as <code class="literal">de_DE</code> can be considered unique
  225. within a given database even though it would not be unique globally.
  226. Use of the stripped collation names is recommended, since it will
  227. make one less thing you need to change if you decide to change to
  228. another database encoding. Note however that the <code class="literal">default</code>,
  229. <code class="literal">C</code>, and <code class="literal">POSIX</code> collations can be used regardless of
  230. the database encoding.
  231. </p><p>
  232. <span class="productname">PostgreSQL</span> considers distinct collation
  233. objects to be incompatible even when they have identical properties.
  234. Thus for example,
  235. </p><pre class="programlisting">
  236. SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
  237. </pre><p>
  238. will draw an error even though the <code class="literal">C</code> and <code class="literal">POSIX</code>
  239. collations have identical behaviors. Mixing stripped and non-stripped
  240. collation names is therefore not recommended.
  241. </p></div><div class="sect4" id="id-1.6.10.4.5.6.5"><div class="titlepage"><div><div><h5 class="title">23.2.2.2.2. ICU Collations</h5></div></div></div><p>
  242. With ICU, it is not sensible to enumerate all possible locale names. ICU
  243. uses a particular naming system for locales, but there are many more ways
  244. to name a locale than there are actually distinct locales.
  245. <code class="command">initdb</code> uses the ICU APIs to extract a set of distinct
  246. locales to populate the initial set of collations. Collations provided by
  247. ICU are created in the SQL environment with names in BCP 47 language tag
  248. format, with a <span class="quote">“<span class="quote">private use</span>”</span>
  249. extension <code class="literal">-x-icu</code> appended, to distinguish them from
  250. libc locales.
  251. </p><p>
  252. Here are some example collations that might be created:
  253. </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">de-x-icu</code></span></dt><dd><p>German collation, default variant</p></dd><dt><span class="term"><code class="literal">de-AT-x-icu</code></span></dt><dd><p>German collation for Austria, default variant</p><p>
  254. (There are also, say, <code class="literal">de-DE-x-icu</code>
  255. or <code class="literal">de-CH-x-icu</code>, but as of this writing, they are
  256. equivalent to <code class="literal">de-x-icu</code>.)
  257. </p></dd><dt><span class="term"><code class="literal">und-x-icu</code> (for <span class="quote">“<span class="quote">undefined</span>”</span>)</span></dt><dd><p>
  258. ICU <span class="quote">“<span class="quote">root</span>”</span> collation. Use this to get a reasonable
  259. language-agnostic sort order.
  260. </p></dd></dl></div><p>
  261. </p><p>
  262. Some (less frequently used) encodings are not supported by ICU. When the
  263. database encoding is one of these, ICU collation entries
  264. in <code class="literal">pg_collation</code> are ignored. Attempting to use one
  265. will draw an error along the lines of <span class="quote">“<span class="quote">collation "de-x-icu" for
  266. encoding "WIN874" does not exist</span>”</span>.
  267. </p></div></div><div class="sect3" id="COLLATION-CREATE"><div class="titlepage"><div><div><h4 class="title">23.2.2.3. Creating New Collation Objects</h4></div></div></div><p>
  268. If the standard and predefined collations are not sufficient, users can
  269. create their own collation objects using the SQL
  270. command <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a>.
  271. </p><p>
  272. The standard and predefined collations are in the
  273. schema <code class="literal">pg_catalog</code>, like all predefined objects.
  274. User-defined collations should be created in user schemas. This also
  275. ensures that they are saved by <code class="command">pg_dump</code>.
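</p><p>
For instance (a sketch only; the schema name <code class="literal">app_collations</code>
and the collation name <code class="literal">c_alias</code> are hypothetical), a
collation can be created in a dedicated schema and then referenced by its
qualified name:
</p><pre class="programlisting">
CREATE SCHEMA app_collations;

-- Copy the always-available "C" collation under a user-schema name.
CREATE COLLATION app_collations.c_alias FROM "C";

-- A schema-qualified name can be used in a COLLATE clause.
SELECT 'B' &lt; 'a' COLLATE app_collations.c_alias AS upper_sorts_first;
</pre><p>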
  276. </p><div class="sect4" id="id-1.6.10.4.5.7.4"><div class="titlepage"><div><div><h5 class="title">23.2.2.3.1. libc Collations</h5></div></div></div><p>
  277. New libc collations can be created like this:
  278. </p><pre class="programlisting">
  279. CREATE COLLATION german (provider = libc, locale = 'de_DE');
  280. </pre><p>
  281. The exact values that are acceptable for the <code class="literal">locale</code>
  282. clause in this command depend on the operating system. On Unix-like
  283. systems, the command <code class="literal">locale -a</code> will show a list.
  284. </p><p>
  285. Since the predefined libc collations already include all collations
  286. defined in the operating system when the database instance is
  287. initialized, it is not often necessary to manually create new ones.
  288. Possible reasons are that a different naming system is desired (in which case
  289. see also <a class="xref" href="collation.html#COLLATION-COPY" title="23.2.2.3.3. Copying Collations">Section 23.2.2.3.3</a>) or that the operating system has
  290. been upgraded to provide new locale definitions (in which case see
  291. also <a class="link" href="functions-admin.html#FUNCTIONS-ADMIN-COLLATION" title="Table 9.91. Collation Management Functions"><code class="function">pg_import_system_collations()</code></a>).
  292. </p></div><div class="sect4" id="id-1.6.10.4.5.7.5"><div class="titlepage"><div><div><h5 class="title">23.2.2.3.2. ICU Collations</h5></div></div></div><p>
  293. ICU allows collations to be customized beyond the basic language+country
  294. set that is preloaded by <code class="command">initdb</code>. Users are encouraged
  295. to define their own collation objects that make use of these facilities to
  296. suit the sorting behavior to their requirements.
  297. See <a class="ulink" href="http://userguide.icu-project.org/locale" target="_top">http://userguide.icu-project.org/locale</a>
  298. and <a class="ulink" href="http://userguide.icu-project.org/collation/api" target="_top">http://userguide.icu-project.org/collation/api</a> for
  299. information on ICU locale naming. The set of acceptable names and
  300. attributes depends on the particular ICU version.
  301. </p><p>
  302. Here are some examples:
  303. </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');</code></span></dt><dd><p>German collation with phone book collation type</p><p>
  304. The first example selects the ICU locale using a <span class="quote">“<span class="quote">language
  305. tag</span>”</span> per BCP 47. The second example uses the traditional
  306. ICU-specific locale syntax. The first style is preferred going
  307. forward, but it is not supported by older ICU versions.
  308. </p><p>
  309. Note that you can name the collation objects in the SQL environment
  310. anything you want. In this example, we follow the naming style that
  311. the predefined collations use, which in turn also follows BCP 47, but
  312. that is not required for user-defined collations.
  313. </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');</code></span></dt><dd><p>
  314. Root collation with Emoji collation type, per Unicode Technical Standard #51
  315. </p><p>
  316. Observe how in the traditional ICU locale naming system, the root
  317. locale is selected by an empty string.
  318. </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION digitslast (provider = icu, locale = 'en@colReorder=latn-digit');</code></span></dt><dd><p>
  319. Sort digits after Latin letters. (The default is digits before letters.)
  320. </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');</code></span></dt><dd><p>
  321. Sort upper-case letters before lower-case letters. (The default is
  322. lower-case letters first.)
  323. </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=latn-digit');</code></span></dt><dd><p>
  324. Combines both of the above options.
  325. </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');</code></span></dt><dd><p>
  326. Numeric ordering, sorts sequences of digits by their numeric value,
  327. for example: <code class="literal">A-21</code> &lt; <code class="literal">A-123</code>
  328. (also known as natural sort).
  329. </p></dd></dl></div><p>
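Once created, such a collation is applied like any other. The following sketch
assumes a build with ICU support and an ICU version that honors the
<code class="literal">kn</code> subtag; the collation name
<code class="literal">natural_sort_demo</code> and the sample values are hypothetical.
</p><pre class="programlisting">
CREATE COLLATION natural_sort_demo (provider = icu, locale = 'en-u-kn-true');

-- Digit sequences are compared by numeric value rather than character by character.
SELECT v
FROM (VALUES ('A-123'), ('A-21'), ('A-3')) AS t(v)
ORDER BY v COLLATE natural_sort_demo;
-- expected order with numeric ordering in effect: A-3, A-21, A-123
</pre><p>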
  330. See <a class="ulink" href="http://unicode.org/reports/tr35/tr35-collation.html" target="_top">Unicode
  331. Technical Standard #35</a>
  332. and <a class="ulink" href="https://tools.ietf.org/html/bcp47" target="_top">BCP 47</a> for
  333. details. The list of possible collation types (<code class="literal">co</code>
  334. subtag) can be found in
  335. the <a class="ulink" href="http://www.unicode.org/repos/cldr/trunk/common/bcp47/collation.xml" target="_top">CLDR
  336. repository</a>.
  337. The <a class="ulink" href="https://ssl.icu-project.org/icu-bin/locexp" target="_top">ICU Locale
  338. Explorer</a> can be used to check the details of a particular locale
  339. definition. The examples using the <code class="literal">k*</code> subtags require
  340. at least ICU version 54.
  341. </p><p>
  342. Note that while this system allows creating collations that <span class="quote">“<span class="quote">ignore
  343. case</span>”</span> or <span class="quote">“<span class="quote">ignore accents</span>”</span> or similar (using the
  344. <code class="literal">ks</code> key), in order for such collations to act in a
  345. truly case- or accent-insensitive manner, they also need to be declared as not
  346. <em class="firstterm">deterministic</em> in <code class="command">CREATE COLLATION</code>;
  347. see <a class="xref" href="collation.html#COLLATION-NONDETERMINISTIC" title="23.2.2.4. Nondeterministic Collations">Section 23.2.2.4</a>.
  348. Otherwise, any strings that compare equal according to the collation but
  349. are not byte-wise equal will be sorted according to their byte values.
  350. </p><div class="note"><h3 class="title">Note</h3><p>
  351. By design, ICU will accept almost any string as a locale name and match
  352. it to the closest locale it can provide, using the fallback procedure
  353. described in its documentation. Thus, there will be no direct feedback
  354. if a collation specification is composed using features that the given
  355. ICU installation does not actually support. It is therefore recommended
  356. to create application-level test cases to check that the collation
  357. definitions satisfy one's requirements.
  358. </p></div></div><div class="sect4" id="COLLATION-COPY"><div class="titlepage"><div><div><h5 class="title">23.2.2.3.3. Copying Collations</h5></div></div></div><p>
  359. The command <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a> can also be used to
  360. create a new collation from an existing collation. This can be useful for
  361. using operating-system-independent collation names in
  362. applications, creating compatibility names, or using an ICU-provided collation
  363. under a more readable name. For example:
  364. </p><pre class="programlisting">
  365. CREATE COLLATION german FROM "de_DE";
  366. CREATE COLLATION french FROM "fr-x-icu";
  367. </pre><p>
  368. </p></div></div><div class="sect3" id="COLLATION-NONDETERMINISTIC"><div class="titlepage"><div><div><h4 class="title">23.2.2.4. Nondeterministic Collations</h4></div></div></div><p>
  369. A collation is either <em class="firstterm">deterministic</em> or
  370. <em class="firstterm">nondeterministic</em>. A deterministic collation uses
  371. deterministic comparisons, which means that it considers strings to be
  372. equal only if they consist of the same byte sequence. Nondeterministic
  373. comparison may determine strings to be equal even if they consist of
  374. different bytes. Typical situations include case-insensitive comparison,
  375. accent-insensitive comparison, as well as comparison of strings in
  376. different Unicode normal forms. It is up to the collation provider to
  377. actually implement such insensitive comparisons; the deterministic flag
  378. only determines whether ties are to be broken using bytewise comparison.
  379. See also <a class="ulink" href="https://unicode.org/reports/tr10" target="_top">Unicode Technical
  380. Standard 10</a> for more information on the terminology.
  381. </p><p>
  382. To create a nondeterministic collation, specify the property
  383. <code class="literal">deterministic = false</code> to <code class="command">CREATE
  384. COLLATION</code>, for example:
  385. </p><pre class="programlisting">
  386. CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
  387. </pre><p>
  388. This example would use the standard Unicode collation in a
  389. nondeterministic way. In particular, this would allow strings in
  390. different normal forms to be compared correctly. More interesting
  391. examples make use of the ICU customization facilities explained above.
  392. For example:
  393. </p><pre class="programlisting">
  394. CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
  395. CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
  396. </pre><p>
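The effect on equality can be checked with a short query. This is a sketch
under the assumption that the installed ICU version honors the
<code class="literal">ks</code> subtag; the collation name
<code class="literal">ci_demo</code> is hypothetical.
</p><pre class="programlisting">
CREATE COLLATION ci_demo (provider = icu, locale = 'und-u-ks-level2', deterministic = false);

SELECT 'abc' = 'ABC' COLLATE ci_demo AS equal_ci,  -- true if the collator ignores case
       'abc' = 'ABC' COLLATE "C"     AS equal_c;   -- false: the byte sequences differ

-- Pattern matching is rejected for nondeterministic collations,
-- so the next statement would raise an error:
-- SELECT 'abc' LIKE 'ABC' COLLATE ci_demo;
</pre><p>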
  397. </p><p>
  398. All standard and predefined collations are deterministic, and all
  399. user-defined collations are deterministic by default. While
  400. nondeterministic collations give a more <span class="quote">“<span class="quote">correct</span>”</span> behavior,
  401. especially when considering the full power of Unicode and its many
  402. special cases, they also have some drawbacks. Foremost, their use leads
  403. to a performance penalty. Also, certain operations are not possible with
  404. nondeterministic collations, such as pattern matching operations.
  405. Therefore, they should be used only in cases where they are specifically
  406. wanted.
  407. </p></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="locale.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="charset.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="multibyte.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">23.1. Locale Support </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 23.3. Character Set Support</td></tr></table></div></body></html>