collation.txt 9.98 KB
=========
Collation
=========

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

.. versionadded:: 1.1

Overview
--------

MongoDB 3.4 introduced support for :manual:`collations
</manual/reference/collation/>`, which provide a set of rules to comply with the
conventions of a particular language when comparing strings.

For example, in Canadian French, the last accent in a given word determines the
sorting order. Consider the following French words:

.. code-block:: none

   cote < coté < côte < côté

The sort order using the Canadian French collation would result in the
following:

.. code-block:: none

   cote < côte < coté < côté

If collation is unspecified, MongoDB uses simple binary comparison for strings.
As such, the sort order of the words would be:

.. code-block:: none

    cote < coté < côte < côté

Usage
-----

You can specify a default collation for collections and indexes when they are
created, or specify a collation for CRUD operations and aggregations. For
operations that support collation, MongoDB uses the collection's default
collation unless the operation specifies a different collation.

Collation Parameters
~~~~~~~~~~~~~~~~~~~~

.. code-block:: php

   'collation' => [
       'locale' => <string>,
       'caseLevel' => <boolean>,
       'caseFirst' => <string>,
       'strength' => <integer>,
       'numericOrdering' => <boolean>,
       'alternate' => <string>,
       'maxVariable' => <string>,
       'normalization' => <boolean>,
       'backwards' => <boolean>,
   ]

The only required parameter is ``locale``, which the server parses as an `ICU
format locale ID <http://userguide.icu-project.org/locale>`_. For example, set
``locale`` to ``en_US`` to represent US English or ``fr_CA`` to represent
Canadian French.

For a complete description of the available parameters, see :manual:`Collation
Document </reference/collation/#collation-document>` in the MongoDB manual.

Assign a Default Collation to a Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following example creates a new collection called ``contacts`` on the
``test`` database and assigns a default collation with the ``fr_CA`` locale.
Specifying a collation when you create the collection ensures that all
operations involving a query that are run against the ``contacts`` collection
use the ``fr_CA`` collation, unless the query specifies another collation. Any
indexes on the new collection also inherit the default collation, unless the
creation command specifies another collation.

.. code-block:: php

   <?php

   $database = (new MongoDB\Client)->test;

   $database->createCollection('contacts', [
       'collation' => ['locale' => 'fr_CA'],
   ]);

Assign a Collation to an Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To specify a collation for an index, use the ``collation`` option when you
create the index.

The following example creates an index on the ``name`` field of the
``address_book`` collection, with the ``unique`` parameter enabled and a default
collation with ``locale`` set to ``en_US``.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->address_book;

   $collection->createIndex(
       ['first_name' => 1],
       [
           'collation' => ['locale' => 'en_US'],
           'unique' => true,
       ]
   );

To use this index, make sure your queries also specify the same collation. The
following query uses the above index:

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->address_book;

   $cursor = $collection->find(
       ['first_name' => 'Adam'],
       [
           'collation' => ['locale' => 'en_US'],
       ]
   );

The following queries do **NOT** use the index. The first query uses no
collation, and the second uses a collation with a different ``strength`` value
than the collation on the index.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->address_book;

   $cursor1 = $collection->find(['first_name' => 'Adam']);

   $cursor2 = $collection->find(
       ['first_name' => 'Adam'],
       [
           'collation' => [
               'locale' => 'en_US',
               'strength' => 2,
           ],
       ]
   );

Operations that Support Collation
---------------------------------

All reading, updating, and deleting methods support collation. Some examples are
listed below.

``find()`` with ``sort``
~~~~~~~~~~~~~~~~~~~~~~~~

Individual queries can specify a collation to use when matching and sorting
results. The following query and sort operation uses a German collation with the
``locale`` parameter set to ``de``.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->contacts;

   $cursor = $collection->find(
       ['city' => 'New York'],
       [
           'collation' => ['locale' => 'de'],
           'sort' => ['name' => 1],
       ]
   );

``findOneAndUpdate()``
~~~~~~~~~~~~~~~~~~~~~~

A collection called ``names`` contains the following documents:

.. code-block:: javascript

   { "_id" : 1, "first_name" : "Hans" }
   { "_id" : 2, "first_name" : "Gunter" }
   { "_id" : 3, "first_name" : "Günter" }
   { "_id" : 4, "first_name" : "Jürgen" }

The following :phpmethod:`findOneAndUpdate()
<MongoDB\\Collection::findOneAndUpdate>` operation on the collection does not
specify a collation.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->names;

   $document = $collection->findOneAndUpdate(
       ['first_name' => ['$lt' =-> 'Gunter']],
       ['$set' => ['verified' => true]]
   );

Because ``Gunter`` is lexically first in the collection, the above operation
returns no results and updates no documents.

Consider the same :phpmethod:`findOneAndUpdate()
<MongoDB\\Collection::findOneAndUpdate>` operation but with a collation
specified, which uses the locale ``de@collation=phonebook``.

.. note::

   Some locales have a ``collation=phonebook`` option available for use with
   languages which sort proper nouns differently from other words. According to
   the ``de@collation=phonebook`` collation, characters with umlauts come before
   the same characters without umlauts.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->names;

   $document = $collection->findOneAndUpdate(
       ['first_name' => ['$lt' =-> 'Gunter']],
       ['$set' => ['verified' => true]],
       [
           'collation' => ['locale' => 'de@collation=phonebook'],
           'returnDocument' => MongoDB\Operation\FindOneAndUpdate::RETURN_DOCUMENT_AFTER,
       ]
   );

The operation returns the following updated document:

.. code-block:: javascript

   { "_id" => 3, "first_name" => "Günter", "verified" => true }

``findOneAndDelete()``
~~~~~~~~~~~~~~~~~~~~~~

Set the ``numericOrdering`` collation parameter to ``true`` to compare numeric
strings by their numeric values.

The collection ``numbers`` contains the following documents:

.. code-block:: javascript

   { "_id" : 1, "a" : "16" }
   { "_id" : 2, "a" : "84" }
   { "_id" : 3, "a" : "179" }

The following example matches the first document in which field ``a`` has a
numeric value greater than 100 and deletes it.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->numbers;

   $document = $collection->findOneAndDelete(
       ['a' => ['$gt' =-> '100']],
       [
           'collation' => [
               'locale' => 'en',
               'numericOrdering' => true,
           ],
       ]
   );

After the above operation, the following documents remain in the collection:

.. code-block:: javascript

   { "_id" : 1, "a" : "16" }
   { "_id" : 2, "a" : "84" }

If you perform the same operation without collation, the server deletes the
first document it finds in which the lexical value of ``a`` is greater than
``"100"``.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->numbers;

   $document = $collection->findOneAndDelete(['a' => ['$gt' =-> '100']]);

After the above operation is executed, the document in which ``a`` was equal to
``"16"`` has been deleted, and the following documents remain in the collection:

.. code-block:: javascript

   { "_id" : 2, "a" : "84" }
   { "_id" : 3, "a" : "179" }

``deleteMany()``
~~~~~~~~~~~~~~~~

You can use collations with all the various CRUD operations which exist in the
|php-library|.

The collection ``recipes`` contains the following documents:

.. code-block:: javascript

   { "_id" : 1, "dish" : "veggie empanadas", "cuisine" : "Spanish" }
   { "_id" : 2, "dish" : "beef bourgignon", "cuisine" : "French" }
   { "_id" : 3, "dish" : "chicken molé", "cuisine" : "Mexican" }
   { "_id" : 4, "dish" : "chicken paillard", "cuisine" : "french" }
   { "_id" : 5, "dish" : "pozole verde", "cuisine" : "Mexican" }

Setting the ``strength`` parameter of the collation document to ``1`` or ``2``
causes the server to disregard case in the query filter. The following example
uses a case-insensitive query filter to delete all records in which the
``cuisine`` field matches ``French``.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->recipes;

   $collection->deleteMany(
       ['cuisine' => 'French'],
       [
           'collation' => [
               'locale' => 'en_US',
               'strength' => 1,
           ],
       ]
   );

After the above operation runs, the documents with ``_id`` values of ``2`` and
``4`` are deleted from the collection.

Aggregation
~~~~~~~~~~~

To use collation with an :phpmethod:`aggregate()
<MongoDB\\Collection::aggregate>` operation, specify a collation in the
aggregation options.

The following aggregation example uses a collection called ``names`` and groups
the ``first_name`` field together, counts the total number of results in each
group, and sorts the results by German phonebook order.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->recipes;

   $cursor = $collection->aggregate(
       [
           ['$group' => ['_id' => '$first_name', 'name_count' => ['$sum' => 1]]],
           ['$sort' => ['_id' => 1]],
       ],
       [
           'collation' => ['locale' => 'de@collation=phonebook'],
       ]
   );