Introduction to MongoDB $substrCP Operator
$substrCP is a string aggregation operator in MongoDB that extracts a substring from a string, measuring positions and lengths in Unicode code points. A code point is the unique number that Unicode assigns to each character.
Syntax
The syntax of the $substrCP operator is as follows:
{ $substrCP: [ <string>, <startingIndex>, <length> ] }
- <string>: The string from which to extract the substring.
- <startingIndex>: The zero-based code point index at which the substring starts. It must be a non-negative integer.
- <length>: The number of code points to extract. It must be a non-negative integer. All three arguments are required; to extract through the end of the string, pass a length at least as large as the number of remaining code points.
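To make the indexing rules concrete, here is a minimal sketch in plain JavaScript that models how the three arguments interact. The helper name substrCP is hypothetical and is not part of any MongoDB driver. It only approximates the operator by slicing an array of code points, and it does not reproduce the server's argument validation.

```javascript
// Hypothetical helper that models $substrCP in plain JavaScript.
// Array.from iterates a string by Unicode code point, not by UTF-16 unit.
function substrCP(str, startingIndex, length) {
  const codePoints = Array.from(str);
  return codePoints.slice(startingIndex, startingIndex + length).join("");
}

console.log(substrCP("John Doe", 0, 2));        // Jo
console.log(substrCP("Apple iPhone 13", 1, 3)); // ppl
console.log(substrCP("abc", 1, 10));            // bc (length past the end is clipped)
```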
Use Cases
The $substrCP operator is commonly used in the following scenarios:
- Extracting a part of a string, such as extracting the date and time from an email subject.
- Extracting specific characters outside the ASCII range, such as a particular emoji from a string of emojis.
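The emoji case is where counting by code point matters most. The sketch below, in plain JavaScript rather than an aggregation pipeline, shows that one emoji can occupy two UTF-16 units but only one code point, and that an emoji with a skin-tone modifier is two code points. The function name codePointCount and the sample strings are made up for illustration.

```javascript
// Count code points the way a code-point-based operator would see them.
// codePointCount is a hypothetical helper, not a MongoDB API.
function codePointCount(str) {
  return Array.from(str).length;
}

const thumbsUp = "\u{1F44D}";               // thumbs-up emoji
const tonedThumbsUp = "\u{1F44D}\u{1F3FD}"; // thumbs-up plus a skin-tone modifier

console.log(thumbsUp.length);               // 2 (UTF-16 code units)
console.log(codePointCount(thumbsUp));      // 1
console.log(codePointCount(tonedThumbsUp)); // 2
```

Under this model, a length of 1 applied to the modified emoji would return only the base emoji, so the length may need to account for modifier code points.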
Examples
Here are two examples of using the $substrCP operator.
Example 1
Assume there is a collection called user that stores user information, including each user’s full name in the name field. We need the first two characters of each user’s first name, which is also the start of the name field. We can use the following aggregation pipeline:
db.user.aggregate([
  {
    $project: {
      firstName: { $substrCP: ["$name", 0, 2] }
    }
  }
])
In this aggregation pipeline, the $project stage reshapes each document so that it contains only the firstName field, plus _id, which is included by default. Inside $project, the $substrCP operator extracts the first two characters of the name field as the value of firstName.
Assume the collection contains the following two documents:
{ "_id": 1, "name": "John Doe" }
{ "_id": 2, "name": "Jane Smith" }
Using the above aggregation pipeline, we get the following results:
{ "_id": 1, "firstName": "Jo" }
{ "_id": 2, "firstName": "Ja" }
Example 2
Assume there is a collection called product that stores product information, including the name and price of each product. Now we need three characters of each product name, starting from the second character. We can use the following aggregation pipeline:
db.product.aggregate([
  {
    $project: {
      namePrefix: { $substrCP: ["$name", 1, 3] }
    }
  }
])
In this aggregation pipeline, the $project stage again reshapes each document so that it contains only the namePrefix field, plus the default _id. Inside $project, the $substrCP operator starts at index 1, the second character of the name field, and extracts three characters as the value of namePrefix.
Assume the collection contains the following two documents:
{ "_id": 1, "name": "Apple iPhone 13", "price": 999 }
{ "_id": 2, "name": "Samsung Galaxy S21", "price": 799 }
Using the above aggregation pipeline, we get the following results:
{ "_id": 1, "namePrefix": "ppl" }
{ "_id": 2, "namePrefix": "ams" }
Conclusion
The $substrCP operator is a string aggregation operator in MongoDB that extracts a substring from a string. Unlike the $substrBytes operator, which counts UTF-8 bytes, $substrCP counts Unicode code points, so it handles multibyte characters correctly. This makes $substrCP the safer choice whenever strings may contain non-ASCII text.
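As a rough illustration of the difference between the two units, the plain JavaScript sketch below measures the same string in UTF-8 bytes and in code points. The helper names utf8ByteLength and codePointLength are hypothetical, and the sample word is chosen only for illustration.

```javascript
// Compare the two units of measurement for the same string.
// Both helpers are hypothetical and only model how lengths are counted.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

function codePointLength(str) {
  return Array.from(str).length;
}

const word = "caf\u00E9"; // "café" with a precomposed é, which takes 2 bytes in UTF-8

console.log(utf8ByteLength(word));  // 5
console.log(codePointLength(word)); // 4
```

A byte range that ends inside the é would split the character, whereas a code point range cannot.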