Group Data with the Attribute Pattern
The attribute pattern is a schema design pattern that helps organize documents with many similar fields, especially when the fields share common characteristics. If you need to sort or query on these subsets of similar fields, the attribute pattern can optimize your schema. It simplifies indexing by consolidating multiple similar fields in each document into an array of key-value subdocuments. Rather than creating a separate index for each similar field, you create fewer indexes, which can make your queries faster and simpler to write.
Use the attribute pattern if any of the following conditions apply to your collection:
You have large documents containing many similar fields with shared characteristics that you want to sort or query.
Only a small subset of the documents contains the fields required for sorting.
About this Task
Consider a collection of movies. Typical documents in the collection may look like this:
db.movies.insertOne( {
   "_id": 1,
   "title": "Star Wars",
   "runtime": 121,
   "directors": [ "George Lucas" ],
   release_US: ISODate("1977-05-20T01:00:00+01:00"),
   release_France: ISODate("1977-10-19T01:00:00+01:00"),
   release_Italy: ISODate("1977-10-20T01:00:00+01:00"),
   release_UK: ISODate("1977-12-27T01:00:00+01:00")
} )
Note the multiple release date fields for different countries in the above document. If you want to search for a release date, you must look across many fields at once. Without the attribute pattern, you would need to create several indexes on the movies collection to quickly perform searches on release dates:
db.movies.createIndex({ release_US: 1 });
db.movies.createIndex({ release_France: 1 });
db.movies.createIndex({ release_Italy: 1 });
db.movies.createIndex({ release_UK: 1 });
However, each index consumes storage and must be updated on every write, so additional indexes can slow down performance, particularly for write operations. The following procedure demonstrates how you can apply the attribute pattern to the movies collection by moving the release dates into a single array, reducing indexing needs.
Steps
Group subsets of data into one array.
Reorganize the schema to turn the various release date fields into an array of key-value pairs:
db.movies.insertOne( {
   "_id": 1,
   "title": "Star Wars",
   "runtime": 121,
   "directors": [ "George Lucas" ],
   releases: [
      { location: "USA", date: ISODate("1977-05-20T01:00:00+01:00") },
      { location: "France", date: ISODate("1977-10-19T01:00:00+01:00") },
      { location: "Italy", date: ISODate("1977-10-20T01:00:00+01:00") },
      { location: "UK", date: ISODate("1977-12-27T01:00:00+01:00") }
   ]
} )
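If the collection already holds documents in the flat shape, the reorganization can be done programmatically. The following is a minimal sketch in plain JavaScript, not a MongoDB API: toAttributePattern is a hypothetical helper name, and plain Date objects stand in for the ISODate values used in mongosh. It keeps the field-name suffix as the location value, so release_US becomes "US" rather than "USA".

```javascript
// Hypothetical helper (not a MongoDB API): moves every field whose name
// starts with `prefix` into a single `releases` array of
// { location, date } pairs and copies all other fields unchanged.
function toAttributePattern(doc, prefix = "release_") {
  const converted = {};
  const releases = [];
  for (const [field, value] of Object.entries(doc)) {
    if (field.startsWith(prefix)) {
      // The part of the field name after the prefix becomes the key.
      releases.push({ location: field.slice(prefix.length), date: value });
    } else {
      converted[field] = value;
    }
  }
  converted.releases = releases;
  return converted;
}

// Plain Date objects stand in for ISODate values here.
const flatMovie = {
  _id: 1,
  title: "Star Wars",
  release_US: new Date("1977-05-20T01:00:00+01:00"),
  release_France: new Date("1977-10-19T01:00:00+01:00"),
};

const movie = toAttributePattern(flatMovie);
console.log(JSON.stringify(movie, null, 2));
```

The helper only reshapes an in-memory object. Writing the converted document back to the collection is left out of this sketch.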
Results
If a document has multiple fields that track the same or similar characteristics, the attribute pattern avoids the need to create indexes on each similar field. By consolidating similar fields into an array and creating an index on that array, you reduce the total number of required indexes and improve query performance.
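As an illustration of the single index described above, the following sketch builds one compound index specification on the releases array and a matching query filter. The helper name releasedBefore and the cutoff date are assumptions made for this example. The index and query commands run only when a mongosh db object is available; otherwise the snippet just prints the objects it built.

```javascript
// One compound multikey index over the key and the value of each element
// in the releases array.
const releasesIndexKeys = { "releases.location": 1, "releases.date": 1 };

// Hypothetical helper: builds a filter for movies released in `location`
// before `cutoff`. $elemMatch requires both conditions to hold for the
// same array element.
function releasedBefore(location, cutoff) {
  return {
    releases: { $elemMatch: { location: location, date: { $lt: cutoff } } },
  };
}

const filter = releasedBefore("USA", new Date("1978-01-01T00:00:00Z"));

// `db` exists only inside mongosh; elsewhere the snippet prints the objects.
if (typeof db !== "undefined") {
  db.movies.createIndex(releasesIndexKeys);
  printjson(db.movies.find(filter).toArray());
} else {
  console.log(JSON.stringify({ releasesIndexKeys, filter }, null, 2));
}
```

This one index takes the place of the four single-field indexes on release_US, release_France, release_Italy, and release_UK, and it also covers any location added to the array later.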
Other Use Cases
The attribute pattern can be helpful when your document describes the characteristics of items. Some products, such as clothing, may have sizes expressed as small, medium, or large. Other products in the same collection may be described by volume, while others may be described by physical dimensions or weight.
For example, consider a collection of bottles of water. A document that does not use the attribute pattern may look like this:
db.bottles.insertOne( {
   "_id": 1,
   "volume_ml": 500,
   "volume_ounces": 12
} )
The following code applies the attribute pattern to the bottles collection:
db.bottles.insertOne( {
   "_id": 1,
   specs: [
      { k: "volume", v: 500, u: "ml" },
      { k: "volume", v: 12, u: "ounces" }
   ]
} )
Since the volume_ml and volume_ounces fields in the first document contain similar information, the schema above consolidates them into one field, specs. The specs field groups together information regarding measurement specifications of a given water bottle, where the k field specifies what is being measured, v specifies the value, and u specifies the unit of measurement.
The attribute pattern also allows you to group together similar fields with different names. By specifying attributes through key-value pairs, like the k field, which specifies what is being measured, you can store a wider variety of similar fields in one array, minimizing the number of indexes you need to efficiently query your data.
For example, consider this document in the bottles collection that does not use the attribute pattern. This document stores specifications on water bottle volume and height:
db.bottles.insertOne( {
   "_id": 1,
   "volume_ml": 500,
   "volume_ounces": 12,
   "height_inches": 8
} )
The following code applies the attribute pattern to the document. It groups the volume_ml, volume_ounces, and height_inches fields all into the specs array:
db.bottles.insertOne( {
   "_id": 1,
   specs: [
      { k: "volume", v: 500, u: "ml" },
      { k: "volume", v: 12, u: "ounces" },
      { k: "height", v: 8, u: "inches" }
   ]
} )
Using key-value pairs, like k, v, and u, allows for more flexibility in what fields you can add to the array. The more fields you consolidate into the array, the fewer indexes you need to create and maintain, which reduces the overhead each write incurs.
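To make the point about index count concrete, the sketch below defines one compound index on the specs array and builds filters for two different attributes that the same index can serve. The helper name specAtLeast and the threshold values are assumptions made for this example. It also assumes that v is stored as a number, because a numeric $gte comparison does not match string values.

```javascript
// One index on key, unit, and value serves queries on volume, height,
// or any attribute added to the specs array later.
const specsIndexKeys = { "specs.k": 1, "specs.u": 1, "specs.v": 1 };

// Hypothetical helper: filter for bottles whose attribute `k`, measured
// in unit `u`, is at least `min`. Assumes `v` is stored as a number.
function specAtLeast(k, u, min) {
  return { specs: { $elemMatch: { k: k, u: u, v: { $gte: min } } } };
}

const volumeFilter = specAtLeast("volume", "ml", 400);
const heightFilter = specAtLeast("height", "inches", 6);

// `db` exists only inside mongosh; elsewhere the snippet prints the objects.
if (typeof db !== "undefined") {
  db.bottles.createIndex(specsIndexKeys);
  printjson(db.bottles.find(volumeFilter).toArray());
  printjson(db.bottles.find(heightFilter).toArray());
} else {
  console.log(JSON.stringify({ specsIndexKeys, volumeFilter, heightFilter }, null, 2));
}
```

With the flat schema, the same two queries would need separate indexes on volume_ml and height_inches.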