Group Data with the Attribute Pattern

The attribute pattern is a schema design pattern that helps organize documents with many similar fields, especially when the fields share common characteristics. If you need to sort or query on these subsets of similar fields, the attribute pattern can optimize your schema. It creates easier document indexing by consolidating multiple similar fields per document into a key-value sub-document. Rather than creating multiple indexes on several similar fields, the attribute pattern enables you to create fewer indexes, making your queries faster and simpler to write.

Use the attribute pattern if any of the following conditions apply to your collection:

You have large documents containing many similar fields with shared characteristics that you want to sort or query.
A small subset of the documents contain the fields required for sorting.

About this Task

Consider a collection of movies. Typical documents in the collection may look like this:

db.movies.insertOne(
  {
    "_id": 1,
    "title": "Star Wars",
    "runtime": 121,
    "directors": ["George Lucas"],
    release_US: ISODate("1977-05-20T01:00:00+01:00"),
    release_France: ISODate("1977-10-19T01:00:00+01:00"),
    release_Italy: ISODate("1977-10-20T01:00:00+01:00"),
    release_UK: ISODate("1977-12-27T01:00:00+01:00")
  }
)

Note the multiple release date fields for different countries in the above document. If you want to search for a release date, you must look across many fields at once. Without the attribute pattern, you would need to create several indexes on the movies collection to quickly perform searches on release dates:

db.movies.createIndex({ release_US: 1 });
db.movies.createIndex({ release_France: 1 });
db.movies.createIndex({ release_Italy: 1 });
db.movies.createIndex({ release_UK: 1 });

However, indexes are expensive and can slow down performance, particularly for write operations. The following procedure demonstrates how you can apply the attribute pattern to the movies collection by moving the information subset of different release dates into an array, reducing indexing needs.

Steps

Group subsets of data into one array.

Reorganize the schema to turn the various release date fields into an array of key-value pairs:

db.movies.insertOne(
  {
     "_id": 1,
     "title": "Star Wars",
     "runtime": 121,
     "directors": ["George Lucas"],
     releases: [
       {
         location: "USA",
         date: ISODate("1977-05-20T01:00:00+01:00")
       },
       {
         location: "France",
         date: ISODate("1977-10-19T01:00:00+01:00")
       },
       {
         location: "Italy",
         date: ISODate("1977-10-20T01:00:00+01:00")
       },
       {
         location: "UK",
         date: ISODate("1977-12-27T01:00:00+01:00")
       }
     ]
  }
)

Create an index on the `releases` array.

You can then create one index on the releases array. This index optimizes queries on any of the release date fields for various countries.

db.movies.createIndex({ releases: 1 });

Results

If a document has multiple fields that track the same or similar characteristics, the attribute pattern avoids the need to create indexes on each similar field. By consolidating similar fields into an array and creating an index on that array, you reduce the total number of required indexes and improve query performance.

Other Use Cases

The attribute pattern can be helpful when your document describes the characteristics of items. Some products, such as clothing, may have sizes that are expressed in small, medium, or large. Other products in the same collection may be expressed in volume, while others may be expressed in physical dimensions or weight.

For example, consider a collection of bottles of water. A document that does not use the attribute pattern may look like this:

db.bottles.insertOne([
  {
    "_id": 1,
    "volume_ml": 500,
    "volume_ounces": 12
  }
])

The following code applies the attribute pattern to the bottles collection:

db.bottles.insertOne([
  {
    "_id": 1,
    specs: [
      { k: "volume", v: "500", u: "ml" },
      { k: "volume", v: "12", u: "ounces" },
    ]
  }
])

Since the volume_ml and volume_ounces fields in the first document contain similar information, the schema above consolidates them into one field, specs. The specs field groups together information regarding measurement specifications of a given water bottle, where the k field specifies what is being measured, v specifies the value, and u specifies the unit of measurement.

The attribute pattern also allows you to group together similar fields with different names. By specifying attributes through key-value pairs, like the k field, which specifies what is being measured, you can store a wider variety of similar fields into one array, minimizing the number of indexes you need to efficiently query your data.

For example, consider this document in the bottles collection that does not use the attribute pattern. This document stores specifications on water bottle volume and height:

db.bottles.insertOne([
  {
    "_id": 1,
    "volume_ml": 500,
    "volume_ounces": 12,
    "height_inches": 8
  }
])

The following code applies the attribute pattern to the document. It groups the volume_ml, volume_ounces, and height_inches fields all into the specs array:

db.bottles.insertOne([
  {
    "_id": 1,
    specs: [
      { k: "volume", v: "500", u: "ml" },
      { k: "volume", v: "12", u: "ounces" },
      { k: "height", v: "8", u: "inches" }
    ]
  }
])

Using key-value pairs, like k, v, and u, allows for more flexibility in what fields you can add to the array. The more fields you can consolidate into the array, the fewer indexes you need to make, maximizing query performance.

Back

Outlier Pattern

Polymorphic Data