MongoDB Maximum Document Size
This tutorial describes the default maximum size limit for a document stored in MongoDB. It also explains the alternative to use when the data exceeds that limit.
We will also learn how to use the default maximum size of a BSON document efficiently.
MongoDB Maximum Document Size
In MongoDB, documents (objects) are stored in BSON format. BSON (Binary JSON) is a binary serialization of JSON-like documents.
BSON extends JSON with representations for data types that are not part of JSON.
For instance, BSON has the Date and BinData types, which are not available in JSON. According to the MongoDB documentation, the size limit for a single BSON document is 16MB.
MongoDB caps the document size so that a single document cannot use an excessive amount of RAM or, during transmission, bandwidth. Remember that BSON documents can be nested up to 100 levels deep, where each array or object adds one level.
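The nesting rule can be checked before a document is sent to the server. The following is a minimal sketch for plain JSON-like values, not a MongoDB API: nestingDepth and the sample order object are hypothetical names, and the sketch assumes that a flat top-level document counts as level 1.

```javascript
// Nesting limit quoted in the text above.
const MAX_NESTING_LEVELS = 100;

// Counts nesting levels, treating every object or array as one level.
// Assumption: a flat top-level document counts as level 1.
function nestingDepth(value) {
  if (value === null || typeof value !== "object" || value instanceof Date) {
    return 0; // scalars (and dates) add no level
  }
  let deepestChild = 0;
  for (const child of Object.values(value)) {
    deepestChild = Math.max(deepestChild, nestingDepth(child));
  }
  return 1 + deepestChild;
}

// Hypothetical sample document: object -> array -> object -> array.
const order = { id: 1, items: [{ sku: "A1", tags: ["new", "sale"] }] };

console.log(nestingDepth(order)); // 4
console.log(nestingDepth(order) <= MAX_NESTING_LEVELS); // true
```

A check like this is cheap to run on the client, although the server remains the authority on what it accepts.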
Data keeps growing, so our data may eventually exceed the 16-megabyte size limit for a BSON document.
In that case, MongoDB provides the GridFS API to store data larger than 16MB.
What Is the GridFS API
GridFS is a MongoDB specification that we can use to store and access large files that exceed the BSON document limit (16MB), for instance, audio, video, or image files. It resembles a file system for storing files, but the data is stored in MongoDB collections.
The GridFS API divides the file into chunks and stores each chunk in a separate document. By default, every chunk is 255KB, except the last one, which is only as large as needed. GridFS uses two collections by default, fs.files and fs.chunks, which store a file’s metadata and its chunks, respectively.
Every chunk is identified by a unique _id field (an ObjectId), while the document in fs.files serves as the parent document. The files_id field in each fs.chunks document links the chunk to its parent.
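To make the chunk layout concrete, here is a rough in-memory sketch of how a file could be split into a parent document and its chunk documents. It does not talk to MongoDB and is not the driver's implementation: toGridFsLayout is a hypothetical helper, random UUID strings stand in for ObjectId values, and only a few of the fields that GridFS stores are shown.

```javascript
const crypto = require("crypto");

const DEFAULT_CHUNK_SIZE = 255 * 1024; // 255KB, the GridFS default chunk size

// Splits a Buffer into GridFS-style documents, in memory only.
// Hypothetical helper: a real driver does this for you and uses ObjectId
// values for _id; a random UUID string stands in for them here.
function toGridFsLayout(filename, data, chunkSize = DEFAULT_CHUNK_SIZE) {
  const fileDoc = {
    _id: crypto.randomUUID(),
    filename,
    length: data.length, // total size of the file in bytes
    chunkSize,
  };
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    chunkDocs.push({
      _id: crypto.randomUUID(),
      files_id: fileDoc._id, // links the chunk to its parent in fs.files
      n, // position of the chunk within the file
      data: data.subarray(offset, offset + chunkSize),
    });
  }
  return { fileDoc, chunkDocs };
}

// A 600,000-byte placeholder file needs two full chunks and one partial chunk.
const { fileDoc, chunkDocs } = toGridFsLayout("clip.mp4", Buffer.alloc(600000));
console.log(chunkDocs.length); // 3
console.log(chunkDocs.map((chunk) => chunk.data.length)); // [ 261120, 261120, 77760 ]
```

Reassembling the file amounts to sorting the chunks with a given files_id by n and concatenating their data.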
You can go through this article to understand the syntax for using GridFS.
Use Default BSON Document Size Limit Efficiently
The BSON document size limit (16MB) is generous. For instance, the whole uncompressed text of The War of the Worlds is only about 364KB (as HTML), but there are always exceptions.
If your data exceeds the limit, you can use the GridFS API that we discussed earlier or plan how to use the 16MB efficiently.
Consider a scenario where we want to develop an application, XYZ. The application needs four data types: Booleans, numbers, strings, and dates (represented as UNIX timestamps in milliseconds).
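With a 16MB size limit, a document has raw space for around two million 64-bit values (16MB divided by 8 bytes per value), which covers numbers and dates; Booleans are smaller still. BSON adds some overhead per field, so the practical count is lower.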
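The two-million figure is raw arithmetic, and a short calculation shows how per-element overhead changes it. The sketch below assumes the values are stored in a single BSON array of 64-bit integers, where each element costs one type byte, its index written as a null-terminated string key, and eight bytes of value. int64ArrayCapacity is a hypothetical helper, and the enclosing document and field name are ignored.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes

// Raw capacity: how many 8-byte values fit if nothing else is stored.
const rawInt64Capacity = Math.floor(MAX_BSON_BYTES / 8);
console.log(rawInt64Capacity); // 2097152

// Capacity of one BSON array of 64-bit integers (assumed layout, see above).
function int64ArrayCapacity(limitBytes) {
  let used = 4 + 1; // int32 length prefix + trailing 0x00 terminator
  let count = 0;
  while (true) {
    // type byte + index as decimal string + 0x00 + 8-byte value
    const elementBytes = 1 + String(count).length + 1 + 8;
    if (used + elementBytes > limitBytes) {
      return count;
    }
    used += elementBytes;
    count += 1;
  }
}

// Roughly half of the raw figure once the keys and type bytes are counted.
console.log(int64ArrayCapacity(MAX_BSON_BYTES));
```

Under these assumptions, a single array holds a little over one million values, so the limit is still large but the key names matter.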
Here, the string type values need special attention because every UTF-8 character occupies at least one byte: ASCII characters take one byte, while other characters take two to four. We need to optimize the size of all the columns containing string type values.
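The byte cost of a string can be measured directly in Node.js with Buffer.byteLength. The sample characters below are arbitrary and only illustrate the one-to-four-byte range; a BSON string additionally carries a 4-byte length prefix and a trailing null byte.

```javascript
// One sample character for each UTF-8 width: a, e-acute, euro sign, emoji.
const samples = ["a", "\u00e9", "\u20ac", "\u{1F600}"];

for (const sample of samples) {
  // Buffer.byteLength reports how many bytes the string needs in UTF-8.
  console.log(sample, Buffer.byteLength(sample, "utf8"));
}
// The byte counts printed are 1, 2, 3, and 4.
```

Note that String.prototype.length counts UTF-16 code units, not bytes, so it is not a reliable size estimate.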
We can try the following ways to decrease the size of a column having string type values.
- We can use the stringify() and zip() methods, as in zip(JSON.stringify(column.values));.
- We can create a dictionary and insert all unique string type values into the dictionary. Then, replace the string values with indexes. This approach is useful if we have many repeated string values in a field. This method will not help if someone wants to store a column of hashes, but they can use the GridFS API.
- We can also split the column into various chunks and save these chunks in some other documents linked to the main document.
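The first two approaches in the list above can be sketched in Node.js as follows. This is one possible reading, not the referenced article's code: zip() is not a built-in function, so Node's zlib.gzipSync is used in its place, and compressColumn, decompressColumn, dictionaryEncode, dictionaryDecode, and statusColumn are hypothetical names.

```javascript
const zlib = require("zlib");

// Approach 1: compress the JSON text of the column.
// gzipSync stands in for the zip() call mentioned in the text.
function compressColumn(values) {
  return zlib.gzipSync(JSON.stringify(values));
}

function decompressColumn(buffer) {
  return JSON.parse(zlib.gunzipSync(buffer).toString("utf8"));
}

// Approach 2: dictionary encoding.
// Each unique string is stored once; the column keeps only indexes.
function dictionaryEncode(values) {
  const dictionary = [];
  const positions = new Map(); // string -> index in dictionary
  const indexes = values.map((value) => {
    if (!positions.has(value)) {
      positions.set(value, dictionary.length);
      dictionary.push(value);
    }
    return positions.get(value);
  });
  return { dictionary, indexes };
}

function dictionaryDecode({ dictionary, indexes }) {
  return indexes.map((index) => dictionary[index]);
}

// Hypothetical column with many repeated values.
const statusColumn = ["shipped", "pending", "shipped", "shipped", "pending"];
const encoded = dictionaryEncode(statusColumn);
console.log(encoded.dictionary); // [ 'shipped', 'pending' ]
console.log(encoded.indexes); // [ 0, 1, 0, 0, 1 ]
```

The compressed Buffer could then be saved in a binary (BinData) field. Dictionary encoding pays off only when values repeat, which is why a column of unique hashes gains nothing from it.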
There is a reference article demonstrating all these approaches.