One to Many Relationship
One to many relationship (1:N)
The 1:N relationship describes a relationship where one side can have more than one relationship
while the reverse relationship can only be single sided.
An example is a Post
where a blog might have many Comments
but a Comment
is only related to a single Post
.
Embedded document pattern
Embedding connected data in a single document can reduce the number of read operations required to obtain data. In general, you should structure your schema so your application receives all of its required information in a single read operation. If the number of embedded document(s)(child documents) are limited to fewer numbers then this can be better options.
For example, User To Accounts
, User To Addresses
, Post To Comments
, can be modeled to embedded document pattern in one to many relationship.
{
"id": "1",
"name": "John Doe",
"accounts": [
{
"bank": "123 fake bank",
"accountNumber": "1231308239",
"accountType": "saving",
"address":"123 fake street",
"createdDate":"2020-06-06",
"valid" :true,
"balance": "100.00"
},
{
"bank": "abc fake bank",
"accountNumber": "1231308239",
"accountType": "saving",
"address":"123 fake street",
"createdDate":"2020-07-06",
"valid" :true,
"balance": "100.00"
}
]
}
However, there are 3 potential problems associated with this approach that one should be aware of.
- The embedded documents might grow larger and consequently growing main document size. NoSql DBs have limit on document size for example 16MB in case of Mongodb, 1 MB in case of google fire store.
- This could result in slow write performance as the embedded documents grows significantly larger in number.
- The third problem is exposed when one tries to perform pagination on the embedded documents
Subset Pattern
In case, we don't need whole data of embedded documents, we can store only the required part of the document which is frequently fetched. For example, we might only need
bank name
andaccount number
while fetchinguser details
. In this caseAccounts
can be a separate collections with all the account related details andUsers
collection can have subset ofaccounts
inside each user details.{
"id":"1",
"name":"John Doe",
"accounts": [
{
"bank":"123 fake bank",
"accountNumber":"1231308239"
},
{
"bank":"abc fake bank",
"accountNumber":"1231308239"
}
]
}
When to use subset pattern
Some cases when properties rarely changes, for example name , dateOfBirth etc , we can denormalize these properties across multiple collections. In the above given example, bank name and account number will rarely changes and we can duplicate these properties across multiple collections to avoid joins and faster fetch.
Document references
This is normalised case of structuring data by adding reference or id of the required document inside other document as a foreign key. This is recommended only when the document being referenced is highly changing. This will impact read performance.
{
"id": "1",
"name": "John Doe",
"accounts": [1234,345334,785124132]
}
Linking
The another approach is to link Accounts
to the User
using a more traditional foreign key.
{
"id": "1",
"name": "John Doe",
"address":"fake street 123"
}
{
"id": "1",
"userId: "1",
"bank": "123 fake bank",
"accountNumber": "1231308239"
},
{
"id": "2",
"userId: "1",
"bank": "abc fake bank",
"accountNumber": "1231308239"
}
An advantage this model has is that additional Accounts
will not grow the original User
document, making it less likely that the applications will run in the maximum document size limit.
It’s also much easier to return paginated documents as the application can slice and dice the documents more easily.
On the downside if we have 1000 accounts on a user, we would need to retrieve all 1000 documents causing a lot of reads from the database.
Bucketing
The third approach is a hybrid of the two above. Basically, it tries to balance the rigidity of the embedding strategy with the flexibility of the linking strategy.
Lets take example of Post
and Comments
. We will split the comments into buckets with a maximum of 50 comments in each bucket.
{
_id: 1,
title: "An awesome blog",
url: "http://awesomeblog.com",
text: "This is an awesome blog we have just started"
}
{
blog_entry_id: 1,
page: 1,
count: 50,
comments: [{
name: "Peter Critic",
created_on: ISODate("2014-01-01T10:01:22Z"),
comment: "Awesome blog post"
}, ...]
}
{
blog_entry_id: 1,
page: 2,
count: 1,
comments: [{
name: "John Page",
created_on: ISODate("2014-01-01T11:01:22Z"),
comment: "Not so awesome blog"
}]
}
The main benefit of using buckets in this case is that we can perform a single read to fetch 50 comments at a time, allowing for efficient pagination.
When to use bucketing
When you have the possibility of splitting up your documents into discrete batches, it makes sense to consider bucketing to speed up document retrieval.
Typical cases where bucketing is appropriate are ones such as bucketing data by hours, days or number of entries on a page (like comments pagination).