NoSQL or document based databases let us store data without enforcing a schema. Instead of adding rows to tables, we store documents in collections. A document (similar to a row in SQL) stores key value pairs. Think of a document as JSON data.
Imagine a user document:
1
2
3
4
{
  "name": "Cody",
  "address": "Annecy"
}
We can have objects and arrays in our documents. Imagine another user document:
1
2
3
4
5
6
7
{
  "name": {
    "first": "Brian",
    "last": "Smith"
  },
  "address": "Annecy"
}
We would most likely store these documents in a collection named “users”. Collections are used to organize documents. We can think of them as directories.
/users/cody:
1
2
3
4
{
  "name": "Cody",
  "address": "Annecy"
}
/users/briansmith:
1
2
3
4
5
6
7
{
  "name": {
    "first": "Brian",
    "last": "Smith"
  },
  "address": "Annecy"
}
Notice that the two documents DO NOT have the same schema. NoSQL lets us do this because there is no schema enforced. Typically, documents should have the same schema to allow us to query them, but this is up to the developer.
Documents are identified by an id. Notice that the documents have no explicit “id” field. In SQL you would be required to identify a primary key in a table and every row must have a unique primary key. Similarly, in NoSQL every document must have a unique id, but we are not required to store the id in the document. Instead we we reference the document by it’s id.
For this course we will use Cloud Firestore from Firebase. Firestore is a NoSQL document database with several advantage for creating cloud based applications.
There is an existing Firestore DB with single “countries” collection. Here is an example of the /countries/CAN document that contains information about Canada:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
{
 "altSpellings": ["CA"],
 "area": 9984670,
 "borders": ["USA"],
 "bordersObject": {"USA": true},
 "callingCode": ["1"],
 "capital": "Ottawa",
 "cca2": "CA",
 "cca3": "CAN",
 "ccn3": "124",
 "cioc": "CAN",
 "currency": ["CAD"],
 "demonym": "Canadian",
 "geo": {"lat": 60, "lng": -95},
 "landlocked": false,
 "languages": {"eng": "English", "fra": "French"},
 "latlng": [60, -95],
 "name": {
   "common": "Canada",
   "native": {
     "eng": {"common": "Canada", "official": "Canada"},
     "fra": {"common": "Canada", "official": "Canada"}
   },
   "official": "Canada"
 },
 "region": "Americas",
 "subregion": "Northern America",
 "tld": [".ca"],
 "translations": {
   "cym": {"common": "Canada", "official": "Canada"},
   "deu": {"common": "Kanada", "official": "Kanada"},
   "fin": {"common": "Kanada", "official": "Kanada"},
   "fra": {"common": "Canada", "official": "Canada"},
   "hrv": {"common": "Kanada", "official": "Kanada"},
   "ita": {"common": "Canada", "official": "Canada"},
   "jpn": {"common": "カナダ", "official": "カナダ"},
   "nld": {"common": "Canada", "official": "Canada"},
   "por": {"common": "Canadá", "official": "Canadá"},
   "rus": {"common": "Канада", "official": "Канада"},
   "spa": {"common": "Canadá", "official": "Canadá"}
 }
}
Setup a new project to do queries (do not add this to our express.js project):
countries:
    mkdir countries
cd countries
npm init
Press enter in all of the prompts.
Install the firebase client library:
npm install firebase@8.10.0 --save
countries.js with the following:1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
'use strict';
const firebase = require('firebase');
// Initialize Firebase
firebase.initializeApp({
  apiKey: "AIzaSyAgTDJtAw1EAWlyXZNz0ZT-E7uKklMWIhw",
  projectId: "tetras-nosql",
});
// Access the firestore API
// https://firebase.google.com/docs/reference/js/firebase.firestore.Firestore
const firestore = firebase.firestore();
async function queryCountries() {
  try {
    // YOUR CODE HERE
  } catch(error) {
    console.log(error);
  } finally {
    // Delete the application to close the connections to the DB
    await firebase.app().delete();
  }
}
queryCountries();
The above code initializes the Firebase API to run queries in Firestore. The function queryCountries() is a function where you can add queries by accessing the firestore object.
As discussed above, Firestore documents are organized into collections (similar to a table in SQL). The collection method provides a reference to a collection. The CollectionReference contains several methods for accessing documents stored in that collection.
We can access a Firestore document by id with the doc() method of the CollectionReference. The doc() method returns a DocumentReference.
References, as their names imply, are a reference to a document in the DB. In order to read the contents of the document we have to fetch the document with the get() method. However, get() returns a promise since it is an asynchronous operation. We will use the await keyword access the resolved value of the promise The promise will resolve to a DocumentSnapshot which is the state of the document at the moment it was fetched from the DB.
Add the following code into your queryCountries() function (replacing // YOUR CODE HERE):
1
2
3
4
const countries = firestore.collection('countries');
const franceRef = countries.doc('FRA');
const franceSnapshot = await franceRef.get();
console.log(franceSnapshot.data());
Execute your countries.js script to see the results.
node .\countries.js
We could have replaced the above code with a single line. This is equivalent to the previous code snippet:
1
2
3
4
5
const franceSnapshot2 = await firestore
  .collection('countries')
  .doc('FRA')
  .get();
console.log(franceSnapshot2.data());
area in the FRA document?The DocumentSnapshot has some other interesting properties:
id - Contains the id of the document. Note that the data of the document and the id are not stored in the same place.exists - Indicates if the document exists. This is important because no error is thrown when we attempt to fetch a document that does not exist. Therefore when working with snapshots that were fetched by id, it is a good idea to check the exists flag.ref - The DocumentReference. In our example above, we already have a reference, so it is not very useful. However, if we had executed a query that returns multiple documents, the ref can be used in subsequent operations (such as deleting and updating).We can query a collection using the where() method of the CollectionReference. A simple query would look like this:
1
2
3
const european = await countries
  .where('subregion', '==', 'Western Europe')
  .get();
The first parameter to the where() function is the field or path to query on, followed by an operator, followed by the value to compare with.
Executing a query returns a QuerySnapshot. The QuerySnapshot returns an array of DocumentSnapshots that can access via the docs property. The docs property contains an array of DocumentSnapshots.
There are two other useful properties of a QuerySnapshot:
size - The number of documents in the snapshot.empty - True if there are no documents in the snapshot.Remember, the where() function can use a path to a nested field. For example, the path to the french language is 'languages.fra'.
There are several operators we can use in a where clause:
'<''<=''==''>''>=''array-contains'The 'array-contains' operator checks if an array contains a value.
See the Firestore documentation for a list of all supported operators.
The following query lists the names of all of the countries that are larger than 9000000 m2:
1
2
3
const bigCountries = await countries.where('area', '>', 9000000).get();
const bigNames = bigCountries.docs.map(d => d.data().name.common);
console.log(bigNames);
Where clauses can be combined to provide more specific queries. For example, to query all french speaking countries in Western Europe:
1
2
3
4
5
const frenchWesternEurope = await countries
  .where('languages.fra', '==', 'French')
  .where('subregion', '==', 'Western Europe')
  .get();
console.log(frenchWesternEurope.docs.map(d => d.data().name.common));
⚠️Warning!
The comparison operators '<', '<=', '>', '>=' and 'array-contains' require an index to be configured in Firestore when they are combined with the '==' operator. Indexes are configured by an administrator.
The operators '<', '<=', '>', and '>=' can be repeated with multiple calls to the where function, but only on the same field and 'array-contains' can only be included once in a query.
1
2
3
4
5
const frenchWesternEurope = await countries
  .where('area', '>', 100000)
  .where('subregion', '==', 'Western Europe')
  .get();
console.log(frenchWesternEurope.docs.size);
When writing queries, we can also order and limit the number of documents queried. For example we can fetch the first 3 smallest countries:
1
2
3
4
5
const smallest3 = await countries
  .orderBy('area')
  .limit(3)
  .get();
console.log(smallest3.docs.map(d => d.data().name.common));
In reverse order:
1
2
3
4
5
const biggest3 = await countries
  .orderBy('area', 'desc')
  .limit(3)
  .get();
console.log(biggest3.docs.map(d => d.data().name.common));
orderBy() statements can be repeated for more advanced ordering. Below we can order by subregion, then by area:
1
2
3
4
5
6
7
8
9
10
const largestByRegion = await countries
  .orderBy('subregion')
  .orderBy('area', 'desc')
  .limit(20)
  .get();
console.log(
  largestByRegion.docs.map(d =>
    `${d.data().subregion}, ${d.data().name.common}`
  )
);
Ordering can also be combined with where() criteria to further refine our query. For example, to query the largest countries in Western Europe:
1
2
3
4
5
6
const largestWesternEurope = await countries
  .where('subregion', '==', 'Western Europe')
  .orderBy('area', 'desc')
  .limit(10)
  .get();
console.log(largestWesternEurope.docs.map(d => d.data().name.common));
In a relational database, like SQL, we would organize the different types of objects into separate tables and define relationships between those tables. We can then query data and include the related data with SQL JOIN statements.
In NoSQL we organize our data into collections. However, we are unable to “join” data between several collections.
Normalizing data is the process of minimizing the redundant data. The advantage to Normalized data is that it is easier to maintain integrity and typically reduces the storage cost. This is typically what we do in a SQL database: many tables that are related with primary and foreign keys. This requires developers to write queries with JOIN statements.
De-normalizing is adding redundant data in order to simplify data access. In a NoSQL database, it is not possible to query data from multiple collections in the same query. Therefore, it is often necessary to duplicate data between collections to minimize the number of queries that are required to get the data of interest.
In order to de-normalize data, we must first understand the needs of our application.
Imagine an application that rates restaurants. We have 3 collections:
Users can:
/regions/:region that queries the countries DB  by subregion and lists the results ordered by size (area). Use an ordered html list <ol> to list the country names.A NoSQL database typically has the following advantages:
However, there are also some disadvantages: