# Select/create a db
> use training
test
switched to db training
# Create a collection
> db.createCollection("bigdata")
training1 }
{ ok:
# Insert documents
> use training
trainingfor (i=1;i<=200000;i++){print(i);db.bigdata.insert({"account_no":i,"balance":Math.round(Math.random()*1000000)})}
# Count of documents inserted
> db.bigdata.countDocuments()
training200000
# Let's time a query: the details of account number 58982
> db.bigdata.find({"account_no":58982}).explain("executionStats").executionStats.executionTimeMillis
training57
# Create an index on the field account_no
> db.bigdata.createIndex({"account_no":1})
training
account_no_1
# List all the indexes in the db
> db.bigdata.getIndexes()
training
[2, key: { _id: 1 }, name: '_id_', ns: 'training.bigdata' },
{ v:
{2,
v: 1 },
key: { account_no: 'account_no_1',
name: 'training.bigdata'
ns:
}
]
# Run a Query to test the time consumed
> db.bigdata.find({"account_no": 69271}).explain("executionStats").executionStats.executionTimeMillis
training0
# Delete the index we created above
> db.bigdata.dropIndex({"account_no":1})
training2, ok: 1 } { nIndexesWas:
Indexes
Indexes help quickly locate data without looking for it everywhere.
Indexing the fields that are frequently queried or used for sorting helps to improve query execution times. However, ensuring that the queries support appropriate indexes and avoid creating extra indexes is vital. This helps to impact the right performance and save storage space. Also, keep in mind that indexing is a continuous process; however, queries need to be monitored and updated based on the business requirements.
For example:
Until just a few years ago, telephone books were really popular. All the names were indexed by surname and first name. So, to find one person, you jump into their surname section. D for Doe, then you go to first names starting with J , and you can easily find John Doe.
Instead of scanning the whole collection, we create an index on the field ‘courseId’ in the course enrollment collection. ‘courseId : 1’ means store the index in ascending order.
Our index on ‘courseId’ will help find those documents more efficiently. But to organize the students by ‘studentId’ in ascending order, MongoDB will need to perform in-memory sort, which will not be efficient.
In this situation, we can change our index to be a compound index, which means indexing more than one field. Because the items in the index are organized in ascending order, once we find all documents with a matching ‘courseId’ of ‘1547’, they will all be already sorted by ‘studentId’ because of this index.
- Indexes in MongoDB are special data structures. They store the fields you are indexing, and a location where that document is stored on disk.
- MongoDB stores indexes in a tree form: a balanced tree to be precise.
- Imagine our last example: ‘courseId,’ being the first field in our compound index, is in ascending order.
- Under each tree node, you will have students in ascending order too.
- This makes finding documents much more efficient, whether it’s an equality or a range search.
- And if you are sorting on a field which is already indexed, MongoDB doesn’t need to sort it again.
Create Instance
- I’ll be using MongoDB on the cloud
- So we start by opening the MongoDB page in IDE
- Create an instance
Select DB
- We’ll be using the same db training that we created in the earlier examples
Create a Collection
- Create a new collection bigdata
Insert Document
- The code given below will insert 200000 documents into the ‘bigdata’ collection.
- Each document would have a field named account_no which is a simple auto increment number.
- And a field named balance which is a randomly generated number, to simulate the bank balance for the account
Count # of Documents
- Let’s verify the number of documents inserted
db.bigdata.countDocuments()
Time a Query
- Let us run a query and find out how much time it takes to complete.
- Let us query for the details of account number 58982
- We will make use of the explain function to find the time taken to run the query in milliseconds.
db.bigdata.find({"account_no":58982}).explain("executionStats").executionStats.executionTimeMillis
Create an Index
- Before you create an index, choose the field you wish to create an index on.
- It is usually the field that you query most
- Create an index on the field account_no.
db.bigdata.createIndex({"account_no":1})
List all the indexes
db.bigdata.getIndexes()
Time a Query
- Let us run a similar query and find out how much time it takes to complete.
- Let us query for the details of account number 69271
- We will make use of the explain function to find the time taken to run the query in milliseconds.
db.bigdata.find({"account_no": 69271}).explain("executionStats").executionStats.executionTimeMillis
Delete an Index
- Let’s delete the index we created above
db.bigdata.dropIndex({"account_no":1})