How to Insert Pandas Data Frame Into MongoDB Using PyMongo
MongoDB is an open-source, document-oriented database that stores and queries data as flexible, JSON-like documents. It uses a dynamic, schemaless data model, and its queries are themselves written as JSON-like documents.
MongoDB works well as a backend database for applications, such as web apps and APIs, that need fast access to changing data and whose deployments vary over time.
A Pandas data frame is a Python data structure used for data analysis and manipulation. It organizes data in rows and columns, like a table in Excel or a database. This tutorial explains how to insert Pandas data frames into MongoDB using PyMongo.
Insert Pandas Data Frame Into MongoDB Using PyMongo
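To insert the pandas data frame into MongoDB, we need the Python libraries below. The json module is part of the Python standard library, so only pandas and pymongo have to be installed.
- pandas
  PS C:\> pip install pandas
- pymongo
  PS C:\> pip install pymongo
- json (ships with Python, no installation needed)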
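Before going further, it can help to confirm that the modules are importable. The snippet below is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and not part of pandas or PyMongo.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# json ships with Python; pandas and pymongo are installed with pip
required = ["json", "pandas", "pymongo"]
missing = missing_modules(required)
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All required modules are available")
```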
Let’s create a client by running the code below.
Example Code (saved in demo.py):
from pymongo import MongoClient


def create_connection():
    connection = None
    try:
        connection = MongoClient("mongodb://localhost:27017/")
        print("Connection made!!")
    except Exception as e:
        print(e)
    return connection


client = create_connection()
From the pymongo package, we import the MongoClient class. The create_connection() function above uses that class to create a connection to the MongoDB server running locally on port 27017.
The function returns the connection, which we store in client. Let’s run the code below, which gets a database named companyDB and stores it in db.
Example Code (saved in demo.py):
def create_database(client, db_name):
    db = None
    try:
        db = client[db_name]
        print(f"Database {db_name} created!!")
    except Exception as e:
        print(e)
    return db


db_name = "companyDB"  # name of your database
db = create_database(client, db_name)
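The function create_database() takes client and db_name as arguments and returns the database with that name, which we store in db. MongoDB creates databases lazily, so companyDB does not actually exist on the server until data is first written to it. In case of any error, this function prints the Exception without breaking the program.
Now, let’s run the code below to create a collection.
Example Code (saved in demo.py):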
def create_collection(db, collection_name):
    collection = None
    try:
        collection = db[collection_name]
        print(f"Collection {collection_name} created!!!")
    except Exception as e:
        print(e)
    return collection


collection_name = "startups"  # name of your collection
collection = create_collection(db, collection_name)
The create_collection() function above gets a collection with the specified name in the provided database. This is the collection into which we will insert the Pandas data frame.
The code below inserts the Pandas data frame into that collection.
Example Code (saved in demo.py):
import json
import pandas as pd


def insert_records(collection, records):
    rows = None
    try:
        rows = collection.insert_many(records)
        print(f"{len(rows.inserted_ids)} records added successfully")
    except Exception as e:
        print(e)
    return rows


df_file = "50_Startups.csv"
df = pd.read_csv(df_file)
# Transpose so that each row becomes one JSON object, then keep only those objects
records = list(json.loads(df.T.to_json()).values())
insert_records(collection, records)
To insert the pandas data frame into MongoDB, we first read it using the pandas library. MongoDB stores JSON-like documents, and PyMongo expects each document as a Python dictionary, so we convert the data frame with the to_json() function and parse the result with json.loads(). Transposing the data frame first (df.T) makes each row a separate document.
The function insert_records() takes the collection and the converted data frame records as arguments and inserts the records into the collection. We use the insert_many() function to insert multiple records at once.
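As an alternative to the JSON round trip, pandas can produce a list of row dictionaries directly with to_dict("records"). The sketch below is one possible variant, not the method used in demo.py. The helper name frame_to_records and the sample data are made up for illustration, and replacing missing values with None is an assumption about how you want them stored.

```python
import pandas as pd


def frame_to_records(df):
    """Convert a data frame to a list of dicts, one dict per row."""
    rows = df.to_dict("records")
    # Replace missing values (NaN) with None so they are stored as null
    return [
        {key: (None if pd.isna(value) else value) for key, value in row.items()}
        for row in rows
    ]


# Hypothetical sample data standing in for the CSV file
sample = pd.DataFrame(
    {"State": ["New York", "California"], "Profit": [192261.83, None]}
)
records = frame_to_records(sample)
print(records)
```

The resulting list can be passed to insert_records(collection, records) in the same way as the records built with json.loads().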
Finally, now that we have inserted the data frame into the database, we need to close the connection by running the code below.
Example Code (saved in demo.py):
# Close Connection
def close_connection(client):
    if client:
        client.close()
        print("Connection closed!!")


close_connection(client)
The function close_connection() uses the client.close() function to close the connection if it exists.
Now, we run the Python file demo.py as:
PS C:\> python demo.py
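Output (printed on console):
The script prints one status message per step: the connection, the database, the collection, the number of records added, and the closed connection.
The inserted data frame can then be viewed in the startups collection of the companyDB database.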
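To check the result from Python, the stored documents can be read back into a data frame. The sketch below assumes an open collection object such as the one returned by create_collection(). The helper name collection_to_frame and the default limit of 5 are illustrative choices.

```python
import pandas as pd


def collection_to_frame(collection, limit=5):
    """Read up to `limit` documents from `collection` into a data frame."""
    # The projection {"_id": 0} leaves out MongoDB's generated _id field
    cursor = collection.find({}, {"_id": 0}).limit(limit)
    return pd.DataFrame(list(cursor))
```

Calling collection_to_frame(collection) before close_connection(client) returns the first few inserted rows.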