How to Check if Column Exists in Pandas
- Use the in Operator to Check if Column Exists in Pandas
- Use the not in Operator to Check if Column Exists in Pandas
This tutorial demonstrates ways to check whether a column exists in a Pandas dataframe in Python. We will use Python's in and not in operators to do that.
Use the in Operator to Check if Column Exists in Pandas
A dataframe is a structure that holds two-dimensional data along with its row and column labels. We can get the column labels through the DataFrame.columns attribute.
To check whether a column exists, we use the in operator. Before we begin, we need a dummy dataframe in Pandas to try these techniques on.
Here we create a dataframe of students' performance, with the column names Name, Promoted, and Marks.
import pandas as pd
# Creating dataframe
df = pd.DataFrame()
# Adding columns to the dataframe
df["Name"] = ["John", "Doe", "Bill"]
df["Promoted"] = [True, False, True]
df["Marks"] = [82, 38, 63]
# Getting the dataframe as an output
print(df)
The code gives the following output.
   Name  Promoted  Marks
0  John      True     82
1   Doe     False     38
2  Bill      True     63
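The column labels mentioned earlier can be inspected directly through the columns attribute. The snippet below is a minimal sketch that rebuilds the same student dataframe so that it runs on its own.

```python
import pandas as pd

# Rebuild the example dataframe so this snippet is self-contained
df = pd.DataFrame(
    {
        "Name": ["John", "Doe", "Bill"],
        "Promoted": [True, False, True],
        "Marks": [82, 38, 63],
    }
)

# df.columns is an Index of labels; list() turns it into a plain Python list
print(list(df.columns))  # ['Name', 'Promoted', 'Marks']
```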
Once the dataframe is ready, we can check whether it contains any items or is empty. There are two ways to do this: either we use the df.empty attribute that Pandas provides, or we check the length of the dataframe's index using len(df.index).
We have used the df.empty attribute in the example below.
if df.empty:
    print("DataFrame is empty!")
else:
    print("Not empty!")
Since we have inserted data into the columns, the output is Not empty!.
Not empty!
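The second approach mentioned above, checking the length of the index, could look like the sketch below. It rebuilds the same student dataframe so that it runs on its own, and the empty_df name is introduced here only for comparison; it is not part of the original example.

```python
import pandas as pd

# Rebuild the example dataframe so this snippet is self-contained
df = pd.DataFrame(
    {
        "Name": ["John", "Doe", "Bill"],
        "Promoted": [True, False, True],
        "Marks": [82, 38, 63],
    }
)

# A dataframe with no rows has an index of length 0
if len(df.index) == 0:
    print("DataFrame is empty!")
else:
    print("Not empty!")

# Hypothetical empty dataframe, for comparison only
empty_df = pd.DataFrame()
print(len(empty_df.index), empty_df.empty)  # 0 True
```

Note that len(df.index) only counts rows, while df.empty is also True for a dataframe that has rows in its index but no columns.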
Now, let's move on and check whether a column exists in the Pandas dataframe using the in operator. The code below shows the operator in action.
if "Promoted" in df:
    print("Yes, it does exist.")
else:
    print("No, it does not exist.")
The code gives the following output.
Yes, it does exist.
For more clarity, one can also write the condition as if "Promoted" in df.columns: instead of testing against df itself. Both forms check the column labels.
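Because the in operator on a dataframe looks at column labels, it does not search the cell values. The sketch below illustrates the difference on the same student data; the isin call is shown only as one possible way to look for a value.

```python
import pandas as pd

# Rebuild the example dataframe so this snippet is self-contained
df = pd.DataFrame(
    {
        "Name": ["John", "Doe", "Bill"],
        "Promoted": [True, False, True],
        "Marks": [82, 38, 63],
    }
)

# Both forms test the column labels
print("Promoted" in df)          # True
print("Promoted" in df.columns)  # True

# "John" is a cell value, not a column label
print("John" in df)              # False

# To look for a value, test the contents of a column instead
print(df["Name"].isin(["John"]).any())  # True
```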
Use the not in Operator to Check if Column Exists in Pandas
Let's see how to use the not in operator to perform the same check. It works the other way around: the condition is true when the column is missing, so the result is inverted.
Here is a sample of the not in operator at work.
if "Promoted" not in df.columns:
    print("Yes, it does not exist.")
else:
    print("No, it does exist.")
The code gives the following output.
No, it does exist.
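A common use of the not in check is to create a column only when it is missing. The following is a minimal sketch under that assumption; the Grade column and its "N/A" default are hypothetical and do not appear in the original data.

```python
import pandas as pd

# Rebuild the example dataframe so this snippet is self-contained
df = pd.DataFrame(
    {
        "Name": ["John", "Doe", "Bill"],
        "Promoted": [True, False, True],
        "Marks": [82, 38, 63],
    }
)

# "Grade" is a hypothetical column; add it with a default only if absent
if "Grade" not in df.columns:
    df["Grade"] = "N/A"

print(list(df.columns))  # ['Name', 'Promoted', 'Marks', 'Grade']
```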
We have seen how to do it for a single column in a dataframe. We can also check several columns at once by building a set of the column names and testing whether it is a subset of df.columns.
This is convenient when some code depends on a group of columns all being present.
Below is the code snippet to check multiple columns in a Pandas dataframe.
if set(["Name", "Promoted"]).issubset(df.columns):
    print("Yes, all of them exist.")
else:
    print("No")
The code gives the following output.
Yes, all of them exist.
The set can also be written as a literal using curly braces. The example below additionally negates the subset check.
if not {"Name", "Promoted"}.issubset(df.columns):
    print("Yes")
else:
    print("No")
To which the output will be:
No
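The subset test only answers yes or no. When it helps to know which columns are absent, one possible approach is to filter the required names with not in, as sketched below. The required list and the Grade name are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Rebuild the example dataframe so this snippet is self-contained
df = pd.DataFrame(
    {
        "Name": ["John", "Doe", "Bill"],
        "Promoted": [True, False, True],
        "Marks": [82, 38, 63],
    }
)

# Hypothetical list of required columns; "Grade" is not in the dataframe
required = ["Name", "Promoted", "Grade"]

# Keep the order of `required` while collecting the labels that are absent
missing = [col for col in required if col not in df.columns]

if missing:
    print("Missing columns:", missing)  # Missing columns: ['Grade']
else:
    print("Yes, all of them exist.")
```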
These are some of the ways to check for one or more columns in the data. The same checks work on real data instead of dummy data.
We only need to load the CSV file with the Pandas read_csv function. If Google Colab is used, import the files module from google.colab to upload a data file from a personal system during runtime.
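As a rough sketch of the same checks on CSV data, the example below reads from an in-memory string so that it runs without a file on disk. The csv_text content is a hypothetical stand-in for a real file; with an actual file, its path would be passed to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = "Name,Promoted,Marks\nJohn,True,82\nDoe,False,38\nBill,True,63\n"

# nrows=0 parses only the header row, which is enough to inspect the labels
header_only = pd.read_csv(io.StringIO(csv_text), nrows=0)
print("Marks" in header_only.columns)  # True

# Load the full data only when the needed columns are present
if {"Name", "Marks"}.issubset(header_only.columns):
    data = pd.read_csv(io.StringIO(csv_text))
    print(data.shape)  # (3, 3)
```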