How to Iterate Through Columns of a Pandas DataFrame
- Use the getitem ([]) Syntax to Iterate Over Columns in Pandas DataFrame
- Use dataframe.items() (Formerly dataframe.iteritems()) to Iterate Over Columns in Pandas DataFrame
- Use enumerate() to Iterate Over Columns in Pandas DataFrame
DataFrames can be very large, with hundreds of rows and columns. We often need to iterate over the columns of a DataFrame and operate on each one individually, for example to run a regression on each column.
We can use a for loop to iterate over the columns of a DataFrame. The basic syntax of the for loop is given below:
for value in sequence:
    # Body of the loop
Several approaches work for running a for loop over a DataFrame: the getitem syntax ([]), the dataframe.items() function (called dataframe.iteritems() before pandas 2.0), the enumerate() function, and the positional index of a DataFrame.
Use the getitem ([]) Syntax to Iterate Over Columns in Pandas DataFrame
Iterating over a DataFrame directly yields its column labels, so we can pass each label to the getitem syntax ([]) to retrieve the corresponding column. For example:
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)
print(df)
print("------------------")
# Iterating over a DataFrame yields its column labels
for column in df:
    print(df[column].values)
Output:
    a  b   c   d
0  10  6   7   8
1   1  9  12  14
2   5  8  10   6
------------------
[10  1  5]
[6 9 8]
[ 7 12 10]
[ 8 14  6]
The values attribute (accessed without parentheses) returns the column's elements as a NumPy array.
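To make the return type concrete, the following minimal sketch compares values with two related Series methods, to_numpy() and tolist(). The variable names (col, as_array, as_array_method, as_list) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

col = df["a"]                      # a pandas Series
as_array = col.values              # NumPy array; values is an attribute, not a call
as_array_method = col.to_numpy()   # the same data through the method form
as_list = col.tolist()             # a plain Python list, if a list is really needed

print(type(col).__name__, type(as_array).__name__, type(as_list).__name__)
```

If a list is what the rest of the code expects, tolist() is probably the more direct choice than converting the array afterwards.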
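Use dataframe.items() (Formerly dataframe.iteritems()) to Iterate Over Columns in Pandas DataFrame
Pandas provides the dataframe.items() function, which iterates over a DataFrame and yields each column name together with its content as a Series. The same behavior was previously also available as dataframe.iteritems(), which was deprecated in pandas 1.5 and removed in pandas 2.0.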
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)
# items() replaces iteritems(), which was removed in pandas 2.0
for colname, colval in df.items():
    print(colname, colval.values)
Output:
a [10  1  5]
b [6 9 8]
c [ 7 12 10]
d [ 8 14  6]
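Because each column arrives as a Series, the loop body can call Series methods on it directly. The sketch below collects a per-column mean into a dictionary; the name column_means is hypothetical, and the mean is only one example of an operation that could go here.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

column_means = {}
for colname, colval in df.items():
    # colval is a pandas Series, so methods such as mean() are available
    column_means[colname] = colval.mean()

print(column_means)
```

For a simple aggregate like this, df.mean() would give the same numbers without a loop; the explicit loop is mainly useful when the per-column work is more involved.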
Use enumerate() to Iterate Over Columns in Pandas DataFrame
Calling enumerate() on a DataFrame yields pairs of a positional index and a column label, which we can loop over.
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)
# index is the column's position; colname is its label
for index, colname in enumerate(df):
    print(index, df[colname].values)
Output:
0 [10  1  5]
1 [6 9 8]
2 [ 7 12 10]
3 [ 8 14  6]
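The introduction also mentions iterating by the index of a DataFrame. One way this could look, assuming positional access through iloc is what is meant, is sketched below. The list named collected is hypothetical and exists only to keep the results.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

collected = []
for position in range(df.shape[1]):   # df.shape[1] is the number of columns
    label = df.columns[position]      # label at this position
    column = df.iloc[:, position]     # column selected by position, not by label
    collected.append((position, label, column.tolist()))
    print(position, label, column.tolist())
```

Positional access can be handy when column labels are duplicated or unknown, since df[label] would then be ambiguous.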
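Any of the above methods can be used to iterate over a DataFrame, and the loop body can run an operation such as a regression on each column. For example, we can treat the last column, d, as the independent variable and run an OLS regression with each column in turn as the dependent variable, as shown in the example below. Note that the loop also visits d itself, and that the constant column is added to the dependent side, so each result reports two coefficients on d, both fitted without an intercept: one for the column of ones and one for the data column.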
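import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)
for column in df:
    Y = df["d"]  # independent variable (regressor)
    X = df[column]  # dependent variable for this iteration
    X = sm.add_constant(X)  # adds a column of ones next to the data column
    model = sm.OLS(X, Y)  # sm.OLS takes (endog, exog), so X is regressed on Y
    results = model.fit()
    print(results.params)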
Output:
          0         1
d  0.094595  0.418919
          0     1
d  0.094595  0.75
          0         1
d  0.094595  0.959459
          0    1
d  0.094595  1.0
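For comparison, a more conventional setup fits each of the other columns against d with an intercept term. The sketch below is an assumption about how such a loop might be written, not the method used above: it swaps statsmodels for numpy.polyfit with degree 1, skips d itself, and stores results in a hypothetical dictionary named fits.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

x = df["d"].to_numpy(dtype=float)        # independent variable
fits = {}
for column in df.columns.drop("d"):      # every column except the regressor
    y = df[column].to_numpy(dtype=float)  # dependent variable for this iteration
    # A degree-1 polynomial fit is a least-squares line: y ~ slope * x + intercept
    slope, intercept = np.polyfit(x, y, deg=1)
    fits[column] = (slope, intercept)
    print(column, slope, intercept)
```

With only three rows, these fits are purely illustrative; the point is the loop structure, which carries over unchanged to larger DataFrames.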
Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.