How to Apply Square Root Function on a Column of Pandas Data Frame
- Introduction to Square Root
- Apply Square Root Function on a Column of Pandas Data Frame
- Use .astype(int) to Determine Integer Square Roots in Pandas
This tutorial shows how to apply the square root function to a column of a Pandas data frame using the exponentiation operator, np.sqrt(), a lambda expression with transform(), and apply(). It also shows how to use .astype(int) to obtain integer square roots.
Introduction to Square Root
Before turning to the square root, it helps to recall what a square is and how it is calculated.
The square of a number is that number multiplied by itself; for instance, the square of 3 is 3 x 3 = 9.
The square of any number n is written with a superscript 2, which we can type as n^2. It has the following two properties:
- The square of a number can be a floating-point number or an integer.
- The square of a real number is never negative: the product of two negative numbers is positive, and the square of 0 is 0.
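Now we are ready for square roots. The square root of a number n is the value that gives n when multiplied by itself; it is written √n (or, equivalently, n^(1/2)). For example, √9 = 3 because 3^2 = 9. For a non-negative n, the square root of n^2 is n itself.
Square roots appear in many scientific and mathematical calculations.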
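As a quick sanity check of these definitions, the following sketch uses only Python's standard library. The variable names are illustrative and not part of the tutorial's data.

```python
import math

n = 3
square = n * n            # the square of 3
root = math.sqrt(square)  # math.sqrt always returns a float

# The exponent form n ** (1/2) gives the same value.
root_via_power = square ** 0.5

# The principal square root is non-negative, so the root of (-3) squared is 3.0.
root_of_negative_square = math.sqrt((-3) ** 2)

print(square, root, root_via_power, root_of_negative_square)
```

Note that the result is a float even when the input is a perfect square.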
With that background, let's see how to calculate square roots in Python, specifically how to apply the square root function to a column of a Pandas data frame.
Apply Square Root Function on a Column of Pandas Data Frame
We can apply the square root function in several ways; some of them are shown below. All of them need a data frame to work on, so we start with this one:
import pandas as pd
data = {
"years": [2020, 2021, 2022],
"teams": ["Bears", "Packers", "Lions"],
"wins": [25, 10, 6],
"losses": [5, 5, 16],
}
df = pd.DataFrame(data, columns=["years", "teams", "wins", "losses"])
df["wins+losses"] = df[["wins", "losses"]].sum(axis=1)
df
Here we have a dictionary of key-value pairs that is converted to a Pandas data frame using pd.DataFrame(), which takes the data and a list of column names as parameters.
Then we add a new column, wins+losses, containing the row-wise sum of the wins and losses columns. The resulting data frame looks like this:
| | years | teams | wins | losses | wins+losses |
|---|---|---|---|---|---|
| 0 | 2020 | Bears | 25 | 5 | 30 |
| 1 | 2021 | Packers | 10 | 5 | 15 |
| 2 | 2022 | Lions | 6 | 16 | 22 |
This data frame will be used in the following methods, where we will find the square roots of the wins, losses, and wins+losses columns.
Method 1: Use Exponentiation Operator to Calculate Square Root
Example Code:
import pandas as pd
data = {
"years": [2020, 2021, 2022],
"teams": ["Bears", "Packers", "Lions"],
"wins": [25, 10, 6],
"losses": [5, 5, 16],
}
df = pd.DataFrame(data, columns=["years", "teams", "wins", "losses"])
df["wins+losses"] = df[["wins", "losses"]].sum(axis=1)
df["sqrt(wins)"] = df[["wins"]] ** 0.5
df["sqrt(losses)"] = df[["losses"]] ** 0.5
df["sqrt(wins+losses)"] = df[["wins+losses"]] ** 0.5
df
Output:
| | years | teams | wins | losses | wins+losses | sqrt(wins) | sqrt(losses) | sqrt(wins+losses) |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Bears | 25 | 5 | 30 | 5.000000 | 2.236068 | 5.477226 |
| 1 | 2021 | Packers | 10 | 5 | 15 | 3.162278 | 2.236068 | 3.872983 |
| 2 | 2022 | Lions | 6 | 16 | 22 | 2.449490 | 4.000000 | 4.690416 |
The above code applies the exponentiation operator (**), also known as the power operator, to every value of the specified column at once.
We have already seen that the square root of a number n is written √n, which is equal to n^(1/2) and can be written as n ** 0.5 in Python. Here, n is replaced by each value of the specified column of the Pandas data frame.
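The examples in this tutorial select columns with double brackets, which yields a one-column DataFrame. A minimal sketch of the same operation on a Series (single brackets) is shown here, using a cut-down version of the data frame. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"wins": [25, 10, 6], "losses": [5, 5, 16]})

# Single brackets select a Series; double brackets select a one-column DataFrame.
wins_series = df["wins"]
wins_frame = df[["wins"]]

# ** is applied element-wise to the whole column without an explicit loop.
sqrt_wins = wins_series ** 0.5

# Series.pow() is the method form of the same operation.
same_as_pow = sqrt_wins.equals(wins_series.pow(0.5))

print(type(wins_series).__name__, type(wins_frame).__name__)
print(sqrt_wins.round(6).tolist())
```

Either selection style can be assigned to a new column; the Series form is the more common one when working with a single column.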
Method 2: Use np.sqrt() to Calculate Square Root
Example Code:
import pandas as pd
import numpy as np
data = {
"years": [2020, 2021, 2022],
"teams": ["Bears", "Packers", "Lions"],
"wins": [25, 10, 6],
"losses": [5, 5, 16],
}
df = pd.DataFrame(data, columns=["years", "teams", "wins", "losses"])
df["wins+losses"] = df[["wins", "losses"]].sum(axis=1)
df["sqrt(wins)"] = np.sqrt(df[["wins"]])
df["sqrt(losses)"] = np.sqrt(df[["losses"]])
df["sqrt(wins+losses)"] = np.sqrt(df[["wins+losses"]])
df
Output:
| | years | teams | wins | losses | wins+losses | sqrt(wins) | sqrt(losses) | sqrt(wins+losses) |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Bears | 25 | 5 | 30 | 5.000000 | 2.236068 | 5.477226 |
| 1 | 2021 | Packers | 10 | 5 | 15 | 3.162278 | 2.236068 | 3.872983 |
| 2 | 2022 | Lions | 6 | 16 | 22 | 2.449490 | 4.000000 | 4.690416 |
This code snippet uses the sqrt() function of the NumPy library, which takes an array-like input (here, a data frame column) and returns the square root of each element.
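To illustrate how np.sqrt() interacts with Pandas objects, here is a small sketch. It assumes a cut-down data frame with team names as the index, which is a choice made only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"wins": [25, 10, 6], "losses": [5, 5, 16]},
    index=["Bears", "Packers", "Lions"],
)

# NumPy ufuncs accept a Pandas Series and return a Series with the same index.
sqrt_wins = np.sqrt(df["wins"])

# Passing several numeric columns returns a DataFrame of the same shape.
sqrt_both = np.sqrt(df[["wins", "losses"]])

print(sqrt_wins)
print(sqrt_both)
```

Because the index is preserved, the result can be assigned straight back to the data frame as a new column.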
Method 3: Use the lambda Expression to Calculate Square Root
Example Code:
import pandas as pd
data = {
"years": [2020, 2021, 2022],
"teams": ["Bears", "Packers", "Lions"],
"wins": [25, 10, 6],
"losses": [5, 5, 16],
}
df = pd.DataFrame(data, columns=["years", "teams", "wins", "losses"])
df["wins+losses"] = df[["wins", "losses"]].sum(axis=1)
# The lambda receives the selected column as x and returns its square root.
df["sqrt(wins)"] = df[["wins"]].transform(lambda x: x ** 0.5)
df["sqrt(losses)"] = df[["losses"]].transform(lambda x: x ** 0.5)
df["sqrt(wins+losses)"] = df[["wins+losses"]].transform(lambda x: x ** 0.5)
df
Output:
| | years | teams | wins | losses | wins+losses | sqrt(wins) | sqrt(losses) | sqrt(wins+losses) |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Bears | 25 | 5 | 30 | 5.000000 | 2.236068 | 5.477226 |
| 1 | 2021 | Packers | 10 | 5 | 15 | 3.162278 | 2.236068 | 3.872983 |
| 2 | 2022 | Lions | 6 | 16 | 22 | 2.449490 | 4.000000 | 4.690416 |
Here, we use a lambda expression (a small anonymous function) with the exponentiation operator (**) to determine the square roots of the specified columns. Lambda expressions are handy when a function is short enough to write inline.
We also use the transform() method, which calls the function on the object it is invoked on and returns the transformed values. The result has the same length as the input.
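The same-length behaviour of transform() can be checked with a short sketch like the one below, which uses a cut-down data frame and illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({"wins": [25, 10, 6], "losses": [5, 5, 16]})

# On a Series, the lambda's argument x stands for the data being transformed.
sqrt_wins = df["wins"].transform(lambda x: x ** 0.5)

# On a DataFrame, transform() applies the function to every selected column.
sqrt_both = df[["wins", "losses"]].transform(lambda col: col ** 0.5)

# The result has the same length (and index) as the input.
print(len(sqrt_wins) == len(df), sqrt_both.shape)
```

The important detail is that the lambda operates on its own argument rather than reaching back into the outer data frame.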
Method 4: Use apply() to Calculate Square Root
Example Code:
import pandas as pd
import numpy as np
data = {
"years": [2020, 2021, 2022],
"teams": ["Bears", "Packers", "Lions"],
"wins": [25, 10, 6],
"losses": [5, 5, 16],
}
df = pd.DataFrame(data, columns=["years", "teams", "wins", "losses"])
df["wins+losses"] = df[["wins", "losses"]].sum(axis=1)
df["sqrt(wins)"] = df[["wins"]].apply(np.sqrt)
df["sqrt(losses)"] = df[["losses"]].apply(np.sqrt)
df["sqrt(wins+losses)"] = df[["wins+losses"]].apply(np.sqrt)
df
Output:
| | years | teams | wins | losses | wins+losses | sqrt(wins) | sqrt(losses) | sqrt(wins+losses) |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Bears | 25 | 5 | 30 | 5.000000 | 2.236068 | 5.477226 |
| 1 | 2021 | Packers | 10 | 5 | 15 | 3.162278 | 2.236068 | 3.872983 |
| 2 | 2022 | Lions | 6 | 16 | 22 | 2.449490 | 4.000000 | 4.690416 |
This code snippet uses the apply() method from the Pandas library, which takes np.sqrt as a parameter and returns a DataFrame of square-root values.
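As a rough illustration of how apply() behaves on the two kinds of Pandas objects, consider the sketch below. Using math.sqrt here is an addition for contrast and is not part of the tutorial's own examples.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({"wins": [25, 10, 6], "losses": [5, 5, 16]})

# Series.apply calls a plain Python function once per value.
sqrt_wins = df["wins"].apply(math.sqrt)

# DataFrame.apply passes each column (a Series) to the function in turn.
sqrt_both = df[["wins", "losses"]].apply(np.sqrt)

print(sqrt_wins.tolist())
print(sqrt_both)
```

Both calls produce the same values for the wins column; passing np.sqrt lets NumPy process a whole column in one call, whereas math.sqrt is invoked value by value.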
You may have noticed that all of the approaches above return the square roots as float values. What if we want them as integers?
Use .astype(int) to Determine Integer Square Roots in Pandas
Example Code:
import pandas as pd
import numpy as np
data = {
"years": [2020, 2021, 2022],
"teams": ["Bears", "Packers", "Lions"],
"wins": [25, 10, 6],
"losses": [5, 5, 16],
}
df = pd.DataFrame(data, columns=["years", "teams", "wins", "losses"])
df["wins+losses"] = df[["wins", "losses"]].sum(axis=1)
df["sqrt(wins)"] = df[["wins"]].apply(np.sqrt).astype(int)
df["sqrt(losses)"] = df[["losses"]].apply(np.sqrt).astype(int)
df["sqrt(wins+losses)"] = df[["wins+losses"]].apply(np.sqrt).astype(int)
df
Output:
| | years | teams | wins | losses | wins+losses | sqrt(wins) | sqrt(losses) | sqrt(wins+losses) |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Bears | 25 | 5 | 30 | 5 | 2 | 5 |
| 1 | 2021 | Packers | 10 | 5 | 15 | 3 | 2 | 3 |
| 2 | 2022 | Lions | 6 | 16 | 22 | 2 | 4 | 4 |
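The integer values in this output come from dropping the fractional part, not from rounding. The sketch below contrasts the two on the wins+losses values; calling round() before the cast is a suggested variation, not something the example above does.

```python
import numpy as np
import pandas as pd

wins_plus_losses = pd.Series([30, 15, 22])
roots = np.sqrt(wins_plus_losses)

truncated = roots.astype(int)        # drops the fractional part
rounded = roots.round().astype(int)  # rounds to the nearest integer first

print(truncated.tolist())  # [5, 3, 4]
print(rounded.tolist())    # [5, 4, 5]
```

Which of the two is appropriate depends on whether a floor-style or a nearest-integer result is wanted.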
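We can use .astype(int) with the other approaches in the same way. Note that .astype(int) truncates the fractional part rather than rounding, so 3.872983 becomes 3. Remember that taking the square root of 0 does not cause an error because the square root of 0 is 0, but the square root of a negative number is not a real number: np.sqrt() and ** 0.5 on a Pandas column return NaN, while math.sqrt() raises a ValueError. A column that contains NaN cannot be converted with .astype(int).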
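The behaviour for zero and negative inputs can be seen in a short sketch like the following. The sample values and the -1 sentinel used to fill NaN are arbitrary choices made for illustration only.

```python
import math

import numpy as np
import pandas as pd

values = pd.Series([16, 0, -4])

# np.sqrt returns NaN for negative inputs (NumPy may also emit a RuntimeWarning,
# which is silenced here).
with np.errstate(invalid="ignore"):
    roots = np.sqrt(values)

# math.sqrt raises ValueError for a negative number instead.
try:
    math.sqrt(-4)
    raised = False
except ValueError:
    raised = True

# NaN cannot be stored in an integer column, so handle it before casting.
# Filling with -1 is an arbitrary, hypothetical sentinel choice.
int_roots = roots.fillna(-1).astype(int)

print(roots.tolist(), raised, int_roots.tolist())
```

Dropping the affected rows or keeping the column as float are equally reasonable alternatives to a sentinel value.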