How to Convert Object to Float in Pandas
- Convert an Object-Type Column to Float in Pandas
- Use the astype() Method to Convert Object to Float in Pandas
- Use the to_numeric() Function to Convert Object to Float in Pandas
- Use the apply() Function With a Lambda Function to Convert an Object to Float in Pandas
- Conclusion
Data manipulation is a cornerstone of any data science or analysis endeavor. Often, datasets arrive in formats that require careful preprocessing to unlock their full analytical potential.
One common challenge is converting object-type columns, which may contain numerical information stored as strings, into a numeric type such as float. Pandas is the go-to library for data manipulation in the Python ecosystem and offers several methods for this conversion.
This tutorial will focus on converting an object-type column to float in Pandas.
Convert an Object-Type Column to Float in Pandas
An object-type column holds arbitrary Python objects, typically strings or a mix of types, whereas a float column holds floating-point numbers. We will work on the following DataFrame in this article.
import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)
print(df)
print("---------------------------")
print(df.info())
The above code first imports the Pandas module and then creates a DataFrame named df with three rows and four columns labeled 'a', 'b', 'c', and 'd'. Column 'a' holds numbers written as strings, while the other columns hold integers.
After creating the DataFrame, it prints the contents of df and then adds a separator line.
Following that, it prints information about the DataFrame using the info() method. This reports the data type of each column and the number of non-null entries, which is useful for understanding the structure of the dataset. Because info() prints its report directly and returns None, wrapping it in print() also prints a trailing None.
Output:
      a  b   c   d
0  10.0  6   7   8
1   1.0  9  12  14
2   5.0  8  10   6
---------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      object
 1   b       3 non-null      int64
 2   c       3 non-null      int64
 3   d       3 non-null      int64
dtypes: int64(3), object(1)
memory usage: 224.0+ bytes
None
Notice that column 'a' is of the object type. We will convert this column to float using the pd.to_numeric(), astype(), and apply() functions in Pandas.
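To see why column 'a' is not treated as numeric, one can inspect the Python type of each stored value. The snippet below is a minimal sketch that rebuilds the same DataFrame; the variable name types_in_a is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)

# Collect the distinct Python types stored in column 'a'.
types_in_a = set(df["a"].map(type))
print(types_in_a)

# Pandas does not consider the column numeric until it is converted.
print(pd.api.types.is_numeric_dtype(df["a"]))
```

Every value in 'a' is a Python string, so arithmetic on the column would not behave numerically until it is converted.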
Note: This tutorial does not cover the convert_objects() function, which is deprecated and has been removed from Pandas.
Use the astype() Method to Convert Object to Float in Pandas
Pandas provides the astype() method to convert a column to a specific type. We pass float to the method and set the errors parameter to 'raise', which is also its default, so invalid values raise an exception.
Syntax:
DataFrame.astype(dtype, copy=True, errors="raise")
1. dtype: The data type that we want to assign to our object.
2. copy: A Boolean parameter. It returns a copy when True.
3. errors: It controls the raising of exceptions on invalid data for the provided data type. It has two options.
3.1. raise: allows exceptions to be raised.
3.2. ignore: suppresses exceptions. If an error exists, then it returns the original object.
The following code uses the astype() method to convert the object to float in Pandas.
import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)
df["a"] = df["a"].astype(float, errors="raise")
print(df.info())
This code imports the Pandas library under the alias pd. Next, it creates a DataFrame called df from a list of lists, where each inner list represents a row of data.
The columns of the DataFrame are labeled 'a', 'b', 'c', and 'd'. The data in column 'a' is initially stored as strings, and the next line converts them into floating-point numbers using the astype() method.
The errors='raise' argument means that any value that cannot be converted raises an error. Finally, it prints information about the DataFrame using the info() method, which reports the column data types and memory usage.
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      float64
 1   b       3 non-null      int64
 2   c       3 non-null      int64
 3   d       3 non-null      int64
dtypes: float64(1), int64(3)
memory usage: 224.0 bytes
None
This method is efficient and suitable for cases where the data is clean and consistent.
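When the data is not clean, astype() with the default errors='raise' stops at the first invalid value. The following sketch uses a hypothetical Series named dirty, which is not part of the tutorial's DataFrame, to show the exception being caught.

```python
import pandas as pd

# Hypothetical column with one value that is not a valid number.
dirty = pd.Series(["10.0", "1.0", "five"])

try:
    dirty.astype(float)
    conversion_failed = False
except ValueError as exc:
    # The whole conversion is rejected; no partial result is returned.
    conversion_failed = True
    print(f"astype() failed: {exc}")
```

Catching the exception this way makes it explicit that nothing was converted, which may be preferable to silently keeping bad values.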
Use the to_numeric() Function to Convert Object to Float in Pandas
The Pandas to_numeric() function converts a list, a Series, an array, or a tuple to a numeric data type, that is, a signed or unsigned integer or a float. Its errors parameter controls what happens when a value cannot be converted.
Syntax:
pandas.to_numeric(arg, errors="raise", downcast=None)
1. arg: It is a scalar, list, tuple, 1-d array, or Series. It is the argument that we want to convert to numeric.
2. errors: It is a string parameter. It has three options: ignore, raise, or coerce. If it is set to raise, then an invalid argument will raise an exception. If it is set to coerce, then an invalid argument will be set as NaN. If it is set to ignore, then an invalid argument will return the input. Note that the ignore option is deprecated in recent Pandas releases.
3. downcast: It is a string parameter. It has four options: integer, signed, unsigned, or float.
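As a rough illustration of the downcast parameter listed above, the sketch below converts a small hypothetical Series twice and compares the resulting dtypes. The sample values were chosen so that they are exactly representable in 32-bit floats.

```python
import pandas as pd

# Hypothetical sample data stored as strings.
values = pd.Series(["1.5", "2.25", "3.0"])

# Without downcast, the result uses the default 64-bit float.
default_result = pd.to_numeric(values)

# With downcast="float", Pandas picks the smallest float dtype it can.
downcast_result = pd.to_numeric(values, downcast="float")

print(default_result.dtype)
print(downcast_result.dtype)
```

Downcasting can reduce memory usage on large columns, at the cost of reduced precision for values that need more than 32 bits.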
An example of converting the object type to float using to_numeric() is shown below.
import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)
df["a"] = pd.to_numeric(df["a"], errors="coerce")
print(df.info())
This code first imports the Pandas library and then creates a DataFrame (a table-like data structure) with three rows and four columns labeled 'a', 'b', 'c', and 'd'. The values in the 'a' column are initially given as strings, like '10.0'.
The code then converts the values in the 'a' column to a numeric type using the pd.to_numeric() function. The errors='coerce' argument means that any value that cannot be converted to a number is replaced with NaN (Not a Number).
Finally, it prints information about the DataFrame using the df.info() method, which reports the data type of each column and the number of non-null entries.
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      float64
 1   b       3 non-null      int64
 2   c       3 non-null      int64
 3   d       3 non-null      int64
dtypes: float64(1), int64(3)
memory usage: 224.0 bytes
None
This method provides more flexibility when dealing with messy data.
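The tutorial's DataFrame contains only valid numbers, so errors='coerce' has nothing to replace there. The sketch below uses a hypothetical Series named messy to show what happens when some entries cannot be parsed.

```python
import pandas as pd

# Hypothetical column mixing valid numbers with placeholder text.
messy = pd.Series(["10.0", "N/A", "5.0", "abc"])

# Unparseable entries become NaN instead of raising an exception.
converted = pd.to_numeric(messy, errors="coerce")

print(converted)
print("Values replaced by NaN:", int(converted.isna().sum()))
```

The result is a float64 Series in which the two unparseable entries are NaN, so the column can be used numerically right away.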
Use the apply() Function With a Lambda Function to Convert an Object to Float in Pandas
The apply() function is a versatile tool in Pandas that allows us to apply a given function along an axis of a DataFrame or a Series. It can be used to transform data in a multitude of ways.
Syntax for DataFrame:
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
1. func: This is the function that you want to apply to each element or row/column of the DataFrame or Series.
2. axis: Specifies the axis along which the function is applied. For DataFrames, 0 applies the function to each column, and 1 applies it to each row.
3. raw: A Boolean parameter. If set to True, the function will receive NumPy arrays as input. If set to False, it will receive a Series.
4. result_type: For DataFrames, you can specify the desired return type (e.g., 'expand', 'reduce', 'broadcast', or None).
5. args: A tuple of additional arguments to pass to the function being applied.
6. **kwds: Keyword arguments for the function func.
Syntax for Series:
Series.apply(func, convert_dtype=True, args=(), **kwds)
convert_dtype is a Boolean parameter. If set to True, it tries to infer better data types for the output.
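To make the difference between the two forms concrete, here is a small sketch on a hypothetical two-column DataFrame. With the default axis=0, DataFrame.apply() passes each column to the function, and extra keyword arguments such as errors are forwarded to it; Series.apply() calls the function once per value.

```python
import pandas as pd

# Hypothetical frame in which both columns hold numbers stored as strings.
df = pd.DataFrame({"a": ["10.0", "1.0", "5.0"], "b": ["6", "9", "x"]})

# DataFrame.apply: pd.to_numeric receives each column as a Series.
# The errors keyword is forwarded to pd.to_numeric.
converted = df.apply(pd.to_numeric, errors="coerce")

# Series.apply: float() is called on every element of column 'a'.
a_as_float = df["a"].apply(float)

print(converted.dtypes)
print(a_as_float.dtype)
```

Passing an existing function such as pd.to_numeric or float directly avoids the need for a lambda when no custom logic is required.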
A lambda function, also known as an anonymous function, is a small throwaway function defined without a name. It’s particularly useful for short, one-off operations.
When you use apply() with a lambda function, it provides a convenient way to perform element-wise operations, although it is generally slower than vectorized methods such as astype() on large columns. Look at the following code as an example.
import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)
# Assuming df is your DataFrame and 'a' is the column to be converted
# Strings made of digits with at most one decimal point become floats;
# anything else becomes None
df["a"] = df["a"].apply(
    lambda x: float(x) if x.replace(".", "", 1).isdigit() else None
)
print(df.info())
This code starts by importing the Pandas library and then creates a DataFrame named df with three rows and four columns labeled 'a', 'b', 'c', and 'd'. The values in the 'a' column are initially given as strings, like '10.0'.
The code then applies a lambda function to the 'a' column using df['a'].apply(...). This lambda function checks whether each value in column 'a' looks like a float.
If it does, it performs the conversion; otherwise, it assigns None to that cell. The replace('.', '', 1).isdigit() expression removes at most one decimal point and checks that only digits remain. This is a simple heuristic: it rejects negative numbers and scientific notation such as '-1.5' or '1e3', and it assumes that every value is a string.
Finally, it prints information about the DataFrame using df.info(), which reports the data type of each column and the number of non-null entries.
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      float64
 1   b       3 non-null      int64
 2   c       3 non-null      int64
 3   d       3 non-null      int64
dtypes: float64(1), int64(3)
memory usage: 224.0 bytes
None
This method provides a way to implement custom logic during the conversion process.
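One possible form of such custom logic is a small helper that attempts the conversion and falls back to NaN. This is a sketch rather than the tutorial's own method: the helper name to_float_or_nan and the sample Series are hypothetical.

```python
import pandas as pd


def to_float_or_nan(value):
    """Return value as a float, or NaN if it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers unparseable strings.
        return float("nan")


# Hypothetical column with negative, scientific, invalid, and missing values.
raw = pd.Series(["10.0", "-3.5", "1e3", "abc", None])
converted = raw.apply(to_float_or_nan)

print(converted)
```

Relying on float() itself to decide what is valid means that negative numbers and scientific notation are accepted, and missing values do not raise an error.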
Conclusion
This tutorial covered converting an object-type column to float in Pandas using three approaches: the astype() method, the to_numeric() function, and the apply() function with a lambda function.
Always choose the method that best fits your data and use case. Remember, it’s essential to validate your data after conversion to ensure accuracy in your analysis.
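One simple way to validate a conversion is to look for values that were present before the conversion but are NaN afterwards. The sketch below assumes the conversion was done with pd.to_numeric() and errors='coerce'; the Series name raw and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical raw column: one bad value and one genuinely missing value.
raw = pd.Series(["10.0", "oops", None, "5.0"])
converted = pd.to_numeric(raw, errors="coerce")

# A failed conversion is NaN in the result but was not missing in the input.
failed = converted.isna() & raw.notna()

print("Rows that failed to convert:")
print(raw[failed])
```

Here only the row holding 'oops' is flagged, because the None entry was already missing before the conversion.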
Manav is an IT professional who has a lot of experience as a core developer in many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.