How to Fix Error Tokenizing Data C Error in Python
- What Is the ParserError: Error tokenizing data. C error in Python
- How to Fix the ParserError: Error tokenizing data. C error in Python
- Skip Rows to Fix the ParserError: Error tokenizing data. C error
- Use the Correct Separator to Fix the ParserError: Error tokenizing data. C error
- Use dropna() to Fix the ParserError: Error tokenizing data. C error
- Use the fillna() Function to Fill Up the NaN Values
Whenever you work with data, you must clean it first. That means filling null values and removing invalid entries so they don't distort the results and the program runs smoothly.
The ParserError: Error tokenizing data. C error is usually caused by bad data in the file, such as mixed data, rows with a different number of columns, or several data files stored as a single file.
You can also run into this error if you read a CSV file with read_csv() but pass a separator or line terminator that does not match the file.
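Because rows with a different number of columns are the most common trigger, it can help to locate them before calling read_csv(). The sketch below uses Python's built-in csv module to report every record whose field count differs from the header's. The helper name find_bad_lines and the sample text are hypothetical, chosen only for illustration. Note that it also flags rows with too few fields, which pandas silently pads with NaN instead of raising an error.

```python
import csv
import io


def find_bad_lines(text, sep=","):
    """Hypothetical helper: return (line_number, field_count) for rows
    whose field count differs from the header's.

    Line numbers are 1-based and count records, so they match physical
    lines only when no quoted field spans several lines.
    """
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    expected = len(next(reader))  # the header defines the expected width
    bad = []
    # The header is line 1, so data records start at line 2.
    for line_number, row in enumerate(reader, start=2):
        # Skip blank lines, as pandas does by default.
        if row and len(row) != expected:
            bad.append((line_number, len(row)))
    return bad


sample = "Name,Roll,Course,Marks,CGPA\nAli,1,SE,87,3\nJohn,2,CS,78,\nMaria,3,DS,13,,\n"
print(find_bad_lines(sample))  # [(4, 6)]
```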
What Is the ParserError: Error tokenizing data. C error in Python
As discussed, the ParserError: Error tokenizing data. C error occurs when pandas' C parser reads CSV data and cannot split a row into the expected number of fields, most often because the row has more fields than the header.
Let's say the file data.csv contains the following data, which has an error, and we read it with pandas:
Name,Roll,Course,Marks,CGPA
Ali,1,SE,87,3
John,2,CS,78,
Maria,3,DS,13,,
Code example:
import pandas as pd
pd.read_csv("data.csv")
Output:
ParserError: Error tokenizing data. C error: Expected 5 fields in line 4, saw 6
As you can see, the code above raised a ParserError: Error tokenizing data. C error while reading data.csv. The message says that the parser expected 5 fields in line 4 but saw 6: the header defines five columns, but the Maria row ends with two commas, which creates a sixth, empty field.
The error message is self-explanatory: it points to the exact line and states the reason, so we know what to fix.
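If you want to reproduce the error without creating a file, or handle it in a script instead of letting it crash, you can read the same data from an in-memory io.StringIO buffer and catch pandas.errors.ParserError. This is a minimal sketch; the sample text simply mirrors data.csv above.

```python
import io

import pandas as pd

# Same content as data.csv; the last row has one field too many.
csv_text = (
    "Name,Roll,Course,Marks,CGPA\n"
    "Ali,1,SE,87,3\n"
    "John,2,CS,78,\n"
    "Maria,3,DS,13,,\n"
)

message = None
try:
    pd.read_csv(io.StringIO(csv_text))
except pd.errors.ParserError as err:
    # The message names the line and the expected vs. actual field counts.
    message = str(err).strip()
    print("Could not parse the CSV:", message)
```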
How to Fix the ParserError: Error tokenizing data. C error in Python
Now that we understand the ParserError: Error tokenizing data. C error in Python, let's see how to fix it.
It is always recommended to clean the data before analyzing it, because dirty data can skew the results or make your program fail.
Data cleansing removes invalid inputs, null values, and invalid entries; it is essentially the preprocessing stage of data analysis.
pandas provides several functions and parameters that help clean the data and avoid such errors.
Skip Rows to Fix the ParserError: Error tokenizing data. C error
One of the most common techniques is to skip the rows that cause the error. As you can see in the data above, the last line is the one causing it.
Passing the argument on_bad_lines='skip' to read_csv() makes pandas ignore the malformed row and store the remaining rows in the DataFrame df. The on_bad_lines parameter was added in pandas 1.3; older versions used error_bad_lines=False instead.
import pandas as pd
df = pd.read_csv("data.csv", on_bad_lines="skip")
df
Output:
Name Roll Course Marks CGPA
0 Ali 1 SE 87 3.0
1 John 2 CS 78 NaN
The code above skips every line that causes an error and loads the others; as you can see in the output, the last line was skipped because it was causing the error.
However, John's row still has a NaN value for CGPA, which needs to be handled; otherwise, it will affect the results of our statistical analysis.
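Skipping discards the whole row, including Maria's valid values. If you would rather keep such rows, pandas 1.4 and later also accept a callable for on_bad_lines when engine="python": it receives the bad line as a list of strings and returns a corrected list (or None to drop the line). The sketch below assumes the extra field is junk, like the empty trailing value here, and simply truncates the row to the header width; the helper keep_first_five is hypothetical. Only use this approach when you know the extra values can safely be thrown away.

```python
import io

import pandas as pd

csv_text = (
    "Name,Roll,Course,Marks,CGPA\n"
    "Ali,1,SE,87,3\n"
    "John,2,CS,78,\n"
    "Maria,3,DS,13,,\n"
)

EXPECTED_FIELDS = 5  # number of columns in the header


def keep_first_five(fields):
    """Hypothetical fixer: truncate a too-long row to the header width."""
    return fields[:EXPECTED_FIELDS]


# A callable on_bad_lines requires the Python engine (pandas >= 1.4).
df = pd.read_csv(
    io.StringIO(csv_text),
    engine="python",
    on_bad_lines=keep_first_five,
)
print(df)
```

Maria's row is now kept, with NaN in the CGPA column, so it can be handled with the same NaN techniques as John's row.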
Use the Correct Separator to Fix the ParserError: Error tokenizing data. C error
Using the wrong separator can also cause the ParserError, so make sure the separator you pass matches the one the file actually uses.
Some files separate values with tabs or spaces instead of commas; in that case, specify that separator in your program with the sep parameter of read_csv().
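For example, with tab-separated data, the default comma separator finds no commas at all, so every line ends up in a single column. The sketch below uses hypothetical in-memory data to compare the two calls.

```python
import io

import pandas as pd

# Hypothetical tab-separated data with the same columns as data.csv.
tsv_text = "Name\tRoll\tCourse\tMarks\tCGPA\nAli\t1\tSE\t87\t3\nJohn\t2\tCS\t78\t3.5\n"

# With the default comma separator, each whole line lands in one column.
wrong = pd.read_csv(io.StringIO(tsv_text))
print(wrong.shape)  # (2, 1)

# Passing the real separator splits the fields correctly.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(right.shape)  # (2, 5)
```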
import pandas as pd
pd.read_csv("data.csv", sep=",", on_bad_lines="skip", lineterminator="\n")
Output:
Name Roll Course Marks CGPA\r
0 Ali 1 SE 87 3\r
1 John 2 CS 78 \r
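The separator is a comma, so we pass sep=','. We also pass lineterminator='\n', which splits lines only on \n. The \r characters in the output show that this file actually uses Windows-style \r\n line endings, so each \r stays attached to the last field. Unless your file uses an unusual line terminator, you can usually leave lineterminator out: the default C parser recognizes \n, \r\n, and \r line endings on its own.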
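To see where the \r characters come from, the sketch below reads a hypothetical snippet with Windows-style \r\n line endings twice: once with the default settings and once with lineterminator="\n".

```python
import io

import pandas as pd

# Hypothetical data saved with Windows-style \r\n line endings.
crlf_text = "Name,Course\r\nAli,SE\r\nJohn,CS\r\n"

# Default settings: the C parser treats \r\n as a single line break.
default_df = pd.read_csv(io.StringIO(crlf_text))
print(list(default_df.columns))  # ['Name', 'Course']

# lineterminator="\n" splits only on \n, so \r stays in the last field.
lf_df = pd.read_csv(io.StringIO(crlf_text), lineterminator="\n")
print(list(lf_df.columns))  # ['Name', 'Course\r']
```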
Use dropna() to Fix the ParserError: Error tokenizing data. C error
The dropna() function drops every row that contains a null (NaN) value.
import pandas as pd
df = pd.read_csv("data.csv", on_bad_lines="skip")
print(" **** Before dropna ****")
print(df)
print("\n **** After dropna ****")
print(df.dropna())
Output:
**** Before dropna ****
Name Roll Course Marks CGPA
0 Ali 1 SE 87 3.0
1 John 2 CS 78 NaN
**** After dropna ****
Name Roll Course Marks CGPA
0 Ali 1 SE 87 3.0
We have only two rows: the first has all the attributes, but the second has a NaN value, so dropna() dropped the second row and displayed just a single row.
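By default, dropna() removes a row if any of its columns is missing. When only some columns matter, the subset parameter restricts the check to those columns, so fewer rows are lost. The sketch below uses a hypothetical third row (Sara) that is missing Marks instead of CGPA.

```python
import io

import pandas as pd

# Hypothetical data: John is missing CGPA, Sara is missing Marks.
csv_text = (
    "Name,Roll,Course,Marks,CGPA\n"
    "Ali,1,SE,87,3\n"
    "John,2,CS,78,\n"
    "Sara,4,AI,,3.5\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Drop rows with a NaN in any column.
print(df.dropna()["Name"].tolist())  # ['Ali']

# Drop only rows whose Marks value is missing; John's missing CGPA is ignored.
print(df.dropna(subset=["Marks"])["Name"].tolist())  # ['Ali', 'John']
```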
Use the fillna() Function to Fill Up the NaN Values
When your data contains NaN values, you can use the fillna() function to replace them with a value of your choice; here, we use 0.
Code Example:
import pandas as pd
print(" **** Before fillna ****")
df = pd.read_csv("data.csv", on_bad_lines="skip")
print(df, "\n\n")
print(" **** After fillna ****")
print(df.fillna(0))  # use 0 in place of NaN
Output:
**** Before fillna ****
Name Roll Course Marks CGPA
0 Ali 1 SE 87 3.0
1 John 2 CS 78 NaN
**** After fillna ****
Name Roll Course Marks CGPA
0 Ali 1 SE 87 3.0
1 John 2 CS 78 0.0
fillna() has replaced the NaN with 0, so we can now analyze the data properly.
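Filling every gap with 0 can distort statistics such as the average CGPA. fillna() also accepts a dictionary that maps column names to fill values, so each column can get its own replacement. The sketch below fills the missing CGPA with the column mean; that choice and the extra row (Sara) are assumptions for illustration, not part of the original data.

```python
import io

import pandas as pd

# Hypothetical data: John's CGPA is missing.
csv_text = (
    "Name,Roll,Course,Marks,CGPA\n"
    "Ali,1,SE,87,3\n"
    "John,2,CS,78,\n"
    "Sara,4,AI,91,3.5\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# mean() skips NaN by default, so this is the average of 3.0 and 3.5.
cgpa_mean = df["CGPA"].mean()

# Fill only the CGPA column; other columns are left unchanged.
filled = df.fillna({"CGPA": cgpa_mean})
print(filled)
```

fillna() returns a new DataFrame here, so df itself still contains the NaN.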
Zeeshan is a detail-oriented software engineer who helps companies and individuals make their lives easier with software solutions.