How to Split CSV Into Multiple Files in Python
In this article, we will learn how to split a CSV file into multiple files in Python. We will use Pandas to create a CSV file and split it into multiple other files.
Create a CSV File in Python Using Pandas
To create a CSV file with Pandas, first install Pandas from the command line interface (CLI).
pip install pandas
This command downloads and installs Pandas on your local machine. You can then bring it into your Python program with the import keyword.
Let’s verify that Pandas is installed.
Code Example:
import pandas as pd
print("The Version of Pandas is: ", pd.__version__)
Output:
The Version of Pandas is: 1.3.5
Now, let’s create a CSV file.
Code example:
import pandas as pd
# create a data set
data_dict = {
"Roll no": [1, 2, 3, 4, 5, 6, 7, 8],
"Gender": ["Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female"],
"CGPA": [3.5, 3.3, 2.7, 3.8, 2.4, 2.1, 2.9, 3.9],
"English": [76, 77, 85, 91, 49, 86, 66, 98],
"Mathematics": [78, 87, 54, 65, 90, 59, 63, 89],
"Programming": [99, 45, 68, 85, 60, 39, 55, 88],
}
# create a data frame
data = pd.DataFrame(data_dict)
# write the data frame to a CSV file (index=False omits the row index column)
data.to_csv("students.csv", index=False)
# Print the output
print(data)
Output:
Roll no Gender CGPA English Mathematics Programming
0 1 Male 3.5 76 78 99
1 2 Female 3.3 77 87 45
2 3 Female 2.7 85 54 68
3 4 Male 3.8 91 65 85
4 5 Male 2.4 49 90 60
5 6 Female 2.1 86 59 39
6 7 Male 2.9 66 63 55
7 8 Female 3.9 98 89 88
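One detail worth checking when writing a CSV is whether the row index is saved along with the data. By default, to_csv() writes the index as an extra, unnamed first column, which then shows up as "Unnamed: 0" when the file is read back. The following is a minimal sketch of that behavior; it uses a small illustrative data frame and in-memory buffers (io.StringIO) instead of real files, both of which are assumptions made for the example.

```python
import io

import pandas as pd

# Small illustrative data frame (not the full students data set).
df = pd.DataFrame({"Roll no": [1, 2, 3], "CGPA": [3.5, 3.3, 2.7]})

# Default behavior: the row index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the real columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'Roll no', 'CGPA']
print(cols_without_index)  # ['Roll no', 'CGPA']
```

Passing index=False keeps the file limited to the columns defined in the data set, which is usually what you want before splitting it.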
Split a CSV File Into Multiple Files in Python
We have successfully created a CSV file. Now let’s split it into multiple files. A CSV file can be split according to different criteria, based on either its rows or its columns.
Split a CSV File Based on Rows
Let’s split a CSV file on the basis of rows in Python.
Code Example:
import pandas as pd
# read DataFrame
data = pd.read_csv("students.csv")
# number of output files and number of rows per file
k = 2
size = 4
for i in range(k):
    # take the next block of `size` rows and write it to its own file
    df = data[size * i : size * (i + 1)]
    df.to_csv(f"students{i+1}.csv", index=False)
file1 = pd.read_csv("students1.csv")
print(file1)
print("\n")
file2 = pd.read_csv("students2.csv")
print(file2)
Output:
Roll no Gender CGPA English Mathematics Programming
0 1 Male 3.5 76 78 99
1 2 Female 3.3 77 87 45
2 3 Female 2.7 85 54 68
3 4 Male 3.8 91 65 85
Roll no Gender CGPA English Mathematics Programming
0 5 Male 2.4 49 90 60
1 6 Female 2.1 86 59 39
2 7 Male 2.9 66 63 55
3 8 Female 3.9 98 89 88
The above code splits the students.csv file into two files, students1.csv and students2.csv. The file is split row-wise: rows 0 to 3 of the original data are stored in students1.csv, and rows 4 to 7 are stored in students2.csv.
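The row-wise split above assumes the number of files and rows per file are known in advance. As a hedged sketch of a more general variant, the hypothetical helper split_rows() below (not part of Pandas) computes the number of chunks from the data length, so a final, shorter chunk is kept when the rows do not divide evenly. The 10-row data frame and the temporary output directory are assumptions made for illustration.

```python
import math
import tempfile
from pathlib import Path

import pandas as pd


def split_rows(df, size):
    """Return a list of DataFrames with at most `size` rows each."""
    n_chunks = math.ceil(len(df) / size)
    return [df.iloc[i * size : (i + 1) * size] for i in range(n_chunks)]


# Illustrative data: 10 rows, which do not divide evenly into chunks of 4.
data = pd.DataFrame({"Roll no": range(1, 11)})
chunks = split_rows(data, 4)
chunk_lengths = [len(chunk) for chunk in chunks]
print(chunk_lengths)  # [4, 4, 2]

# Write each chunk to its own file inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i, chunk in enumerate(chunks, start=1):
        chunk.to_csv(Path(tmp) / f"students{i}.csv", index=False)
    written = sorted(p.name for p in Path(tmp).iterdir())
print(written)  # ['students1.csv', 'students2.csv', 'students3.csv']
```

For files too large to load at once, pd.read_csv() also accepts a chunksize argument that yields the file as a sequence of data frames with that many rows each.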
Split a CSV File Based on Columns
We can split any CSV file based on the values in a column with the help of the groupby() function. The groupby() function belongs to the Pandas library and is used to group rows that share the same value. In this case, we are grouping the students data based on Gender.
Code example:
import pandas as pd
# read DataFrame
data = pd.read_csv("students.csv")
# group by a single column name so that each key is a plain string
for gender, group in data.groupby("Gender"):
    group.to_csv(f"{gender} students.csv", index=False)
print(pd.read_csv("Male students.csv"))
print("\n")
print(pd.read_csv("Female students.csv"))
Output:
Roll no Gender CGPA English Mathematics Programming
0 1 Male 3.5 76 78 99
1 4 Male 3.8 91 65 85
2 5 Male 2.4 49 90 60
3 7 Male 2.9 66 63 55
Roll no Gender CGPA English Mathematics Programming
0 2 Female 3.3 77 87 45
1 3 Female 2.7 85 54 68
2 6 Female 2.1 86 59 39
3 8 Female 3.9 98 89 88
Conclusion
Splitting data is a useful data analysis technique that makes a large dataset easier to understand and organize.
In this article, we’ve discussed how to create a CSV file using the Pandas library. In addition, we have discussed the two common data splitting techniques, row-wise and column-wise data splitting.
Zeeshan is a detail-oriented software engineer who helps companies and individuals make their lives easier with software solutions.