How to Build Pandas DataFrame Row by Row
- Create Rows in Pandas DataFrame
- Using loc() Function to Create Rows in Pandas DataFrame
- Using pandas.concat() Function to Create Rows in Pandas DataFrame
This article demonstrates how to build a DataFrame row by row instead of following the customary column-wise convention in pandas.
Create Rows in Pandas DataFrame
A pandas DataFrame is a two-dimensional data structure with labeled axes (rows and columns). DataFrames are comparable to SQL tables and to spreadsheets in applications such as Excel and Calc.
Because they are built on NumPy and integrate with the wider Python ecosystem, DataFrames are often faster, easier to use, and more powerful than tables and spreadsheets for many applications.
Depending on where the data comes from, you may need to add it to a DataFrame row by row instead of column by column.
Consider the following code.
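import pandas
df = pandas.DataFrame(
    columns=["a", "b", "c", "d", "e"], index=["v", "w", "x", "y", "z"]
)
y = {"a": 1, "b": 5, "c": 2, "d": 3, "e": 7}  # values for the row labeled "y"
print("Attempt 1")
df["y"] = y  # df["y"] selects a COLUMN, so this adds a new column named "y"
print(df)
print("Attempt 2")
df.join(y)  # raises AttributeError: join() does not accept a plain dict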
The output of each attempt is shown separately below.
Output (Attempt 1):
Attempt 1
a b c d e y
v NaN NaN NaN NaN NaN NaN
w NaN NaN NaN NaN NaN NaN
x NaN NaN NaN NaN NaN NaN
y NaN NaN NaN NaN NaN NaN
z NaN NaN NaN NaN NaN NaN
Output (Attempt 2):
Traceback (most recent call last):
File "d:\Test\test.py", line 13, in <module>
df.join(y)
File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 9969, in join
return self._join_compat(
File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 10036, in _join_compat
can_concat = all(df.index.is_unique for df in frames)
File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 10036, in <genexpr>
can_concat = all(df.index.is_unique for df in frames)
AttributeError: 'builtin_function_or_method' object has no attribute 'is_unique'
In the code above, a DataFrame is first created with the columns ['a', 'b', 'c', 'd', 'e'] and the row labels (index) ['v', 'w', 'x', 'y', 'z']. The goal is to fill the row labeled y.
The values for that row are stored in the dictionary y, which maps each column name to its value: {'a': 1, 'b': 5, 'c': 2, 'd': 3, 'e': 7}.
In Attempt 1, the data is assigned with df['y'] = y. However, df['y'] refers to a column, not a row, so pandas creates a new column named y. Because the dictionary keys (a to e) do not match the row labels (v to z), every value in the new column is NaN, just like the rest of the DataFrame.
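To see the column-versus-row mix-up directly, the minimal sketch below repeats Attempt 1 and inspects what pandas did. It assumes, consistent with the NaN output shown above, that pandas converts the assigned dictionary to a Series and aligns it on the row index; the variable name shared is introduced only for illustration.

```python
import pandas

df = pandas.DataFrame(
    columns=["a", "b", "c", "d", "e"], index=["v", "w", "x", "y", "z"]
)
y = {"a": 1, "b": 5, "c": 2, "d": 3, "e": 7}

# df["y"] looks up a COLUMN label, so this adds a sixth column named "y"
df["y"] = y

# The dict keys (a..e) share no labels with the row index (v..z),
# so alignment leaves every value in the new column as NaN
shared = pandas.Index(list(y)).intersection(df.index)
print(len(shared))           # 0
print(df.shape)              # (5, 6)
print(df["y"].isna().all())  # True
```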
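In Attempt 2, the join() method is used to try to attach the dictionary to the DataFrame. join() combines the columns of DataFrames and expects a DataFrame, a Series, or a list of them, so passing a plain dictionary fails. Internally, pandas treats the argument as a list and iterates over the dictionary's keys; each key is a string, and str.index is a built-in method rather than an Index, which produces the error AttributeError: 'builtin_function_or_method' object has no attribute 'is_unique'.
This problem can be solved with either of the following techniques.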
- Using loc() Function.
- Using pandas.concat() Function.
Using loc() Function to Create Rows in Pandas DataFrame
Consider the following code:
import pandas
df = pandas.DataFrame(
    columns=["a", "b", "c", "d", "e"], index=["v", "w", "x", "y", "z"]
)
print("Current Shape:\n" + str(df))
y = {"a": 1, "b": 5, "c": 2, "d": 3, "e": 7}
# loc selects the ROW labeled "y"; the Series is aligned to the columns by name
df.loc["y"] = pandas.Series(y)
print("DataFrame:\n" + str(df))
Output:
Current Shape:
a b c d e
v NaN NaN NaN NaN NaN
w NaN NaN NaN NaN NaN
x NaN NaN NaN NaN NaN
y NaN NaN NaN NaN NaN
z NaN NaN NaN NaN NaN
DataFrame:
a b c d e
v NaN NaN NaN NaN NaN
w NaN NaN NaN NaN NaN
x NaN NaN NaN NaN NaN
y 1 5 2 3 7
z NaN NaN NaN NaN NaN
The loc property of a DataFrame accesses rows and columns by label. It can select a single row or column, a group of rows and columns, or the rows matching a Boolean array.
Because loc is label-based, the code passes the desired row label, y, and assigns the new values to that row.
Note that wrapping the dictionary in pandas.Series() aligns the values to the columns by name, so you don't have to specify every column; any column missing from the Series is set to NaN.
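Because loc is label-based, assigning to a label that is not yet in the index appends a new row (pandas calls this "setting with enlargement"). The sketch below uses that to build a DataFrame row by row starting from an empty frame. The rows list is hypothetical sample data, and its second row deliberately omits d and e to show the Series alignment described above.

```python
import pandas

# Start from a DataFrame that has column labels but no rows
df = pandas.DataFrame(columns=["a", "b", "c", "d", "e"])

# Hypothetical incoming data: (row label, {column: value}) pairs
rows = [
    ("v", {"a": 1, "b": 5, "c": 2, "d": 3, "e": 7}),
    ("w", {"a": 4, "b": 0, "c": 9}),  # "d" and "e" are missing
]

for label, values in rows:
    # A label not yet in the index appends a new row; the Series is
    # aligned to the columns by name, so missing columns become NaN
    df.loc[label] = pandas.Series(values)

print(df)
```

Each enlargement copies the existing data, so this pattern is convenient for a handful of rows but can become slow for large inputs.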
Using pandas.concat() Function to Create Rows in Pandas DataFrame
Consider the following code:
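import pandas
# Empty DataFrame: column labels only, no rows yet
df = pandas.DataFrame(columns=["a", "b", "c", "d", "e"], index=[])
print("Current Shape:\n" + str(df))
# New rows, given column by column: each list holds one column's values
entry = pandas.DataFrame.from_dict(
    {
        "a": [1, 6, 11, 16],
        "b": [2, 7, 12, 17],
        "c": [3, 8, 13, 18],
        "d": [4, 9, 14, 19],
        "e": [5, 10, 15, 20],
    }
)
# Stack entry below df; concat returns a new DataFrame, so reassign it
df = pandas.concat([df, entry])
print("DataFrame:\n" + str(df))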
Output:
Current Shape:
Empty DataFrame
Columns: [a, b, c, d, e]
Index: []
DataFrame:
a b c d e
0 1 2 3 4 5
1 6 7 8 9 10
2 11 12 13 14 15
3 16 17 18 19 20
The from_dict() method builds a new DataFrame from a dictionary that maps each column name to a list of that column's values. The result is stored in the variable entry and holds the rows we want to add to the original DataFrame.
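from_dict() can also read a dictionary row-wise. With orient="index", each outer key becomes a row label and each inner dictionary supplies that row's values by column name. The sketch below builds the same four rows as entry this way; the row labels r0 to r3 and the name by_row are made up for illustration.

```python
import pandas

# Outer keys are (hypothetical) row labels, inner keys are column names
by_row = {
    "r0": {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5},
    "r1": {"a": 6, "b": 7, "c": 8, "d": 9, "e": 10},
    "r2": {"a": 11, "b": 12, "c": 13, "d": 14, "e": 15},
    "r3": {"a": 16, "b": 17, "c": 18, "d": 19, "e": 20},
}

# orient="index" treats each outer key as one row instead of one column
rows_df = pandas.DataFrame.from_dict(by_row, orient="index")
print(rows_df)
```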
Finally, the two DataFrame instances need to be combined. pandas.concat() stacks them vertically (along the rows by default) and returns a new DataFrame rather than modifying df in place, which is why the result is assigned back to df.
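When rows arrive one at a time, calling concat() once per row copies the accumulated data again on every call. The pandas documentation for concat() recommends building a list of rows first and creating the DataFrame in a single step. The sketch below shows two single-step options; incoming, rows, and pieces are hypothetical names standing in for whatever produces your data.

```python
import pandas

columns = ["a", "b", "c", "d", "e"]

# Hypothetical source of rows, each a dict mapping column name -> value
incoming = [
    {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5},
    {"a": 6, "b": 7, "c": 8, "d": 9, "e": 10},
    {"a": 11, "b": 12, "c": 13, "d": 14, "e": 15},
    {"a": 16, "b": 17, "c": 18, "d": 19, "e": 20},
]

# Appending to a plain Python list is cheap, so collect the rows there first
rows = []
for record in incoming:
    rows.append(record)

# Option 1: build the DataFrame once from the list of dicts
df = pandas.DataFrame(rows, columns=columns)

# Option 2: if each row is already a one-row DataFrame, concatenate them
# in a single call; ignore_index=True renumbers the rows 0..n-1
pieces = [pandas.DataFrame([record]) for record in rows]
df_concat = pandas.concat(pieces, ignore_index=True)

print(df)
print(df.equals(df_concat))
```

Both options produce the same four rows as the concat() example above, but each copies the data only once.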
Hello! I am Salman Bin Mehmood (Baum), a software developer, and I help organizations address complex problems. My expertise lies in back-end development, data science, and machine learning. I am a lifelong learner, currently working on the metaverse, and enrolled in a course on building an AI application with Python. I love solving problems and developing bug-free software for people. I write content related to Python and hot technologies.