How to Randomly Shuffle DataFrame Rows in Pandas
- pandas.DataFrame.sample() method to Shuffle DataFrame Rows in Pandas
- numpy.random.permutation() to Shuffle Pandas DataFrame Rows
- sklearn.utils.shuffle() to Shuffle Pandas DataFrame Rows
We can use the sample() method of Pandas DataFrame objects, the permutation() function from the NumPy module, or the shuffle() function from the sklearn package to randomly shuffle DataFrame rows in Pandas.
pandas.DataFrame.sample() method to Shuffle DataFrame Rows in Pandas
pandas.DataFrame.sample() returns a random sample of items from an axis of a DataFrame object. We need to sample rows, which corresponds to axis 0; this is the default behavior for a DataFrame, so the axis parameter does not have to be passed.
The frac parameter determines what fraction of the rows is returned. To shuffle, we set frac to 1: sampling is done without replacement by default, so every row is returned exactly once, in random order.
import pandas as pd
dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})
print(df)
df_shuffled = df.sample(frac=1)
print(df_shuffled)
Output:
       Date   Fruit  Price
0  April-10   Apple      3
1  April-11  Papaya      1
2  April-12  Banana      2
3  April-13   Mango      4
       Date   Fruit  Price
3  April-13   Mango      4
2  April-12  Banana      2
0  April-10   Apple      3
1  April-11  Papaya      1
The DataFrame.sample() method shuffles the rows of the Pandas DataFrame, as shown above. The shuffled rows keep their original index labels.
We can chain the reset_index() method to reset the DataFrame index.
import pandas as pd
dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})
print(df)
df_shuffled = df.sample(frac=1).reset_index(drop=True)
print(df_shuffled)
Output:
       Date   Fruit  Price
0  April-10   Apple      3
1  April-11  Papaya      1
2  April-12  Banana      2
3  April-13   Mango      4
       Date   Fruit  Price
0  April-11  Papaya      1
1  April-13   Mango      4
2  April-10   Apple      3
3  April-12  Banana      2
Here, the drop=True option prevents the old index from being added to the DataFrame as a new column.
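Because sample() draws a new random order on every call, the shuffled result changes from run to run. If a repeatable shuffle is wanted, the random_state parameter of sample() can be given a seed. The following is a minimal sketch; the seed value 42 is an arbitrary choice made for illustration.

```python
import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

# The seed 42 is an arbitrary, illustrative choice.
first = df.sample(frac=1, random_state=42).reset_index(drop=True)
second = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Both calls use the same seed, so they produce the same row order.
print(first.equals(second))  # True

# Shuffling only reorders rows; no row is lost or duplicated.
print(sorted(first["Fruit"]) == sorted(df["Fruit"]))  # True
```

Omitting random_state restores the run-to-run variation.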
numpy.random.permutation() to Shuffle Pandas DataFrame Rows
We can use numpy.random.permutation() to generate a shuffled array of row positions. When these shuffled positions are used to select rows with the iloc indexer, we get randomly shuffled rows.
import pandas as pd
import numpy as np
dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})
# Permute row positions (0 to len(df) - 1) so this also works with a non-default index.
df_shuffled = df.iloc[np.random.permutation(len(df))].reset_index(drop=True)
print(df_shuffled)
Output:
       Date   Fruit  Price
0  April-13   Mango      4
1  April-12  Banana      2
2  April-10   Apple      3
3  April-11  Papaya      1
You might get a different result when running the same code, because the np.random.permutation() function generates a different permutation each time it is called.
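If the same shuffled order is needed on every run, one option is to draw the permutation from a seeded generator. The sketch below assumes NumPy's Generator API (np.random.default_rng) and an arbitrary seed of 0; neither is part of the example above.

```python
import numpy as np
import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

# A generator seeded with a fixed value (0 is an arbitrary choice)
# yields the same permutation on every run.
rng = np.random.default_rng(0)

# Permute the row positions 0 .. len(df) - 1.
positions = rng.permutation(len(df))

# Select rows by position and renumber the index.
df_shuffled = df.iloc[positions].reset_index(drop=True)
print(df_shuffled)
```

Passing len(df) rather than df.index keeps the selection positional, so the sketch does not depend on the DataFrame having the default integer index.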
sklearn.utils.shuffle() to Shuffle Pandas DataFrame Rows
We can also use sklearn.utils.shuffle() to shuffle the rows of a Pandas DataFrame.
import pandas as pd
from sklearn.utils import shuffle
dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})
df_shuffled = shuffle(df)
print(df_shuffled)
Output:
       Date   Fruit  Price
3  April-13   Mango      4
0  April-10   Apple      3
1  April-11  Papaya      1
2  April-12  Banana      2
If you do not have the sklearn package installed, you can install it with the following command:
pip install -U scikit-learn
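As with sample(), the DataFrame returned by shuffle() keeps the original index labels, and the order differs between runs. A minimal sketch of one way to handle both, assuming an arbitrary seed of 0 passed through the random_state argument of shuffle():

```python
import pandas as pd
from sklearn.utils import shuffle

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]
df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

# random_state=0 is an arbitrary seed chosen for illustration;
# reset_index(drop=True) renumbers the shuffled rows from 0.
df_shuffled = shuffle(df, random_state=0).reset_index(drop=True)
print(df_shuffled)
```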
Suraj Joshi is a backend software engineer at Matrice.ai.