How to Randomly Shuffle DataFrame Rows in Pandas

Suraj Joshi Feb 02, 2024 Pandas Pandas DataFrame Row

pandas.DataFrame.sample() method to Shuffle DataFrame Rows in Pandas
numpy.random.permutation() to Shuffle Pandas DataFrame Rows
sklearn.utils.shuffle() to Shuffle Pandas DataFrame Rows


We can use the sample() method of a Pandas DataFrame, the permutation() function from the NumPy module, or the shuffle() function from the sklearn package to randomly shuffle DataFrame rows in Pandas.

`pandas.DataFrame.sample()` method to Shuffle DataFrame Rows in Pandas

pandas.DataFrame.sample() returns a random sample of items from an axis of a DataFrame. To sample rows, the axis parameter must be 0, which is its default value, so we do not need to pass it explicitly.

The frac parameter sets the fraction of the total rows to return. Sampling is without replacement by default, so setting frac to 1 returns every row exactly once, in random order.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]

df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})
print(df)

df_shuffled = df.sample(frac=1)
print(df_shuffled)

Output:

       Date   Fruit  Price
0  April-10   Apple      3
1  April-11  Papaya      1
2  April-12  Banana      2
3  April-13   Mango      4
       Date   Fruit  Price
3  April-13   Mango      4
2  April-12  Banana      2
0  April-10   Apple      3
1  April-11  Papaya      1

The DataFrame.sample() method shuffles the rows of the DataFrame, as shown above. Each row keeps its original index label.

We can chain the reset_index() method to renumber the index of the shuffled DataFrame.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]

df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})
print(df)

df_shuffled = df.sample(frac=1).reset_index(drop=True)
print(df_shuffled)

Output:

       Date   Fruit  Price
0  April-10   Apple      3
1  April-11  Papaya      1
2  April-12  Banana      2
3  April-13   Mango      4
       Date   Fruit  Price
0  April-11  Papaya      1
1  April-13   Mango      4
2  April-10   Apple      3
3  April-12  Banana      2

Here, drop=True discards the old index instead of inserting it into the DataFrame as a new column.
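For a repeatable shuffle, sample() also accepts a random_state seed, and pandas 1.3 and later accept ignore_index=True, which renumbers the result without a separate reset_index() call. The sketch below is only an illustration: the seed value 42 and the helper name shuffle_rows are arbitrary choices and do not come from the example above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["April-10", "April-11", "April-12", "April-13"],
        "Fruit": ["Apple", "Papaya", "Banana", "Mango"],
        "Price": [3, 1, 2, 4],
    }
)


def shuffle_rows(frame, seed=None):
    """Return a shuffled copy of frame with a fresh 0..n-1 index.

    shuffle_rows is a hypothetical helper name used only for this sketch.
    """
    # frac=1 returns every row exactly once (no replacement by default).
    # ignore_index=True requires pandas 1.3 or later.
    return frame.sample(frac=1, random_state=seed, ignore_index=True)


first = shuffle_rows(df, seed=42)
second = shuffle_rows(df, seed=42)

print(first)
print(first.equals(second))  # the same seed gives the same row order
```

Leaving seed as None keeps the default behavior, where each call produces a new random order.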

`numpy.random.permutation()` to Shuffle Pandas DataFrame Rows

We can use numpy.random.permutation() to generate a random ordering of row positions. When these shuffled positions are passed to the iloc indexer, we get randomly shuffled rows.

import pandas as pd
import numpy as np

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]

df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

df_shuffled = df.iloc[np.random.permutation(len(df))].reset_index(drop=True)
print(df_shuffled)

Output:

       Date   Fruit  Price
0  April-13   Mango      4
1  April-12  Banana      2
2  April-10   Apple      3
3  April-11  Papaya      1

You might get a different result each time you run the same code, because np.random.permutation() generates a new random permutation on every call.
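If a repeatable order is needed, one option is to draw the permutation from a seeded NumPy Generator. This is a minimal sketch that permutes row positions rather than index labels; the seed 0 and the variable names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["April-10", "April-11", "April-12", "April-13"],
        "Fruit": ["Apple", "Papaya", "Banana", "Mango"],
        "Price": [3, 1, 2, 4],
    }
)

# A seeded Generator produces the same sequence of random numbers on every run.
rng = np.random.default_rng(0)

# permutation(n) returns the integers 0..n-1 in random order.
positions = rng.permutation(len(df))
df_shuffled = df.iloc[positions].reset_index(drop=True)
print(df_shuffled)

# Re-creating the generator with the same seed reproduces the same order.
positions_again = np.random.default_rng(0).permutation(len(df))
print(np.array_equal(positions, positions_again))
```

Because the permutation is built from len(df), this approach does not depend on what the index labels of the DataFrame happen to be.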

`sklearn.utils.shuffle()` to Shuffle Pandas DataFrame Rows

We can also use sklearn.utils.shuffle() to shuffle rows of Pandas DataFrame.

import pandas as pd
from sklearn.utils import shuffle

dates = ["April-10", "April-11", "April-12", "April-13"]
fruits = ["Apple", "Papaya", "Banana", "Mango"]
prices = [3, 1, 2, 4]

df = pd.DataFrame({"Date": dates, "Fruit": fruits, "Price": prices})

df_shuffled = shuffle(df)
print(df_shuffled)

Output:

       Date   Fruit  Price
3  April-13   Mango      4
0  April-10   Apple      3
1  April-11  Papaya      1
2  April-12  Banana      2
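The shuffle() function also takes a random_state argument, and its result can be chained with reset_index() in the same way as in the earlier sections. The following is a minimal sketch, assuming scikit-learn is installed; the seed 0 is an arbitrary choice.

```python
import pandas as pd
from sklearn.utils import shuffle

df = pd.DataFrame(
    {
        "Date": ["April-10", "April-11", "April-12", "April-13"],
        "Fruit": ["Apple", "Papaya", "Banana", "Mango"],
        "Price": [3, 1, 2, 4],
    }
)

# random_state fixes the order; reset_index(drop=True) renumbers the rows.
df_shuffled = shuffle(df, random_state=0).reset_index(drop=True)
print(df_shuffled)

# Calling shuffle() again with the same seed gives the same order.
repeat = shuffle(df, random_state=0).reset_index(drop=True)
print(df_shuffled.equals(repeat))
```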

If you do not have the sklearn package installed, you can install it with the following command:

pip install -U scikit-learn


Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
