How to Create Nested DataFrames in Pandas

Salman Mehmood Feb 02, 2024 Pandas Pandas DataFrame

Create Nested DataFrames in Pandas Using the pd.DataFrame() Function
Create Nested DataFrames in Pandas Using the pd.concat() Function
Conclusion

Pandas DataFrames are foundational structures for managing labeled, two-dimensional data. They are widely used in data-intensive fields such as data science, machine learning, and scientific computing, and their similarity to SQL tables and spreadsheet applications such as Excel and Calc makes them a natural fit for data processing.

This article will guide us through the creation and manipulation of nested DataFrames in Pandas. We’ll go over two methods for creating nested DataFrames, as well as tips on reading and dealing with potential problems that may arise while working with nested DataFrames in Python.

Create Nested DataFrames in Pandas Using the `pd.DataFrame()` Function

Pandas lets us combine several DataFrame instances into a single, more complex structure known as a nested DataFrame: a DataFrame whose cells hold other DataFrames. This technique keeps related data together in one organized structure.

When working with substantial volumes of data, it can be beneficial to consolidate related DataFrames into a single, more manageable structure. Consider the following code:

import pandas as pd

data = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 10, "b": 20, "c": 30},
    {"a": 40, "b": 50, "c": 60},
    {"a": 70, "b": 80, "c": 90},
]

data2 = [
    {"d": 1, "e": 2, "f": 3},
    {"d": 10, "e": 20, "f": 30},
    {"d": 40, "e": 50, "f": 60},
    {"d": 70, "e": 80, "f": 90},
]

data3 = [
    {"g": 1, "h": 2, "i": 3},
    {"g": 10, "h": 20, "i": 30},
    {"g": 40, "h": 50, "i": 60},
    {"g": 70, "h": 80, "i": 90},
]

df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)

df4 = pd.DataFrame({"idx": [1, 2, 3], "dfs": [df, df2, df3]})

print(df4)

In this code, we created three separate DataFrames: df, df2, and df3. Managing them individually can become unwieldy, especially as the number of related datasets grows.

To tackle this issue, we can nest these DataFrames so that they are all reachable from one place. A nested DataFrame is a new DataFrame that holds the related DataFrames as cell values, providing a more organized and manageable structure.

To create the nested DataFrame, we use this line of code: df4 = pd.DataFrame({"idx": [1, 2, 3], "dfs": [df, df2, df3]}). It creates a new DataFrame, df4, with two columns.

The "idx" column contains numeric identifiers, while the "dfs" column is built from a list of our previously defined DataFrames: df, df2, and df3. Each row of df4 therefore pairs an identifier with one complete DataFrame.

Output:

   idx                                                dfs
0    1      a   b   c
0   1   2   3
1  10  20  30
2  4...
1    2      d   e   f
0   1   2   3
1  10  20  30
2  4...
2    3      g   h   i
0   1   2   3
1  10  20  30
2  4...

The above output of df4 displays these nested DataFrames, but the structure is hard to read because pandas truncates the text representation of each cell (note the trailing "..."). To retrieve and work with the individual DataFrames within this nested structure, you can use the following code:

import pandas as pd

data = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 10, "b": 20, "c": 30},
    {"a": 40, "b": 50, "c": 60},
    {"a": 70, "b": 80, "c": 90},
]

data2 = [
    {"d": 1, "e": 2, "f": 3},
    {"d": 10, "e": 20, "f": 30},
    {"d": 40, "e": 50, "f": 60},
    {"d": 70, "e": 80, "f": 90},
]

data3 = [
    {"g": 1, "h": 2, "i": 3},
    {"g": 10, "h": 20, "i": 30},
    {"g": 40, "h": 50, "i": 60},
    {"g": 70, "h": 80, "i": 90},
]

df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)

df4 = pd.DataFrame({"idx": [1, 2, 3], "dfs": [df, df2, df3]})

print(
    "Dataframe 1: \n"
    + str(df4["dfs"].iloc[0])
    + "\n\nDataframe 2:\n"
    + str(df4["dfs"].iloc[1])
    + "\n\nDataframe 3:\n"
    + str(df4["dfs"].iloc[2])
)

This code prints the three DataFrames stored in the nested DataFrame, df4. The iloc[] indexer selects elements of the "dfs" column by integer position.

Position 0 returns the first DataFrame in the "dfs" column, position 1 the second, and position 2 the third, which the output labels "Dataframe 1", "Dataframe 2", and "Dataframe 3".

Each value retrieved this way is an ordinary DataFrame, so we can view and analyze it on its own. This is particularly useful when several related datasets are stored in the nested structure and we want to examine one at a time.

Output:

Dataframe 1:
    a   b   c
0   1   2   3
1  10  20  30
2  40  50  60
3  70  80  90

Dataframe 2:
    d   e   f
0   1   2   3
1  10  20  30
2  40  50  60
3  70  80  90

Dataframe 3:
    g   h   i
0   1   2   3
1  10  20  30
2  40  50  60
3  70  80  90

As the output shows, each DataFrame prints in full, with all four rows, once it is retrieved from the "dfs" column.
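Retrieving the frames one at a time with iloc[] becomes repetitive as the number of nested frames grows. As a minimal sketch, assuming the same df4 layout as above (the shorter sample frames and the names shapes, inner, and second are illustrative, not part of the original example), the "idx" and "dfs" columns can be iterated together, and a single frame can be looked up by its "idx" value rather than its position:

```python
import pandas as pd

# Small illustrative frames (hypothetical data, shorter than the example above)
df = pd.DataFrame({"a": [1, 10], "b": [2, 20]})
df2 = pd.DataFrame({"d": [1, 10], "e": [2, 20]})
df3 = pd.DataFrame({"g": [1, 10], "h": [2, 20]})

df4 = pd.DataFrame({"idx": [1, 2, 3], "dfs": [df, df2, df3]})

# The "dfs" column has dtype object: each cell holds a whole DataFrame
print(df4["dfs"].dtype)

# Walk the two columns in parallel and record the shape of each nested frame
shapes = {}
for idx, inner in zip(df4["idx"], df4["dfs"]):
    shapes[idx] = inner.shape
    print(f"idx={idx}: columns={list(inner.columns)}, shape={inner.shape}")

# Look up one nested frame by its "idx" value instead of its position
second = df4.loc[df4["idx"] == 2, "dfs"].iloc[0]
print(second)
```

The boolean mask keeps the lookup tied to the "idx" label, which may be easier to maintain than positional access if rows are later reordered or filtered.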

It’s important to note that nested DataFrames suit only certain use cases. A column of DataFrames is stored with the generic object dtype, so pandas treats each cell as an opaque Python object, and any operation on the nested data has to retrieve the inner DataFrame first. Therefore, it’s worth evaluating your data structure and intended operations before deciding whether nested DataFrames are the right fit.

Create Nested DataFrames in Pandas Using the `pd.concat()` Function

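The pd.concat() function offers another way to combine multiple DataFrames. It can place DataFrames side by side (axis=1) or stack them on top of each other (axis=0). When a dictionary of DataFrames is passed, the dictionary keys become the outer level of a hierarchical index, so each original DataFrame stays identifiable as a named group in the result.

The following example combines three DataFrames side by side, each under a descriptive key.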
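To make the two orientations concrete, here is a minimal sketch with two small, hypothetical frames (the names left and right and the keys "first" and "second" are illustrative). With axis=0 the frames are stacked vertically and the keys label groups of rows; with axis=1 they sit side by side and the keys label groups of columns.

```python
import pandas as pd

# Two small frames with hypothetical data
left = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
right = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# axis=0: stack on top of each other; keys form the outer level of the row index
stacked = pd.concat({"first": left, "second": right}, axis=0)
print(stacked.shape)          # (4, 2)
print(stacked.index.nlevels)  # 2

# axis=1: side by side; keys form the outer level of the column index
side_by_side = pd.concat({"first": left, "second": right}, axis=1)
print(side_by_side.shape)            # (2, 4)
print(side_by_side.columns.nlevels)  # 2
```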

import pandas as pd

data1 = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 22]}
df1 = pd.DataFrame(data1)

data2 = {"Math": [85, 90, 78], "Science": [92, 88, 76]}
df2 = pd.DataFrame(data2)

data3 = {"English": [80, 85, 88], "History": [75, 82, 90]}
df3 = pd.DataFrame(data3)

nested_data = {"Student Info": df1, "Math Scores": df2, "Other Scores": df3}
nested_df = pd.concat(nested_data, axis=1)

print(nested_df)
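The listing above stops at print(nested_df). As a hedged follow-up sketch, assuming the same three frames and keys, the dictionary keys become the outer level of the column index, so a whole group or a single inner column can be selected by key (the names math_scores, ages, and groups are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 22]})
df2 = pd.DataFrame({"Math": [85, 90, 78], "Science": [92, 88, 76]})
df3 = pd.DataFrame({"English": [80, 85, 88], "History": [75, 82, 90]})

nested_df = pd.concat(
    {"Student Info": df1, "Math Scores": df2, "Other Scores": df3}, axis=1
)

# Selecting an outer key returns that group's columns as an ordinary DataFrame
math_scores = nested_df["Math Scores"]
print(list(math_scores.columns))  # ['Math', 'Science']

# An (outer, inner) tuple selects a single column as a Series
ages = nested_df[("Student Info", "Age")]
print(ages.tolist())  # [25, 30, 22]

# The outer level of the column index lists the group names
groups = nested_df.columns.get_level_values(0).unique().tolist()
print(groups)
```

Unlike the pd.DataFrame() approach, the result here is a single flat table with grouped columns, so the rows of all three frames are aligned on their shared index.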
