How to Unpivot Pandas Data Using pandas and NumPy

Unpivot Pandas Data: A Deep Dive into Data Manipulation

In this article, we will explore the concept of unpivoting data using pandas and NumPy. We will start with a simple example and gradually move on to more complex scenarios.

Introduction to Unpivoting

Unpivoting is a common data manipulation technique used in data analysis, machine learning, and statistics. It transforms a dataset from a wide format (where each variable, such as a month, has its own column) to a long format (where each row holds a single observation). In pandas, one way to do this is the unstack() function; stack() and melt() are closely related alternatives.
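To make the two layouts concrete, here is a minimal sketch that reshapes a table with one column per quarter into a table with one row per value. It uses melt(), the pandas function built for this reshaping. The store names, quarter columns, and numbers are made up for illustration and are not the article's data.

```python
import pandas as pd

# Hypothetical table: one row per store, one column per quarter
wide = pd.DataFrame({
    'store': ['north', 'south'],
    'Q1': [10, 20],
    'Q2': [15, 25],
})

# Keep 'store' as an identifier and turn the quarter columns into rows
long = wide.melt(id_vars='store', var_name='quarter', value_name='sales')

print(long)
```

The result has four rows, one for each (store, quarter) pair, and three columns: store, quarter, and sales.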

The Problem Statement

Let’s consider a hypothetical scenario where we have a pandas DataFrame representing monthly sales data for three consecutive years. The DataFrame is laid out in a wide format, with one row per year and one column per month:

      Jan  Feb  Mar  Apr
2001    3    4    5    6
2002    7    8    9   10
2003   11   12   13   14

We would like to unpivot this data to create a new DataFrame in which each row holds a single value, identified by its month and year. The resulting DataFrame should have 12 rows and look like this:

month  year  sales
Jan    2001      3
Jan    2002      7
Jan    2003     11
Feb    2001      4
...
Apr    2003     14

Solution Overview

To achieve this, we will use the unstack() function. Called on a DataFrame whose index has a single level, unstack() moves the column labels into the index and returns a Series with a two-level MultiIndex. In our case, the outer level holds the month and the inner level holds the year.

We can start by applying the unstack() function to our DataFrame:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'Jan': [3, 7, 11],
    'Feb': [4, 8, 12],
    'Mar': [5, 9, 13],
    'Apr': [6, 10, 14]
}, index=['2001', '2002', '2003'])

# Unstack the DataFrame
df_unpivoted = df.unstack()

print(df_unpivoted)

This will produce a Series indexed by month and year:

Jan  2001     3
     2002     7
     2003    11
Feb  2001     4
     2002     8
     2003    12
Mar  2001     5
     2002     9
     2003    13
Apr  2001     6
     2002    10
     2003    14
dtype: int64
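To check the structure of this result, the short sketch below rebuilds the same DataFrame and inspects what unstack() returns. The variable name s is only illustrative. Note that the years are strings here because the index was created from strings.

```python
import pandas as pd

df = pd.DataFrame({
    'Jan': [3, 7, 11],
    'Feb': [4, 8, 12],
    'Mar': [5, 9, 13],
    'Apr': [6, 10, 14]
}, index=['2001', '2002', '2003'])

s = df.unstack()

# The result is a Series whose index has two levels: (month, year)
print(type(s).__name__)
print(s.index.nlevels)

# Look up a single value by its (month, year) pair
print(s.loc[('Jan', '2002')])

# Select every month for one year via the inner index level
print(s.xs('2003', level=1))
```

Selecting on the inner level with xs() recovers one row of the original table, which is a quick way to confirm that no values were lost or reordered.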

Reshaping the Unpivoted DataFrame

Note that unstack() returned a Series, not a DataFrame. To turn its two index levels (month and year) into ordinary columns, we can use the reset_index() function:

# Move both index levels into ordinary columns
df_final = df_unpivoted.reset_index()

print(df_final)

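Because neither index level has a name, pandas labels the new columns level_0 and level_1, and the values column 0:

    level_0  level_1   0
0       Jan     2001   3
1       Jan     2002   7
2       Jan     2003  11
3       Feb     2001   4
4       Feb     2002   8
5       Feb     2003  12
6       Mar     2001   5
7       Mar     2002   9
8       Mar     2003  13
9       Apr     2001   6
10      Apr     2002  10
11      Apr     2003  14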

Renaming Columns

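To make the final output more readable, we can replace the default column names using the rename() function. Note that the values column is labelled with the integer 0, not the string '0':

# Give the month, year, and value columns descriptive names
df_final = df_final.rename(columns={'level_0': 'month', 'level_1': 'year', 0: 'sales'})

print(df_final)

This will produce the final output:

    month  year  sales
0     Jan  2001      3
1     Jan  2002      7
2     Jan  2003     11
3     Feb  2001      4
4     Feb  2002      8
5     Feb  2003     12
6     Mar  2001      5
7     Mar  2002      9
8     Mar  2003     13
9     Apr  2001      6
10    Apr  2002     10
11    Apr  2003     14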
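The separate rename step can be avoided by naming the index levels before resetting the index. The sketch below is one possible way to chain the whole transformation. The names month, year, and sales are choices made for this example, not anything pandas requires.

```python
import pandas as pd

df = pd.DataFrame({
    'Jan': [3, 7, 11],
    'Feb': [4, 8, 12],
    'Mar': [5, 9, 13],
    'Apr': [6, 10, 14]
}, index=['2001', '2002', '2003'])

long_df = (
    df.unstack()                          # Series indexed by (month, year)
      .rename_axis(['month', 'year'])     # name the two index levels
      .reset_index(name='sales')          # levels become columns; values column is 'sales'
)

print(long_df)
```

Naming the levels up front means the column labels never pass through the generic level_0 and level_1 defaults.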

Conclusion

In this article, we have explored the concept of unpivoting data using pandas and NumPy. We have used the unstack() function to transform a wide-form DataFrame into a long-form Series and then converted it into a DataFrame using the reset_index() function. Finally, we have renamed the columns to make the output more readable.

Unpivoting is a common data manipulation technique used in various fields, and pandas provides an efficient way to achieve this using the unstack() function. By applying the techniques discussed in this article, you can easily unpivot your data and transform it into a format that suits your needs.
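For comparison, the same long table can also be assembled by hand with NumPy. This is only a sketch of the idea, assuming the same sample DataFrame as above. The column names month, year, and sales are illustrative, and in practice the pandas functions are usually the simpler choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Jan': [3, 7, 11],
    'Feb': [4, 8, 12],
    'Mar': [5, 9, 13],
    'Apr': [6, 10, 14]
}, index=['2001', '2002', '2003'])

n_years, n_months = df.shape

long_np = pd.DataFrame({
    # Each month label is repeated once per year: Jan, Jan, Jan, Feb, ...
    'month': np.repeat(df.columns.to_numpy(), n_years),
    # The full list of years is repeated once per month: 2001, 2002, 2003, 2001, ...
    'year': np.tile(df.index.to_numpy(), n_months),
    # Column-major flattening reads the values one month at a time
    'sales': df.to_numpy().ravel(order='F'),
})

print(long_np)
```

The order='F' argument matters: the default row-major flattening would read the values one year at a time and no longer line up with the month and year labels.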


Last modified on 2025-03-01