Calculating Percentage Change with Respect to Initial Year in a Pandas DataFrame
In this article, we will explore how to calculate the percentage change of values with respect to an initial year for each variable, scenario, and region in a pandas DataFrame. We will use the pandas library and provide examples and explanations throughout the article.
Introduction
The pandas library is a powerful tool for data manipulation and analysis in Python. It provides data structures and functions for working with tabular data, most notably the DataFrame. Expressing each value as a percentage change from a baseline year makes series with very different magnitudes, such as GDP and GHG emissions, directly comparable.
Creating a Sample DataFrame
To illustrate the concept, let’s create a sample DataFrame using numpy and pandas. Each of its 27 rows is assigned a random scenario (BAU, ETS, or ESD), region (Italy or France), and variable (GDP or GHG), along with random values for five years: 2015, 2016, 2017, 2018, and 2019. Because the labels are drawn at random, the same combination of scenario, region, and variable can appear in more than one row.
import numpy as np
import pandas as pd
df = pd.DataFrame({
'scenario': np.random.choice(['BAU', 'ETS', 'ESD'], 27),
'region': np.random.choice(['Italy', 'France'], 27),
'variable': np.random.choice(['GDP', 'GHG'], 27),
'2015': np.random.randn(27),
'2016': np.random.randn(27),
'2017': np.random.randn(27),
'2018': np.random.randn(27),
'2019': np.random.randn(27)
})
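# Reshape from wide (one column per year) to long (one row per observation)
df2 = pd.melt(df, id_vars=['scenario', 'region', 'variable'], var_name='year')
# Index by all identifier columns and sort so the years are in order within each group
all_names_index = df2.set_index(['scenario', 'region', 'variable', 'year']).sort_index()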
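To make the reshaping step easier to follow, here is a minimal sketch of what `pd.melt` does to a small, deterministic table. The frame `wide` and its values are hypothetical and chosen only for illustration; they are not part of the random sample above.

```python
import pandas as pd

# Hypothetical two-row wide table: one column per year
wide = pd.DataFrame({
    'scenario': ['BAU', 'ETS'],
    'region': ['Italy', 'France'],
    'variable': ['GDP', 'GHG'],
    '2015': [100.0, 50.0],
    '2016': [110.0, 45.0],
})

# melt stacks the year columns into 'year'/'value' pairs, one row per observation
long = pd.melt(wide, id_vars=['scenario', 'region', 'variable'], var_name='year')

print(long.shape)          # (4, 5): 2 rows x 2 year columns
print(list(long.columns))  # ['scenario', 'region', 'variable', 'year', 'value']
```

The value column is named `value` because that is the default for the `value_name` parameter of `pd.melt`.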
Calculating Percentage Change
To calculate the percentage change of values with respect to an initial year for each variable, scenario, and region, we’ll use the following steps:
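- Reset the index to create a flat DataFrame.
- Group by scenario, region, variable, and year, and sum the values so that each combination has exactly one row per year.
- Within each scenario, region, and variable group, divide each value by the group's initial (2015) value.
- Subtract 1 from each result.
- Multiply by 100 to convert to a percentage.
all_names_index = all_names_index.reset_index()
# Sum duplicate rows so each (scenario, region, variable, year) combination appears once;
# keeping 'year' in the keys preserves the time dimension
all_names_index = all_names_index.groupby(by=['scenario', 'region', 'variable', 'year'], as_index=False)['value'].sum()
# groupby sorts by its keys, so the first row of each group holds the 2015 value;
# transform returns a result aligned with the original rows
all_names_index['pct_change'] = all_names_index.groupby(by=['scenario', 'region', 'variable'])['value'].transform(lambda x: (x / x.iloc[0] - 1) * 100)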
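Since the sample data is random, the result is hard to check by eye. The following is a minimal sketch of the same per-group calculation on small hand-made data, where the numbers can be verified manually. The frame `long`, its values, and the use of `transform('first')` are assumptions made for this illustration; the sketch presumes each group contains a row for the initial year.

```python
import pandas as pd

# Hypothetical long-format data: two groups, three years each
long = pd.DataFrame({
    'scenario': ['BAU'] * 3 + ['ETS'] * 3,
    'region': ['Italy'] * 6,
    'variable': ['GDP'] * 6,
    'year': ['2015', '2016', '2017'] * 2,
    'value': [100.0, 110.0, 121.0, 50.0, 45.0, 60.0],
})

keys = ['scenario', 'region', 'variable']
# Sort so that the initial year is the first row of every group
long = long.sort_values(keys + ['year'])

# transform('first') broadcasts each group's first non-missing value to all its rows
baseline = long.groupby(keys)['value'].transform('first')
long['pct_change'] = (long['value'] / baseline - 1) * 100

print(long['pct_change'].round(1).tolist())
# [0.0, 10.0, 21.0, 0.0, -10.0, 20.0]
```

Each group starts at 0% because its baseline is its own 2015 value, not the first row of the whole table.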
Handling Missing Values
If the DataFrame contains missing values, handle them before calculating the percentage change, for example by filling them with a specific value or by interpolating. The snippet below fills each missing value with its group's initial value, so those rows show a change of 0%. If the initial value itself is missing, the result for the whole group is NaN.
all_names_index['pct_change'] = all_names_index.groupby(by=['scenario', 'region', 'variable'])['value'].transform(lambda x: (x.fillna(x.iloc[0]) - x.iloc[0]) / x.iloc[0] * 100)
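As an illustration of the interpolation option mentioned above, the sketch below fills a gap linearly within each group before computing the percentage change. The data and the `value_filled` column are hypothetical, and linear interpolation is only one possible choice; it assumes the years are evenly spaced and sorted within each group.

```python
import numpy as np
import pandas as pd

# Hypothetical single group with a gap in 2016
long = pd.DataFrame({
    'scenario': ['BAU'] * 4,
    'region': ['Italy'] * 4,
    'variable': ['GDP'] * 4,
    'year': ['2015', '2016', '2017', '2018'],
    'value': [100.0, np.nan, 120.0, 130.0],
})

keys = ['scenario', 'region', 'variable']

# Interpolate inside each group so values never leak from one group into another
long['value_filled'] = long.groupby(keys)['value'].transform(lambda s: s.interpolate())

baseline = long.groupby(keys)['value_filled'].transform('first')
long['pct_change'] = (long['value_filled'] / baseline - 1) * 100

print(long['pct_change'].round(1).tolist())
# [0.0, 10.0, 20.0, 30.0]
```

Unlike filling with the initial value, interpolation gives the gap year a change that lies between its neighbours (here 10%) rather than 0%.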
Example Use Case
Let’s say we have a DataFrame with the following structure:
| scenario | region | variable | year | value |
|---|---|---|---|---|
| BAU | France | GDP | 2015 | 10.0 |
| ETS | Italy | GHG | 2016 | 20.0 |
| … | … | … | … | … |
The simplified code below takes the first row (year 2015) as the baseline and expresses every value as a percentage change from it. Note that it does not group by scenario, region, or variable; it treats all rows as a single series.
import pandas as pd
df = pd.DataFrame({
'scenario': ['BAU', 'ETS', 'ESD'],
'region': ['France', 'Italy', 'Spain'],
'variable': ['GDP', 'GHG', 'GDP'],
'year': [2015, 2016, 2017],
'value': [10.0, 20.0, 30.0]
})
df['pct_change'] = (df['value'] - df.iloc[0]['value']) / df.iloc[0]['value'] * 100
print(df)
Output:
| scenario | region | variable | year | value | pct_change |
|---|---|---|---|---|---|
| BAU | France | GDP | 2015 | 10.0 | 0.0 |
| ETS | Italy | GHG | 2016 | 20.0 | 100.0 |
| ESD | Spain | GDP | 2017 | 30.0 | 200.0 |
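In this example, the first row (BAU, France, GDP, 2015) serves as the baseline, so its percentage change is 0%. The second row's value of 20.0 is 100% higher than the baseline of 10.0, and the third row's value of 30.0 is 200% higher. Because each row here belongs to a different scenario, region, and variable, the comparison is only illustrative; with real data, the baseline should be taken within each group, as in the groupby calculation shown earlier.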
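When rows are not guaranteed to be sorted by year, it can be safer to select the baseline year explicitly rather than rely on the first row. The sketch below is one way to do that with a merge; the data, the `base_value` column name, and the choice of a left merge are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: rows are deliberately not sorted by year
df = pd.DataFrame({
    'scenario': ['BAU', 'BAU', 'ETS', 'ETS'],
    'region': ['France', 'France', 'Italy', 'Italy'],
    'variable': ['GDP', 'GDP', 'GHG', 'GHG'],
    'year': [2016, 2015, 2015, 2016],
    'value': [12.0, 10.0, 20.0, 15.0],
})

keys = ['scenario', 'region', 'variable']

# Pick the 2015 rows explicitly instead of relying on row order
base = df.loc[df['year'] == 2015, keys + ['value']].rename(columns={'value': 'base_value'})

# Attach each group's 2015 value to all of its rows
out = df.merge(base, on=keys, how='left')
out['pct_change'] = (out['value'] / out['base_value'] - 1) * 100

print(out['pct_change'].round(1).tolist())
# [20.0, 0.0, 0.0, -25.0]
```

With a left merge, any group that has no 2015 row gets a missing `base_value`, so its percentage change is NaN instead of being silently computed against the wrong year.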
Last modified on 2025-05-05