Calculating Percentage Change with Respect to Initial Year in a Pandas DataFrame

In this article, we calculate the percentage change of values relative to an initial year, separately for each variable, scenario, and region in a pandas DataFrame, with examples and explanations along the way.

Introduction

The pandas library is a powerful tool for data manipulation and analysis in Python. It provides data structures and functions for working with tabular data, most notably the DataFrame. A common task with this kind of data is to express each value as a percentage change from a baseline year, computed within each combination of variable, scenario, and region rather than across the whole table.

Creating a Sample DataFrame

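To illustrate the concept, let’s create a sample DataFrame using numpy and pandas. Each of its 27 rows is assigned at random one of three scenarios (BAU, ETS, and ESD), one of two regions (Italy and France), and one of two variables (GDP and GHG), and holds random values for five years: 2015, 2016, 2017, 2018, and 2019. Because the labels are random, a given combination of scenario, region, and variable can appear in several rows or in none.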

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'scenario': np.random.choice(['BAU', 'ETS', 'ESD'], 27),
    'region': np.random.choice(['Italy', 'France'], 27),
    'variable': np.random.choice(['GDP', 'GHG'], 27),
    '2015': np.random.randn(27),
    '2016': np.random.randn(27),
    '2017': np.random.randn(27),
    '2018': np.random.randn(27),
    '2019': np.random.randn(27)
})

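# Reshape from wide (one column per year) to long (one row per year)
df2 = pd.melt(df, id_vars=['scenario', 'region', 'variable'], var_name='year')
# Index by the identifying columns and sort so years are in order within each group
all_names_index = df2.set_index(['scenario', 'region', 'variable', 'year']).sort_index()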
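
To make the reshaping step concrete, here is a minimal sketch on a tiny hand-written frame. The frame `wide` and its values are made up for illustration and are not part of the sample data above; the point is only to show what `pd.melt` produces.

```python
import pandas as pd

# Hypothetical two-row wide frame: one column per year (values are illustrative)
wide = pd.DataFrame({
    'scenario': ['BAU', 'ETS'],
    'region': ['Italy', 'Italy'],
    'variable': ['GDP', 'GDP'],
    '2015': [100.0, 100.0],
    '2016': [110.0, 105.0],
})

# melt turns each year column into rows:
# one row per (scenario, region, variable, year), with the number in 'value'
long_df = pd.melt(wide, id_vars=['scenario', 'region', 'variable'], var_name='year')

print(long_df.shape)           # 2 rows x 2 year columns become 4 rows, 5 columns
print(list(long_df.columns))
```

Note that the year labels stay strings after melting, because they started life as column names.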

Calculating Percentage Change

To calculate the percentage change of values with respect to an initial year for each variable, scenario, and region, we’ll use the following steps:

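Reset the index to create a flat DataFrame.
Group by scenario, region, variable, and year, and sum the values so that each combination has exactly one value per year.
Within each scenario, region, and variable group, divide each value by the group’s initial value (2015).
Subtract 1 from each result.
Multiply by 100 to convert to a percentage.

all_names_index = all_names_index.reset_index()
# Sum duplicate rows so each (scenario, region, variable, year) has a single value
all_names_index = all_names_index.groupby(by=['scenario', 'region', 'variable', 'year'])['value'].sum().reset_index()
# groupby sorts its keys, so the first row of each group is the 2015 value
initial = all_names_index.groupby(by=['scenario', 'region', 'variable'])['value'].transform('first')
all_names_index['pct_change'] = (all_names_index['value'] / initial - 1) * 100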
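
The per-group baseline idea is easier to check by hand on numbers that are not random. The following is a small sketch under that assumption: the frame `long_df` and its values are invented for illustration, and `transform('first')` is one way (not the only one) to broadcast each group’s first value back onto its rows.

```python
import pandas as pd

# Hypothetical long-format data: two scenarios, three years each
long_df = pd.DataFrame({
    'scenario': ['BAU'] * 3 + ['ETS'] * 3,
    'region': ['Italy'] * 6,
    'variable': ['GDP'] * 6,
    'year': ['2015', '2016', '2017'] * 2,
    'value': [100.0, 110.0, 121.0, 200.0, 190.0, 220.0],
})

keys = ['scenario', 'region', 'variable']

# Sort so that 2015 is the first row inside every group
long_df = long_df.sort_values(keys + ['year']).reset_index(drop=True)

# Broadcast each group's first (2015) value to all rows of that group
baseline = long_df.groupby(keys)['value'].transform('first')

# (value / baseline - 1) * 100, computed row by row
long_df['pct_change'] = (long_df['value'] / baseline - 1) * 100

print(long_df.round(2))
```

Each scenario is compared only with its own 2015 value: BAU grows by about 10% and then 21%, while ETS dips by about 5% before ending about 10% above its baseline.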

Handling Missing Values

If there are missing values in the DataFrame, you’ll need to handle them before performing the percentage change calculation. One way is to fill them with a specific value; another is to interpolate. The example below fills each gap with the group’s initial value, so a missing year shows a 0% change. If the initial value itself is missing, every result in that group is NaN.

all_names_index['pct_change'] = all_names_index.groupby(by=['scenario', 'region', 'variable'])['value'].transform(lambda x: (x.fillna(x.iloc[0]) - x.iloc[0]) / x.iloc[0] * 100)
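
Interpolation is the other option mentioned above. The sketch below assumes linear interpolation between neighbouring years is acceptable for the data at hand, which is a modelling choice rather than a rule; the frame `long_df` and the `filled` column are hypothetical names used only here.

```python
import numpy as np
import pandas as pd

# Hypothetical single group with a gap in 2016
long_df = pd.DataFrame({
    'scenario': ['BAU'] * 4,
    'region': ['Italy'] * 4,
    'variable': ['GDP'] * 4,
    'year': ['2015', '2016', '2017', '2018'],
    'value': [100.0, np.nan, 120.0, 130.0],
})

keys = ['scenario', 'region', 'variable']

# Interpolate linearly inside each group so gaps never borrow from another group
long_df['filled'] = long_df.groupby(keys)['value'].transform(lambda s: s.interpolate())

# Baseline is the first value of each group after filling
baseline = long_df.groupby(keys)['filled'].transform('first')
long_df['pct_change'] = (long_df['filled'] / baseline - 1) * 100

print(long_df.round(2))
```

One caveat: `transform('first')` takes the first non-null value in each group, so if the 2015 value is missing and nothing precedes it to interpolate from, a later year silently becomes the baseline. Checking for a missing baseline explicitly is safer in that case.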

Example Use Case

Let’s say we have a DataFrame with the following structure:

scenario	region	variable	year	value
BAU	France	GDP	2015	10.0
ETS	Italy	GHG	2016	20.0
…	…	…	…	…

We can use the pandas library to calculate the percentage change of values with respect to the initial year (2015) for each variable, scenario, and region.

import pandas as pd

df = pd.DataFrame({
    'scenario': ['BAU', 'ETS', 'ESD'],
    'region': ['France', 'Italy', 'Spain'],
    'variable': ['GDP', 'GHG', 'GDP'],
    'year': [2015, 2016, 2017],
    'value': [10.0, 20.0, 30.0]
})

# Baseline is the value in the first row; every row is compared against it
df['pct_change'] = (df['value'] - df['value'].iloc[0]) / df['value'].iloc[0] * 100

print(df)

Output:

scenario	region	variable	year	value	pct_change
BAU	France	GDP	2015	10.0	0.0
ETS	Italy	GHG	2016	20.0	100.0
ESD	Spain	GDP	2017	30.0	200.0

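In this example, the percentage change for the first row (GDP in the BAU scenario) is 0% because that row is the baseline itself. The percentage change for the second row (GHG in the ETS scenario) is 100%, since 20.0 is double the baseline of 10.0. Note that this toy example compares every row with the first row regardless of its scenario, region, or variable. With real data that has several years per combination, take the baseline within each group, as in the groupby approach shown earlier.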
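
When rows are not guaranteed to be sorted by year, picking the baseline by position can select the wrong row. A possible alternative, sketched here, looks the baseline up by year and merges it back onto the data. The helper `pct_change_from_base` and its parameters are hypothetical names introduced for this illustration, and the sketch assumes each group has at most one row for the base year.

```python
import pandas as pd

def pct_change_from_base(df, keys, base_year, year_col='year', value_col='value'):
    """Hypothetical helper: percentage change relative to base_year within each group."""
    # One baseline row per group: the value recorded in base_year
    base = (
        df.loc[df[year_col] == base_year, keys + [value_col]]
          .rename(columns={value_col: 'base_value'})
    )
    # A left merge keeps every original row, in its original order;
    # validate raises if a group has more than one baseline row
    out = df.merge(base, on=keys, how='left', validate='many_to_one')
    out['pct_change'] = (out[value_col] / out['base_value'] - 1) * 100
    return out.drop(columns='base_value')

# Illustrative data, deliberately not sorted by year
df = pd.DataFrame({
    'scenario': ['BAU', 'BAU', 'BAU', 'ETS', 'ETS'],
    'region': ['France'] * 5,
    'variable': ['GDP'] * 5,
    'year': [2016, 2015, 2017, 2015, 2016],
    'value': [20.0, 10.0, 30.0, 40.0, 30.0],
})

result = pct_change_from_base(df, ['scenario', 'region', 'variable'], base_year=2015)
print(result)
```

Groups that have no row for the base year end up with NaN in `pct_change`, which makes a missing baseline visible instead of hiding it.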

Last modified on 2025-05-05