Converting Integer Columns to Categorical Labels in Pandas Using Map Function

Converting Integer Column to Categorical Label in Pandas

In this article, we’ll explore how to convert an integer column in a pandas DataFrame to a categorical label. We’ll delve into the details of the map function and provide examples to illustrate its usage.

Background

Pandas is a Python data analysis library that provides data structures and functions for handling structured, tabular data such as spreadsheets and SQL tables. Categorical variables are often stored as integer codes, which are compact but hard to read until each code is replaced with a descriptive label.

In this scenario, we have an integer column Severity representing severity values from 1 (Critical) to 4 (Low). We want to convert this column to a categorical label, mapping each value to its corresponding descriptive label. This can be achieved using the map function in pandas.

Using map with Dictionary

One way to achieve this conversion is by using the map function with a dictionary. The dictionary maps each integer value to its corresponding categorical label.

import pandas as pd

# Create a sample DataFrame
data = {'Severity': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# Define the integer codes and their corresponding labels
a = [1, 2, 3, 4]
b = ['Critical', 'High', 'Medium', 'Low']

# Convert the integer column to categorical labels using map
df['Severity'] = df['Severity'].map(dict(zip(a, b)))

print(df)

Output:

   Severity
0  Critical
1      High
2    Medium
3       Low

In this example, we define a dictionary dict(zip(a, b)) that maps each integer value from the list a to its corresponding categorical label from the list b. The map function then applies this mapping to the values in the Severity column.
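The column produced by map holds plain strings rather than a pandas categorical. If a true categorical column is wanted, one possible follow-up is to cast the mapped result with astype. The sketch below assumes an ordering of Low < Medium < High < Critical, which is an illustrative choice and not something the example above specifies; the names labels and severity_type are likewise introduced only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'Severity': [1, 2, 3, 4]})
labels = {1: 'Critical', 2: 'High', 3: 'Medium', 4: 'Low'}

# Assumed ordering for illustration: Low < Medium < High < Critical
severity_type = pd.CategoricalDtype(
    categories=['Low', 'Medium', 'High', 'Critical'], ordered=True
)

# Map the integer codes to strings, then cast to the ordered categorical dtype
df['Severity'] = df['Severity'].map(labels).astype(severity_type)

print(df['Severity'].dtype)  # category
print(df['Severity'].max())  # Critical
```

With an ordered categorical, comparisons and sorting follow the declared severity order instead of alphabetical order.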

Using map with Enumerate

Another way to build the same dictionary is with Python’s built-in enumerate function. Passing the label list to enumerate pairs each label with a running counter, so the separate list of integer keys is no longer needed.

import pandas as pd

# Create a sample DataFrame
data = {'Severity': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# Define the labels, ordered by severity code
b = ['Critical', 'High', 'Medium', 'Low']

# Convert the integer column to categorical labels using map with enumerate
df['Severity'] = df['Severity'].map(dict(enumerate(b, start=1)))

print(df)

Output:

   Severity
0  Critical
1      High
2    Medium
3       Low

In this example, dict(enumerate(b, start=1)) pairs each label in the list b with a counter, producing the same dictionary as in the first example. The start parameter is set to 1 because enumerate counts from 0 by default, while the Severity values begin at 1.
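To make the role of start concrete, the following short sketch prints the dictionary that enumerate produces with and without it. The variable name mapping is introduced here only for illustration.

```python
b = ['Critical', 'High', 'Medium', 'Low']

# With start=1 the keys line up with the Severity codes 1..4
mapping = dict(enumerate(b, start=1))
print(mapping)  # {1: 'Critical', 2: 'High', 3: 'Medium', 4: 'Low'}

# Without start, the keys begin at 0 and are off by one
print(dict(enumerate(b)))  # {0: 'Critical', 1: 'High', 2: 'Medium', 3: 'Low'}
```

If start were omitted, a Severity of 1 would be labelled High and a Severity of 4 would find no key at all.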

In-Place Conversion

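A common follow-up requirement is to apply the mapping in place, that is, without creating a new DataFrame. Strictly speaking, map never modifies anything: it returns a new Series. Both of the above examples assign that Series back to df['Severity'], which replaces the column in the existing DataFrame, so no second DataFrame is created.

The map method has no inplace option, and assigning the result back is the usual way to overwrite a column. The complete pattern is repeated here for reference: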

import pandas as pd

# Create a sample DataFrame
data = {'Severity': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# Define the mapping list and labels
a = [1, 2, 3, 4]
b = ['Critical', 'High', 'Medium', 'Low']

# Overwrite the column with the mapped labels (map itself returns a new Series)
df['Severity'] = df['Severity'].map(dict(zip(a, b)))

print(df)

Output:

   Severity
0  Critical
1      High
2    Medium
3       Low

Here map returns a new Series of labels, and the assignment overwrites the Severity column of the original DataFrame. No second DataFrame is created.
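The distinction between what map returns and what the assignment does can be checked directly. This is a minimal sketch; the names original, same_frame, and mapped are introduced only for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'Severity': [1, 2, 3, 4]})
original = df['Severity']
same_frame = df  # second reference to the same DataFrame object

mapped = original.map({1: 'Critical', 2: 'High', 3: 'Medium', 4: 'Low'})

# map did not change the Series it was called on
print(original.tolist())  # [1, 2, 3, 4]

# Assigning back replaces the column inside the existing DataFrame
df['Severity'] = mapped
print(same_frame is df)         # True
print(df['Severity'].tolist())  # ['Critical', 'High', 'Medium', 'Low']
```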

Handling Missing Values

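When working with categorical labels, it’s essential to handle missing values properly. With a dictionary, map leaves missing inputs as NaN and also returns NaN for any value that has no key in the dictionary. In pandas, you can use the fillna method to replace these missing values with a specified label or value.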

import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {'Severity': [1, 2, np.nan, 4]}
df = pd.DataFrame(data)

# Define the mapping list and labels
a = [1, 2, 3, 4]
b = ['Critical', 'High', 'Medium', 'Low']

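# Map the codes to labels, then fill the remaining missing value with a default label
df['Severity'] = df['Severity'].map(dict(zip(a, b))).fillna('Medium')

print(df)

Output:

   Severity
0  Critical
1      High
2    Medium
3       Low

In this example, the missing value (np.nan) has no entry in the dictionary, so map leaves it as NaN. Chaining fillna then replaces that NaN with a default label (‘Medium’). The NaN makes the column float64, but the integer keys still match because 1.0 equals 1.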
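A NaN in the mapped column can come from two sources: an input that was already missing, or a code that is simply absent from the dictionary. The sketch below separates the two cases before filling. The code 7 and the label 'Unknown' are hypothetical values chosen for illustration, as are the variable names.

```python
import numpy as np
import pandas as pd

# 7 is a hypothetical code with no entry in the mapping
df = pd.DataFrame({'Severity': [1, np.nan, 7]})
labels = {1: 'Critical', 2: 'High', 3: 'Medium', 4: 'Low'}

mapped = df['Severity'].map(labels)
print(mapped.isna().tolist())  # [False, True, True]

# Separate inputs that were missing from codes that were not in the dictionary
missing_input = df['Severity'].isna()
unmapped = mapped.isna() & ~missing_input
print(unmapped.tolist())  # [False, False, True]

# 'Unknown' is an assumed placeholder label, not part of the original scale
df['Severity'] = mapped.fillna('Unknown')
print(df['Severity'].tolist())  # ['Critical', 'Unknown', 'Unknown']
```

Checking the unmapped mask before calling fillna helps catch unexpected codes that a blanket default label would otherwise hide.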

Conclusion

Converting an integer column to categorical labels in pandas can be achieved using the map function with a dictionary, which can be built with either zip or enumerate. Assigning the result back to the column replaces the integer codes with readable labels in the existing DataFrame.

Remember to handle missing values properly when working with categorical labels, as they may affect your analysis results.

I hope this article has helped you understand how to perform this conversion using pandas. If you have any further questions or need additional clarification, please don’t hesitate to ask!


Last modified on 2024-09-13