Converting an Integer Column to Categorical Labels in Pandas
In this article, we’ll explore how to convert an integer column in a pandas DataFrame to categorical labels. We’ll look at the Series.map method in detail and work through examples that illustrate its usage.
Background
Pandas is a data analysis library for Python that provides data structures and functions for handling structured data efficiently, including tabular data such as spreadsheets and SQL tables. Numerical columns often encode categorical variables: each integer is a code that stands for a category, and the column is easier to read and report on once the codes are replaced with descriptive labels.
In this scenario, we have an integer column Severity representing severity values from 1 (Critical) to 4 (Low). We want to convert this column to a categorical label, mapping each value to its corresponding descriptive label. This can be achieved using the map function in pandas.
Using map with Dictionary
One way to achieve this conversion is by using the map function with a dictionary. The dictionary maps each integer value to its corresponding categorical label.
import pandas as pd
# Create a sample DataFrame
data = {'Severity': [1, 2, 3, 4]}
df = pd.DataFrame(data)
# Define the integer codes and their labels
a = [1, 2, 3, 4]
b = ['Critical', 'High', 'Medium', 'Low']
# Convert the integer column to categorical labels using map
df['Severity'] = df['Severity'].map(dict(zip(a, b)))
print(df)
Output:
| Severity |
|---|
| Critical |
| High |
| Medium |
| Low |
In this example, we define a dictionary dict(zip(a, b)) that maps each integer value from the list a to its corresponding categorical label from the list b. The map function then applies this mapping to the values in the Severity column.
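Note that map with string values produces a column of plain string labels, not a column with pandas' categorical dtype. If a true categorical column is wanted, one option is to pass the mapped labels to pd.Categorical. The following is a minimal sketch; the ordering of the categories (Low as least severe, Critical as most severe) is our assumption about how the labels should rank, and the variable names labels, mapped, and order are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Severity': [1, 2, 3, 4]})
labels = {1: 'Critical', 2: 'High', 3: 'Medium', 4: 'Low'}

# map returns a Series of plain string labels
mapped = df['Severity'].map(labels)

# Assumed ranking, from least to most severe
order = ['Low', 'Medium', 'High', 'Critical']
df['Severity'] = pd.Categorical(mapped, categories=order, ordered=True)

print(df['Severity'].dtype)  # category
print(df['Severity'].max())  # Critical
```

With an ordered categorical, sorting and comparisons follow the severity ranking instead of alphabetical order.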
Using map with Enumerate
Another way to achieve this conversion is to build the dictionary with enumerate instead of zip. Because the integer codes are consecutive, enumerate can generate them from the positions in the label list, so the separate list of integers is no longer needed.
import pandas as pd
# Create a sample DataFrame
data = {'Severity': [1, 2, 3, 4]}
df = pd.DataFrame(data)
# Define the labels, ordered by severity code
b = ['Critical', 'High', 'Medium', 'Low']
# Convert the integer column to categorical labels using map with enumerate
df['Severity'] = df['Severity'].map(dict(enumerate(b, start=1)))
print(df)
Output:
| Severity |
|---|
| Critical |
| High |
| Medium |
| Low |
In this example, dict(enumerate(b, start=1)) pairs each label in the list b with its position, counting from 1, which yields {1: 'Critical', 2: 'High', 3: 'Medium', 4: 'Low'}. The start parameter is set to 1 because enumerate counts from 0 by default, while the severity values begin at 1.
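To confirm that the two ways of building the dictionary are interchangeable here, a quick check in plain Python can help. This is only an illustrative sketch; the names codes, from_zip, and from_enumerate are ours.

```python
labels = ['Critical', 'High', 'Medium', 'Low']
codes = [1, 2, 3, 4]

# Explicit pairing of codes and labels
from_zip = dict(zip(codes, labels))

# Codes generated from list positions, counting from 1
from_enumerate = dict(enumerate(labels, start=1))

print(from_enumerate)              # {1: 'Critical', 2: 'High', 3: 'Medium', 4: 'Low'}
print(from_zip == from_enumerate)  # True
```

The enumerate version only fits when the codes are consecutive integers; for codes such as 10, 20, 30, the explicit zip pairing (or a dictionary literal) is the safer choice.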
In-Place Conversion
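Often we want the labels to replace the integers in the existing DataFrame rather than build a second DataFrame. Strictly speaking, map is not an in-place operation: it returns a new Series and leaves the original column untouched. Both of the examples above assign that Series back to df['Severity'], which replaces the column inside the same DataFrame object.
No separate technique is needed for this; the assignment pattern is the same one used in the dictionary example: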
import pandas as pd
# Create a sample DataFrame
data = {'Severity': [1, 2, 3, 4]}
df = pd.DataFrame(data)
# Define the mapping list and labels
a = [1, 2, 3, 4]
b = ['Critical', 'High', 'Medium', 'Low']
# Map the codes to labels and assign the result back to the same column
df['Severity'] = df['Severity'].map(dict(zip(a, b)))
print(df)
Output:
| Severity |
|---|
| Critical |
| High |
| Medium |
| Low |
This approach uses map to build the labels and the assignment to overwrite the Severity column in the original DataFrame, so no second DataFrame is created.
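To see concretely what happens during the assignment, the sketch below keeps a second reference to the DataFrame and inspects the column before and after. The names same_df and mapped are ours, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Severity': [1, 2, 3, 4]})
labels = {1: 'Critical', 2: 'High', 3: 'Medium', 4: 'Low'}

same_df = df  # a second reference to the same DataFrame object

# map returns a new Series; the DataFrame is not modified yet
mapped = df['Severity'].map(labels)
print(df['Severity'].tolist())       # [1, 2, 3, 4]

# The assignment replaces the column in the existing DataFrame
df['Severity'] = mapped
print(same_df['Severity'].tolist())  # ['Critical', 'High', 'Medium', 'Low']
print(same_df is df)                 # True
```

The change is visible through both references because only the column was replaced, not the DataFrame itself.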
Handling Missing Values
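When working with categorical labels, it’s essential to handle missing values properly. A missing input stays missing after map, and map also returns NaN for any value that has no key in the dictionary. In pandas, you can use the fillna method to replace these missing values with a specified label or value.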
import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {'Severity': [1, 2, np.nan, 4]}
df = pd.DataFrame(data)
# Define the mapping list and labels
a = [1, 2, 3, 4]
b = ['Critical', 'High', 'Medium', 'Low']
# Map the codes to labels, then fill the missing entries with a default label
df['Severity'] = df['Severity'].map(dict(zip(a, b))).fillna('Medium')
print(df)
Output:
| Severity |
|---|
| Critical |
| High |
| Medium |
| Low |
In this example, we use fillna to replace the missing value (np.nan) with a default label (‘Medium’).
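The same NaN result appears when a code has no entry in the dictionary, so it can be worth counting the gaps before filling them. The sketch below is a small illustration under our own assumptions: the code 5 is a made-up value with no label, and 'Unknown' is a hypothetical placeholder label, not one of the four severity levels from the example.

```python
import numpy as np
import pandas as pd

# 5 is a hypothetical code that has no entry in the mapping
df = pd.DataFrame({'Severity': [1, 2, np.nan, 5]})
labels = {1: 'Critical', 2: 'High', 3: 'Medium', 4: 'Low'}

mapped = df['Severity'].map(labels)

# Both the missing input and the unmapped code come back as NaN
print(mapped.isna().sum())      # 2

# 'Unknown' is an assumed placeholder label
df['Severity'] = mapped.fillna('Unknown')
print(df['Severity'].tolist())  # ['Critical', 'High', 'Unknown', 'Unknown']
```

Using a neutral placeholder keeps filled rows distinguishable from rows that were genuinely labelled with one of the real severity levels.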
Conclusion
Converting an integer column to categorical labels in pandas can be achieved using the map function with a dictionary, built with either zip or enumerate. Assigning the result back to the column replaces the integer codes with readable labels in the existing DataFrame.
Remember to handle missing values properly when working with categorical labels, as they may affect your analysis results.
I hope this article has helped you understand how to perform this conversion using pandas. If you have any further questions or need additional clarification, please don’t hesitate to ask!
Last modified on 2024-09-13