Extracting City and State Information from a CSV Column using Python with pandas Library

Extracting City and State from a Column in CSV using Python

In this article, we will explore how to extract city and state information from a column in a CSV file using Python. We will use the pandas library, which is a powerful tool for data manipulation and analysis.

Introduction

CSV (Comma Separated Values) files are a common format for storing tabular data. However, when working with this type of data, it can be challenging to extract specific information, such as city and state names, from a single column. In this article, we will walk through the process of using Python to extract city and state information from a CSV file.

Prerequisites

To follow along with this tutorial, you will need:

  • Python installed on your system
  • The pandas library installed (you can install it using pip: pip install pandas)

Reading the CSV File

The first step in working with our CSV data is to read it into a DataFrame using the pandas library. Here’s an example of how we can do this:

import pandas as pd

# Create a sample DataFrame from the CSV file
all_data = pd.DataFrame({'Purchase Address': ['917 1st St, Dallas, TX 75001',
                                               '682 Chestnut St, Boston, MA 02215',
                                               '669 Spruce St, Los Angeles, CA 90001',
                                               '669 Spruce St, Los Angeles, CA 90001']})

# Print the DataFrame to verify its contents
print(all_data)

This code creates a sample DataFrame from our CSV file and prints it to the console. You should see the following output:

                 Purchase Address
0     917 1st St, Dallas, TX 75001
1   682 Chestnut St, Boston, MA 02215
2    669 Spruce St, Los Angeles, CA 90001
3    669 Spruce St, Los Angeles, CA 90001

Extracting City and State Information

Now that we have our DataFrame in hand, let’s move on to extracting city and state information from the Purchase Address column. We’ll use a function called get_city_state() to do this.

def get_city_state(address):
    asplit = address.split(",")
    ssplit = address.split(" ")
    city = asplit[1].split()[-1]
    state = asplit[2].split()[0]
    return city, state

This function takes an address string as input and splits it into two parts: the city name (which is in the second position) and the state name (which is in the third position). It then returns these values as a tuple.

Applying the Function to the DataFrame

To extract city and state information from our DataFrame, we can apply the get_city_state() function to each value in the Purchase Address column. Here’s how we can do this:

all_data['City'] = all_data['Purchase Address'].apply(get_city_state).to_list()

This code applies the get_city_state() function to each value in the Purchase Address column and stores the results in a new column called City.

Similarly, we can extract state information by applying the get_city_state() function to the Purchase Address column:

all_data['States'] = all_data['Purchase Address'].apply(get_city_state).to_list()

This code extracts the state information from each address and stores it in a new column called States.

Extracting Both City and State Information

Alternatively, we can use the get_city_state() function to extract both city and state information from each address. Here’s how we can do this:

all_data[['City', 'State']] = all_data['Purchase Address'].apply(get_city_state).to_list()

This code applies the get_city_state() function to each value in the Purchase Address column and stores the results in two new columns called City and State.

Output

Once we have applied the get_city_state() function to our DataFrame, we can see the extracted city and state information in our new columns:

                 Purchase Address      City        State
0     917 1st St, Dallas, TX 75001   Dallas        TX
1   682 Chestnut St, Boston, MA 02215    Boston        MA
2    669 Spruce St, Los Angeles, CA 90001   Los Angeles    CA
3    669 Spruce St, Los Angeles, CA 90001   Los Angeles    CA

As you can see, the City and State columns contain the extracted information from each address.

Conclusion

In this article, we explored how to extract city and state information from a CSV file using Python. We used the pandas library to read the data into a DataFrame and applied a custom function called get_city_state() to extract the desired information. By applying this function to our DataFrame, we were able to create new columns containing the extracted city and state information.

This technique can be useful for working with CSV files in Python, especially when you need to extract specific information from each row.


Last modified on 2025-03-15