Introduction to Pandas Scoring System
Understanding Ties in Rankings
In competitive events such as track and field, athletes are ranked by performance. When several athletes finish with the same score, they share a range of positions, and the ranking should show that. The pandas library in Python offers the tools to handle this case.
This article walks through a decathlon scoring example in pandas and shows how to display ties as “3-4” or in another custom format.
Background on Scoring Systems
Scoring in combined track and field events such as the decathlon is based on the measured performance in each event:
- Times in the running events
- Distances in the long jump and the throwing events
- Heights in the high jump and pole vault
Each performance is converted to points, and the points are summed. Our example computes points for nine of the ten decathlon events; the 1500 m time is parsed but not scored.
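As a rough illustration of how a performance becomes points, the sketch below applies the two formula shapes used in this article's scoring code: running events score the margin under a reference time, and throws or jumps score the margin over a reference mark. The helper names `track_points` and `field_points` are hypothetical, and the sample performances are made up; the coefficients are the ones used for the 100 m and the shot put in the scoring code.

```python
def track_points(seconds: float, a: float, b: float, c: float) -> int:
    """Points for a running event: a * (b - seconds) ** c, zero if slower than b."""
    margin = b - seconds
    return round(a * margin ** c) if margin > 0 else 0


def field_points(mark: float, a: float, b: float, c: float) -> int:
    """Points for a throw or jump: a * (mark - b) ** c, zero if below b."""
    margin = mark - b
    return round(a * margin ** c) if margin > 0 else 0


# 100 m coefficients (a=25.4347, b=18, c=1.81) and shot put coefficients
# (a=51.39, b=1.5, c=1.05); the performances themselves are made up.
sprint = track_points(10.395, 25.4347, 18, 1.81)
shot = field_points(15.0, 51.39, 1.5, 1.05)
print(sprint, shot)
```

Guarding against a non-positive margin avoids raising a negative number to a fractional power, which would not give a real-valued score.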
Reading the Data
We start by reading the data from a CSV file using pandas:
import pandas as pd
# Read the data from the CSV file (semicolon-separated, no header row)
df = pd.read_csv("Decathlon.csv", sep=";", header=None)
# Rename columns for better readability
df.columns = ["Name", "100 m", "Long jump", "Shot put", "High jump", "400 m",
              "110 m hurdles", "Discus throw", "Pole vault", "Javelin throw",
              "1500 m"]
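The reading step can be tried without the real file. The sketch below is a minimal example that assumes the layout described above (no header row, semicolon separators, name first, then ten results); the two rows are made up, and `names=` is used as an alternative to assigning `df.columns` afterwards.

```python
import io

import pandas as pd

# Two made-up rows in the assumed layout of Decathlon.csv.
sample = io.StringIO(
    "Athlete A;10.80;7.50;14.20;2.00;48.50;14.50;44.00;4.80;60.00;4.30.10\n"
    "Athlete B;11.10;7.20;15.00;1.95;49.90;15.10;46.50;4.60;58.00;4.40.25\n"
)

columns = ["Name", "100 m", "Long jump", "Shot put", "High jump", "400 m",
           "110 m hurdles", "Discus throw", "Pole vault", "Javelin throw",
           "1500 m"]

# names= labels the columns at read time, so no separate rename is needed.
df = pd.read_csv(sample, sep=";", header=None, names=columns)
print(df.dtypes)
```

The 1500 m column stays text because a value such as `4.30.10` is not a valid number, which is why it needs separate parsing later.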
Calculating Scores
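Next, we convert each performance into points. Running events score the margin under a reference time; throws and jumps score the margin relative to a reference mark:
# Calculate scores for each event
# Note: the official tables score jumps as A*(P - B)**C with P in centimetres.
# The negated margin (B - P) used for the jumps here only yields a real number
# while the recorded value is below B, for example for marks stored in metres.
df['100m score'] = round(25.4347*((18-df["100 m"])**1.81))
df["Long jump score"] = round(0.14354*(((df["Long jump"]-220)*-1)**1.4))
df["shot put score"] = round(51.39*((df["Shot put"]-1.5)**1.05))
df["high jump score"] = round(0.8465*(((df["High jump"]-75)*-1)**1.42))
df["400m score"] = round(1.53775*((82-df["400 m"])**1.81))
df['110m hurdles score'] = round(5.74352*((28.5-df['110 m hurdles'])**1.92))
df['Discus throw score'] = round(12.91*((df['Discus throw']-4)**1.1))
# 1.35 is an exponent, like the other events (** rather than *)
df['Pole vault score'] = round(0.2797*(((df['Pole vault']-100)*-1)**1.35))
df['Javelin throw score'] = round(10.14*(((df['Javelin throw']-7)**1.08)))
# Parse the 1500 m time (minutes.seconds.fraction); it is not scored below
df['1500 m'] = pd.to_datetime(df['1500 m'].str.strip(), format='%M.%S.%f')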
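The code above parses the 1500 m time but never turns it into points. The sketch below shows one way that step might look: it converts the parsed timestamp to seconds and applies the same track formula shape. The coefficients 0.03768, 480 and 1.85 are an assumption taken from the standard decathlon tables, not from the original code, and the two times are made up.

```python
import pandas as pd

# Made-up 1500 m results written as minutes.seconds.hundredths.
raw = pd.Series(["4.21.98", " 4.35.00 "])

# Parse as the article does, then measure the offset from midnight in seconds.
parsed = pd.to_datetime(raw.str.strip(), format="%M.%S.%f")
seconds = (parsed - parsed.dt.normalize()).dt.total_seconds()

# Assumed 1500 m coefficients (not part of the original code).
# clip() keeps the margin non-negative so the power stays real-valued.
margin = (480 - seconds).clip(lower=0)
points = (0.03768 * margin ** 1.85).round()
print(seconds.tolist(), points.tolist())
```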
Merging Scores and Calculating Total Score
Now, we add up the event scores to get each athlete's total:
# Sum the event scores to calculate the total score
df["Total Score"] = (
    df['100m score'] + df["Long jump score"] + df["shot put score"]
    + df["high jump score"] + df["400m score"] + df['110m hurdles score']
    + df['Discus throw score'] + df['Pole vault score']
    + df['Javelin throw score']
)
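Listing every column by hand is easy to get wrong as events are added. As a possible alternative, the sketch below selects the score columns by name and sums them row by row. It assumes every event score column ends in lowercase "score", as in this article; the toy frame and its values are made up.

```python
import pandas as pd

scores = pd.DataFrame({
    "Name": ["Athlete A", "Athlete B"],
    "100m score": [900.0, 850.0],
    "Long jump score": [880.0, 910.0],
    "shot put score": [760.0, 800.0],
})

# Select every column whose name ends in "score" (case-sensitive, so a
# "Total Score" column would not be picked up) and add them row by row.
# skipna=False keeps a missing event score visible as NaN in the total.
event_scores = scores.filter(regex="score$")
scores["Total Score"] = event_scores.sum(axis=1, skipna=False)
print(scores[["Name", "Total Score"]])
```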
Handling Ties in Rankings
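To display ties, we label each athlete with the range of positions that their total score occupies:
# Build rank labels; tied scores share a range such as '3-4'
def display_ties(scores):
    # Best and worst position occupied by each score (highest score ranks first)
    low = scores.rank(method="min", ascending=False).astype(int)
    high = scores.rank(method="max", ascending=False).astype(int)
    # Keep the single position where there is no tie, otherwise join the range
    return low.astype(str).where(low == high, low.astype(str) + '-' + high.astype(str))
df['Display Score'] = display_ties(df['Total Score'])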
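A small worked example makes the tie format concrete. The sketch below uses `Series.rank` with `method="min"` and `method="max"` to find the first and last position each score occupies; the totals are made up so that the third and fourth athletes tie.

```python
import pandas as pd

totals = pd.Series([8000, 7900, 7800, 7800, 7700], name="Total Score")

# method="min" gives tied scores the best position in their group,
# method="max" the worst; together they bound the shared range.
low = totals.rank(method="min", ascending=False).astype(int)
high = totals.rank(method="max", ascending=False).astype(int)

# Where both bounds agree there is no tie, so the single position is kept.
labels = low.astype(str).where(low == high, low.astype(str) + "-" + high.astype(str))
print(labels.tolist())  # ['1', '2', '3-4', '3-4', '5']
```

Changing the separator in the last expression gives other formats, such as "3/4" or "T3".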
Sorting and Ranking
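Finally, we sort the data by total score in descending order and assign positional rankings. Tied athletes receive consecutive positions in this column; the range they share is what the display column reports: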
# Sort data by total score in descending order and assign rankings
df.sort_values(by='Total Score', ascending=False, inplace=True)
df.reset_index(drop=True, inplace=True)
df["Ranking"] = df.index + 1
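Once the positional `Ranking` column exists, tie ranges can also be derived from it by grouping on the total score. The sketch below is one way to do that, using made-up data; the column name `Display Rank` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Total Score": [7800, 8000, 7800, 7500],
})

# Sort and number the rows as in the article.
df = df.sort_values(by="Total Score", ascending=False).reset_index(drop=True)
df["Ranking"] = df.index + 1

# Within each group of equal totals, take the first and last position.
grouped = df.groupby("Total Score")["Ranking"]
first = grouped.transform("min")
last = grouped.transform("max")
df["Display Rank"] = first.astype(str).where(
    first == last, first.astype(str) + "-" + last.astype(str)
)
print(df)
```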
Writing the Data to a JSON File
We can write the sorted data to a JSON file for further analysis:
# Write sorted data to a JSON file
df.to_json('Json file')
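The layout of the JSON output depends on the `orient` argument. The sketch below, with made-up rows, shows `orient="records"`, which may be easier to consume downstream because it writes one object per athlete; calling `to_json` without a path returns the text instead of writing a file.

```python
import json

import pandas as pd

ranked = pd.DataFrame({
    "Name": ["Athlete A", "Athlete B"],
    "Total Score": [8000, 7900],
    "Ranking": [1, 2],
})

# With no path argument, to_json returns the JSON text instead of writing a file.
# orient="records" emits one object per row rather than one object per column.
text = ranked.to_json(orient="records")
records = json.loads(text)
print(records[0])
```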
With these steps, the script scores each event, totals the points, and labels tied athletes with a shared range such as “3-4”.
Code Example
Here is the complete code example for reference:
import pandas as pd
# Read the data from the CSV file (semicolon-separated, no header row)
df = pd.read_csv("Decathlon.csv", sep=";", header=None)
# Rename columns for better readability
df.columns = ["Name", "100 m", "Long jump", "Shot put", "High jump", "400 m",
              "110 m hurdles", "Discus throw", "Pole vault", "Javelin throw",
              "1500 m"]
# Calculate scores for each event
# Note: the official tables score jumps as A*(P - B)**C with P in centimetres.
# The negated margin (B - P) used for the jumps here only yields a real number
# while the recorded value is below B, for example for marks stored in metres.
df['100m score'] = round(25.4347*((18-df["100 m"])**1.81))
df["Long jump score"] = round(0.14354*(((df["Long jump"]-220)*-1)**1.4))
df["shot put score"] = round(51.39*((df["Shot put"]-1.5)**1.05))
df["high jump score"] = round(0.8465*(((df["High jump"]-75)*-1)**1.42))
df["400m score"] = round(1.53775*((82-df["400 m"])**1.81))
df['110m hurdles score'] = round(5.74352*((28.5-df['110 m hurdles'])**1.92))
df['Discus throw score'] = round(12.91*((df['Discus throw']-4)**1.1))
# 1.35 is an exponent, like the other events (** rather than *)
df['Pole vault score'] = round(0.2797*(((df['Pole vault']-100)*-1)**1.35))
df['Javelin throw score'] = round(10.14*(((df['Javelin throw']-7)**1.08)))
# Parse the 1500 m time (minutes.seconds.fraction); it is not scored below
df['1500 m'] = pd.to_datetime(df['1500 m'].str.strip(), format='%M.%S.%f')
# Sum the event scores to calculate the total score
df["Total Score"] = (
    df['100m score'] + df["Long jump score"] + df["shot put score"]
    + df["high jump score"] + df["400m score"] + df['110m hurdles score']
    + df['Discus throw score'] + df['Pole vault score']
    + df['Javelin throw score']
)
# Build rank labels; tied scores share a range such as '3-4'
def display_ties(scores):
    # Best and worst position occupied by each score (highest score ranks first)
    low = scores.rank(method="min", ascending=False).astype(int)
    high = scores.rank(method="max", ascending=False).astype(int)
    # Keep the single position where there is no tie, otherwise join the range
    return low.astype(str).where(low == high, low.astype(str) + '-' + high.astype(str))
df['Display Score'] = display_ties(df['Total Score'])
# Sort data by total score in descending order and assign positional rankings
df.sort_values(by='Total Score', ascending=False, inplace=True)
df.reset_index(drop=True, inplace=True)
df["Ranking"] = df.index + 1
# Write sorted data to a JSON file
df.to_json('Json file')
To use a different tie format, change the string that display_ties returns.
Conclusion
In conclusion, pandas covers each step of a sports scoring pipeline: reading results, converting them to points, totalling them, and ranking athletes. With a small custom function, tied athletes can be shown with a shared range such as “3-4”.
The same pattern applies to other competitions, from track and field to football league tables, wherever equal scores must share a position.
Last modified on 2024-06-30