In this article, we’ll walk through a Python-based analysis of the top 100 players in EA Sports FC 25. This tutorial covers everything from data loading and cleaning to performing exploratory data analysis (EDA) and visualizing player statistics based on positions and nationalities.
1. Importing Libraries
First, we need to import the necessary Python libraries for our analysis. Pandas will be used for data manipulation and NumPy for numerical computations, while Seaborn and Matplotlib will help visualize the results.
import pandas as pd # Pandas for data manipulation
import numpy as np # NumPy for numerical computations
import matplotlib.pyplot as plt # Matplotlib for plotting
import seaborn as sns # Seaborn for statistical data visualization
The Dataset
The dataset used in this analysis consists of publicly available data on the top 100 EA Sports FC 25 players, capturing essential attributes that define their in-game performance. This includes overall ratings (OVA) along with specific skill metrics like pace (PAC), shooting (SHO), passing (PAS), dribbling (DRI), defense (DEF), and physicality (PHY). These are the in-game ratings assigned by EA Sports, which rates players across different leagues and countries. The dataset is a useful resource for both casual fans and data analysts, offering insights into how various attributes relate to a player’s overall rating.
The dataset can be downloaded here.
2. Loading the Dataset
Next, we’ll load the dataset, which contains key attributes of EA Sports FC 25 top 100 players. This dataset includes attributes such as player nationality, position, and various skill ratings like shooting, passing, and overall rating.
df = pd.read_csv('EA Sports FC 25.csv', encoding='utf-8') # Load the dataset into a DataFrame
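Since every later step depends on specific column names, it can help to fail early if the file does not match what we expect. The following is a minimal sketch, not part of the original walkthrough: the `load_players` helper and the `REQUIRED_COLUMNS` list are hypothetical names, and the in-memory CSV uses made-up rows in place of the real file.

```python
import io
import pandas as pd

# Columns the rest of the tutorial relies on (assumed from the code in this article)
REQUIRED_COLUMNS = ['Nationality', 'Position', 'OVA', 'PAC', 'SHO', 'PAS', 'DRI', 'DEF', 'PHY']

def load_players(source):
    """Hypothetical helper: read the CSV and raise if an expected column is missing."""
    df = pd.read_csv(source, encoding='utf-8')
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f'Missing expected columns: {missing}')
    return df

# Made-up rows standing in for the real file; these are not actual ratings
sample_csv = io.StringIO(
    'Name,Nationality,Position,OVA,PAC,SHO,PAS,DRI,DEF,PHY\n'
    'Player A,Country X,ST,90,88,91,80,87,40,78\n'
    'Player B,Country Y,CM,89,70,82,90,86,75,77\n'
)
players = load_players(sample_csv)
print(players.shape)
```

With the real data, the same helper would be called as `load_players('EA Sports FC 25.csv')`.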
3. Understanding the Data
Before diving into analysis, it’s crucial to understand the structure of our dataset. We can inspect the data types and look for missing values using the following code:
df.info() # Overview of data structure and types
df.isnull().sum() # Check for missing values
We can also quickly preview the data:
df.head() # Display the first five rows
df.sample(5) # Display a random sample
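The missing-value check above returns raw counts per column. As a rough sketch, we can also express those counts as a percentage of rows, which is easier to interpret at a glance. The small DataFrame below uses made-up values (not real ratings) so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; NaN marks a missing rating
toy = pd.DataFrame({
    'Position': ['ST', 'CM', 'GK', 'CB'],
    'OVA': [90, 89, 88, 87],
    'PAC': [88, 70, np.nan, 75],
    'SHO': [91, 82, np.nan, 50],
})

missing_count = toy.isnull().sum()       # number of missing values per column
missing_pct = toy.isnull().mean() * 100  # share of rows missing, per column
report = pd.DataFrame({'missing': missing_count, 'percent': missing_pct})

# Show only the columns that actually have gaps
print(report[report['missing'] > 0])
```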
4. Exploratory Data Analysis (EDA)
Now, let’s explore the dataset and visualize some interesting patterns. We’ll begin by calculating descriptive statistics for all numerical columns:
df.describe() # Get summary statistics for numerical columns
4.1 Player Positions and Attributes
One common analysis is to group players by their position and examine the average player attributes for each position. Here’s how we can calculate and display the results:
position_stats = df.groupby('Position')[['OVA', 'PAC', 'SHO', 'PAS', 'DRI', 'DEF', 'PHY']].mean().reset_index().sort_values('OVA', ascending=False) # Average attributes per position
position_stats # Display the averages, highest OVA first
This gives us an overview of how different positions compare in terms of key attributes like overall rating (OVA), pace (PAC), shooting (SHO), and so on.
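To see what the grouping does, here is a small self-contained sketch on made-up values (not real ratings). It also adds a player count per position, which is an addition of ours: with only 100 players, an average for a position held by one or two players should be read with caution.

```python
import pandas as pd

# Made-up values for illustration
toy = pd.DataFrame({
    'Position': ['ST', 'ST', 'CM', 'CM', 'CB'],
    'OVA': [91, 89, 90, 88, 87],
    'SHO': [92, 90, 82, 80, 50],
    'PAS': [80, 78, 90, 88, 70],
})

# Same pattern as above: mean per position, sorted by overall rating
position_stats = (
    toy.groupby('Position')[['OVA', 'SHO', 'PAS']]
    .mean()
    .reset_index()
    .sort_values('OVA', ascending=False)
)

# Attach how many players each average is based on
counts = toy['Position'].value_counts()
position_stats['Players'] = position_stats['Position'].map(counts)
print(position_stats)
```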
4.2 Visualizing Top Countries
We can also analyze which countries produce the most top FIFA players. To do this, we group players by nationality and visualize the top 10 countries:
top_countries = df['Nationality'].value_counts().head(10)
sns.barplot(x=top_countries.values, y=top_countries.index, palette='magma', hue=top_countries.index) # hue avoids the palette-without-hue warning in recent Seaborn versions
plt.title('Top 10 Countries by Number of Players')
plt.show()
This bar chart shows the top countries, providing insights into which nations dominate the top 100 player list.
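The counts behind the chart can also be shown as a table with each country's share of the list. This is a minimal sketch using placeholder country names rather than the real data; `value_counts(normalize=True)` returns proportions, which we scale to percentages.

```python
import pandas as pd

# Placeholder nationalities for illustration
nationalities = pd.Series(
    ['Country X', 'Country Y', 'Country X', 'Country Z', 'Country X', 'Country Y'],
    name='Nationality',
)

counts = nationalities.value_counts()                      # players per country
shares = nationalities.value_counts(normalize=True) * 100  # percentage of all players

summary = pd.DataFrame({'players': counts, 'percent': shares.round(1)})
print(summary.head(10))
```

On the real DataFrame, `nationalities` would simply be `df['Nationality']`.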
5. Comparative Analysis: Positions
To analyze specific positions in more detail, let’s focus on Strikers (ST) and Central Midfielders (CM). We can create a scatter plot to compare their shooting (SHO) and passing (PAS) attributes:
df_filtered = df.query('Position in ["ST", "CM"]')
plt.figure(figsize=(8,8))
sns.scatterplot(data=df_filtered, x='SHO', y='PAS', hue='Position', size='OVA')
plt.title('Shooting vs Passing for ST and CM Positions')
plt.xlabel('Shooting (SHO)')
plt.ylabel('Passing (PAS)')
plt.show()
This scatter plot highlights how players in different positions compare, particularly in shooting and passing attributes.
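A scatter plot gives a visual impression, but a short numeric summary can back it up. The sketch below, on made-up values, applies the same `query` filter and then reports the mean, minimum, and maximum of SHO and PAS for each of the two positions.

```python
import pandas as pd

# Made-up values for illustration
toy = pd.DataFrame({
    'Position': ['ST', 'ST', 'CM', 'CM', 'CB'],
    'SHO': [92, 90, 82, 80, 50],
    'PAS': [80, 78, 90, 88, 70],
})

# Keep only strikers and central midfielders, as in the plot above
df_filtered = toy.query('Position in ["ST", "CM"]')

# Summarize shooting and passing per position
comparison = df_filtered.groupby('Position')[['SHO', 'PAS']].agg(['mean', 'min', 'max'])
print(comparison)
```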
6. Conclusion
In this article, we’ve explored the top 100 players in EA Sports FC 25 using Python and data science techniques. From understanding the dataset to analyzing players by position and nationality, we’ve seen how ratings are distributed across positions and countries among the top players.
For more detailed analyses, feel free to modify the code and explore additional aspects of the dataset. You can also enhance the visualizations by diving deeper into specific positions, nationalities, or player attributes.
Full Code
# Import necessary libraries
import pandas as pd # Pandas for data manipulation
import numpy as np # NumPy for numerical computations
import matplotlib.pyplot as plt # Matplotlib for plotting
import seaborn as sns # Seaborn for statistical data visualization
# Read the Dataset
df = pd.read_csv('EA Sports FC 25.csv',encoding='utf-8') # Load the FIFA dataset into a DataFrame
# Data Understanding
df.info()
# Exploring the DataFrame
df.head() # Display the first five rows of the DataFrame
df.tail() # Display the last five rows of the DataFrame
df.sample(5) # Display a random sample of five rows from the DataFrame
# Statistical Summary
df.describe() # Generate descriptive statistics for numerical columns
# Checking for Missing Values
df.isnull().sum() # Calculate the total number of missing values in each column
# Exploratory Data Analysis
# Overall Player Analysis
# Distribution of Overall Ratings (OVA)
sns.countplot(data=df, x='OVA', palette='winter', hue='OVA') # Count players at each overall rating
plt.title('Distribution of Overall Ratings') # Set the title of the plot
plt.show() # Display the plot
# Correlation Matrix
corr_matrix = df[['PAC', 'SHO', 'PAS', 'DRI', 'DEF', 'PHY']].corr() # Compute correlation matrix for selected attributes
sns.heatmap(corr_matrix, annot=True) # Plot heatmap of the correlation matrix with annotations
plt.title('Attribute Correlation Matrix') # Set the title of the plot
plt.show() # Display the plot
# Top Players Analysis
# Grouping by Position
position_stats = df.groupby('Position')[['OVA','PAC', 'SHO', 'PAS', 'DRI', 'DEF', 'PHY']].mean().reset_index().sort_values('OVA',ascending=False) # Calculate average attributes by position
# Top countries
top_countries = df['Nationality'].value_counts().head(10) # Get the top 10 countries by number of players
sns.barplot(x=top_countries.values, y=top_countries.index,palette='magma',hue=top_countries.index) # Create a horizontal bar plot of the top countries
plt.title('Top 10 Countries by Number of Players') # Set the title of the plot
plt.show() # Display the plot
# Questions about the Data
# Filter the DataFrame for the positions you're interested in
df_filtered = df.query('Position in ["ST", "CM"]')
plt.figure(figsize=(8,8))
# Create a scatter plot with seaborn
sns.scatterplot(data=df_filtered, x='SHO', y='PAS', hue='Position',size='OVA',legend='auto')
# Add title and labels
plt.title('Shooting vs Passing for ST and CM Positions')
plt.xlabel('Shooting (SHO)')
plt.ylabel('Passing (PAS)')
# Display the plot
plt.show()