Calculating Z-scores in Python is straightforward: you can use basic mathematical operations or a library like NumPy, which is more efficient for large datasets.
This tutorial will guide you through calculating Z-scores in Python both manually and using NumPy.
Manually Calculating Z-Scores in Python
To calculate Z-scores manually, you’ll need to compute the mean and standard deviation of your dataset, and then apply the Z-score formula for each element in your dataset.
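For reference, the formula can be written out as follows. This is a sketch of the population form, which divides by N and matches the manual calculation in this tutorial; the symbols x_i, N, \mu, and \sigma are notation chosen here, not taken from the original text.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2},
\qquad
z_i = \frac{x_i - \mu}{\sigma}
```

A Z-score of 0 means the value equals the mean; positive values lie above the mean and negative values below it.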
Calculate Mean and Standard Deviation:
import math
data = [101, 208, 230, 240, 259]
mean_data = sum(data) / len(data)
# Population standard deviation: divide by N, not N - 1
std_dev = math.sqrt(sum((x - mean_data) ** 2 for x in data) / len(data))
Calculate Z-Scores for Each Element:
z_scores = [(x - mean_data) / std_dev for x in data]
print(z_scores)
The result will be approximately the following (values rounded to three decimal places here):
[-1.911, 0.007, 0.402, 0.581, 0.921]
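The manual steps can also be wrapped in a reusable function. The following is a minimal sketch using the standard library's `statistics` module; the function name `z_scores` and its `sample` flag are hypothetical names chosen for illustration, not part of the original tutorial.

```python
import statistics


def z_scores(values, sample=False):
    """Return the Z-score of each value in `values`.

    sample=False uses the population standard deviation (divide by N),
    which matches the manual calculation; sample=True divides by N - 1.
    """
    mean = statistics.mean(values)
    if sample:
        std_dev = statistics.stdev(values)
    else:
        std_dev = statistics.pstdev(values)
    if std_dev == 0:
        # Every value is identical, so the Z-score is undefined.
        raise ValueError("standard deviation is zero; Z-scores are undefined")
    return [(x - mean) / std_dev for x in values]


data = [101, 208, 230, 240, 259]
population_z = z_scores(data)
sample_z = z_scores(data, sample=True)
print([round(z, 3) for z in population_z])
```

The sample version yields Z-scores slightly smaller in magnitude, because dividing by N - 1 gives a larger standard deviation.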
Calculating Z-Scores Using NumPy
NumPy is a powerful library for numerical computations in Python. The manual method is straightforward and educational, but NumPy's vectorized operations are more efficient, especially for large datasets.
First, make sure NumPy is installed in your environment. If it is not, install it from your shell using pip:
pip install numpy
Then, start by importing NumPy in your Python script:
import numpy as np
Calculate Z-Scores for One-Dimensional Arrays
First declare your dataset. Then calculate the mean and standard deviation using NumPy.
data_np = np.array([75, 125, 150, 175, 200])
mean_np = np.mean(data_np)
std_dev_np = np.std(data_np)
To calculate and print the Z-scores:
z_scores_np = (data_np - mean_np) / std_dev_np
print(z_scores_np)
The result will be approximately the following (values rounded to four decimal places here):
[-1.6275 -0.4650 0.1162 0.6975 1.2787]
Each Z-score tells you how many standard deviations an individual value lies from the mean. For example:
- Our first value of “75” has a Z-score of about -1.6275, so it is roughly 1.63 standard deviations below the mean.
- Our third value of “150” has a Z-score of about 0.1162, so it is only slightly above the mean (a Z-score of 0 would mean the value equals the mean).
- The last value of “200” has a Z-score of about 1.2787, so it is roughly 1.28 standard deviations above the mean.
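One detail worth knowing is that `np.std` divides by N by default (`ddof=0`), which gives the population standard deviation. The sketch below, reusing the same example array, compares that with the sample version (`ddof=1`) and checks that the population Z-scores have mean 0 and standard deviation 1. The variable names `z_population` and `z_sample` are chosen here for illustration.

```python
import numpy as np

data_np = np.array([75, 125, 150, 175, 200])

# Population standard deviation (NumPy's default, ddof=0): divide by N.
z_population = (data_np - data_np.mean()) / data_np.std()

# Sample standard deviation (ddof=1): divide by N - 1.
z_sample = (data_np - data_np.mean()) / data_np.std(ddof=1)

# Sanity check: population Z-scores have mean 0 and standard deviation 1.
print(np.isclose(z_population.mean(), 0.0), np.isclose(z_population.std(), 1.0))
```

Which version is appropriate depends on whether the data is the whole population or a sample drawn from it.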
Calculate Z-Scores for Multi-Dimensional Arrays
NumPy also handles Z-scores for multi-dimensional arrays, such as matrices, efficiently. You usually want Z-scores along a specific axis, most often column-wise for datasets. That means calculating the mean and standard deviation of each column and then applying the Z-score formula to each element in that column.
First, let’s create a multi-dimensional array using NumPy:
import numpy as np
data_multi = np.array([[10, 15, 20, 25], [40, 45, 50, 55], [57, 58, 59, 60]])
To calculate Z-scores for each column, you specify the axis along which the calculations are performed. In NumPy, axis=0 aggregates down the rows and returns one value per column, while axis=1 aggregates across the columns and returns one value per row.
mean_col = np.mean(data_multi, axis=0)
std_dev_col = np.std(data_multi, axis=0)
z_scores_multi = (data_multi - mean_col) / std_dev_col
print(z_scores_multi)
The result is a 3×4 array with the same shape as the input, in which each column has a mean of 0 and a standard deviation of 1.
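The same approach works row-wise, with one extra step. The sketch below assumes you want each row standardized independently. Passing `keepdims=True` keeps the per-row statistics as a (3, 1) array so they broadcast against the (3, 4) data; without it, the (3,) result would not broadcast and NumPy would raise an error. The names `mean_row`, `std_dev_row`, and `z_scores_rows` are chosen here for illustration.

```python
import numpy as np

data_multi = np.array([[10, 15, 20, 25],
                       [40, 45, 50, 55],
                       [57, 58, 59, 60]])

# axis=1 computes one statistic per row; keepdims=True keeps shape (3, 1).
mean_row = np.mean(data_multi, axis=1, keepdims=True)
std_dev_row = np.std(data_multi, axis=1, keepdims=True)

# Broadcasting subtracts and divides each row by its own mean and std.
z_scores_rows = (data_multi - mean_row) / std_dev_row
print(z_scores_rows.shape)
```

Column-wise calculations with axis=0 do not need `keepdims`, because a (4,) result already lines up with the last dimension of the data.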