Discover Top Posts Tagged with #z-score

The z-score in statistics

Okay, time to get back to statistics, if only for today! P-value, z-score, f-statistic, there are a lot of ways to get information about the sample of data you have. Of course, they all tell you something slightly different about the data and that information is useful when you know what the heck it is even trying to tell you. For that reason we’re diving into the z-score, it’s actually one of…

#academia #Education #learning #Math #mathematics #PhD #school #science #statistics #stats #student #z-score #z score #data

New OC: his name is 'Conception Z-score', the middle brother of the group.

#my art #my oc #z-score

Feature Scaling & Normalization

Introduction

Feature scaling and normalization are essential steps in machine learning because many algorithms rely on distance calculations or gradient-based optimisation, both of which are sensitive to feature magnitude. When features are on vastly different scales, such as age (0–100) and income (0–100,000), the model may unintentionally give more weight to the larger-scaled variable. Scaling puts features on a comparable footing, speeds up optimisation, and prevents distorted model behaviour.

Some algorithms are highly sensitive to feature magnitude, such as SVM, KNN, and neural networks, while others, such as tree-based models, are largely unaffected. Understanding when and how to scale is a key skill in feature engineering.

Why Scaling Matters

Prevents one feature from dominating others

Improves gradient descent convergence

Ensures fair distance calculations in KNN, K-Means, SVM

Helps stabilise neural network training

Reduces numerical instability

Makes model behaviour more interpretable and reliable
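To make the first and third points concrete, here is a minimal sketch using only the Python standard library. The three customers and their (age, income) values are made up for illustration, and `min_max` is a hypothetical helper, not part of any library.

```python
import math

# Hypothetical customers: (age in years, income in dollars)
a = (25, 50_000)
b = (60, 51_000)   # very different age, similar income
c = (26, 58_000)   # similar age, different income

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def min_max(points):
    """Rescale each column of `points` to the range [0, 1]."""
    cols = list(zip(*points))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]

# On raw values, income differences swamp age differences.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# After rescaling, both features contribute on a comparable footing.
sa, sb, sc = min_max([a, b, c])
scaled_ab = euclidean(sa, sb)
scaled_ac = euclidean(sa, sc)

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On the raw values, the 35-year age gap between a and b barely registers next to an income gap of a few thousand dollars. After rescaling, the two distances come out nearly equal, because each feature now spans the same range.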

Common Feature Scaling & Normalization Methods

1. Standardization (Z-Score Scaling)

Standardization transforms data so that each feature has a mean of 0 and a standard deviation of 1.

Formula: new_value = (value – mean) / standard_deviation

Use when:

Your data is roughly bell-shaped (helpful, though a normal distribution is not strictly required)

You're using linear models, logistic regression, SVM, KNN, PCA, or neural networks

Why it’s useful: It centers the distribution and helps algorithms converge faster.
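The formula above can be sketched in a few lines of standard-library Python. The `standardize` helper and the sample ages are illustrative assumptions. The sketch uses the population standard deviation; a sample standard deviation would give slightly different values.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / standard_deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50, 60]  # hypothetical feature values
z = standardize(ages)

# The transformed feature has mean 0 and standard deviation 1.
print([round(v, 3) for v in z])
print(round(statistics.fmean(z), 6), round(statistics.pstdev(z), 6))
```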

2. Min-Max Normalization

Rescales data into a fixed range, often 0 to 1.

Formula: new_value = (value – min) / (max – min)

Use when:

You need values within a bounded range such as 0 to 1 (the minimum maps to exactly 0 and the maximum to exactly 1)

You use distance-based algorithms (KNN, K-Means)

You train neural network models (especially those using sigmoid or tanh activation)

Important note: Sensitive to outliers—extreme values can compress everything else.
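A short sketch of the min-max formula, which also illustrates the outlier note. The `min_max_scale` helper and the income figures are hypothetical.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] using (value - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]

# Four ordinary incomes plus one extreme value (made-up data).
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]
scaled = min_max_scale(incomes)

# The outlier maps to 1.0, and every other value is squeezed near 0.
print([round(v, 4) for v in scaled])
```

Here the four ordinary incomes all land below 0.02, so most of the 0 to 1 range is spent on a single extreme value.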

3. Robust Scaling

Reduces the effect of outliers by scaling based on the median and IQR (interquartile range).

Formula: new_value = (value – median) / IQR

Use when:

Your dataset contains extreme outliers

You want stable scaling without letting outliers dominate
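As a rough sketch of the robust scaling formula, the standard-library version below uses `statistics.quantiles` with the "inclusive" method to get the quartiles. The `robust_scale` helper and the income values are assumptions for illustration, and other tools may interpolate quartiles slightly differently.

```python
import statistics

def robust_scale(values):
    """Return (value - median) / IQR for each value."""
    median = statistics.median(values)
    # Quartiles via the "inclusive" method; other conventions exist.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot robust-scale this feature")
    return [(v - median) / iqr for v in values]

# Four ordinary incomes plus one extreme value (made-up data).
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]
robust = robust_scale(incomes)

print(robust)
```

The median and IQR are set by the ordinary values, so those keep a usable spread (from -1 to 0.5 here) while the outlier simply ends up far away instead of compressing everything else.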

4. Log Transform

Applies a logarithmic transformation to reduce skewness.

Use when:

The feature is right-skewed (e.g., income, transaction amounts)

You want to compress large ranges

You need a more normal-like distribution

Note: The logarithm is defined only for positive values. For data that includes zeros, log(1 + x) is a common workaround.
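A minimal sketch of a log transform, assuming the log(1 + x) variant so that zeros are allowed. The `log_transform` helper and the transaction amounts are hypothetical.

```python
import math

def log_transform(values):
    """Apply log(1 + x); every value must be >= 0."""
    if any(v < 0 for v in values):
        raise ValueError("this transform expects non-negative values")
    return [math.log1p(v) for v in values]

# Right-skewed amounts spanning four orders of magnitude (made-up data).
amounts = [0, 9, 99, 999, 9_999]
logged = log_transform(amounts)

# Each tenfold jump in (1 + x) becomes a constant step of about 2.3.
print([round(v, 3) for v in logged])
```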

Which Algorithms Need Scaling?

Algorithms that require scaling:

Support Vector Machines (SVM)

K-Nearest Neighbours (KNN)

K-Means clustering

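Logistic Regression (especially when regularized or trained with gradient descent)

Linear Regression (when regularized or trained with gradient descent; plain least-squares predictions do not change with scaling)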

PCA (Principal Component Analysis)

Neural Networks (deep learning models)

These are sensitive because they rely on distance calculations, gradient descent, penalties on coefficient size, or (in the case of PCA) feature variance.

Algorithms that do not need scaling:

Decision Trees

Random Forest

XGBoost, LightGBM, CatBoost

Naive Bayes

Rules-based algorithms

Tree models split on thresholds, so rescaling a feature without changing the order of its values leaves the resulting splits, and therefore the predictions, essentially unchanged.
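The threshold argument can be illustrated with a toy one-level decision stump in plain Python. This is only a sketch under simplifying assumptions: `best_stump_partition` is a hypothetical helper that picks the threshold with the fewest misclassifications, which is not how any particular tree library chooses splits.

```python
def side_errors(indices, labels):
    """Errors on one side if it predicts its majority label (0 or 1)."""
    ones = sum(labels[i] for i in indices)
    return min(ones, len(indices) - ones)

def best_stump_partition(xs, labels):
    """Return the indices sent left (x <= threshold) by the best split."""
    distinct = sorted(set(xs))
    best_errors, best_left = None, None
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t]
        errors = side_errors(left, labels) + side_errors(right, labels)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, left
    return best_left

incomes = [20_000, 35_000, 50_000, 80_000, 120_000]   # made-up data
labels = [0, 0, 0, 1, 1]
scaled = [(x - 20_000) / 100_000 for x in incomes]    # min-max to [0, 1]

raw_left = best_stump_partition(incomes, labels)
scaled_left = best_stump_partition(scaled, labels)
print(raw_left, scaled_left)
```

The threshold value itself changes after scaling, but the same rows fall on each side of it, so the split groups the data identically.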

Common Mistakes to Avoid

Scaling before splitting into train/test (causes data leakage)

Scaling categorical data accidentally

Using Min-Max with heavy outliers

Applying log transform to zero or negative values

Scaling the target variable when the regression task does not specifically require it

Best Practices

Always fit the scaler only on training data

Use the same scaler to transform the test set

Use pipelines to automate scaling with model training

Combine scaling with imputation and encoding in a proper workflow
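The first two practices can be sketched with a tiny fit/transform object. `SimpleStandardScaler` is a hypothetical class written for this illustration, and the train and test values are made up. Libraries such as scikit-learn provide scaler and pipeline classes that package the same pattern.

```python
import statistics

class SimpleStandardScaler:
    """Hypothetical minimal scaler: learn mean/std in fit, reuse in transform."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values)
        if self.std_ == 0:
            raise ValueError("cannot scale a constant feature")
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

train = [10.0, 20.0, 30.0, 40.0, 50.0]
test = [60.0, 1000.0]

scaler = SimpleStandardScaler().fit(train)   # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)         # same mean and std, no refit

print(scaler.mean_, round(scaler.std_, 3))
print([round(v, 3) for v in test_scaled])
```

Because the test values never influence `mean_` or `std_`, nothing about the test set leaks into preprocessing. Fitting on the combined data would have let the extreme test value of 1000 shift both statistics.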

Closing Summary

Feature scaling is an essential preprocessing step that directly influences model accuracy, stability, and training efficiency. While not all algorithms require scaling, understanding which methods to apply—and when—is critical for producing robust machine-learning models. This episode equips you with the foundational techniques to scale features correctly and avoid common pitfalls, setting the stage for deeper feature engineering strategies in the coming episodes.

#feature-scaling #normalization #z-score #min-max #robust-scaling #ml-preprocessing #data-cleaning #machine-learning #model-training #data-engineering

Z-score = (score - mean) / standard deviation
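As a worked instance of this formula, the sketch below compares two scores from differently scaled tests. All scores, means, and standard deviations are made up, and the probabilities assume the scores are normally distributed.

```python
from statistics import NormalDist

def z_score(score, mean, std):
    """Z-score = (score - mean) / standard deviation."""
    return (score - mean) / std

# Hypothetical results from two tests with different scales.
z_math = z_score(85, mean=70, std=10)
z_verbal = z_score(620, mean=500, std=100)

# Share of a standard normal distribution at or below each z-score.
p_math = NormalDist().cdf(z_math)
p_verbal = NormalDist().cdf(z_verbal)

print(f"math:   z={z_math:.2f}  percentile={p_math:.3f}")
print(f"verbal: z={z_verbal:.2f}  percentile={p_verbal:.3f}")
```

Although 620 is the larger raw number, the math score sits further above its own mean in standard-deviation units, so it is the stronger result relative to its group.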

#research methodology #tests and measurements #z-score

Calculating Variance, Standard Deviation & Z-Score

The basis for calculating variance and standard deviation is the desire to understand the variation within a group of data.

https://informatikalogi.com/menghitung-varian-standart-deviasi-z-score/

#deviasi #standar #varian #z-score

The Z-score

The Z-score, also referred to as a standardized score, is a useful statistic because it not only permits computing the probability (chance or likelihood) of a raw score occurring within a normal distribution but also helps to compare two raw scores from different normal distributions. The Z-score is a dimensionless measure since it is derived by subtracting the population mean from…

#standardization #Standardized score #Z-score

Altman Z-Score Plus is a smartphone application by Business Compass LLC, USA, that provides timely assessments of the credit risk and probability of default of companies based on corporate credit analysis.

#altmanzscore #zscore #z-score #edaltman #sribatsadas #businesscompass #sribatsa #defaultrisk #creditrisk #bondrating #creditrating #bankruptcies #ain #albany
