What did the z distribution say to the t distribution?
"You may look like me, but you're not normal."
The z-score in statistics
Okay, time to get back to statistics, if only for today! P-value, z-score, F-statistic: there are a lot of ways to get information about the sample of data you have. Of course, each one tells you something slightly different about the data, and that information is only useful when you know what the heck it is trying to tell you. For that reason we’re diving into the z-score; it’s actually one of…
New OC: his name is 'Conception Z-score', the middle brother of the group.
Feature Scaling & Normalization
Introduction
Feature scaling and normalization are essential steps in machine learning because many algorithms rely on distance calculations or gradient-based optimisation, both of which are sensitive to the numeric range of the inputs. When features are on vastly different scales, such as age (0–100) and income (0–100,000), the model may unintentionally give more importance to the larger-scaled variable. Scaling puts features on comparable ranges so that none dominates purely because of its units, speeds up optimisation, and prevents distorted model behaviour.
Some algorithms are highly sensitive to feature magnitude (SVM, KNN, and neural networks, for example), while others, such as tree-based models, are largely unaffected. Understanding when and how to scale is a key skill in feature engineering.
Why Scaling Matters
Prevents one feature from dominating others
Improves gradient descent convergence
Ensures fair distance calculations in KNN, K-Means, SVM
Helps stabilise neural network training
Reduces numerical instability
Makes model behaviour more interpretable and reliable
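To make the "one feature dominates" point concrete, here is a minimal sketch using three made-up customers described by (age, income). The numbers and the assumed feature ranges are purely illustrative, not taken from any dataset. On the raw values, income decides who the nearest neighbour is; after rescaling both features to a 0 to 1 range, age gets a say and the nearest neighbour changes.

```python
import math

# Hypothetical customers: (age in years, income in currency units)
a = (25, 50_000)
b = (60, 50_500)   # very different age, almost the same income
c = (26, 58_000)   # almost the same age, different income

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Distances on the raw features: the income gap swamps the age gap.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Assumed feature ranges, used to rescale each feature to [0, 1].
ranges = ((0, 100), (0, 100_000))

def rescale(p):
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(p, ranges))

scaled_ab = euclidean(rescale(a), rescale(b))
scaled_ac = euclidean(rescale(a), rescale(c))

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On the raw data b looks closest to a; after rescaling, c does. Which answer is "right" depends on the problem, but without scaling the choice is made by the units rather than by you.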
Common Feature Scaling & Normalization Methods
1. Standardization (Z-Score Scaling)
Standardization transforms data so that each feature has a mean of 0 and a standard deviation of 1.
Formula: new_value = (value - mean) / standard_deviation
Use when:
Your data is roughly normally distributed (this helps, but it is not a strict requirement)
You're using linear models, logistic regression, SVM, KNN, PCA, or neural networks
Why it’s useful: It centers the distribution and helps algorithms converge faster.
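The standardization formula can be sketched in a few lines of plain Python. The function name `standardize` and the sample ages are illustrative assumptions. This version uses the population standard deviation, which is also what scikit-learn's StandardScaler uses; a sample standard deviation would give slightly different numbers.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50, 60]  # made-up sample
z = standardize(ages)

print([round(v, 3) for v in z])
print("mean:", round(statistics.fmean(z), 6), "std:", round(statistics.pstdev(z), 6))
```

After the transform the feature has mean 0 and standard deviation 1 (up to floating-point rounding), whatever its original units were.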
2. Min-Max Normalization
Rescales data into a fixed range, often 0 to 1.
Formula: new_value = (value - min) / (max - min)
Use when:
You need values bounded in a fixed range such as [0, 1]
You use distance-based algorithms (KNN, K-Means)
You train neural networks (especially those using sigmoid or tanh activations)
Important note: Sensitive to outliers—extreme values can compress everything else.
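A small sketch of min-max normalization, using a made-up income list with one extreme value, shows the outlier problem from the note above. The helper name `min_max` and its `lo`/`hi` parameters are assumptions for illustration.

```python
def min_max(values, lo=0.0, hi=1.0):
    """Rescale values linearly so that min -> lo and max -> hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("cannot min-max scale a constant feature")
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

# Four ordinary incomes and one extreme outlier (illustrative numbers).
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]
scaled = min_max(incomes)

print([round(v, 4) for v in scaled])
```

The outlier lands at 1 and the four ordinary incomes are squeezed into roughly the bottom 2% of the range, so the differences between them almost disappear.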
3. Robust Scaling
Reduces the effect of outliers by scaling based on the median and IQR (interquartile range).
Formula: new_value = (value - median) / IQR
Use when:
Your dataset contains extreme outliers
You want stable scaling without letting outliers dominate
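Here is a rough sketch of robust scaling on a made-up income list with one outlier. The function name `robust_scale` is hypothetical, and the quartiles are computed with the "inclusive" method of Python's statistics.quantiles, which is one of several common quartile conventions; other conventions give slightly different IQR values.

```python
import statistics

def robust_scale(values):
    """Return (value - median) / IQR for each value."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale")
    return [(v - median) / iqr for v in values]

# Four ordinary incomes and one extreme outlier (illustrative numbers).
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]
scaled = robust_scale(incomes)

print(scaled)
```

The outlier is still far away after scaling, but the ordinary values keep a sensible spread around 0 because the median and IQR barely notice the extreme point.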
4. Log Transform
Applies a logarithmic transformation to reduce skewness.
Use when:
The feature is right-skewed (e.g., income, transaction amounts)
You want to compress large ranges
You need a more normal-like distribution
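Note: The plain logarithm is only defined for positive values; features that contain zeros need an offset such as log(1 + x), and negative values need a different transform.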
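As a quick illustration, the sketch below log-transforms a made-up list of transaction amounts. It uses math.log1p, that is log(1 + x), which is a common workaround (an assumption here, not something the method requires) for features that contain zeros. The plain math.log raises an error on zero.

```python
import math

# Illustrative right-skewed amounts spanning several orders of magnitude.
amounts = [0, 10, 100, 1_000, 10_000, 100_000]

# log1p(x) = log(1 + x), so a zero amount maps to 0.0 instead of failing.
logged = [math.log1p(a) for a in amounts]
print([round(v, 2) for v in logged])

# The plain logarithm is undefined at zero (and for negatives).
try:
    math.log(0)
    plain_log_of_zero_failed = False
except ValueError:
    plain_log_of_zero_failed = True
print("math.log(0) raised ValueError:", plain_log_of_zero_failed)
```

A range of 0 to 100,000 is compressed to roughly 0 to 11.5, and the order of the values is preserved.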
Which Algorithms Need Scaling?
Algorithms that require or clearly benefit from scaling:
Support Vector Machines (SVM)
K-Nearest Neighbours (KNN)
K-Means clustering
Logistic Regression (especially with regularization)
Linear Regression (when regularized or fitted by gradient descent)
PCA (Principal Component Analysis)
Neural Networks (deep learning models)
These are sensitive because they rely on distance calculations, gradient descent, penalties on coefficient size, or (for PCA) feature variances.
Algorithms that do not need scaling:
Decision Trees
Random Forest
XGBoost, LightGBM, CatBoost
Naive Bayes
Rules-based algorithms
Tree models split on per-feature thresholds, so rescaling a feature without changing the order of its values does not change which splits are chosen.
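The threshold argument can be checked with a toy example. The sketch below is not a real decision tree, just a hypothetical single-split "stump" search (`best_split`) on made-up data: it tries every split point on one feature and keeps the one with the fewest misclassifications. Running it on raw and on rescaled incomes gives the same grouping of rows.

```python
def best_split(xs, ys):
    """Return (errors, left_indices) for the best single-threshold split.

    Each side predicts its majority class; errors counts the minority labels.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        if xs[left[-1]] == xs[right[0]]:
            continue  # no threshold can separate equal values
        errors = 0
        for side in (left, right):
            labels = [ys[i] for i in side]
            errors += min(labels.count(0), labels.count(1))
        if best is None or errors < best[0]:
            best = (errors, frozenset(left))
    return best

incomes = [20_000, 35_000, 50_000, 80_000, 120_000]  # made-up feature
labels = [0, 0, 0, 1, 1]                              # made-up binary target
scaled = [(x - 20_000) / 100_000 for x in incomes]    # min-max to [0, 1]

raw_errors, raw_left = best_split(incomes, labels)
scaled_errors, scaled_left = best_split(scaled, labels)
print(sorted(raw_left), sorted(scaled_left))
```

The threshold value itself changes with the units, but the rows that end up on each side of the split do not, which is why tree-based models are indifferent to this kind of scaling.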
Common Mistakes to Avoid
Scaling before splitting into train/test (causes data leakage)
Scaling categorical data accidentally
Using Min-Max with heavy outliers
Applying log transform to zero or negative values
Scaling the target variable when it is not needed (it is only occasionally useful, and then for regression)
Best Practices
Always fit the scaler only on training data
Use the same scaler to transform the test set
Use pipelines to automate scaling with model training
Combine scaling with imputation and encoding in a proper workflow
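The first two practices can be sketched with a tiny hand-rolled scaler. The class name `SimpleStandardScaler` and the data are hypothetical; the fit/transform split mirrors the interface of scikit-learn's StandardScaler, which is what you would normally use inside a pipeline.

```python
import statistics

class SimpleStandardScaler:
    """Toy scaler: learns mean and std in fit(), applies them in transform()."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

# Made-up split: the test set contains a value far outside the training range.
train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 100.0]

scaler = SimpleStandardScaler().fit(train)   # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)         # same mean/std reused on test

print("train mean used:", scaler.mean_)
print("test scaled:", [round(v, 3) for v in test_scaled])
```

Because the test set never touches fit(), nothing about it leaks into the preprocessing. The unusual test value simply shows up as a large z-score, which is the honest picture of how the model will see new data.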
Closing Summary
Feature scaling is an essential preprocessing step that directly influences model accuracy, stability, and training efficiency. While not all algorithms require scaling, understanding which methods to apply—and when—is critical for producing robust machine-learning models. This episode equips you with the foundational techniques to scale features correctly and avoid common pitfalls, setting the stage for deeper feature engineering strategies in the coming episodes.
Z-score = (score - mean) / standard deviation
Calculating Variance, Standard Deviation & Z-Score
The basis for calculating the variance and the standard deviation is the wish to know how much a group of data varies
https://informatikalogi.com/menghitung-varian-standart-deviasi-z-score/
The Z-score
The Z-score, also referred to as a standardized raw score, is a useful statistic because it not only lets you compute the probability (chance or likelihood) of a raw score occurring within a normal distribution, but also helps to compare two raw scores from different normal distributions. The Z-score is a dimensionless measure since it is derived by subtracting the population mean from…
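As a small worked example of both uses of the Z-score, the sketch below compares two raw exam scores from different distributions and converts one of them to a probability. The subjects, means, and standard deviations are invented for illustration, and the probability step assumes the scores really are normally distributed.

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """Standardize a raw score against its own distribution."""
    return (raw - mean) / sd

# Hypothetical exams with different means and spreads.
z_math = z_score(82, mean=70, sd=8)
z_physics = z_score(75, mean=60, sd=5)

# Probability of scoring below the math result, under a normal model.
p_below_math = NormalDist().cdf(z_math)

print("z (math):", z_math, " z (physics):", z_physics)
print("share of math scores below 82:", round(p_below_math, 4))
```

The raw math score is higher, but the physics score sits further above its own mean in standard-deviation units, so it is the more unusual result of the two.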
Altman Z-Score Plus is a smartphone application by Business Compass LLC, USA, that provides timely assessments of companies' credit risk and probability of default, based on corporate credit analysis.