Creating New Features
🔍 Introduction
Creating new features is one of the most impactful steps in feature engineering. While algorithms learn patterns, features tell the model what patterns to look for. By transforming raw data into meaningful representations, we help machine-learning models uncover relationships that are not immediately obvious.
Feature creation goes far beyond simple preprocessing — it uses domain knowledge, mathematical transformations, and behavioural insights. In this episode, we explore methods like polynomial features, interaction terms, binning, datetime extraction, and rolling statistics, with practical examples from finance, e-commerce, and healthcare.
1. Polynomial Features
Polynomial features introduce power transformations that help models capture nonlinear relationships.
✔ What it does
Adds squared, cubic, or higher-degree versions of features
Adds interactions between features
Helps simple models (e.g., linear regression) learn complex curves
✔ Example
If you have a feature “age”, you can create age² and age³ to capture nonlinear growth trends.
✔ Use cases
Finance: modelling compound growth effects
Healthcare: capturing nonlinear relationships between age and disease risk
Engineering: modelling nonlinear stress vs. strain curves
2. Interaction Features
Interaction terms capture how the effect of one feature depends on the value of another.
✔ What it does
Multiplies or combines two features
Highlights relationships that are not visible when each feature is viewed on its own
✔ Example
price × number_of_items gives total spend, showing how spending behaves at different price points.
✔ Use cases
E-commerce: modelling promotion × customer segment
Healthcare: medication dosage × weight
Finance: interest rate × loan amount
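A minimal pandas sketch of both kinds of interaction. The column names (`price`, `number_of_items`, `promotion`, `customer_segment`) and their values are hypothetical. A numeric interaction is simply the product of two columns; for two categorical columns, one common approach is to cross them into a single combined label that the model can then encode.

```python
import pandas as pd

# Hypothetical order data
orders = pd.DataFrame({
    "price": [10.0, 25.0, 5.0],
    "number_of_items": [3, 1, 10],
    "promotion": ["none", "spring_sale", "spring_sale"],
    "customer_segment": ["new", "loyal", "new"],
})

# Numeric interaction: element-wise product of two columns
orders["price_x_items"] = orders["price"] * orders["number_of_items"]

# Categorical interaction (promotion × customer segment): one crossed label per combination
orders["promo_x_segment"] = orders["promotion"] + "_" + orders["customer_segment"]

print(orders[["price_x_items", "promo_x_segment"]])
```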
3. Binning (Discretization)
Converts continuous variables into grouped categories.
✔ Why it’s useful
Reduces noise
Highlights thresholds
Makes patterns more interpretable
✔ Example
Age → 0–18, 19–35, 36–60, 61+
✔ Use cases
Credit risk: income brackets
Marketing: customer age groups
Education: score bands
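The age bands above can be produced with `pandas.cut`. This sketch assumes whole-year ages (so a bin edge of 18 means "up to and including 18"); a fractional age such as 18.5 would land in the 19-35 bin. The ages themselves are made-up values chosen to sit on the bin boundaries.

```python
import pandas as pd

# Hypothetical ages, including values on each bin boundary
ages = pd.Series([0, 5, 18, 19, 35, 36, 60, 61, 90])

# Bins are right-closed: (0, 18], (18, 35], (35, 60], (60, inf);
# include_lowest=True makes the first bin also include age 0
age_group = pd.cut(
    ages,
    bins=[0, 18, 35, 60, float("inf")],
    labels=["0-18", "19-35", "36-60", "61+"],
    include_lowest=True,
)
print(age_group.astype(str).tolist())
```

For bands based on the data distribution rather than fixed thresholds, `pandas.qcut` splits into quantile-based bins instead.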
4. Datetime Feature Extraction
Datetime columns contain implicit features that can substantially improve model performance, especially when the target follows daily, weekly, or seasonal patterns.
✔ Extractable elements
Hour
Day
Day of week
Month
Quarter
Weekend/weekday
Season
Time since last event
✔ Use cases
Finance: identifying seasonality or high-volatility months
E-commerce: peak shopping hours, holiday spikes
Healthcare: hourly patient inflow patterns, flu season peaks
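A sketch of extracting the elements listed above with the pandas `.dt` accessor. The `events` table, `customer_id`, and `timestamp` columns are hypothetical. The season mapping assumes meteorological seasons in the Northern Hemisphere, and "time since last event" is computed per customer in hours; both are illustrative choices rather than fixed definitions.

```python
import pandas as pd

# Hypothetical event log
events = pd.DataFrame({
    "customer_id": [1, 1, 2, 1],
    "timestamp": pd.to_datetime([
        "2024-01-06 09:30",  # Saturday
        "2024-01-08 14:00",  # Monday
        "2024-04-15 20:15",  # Monday
        "2024-07-20 11:45",  # Saturday
    ]),
})

ts = events["timestamp"].dt
events["hour"] = ts.hour
events["day"] = ts.day
events["day_of_week"] = ts.dayofweek      # Monday=0 ... Sunday=6
events["month"] = ts.month
events["quarter"] = ts.quarter
events["is_weekend"] = ts.dayofweek >= 5  # Saturday or Sunday

# Assumption: meteorological seasons, Northern Hemisphere
season_by_month = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
events["season"] = ts.month.map(season_by_month)

# Time since the same customer's previous event, in hours (NaN for a customer's first event)
events = events.sort_values(["customer_id", "timestamp"])
gaps = events.groupby("customer_id")["timestamp"].diff()
events["hours_since_last_event"] = gaps.dt.total_seconds() / 3600

print(events)
```

Cyclical elements such as hour or month are sometimes additionally encoded with sine/cosine pairs so that, for example, hour 23 ends up close to hour 0.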
5. Rolling & Aggregation Features
Used heavily in time-series and behavioural modelling.
✔ What it does
Generates:
Rolling mean
Rolling sum
Rolling count
Exponential moving averages
Lag features (previous day/week/month values)
✔ Use cases
Finance: moving averages for stock price trends
E-commerce: previous 7-day purchase patterns
Healthcare: patient vital sign trends over time
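A minimal pandas sketch of lag, rolling, and exponential-moving-average features on a hypothetical daily `purchases` series. The key detail is `shift(1)`: every rolling statistic is computed on values strictly before the current day, so the feature never includes the value the model is trying to predict. The window of 3 days and the "busy day" threshold of 5 purchases are arbitrary example choices.

```python
import pandas as pd

# Hypothetical daily purchase counts
sales = pd.DataFrame({
    "date": pd.date_range("2024-03-01", periods=10, freq="D"),
    "purchases": [3, 5, 2, 8, 6, 4, 7, 9, 1, 5],
}).set_index("date")

# Only past values: shift(1) moves each value one day forward
past = sales["purchases"].shift(1)

# Lag features
sales["lag_1"] = past
sales["lag_7"] = sales["purchases"].shift(7)

# Rolling mean and sum over the previous 3 days
sales["rolling_mean_3"] = past.rolling(window=3).mean()
sales["rolling_sum_3"] = past.rolling(window=3).sum()

# Rolling count: number of previous 3 days with at least 5 purchases
is_busy = past.ge(5).astype(float).where(past.notna())
sales["busy_days_3"] = is_busy.rolling(window=3).sum()

# Exponential moving average of past values (span=3 means alpha = 0.5)
sales["ema_3"] = past.ewm(span=3, adjust=False).mean()

print(sales)
```

For multiple customers or stores, the same calls are typically applied within `groupby(...)` so that windows never span two entities.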
6. Domain-Specific Feature Examples
Finance
Volatility over last 30 days
Transaction frequency
Ratio of credit used to credit limit
Time since last default
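Two of the finance features above, sketched with pandas. The price series is a synthetic random walk and the account balances and limits are hypothetical. Volatility here is the 30-day rolling standard deviation of daily log returns; using log returns (rather than simple returns) and leaving the figure unannualized are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices (a random walk, for illustration only)
rng = np.random.default_rng(seed=0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=60))))

# Volatility over the last 30 days: rolling std of daily log returns
log_returns = np.log(prices).diff()
volatility_30d = log_returns.rolling(window=30).std()

# Ratio of credit used to credit limit (hypothetical accounts)
accounts = pd.DataFrame({
    "balance": [500.0, 2700.0, 0.0],
    "credit_limit": [1000.0, 3000.0, 5000.0],
})
accounts["credit_utilization"] = accounts["balance"] / accounts["credit_limit"]
print(accounts["credit_utilization"].tolist())  # [0.5, 0.9, 0.0]
```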
E-Commerce
Session duration
Number of items viewed
Discount percentage
Cart abandonment indicator
Click-through behaviour patterns
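A sketch of session-level e-commerce features aggregated from a hypothetical clickstream (`session_id`, `timestamp`, `event`) and a hypothetical order table (`list_price`, `paid_price`). The abandonment rule used here, "added something to the cart but never purchased in that session", is one simple definition; real definitions often also involve a timeout.

```python
import pandas as pd

# Hypothetical clickstream: one row per event within a session
events = pd.DataFrame({
    "session_id": ["s1", "s1", "s1", "s1", "s2", "s2"],
    "timestamp": pd.to_datetime([
        "2024-05-01 10:00:00", "2024-05-01 10:02:00",
        "2024-05-01 10:03:30", "2024-05-01 10:10:00",
        "2024-05-01 12:00:00", "2024-05-01 12:01:00",
    ]),
    "event": ["view", "view", "add_to_cart", "purchase", "view", "add_to_cart"],
})
events = events.assign(
    is_view=events["event"].eq("view"),
    is_cart=events["event"].eq("add_to_cart"),
    is_purchase=events["event"].eq("purchase"),
)

# One row per session
sessions = events.groupby("session_id").agg(
    start=("timestamp", "min"),
    end=("timestamp", "max"),
    items_viewed=("is_view", "sum"),
    added_to_cart=("is_cart", "any"),
    purchased=("is_purchase", "any"),
)
sessions["session_duration_s"] = (sessions["end"] - sessions["start"]).dt.total_seconds()
sessions["cart_abandoned"] = sessions["added_to_cart"] & ~sessions["purchased"]

# Discount percentage relative to the list price (hypothetical orders)
orders = pd.DataFrame({"list_price": [50.0, 80.0], "paid_price": [40.0, 80.0]})
orders["discount_pct"] = 100 * (orders["list_price"] - orders["paid_price"]) / orders["list_price"]

print(sessions)
print(orders)
```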
Healthcare
BMI (weight in kg / height in m²)
Risk scores combining multiple vitals
Medication adherence ratio
Time since last appointment
Change in vital signs over time
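A sketch of three healthcare features from a hypothetical visit table (`patient_id`, `visit_date`, `weight_kg`, `height_m`, `systolic_bp`): BMI, change in a vital sign since the previous visit, and days since the previous appointment. The change features are computed per patient after sorting by date, so each row only looks backwards.

```python
import pandas as pd

# Hypothetical visit records
vitals = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit_date": pd.to_datetime([
        "2024-01-10", "2024-03-12", "2024-06-20", "2024-02-01", "2024-05-01",
    ]),
    "weight_kg": [80.0, 78.0, 75.0, 60.0, 62.0],
    "height_m": [1.80, 1.80, 1.80, 1.65, 1.65],
    "systolic_bp": [140, 135, 128, 118, 121],
})

# BMI = weight (kg) / height (m) squared
vitals["bmi"] = vitals["weight_kg"] / vitals["height_m"] ** 2

# Per-patient changes relative to the previous visit (NaN at each patient's first visit)
vitals = vitals.sort_values(["patient_id", "visit_date"])
by_patient = vitals.groupby("patient_id")
vitals["systolic_change"] = by_patient["systolic_bp"].diff()
vitals["days_since_last_visit"] = by_patient["visit_date"].diff().dt.days

print(vitals)
```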
7. Risks of Creating Too Many Features
Too many features may cause overfitting
Polynomial features can explode dimensionality
Generating features mechanically, without domain understanding, may add noise rather than signal
Highly correlated new features (multicollinearity) can make model coefficients and feature importances unstable
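How fast polynomial features grow can be worked out exactly: the number of monomials of total degree at most d in n variables is C(n + d, d), including the constant term. This count should match the number of output columns of scikit-learn's `PolynomialFeatures` with its default `interaction_only=False`. The helper name below is hypothetical.

```python
from math import comb


def n_polynomial_features(n_features: int, degree: int, include_bias: bool = False) -> int:
    """Number of monomials of total degree <= `degree` in `n_features` variables."""
    total = comb(n_features + degree, degree)  # includes the constant term
    return total if include_bias else total - 1


for n in (10, 100):
    for d in (2, 3):
        print(f"{n} features, degree {d}: {n_polynomial_features(n, d)} columns")
# 10 features, degree 2: 65 columns
# 10 features, degree 3: 285 columns
# 100 features, degree 2: 5150 columns
# 100 features, degree 3: 176850 columns
```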
8. Best Practices for Feature Creation
Start simple — do not create hundreds of features at once
Use domain knowledge wherever possible
Validate new features with cross-validation
Keep track of transformations in pipelines
Remove features that do not improve performance
Avoid data leakage (especially with rolling features)
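A sketch of the pipeline and cross-validation practices using scikit-learn on synthetic data. The target is deliberately quadratic so that the new feature should help; on real data the comparison is the point, not the specific numbers. Keeping the feature step inside the pipeline means it is refit on each training fold only. For time-ordered data with rolling features, `TimeSeriesSplit` is usually a safer splitter than a shuffled `KFold`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: y depends on x squared, plus noise
rng = np.random.default_rng(seed=42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Baseline model vs. the same model with a polynomial feature step
baseline = make_pipeline(LinearRegression())
with_poly = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())

baseline_r2 = cross_val_score(baseline, X, y, cv=cv, scoring="r2").mean()
poly_r2 = cross_val_score(with_poly, X, y, cv=cv, scoring="r2").mean()
print(f"baseline R^2: {baseline_r2:.3f}, with x^2 feature: {poly_r2:.3f}")
```

If the new feature does not raise the cross-validated score, that is a signal to drop it.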