Creating new features is one of the most impactful steps in feature engineering. While algorithms learn patterns, features tell the model what patterns to look for. By transforming raw data into meaningful representations, we help machine-learning models uncover relationships that are not immediately obvious.
Feature creation goes far beyond simple preprocessing: it draws on domain knowledge, mathematical transformations, and behavioural insights. In this episode, we explore methods such as polynomial features, interaction terms, binning, datetime extraction, and rolling statistics, with practical examples from finance, e-commerce, and healthcare.
1. Polynomial Features
Polynomial features introduce power transformations that help models capture nonlinear relationships.
Adds squared, cubic, or higher-degree versions of features
Adds interactions between features
Helps simple models (e.g., linear regression) learn complex curves
If you have a feature “age”, you can create age² and age³ to capture nonlinear growth trends.
Finance: modelling compound growth effects
Healthcare: capturing nonlinear relationships between age and disease risk
Engineering: modelling stress vs. strain curves
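The "age" example above can be sketched in a few lines of pandas. The data values and the column names `age_squared` and `age_cubed` are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical toy data; only the feature name "age" comes from the example above.
df = pd.DataFrame({"age": [20, 35, 50]})

# Add squared and cubed versions of the feature so that a linear model
# can fit a curve in age rather than a straight line.
df["age_squared"] = df["age"] ** 2
df["age_cubed"] = df["age"] ** 3

print(df)
```

For more than a handful of columns, scikit-learn's `PolynomialFeatures` transformer generates the powers and the cross-terms in one step.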
2. Interaction Terms
Interaction terms represent how two or more features influence each other.
Multiplies or combines two features
Highlights relationships not visible individually
price × number_of_items
Gives the total spend per order, a quantity that neither feature captures on its own.
E-commerce: modelling promotion × customer segment
Healthcare: medication dosage × weight
Finance: interest rate × loan amount
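A minimal sketch of the price × number_of_items interaction, assuming a pandas DataFrame with made-up values. The output column name `price_x_items` is hypothetical.

```python
import pandas as pd

# Hypothetical order data; the two input columns follow the example above.
orders = pd.DataFrame({
    "price": [10.0, 25.0, 4.0],
    "number_of_items": [3, 1, 5],
})

# The interaction term is the element-wise product of the two features.
orders["price_x_items"] = orders["price"] * orders["number_of_items"]

print(orders)
```

The same pattern applies to the use cases listed above, for example multiplying a dosage column by a weight column.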
3. Binning (Discretization)
Converts continuous variables into grouped categories.
Makes patterns more interpretable
Age → 0–18, 19–35, 36–60, 61+
Credit risk: income brackets
Marketing: customer age groups
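One way to produce the age groups from the example is `pandas.cut`. This is a sketch under assumptions: the ages are invented, ages are whole numbers, and the top group is taken to mean ages above 60 and is labelled "61+" here.

```python
import pandas as pd

# Hypothetical ages chosen to include the bin boundaries.
people = pd.DataFrame({"age": [5, 18, 19, 35, 36, 60, 61, 90]})

# Bin edges are right-inclusive: (0, 18], (18, 35], (35, 60], (60, inf).
# include_lowest=True makes the first bin also contain age 0.
people["age_group"] = pd.cut(
    people["age"],
    bins=[0, 18, 35, 60, float("inf")],
    labels=["0-18", "19-35", "36-60", "61+"],
    include_lowest=True,
)

print(people)
```

The result is an ordered categorical column, which can then be one-hot or ordinal encoded depending on the model.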
4. Datetime Feature Extraction
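Datetime columns encode components such as month, day of week, and hour that a model cannot use directly from a raw timestamp. Extracting them can noticeably improve performance on data with seasonal or time-of-day patterns.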
Finance: identifying seasonality or high-volatility months
E-commerce: peak shopping hours, holiday spikes
Healthcare: hourly patient inflow patterns, flu season peaks
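As an illustration, the pandas `.dt` accessor exposes the calendar components of a datetime column. The column name `timestamp`, the sample values, and the choice of extracted fields are assumptions for this sketch.

```python
import pandas as pd

# Hypothetical event log with one timestamp column.
events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-15 09:30:00", "2024-07-06 22:05:00"]),
})

ts = events["timestamp"].dt
events["year"] = ts.year
events["month"] = ts.month
events["day_of_week"] = ts.dayofweek      # Monday=0, Sunday=6
events["hour"] = ts.hour
events["is_weekend"] = ts.dayofweek >= 5  # Saturday or Sunday

print(events)
```

Fields such as `month` and `hour` map onto the seasonality and peak-hour use cases listed above.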
5. Rolling & Aggregation Features
Used heavily in time-series and behavioural modelling.
Exponential moving averages
Lag features (previous day/week/month values)
Finance: moving averages for stock price trends
E-commerce: previous 7-day purchase patterns
Healthcare: patient vital sign trends over time
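The sketch below builds a lag feature, a rolling mean, and an exponential moving average with pandas. The price values, the 3-step window, and the column names are assumptions. Each statistic is shifted by one step so that a row only sees values from earlier rows.

```python
import pandas as pd

# Hypothetical daily closing prices, ordered oldest to newest.
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0], name="close")
features = pd.DataFrame({"close": prices})

# Lag feature: the previous day's value.
features["lag_1"] = prices.shift(1)

# Rolling mean over 3 days, shifted so the current day is excluded.
features["rolling_mean_3"] = prices.rolling(window=3).mean().shift(1)

# Exponential moving average (span=3), also shifted by one day.
features["ema_3"] = prices.ewm(span=3, adjust=False).mean().shift(1)

print(features)
```

The first rows contain missing values because there is not yet enough history. They are usually dropped or imputed before training.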
6. Domain-Specific Feature Examples
Finance: volatility over the last 30 days
Finance: ratio of credit used to credit limit
E-commerce: cart abandonment indicator
E-commerce: click-through behaviour patterns
Healthcare: risk scores combining multiple vitals
Healthcare: medication adherence ratio
Healthcare: time since last appointment
Healthcare: change in vital signs over time
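Two of the features above, the credit ratio and the time since the last appointment, reduce to simple column arithmetic. All column names, values, and the reference date `as_of` in this sketch are hypothetical.

```python
import pandas as pd

# Ratio of credit used to credit limit.
accounts = pd.DataFrame({
    "credit_used": [500.0, 1500.0],
    "credit_limit": [2000.0, 3000.0],
})
accounts["credit_utilisation"] = accounts["credit_used"] / accounts["credit_limit"]

# Time since last appointment, measured in days from a fixed reference date.
visits = pd.DataFrame({
    "last_appointment": pd.to_datetime(["2024-01-01", "2024-02-15"]),
})
as_of = pd.Timestamp("2024-03-01")
visits["days_since_last_appointment"] = (as_of - visits["last_appointment"]).dt.days

print(accounts)
print(visits)
```

Using a fixed reference date rather than the current time keeps the feature reproducible between training and later re-runs.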
7. When to Avoid Creating Too Many Features
Too many features may cause overfitting
Polynomial features can explode dimensionality
Automated feature creation without domain understanding may add noise
Highly correlated new features (multicollinearity) may make model coefficients unstable
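To see how quickly polynomial expansion grows, the number of generated terms can be counted directly. With n input features and maximum degree d, there are C(n + d, d) − 1 monomials once the constant term is excluded. The helper name `n_polynomial_terms` is made up for this sketch.

```python
from math import comb


def n_polynomial_terms(n_features: int, degree: int) -> int:
    """Count monomials of total degree 1..degree in n_features variables."""
    return comb(n_features + degree, degree) - 1


for n, d in [(10, 2), (10, 3), (100, 3)]:
    print(f"{n} features, degree {d}: {n_polynomial_terms(n, d)} terms")
```

Ten features at degree 2 already yield 65 columns, and 100 features at degree 3 yield 176,850.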
8. Best Practices for Feature Creation
Start simple: do not create hundreds of features at once
Use domain knowledge wherever possible
Validate new features with cross-validation
Keep track of transformations in pipelines
Remove features that do not improve performance
Avoid data leakage (especially with rolling features, which should be computed from past values only)
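As one possible way to combine several of these practices, the sketch below wraps a feature transformation in a scikit-learn pipeline and compares it against a baseline using cross-validation. The synthetic quadratic data, the choice of `PolynomialFeatures` as the candidate feature step, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption): the target is quadratic in a single feature.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

# Baseline model without the new feature.
baseline = LinearRegression()

# Candidate model: the transformation lives inside the pipeline, so it is
# re-applied within each cross-validation fold and is recorded with the model.
with_poly = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)

baseline_r2 = cross_val_score(baseline, X, y, cv=5, scoring="r2").mean()
with_poly_r2 = cross_val_score(with_poly, X, y, cv=5, scoring="r2").mean()

print(f"baseline R^2:  {baseline_r2:.3f}")
print(f"with x^2 R^2:  {with_poly_r2:.3f}")
```

A new feature is worth keeping only if the cross-validated score improves by more than the fold-to-fold variation. Otherwise it is a candidate for removal.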