Integrating Non-Linear Variables into Linear Models

28th February 2024

As a consultant, my engagements often lead me into intriguing discussions that uncover hidden facets of business strategies. In a recent conversation with a client, we touched upon the topic of vendor evaluation. The client mentioned tracking vendors’ percentile rankings, grouped by quartile, on a unique scale they had devised, only to realize that their linear models weren’t capturing the dynamics they hoped for.

Upon further exploration, I discovered a crucial oversight: the client’s linear models were not incorporating the rate of change of vendor rankings. In essence, amongst vendors within acceptable percentile ranges, their models failed to downgrade those vendors whose quality was declining. This highlights a limitation of any linear regression that uses only the current level of a variable: two vendors with the same ranking today receive the same prediction, regardless of the direction in which each is heading.

Linear Models and Non-Linear Dynamics

Linear models assume a straightforward relationship between input variables and outcomes. However, the real world often exhibits more complex, non-linear patterns that linear models struggle to capture. This limitation becomes apparent when dealing with scenarios where the relationship between variables is not strictly proportional. In our context, a linear model might overlook vendors experiencing a rapid change in quality.

Beyond Machine Learning for Causative Understanding

Machine learning (ML) models do come to the rescue by offering the ability to capture non-linear relationships and intricate patterns. Unlike linear regression, ML algorithms can adapt to the complexity of real-world scenarios, providing predictions based on a more accurate representation of the underlying dynamics in vendor evaluation. However, in various industries, identifying the impact of rates of change is vital for a comprehensive understanding of dynamic systems, and this is something that ML models do not explicitly uncover.

Incorporating Non-Linear Variables into Linear Models

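The linear model formula $Y = \beta_0 + \beta_1X + \beta_2\left(\frac{dX}{dt}\right) + \epsilon$ integrates the rate of change into a linear regression framework. The model remains linear in its coefficients: $\frac{dX}{dt}$ simply enters as an additional regressor alongside $X$, so $\beta_2$ measures the effect of the rate of change while holding the current level fixed.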
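As a minimal sketch of how such a model might be estimated, assume $X$ is observed at evenly spaced periods, so that the derivative can be approximated by the backward difference $\frac{dX}{dt} \approx \frac{X_t - X_{t-1}}{\Delta t}$. The function name `fit_level_and_rate`, the synthetic series, and the coefficient values below are illustrative assumptions, not the client's data or method.

```python
import numpy as np

def fit_level_and_rate(x, y, dt=1.0):
    """Fit Y = b0 + b1*X + b2*(dX/dt) by ordinary least squares.

    dX/dt is approximated by a backward difference, so the first
    observation (which has no predecessor) is dropped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx_dt = np.diff(x) / dt            # rate of change for periods 1..n-1
    level = x[1:]                      # align levels with their differences
    target = y[1:]
    design = np.column_stack([np.ones_like(level), level, dx_dt])
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefs

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
t = np.arange(61)
x = 50 + 10 * np.sin(t / 6.0) + 0.2 * t
rate = np.concatenate([[np.nan], np.diff(x)])   # undefined for the first period
noise = rng.normal(0.0, 0.1, size=t.size)
y = 2.0 + 0.5 * x + 3.0 * rate + noise          # true b0=2.0, b1=0.5, b2=3.0

b0, b1, b2 = fit_level_and_rate(x, y)
print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")
```

A backward difference is used here because it relies only on current and past values, which matters if the fitted model is later used to score new observations as they arrive.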

There are various scenarios where both a variable $X$ and its rate of change $\frac{dX}{dt}$ could be relevant in a linear regression model. Here are a few examples:

Sales and Growth Rate: A linear regression model could examine how both the current sales and the rate of change in sales impact future outcomes, such as profits or customer satisfaction.
- $X$ : Monthly sales for a product.
- $\frac{dX}{dt}$ : The month-to-month growth rate of sales.

Investment and Return Rate: Understanding how the size of an investment and its rate of return jointly influences financial outcomes could be a valuable analysis in finance.
- $X$ : Investment amount in a financial portfolio.
- $\frac{dX}{dt}$ : The rate of return on the investment.

Inventory and Inventory Turnover: This could help businesses optimize their inventory management by considering both current inventory levels and the rate at which items are moving.
- $X$ : Inventory levels at the end of each month.
- $\frac{dX}{dt}$ : The rate at which inventory levels are changing, a proxy for how quickly items are moving.

Advertising Spending and Effectiveness: Analyzing how both the amount spent on advertising and the rate of change in spending impact sales or brand awareness could be useful for marketing strategies.
- $X$ : Monthly advertising spending.
- $\frac{dX}{dt}$ : The rate of change in advertising spending.

Employee Count and Hiring Rate: Businesses may want to understand how the current workforce size and the hiring rate jointly affect productivity or company performance.
- $X$ : Total number of employees.
- $\frac{dX}{dt}$ : The net rate of change in headcount, driven largely by the rate at which new employees are hired.
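
The examples above describe the rate of change in two ways: as an absolute change per period (for instance, the change in advertising spending) and as a relative growth rate (for instance, month-to-month sales growth). The sketch below shows one way both features could be built from a monthly series. The sales figures and variable names are hypothetical.

```python
import numpy as np

# Hypothetical monthly sales figures, oldest first.
monthly_sales = np.array([100.0, 110.0, 121.0, 115.0, 120.0])

# Absolute month-to-month change: a discrete stand-in for dX/dt.
abs_change = np.diff(monthly_sales)             # [10., 11., -6., 5.]

# Relative growth rate: change divided by the previous month's level.
growth_rate = abs_change / monthly_sales[:-1]

# The first month has no prior observation, so it is dropped to keep
# each level aligned with the change that led to it.
levels = monthly_sales[1:]
features = np.column_stack([levels, abs_change])
print(features)
```

Which form to use is a modelling choice: the absolute change matches the $\frac{dX}{dt}$ term in the formula directly, while the growth rate is scale-free and may be easier to compare across products or vendors of different sizes.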

Associated Risks

While adding derivatives can enhance a linear regression model’s ability to capture non-linear relationships, it poses risks. Overfitting is a concern: the model may become too tailored to the training data, capturing noise rather than true patterns. The increased complexity may also reduce interpretability, making it harder to identify genuine drivers. Additionally, multicollinearity must be addressed, as a derivative is a new predictor that may correlate with the variable it is derived from. As a leading machine learning consultancy in Dubai, we have found that introducing derivatives only where business domain knowledge or theory supports them helps ensure that the potential risks are justified by the anticipated rewards, offering a more informed approach to model enhancement.
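
One way to check for the multicollinearity risk described above is the variance inflation factor (VIF), which regresses each predictor on the others and reports $\frac{1}{1 - R^2}$. The sketch below is a hand-rolled version using NumPy only. The function name and both synthetic series are assumptions for illustration, and the commonly quoted VIF thresholds of 5 or 10 are rules of thumb rather than hard limits.

```python
import numpy as np

def variance_inflation_factors(features):
    """Return the VIF of each column: 1 / (1 - R^2) from regressing it on the rest."""
    features = np.asarray(features, dtype=float)
    n_rows, n_cols = features.shape
    vifs = []
    for j in range(n_cols):
        target = features[:, j]
        others = np.delete(features, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coefs
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

rng = np.random.default_rng(1)
t = np.arange(1, 37)

# Steady compound growth: the change each month is roughly proportional
# to the level, so the two predictors move together.
x_growth = 100 * 1.05 ** t * (1 + rng.normal(0.0, 0.002, size=t.size))
vif_growth = variance_inflation_factors(
    np.column_stack([x_growth[1:], np.diff(x_growth)])
)

# A cyclical series: the change is largest when the level is mid-range,
# so level and change are only weakly related.
x_cycle = 100 + 10 * np.sin(t / 3.0)
vif_cycle = variance_inflation_factors(
    np.column_stack([x_cycle[1:], np.diff(x_cycle)])
)

print("growth series VIFs:", vif_growth)
print("cyclical series VIFs:", vif_cycle)
```

In the growth case the derivative adds little information beyond the level itself, so its coefficient would be poorly identified. In the cyclical case the two predictors carry largely separate information.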

In Conclusion

The conversation with my client unveiled a critical aspect of addressing non-linearity within linear models, highlighting the significance of not just measuring variables but also measuring the impact of variable dynamics. While Machine Learning offers flexibility, understanding the theoretical underpinnings behind business dynamics helps strike a balance between model prediction and model interpretability.
