
Machine learning for forecasting: regression, time series, and ensemble methods explained

This article demystifies the three primary model families used in business forecasting (regression, time series, and ensemble methods), explaining what each does, when it works, and when to look elsewhere.


Why model selection matters as much as model quality


A highly optimized model of the wrong type will still produce poor forecasts. Each model family makes implicit assumptions about the structure of the data: whether the relationship between variables is linear or nonlinear, whether the order of observations in time carries signal, whether variance is constant or evolving. When those assumptions are violated, because the real-world data does not conform to them, the model's output will be systematically biased in ways that tuning hyperparameters cannot fix.

The goal of model selection is not to find the "best" model in the abstract. It is to find the model whose assumptions are closest to the structure of the problem at hand and whose failure modes are least costly given the decisions it will inform. That requires understanding what each family of forecasting algorithms is actually doing, not just what it outputs.

Regression analysis forecasting: the baseline method


Regression analysis forecasting models the relationship between a target variable (the quantity to be forecast) and one or more predictor variables, using historical data to estimate the parameters of that relationship. It answers the question: given known values of X, what is the expected value of Y?

Regression models

Estimate a functional relationship between input features and a continuous output. Linear regression assumes a linear relationship; polynomial and nonlinear variants relax that constraint. Regularized variants (Ridge, Lasso) add penalty terms to prevent overfitting on high-dimensional feature sets.

BEST SUITED FOR

• Demand driven by identifiable causal factors (price, promotions, seasonality as features)
• Cross-sectional predictions across entities
• Contexts where interpretability is required

LIMITATIONS

• Assumes independence of observations; breaks on sequential data
• Linear variants miss nonlinear dynamics
• Requires meaningful, clean predictor variables

Regression is the right starting point when you have a clear theory about what drives the target variable, that is, when the forecast is a function of explainable inputs rather than purely of past values. It is also the most interpretable family, which matters in regulated industries or in contexts where business stakeholders need to understand and validate the model's logic.
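As a minimal sketch of the idea, the snippet below fits a one-feature least-squares line and a Ridge-style variant in plain Python. The price and sales figures are hypothetical, as are the function names; a real project would more likely rely on an established statistics or ML library and several predictors.

```python
def fit_simple_ols(x, y, ridge_penalty=0.0):
    """Fit y = intercept + slope * x by least squares.

    With ridge_penalty > 0 the slope is shrunk toward zero, which is what
    Ridge regression does for a single feature when the intercept is left
    unpenalized.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered cross-product and sum of squares.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / (sxx + ridge_penalty)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Expected value of Y for a known value of X."""
    return intercept + slope * x_new


# Hypothetical weekly data: unit price (X) and units sold (Y).
prices = [8.0, 9.0, 10.0, 11.0, 12.0]
units = [120.0, 111.0, 100.0, 89.0, 80.0]

intercept, slope = fit_simple_ols(prices, units)
_, ridge_slope = fit_simple_ols(prices, units, ridge_penalty=5.0)
forecast_at_10_5 = predict(intercept, slope, 10.5)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"ridge slope={ridge_slope:.2f}")
print(f"forecast at price 10.5: {forecast_at_10_5:.1f}")
```

The fitted slope is directly readable as "units lost per unit of price increase", which is the interpretability advantage described above. The penalized slope is smaller in magnitude, illustrating how regularization trades a little bias for stability.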

Time series forecasting: when sequence is signal


Time series forecasting treats the temporal structure of data as informative in itself. Rather than modeling a relationship between predictor variables and an outcome, time series models learn patterns in the sequence of past values (trends, seasonality, cycles, and autocorrelation) and project those patterns forward.


Time series models

Classical approaches (ARIMA, exponential smoothing) capture linear temporal dependencies and decompose series into trend and seasonal components. Modern ML-based approaches (Prophet, LSTM networks, Temporal Fusion Transformers) handle nonlinear dynamics, multiple seasonalities, and complex long-range dependencies.

BEST SUITED FOR

• Univariate forecasting where history is the primary predictor
• Strong seasonal or cyclical patterns
• High-frequency operational forecasting (hourly, daily)

LIMITATIONS

• Assumes future patterns resemble the past; fragile at structural breaks
• Classical methods struggle with multiple interacting seasonalities
• Deep learning variants require large data volumes
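To make "projecting patterns forward" concrete, here is a small sketch of two classical baselines written in plain Python: simple exponential smoothing and a seasonal naive forecast. The function names and sample values are illustrative assumptions, and neither function is a substitute for a full ARIMA or exponential smoothing implementation from a statistics library.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after processing every observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, initialized at y_0.
    The final level serves as a flat forecast for the next periods.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


def seasonal_naive_forecast(series, season_length, horizon):
    """Forecast by repeating the most recent full season."""
    last_season = series[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


# Hypothetical demand history with no strong seasonality.
ses_forecast = simple_exponential_smoothing([100.0, 110.0, 105.0, 115.0], alpha=0.5)

# Hypothetical series with a season of length 3.
seasonal_forecast = seasonal_naive_forecast([1, 2, 3, 4, 5, 6], season_length=3, horizon=4)

print(ses_forecast)       # 110.0
print(seasonal_forecast)  # [4, 5, 6, 4]
```

Both functions use nothing but the series' own history, which is exactly why they break down when the future stops resembling the past.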

Classical vs neural time series approaches

The choice between classical statistical methods and neural network-based time series models is not purely a performance question; it is also a data volume and interpretability question. ARIMA and exponential smoothing models are highly interpretable, computationally efficient, and effective when data volumes are modest and the series is relatively stationary. LSTM networks and transformer-based architectures can capture far more complex temporal patterns, but require substantially more data to train reliably and are considerably harder to explain to non-technical stakeholders.

In most enterprise forecasting contexts, starting with classical methods and moving to neural approaches only where they demonstrably outperform and where the data volume justifies the complexity is the more defensible engineering strategy.
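One way to check whether a more complex model "demonstrably outperforms" a simpler one is a rolling-origin backtest, which respects time order by only ever training on the past. The sketch below is an assumption about how such a comparison might look; the two forecasters are deliberately simple stand-ins for a baseline and a challenger, and the series is made up.

```python
def naive_forecast(history):
    """Baseline: the next value equals the last observed value."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Challenger: the next value equals the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def rolling_origin_mae(series, forecaster, min_train=4):
    """Mean absolute one-step-ahead error over an expanding training window.

    At each origin the forecaster only sees series[:origin], so no future
    information leaks into the prediction.
    """
    errors = []
    for origin in range(min_train, len(series)):
        prediction = forecaster(series[:origin])
        errors.append(abs(series[origin] - prediction))
    return sum(errors) / len(errors)


# Hypothetical history: an upward drift with an alternating pattern.
history = [10, 12, 11, 13, 12, 14, 13, 15]

naive_mae = rolling_origin_mae(history, naive_forecast)
moving_average_mae = rolling_origin_mae(history, moving_average_forecast)

print(f"naive MAE: {naive_mae}")                     # 1.5
print(f"moving-average MAE: {moving_average_mae}")   # 1.0
```

The same harness can wrap any forecaster that maps a history to a prediction, so a classical model and a neural one can be compared on identical folds before the added complexity is accepted.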

Ensemble methods in machine learning: combining models for robustness


Ensemble methods in machine learning aggregate the predictions of multiple models to produce an output that is typically more stable and accurate than any individual model alone. The core insight is that different models make different errors, and when their errors are uncorrelated, combining them reduces overall variance.
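The variance argument can be illustrated with a short simulation. The setup is an idealized assumption: five hypothetical models whose forecast errors are independent, unbiased, and equally noisy. Under those conditions the variance of the averaged forecast should be roughly one fifth of a single model's variance; real model errors are usually correlated, so the gain in practice is smaller.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

TRUE_VALUE = 100.0
N_MODELS = 5
N_TRIALS = 5000
NOISE_SD = 10.0

single_errors = []
ensemble_errors = []
for _ in range(N_TRIALS):
    # Each hypothetical model = truth + independent Gaussian noise.
    forecasts = [TRUE_VALUE + random.gauss(0.0, NOISE_SD) for _ in range(N_MODELS)]
    single_errors.append(forecasts[0] - TRUE_VALUE)
    ensemble_errors.append(statistics.fmean(forecasts) - TRUE_VALUE)

single_variance = statistics.pvariance(single_errors)
ensemble_variance = statistics.pvariance(ensemble_errors)

print(f"single-model error variance: {single_variance:.1f}")
print(f"ensemble error variance:     {ensemble_variance:.1f}")
print(f"ratio: {ensemble_variance / single_variance:.2f}")
```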

Ensemble methods

Bagging (e.g. Random Forests) trains multiple models on bootstrapped data subsets and averages their outputs. Boosting (e.g. XGBoost, LightGBM) trains models sequentially, each correcting the errors of the last. Stacking combines heterogeneous model types via a meta-learner. In practice, gradient boosting ensembles are among the most consistently high-performing methods across tabular forecasting tasks.

BEST SUITED FOR

• High-dimensional feature spaces with complex interactions
• Tabular data forecasting at moderate to large scale
• Situations where prediction accuracy outweighs interpretability

LIMITATIONS

• Reduced interpretability vs single models
• Computationally intensive for very large datasets
• Boosting can overfit if not carefully regularized

Ensemble methods are frequently the practical choice for business forecasting problems where both structured features and historical patterns are available, and where neither pure regression nor pure time series modeling captures the full signal. Gradient boosting variants in particular have become a default starting point for many ML engineering teams tackling tabular forecasting, given their strong out-of-the-box performance and mature tooling.
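For intuition about what "each correcting the errors of the last" means, the following is a didactic sketch of gradient boosting for squared error using one-split decision stumps on a single feature. It is not how XGBoost or LightGBM are implemented (those add multi-feature trees, regularization, and many optimizations); the data, function names, and hyperparameters are all hypothetical, and the sketch assumes the feature has at least two distinct values.

```python
def fit_stump(x, residuals):
    """Find the single threshold on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # drop the max so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosted_stumps(x, y, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    predictions = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump_predict(stump, xi)
                       for xi, pi in zip(x, predictions)]
    return base, learning_rate, stumps


def boosted_predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, xi) for s in stumps)


def training_mse(model, x, y):
    return sum((yi - boosted_predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical nonlinear relationship between a feature and demand.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
demand = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0, 9.0, 8.0]

mse_0 = training_mse(fit_boosted_stumps(feature, demand, n_rounds=0), feature, demand)
mse_1 = training_mse(fit_boosted_stumps(feature, demand, n_rounds=1), feature, demand)
mse_20 = training_mse(fit_boosted_stumps(feature, demand, n_rounds=20), feature, demand)

print(f"training MSE after 0, 1, 20 rounds: {mse_0:.2f}, {mse_1:.2f}, {mse_20:.2f}")
```

Training error falls with every round, which is also why boosting can overfit: the numbers above are in-sample, and a held-out or rolling-origin evaluation is needed to choose the number of rounds and the learning rate.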

Selecting, implementing, and validating these approaches at production scale is precisely the kind of work Mantu's machine learning consulting teams are built for, from model selection through to MLOps deployment and ongoing performance monitoring.

Choosing the right forecasting algorithm for your context


There is no universally superior forecasting algorithm. The right choice depends on the structure of the data, the business context, interpretability requirements, and the maturity of the ML infrastructure available. The following decision framework is a starting point:

| Context | Primary signal | Recommended family |
|---|---|---|
| Forecast driven by known causal factors (price, weather, promotions) | Feature relationships | Regression |
| Strong seasonality, univariate series, high-frequency data | Temporal patterns | Time series |
| Complex interactions, high-dimensional features, accuracy-first | Mixed / nonlinear | Ensemble |
| Regulated context, stakeholder explainability required | Any | Regression or classical time series |
| Large-scale production forecasting across many series | Both features and history | Ensemble + time series hybrid |

In practice, the most robust production forecasting systems combine model families: time series decomposition to handle seasonality, regression to incorporate causal features, and ensemble methods to capture residual complexity. Building and validating those hybrid architectures, and maintaining them as data distributions shift over time, is the engineering challenge at the core of machine learning for forecasting.
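A heavily simplified sketch of the first two stages of such a hybrid is shown below. It assumes a fixed season length, a single binary promotion flag as the only causal feature, and at least one non-promotion observation for every position in the season. All names and figures are hypothetical. The third stage, an ensemble fitted on whatever residual remains, is omitted to keep the example short.

```python
def fit_hybrid(series, promo_flags, season_length):
    """Stage 1: seasonal profile. Stage 2: promotion uplift on the residuals."""
    # Stage 1: average value per seasonal position, using non-promotion
    # periods only so the profile is not inflated by promotions.
    buckets = [[] for _ in range(season_length)]
    for t, (value, promo) in enumerate(zip(series, promo_flags)):
        if not promo:
            buckets[t % season_length].append(value)
    profile = [sum(b) / len(b) for b in buckets]

    # Stage 2: least-squares effect of the binary promo flag on the
    # deseasonalized residuals, which reduces to their mean on promo periods.
    promo_residuals = [value - profile[t % season_length]
                       for t, (value, promo) in enumerate(zip(series, promo_flags))
                       if promo]
    uplift = sum(promo_residuals) / len(promo_residuals) if promo_residuals else 0.0
    return profile, uplift


def forecast_hybrid(model, start_index, future_promo_flags):
    """Seasonal baseline plus the uplift wherever a promotion is planned."""
    profile, uplift = model
    season_length = len(profile)
    return [profile[(start_index + h) % season_length] + (uplift if promo else 0.0)
            for h, promo in enumerate(future_promo_flags)]


# Hypothetical history: three seasons of length 4, with two promotions.
sales = [10, 20, 30, 20, 10, 25, 30, 20, 10, 20, 35, 20]
promos = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

model = fit_hybrid(sales, promos, season_length=4)
next_season = forecast_hybrid(model, start_index=len(sales), future_promo_flags=[0, 1, 0, 0])

print(model)        # ([10.0, 20.0, 30.0, 20.0], 5.0)
print(next_season)  # [10.0, 25.0, 30.0, 20.0]
```

Separating the stages keeps each component inspectable: the seasonal profile and the promotion uplift can be reviewed with business stakeholders on their own terms before any less interpretable residual model is layered on top.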

Mantu's machine learning consulting expertise supports data and engineering teams at every stage of this work, from exploratory model benchmarking through to scalable deployment and monitoring.