1. Introduction

Single-indicator trading strategies are easy to construct but fragile in practice: each indicator has regimes where it fails, and a strategy that relies on one signal alone will suffer extended drawdowns during those regimes. The natural remedy is to combine multiple signals, hoping that their errors are at least partially independent. This is the ensemble approach, borrowed from statistical learning and applied here to signal construction for commodity trading advisor (CTA) strategies.

This article develops the mathematical framework for multi-signal combination, focusing on two canonical mechanisms—linear weighting and majority voting—and analyzes the conditions under which adding signals improves rather than degrades performance.

2. Mathematical Framework

2.1 Signal Definition

Let $\mathcal{S} = \{s_1, s_2, \ldots, s_N\}$ be a set of $N$ signals. Each signal $s_i$ produces a position recommendation at time $t$:

$$s_i(t) \in \{-1, 0, +1\}$$

where $-1$ denotes short, $0$ neutral, and $+1$ long. The choice of a discrete output space reflects the typical CTA setup where position sizing is standardized rather than continuous.

2.2 Linear Weighted Combination

The most straightforward combination rule is a linear weighted sum:

$$P_{\text{target}}(t) = \sum_{i=1}^{N} w_i \cdot s_i(t)$$

where $w_i \in \mathbb{R}$ is the weight assigned to signal $i$. In the unweighted case, $w_i = 1$ for all $i$, and the target position is simply the sum of votes.

In this unweighted case, the target position $P_{\text{target}}(t)$ takes integer values in $[-N, N]$, which naturally provides a conviction scale: a target of $\pm N$ indicates unanimous agreement, while a target near zero indicates disagreement among signals (or mostly neutral votes). With general real-valued weights, the range becomes $[-\sum_i |w_i|, \sum_i |w_i|]$ and the values need not be integers.

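```python
# Linear weighted combination of signal votes.

def compute_target_position(signals, weights):
    """Return sum_i w_i * s_i(t); each signal exposes get_position() in {-1, 0, +1}."""
    if len(signals) != len(weights):
        raise ValueError("signals and weights must have the same length")
    # Weighted sum of the current votes, one term per signal.
    return sum(w * s.get_position() for w, s in zip(weights, signals))
```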

2.3 Majority Voting

An alternative is majority voting, which produces a discrete output:

$$P_{\text{target}}(t) = \text{sign}\left(\sum_{i=1}^{N} w_i \cdot s_i(t)\right)$$

where $\text{sign}(\cdot)$ maps to $\{-1, 0, +1\}$. This discards conviction information but produces a simpler, more interpretable signal.

The choice between linear weighting and voting involves a trade-off:

| Property | Linear Weighting | Majority Voting |
|---|---|---|
| Output space | Continuous / integer | Discrete $\{-1, 0, +1\}$ |
| Conviction preservation | Yes | No |
| Sensitivity to outlier signals | Higher | Lower |
| Position sizing | Variable | Fixed |
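
To make the trade-off concrete, the following sketch applies both rules to the same votes. The helper names `linear_target`, `sign`, and `majority_vote` are illustrative choices, not part of any library.

```python
def linear_target(votes, weights):
    """Weighted sum of votes; the magnitude carries conviction."""
    return sum(w * s for w, s in zip(weights, votes))


def sign(x):
    """Map a real number to -1, 0, or +1."""
    return (x > 0) - (x < 0)


def majority_vote(votes, weights):
    """Sign of the weighted sum; the magnitude is discarded."""
    return sign(linear_target(votes, weights))


weights = [1, 1, 1]
unanimous = [+1, +1, +1]
split = [+1, +1, -1]

# Linear weighting separates the two cases (3 vs 1); voting does not (1 vs 1).
print(linear_target(unanimous, weights), majority_vote(unanimous, weights))
print(linear_target(split, weights), majority_vote(split, weights))
```

Under voting, a unanimous ensemble and a two-to-one split produce the same position, which is exactly the conviction information the table lists as lost.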

3. Three Canonical Signals

Consider a concrete ensemble of three signals: RSI, CCI, and Moving Average crossover.

3.1 RSI Signal

The Relative Strength Index over a window of $n$ periods is:

$$\text{RSI}_t = 100 - \frac{100}{1 + \frac{\text{EMA}(\Delta^+_t, n)}{\text{EMA}(\Delta^-_t, n)}}$$

where $\Delta^+_t = \max(p_t - p_{t-1}, 0)$ and $\Delta^-_t = \max(p_{t-1} - p_t, 0)$.

The signal rule:

$$s_{\text{RSI}}(t) = \begin{cases} +1 & \text{if } \text{RSI}_t > \theta_{\text{high}} \text{ (overbought} \to \text{trend strength)} \\ -1 & \text{if } \text{RSI}_t < \theta_{\text{low}} \text{ (oversold} \to \text{trend strength)} \\ 0 & \text{otherwise} \end{cases}$$

Note: In a trend-following context, extreme RSI values confirm trend strength rather than signal reversal.
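
A minimal pure-Python sketch of this rule follows. Several details are assumptions rather than part of the definition above: the EMA uses a smoothing factor of $1/n$ and is seeded with the first observation, the thresholds default to 70 and 30 (the text leaves $\theta_{\text{high}}$ and $\theta_{\text{low}}$ unspecified), and a window with neither gains nor losses is mapped to an RSI of 50.

```python
def ema(values, n):
    """Exponential moving average, smoothing factor 1/n (assumed), seeded with values[0]."""
    alpha = 1.0 / n
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def rsi(prices, n):
    """RSI series aligned with prices[1:]."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    pairs = list(zip(prices, prices[1:]))
    ups = [max(cur - prev, 0.0) for prev, cur in pairs]
    downs = [max(prev - cur, 0.0) for prev, cur in pairs]
    result = []
    for up, down in zip(ema(ups, n), ema(downs, n)):
        if down == 0.0:
            # No smoothed losses: the ratio is unbounded, so RSI saturates at 100.
            # With no gains either, fall back to 50 (assumed convention).
            result.append(100.0 if up > 0.0 else 50.0)
        else:
            result.append(100.0 - 100.0 / (1.0 + up / down))
    return result


def rsi_signal(rsi_value, theta_high=70.0, theta_low=30.0):
    """Trend-following reading: an extreme RSI confirms the prevailing trend."""
    if rsi_value > theta_high:
        return +1
    if rsi_value < theta_low:
        return -1
    return 0
```

The latest vote would then be `rsi_signal(rsi(prices, n)[-1])`.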

3.2 CCI Signal

The Commodity Channel Index measures deviation from the statistical mean:

$$\text{CCI}_t = \frac{\text{TP}_t - \text{SMA}(\text{TP}, n)}{0.015 \cdot \text{MAD}(\text{TP}, n)}$$

where $\text{TP}_t = (H_t + L_t + C_t) / 3$ is the typical price, and MAD is the mean absolute deviation.

The signal rule:

$$s_{\text{CCI}}(t) = \begin{cases} +1 & \text{if } \text{CCI}_t > +100 \\ -1 & \text{if } \text{CCI}_t < -100 \\ 0 & \text{otherwise} \end{cases}$$
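
The CCI rule can be sketched in the same style. The function below evaluates the index for the most recent bar only. Returning 0 for a perfectly flat window (where the MAD is zero) is an assumption, since the formula is undefined there.

```python
def cci(highs, lows, closes, n):
    """CCI of the latest bar, computed over the last n bars."""
    if min(len(highs), len(lows), len(closes)) < n:
        raise ValueError("need at least n bars")
    # Typical price per bar, restricted to the lookback window.
    tp = [(h + l + c) / 3.0 for h, l, c in zip(highs, lows, closes)][-n:]
    sma = sum(tp) / n
    mad = sum(abs(x - sma) for x in tp) / n
    if mad == 0.0:
        return 0.0  # flat window: treated as zero deviation (assumption)
    return (tp[-1] - sma) / (0.015 * mad)


def cci_signal(cci_value):
    """+1 above +100, -1 below -100, otherwise neutral."""
    if cci_value > 100.0:
        return +1
    if cci_value < -100.0:
        return -1
    return 0
```

For a steadily rising window such as typical prices 1, 2, 3, 4, 5, the mean is 3 and the MAD is 1.2, so the latest CCI is $2 / (0.015 \cdot 1.2) \approx 111$, which crosses the $+100$ threshold.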

3.3 MA Crossover Signal

Given a fast moving average $\text{MA}_f$ and a slow moving average $\text{MA}_s$ (with $f < s$):

$$s_{\text{MA}}(t) = \begin{cases} +1 & \text{if } \text{MA}_f(t) > \text{MA}_s(t) \quad \text{(golden cross)} \\ -1 & \text{if } \text{MA}_f(t) < \text{MA}_s(t) \quad \text{(death cross)} \end{cases}$$

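Unlike RSI and CCI, the MA signal is always active: it never abstains and therefore always contributes a vote. The only exception is an exact tie $\text{MA}_f(t) = \text{MA}_s(t)$, which the rule above leaves undefined and which an implementation must resolve explicitly.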
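
A short sketch of the crossover vote, assuming simple moving averages (the text does not fix the averaging method). An exact tie between the two averages is not covered by the rule, so this version carries the caller-supplied `previous` vote forward in that case, which is an assumption.

```python
def sma(values, window):
    """Simple moving average of the last `window` values."""
    return sum(values[-window:]) / window


def ma_signal(prices, fast, slow, previous):
    """+1 if the fast SMA is above the slow SMA, -1 if below; ties keep `previous`."""
    if not fast < slow:
        raise ValueError("fast window must be shorter than slow window")
    if len(prices) < slow:
        raise ValueError("need at least `slow` prices")
    ma_f = sma(prices, fast)
    ma_s = sma(prices, slow)
    if ma_f > ma_s:
        return +1
    if ma_f < ma_s:
        return -1
    return previous  # exact tie: carry the last vote forward (assumption)
```

Because the function returns a state rather than an event, it votes on every bar, not only on the bar where the averages cross.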

3.4 Combined Signal

With equal weights $w_i = 1$, the target position is:

$$P_{\text{target}}(t) = s_{\text{RSI}}(t) + s_{\text{CCI}}(t) + s_{\text{MA}}(t)$$

The following table illustrates the combinatorial logic:

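| RSI | CCI | MA | $P_{\text{target}}$ | Interpretation |
|---|---|---|---|---|
| +1 | +1 | +1 | +3 | Strong long consensus |
| +1 | 0 | +1 | +2 | Moderate long |
| +1 | 0 | -1 | 0 | Conflicting signals, neutral |
| -1 | -1 | +1 | -1 | Weak short |
| -1 | -1 | -1 | -3 | Strong short consensus |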
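
The table shows five of the 18 possible vote combinations (three RSI values, three CCI values, two MA values). As an illustration, the sketch below enumerates all of them and counts how many map to each target. The `label` helper and its wording are assumptions modelled on the table's interpretation column.

```python
from collections import Counter
from itertools import product

RSI_VOTES = (-1, 0, +1)
CCI_VOTES = (-1, 0, +1)
MA_VOTES = (-1, +1)  # the MA signal never abstains


def label(target):
    """Describe a target position in the style of the table above."""
    strength = {3: "strong", 2: "moderate", 1: "weak", 0: "neutral"}[abs(target)]
    side = "long" if target > 0 else "short" if target < 0 else ""
    return f"{strength} {side}".strip()


# Count how many vote combinations produce each equal-weight target.
target_counts = Counter(
    rsi + cci + ma for rsi, cci, ma in product(RSI_VOTES, CCI_VOTES, MA_VOTES)
)

for target in sorted(target_counts):
    print(f"{target:+d}  {label(target):<15} {target_counts[target]} combinations")
```

Only one combination each yields $\pm 3$, while $-1$, $0$, and $+1$ each arise from four. These are counts of combinations, not probabilities: how often each occurs in practice depends on the signals' joint behaviour.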

4. Inter-Signal Correlation

The benefit of ensemble combination depends critically on the correlation structure among signals. If all signals are highly correlated, the ensemble provides no diversification benefit.

4.1 Pairwise Correlation

Define the pairwise correlation between signals $s_i$ and $s_j$ as:

$$\rho_{ij} = \frac{E[s_i \cdot s_j] - E[s_i] \cdot E[s_j]}{\sqrt{\text{Var}(s_i) \cdot \text{Var}(s_j)}}$$

For the three-signal system, the correlation matrix is:

$$\mathbf{R} = \begin{pmatrix} 1 & \rho_{\text{RSI,CCI}} & \rho_{\text{RSI,MA}} \\ \rho_{\text{RSI,CCI}} & 1 & \rho_{\text{CCI,MA}} \\ \rho_{\text{RSI,MA}} & \rho_{\text{CCI,MA}} & 1 \end{pmatrix}$$

Since all three signals are derived from the same price series, they will tend to be positively correlated. The ensemble benefit is greatest when $\rho_{ij}$ is low: for signals of equal variance, in the extreme case of $\rho_{ij} = 0$ the variance of the combined signal is a factor of $N$ lower than it would be under perfect correlation ($\rho_{ij} = 1$).
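
In practice $\mathbf{R}$ has to be estimated from recorded votes. The sketch below implements the sample version of the definition directly. The vote histories are made-up toy inputs for illustration only, not measured data.

```python
from math import sqrt


def correlation(x, y):
    """Sample correlation: (E[xy] - E[x]E[y]) / sqrt(Var(x) Var(y))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    if var_x <= 0.0 or var_y <= 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(series):
    """Symmetric matrix of pairwise correlations with ones on the diagonal."""
    k = len(series)
    return [
        [1.0 if i == j else correlation(series[i], series[j]) for j in range(k)]
        for i in range(k)
    ]


# Hypothetical vote histories, one list per signal.
rsi_votes = [+1, +1, 0, -1, -1, 0]
cci_votes = [+1, 0, 0, -1, 0, 0]
ma_votes = [+1, +1, +1, -1, -1, -1]

for row in correlation_matrix([rsi_votes, cci_votes, ma_votes]):
    print([round(r, 2) for r in row])
```

A signal that never changes its vote over the sample has zero variance, so its correlation with anything is undefined; the sketch raises an error rather than returning a misleading number.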

4.2 Variance Reduction

For unweighted linear combination with $N$ signals having equal individual variance $\sigma^2$ and uniform pairwise correlation $\rho$, the variance of the combined signal is:

$$\text{Var}(P_{\text{target}}) = N\sigma^2 + N(N-1)\rho\sigma^2 = N\sigma^2(1 + (N-1)\rho)$$

Normalizing by $N\sigma^2$, the variance that the sum of $N$ uncorrelated signals would have, gives the ratio:

$$\frac{\text{Var}(P_{\text{target}})}{N \cdot \sigma^2} = 1 + (N-1)\rho$$

When $\rho = 0$, the ratio is $1$ (full diversification). When $\rho = 1$, the ratio is $N$ (no diversification; adding signals merely amplifies the same bet). For typical CTA signals derived from the same instrument, $\rho$ often ranges from $0.3$ to $0.7$, meaning the diversification benefit is real but moderate.
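
As a quick numerical check, the sketch below evaluates the ratio for the three-signal case and confirms the closed form by summing every entry of the covariance matrix. The function names are illustrative.

```python
def combined_variance(n, sigma2, rho):
    """Variance of the unweighted sum: add up all entries of the covariance matrix."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += sigma2 if i == j else rho * sigma2
    return total


def variance_ratio(n, rho):
    """Closed form 1 + (N - 1) * rho, relative to N uncorrelated signals."""
    return 1 + (n - 1) * rho


N = 3
for rho in (0.0, 0.3, 0.5, 0.7, 1.0):
    brute = combined_variance(N, 1.0, rho) / N
    print(f"rho={rho:.1f}  ratio={variance_ratio(N, rho):.2f}  brute-force={brute:.2f}")
```

For $N = 3$, the quoted range of $\rho$ from 0.3 to 0.7 gives ratios between 1.6 and 2.4, compared with 1 for independent signals and 3 for perfectly correlated ones.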

4.3 Signal Selection Principles

Given the correlation constraint, signal selection should follow:

  1. Complementarity: Prefer signals that capture different market features (e.g., trend strength vs. mean reversion) rather than different parameterizations of the same feature.
  2. Minimum redundancy: Among candidate signals, remove those with $\rho > 0.8$ with any existing signal in the ensemble.
  3. Incremental value: Each added signal should reduce the variance of the combined strategy's Sharpe ratio estimate by a meaningful amount, which can be tested by bootstrapping the strategy's returns with and without the candidate signal.
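
The minimum-redundancy rule can be sketched as a greedy filter that processes candidates in priority order. The signal names and correlation values below are hypothetical placeholders, and the greedy ordering is one possible design choice among several.

```python
def select_signals(candidates, corr, max_rho=0.8):
    """Keep a candidate only if its correlation with every kept signal is <= max_rho.

    `corr` maps frozenset({a, b}) to the pairwise correlation of signals a and b.
    """
    kept = []
    for name in candidates:
        if all(corr[frozenset((name, other))] <= max_rho for other in kept):
            kept.append(name)
    return kept


# Hypothetical correlations: two MA parameterizations and one RSI signal.
corr = {
    frozenset(("ma_20_60", "ma_25_70")): 0.93,
    frozenset(("ma_20_60", "rsi_14")): 0.45,
    frozenset(("ma_25_70", "rsi_14")): 0.50,
}

print(select_signals(["ma_20_60", "ma_25_70", "rsi_14"], corr))
```

Here the second moving-average variant is dropped because it is a near-duplicate of the first, while the RSI signal is kept. The result depends on candidate order, so candidates would typically be sorted by prior preference before filtering.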

5. Overfitting Risk

Adding more signals introduces more parameters, and more parameters increase overfitting risk. This section quantifies the trade-off.

5.1 Parameter Count

For a three-signal ensemble, the free parameters include:

| Component | Parameters |
|---|---|
| RSI | Window $n$, thresholds $\theta_{\text{high}}, \theta_{\text{low}}$ |
| CCI | Window $n$, thresholds $\pm 100$ |
| MA | Fast window $f$, slow window $s$ |
| Weights | $w_1, w_2, w_3$ |

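This yields approximately 9–10 free parameters: 3 for RSI, 2 for MA, 3 weights, and 1 or 2 for CCI depending on whether its threshold is held at the conventional $\pm 100$ or tuned as a single symmetric level. With $N$ signals, the parameter count grows as $O(N)$, and the risk of finding a parameter combination that fits noise increases accordingly.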

5.2 The Curse of In-Sample Optimization

The in-sample Sharpe ratio $\hat{S}_{\text{IS}}$ is an upwardly biased estimator of the true Sharpe ratio $S$. The bias grows with the number of parameters $k$ and shrinks with the sample size $T$:

$$E[\hat{S}_{\text{IS}}] \approx S + \sqrt{\frac{k}{T}} \cdot \sigma_S$$

where $\sigma_S$ is the standard deviation of the Sharpe estimator. For the three-signal ensemble with $k \approx 10$ and a backtest of $T = 500$ trading days, the inflation is approximately $\sqrt{10/500} \cdot \sigma_S \approx 0.14\,\sigma_S$, or about $0.14$ Sharpe points when $\sigma_S \approx 1$, which is not negligible.
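
The approximation above is easy to tabulate. The sketch below evaluates the inflation term for a few backtest lengths and inverts it to find the sample size needed to bring the inflation under a chosen target. It takes $\sigma_S = 1$ as a default purely for illustration, and the function names are assumptions.

```python
from math import sqrt


def sharpe_inflation(k, T, sigma_s=1.0):
    """Approximate in-sample Sharpe inflation: sqrt(k / T) * sigma_S."""
    return sqrt(k / T) * sigma_s


def required_sample(k, target, sigma_s=1.0):
    """Smallest T for which sqrt(k / T) * sigma_S equals `target`."""
    return k * (sigma_s / target) ** 2


for T in (250, 500, 1000, 2500):
    print(f"k=10, T={T:>4}: inflation ~ {sharpe_inflation(10, T):.3f}")

print(f"T for inflation of 0.05: {required_sample(10, 0.05):.0f}")
```

Because the term scales with $\sqrt{k/T}$, halving the inflation requires four times as much data or a quarter of the parameters.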

5.3 Mitigation Strategies

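  • Walk-forward validation: Fit parameters on a rolling in-sample window and evaluate them on the window that immediately follows, so that every reported result comes from data never touched during parameter selection.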
  • Parameter stability check: Small perturbations to parameters should not dramatically change results; if they do, the solution is likely an artifact of noise.
  • Information criterion: Prefer parsimonious models. Adding a signal is only justified if it reduces AIC/BIC, not merely because it increases the in-sample Sharpe ratio.
  • Economic rationale: Every signal in the ensemble should have a clear economic rationale for why it captures a distinct market inefficiency.
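
As one illustration of the first item, walk-forward validation can be organized as a sequence of rolling index windows. The generator below is a minimal sketch: the name `walk_forward_splits` and the choice to advance by one test window per step are assumptions, not a prescribed procedure.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through time.

    Each test range starts where its train range ends, so parameters are
    always fitted on data that strictly precedes the evaluation window.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # advance by one test window per step


for train, test in walk_forward_splits(n_obs=500, train_size=250, test_size=50):
    print(f"fit on [{train.start}, {train.stop})  evaluate on [{test.start}, {test.stop})")
```

With 500 observations, a 250-bar training window, and 50-bar test windows, this produces five non-overlapping test windows covering bars 250 through 499. Out-of-sample performance would be computed by concatenating the results from those test windows only.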

6. Summary

Multi-signal ensemble strategies offer a principled way to combine weak signals into a stronger composite signal. The key insights are:

  • Linear combination preserves conviction information and allows flexible position sizing; majority voting is more robust to outlier signals but discards intensity.
  • The diversification benefit depends on inter-signal correlation; for typical CTA signals derived from the same instrument, the benefit is moderate.
  • Overfitting risk grows with the number of signals and their parameters; rigorous out-of-sample validation is essential.
  • Signal selection should prioritize complementarity and economic rationale over mere in-sample performance.

The ensemble approach does not guarantee profitability, but it does provide a framework for constructing more robust and interpretable trading signals than any single indicator alone.
