Introduction

⊕The Nairobi Securities Exchange (NSE) is the principal securities exchange of Kenya

Most investors base their investment decisions on two metrics: mean(return), and variance(risk)11 The use of variance and volatility as a measure of portfolio risk, has a long history. For instance, see Fisher (1906) and Tobin (1958), where some choose to invest in portfolios giving the highest return for a given level of risk, while others invest in portfolios giving the lowest risk, for a given level of return.

Markowitz (1952), was the first to formalize portfolio risk and asset selection, in what is known as Markowitz Portfolio Theory (MPT). MPT states that the portfolio’s return is given by the weighted average of the expected returns for the individual assets in the portfolio and the portfolio’s risk is equal to the variance of the portfolio returns which is a function of the variances and covariances between securities and their weights in the portfolio.

This is shown below:

\[\mu_p = \sum_{i=1}^n w_i\mu_i\] \[\sigma^2_p = \sum_{i=1}^n w_iw_j*\sigma_{ij}\] Where:

\(\mu_p, \sigma^2_p\): The portfolio’s expected return and variance respectively.

\(n\): The number of assets in the portfolio.

\(\mu_i\): the expected returns for asset \(i\).

\(w_i, w_j\): The weight invested in assets \(i\) and \(j\) respectively.

\(\sigma^2_{ij}\): The covariance between asset \(i\) and \(j\). If \(i = j\), then \(\sigma^2_{ij}\) is the variance of the \(i^{th}\) asset in the portfolio.

The efficient frontier and efficient portfolios

The efficient frontier is a concept used in modern portfolio theory that helps investors to identify the optimal portfolio that maximizes returns for a given level of risk. The theory assumes that investors are risk-averse and will choose the portfolio with the highest expected return for a given level of risk. It is usually represented as a graph that plots the expected return and standard deviation22 as a measure of risk of all possible portfolios that can be constructed from a set of assets.

The portfolios on the efficient frontier are considered to be efficient because they provide the highest possible expected return for a given level of risk. Thus, investors can choose any portfolio on the efficient frontier that suits their level of risk tolerance. An important aspect to note from the efficient frontier is: investors can only increase their expected return by taking on more risk.

Tangency portfolio

A tangency portfolio, is a portfolio that maximizes the risk-adjusted return for a given level of risk aversion. Specifically, a tangency portfolio lies on the efficient frontier, and it is tangent to the capital market line33 Simple a straight line representing the risk/reward profiles of different combinations of a risky portfolio with a riskless asset. It is widely used in portfolio management to construct investment portfolios that offer the highest expected return for a given level of risk or the lowest level of risk for a given expected return.

Despite the emergence of alternative portfolio construction techniques, such as risk parity, tangency portfolios remain relevant in investment management due to their ability to provide an optimal balance between risk and return. They are a key component of modern portfolio theory and provide a useful reference point for evaluating portfolio performance. They are constructed based on the efficient frontier - as such, they represent the most efficient way to achieve a particular level of expected return, making them an important tool for portfolio optimization.

The optimization problem for the tangency portfolio can be expressed follows:

Suppose we have a universe of \(n\) assets with returns \(\mathbf{\mu}\) and covariance matrix \(\mathbf{\Sigma}\). We want to construct a portfolio with weights \(\mathbf{w}\) that maximizes the Sharpe ratio subject to the constraint that all weights are non-negative, and sum to one:

\[max_w \hspace{2 mm} h(w) = \frac{\mu^Tw - r_f}{w^T \Sigma w}\] \[\text{subject to} \quad \mathbf{w} \geq \mathbf{0}, \hspace{2 mm} \sum_{i=1}^n w_i = 1\] The quantity \(h(w)\) is precisely the Sharpe ratio.

Solving for \(w\), gives:

\[\mathbf{w} = \frac{\mathbf{\Sigma}^{-1} (\mathbf{\mu} - r_f \mathbf{1})}{\mathbf{\mathbf{1^T} \Sigma}^{-1}(\mathbf{\mu} - r_f \mathbf{1})} \]

From the solution above, the covariance matrix and mean vector are crucial in the solution for the tangency portfolio. The mean vector represents the expected returns for each asset in the portfolio, while the covariance matrix represents the co-movements between the returns of each asset in the portfolio.

The optimal portfolio is found by balancing the expected returns and risks of the individual assets in the portfolio, which is done using the mean vector and covariance matrix. In the next section, we explore several ways of estimating the expected returns and covariance matrix for the assets in the portfolio

Covariance matrix estimation

In this section, we review covariance matrix estimation methods present in literature, giving their formulations.

The sample covariance matrix (SAMPLE)

The sample covariance matrix is the most commonly used estimator. It is calculated from the historical returns of the assets in the portfolio, and makes a key assumption that the historical returns of the assets are representative of their future returns. This assumption may be violated, especially in rapidly changing markets.

The sample location estimate for a portfolio of \(n\) observations and \(p\) assets is given by:

\[ E(r_p) = \sum_{i=1}^{n} w_i E(r_i)\] The sample covariance matrix is given by:

\[\begin{equation} \hat{\Sigma} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x_i} - \boldsymbol{\bar{x}})(\mathbf{x_i} - \boldsymbol{\bar{x}})^T, \end{equation}\]

where:

\(E(r_p)\) is the expected return of the portfolio

\(E(r_i)\) is the expected return of the \(i\)th asset

\(\mathbf{x_i}\) is the \(i\)-th observation vector,

\(\boldsymbol{\bar{x}}\) is the sample mean vector, and the superscript \(T\) denotes the matrix transpose operation.

Note that \(\hat{\Sigma}\) is a symmetric \(p \times p\) matrix.

One of the main drawbacks of the sample covariance matrix is that it is sensitive to outliers. Additionally, if the number of assets in the portfolio is large, the sample covariance matrix may become computationally expensive to calculate and may suffer from issues related to numerical stability.

Since, the sample covariance matrix is sensitive to outliers, there is need to examine other ‘robust’ approaches to estimating the covariance matrix of historical stock returns. The robust methods provide alternative approaches to classical methods for location and scale estimates, as they are not affected by slight departures from model assumptions. In the next section, we investigate some of the robust covariance estimators:

1. Minimum Covariance Determinant Estimator (MCD)

The MCD estimator is based on the concept of finding the subset of data points: \(h(>N/2)\) whose classical covariance matrix has the lowest possible determinant, which is a measure of the spread of the data.

When using the MCD estimator, the location estimate of a portfolio is given by the arithmetic mean of the asset returns in the robust subset, as follows:

\[\bar{r}_{MCD} = \frac{1}{n_{robust}} \sum_{i=1}^{n_{robust}} r_i\] Where:

\(\bar{r}_{MCD}\) is the location estimate using the MCD estimator,

\(n_{robust}\) is the number of assets in the robust subset, and

\(r_i\) is the return of the \(i\)th asset in the robust subset.

The MCD estimator for the covariance matrix is computed as follows:

\[\begin{equation} \hat{\Sigma}_{MCD} = c_{p,n} \cdot \text{MAD}(\mathbf{X})^{-2} \cdot \sum_{i=1}^{h} w_i (\mathbf{x_i} - \boldsymbol{\bar{x}})(\mathbf{x_i} - \boldsymbol{\bar{x}})^T, \end{equation}\]

where:

\(\mathbf{X}\) is a \(n \times p\) matrix of observations,

\(\boldsymbol{\bar{x}}\) is the sample mean vector,

\(\text{MAD}(\mathbf{X})\) is the Median Absolute Deviation of the observations,

\(h\) is the number of non-outlier observations,

\(w_i\) are weights assigned to each non-outlier observation, and

\(c_{p,n}\) is a constant that depends on the dimension of the data and the number of observations.

One of the main strengths of the MCD estimator is its ability to handle both symmetric and asymmetric outliers in the data, making it robust. It i also computationally efficient and can be used with high dimensional datasets, and portfolios with a large number of assets.

A limitation of the MCD estimator is that it assumes that the data points are multivariate normal, which may not always hold true in practice. The MCD estimator may also produce unreliable estimates of the covariance matrix when the proportion of outliers in the data is very high.

2. Minimum Volume Ellipsoid Estimator (MVE)

The MVE is a multivariate location and scale estimator which uses a subset of the data in its computation. It is based on the concept of fitting an ellipsoid to the data points that minimizes its volume, while ensuring that it contains a specified proportion44 This specified proportion is based on user-supplied quantiles of the data points. It can be thought of as estimating the mean and covariance of the ‘good part’ of the data.

When using the MVE estimator, the location estimate of a portfolio is given by the arithmetic mean of the asset returns in the ellipsoid that encloses the robust subset, as follows:

\[\bar{r}_{MVE} = \frac{1}{n_{robust}} \sum_{i=1}^{n_{robust}} r_i\]

where:

\(\bar{r}_{MVE}\) is the location estimate using the MVE estimator,

\(n_{robust}\) is the number of assets in the robust subset, and

\(r_i\) is the return of the \(i\)th asset in the robust subset.

The MVE estimator for the covariance matrix is computed as follows:

\[\begin{equation} \hat{\Sigma}_{MVE} = \frac{n}{m} \cdot \text{cov}_{\text{MCD}}(\mathbf{X}), \end{equation}\]

where:

\(\mathbf{X}\) is a \(n \times p\) matrix of observations,

\(\text{cov}_{\text{MCD}}(\mathbf{X})\) is the covariance matrix computed using the MCD estimator, and

\(m\) is the number of observations used to compute the MCD estimator.

Other than being robust to outliers, the MVE provides a more accurate estimate of the covariance matrix when the distribution of the asset returns is non-normal. In such cases, the MVE estimator can capture the non-linear relationships between the assets and produce more accurate estimates of the covariance matrix.

However, the MVE estimator can be computationally expensive to calculate, especially when dealing with portfolios with a large number of assets and the assumption that data points are elliptically distributed may not always hold true in some cases.

3. Orthogonalized Gnanadesikan-Kettenring Estimator (OGK)

This method is based on the concept of orthogonalizing the data matrix to remove any correlations between the variables and then estimating the covariance matrix from the resulting matrix.

The OGK estimate for the robust location measure is defined as:

\[\mu(X) = \frac{\sum_i w_ix_i}{\sum_i w_i}\]

Where,

\(w_i\) are the robustness weights compute as:

\[w_i := w_{c1}\frac{(x_i - med(X))}{MAD(X)}\]

The OGK estimator for the covariance matrix can be computed as follows:

\[\begin{equation} \hat{\Sigma}_{OGK} = \frac{1}{n} \cdot \sum_{i=1}^{n} \lambda_i(\mathbf{S}) \cdot \mathbf{V_i} \cdot \mathbf{V_i}^T, \end{equation}\]

where

\(\mathbf{S}\): is the sample covariance matrix,

\(\lambda_i(\mathbf{S})\) and \(\mathbf{V_i}\) are the eigenvalues and eigenvectors of \(\mathbf{S}\), respectively, and \(n\) is the number of observations.

One of the main strengths of the OGK estimator is its ability to handle datasets with high levels of multicollinearity55 High correlations between the variables in the data, which can cause instability in the estimates of the covariance matrix. The OGK estimator reduces this instability by orthogonalizing the data matrix

The OGK estimator also produces estimates of the covariance matrix that are more accurate than those obtained using the sample covariance matrix when the distribution of the asset returns is non-normal.

Some limitations to using the OGK estimator include: it can be computationally expensive to calculate, especially when dealing with portfolios with a large number of assets. The OGK estimator may also produce unreliable estimates of the covariance matrix when the proportion of outliers in the data is very high.

4. Nearest-Neighbour Variance Estimator (NNV)

The NNV estimator is based on the concept of using the k-nearest neighbors of each data point to estimate its variance, which is then used to estimate the covariance matrix.

The location estimate is given by the arithmetic mean of the asset returns in the dataset. The formula for the location estimate using the NNV estimator is:

\[\bar{r}_{NNV} = \frac{1}{n} \sum_{i=1}^{n} r_i\]

where:

\(\bar{r}_{NNV}\) is the location estimate using the NNV estimator,

\(n\) is the number of assets in the portfolio,

and \(r_i\) is the return of the \(i^{th}\) asset in the portfolio.

The NNV estimator for the covariance matrix is computed as follows:

\[\begin{equation} \hat{\Sigma}_{NNV} = \frac{2}{n(n-1)h^2} \sum_{i=1}^{n} \sum_{j=1}^{n} w\left(\frac{|\mathbf{x_i} - \mathbf{x_j}|}{h}\right) (\mathbf{x_i} - \mathbf{\bar{x}})(\mathbf{x_j} - \mathbf{\bar{x}})^T, \end{equation}\]

where

\(\mathbf{x_i}\) and \(\mathbf{x_j}\) are the \(i^{th}\) and \(j^{th}\) observations, respectively,

\(\mathbf{\bar{x}}\) is the sample mean,

\(h\) is a smoothing parameter, and \(w(\cdot)\) is a weighting function66 A common choice for \(w(\cdot)\) is the Epanechnikov kernel..

One of the main strengths of the NNV estimator is its ability to handle datasets with a large number of assets, and thus, it can produce estimates of the covariance matrix that are less affected by the curse of dimensionality than other methods.

Limitations to using the NNV estimator include: it assumes that the data points are independent and identically distributed, which may not always hold true in practice. Additionally, the NNV estimator may produce unreliable estimates of the covariance matrix when the dataset contains outliers or when the nearest neighbors parameter \(k\) is chosen incorrectly.

5. Shrinkage Estimator (SHRINK)

The Shrinkage estimator is based on the concept of shrinking the sample covariance matrix towards a target matrix such as the diagonal matrix or a structured matrix, which can improve the accuracy of the estimated covariance matrix.

The Shrinkage Estimator of the location parameter is given by the arithmetic mean of the asset returns.

The Shrinkage Estimator for the covariance matrix is computed as follows:

\[\begin{equation} \hat{\Sigma}_{Shrink} = \alpha \cdot \hat{\Sigma}_{D} + (1-\alpha) \cdot \hat{\Sigma}_{S}, \end{equation}\]

where:

\(\hat{\Sigma}_{D}\) is the diagonal matrix of the sample variances,

\(\hat{\Sigma}_{S}\) is the target matrix, and \(\alpha\) is the shrinkage intensity parameter that determines the degree of shrinkage towards the target matrix.

A common choice for the target matrix is the Ledoit-Wolf estimator, which is based on the assumption that the true covariance matrix can be expressed as a linear combination of a diagonal matrix and a low-rank matrix. The Ledoit-Wolf estimator is computed as follows:

\[\begin{equation} \hat{\Sigma}_{LW} = \gamma \cdot \hat{\Sigma}_{D} + (1-\gamma) \cdot \hat{\Sigma}_{L}, \end{equation}\]

where:

\(\hat{\Sigma}_{D}\) is the diagonal matrix of the sample variances,

\(\hat{\Sigma}_{L}\) is the low-rank matrix that minimizes the Frobenius norm distance to the sample covariance matrix, and

\(\gamma\) is a parameter that determines the degree of shrinkage towards the diagonal matrix.

The main strengths of the Shrinkage estimator is that it produces reliable estimates of the covariance matrix than the sample covariance matrix when the number of observations or assets is small, as well as when the number of assets is large, and thus, it is less affected by the curse of dimensionality than other methods.

However, there are also some limitations to using the Shrinkage estimator. The accuracy of the estimated covariance matrix depends on the choice of the target matrix. It is also important to note that, the Shrinkage estimator may produce unreliable estimates of the covariance matrix when the dataset contains outliers or when the amount of shrinkage is too large.

6. Bagging Estimator (BAGGED)

The bagged estimator is based on the concept of bootstrapping, which involves creating multiple samples of the original dataset and using these samples to estimate the covariance matrix.

The Bagging Estimator for the location parameter is then given by the average of the sample means:

\[ \boldsymbol{\tilde{r}} = \frac{1}{B} \sum_{b=1}^B \boldsymbol{\bar{r}}_b \]

where

\(\boldsymbol{\bar{r}}_b\) is the sample mean of the \(b\)th bootstrap sample, and

\(B\) is the number of bootstrap samples.

The Bagging Estimator for the covariance matrix is computed as follows:

\[\begin{equation} \hat{\Sigma}_{Bagging} = \frac{1}{B} \sum_{b=1}^{B} \hat{\Sigma}_{b}, \end{equation}\]

where \(\hat{\Sigma}_{b}\) is the sample covariance matrix computed from the \(b^{th}\) bootstrap sample, and \(B\) is the number of bootstrap samples used.

To generate each bootstrap sample, we randomly sample \(n\) observations with replacement from the original data. Then, we compute the sample covariance matrix from the bootstrap sample.

An advantage of the bagged estimator is: that it can handle datasets with non-normal distributions or with a high proportion of outliers. By creating multiple samples of the dataset, it can reduce the impact of extreme values on the estimated covariance matrix.

However, the accuracy of the estimated covariance matrix depends on the number of bootstrap samples (\(B\)) and the Bagged estimator may also produce unreliable estimates of the covariance matrix when the dataset contains highly correlated assets.

7. Multivariate Student’s t estimator (COV-T)

The Multivariate Student’s t estimator is a technique used to estimate the covariance matrix in portfolio optimization based on the assumption that the asset returns follow a multivariate Student’s t distribution, which can better capture the presence of outliers and heavy tails in the data compared to the normal distribution.

The Multivariate Student’s t Estimator for the location parameter is given by:

\[ \boldsymbol{\tilde{r}} = \frac{\sum_{i=1}^n w_i \boldsymbol{r}_i}{\sum_{i=1}^n w_i} \]

where:

\(\boldsymbol{r}_i\) is the \(i^{th}\) asset return vector, and

\(w_i\) is the weight assigned to the \(i^{th}\) asset, which is determined by the Mahalanobis distance of the \(i^{th}\) return vector from the empirical median of the returns, and the degrees of freedom parameter \(\nu\) of the Student’s t distribution.

The Multivariate Student’s t estimator for the covariance matrix is computed as follows:

\[\begin{equation} \hat{\Sigma}_{t} = \frac{\kappa}{\kappa-2} \cdot \frac{\sum_{i=1}^{n}(X_i-\bar{X})(X_i-\bar{X})^T}{n}, \end{equation}\]

where:

\(X_i\) is the \(i^{th}\) observation in the data,

\(\bar{X}\) is the sample mean,

\(n\) is the sample size, and

\(\kappa\) is a parameter that controls the degrees of freedom of the Student’s t-distribution.

The main strength of the Multivariate Student’s t estimator is that it can produce more reliable estimates of the covariance matrix than other methods when the dataset contains extreme values or when the returns are not normally distributed.

A key limitation to this method is that it assumes that the asset returns follow a multivariate Student’s t distribution, which may not always hold true in practice. Additionally, the Multivariate Student’s t estimator may produce unreliable estimates of the covariance matrix when the dataset contains highly correlated assets.

8. Adaptive Re-weighted estimator (ARW)

The Adaptive Re-weighted estimator is a technique based on the idea of iteratively re-weighting the observations based on their Mahalanobis distances from the current estimate of the mean and covariance matrix, which can improve the accuracy of the estimated covariance matrix. It is designed to be robust to outliers and heavy-tailed data.

\[\boldsymbol{\tilde{r}} = \boldsymbol{m}_{AR}^{(K)} = \boldsymbol{m}_{AR}^{(0)} + \sum_{k=1}^K \boldsymbol{\epsilon}^{(k)} \boldsymbol{W}^{(k)}\]

where:

\(\boldsymbol{m}_{AR}^{(0)}\) is an initial estimate of the location parameter,

\(\boldsymbol{\epsilon}^{(k)}\) is the residual error of the linear regression of the asset returns on the estimate of the location parameter at iteration \(k-1\), and

\(\boldsymbol{W}^{(k)}\) is the weight matrix assigned to each asset at iteration \(k\), which is determined by the Mahalanobis distance of the return vector from the estimate of the location parameter at iteration \(k-1\), and a tuning constant \(\alpha_k\) that scales the distance threshold.

The re-weighted estimator for the covariance matrix is given by:

\[\begin{equation} \hat{\Sigma}_{ARW} = \frac{1}{n} \sum_{i=1}^{n} w_i (X_i - \bar{X})(X_i - \bar{X})^T, \end{equation}\]

where:

\(X_i\) is the \(i^{th}\) observation in the data,

\(\bar{X}\) is the sample mean,

\(n\) is the sample size, and \(w_i\) is the weight assigned to the \(i^{th}\) observation.

The weights are computed iteratively using the following update rule:

\[\begin{equation} w_i^{(k+1)} = \frac{1}{\sqrt{(X_i - \bar{X})^T \hat{\Sigma}_{ARW}^{(k)-1} (X_i - \bar{X})}}, \end{equation}\]

where \(\hat{\Sigma}_{ARW}^{(k)}\) is the estimate of the covariance matrix at iteration \(k\). The iterations continue until the weights converge to a stable value.

The main advantage of this method is that it can produce estimates of the covariance matrix that are robust to model misspecification, such as when the asset returns do not follow a multivariate normal distribution.

Some limitations to using the Adaptive Re-weighted estimator include: the accuracy of the estimated covariance matrix depends on the choice of the re-weighting function. The Adaptive Re-weighted estimator may also produce unreliable estimates of the covariance matrix when the dataset contains highly correlated assets.

Data

Returns correlation plot. The data do not exhibit high levels of correlation.

The data used for this study are the constituents of the NSE 20 Share Index (NSE 20) from the period 2015-01-01 to 2022-12-31, a total of 8 years. The stocks are listed below:

Company Name	Ticker	Industry
Absa Bank Kenya PLC	ABSA	Banking
Diamond Trust Bank Kenya Ltd Ord 4.00	DTK	Banking
Equity Group Holdings Plc Ord 0.50	EQTY	Banking
KCB Group Plc Ord 1.00	KCB	Banking
NIC Group Plc Ord 5.00	NCBA	Banking
Stanbic Holdings Plc ord.5.00	SBIC	Banking
Standard Chartered Bank Kenya Ltd Ord 5.00	SCBK	Banking
The Co-operative Bank of Kenya Ltd Ord 1.00	COOP	Banking
Nation Media Group Ltd Ord. 2.50	NMG	Commercial and Services
WPP Scangroup Plc Ord 1.00	SCAN	Commercial and Services
Bamburi Cement Ltd Ord 5.00	BAMB	Construction and Allied
KenGen Co. Plc Ord. 2.50	KEGN	Energy and Petroleum
Kenya Power & Lighting Co Ltd Ord 2.50	KPLC	Energy and Petroleum
Britam Holdings Plc Ord 0.10	BRIT	Insurance
Kenya Re Insurance Corporation Ltd Ord 2.50	KNRE	Insurance
Centum Investment Co Plc Ord 0.50	CTUM	Investment
Nairobi Securities Exchange Plc Ord 4.00	NSE	Investment Services
British American Tobacco Kenya Plc Ord 10.00	BAT	Manufacturing & Allied
East African Breweries Ltd Ord 2.00	EABL	Manufacturing & Allied
Safaricom Plc Ord 0.05	SCOM	Telecommunications

Portfolio construction

The portfolio constrcuted has the Long-Only constraint and is re-balanced yearly, using a rolling walk-forward analysis, with a rolling optimization period of four years, and a validation period of one year as shown below77 The portfolio is optimized using data from the optimization period, and the weights obtained are used in the validation period.:

Optimization Period	Validation period
2015::2018	2019
2016::2019	2020
2017::2020	2021
2018::2021	2022

For the portfolio construction, the hyper-parameters are chosen as follows:

For the Multivariate Student’s t estimator, the degree of freedom (\(\nu\)) is the only hyper parameter, and it is obtained by first fitting the distribution on the optimization data, and then obtaining the fitted degree of freedom, to use in the portfolio validation phase.
For the Bagged estimator, the number of bootstrap samples (\(B\)) used, was fixed to 250 as informed by a pre-analysis, where several values of \(B\) were tested ranging from 100 to 500, and the results indicated that the estimates from the bagged estimator do not vary as much with different values of \(B\).
For the MCD estimator, the hyper-parameter \(\alpha\) was fixed to the default value of 0.5 as indicated by a pre-analysis.
For the Shrink estimator, the mixing weights \(\lambda\) are computed analytically and do not have to be user supplied.
The NNV estimator will not be used in the analysis, due to its instability, and the computation time it takes.
For the ARW estimator, a pre-analysis shows that fixing the value of the hyper-parameter \(\alpha\) to 0.025 (the default) is sufficient.

Benchmarks

The portfolio constructed using the sample covariance matrix, and the equally-weighted portfolio88 over the whole period are used as benchmarks.

Results

The portfolio cumulative returns chart is shown below:

The annualized statistics are shown below:

	SAMPLE	MCD	OGK	MVE	SHRINK	BAGGED	COV.T	ARW	EQW
Annualized Return	3.24%	-0.37%	2.61%	2.17%	2.92%	3.24%	6.35%	4.10%	-8.53%
Annualized Volatility	17.75%	16.17%	16.96%	16.88%	17.53%	17.76%	17.25%	16.25%	11.31%
Annualized Sharpe(Rf=0%)	0.1824	-0.0227	0.1541	0.1283	0.1666	0.1825	0.3682	0.2524	-0.7547
Maximum Drawdown	32.731%	29.100%	31.559%	30.095%	32.340%	32.762%	31.294%	28.244%	42.340%
End Equity	11346.16	9855.15	11076.84	10886.71	11209.21	11348.08	12765.42	11727.54	7021.37

Benchmarking statistics

The benchmarking statistics compare the performance of portfolios constructed using the other estimators to the portfolio constructed using the sample covariance matrix estimator.

	MCD	OGK	MVE	SHRINK	BAGGED	COV.T	ARW
Alpha	-0.0001	0.0000	0.0000	0.0000	0.0000	0.0001	0.0001
Beta	0.8812	0.7497	0.9351	0.9869	1.0006	0.8698	0.7897
Beta+	0.8196	0.6579	0.8967	0.9797	1.0008	0.8115	0.7184
Beta-	0.9007	0.8432	0.9606	0.9933	1.0006	0.9841	0.9104
R-squared	0.9360	0.6161	0.9673	0.9988	1.0000	0.8012	0.7440
Annualized Alpha	-0.0320	0.0046	-0.0088	-0.0028	0.0000	0.0357	0.0160
Correlation	0.9675	0.7849	0.9835	0.9994	1.0000	0.8951	0.8625
Correlation p-value	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
Tracking Error	0.0460	0.1141	0.0326	0.0066	0.0010	0.0803	0.0903
Active Premium	-0.0360	-0.0062	-0.0107	-0.0032	0.0000	0.0312	0.0086
Information Ratio	-0.7833	-0.0547	-0.3284	-0.4776	0.0463	0.3880	0.0957
Treynor Ratio	-0.0042	0.0349	0.0232	0.0296	0.0324	0.0730	0.0519

Conclusion

From the results above, the best performing portfolio is the COV-T99 Multivariate Student’s t estimator, followed by the ARW1010 Adaptive Re-weighted estimator portfolio. The portfolios constructed using estimators from the Shrink, bagged and sample covariance matrix produce almost the same results. The MCD portfolio performed the worst among all the estimators, probably due to the incorrect assumption that the asset returns are multivariate normal distributed. The poor performance by MVE suggests that the assumption of elliptic distribution of asset returns was not met. The good performance delivered by COV-T and ARW suggest that the data contained heavy tailed returns, which were better handled using the two methods.

Overall, estimation methods which are suited to handling non-normal returns performed better as compared to those which make the assumption of normally distributed returns.

All portfolios out-performed the Equally-weighted (1/n) portfolio in the period under consideration, however, only the COV-T and ARW estimators out-performed the sample covariance matrix estimator, in terms of annualized returns and sharpe ratio.

Based on the information ratio1111 This relates the degree to which an investment has beaten the benchmark to the consistency with which the investment has beaten the benchmark., the COV-T portfolio outperformed the SAMPLE portfolio by the largest margin.

Recommendations

Future analysis could include the NNV estimator, and other distribution estimators such as the NIG distribution.
Future analysis could include estimators obtained after adjusting data for liquidity, and auto-correlation.

References

Gnanadesikan, R. and John R. Kettenring (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28, 81–124.
H. Markowitz. Portfolio Selection. The Journal of Finance, 7(1):77–91, 1952.
H.M. Markowitz. Foundations of portfolio theory. Journal of Finance, pages 469–477, 1991.
Irving Fisher. The Nature of Capital and Income. Macmillan, 1906.
J. Tobin. Liquidity preference as behavior towards risk. The Review of Economic Studies, pages 65–86, 1958.
Maronna, R.A. and Zamar, R.H. (2002) Robust estimates of location and dispersion of high-dimensional datasets; Technometrics 44(4), 307–317.
Rousseeuw, P. & Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41, 212–223. 55, 279
Sharpe, W. F. (1994). The sharpe ratio. Journal of Portfolio Management, 21(1), 49–58. 227

A comparison of covariance matrix estimation methods for tangency portfolios: Nairobi Securities Exchange

Stanley Sayianka

2023-03-06

Introduction

The efficient frontier and efficient portfolios

Tangency portfolio

Covariance matrix estimation

The sample covariance matrix (SAMPLE)

1. Minimum Covariance Determinant Estimator (MCD)

2. Minimum Volume Ellipsoid Estimator (MVE)

3. Orthogonalized Gnanadesikan-Kettenring Estimator (OGK)

4. Nearest-Neighbour Variance Estimator (NNV)

5. Shrinkage Estimator (SHRINK)

6. Bagging Estimator (BAGGED)

7. Multivariate Student’s t estimator (COV-T)

8. Adaptive Re-weighted estimator (ARW)

Data

Portfolio construction

Benchmarks

Results

Benchmarking statistics

Conclusion

Recommendations

References