ANALYSIS OF POPULATION GROWTH AND RELATED FACTORS

An exploratory data analysis of Kenyan demographic data

Stanley Sayianka

2022-10-30

INTRODUCTION

Demography is the scientific study of the characteristics and nature of human populations. This article studies population growth and the related factors that could be of interest when dealing with population data. It analyzes the Kenyan population from 1960 to 2019, using visualizations that capture the population dynamics, and examines the factors that, from my perspective, may influence population growth.

The rest of the study is outlined as follows:

The second section uses data visualization to bring out the relationships between the chosen variables and population. It also explores possible relationships among the variables themselves, in order to show how they depend on one another, and offers economic, social and demographic explanations for the relationships observed.

The third section focuses on modelling and projecting population, first using common mathematical models for population forecasting. The section then introduces modelling population as a stochastic process, using models such as a geometric growth rate model, auto-regressive models, moving average models, other stochastic process models and linear regression models. It also explores modelling population growth using common statistical distributions such as the Poisson distribution. Finally, it models population with linear models built on the variables discussed in section 2.

DATA

The data used in this analysis was fetched from trusted sources such as the UN open data website and the open data vendor Macrotrends. Since the data does not come in aggregated format, some factors have complete data while others have missing data for certain years.[1] The variables used in the analysis include:

[1] For additional information on the data sources, variable descriptions and other related queries, see the United Nations and Macrotrends.com websites.

EXPLORATORY DATA ANALYSIS

In studying population growth and dynamics, as well as the factors affecting population growth, I decided to include the above-mentioned variables (as possibly related to population growth) for the following reasons, given for each variable:

Population

The Kenyan population size is shown to have increased steadily over the years.

           Min.     1st Qu.   Median    Mean      3rd Qu.   Max.
Statistic  6076758  10539894  20622560  24214503  35635271  56215221

Gross Domestic Product(GDP)

GDP and the related GDP per capita are measures of the economic growth of a country. The relationship between economic and population growth has been studied in various literature on demographic studies. Tsen and Furuoka (2005) [2] argued that there exists an equilibrium positive relationship between economic growth, as measured by GDP per capita, and population growth. Other researchers, such as Dao (2012) [3], found that there exists a linear dependence between GDP per capita and population growth. In Kenya, Thuku, Paul, and Almadi (2013) [4] studied the relationship between economic growth and population growth and established that there exists a positive linear relationship between the two. Including GDP as a variable in population studies therefore seems relevant.

[2] See the full paper: Tsen and Furuoka (2005).
[3] See the full paper: Population and Economic Growth in Developing Countries.
[4] See the full paper: Thuku, Paul, and Almadi (2013).

[Figure: (%) changes in GDP and per capita GDP]

Fertility rates

Fertility rates measure the level of reproduction in a given society. The fertility rate at a given age is the number of children born alive to women of that age during the year, as a proportion of the average annual population of women of the same age.

[Figure: Annual fertility rates]

[Figure: (%) changes in annual fertility rates]

A higher fertility rate is positively correlated with population growth in the long run, keeping other factors constant. Literacy rates have been shown to be negatively related to fertility rates: research in both developed and developing nations shows that fertility rates decrease with increasing literacy rates and with improved health care services such as family planning. Poverty, on the other hand, has been shown to be positively correlated with fertility rates.[5]

[5] The links to several research papers and articles are attached to the topics [Education] and [Poverty rates] below.

For the Kenyan fertility data, fertility rates appear to be declining overall. It is also evident that the changes in population growth decline after the year 1990, reaching a steady plateau in [2000-2006], while the population size increases throughout. Thus a decrease in fertility rates is accompanied by a decrease in population growth rates.

Mortality rates

The mortality rate of a country is the annual number of deaths per 1000 total population.

[Figure: (%) changes in annual mortality rates]

A higher mortality rate has a negative relationship with population growth if all other factors, such as fertility rates and migration rates, are kept constant.[6]

[6] An article on a study conducted on life expectancy and mortality rates can be found here.

The overall mortality rate decreases throughout the years, possibly due to improved living conditions, higher life expectancy, affordable health care, and other factors that play an important role in reducing population mortality.

Migration rates

This is the net effect of immigration and emigration on a country’s population. Net migration is commonly calculated as:

Number of Immigrants - Number of Emigrants

The net migration rate (NMR) is usually this difference expressed per 1,000 population. This implies that the NMR is positive when the number of immigrants is higher than the number of emigrants, and negative when the number of emigrants is higher than the number of immigrants.
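The sign convention above can be checked with a small Python sketch. The function names and the figures are hypothetical, chosen only for illustration, and the per-1,000 scaling is the usual demographic convention, assumed here rather than taken from the data source.

```python
def net_migration(immigrants: int, emigrants: int) -> int:
    """Net migration count: arrivals minus departures."""
    return immigrants - emigrants


def net_migration_rate(immigrants: int, emigrants: int, population: int) -> float:
    """Net migration per 1,000 population (the scaling is an assumption)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * net_migration(immigrants, emigrants) / population


# Hypothetical figures, not taken from the Kenyan data.
print(net_migration(120_000, 90_000))                    # more immigrants: positive
print(net_migration_rate(90_000, 120_000, 30_000_000))   # more emigrants: negative
```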

[Figure: (%) changes in annual net migration rates]

This is calculated for all persons, both citizens and non-citizens. Including migration rates in demographic and population modelling studies is important, as shown by a study done on the UK,[8] which indicates that migration was a major component of population growth.

[8] The article linked is: Does migration lead to increased population.

Net migration rates during the years [1990-1995] were the highest historically. An increase is also witnessed during the years [2010-2013], possibly because the majority of South Sudanese fleeing the war in South Sudan in 2013 came to seek refuge in Kenya, which could be interesting to monitor in Kenyan population numbers.[9]

[9] Link to article: Kenya Refugee Response.

Human Development Index(HDI)

According to Wikipedia,[10] the Human Development Index is a composite statistic comprising life expectancy, education and per capita income. A higher HDI generally implies that conditions in a given country are better, i.e. high standards of living, decent education and healthcare.

[10] Link to the Wikipedia article on the Human Development Index: Wikipedia/HDI.

[Figure: Historical Human Development Index]

[Figure: (%) changes in historical Human Development Index]

A higher HDI will in turn impact the population growth of a country by attracting more immigrants. Several research studies have been conducted on the impact of population growth on HDI, as opposed to the converse relationship.

From the above chart, it is evident that there is a positive relationship between the % changes in HDI and the population growth rate. This was to be expected, since the HDI measures how good the living conditions in a country are, and we expect that the better the living conditions, the faster the population grows.

Labour Force Participation Rate

This is an estimate of an economy’s work force.[11] A higher labour force participation rate implies that there is a higher population.

[11] More on definitions: investopedia.com.

[Figure: Labour force participation rate for ages 15-24]

[Figure: (%) changes in labour force participation rate for ages 15-24]

This is because a rising population is accompanied by a higher labour force participation rate. This would in turn lead to a reduction in real wages, although that relationship is beyond the scope of this study.

For the above plot, we seek to establish the relationship between the labour force participation rate 5 years ago and the population size now, since the effects of labour force participation rates on population are usually not felt immediately. For example, an increase in the female labour force participation rate is usually associated with a decrease in population growth rates, although the effect might not be immediate. The relationship is a negative one, whereby an increase in the labour force participation rate leads to a decreased population.
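The 5-year lag described above amounts to pairing each year's response with the driver value from five years earlier before measuring their association. The Python sketch below illustrates that pairing with a Pearson correlation. The function names and both series are hypothetical (the response is constructed, not taken from the Kenyan data), and the choice of Pearson correlation is an assumption, since the article reads the relationship off a plot.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(driver, response, lag):
    """Correlate driver[t - lag] with response[t]."""
    if len(driver) != len(response):
        raise ValueError("series must have the same length")
    if not 0 <= lag < len(driver) - 1:
        raise ValueError("lag leaves fewer than two paired observations")
    return pearson(driver[: len(driver) - lag], response[lag:])


# Hypothetical yearly series, not the Kenyan data.
participation = [40.0, 41.5, 39.0, 42.0, 43.5, 41.0, 44.0, 45.5, 43.0, 46.0, 47.0, 45.0]
# Response built to fall when participation five years earlier rises.
growth = [3.0] * 5 + [5.0 - 0.05 * p for p in participation[:-5]]

print(lagged_correlation(participation, growth, lag=5))
```

Because the response is built as a decreasing linear function of the lagged driver, the lag-5 correlation here is close to -1 by construction; real data would give a weaker value.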

Unemployment rate

It is well established in several studies and analyses that high population growth in turn has an inverse relationship with economic variables such as unemployment rates.

[Figure: Unemployment rates (%) in Kenya]

[Figure: (%) changes in unemployment rates]

It has been observed that higher populations tend to have higher unemployment rates as well, because an increase in the labour force leads to increased competition for the available resources a nation has, and thus to unemployment.[12]

[12] Links to articles: Article 1, Article 2.

Unemployment rates decreased during the years 2000-2010; however, there is an evident sharp spike in unemployment rates during the years 2020-2022, due to the COVID-19 pandemic. The relationship between unemployment rates and per capita GDP is negative: an increase in standards of living (as measured by GDP per capita) is associated with a decrease in the unemployment rate. This, however, does not suggest a cause-and-effect relationship. The relationship between unemployment rates and population growth is positive, since an increase in unemployment rates is associated with an increase in population growth. Various studies suggest this is a cause-and-effect relationship, since unemployment would lead to an inability to afford birth-control and family-planning options.

Education spending

Education spending has a positive impact on the literacy rates of a country, since it is through the increased allocation of a country’s spending to education that schools reach the vast majority of a country. Mark Montgomery, an economics professor, notes that the higher the education level, the lower the fertility rates; the relationship is an inverse one. Thus education leads to lower birth rates, and this slows down population growth in the long run.

[Figure: The education spending rate (% of GDP)]

In Kenya, several key education milestones achieved were:[13]

[13] More information on educational achievements in Kenya under the reign of Mwai Kibaki can be found at: this link

These key achievements must have come with increased education spending by the government. The population increase in the later years 2015-2020 did not lead to an increase in government spending on education, possibly because an increased population implies increased tax income, which enables the government to provide affordable education.

Health care spending

Increased health care spending implies bringing more affordable health care services within the reach of the common mwananchi, and constructing more hospitals and dispensaries in marginalized areas.

[Figure: The healthcare spending rate (% of GDP)]

Increased health care spending is a major driver of population growth. This is because good health care leads to lower infant and maternal mortality rates, and prolongs the lives of the elderly, who could not earlier afford health care services.

[Figure: The healthcare spending rate (% of GDP) and mortality rates]

[Figure: The healthcare spending rate (% of GDP) and life expectancy]

The elderly and infants are among the groups with the highest demand for health care services; thus, improving health care services goes a long way in increasing life expectancy in a nation.[14]

[14] For further study, NIH research covers this in the context of first-world nations, although the same conclusions apply to developing nations: Implications of Population Change.

In Kenya, under the reign of Uhuru Kenyatta, the Linda Mama programme on free maternity was introduced in 2013. The Kenyan UHC programme began in 2018 as a pilot programme (Ministry of Health of Kenya 2018),[15] which is evident in the increase in healthcare funding from 2017 to 2018.

[15] The official statement from the Ministry of Health in Kenya: UHC Program.

From the plot above, the relationship between the population growth rates and healthcare funding is a positive one, where increased population would lead the government to expand their health care services in order to cater for the growing population. Increased health care funding would lead to affordable healthcare services to citizens, which would have an effect of increasing population in later years, due to better standards of living.

Infant and maternal mortality rate

According to the NIH (National Library of Medicine),[16] a reduction in infant mortality (children under 5 years old) leads to a reduction in fertility, with some delay.

[16] The NIH research article on child survival bias can be found at: How does infant mortality affect birth rates?

[Figure: Annual infant mortality rate]

[Figure: Annual maternal mortality rate (per 100K live births)]

More formally, the child survival hypothesis states that “if child mortality is reduced, then eventually fertility reduction follows, with the net effect of lower growth of population”. This could be interesting to confirm with the Kenyan population growth data.

Increased health care spending by the government has a negative relationship with maternal and infant mortality rates, whereby increased spending on health care services leads to a decrease in maternal and infant mortality. This is because, when government funding to the health care sector increases, more funds go to public education on maternity and to building more clinics, dispensaries and hospitals in order to reach more citizens. This has the positive impact of lowering maternal and infant mortality.

The per capita GDP, a common measure of living standards in a country, is usually assumed to have an inverse relationship with infant and maternal mortality rates, since improved living standards usually imply increased access to essential services such as health care. However, the above charts suggest a positive linear relationship, whereby an increase in the (%) change in per capita GDP is associated with increased changes in infant and maternal mortality. The relationship is not significant, though, due to the presence of outliers in the per capita dataset.

The relationship between the maternal and infant mortality rates to the population growth rate is a negative relationship whereby: an increase in infant and maternal mortality rates leads to decreased population growth rates, although the effects are not observable instantly. This relationship does not come out clearly in the context of infant mortality rates, but is seen more evidently in maternal mortality rates against the population growth rate chart.

Suicide rates

Studies on suicide rates and socio-economic variables,[17] one of which was conducted on suicide rates across the global economy and population growth indicators, suggest that suicide rates are positively correlated with modernization. The study further suggests that suicide rates are negatively correlated with population growth indicators, but positively correlated with quality of life indicators.

[17] The most notable one being the research article by NIH: Suicide in the world: Toward a population increase theory of suicide.

Poverty rates

Research on population growth and poverty in developing nations pointed out that many aspects of poverty actually increase fertility rates, such as high infant mortality, possibly due to lack of access to decent health care, and little-to-no education among women and female children.

This agrees with the notion, suggested by a UNFPA article,[18] that poverty rates are influenced by, and also influence, population growth.[19]

[18] The research article can be read in full at: UNFPA article.
[19] The research article by NIH can be found at: Population growth and poverty in the developing world.

The above plots show that an increase in GDP per capita is associated with a decrease in poverty rates, which is expected: when standards of living improve in a country, more people cross above the poverty line. The graphs also illustrate that increased poverty rates are associated with increased fertility rates, which is in line with the NIH research findings above.

Electricity access

[Figure: The percentage of citizens with electricity access annually]

Research on electricity access and population growth indicates that electricity access has a significant positive impact on life expectancy, quality of living and the socio-economic well-being of communities.[20]

[20] The research paper tackles this further: The effects of energy use on infant mortality rates in Africa.

The relationship between electricity access and life expectancy is a positive linear one, whereby an increase in the % of the population with electricity access is associated with increased life expectancy. Although this relationship might seem far-fetched, increased electricity access is essential for education and health care services to reach citizens across the nation, and it is through these services that life expectancy increases.

Crime rates

Increased crime rates, especially in urban areas, could trivially be explained using interesting research such as the Universe 25 experiment.[21]

[21] This interesting study: The Explosive Growth and Demise of a Mouse Population.

[Figure: Annual crime rates per 100K population in Kenya]

However, crime rates have been shown to be significantly affected by unemployment rates, whereby high unemployment rates lead to higher crime rates, and vice versa. Highly populated areas, especially urban ones, have been found to have higher crime rates than less populated areas, since in high-population environments there is competition for resources such as jobs, which leads to unemployment and in turn to increased crime rates.

From the above plots, the relationship between crime rates and unemployment rates is a positive one as expected, where increasing unemployment rates are associated with higher crime rates.

CO2 Emissions

Research conducted on the effect of CO2 emissions on infant mortality rates in Africa during the period 1999-2014 suggests that there is a significant relationship between a rise in CO2 emissions and infant mortality.[22]

[22] The study can be found at: The effects of energy use on infant mortality rates in Africa.

[Figure: Annual crime rates per 100K population in Kenya]

The study also shows that high degrees of pollution as measured using energy indicators such as CO2 emissions have significant positive correlation with increased mortality rates which further impact population growth.

It is evident that CO2 emissions (metric tons per capita) have a positive relationship with infant and general population mortality rates, where an increase in CO2 emissions is associated with an increase in mortality rates, both for infants and for the general population, as concluded in the research paper above.

MODELLING POPULATION CHANGES AND SIZE

Population projection is the process of computing forecasts of the future population size and its changes. It is necessary because the forecasts are of interest to various stakeholders, such as the United Nations, food programme committees, governments and business people, in the following ways:

In this section we explore several population projection techniques and finally compare them to ascertain which techniques yield the best forecasts with minimal errors.

Modelling Population size using demographic models

In this section we explore the three major models used in demographic studies, which are:

[Figure: The 1-year geometric growth rates] The plots for the arithmetic and exponential rates are omitted since they are similar for one-year rates.

Based on the chart of population growth in Kenya, it would be safe to assume that the population of Kenya has exhibited stable geometric growth, and thus modelling the population size using a geometric model is more favourable than using the arithmetic model. However, it is interesting to note that for one-year changes the three mathematical models give approximately equal answers, while for longer time frames (more than 1 year) the models give different estimates of the growth rate and of the population forecasts.
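The claim that the three models nearly agree over one year but diverge over longer horizons can be illustrated with a short Python sketch. The formulas are the usual textbook forms of the arithmetic, geometric and exponential growth models, assumed here because the article's own list of models is not shown; the function names are illustrative. The population figures are the 1960, 1961 and 1970 values from the Actual Population column of the comparison table at the end of this article.

```python
from math import log


def arithmetic_rate(p0: float, pn: float, years: int) -> float:
    """Rate r in P_n = P_0 * (1 + r * n)."""
    return (pn - p0) / (p0 * years)


def geometric_rate(p0: float, pn: float, years: int) -> float:
    """Rate r in P_n = P_0 * (1 + r) ** n."""
    return (pn / p0) ** (1 / years) - 1


def exponential_rate(p0: float, pn: float, years: int) -> float:
    """Rate r in P_n = P_0 * exp(r * n)."""
    return log(pn / p0) / years


# Actual population values from the comparison table.
p1960, p1961, p1970 = 8_120_080, 8_377_696, 11_301_394

for years, end in ((1, p1961), (10, p1970)):
    print(years,
          round(arithmetic_rate(p1960, end, years), 5),
          round(geometric_rate(p1960, end, years), 5),
          round(exponential_rate(p1960, end, years), 5))

# One-year-ahead geometric projection for 1962 from the 1960-1961 rate.
print(round(p1961 * (1 + geometric_rate(p1960, p1961, 1))))
```

Over one year the arithmetic and geometric rates coincide and the exponential rate differs only slightly, while over ten years the three rates separate, with the arithmetic rate the largest and the exponential rate the smallest. The final projection agrees with the Expected Geometric entry for 1962 in the comparison table.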

Population as a stochastic process

A stochastic process is a model for a time-dependent random phenomenon. Modelling population or population growth as a stochastic process is useful, since population data is a time-indexed random variable, and there are many tools within the field of stochastic processes which could be used to model population size and changes in order to better describe the population dynamics. In this section we will not model population directly; rather, we will model annual population changes, since it is much easier to study and construct models for the increments of a process than for the process itself.


In this section we will fit several models for stochastic processes, starting with a martingale process, which could be the simplest stochastic process model. A martingale is simply a stochastic process where the best estimate of the future state of the process is the process’ current state. Under this model, we determine next year’s forecast using the previous year’s increment, making this the simplest of the models. The motivation for using such a model, where we assume that population increments each year will be 0, is that population increments are affected by various factors which add to the population, e.g. fertility rates and immigration rates, and by other factors which subtract from the population, such as mortality rates and emigration rates, so that their net effect is 0 and the population does not change at all.
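One reading of the paragraph above, assumed here rather than confirmed by the source, is that the latest observed increment is carried forward unchanged, so that the forecast for year t is the population in year t-1 plus the increment observed between years t-2 and t-1. The Python sketch below follows that reading; the function name is illustrative, and the figures are the 1960 to 1963 values from the Actual Population column of the comparison table.

```python
def naive_increment_forecast(population):
    """Forecast P[t] as P[t-1] + (P[t-1] - P[t-2]) for every t >= 2."""
    forecasts = {}
    for t in range(2, len(population)):
        last_increment = population[t - 1] - population[t - 2]
        forecasts[t] = population[t - 1] + last_increment
    return forecasts


years = [1960, 1961, 1962, 1963]
actual = [8_120_080, 8_377_696, 8_647_011, 8_928_511]

for t, estimate in naive_increment_forecast(actual).items():
    print(years[t], estimate, actual[t])
```

Under this reading the forecasts for 1962 and 1963 are 8635312 and 8916326, which agree with the Martingale column of the comparison table for those years.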


This section covers MA(q) (moving average) models, where we experiment with simple and weighted moving average models, both in an expanding window fashion and in a rolling window fashion, in order to determine which model is better suited to our population data.
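The four variants named above (simple or weighted, expanding or rolling) can be sketched in a few lines of Python. The function name is illustrative, and the linear weights 1, 2, ..., k that favour recent years are an assumption, since the article does not state its weighting scheme. The increments are year-on-year differences of the Actual Population column in the comparison table.

```python
def moving_average_forecast(increments, window=None, weighted=False):
    """Forecast the next increment from past increments.

    window=None uses all history (expanding window); an integer uses only
    that many most recent values (rolling window). With weighted=True the
    weights are 1, 2, ..., k, so recent years count more (an assumption).
    """
    if not increments:
        raise ValueError("need at least one past increment")
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    history = increments if window is None else increments[-window:]
    weights = range(1, len(history) + 1) if weighted else [1] * len(history)
    return sum(w * x for w, x in zip(weights, history)) / sum(weights)


# Yearly increments for 1961-1965, derived from the Actual Population column.
increments = [257_616, 269_315, 281_500, 294_181, 307_481]

print(moving_average_forecast(increments))                            # expanding SMA
print(moving_average_forecast(increments, weighted=True))             # expanding WMA
print(moving_average_forecast(increments, window=3))                  # rolling SMA(3)
print(moving_average_forecast(increments, window=3, weighted=True))   # rolling WMA(3)
```

For a steadily rising series such as these increments, the weighted and rolling variants sit closer to the latest value than the expanding simple average, which is one reason to compare them.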


In this section, AR(p) (auto-regressive) models are implemented to model the population changes. An AR(p) model is given by the following:

\[P_t = \beta_0 + \beta_1 * P_{t-1}+ \beta_2 * P_{t-2} +...+ \beta_p * P_{t-p} + \epsilon_t\]

where:

\(P_t\) : The estimate of the population increment at time t.

\(P_{t-i}\) : The population increments in the year \(t-i\).

\(\beta_i\) : The coefficient of the model for the \(i^{th}\) year.

\(\epsilon_t\) : The random error component, which simply accounts for unexplained variation of the estimate \(P_t\).

The model fitted in this section is an AR(1) model, which implies that the current population change is a function of the most recent year’s population change plus a constant and a random shock. The AR(p) models are fitted with an intercept, and the analysis employs both an expanding window AR(p) process and a rolling window one.
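A minimal Python sketch of an AR(1) fit with intercept, refitted each year on either an expanding or a rolling window, is given below. It estimates the two coefficients by ordinary least squares in closed form, which is an assumption about the estimation method; the function names and the window length are illustrative. The increments are year-on-year differences of the Actual Population column in the comparison table.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = b0 + b1 * x[t-1]; returns (b0, b1)."""
    lagged, current = series[:-1], series[1:]
    n = len(lagged)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x, mean_y = sum(lagged) / n, sum(current) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    if sxx == 0:
        raise ValueError("lagged values are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def ar1_forecasts(series, min_obs=3, window=None):
    """One-step-ahead forecasts; window=None expands, an integer rolls."""
    forecasts = {}
    for t in range(min_obs, len(series)):
        history = series[:t] if window is None else series[max(0, t - window):t]
        b0, b1 = fit_ar1(history)
        forecasts[t] = b0 + b1 * history[-1]
    return forecasts


# Yearly increments for 1961-1967, derived from the Actual Population column.
increments = [257_616, 269_315, 281_500, 294_181, 307_481, 321_271, 336_034]

print(ar1_forecasts(increments))             # expanding window
print(ar1_forecasts(increments, window=4))   # rolling window of 4 increments
```

Each forecast uses only increments observed before the year being forecast, so the comparison with the actual increment is out of sample. A rolling window needs at least three increments for the fit to be defined.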


This section covers modelling population increments using a Poisson model.[25] The Poisson distribution is a discrete distribution for non-negative integers, given by the following probability mass function: \[f(x;\lambda) = \frac{e^{- \lambda} \lambda^x}{x!}, \hspace{1 mm}x = 0,1,2,...\] whereby for our analysis, \(x\) is the population increment.

[25] The Poisson distribution is a common discrete distribution used for counting processes and other processes, with the unique characteristic that the distribution’s mean and variance are equal, and that it is strictly defined for non-negative integers only. We make a strong assumption that over the period, Kenya has not had a population decrease and will not have one in the near future, since it is a developing nation; thus we use the Poisson distribution to model population changes.

The Poisson distribution is fitted to the population increments both in an expanding window and in a rolling window: a 5-year and a 10-year window. An estimate of the population size over the years is calculated and compared to the actual population.
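The fitting step can be sketched in Python as follows. The maximum-likelihood estimate of the Poisson rate is the sample mean of the increments, so the expected next increment under the fitted model is simply that mean. The function names and the window length are illustrative, and the probability mass function is evaluated in logs because the factorial overflows for increments of this size.

```python
from math import exp, lgamma, log


def fit_poisson(increments):
    """Maximum-likelihood estimate of lambda: the sample mean."""
    if not increments:
        raise ValueError("need at least one increment")
    return sum(increments) / len(increments)


def poisson_log_pmf(x: int, lam: float) -> float:
    """log f(x; lambda), computed in logs to avoid overflow for large x."""
    if x < 0 or lam <= 0:
        raise ValueError("x must be >= 0 and lambda > 0")
    return -lam + x * log(lam) - lgamma(x + 1)


def poisson_forecast(increments, window=None):
    """Expected next increment; window=None expands, an integer rolls."""
    history = increments if window is None else increments[-window:]
    return fit_poisson(history)


# Yearly increments for 1961-1965, derived from the Actual Population column.
increments = [257_616, 269_315, 281_500, 294_181, 307_481]
last_population = 9_530_173  # actual 1965 population from the comparison table

print(last_population + poisson_forecast(increments))             # expanding
print(last_population + poisson_forecast(increments, window=3))   # rolling(3)
print(poisson_log_pmf(300_000, fit_poisson(increments)))
```

Since the fitted rate is the mean of the window, the expanding Poisson estimate coincides with an expanding simple moving average of the increments, which is why the model comparison later treats the two as equivalent.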


Modelling population using a linear model

In this section we explore the population dynamics using a linear regression model where the population growth/changes are a function of several independent variables such as: Fertility rates, mortality rates, migration rates, Life expectancy, HDI etc. The linear regression models fitted are of the form:

\[P_t = \beta_0 + \beta_1 * X_{1(t-1)} + ... + \beta_k * X_{k(t-1)} + \epsilon_t\]

where:

\(P_t\) : The population increment(estimate) for time t.

\(\beta_i\) : The coefficients of the regression models.

\(X_{i(t-1)}\) : The explanatory variables chosen, which are lagged by one period.

\(\epsilon_t\) : The random error component, which accounts for unexplained variation in \(P_t\).

We fit several models while comparing their explanatory power, and their goodness of fit(using the R-squared statistic). To select good models we resort to using step-wise selection in order to get the variables which are good predictors for the population increments while keeping the model parsimonious.
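To make the lagged specification concrete, the Python sketch below regresses the response in year t on the predictors from year t-1 by ordinary least squares and reports the R-squared statistic. It covers only the fitting step, not the step-wise selection. The function names are illustrative, the solver is a plain Gaussian elimination chosen to keep the example free of dependencies, and the data are synthetic values constructed for the example, not the Kenyan series.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lagged_ols(y, predictors):
    """Regress y[t] on an intercept and predictors[i][t-1]; returns (beta, r2)."""
    rows = [[1.0] + [p[t - 1] for p in predictors] for t in range(1, len(y))]
    target = y[1:]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(target) / len(target)
    ss_res = sum((v - f) ** 2 for v, f in zip(target, fitted))
    ss_tot = sum((v - mean_y) ** 2 for v in target)
    return beta, 1 - ss_res / ss_tot


# Synthetic example: y[t] = 1 + 2 * x1[t-1] - 3 * x2[t-1], so the fit is exact.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [0.0] + [1 + 2 * a - 3 * b for a, b in zip(x1[:-1], x2[:-1])]

beta, r2 = fit_lagged_ols(y, [x1, x2])
print([round(b, 6) for b in beta], round(r2, 6))
```

The first observation of the response is dropped because it has no lagged predictors, which mirrors how a one-period lag leaves the first year without an estimate.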

The first model is illustrated below:

term                 estimate     std.error   statistic    p.value
(Intercept)          0.0133013    0.0005364   24.796367    0.0000000
Fertility Rate       0.0048569    0.0001570   30.940720    0.0000000
Mortality Rate      -0.0009798    0.0000674  -14.526138    0.0000000
Net Migration Rate   0.0007144    0.0002749    2.598868    0.0118404

From the model above, it is evident that fertility rates and net migration rates have a positive linear relationship with the population growth rate, while the mortality rate has a negative relationship with it, as shown by the model coefficients. This agrees with our previous visualizations, where we illustrated the results graphically. In this model all the terms are significant, and the R-squared statistic is 0.9589, indicating that ~96% of the total variability in the population growth rate is explained by the model, which makes it a decent starting point.

We now include a fourth variable, the Human Development Index. The exploratory analysis above showed that it has a positive linear relationship with growth rates, so we include the HDI as a predictor in our model and perform step-wise selection, which leaves only fertility rates and HDI as the best predictors for our model.

term             estimate     std.error   statistic   p.value
(Intercept)     -0.0273351    0.0032585   -8.38882    0
Fertility Rate   0.0063553    0.0002654   23.94238    0
HDI              0.0480980    0.0039560   12.15832    0

We also apply the step-wise selection procedure during model fitting in order to select the best predictors while observing model parsimony, which leads to the omission of the net migration rate and the mortality rate from the model. The fitted model shows that the HDI and fertility rates have a positive linear relationship with population changes. The model has an adjusted R-squared statistic of ~0.98, indicating that ~98% of the variability in population changes is explained by the model. Thus the HDI is a good predictor of population changes.

Comparison of models

In this section we compare all the models fitted in order to gauge which model best explains population increments with the highest precision. We will use the mean squared error(MSE), and the tracking error which is simply the standard deviation of the difference between the population estimate and the actual population to determine the best population process model.

The tracking error is given by:

\[\sqrt{Var(\hat{P_t} - P_t)}\] where

\(P_t\) : The actual population.

\(\hat{P_t}\) : The estimated population.
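Both comparison metrics can be computed with a few lines of Python. The sketch below uses the n - 1 divisor for the variance inside the tracking error, which is an assumption since the article does not say which divisor it uses; the function names are illustrative. The figures are the Martingale and Actual Population columns of the comparison table for 1962 to 1966.

```python
from math import sqrt


def mean_squared_error(estimates, actuals):
    """Average of the squared differences between estimate and actual."""
    errors = [e - a for e, a in zip(estimates, actuals)]
    return sum(err ** 2 for err in errors) / len(errors)


def tracking_error(estimates, actuals):
    """Standard deviation of (estimate - actual), using the n - 1 divisor."""
    errors = [e - a for e, a in zip(estimates, actuals)]
    if len(errors) < 2:
        raise ValueError("need at least two observations")
    mean_error = sum(errors) / len(errors)
    return sqrt(sum((err - mean_error) ** 2 for err in errors) / (len(errors) - 1))


# Martingale and Actual Population columns for 1962-1966.
martingale = [8_635_312, 8_916_326, 9_210_011, 9_516_873, 9_837_654]
actual = [8_647_011, 8_928_511, 9_222_692, 9_530_173, 9_851_444]

print(mean_squared_error(martingale, actual))
print(tracking_error(martingale, actual))
```

The two metrics answer different questions: the MSE penalizes a model that is consistently too low or too high, while the tracking error measures only how much the error varies from year to year, so a model with a constant bias has a tracking error of zero.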

We will not include the expanding SMA population estimates, since they are equivalent to the expanding Poisson distribution estimates.

The linear model used is the first linear model, where the population changes are a function of fertility rates, mortality rates and net migration rates.

Columns, in order: Year | Expected Geometric | Martingale | Expanding WMA | Rolling SMA(5) | Rolling WMA(5) | AR(1) | AR(1)-weighted | Expanding Poisson | Rolling(5) Poisson | Rolling(10) Poisson | Estimated LM | Actual Population
1960 8116820 8109533 8084924 8090606 8096759 8114645 8118911 8073274 8086198 NA NA 8120080
1961 8374251 8366537 8337773 8346564 8353041 8371062 8375264 8324412 8341975 8324412 8377680 8377696
1962 8643485 8635312 8602042 8614190 8621034 8639223 8643356 8586872 8609369 8586872 8649849 8647011
1963 8924984 8916326 8878276 8894060 8901289 8919695 8923756 8861198 8888975 8865639 8934495 8928511
1964 9219175 9210011 9166952 9186670 9194273 9212892 9216877 9147876 9181302 9156924 9232154 9222692
1965 9526566 9516873 9468565 9492505 9500461 9519256 9523162 9447401 9486855 9461284 9540878 9530173
1966 9847905 9837654 9783747 9812191 9820498 9839415 9843239 9760400 9806264 9779397 9863641 9851444
1967 10183545 10172715 10112982 10146193 10154853 10173984 10177721 10087361 10140004 10111788 10201002 10187478
1968 10534974 10523512 10457293 10495571 10504648 10523805 10527454 10429285 10489108 10459500 10553956 10539894
1969 10904501 10892310 10818404 10862170 10871838 10890982 10894534 10787846 10855374 10824269 10924236 10910675
1970 11294500 11281456 11198412 11248271 11258787 11278142 11281588 11165091 11241035 11208217 11311974 11301394
1971 11706105 11692113 11598939 11655638 11667214 11687222 11690549 11562625 11647844 11613009 11720631 11713048
1972 12139697 12124702 12020966 12085368 12098005 12118811 12122008 11981442 12076860 12039681 12151244 12146068
1973 12595096 12579088 12464865 12537786 12551258 12572762 12575824 12421945 12528505 12488647 12604298 12600797
1974 13072550 13055526 12930921 13012977 13026990 13048853 13051776 12884450 13003016 12960232 13080150 13077341
1975 13571907 13553885 13419179 13510674 13524989 13547102 13549885 13369031 13500248 13454507 13575764 13575907
1976 14093481 14074473 13929801 14030809 14045299 14067479 14070120 13875872 14020112 13971653 14094309 14096263
1977 14636564 14616619 14462487 14572906 14587473 14609853 14612351 14404705 14562074 14511362 14635553 14638890
1978 15202405 15181517 15017715 15137454 15152094 15174266 15176623 14956006 15126530 15074112 15199994 15205374
1979 15793779 15771858 15597141 15726289 15741218 15763017 15765229 15531396 15715258 15661546 15789285 15797776
1980 16413258 16390178 16202918 16341863 16357449 16379271 16381329 16132983 16330605 16275765 16403481 16417197
1981 17060905 17036618 16836164 16985455 17001982 17024605 17026496 16761878 16973839 16917789 17045643 17063876
1982 17736028 17710555 17497075 17657398 17674801 17698298 17700013 17418299 17645204 17587738 17716141 17736326
1983 18435276 18408776 18184025 18355813 18373560 18398001 18399540 18100687 18343003 18283896 18413205 18431761
1984 19154464 19127196 18894032 19077038 19094311 19119203 19120574 18806155 19063906 19003187 19134067 19146400
1985 19888747 19861039 19623092 19816124 19832070 19856822 19858044 19530801 19803237 19741454 19865028 19877083
1986 20635651 20607766 20367886 20569060 20583073 20606706 20607803 20271378 20556967 20495241 20611889 20622560
1987 21395996 21368037 21127129 21334296 21346383 21368224 21369217 21026610 21323453 21263164 21373161 21382112
1988 22169639 22141664 21900101 22111269 22121874 22142567 22143465 21795770 22101818 22044461 22148261 22153676
1989 22953082 22925240 22684669 22898059 22907573 22928203 22929010 22576752 22889901 22836838 22934850 22935092
1990 23744071 23716508 23478606 23692830 23701333 23721629 23722358 23367356 23685647 23637793 23726627 23724579
1991 24541242 24514066 24280092 24494078 24501403 24520966 24521631 24165774 24487608 24445197 24525691 24521703
1992 25345610 25318827 25088721 25301531 25307736 25326160 25326772 24971579 25295806 25258476 25331348 25326078
1993 26156839 26130453 25904136 26114871 26120293 26138170 26138733 25784395 26109997 26077187 26143419 26136216
1994 26972269 26946354 26724823 26932724 26937546 26955558 26956074 26602715 26928566 26899842 26960109 26950513
1995 27790180 27764810 27549151 27753597 27757772 27775617 27776096 27424916 27749985 27724945 27776300 27768296
1996 28610894 28586079 28376462 28577039 28580455 28597558 28598010 28250330 28573830 28552104 28594644 28589451
1997 29434889 29410606 29206680 29403000 29405747 29422199 29422628 29078857 29400263 29381484 29415007 29415659
1998 30265744 30241867 30041595 30233575 30236175 30251778 30252185 29912231 30231318 30215031 30239122 30250488
1999 31109010 31085317 30884951 31073342 31076641 31091659 31092033 30754107 31071223 31056704 31070635 31098757
2000 31970813 31947026 31741772 31928405 31933382 31948548 31948866 31609410 31925847 31911946 31933684 31964557
2001 32854461 32830357 32616309 32803809 32811232 32827785 32828016 32482312 32800231 32785417 32814282 32848564
2002 33757019 33732571 33509248 33700386 33710157 33729320 33729437 33373501 33695275 33678017 33713076 33751739
2003 34679747 34654914 34421574 34618955 34630450 34650698 34650698 34283950 34612120 34590833 34631102 34678779
2004 35631281 35605819 35358140 35564437 35577431 35596903 35596778 35218439 35555965 35529024 35573130 35635271
2005 36618144 36591763 36324710 36542573 36557534 36577255 36576976 36182650 36532734 36498821 36560928 36624895
2006 37642002 37614519 37325054 37556962 37574599 37596326 37595856 37180315 37545918 37504384 37583082 37649033
2007 38701809 38673171 38360560 38609126 38629427 38653590 38652905 38212823 38596445 38547281 38641027 38705932
2008 39792501 39762831 39429368 39696770 39718594 39744997 39744088 39278373 39682160 39625612 39732984 39791981
2009 40908503 40878030 40527709 40814621 40836380 40863801 40862679 40373277 40798688 40735283 40855264 40901792
2010 42042556 42011603 41649990 41955096 41975248 42002756 42001446 41492046 41938960 41870092 41991804 42030676
2011 43190717 43159560 42791355 43111832 43129325 43155398 43153933 42629907 43096576 43024486 43147764 43178274
2012 44357206 44325872 43951435 44284122 44299071 44322065 44320477 43786495 44270503 44197702 44322804 44343467
2013 45540103 45508660 45129073 45470974 45484045 45505969 45504259 44960671 45459206 45388458 45515710 45519981
2014 46727710 46696495 46317803 46665581 46676895 46700075 46698251 46146063 46655655 46589821 46720068 46700055
2015 47910722 47880129 47509638 47859707 47868460 47891467 47889571 47334794 47851400 47792898 47917127 47878336
2016 49086346 49056617 48699092 49047868 49052951 49073308 49071388 48521437 49041093 48991341 49111633 49051534
2017 50253480 50224732 49882811 50226186 50227371 50244713 50242805 49702666 50221677 50181228 50300166 50221142
2018 51418639 51390750 51062370 51396677 51395297 51409239 51407364 50880013 51394953 51364061 51484356 51392565
2019 52591312 52563988 52243363 52567081 52565350 52577073 52575221 52058973 52567414 52545895 52669698 52573973
2020 53782539 53755381 53434217 53748756 53749055 53760295 53758431 53247845 53749638 53735972 53869140 53771296
2021 54995887 54968619 54641036 54949888 54953891 54967602 54965673 54452646 54949836 54941250 55084376 54985698
2022 56227527 56200100 55865012 56172530 56180230 56197916 56195884 55674556 56170258 56163427 56316308 56215221

The mean squared error plot for the various population processes is shown below:

The tracking error for the population processes is shown below:

CONCLUSION

The mean squared error comparison plot ranks the population process models chosen for this analysis from largest to smallest estimation error. The process based on the 1-year expected geometric rate has the lowest mean squared error, followed by the martingale process, while the process with the highest error is the expanding Poisson process (equivalent to the expanding SMA(q) process).

For the tracking error statistic, the process that tracks the population most closely, with the least volatility, is the martingale population process, followed by the expected geometric population process.
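The two comparison statistics discussed above can be sketched as follows. This is a minimal illustration, not the code used for the analysis. It assumes the tracking error is the sample standard deviation of the difference between modelled and observed year-on-year growth rates (an adaptation of the usual finance definition), and it treats the first two numeric columns of the table for 1960 to 1964 as an observed series and a candidate process. That column assignment is an assumption made only for illustration.

```python
import math

def mean_squared_error(actual, modelled):
    """Average squared gap between the observed and modelled series."""
    return sum((a - m) ** 2 for a, m in zip(actual, modelled)) / len(actual)

def growth_rates(series):
    """Year-on-year proportional changes of a series."""
    return [curr / prev - 1 for prev, curr in zip(series, series[1:])]

def tracking_error(actual, modelled):
    """Sample standard deviation of growth-rate differences (assumed definition)."""
    diffs = [m - a for a, m in zip(growth_rates(actual), growth_rates(modelled))]
    mean_diff = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1))

# Values copied from the first two numeric columns of the table, 1960-1964.
# Which column is the observed population is an assumption for this sketch.
observed = [8116820, 8374251, 8643485, 8924984, 9219175]
candidate = [8109533, 8366537, 8635312, 8916326, 9210011]

print("MSE:", mean_squared_error(observed, candidate))
print("Tracking error:", tracking_error(observed, candidate))
```

A process can have a low tracking error and still a sizeable mean squared error if it sits at a roughly constant offset from the observed series, which is one reason to report both statistics.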

Thus, for simple explanatory relationships of population change, we prefer a linear model with the fertility, mortality and net migration rates as explanatory variables, since the relationship with the other variables was not significant. It is also important to note that variables such as the unemployment rate do not have a direct relationship with population change, but they do relate directly to other explanatory variables such as fertility rates (which in turn relate directly to population growth). In addition, the data for certain variables, such as poverty rates and suicide rates, were incomplete, so these variables could not be included in the study.

For simple population projection, we prefer the geometric mathematical model, closely followed by the martingale model for population changes.
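As a rough illustration of a geometric projection, the sketch below assumes the standard form P(t + h) = P(t)(1 + r)^h, with the rate r estimated from the two most recent values. The input figures are the 2021 and 2022 entries of the first numeric column of the table. Treating that column as the reference series, and the function names themselves, are assumptions made only for this example.

```python
def geometric_rate(p_start, p_end, years):
    """Constant annual growth rate that takes p_start to p_end over `years` years."""
    return (p_end / p_start) ** (1 / years) - 1

def project(p_base, rate, years_ahead):
    """Geometric projection: P(t + h) = P(t) * (1 + r) ** h."""
    return p_base * (1 + rate) ** years_ahead

# 2021 and 2022 values from the first numeric column of the table
# (assumed here to be the reference series).
p_2021, p_2022 = 54995887, 56227527
rate = geometric_rate(p_2021, p_2022, 1)

for horizon in (1, 5, 10):
    print(2022 + horizon, round(project(p_2022, rate, horizon)))
```

Because the rate is held fixed, any error in the estimated rate compounds with the horizon, so a projection of this kind is best read as a short-range guide.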

RECOMMENDATIONS

Future work could explore constructing linear models of population change with other variables of interest that affect population. The linear models used could be extended to polynomial-effects models and log-transformed models, which may have more explanatory power than the simple linear model used.
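One simple form a log-transformed model could take is a regression of log population on calendar year, whose slope gives an average exponential growth rate. The sketch below is only a hypothetical starting point, not part of the analysis: it fits the line by ordinary least squares in plain Python and uses the decade values from the first numeric column of the table, on the assumption that this column is the reference series.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Decade values from the first numeric column of the table
# (assumed here to be the reference series).
years = [1960, 1970, 1980, 1990, 2000, 2010, 2020]
population = [8116820, 11294500, 16413258, 23744071, 31970813, 42042556, 53782539]

# Model: log(P) = intercept + slope * year
intercept, slope = fit_line(years, [math.log(p) for p in population])
annual_growth = math.exp(slope) - 1

print("Implied average annual growth rate:", annual_growth)
```

Additional explanatory variables or polynomial terms in the year would require a multiple regression routine rather than this single-predictor fit.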

Other variables, such as a political stability index, development index, wealth index, religion indices and literacy rates, could be included in future work once sufficient data are available for meaningful analysis.

REFERENCES

  1. Ross, S. M., Kelly, J. J., Sullivan, R. J., Perry, W. J., Mercer, D., Davis, R. M., … & Bristow, V. L. (1996). Stochastic processes (Vol. 2). New York: Wiley.

  2. Olive, D. J. (2017). Multiple linear regression. In Linear regression (pp. 17-83). Springer, Cham.

  3. https://en.wikipedia.org/wiki/Human_Development_Index

  4. https://en.wikipedia.org/wiki/Tracking_error