Demography is the scientific study of the characteristics and nature of human populations. This article studies population growth and the related factors that are of interest when dealing with population data. It analyzes the Kenyan population from 1960 to 2019, using visualizations that capture the population dynamics, and examines, from my perspective, possible factors that influence population growth.
The rest of the study is outlined as follows:
The second section focuses on data visualization in order to bring out the necessary relationships between the variables chosen and population. This second section also looks to explore possible relationships between the variables themselves, in order to bring out the aspect of dependence between the variables while giving a meaningful economic, social and demographic explanation as to the existence of certain relationships between the variables.
The third section focuses on modelling and projecting population, first using common mathematical models for population forecasting. The section then models population as a stochastic process, using a geometric growth rate model, auto-regressive models, moving average models and linear regression models. It also explores modelling population growth using common statistical distributions such as the Poisson distribution. Finally, it models population using linear models built on the variables discussed in section 2.
The data used to conduct this analysis was fetched from trusted sources such as the UN open data website and the open data vendor Macrotrends. Since the data is not in aggregated format, some factors have complete data while others have missing data for certain years. (Note 1: See the United Nations and Macrotrends.com websites for additional information on the data sources, variable descriptions and other related queries.) The variables used in the analysis include:
Population : This gives the annual population, as well as the population growth rate (%) in Kenya, as derived from census exercises. The population data ranges from 1950 through to 2022.
GDP : This factor gives the annual Gross Domestic Product, per capita and growth(%) for Kenya during the years 1960 to 2020.
Fertility rates : This variable gives the fertility rates, as well as the annual change (%), during the period 1950-2022.
Mortality rates : This variable gives the mortality rates/death rates as well as annual change(%) during the period 1950-2022.
Migration rates : This variable gives the net migration rate of Kenya as well as the annual change (%) during the period 1950-2022.
Life Expectancy : This variable gives the Life expectancy and its annual growth rate(%) during the period 1950-2022.
HDI : This gives Kenya’s Human Development Index as well as its global ranking during the years 1990 to 2019.
Labour Force Participation Rate : This factor gives the labour force participation rate for individuals aged 15-24 years, as well as the annual change(%) during the years 1990-2019.
Unemployment rate : This variable gives the unemployment rate for Kenya as a percentage, as well as the annual change (%), during the years 1991-2020.
Education spending : This variable gives Kenya’s education spending as a percentage of the GDP as well as the annual change in education spending during the years 1982-2018.
Health care spending : This variable contains data on Kenya’s healthcare spending as a (%) of GDP, as well as per Capita GDP($) during the years 2000-2018.
Infant and maternal mortality rate : This data gives the infant mortality rates and the maternal mortality rates per 100K live births, together with their annual change (%), for the years 2000-2017 (maternal mortality) and 1950-2021 (infant mortality).
Suicide rates : This data gives the total, male and female suicide rates for the years 2000-2019, in percentage terms.
Poverty rates : This factor gives data on Kenya’s poverty rates, measured as the (%) of the population living under US $5.50 per day, as well as the annual change (%). The data covers 1992-2015 at a five-year frequency, so interpolation would be necessary in order to make the dataset complete.
Electricity access : This variable gives the (%) of population with electricity access in Kenya and annual change(%) during the years 1993-2019.
Crime rates : This variable gives the crime rates in Kenya per 100K Population as well as the annual change(%) during the duration 2004-2019.
CO2 Emissions : This variable gives the emissions from Kenya in kilotons of CO2 and in metric tons per capita during the period 1960-2018.
In studying population growth and dynamics, as well as the factors affecting population growth, I decided to include the above-mentioned variables (as possibly related to population growth) for the following reasons, given for each variable:
The Kenyan population size is shown to be increasing steadily over the years.
Statistic | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|---|
Population size | 6076758 | 10539894 | 20622560 | 24214503 | 35635271 | 56215221 |
GDP and GDP per capita are measures of the economic growth of a country. The relationship between economic and population growth has been studied in various literature on demographic studies. Tsen and Furuoka (2005) argued that there exists an equilibrium positive relationship between economic growth, as measured by GDP per capita, and population growth. (Note 2: See the full paper, Tsen and Furuoka (2005).) Other research, such as Dao (2012), found that there exists a linear dependence between GDP per capita and population growth. (Note 3: See the full paper, Population and Economic Growth in Developing Countries.) In Kenya, Thuku, Paul, and Almadi (2013) studied the relationship between economic growth and population growth and established that there exists a positive linear relationship between the two. (Note 4: See the full paper, Thuku, Paul, and Almadi (2013).) Including GDP as a variable in population studies therefore seems relevant.
[Figure: (%) changes in GDP and per capita GDP]
Fertility rates explore the level of reproduction in a given society. The fertility rate at a given age is the number of children born alive to women of that age during the year as a proportion of the average annual population of women of the same age.
[Figure: Annual fertility rates]
[Figure: (%) changes in annual fertility rates]
Keeping other factors constant, a higher fertility rate is positively correlated with population growth in the long run. Literacy rates have been shown to be negatively related to fertility rates: research in both developed and developing nations shows that fertility rates have decreased with increasing literacy rates and improved health care services such as family planning. Poverty, on the other hand, has been shown to be positively correlated with fertility rates. (Note 5: The links to several research papers and articles are attached to the topics [Education] and [Poverty rates] below.)
For the Kenyan fertility data, fertility rates appear to be declining overall. It is also evident that the population growth rate declines after the year 1990, reaching a steady plateau in [2000-2006], while the population size increases throughout. Thus a decrease in fertility rates is accompanied by a decrease in population growth rates.
The mortality rate of a country is the annual number of deaths per 1000 total population.
[Figure: (%) changes in annual mortality rates]
A higher mortality rate has a negative relationship with population growth if all other factors, such as fertility rates and migration rates, are kept constant. (Note 6: An article on a study conducted on life expectancy and mortality rates can be found here.)
The overall mortality rates decrease over the years, possibly due to improved living conditions, higher life expectancy, affordable health care, and other factors which play an important role in reducing population mortality rates.
The net migration rate captures the net effect of immigration and emigration on a country’s population. Net migration is commonly calculated as:
Number of Immigrants - Number of Emigrants
and the net migration rate (NMR) usually expresses this difference per 1,000 population. (Note 7: NMR is the abbreviation for [Net Migration Rate].) The NMR is therefore positive when the number of immigrants is higher than the number of emigrants, and negative when emigrants outnumber immigrants.
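The sign behaviour described above can be sketched in a few lines of Python. This is only an illustration: the function name `net_migration_rate`, the per-1,000 scaling by a mid-year population, and all the figures are assumptions, not values from the dataset.

```python
def net_migration_rate(immigrants, emigrants, midyear_population):
    """Net migration per 1,000 population (assumed convention)."""
    return (immigrants - emigrants) / midyear_population * 1000.0


# Hypothetical figures, chosen only to show the sign of the rate.
rate_net_inflow = net_migration_rate(150_000, 100_000, 50_000_000)
rate_net_outflow = net_migration_rate(80_000, 100_000, 50_000_000)

print(f"net inflow : {rate_net_inflow:+.2f} per 1,000")
print(f"net outflow: {rate_net_outflow:+.2f} per 1,000")
```

More immigrants than emigrants gives a positive rate, and the reverse gives a negative one.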
[Figure: (%) changes in annual net migration rates]
This is calculated for all persons, including both citizens and non-citizens. Including migration rates in demographic and population modelling studies is important, as shown in a study on the UK which indicates that migration was a major component of population growth. (Note 8: The article linked is: Does migration lead to increased population.)
The net migration rates during the years [1990-1995] were the highest historically. An increase is also witnessed during the years [2010-2013], possibly because the majority of South Sudanese fleeing the war in South Sudan in 2013 sought refuge in Kenya, which could be interesting to monitor in Kenyan population numbers. (Note 9: Link to article: Kenya Refugee Response.)
According to Wikipedia, the Human Development Index (HDI) is a composite statistic comprising life expectancy, education and per capita income. (Note 10: Link to the Wikipedia article on the Human Development Index: Wikipedia/HDI.) A higher HDI generally implies that the conditions in a given country are better, i.e. high standards of living, decent education and healthcare.
[Figure: Historical Human Development Index]
[Figure: (%) changes in historical Human Development Index]
This will in turn impact the population growth of a country by attracting more immigrants. Several research studies have been conducted on impacts of population growth on HDI, as opposed to the converse relationship.
From the above chart, it is evident that there is a positive relationship between the % changes in HDI, and the population growth rate, which was to be expected since the HDI is a measure of how good the living conditions in a country are, and we expect that the better the living conditions, the higher the population grows.
The labour force participation rate is an estimate of an economy’s work force. (Note 11: More on definitions: investopedia.com.) A higher labour force participation rate implies that there is a higher population.
[Figure: Labour force participation rate for ages 15-24]
[Figure: (%) changes in labour force participation rate for ages 15-24]
This is because a rising population is accompanied by a higher labour participation rate. This would in turn lead to reduction in real wages, although that relationship is beyond the scope of this study.
For the above plot, we seek to establish the relationship between the labour force participation rate 5 years ago and the population size now, since the effects of labour force participation rates are usually not felt immediately by the population. For example, an increase in the female labour force participation rate is usually associated with a decrease in population growth rates, although the effect might not be immediate. The relationship is a negative one, whereby an increase in the labour force participation rate leads to a decreased population.
It is well established in several studies and analyses that high population growth in turn has an inverse relationship with economic variables such as unemployment rates.
[Figure: Unemployment rates (%) in Kenya]
[Figure: (%) changes in unemployment rates]
It has been observed that higher populations tend to have higher unemployment rates as well, because an increase in the labour force leads to increased competition for the available resources a nation has, and thus to unemployment. (Note 12: Links to articles: Article 1, Article 2.)
Unemployment rates decreased during the period 2000-2010; however, there is an evident sharp spike in unemployment rates during the years 2020-2022, due to the COVID-19 pandemic. The relationship between unemployment rates and per capita GDP is negative, whereby an increase in standards of living (as measured using GDP per capita) is associated with a decrease in the unemployment rate. This, however, does not suggest a cause-and-effect relationship. The relationship between unemployment rates and population growth is positive, since an increase in unemployment rates is associated with an increase in population growth. Various studies suggest this is a cause-and-effect relationship, since unemployment would lead to an inability to afford birth-control and family-planning options.
Education spending has a positive impact on literacy rates of a country, since it is through the increased allocation of a country’s spending to education that schools reach the vast majority of a country. Mark Montgomery, an economics professor notes that the higher the education level goes up, the lower the fertility rates. The relationship is an inverse one as he notes. Thus education leads to lower birth rates, and this slows down population growth in the long run.
[Figure: The education spending rate (% of GDP)]
In Kenya, several key education milestones were achieved (Note 13: more information on educational achievements in Kenya under Mwai Kibaki can be found at: this link):
The government under Mwai Kibaki introduced Free Primary Education (FPE) in January 2003, under which the government would cover the cost of tuition for the students. This was meant to enable disadvantaged children to access education. This is evident in the spike seen on the graphs.
In his second term, President Mwai Kibaki rolled out Free Day Secondary Education in 2008, offering tuition-free day secondary education in order to enhance the retention rate among learners.
These key achievements must have come at an increased education spending rate by the government. The population increase in later years 2015-2020 did not lead to an increase in government spending on Education, possibly because the increased population implies an increased tax income for the government to be able to provide affordable education.
Increased health care spending implies making more affordable health care services within the reach of the common mwananchi, constructing more hospitals and dispensaries in marginalized areas.
[Figure: The healthcare spending rate (% of GDP)]
Increased health care spending is a major driver of population growth, because good health care leads to lower infant and maternal mortality rates and prolongs the lives of the elderly, who could not previously afford health care services.
[Figure: The healthcare spending rate (% of GDP) and mortality rates]
[Figure: The healthcare spending rate (% of GDP) and life expectancy]
The elderly and infants are among the population groups with the highest demand for health care services; thus, improving health care services goes a long way in increasing life expectancy in a nation. (Note 14: For further study, NIH research covers this in the context of first-world nations, although the same conclusions apply to developing nations: Implications of Population Change.)
In 2013, under Uhuru Kenyatta, the Linda Mama programme on free maternity was introduced in Kenya. The Kenyan UHC program began in 2018 as a pilot program (Ministry of Health of Kenya 2018), which is evident in the increase in healthcare funding from 2017 to 2018. (Note 15: The official statement from the Ministry of Health in Kenya: UHC Program.)
From the plot above, the relationship between the population growth rates and healthcare funding is a positive one, where increased population would lead the government to expand their health care services in order to cater for the growing population. Increased health care funding would lead to affordable healthcare services to citizens, which would have an effect of increasing population in later years, due to better standards of living.
According to the NIH (National Library of Medicine), a reduction in infant mortality (children under 5 years old) leads to a reduction in fertility, with some delay. (Note 16: The NIH research article on child survival bias can be found at: How does infant mortality affect birth rates?)
[Figure: Annual infant mortality rate]
[Figure: Annual maternal mortality rate (per 100K live births)]
More formally, the Child survival hypothesis, states that “if child mortality is reduced, then eventually fertility reduction follows, with the net effect of lower growth of population”. This could be interesting to confirm with the Kenyan population growth data.
Increased health care spending by the government has a negative relationship with maternal and infant mortality rates, whereby increased spending on health care services leads to a decrease in maternal and infant mortality. This is because, with more government funding for the health care sector, more funds go to public education on maternity and to building more clinics, dispensaries and hospitals in order to reach more citizens. This has the positive effect of lowering maternal and infant mortality.
The per capita GDP, a common measure of living standards in a country, is usually assumed to have an inverse relationship with infant and maternal mortality rates, since increased living standards usually imply increased access to essential services such as health care. From the above charts, however, there seems to be a positive linear relationship, whereby an increase in the (%) change in per capita GDP is associated with increased changes in infant and maternal mortality. This relationship is not significant, due to the presence of outliers in the per capita dataset.
The relationship between the maternal and infant mortality rates to the population growth rate is a negative relationship whereby: an increase in infant and maternal mortality rates leads to decreased population growth rates, although the effects are not observable instantly. This relationship does not come out clearly in the context of infant mortality rates, but is seen more evidently in maternal mortality rates against the population growth rate chart.
Studies on suicide rates and socio-economic variables, one of which was conducted on global economy suicide rates and population growth indicators, suggest that suicide rates are positively correlated with modernization. The study further suggests that suicide rates are negatively correlated with population growth indicators, but positively correlated with quality of life indicators. (Note 17: The most notable one is the research article by NIH, Suicide in the world: Toward a population increase theory of suicide.)
Research on population growth and poverty in developing nations pointed out that many aspects of poverty actually increase fertility rates, such as high infant mortality, possibly due to lack of access to decent health care, and little-to-no education among women and female children.
This is also in line with the notion that poverty rates are influenced by, and also influence, population growth. (Note 18: The research article can be read in full at UNFPA article.) (Note 19: The research article by NIH can be found at Population growth and poverty in the developing world.)
The above plots show that an increase in GDP per capita is associated with a decrease in poverty rates. This is expected, since improving standards of living in a country imply that more people are crossing above the poverty line. The graphs also illustrate that increased poverty rates are associated with increased fertility rates, which is in line with the research findings by NIH above.
[Figure: The percentage of citizens with electricity access annually]
Research on electricity access and population growth indicates that electricity access has a significant positive impact on the life expectancy, quality of living and socio-economic well-being of communities. (Note 20: The research paper tackles this further: The effects of energy use on infant mortality rates in Africa.)
The relationship between electricity access and life expectancy is a positive linear relationship, whereby an increase in the % of the population with electricity access is associated with increased life expectancy. Although this relationship might seem far-fetched, increased electricity access is essential for education and health care services to reach citizens across the nation, and it is through these services that life expectancy increases.
Increased crime rates, especially in urban areas, could be explained using interesting research such as the Universe 25 study. (Note 21: This interesting study is The explosive Growth and Demise of a Mouse Population.)
[Figure: Crime rates per 100K population in Kenya, annually]
However, crime rates have been shown to be significantly affected by unemployment rates, whereby high unemployment rates lead to higher crime rates, and vice versa. Highly populated areas, especially urban areas, have been found to have higher crime rates than less populated areas, since in high population environments there is competition for resources such as jobs; this leads to unemployment, which further leads to increased crime rates.
From the above plots, the relationship between crime rates and unemployment rates is a positive one as expected, where increasing unemployment rates is associated with higher crime rates.
Research conducted on the effect of CO2 emissions on infant mortality rates in Africa during the period 1999-2014 suggests that there is a significant relationship between a rise in CO2 emissions and infant mortality. (Note 22: The study can be found at: The effects of energy use on infant mortality rates in Africa.)
The study also shows that high degrees of pollution as measured using energy indicators such as CO2 emissions have significant positive correlation with increased mortality rates which further impact population growth.
It is evident that CO2 emissions (metric tons per capita) have a positive relationship with infant and general population mortality rates, where an increase in CO2 emissions is associated with an increase in mortality rates, both for infants and for the general population, as concluded in the research paper above.
Population projection is the process of computing future forecasts of the population size, and changes. It is necessary to carry out population projection, since the forecasts would be of interest to various stakeholders such as the United Nations, Food programmes committees, governments, business people etc. in the following ways:
In this section we explore several population projection techniques and finally compare them to ascertain which techniques yield the best forecasts with minimal errors.
In this section we explore the three major models used in demographic studies, which are the arithmetic, geometric and exponential growth models.
The 1-year geometric growth rates. The plots for the arithmetic and exponential weights are omitted since they are similar for one year rates.
Based on the chart on population growth in Kenya, it would be safe to assume that the population of Kenya has exhibited stable geometric growth, and thus modelling the population size using a geometric model is more favourable than using the arithmetic model. However, it is interesting to note that for one-year changes the three mathematical models give approximately equal answers, while for longer time frames (more than one year) they give different estimates of the growth rate and of the population forecasts.
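To make the comparison of the three growth models concrete, here is a minimal Python sketch. It assumes the textbook definitions of the arithmetic, geometric and exponential growth rates; the function names are illustrative, and the three population figures are the actual 1960, 1961 and 1965 values from the comparison table at the end of this article.

```python
import math


def arithmetic_rate(p0, pn, years):
    """Constant absolute change per year, relative to the base population."""
    return (pn - p0) / (p0 * years)


def geometric_rate(p0, pn, years):
    """Constant proportional change, compounded once per year."""
    return (pn / p0) ** (1 / years) - 1


def exponential_rate(p0, pn, years):
    """Constant proportional change, compounded continuously."""
    return math.log(pn / p0) / years


def project(p0, rate, years, model):
    """Project a population forward under the chosen growth model."""
    if model == "arithmetic":
        return p0 * (1 + rate * years)
    if model == "geometric":
        return p0 * (1 + rate) ** years
    if model == "exponential":
        return p0 * math.exp(rate * years)
    raise ValueError(f"unknown model: {model}")


p_1960, p_1961, p_1965 = 8120080, 8377696, 9530173
rate_functions = {
    "arithmetic": arithmetic_rate,
    "geometric": geometric_rate,
    "exponential": exponential_rate,
}

for name, rate_function in rate_functions.items():
    one_year = rate_function(p_1960, p_1961, 1)
    five_year = rate_function(p_1960, p_1965, 5)
    print(f"{name:12s} 1-year: {one_year:.5f}  5-year: {five_year:.5f}")
```

Over a single year the arithmetic and geometric rates coincide and the exponential rate is close to both. Over five years the three rates separate, which is the behaviour described above.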
A stochastic process is a model for a time-dependent random phenomenon. Modelling population or population growth as a stochastic process is useful, since population data is a time indexed random variable, and there are a lot of tools within the field of stochastic process which could be used to model population size, and changes in order to better describe the population dynamics. In this section we will not directly model population, rather we will model population changes annually, since it is much easier to study and construct models for the increments of a process rather than the process itself.
In this section we will fit several models for stochastic processes, starting with a martingale process, which could be the simplest stochastic process model. A martingale is simply a stochastic process where the best estimate of the future state of the process is the process’ current state. Under this model, next year’s population increment is forecast to equal the previous year’s increment, making this process the simplest one. The motivation for such a model, where we assume that the expected change in the annual increment is 0, is that population increments are affected by factors which add to the population, e.g. fertility rates and immigration rates, and by factors which subtract from it, such as mortality rates and emigration rates. If their net effect on the increment is 0, the increment does not change from one year to the next.
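A minimal Python sketch of this forecast follows, assuming the martingale is applied to the annual increments, so that each forecast is last year's population plus last year's increment. The function name is illustrative, and the inputs are the actual populations for 1960-1963 from the comparison table at the end of this article.

```python
def martingale_forecasts(population):
    """Forecast P[t] as P[t-1] plus the previous increment (P[t-1] - P[t-2])."""
    return [
        population[t - 1] + (population[t - 1] - population[t - 2])
        for t in range(2, len(population))
    ]


# Actual population for 1960, 1961, 1962 and 1963.
actual = [8120080, 8377696, 8647011, 8928511]

# Forecasts are available from the third year onwards (1962 and 1963).
forecasts = martingale_forecasts(actual)
for year, forecast in zip((1962, 1963), forecasts):
    print(year, forecast)
```

With these inputs the sketch gives 8635312 for 1962 and 8916326 for 1963, which agree with the Martingale column of that table.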
This section will cover MA(q) models, where we will experiment with simple and weighted moving average models, both in an expanding window fashion and in a rolling window fashion, in order to determine which model is better suited to our population data. (Note 23: Moving average models.)
In this section, AR(p) models are implemented to model the population changes. (Note 24: Auto-regressive models.) An AR(p) model is given by the following:
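The difference between expanding and rolling windows, and between simple and weighted averages, can be sketched in Python as follows. The weighting scheme (linearly increasing weights 1..n) and the function names are assumptions made for illustration, since the weights used in the analysis are not stated. The increments are computed from the actual populations for 1960-1965 in the comparison table at the end of this article.

```python
def sma(values):
    """Simple moving average: every observation gets equal weight."""
    return sum(values) / len(values)


def wma(values):
    """Weighted moving average with weights 1..n (an assumption):
    the most recent observation gets the largest weight."""
    weights = range(1, len(values) + 1)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def one_step_forecasts(increments, average, window=None):
    """Forecast each increment from the increments before it.

    window=None uses all earlier increments (expanding window);
    window=k uses at most the k most recent ones (rolling window).
    """
    forecasts = []
    for t in range(1, len(increments)):
        start = 0 if window is None else max(0, t - window)
        forecasts.append(average(increments[start:t]))
    return forecasts


# Annual increments for 1961-1965.
increments = [257616, 269315, 281500, 294181, 307481]

expanding_sma = one_step_forecasts(increments, sma)
rolling_wma = one_step_forecasts(increments, wma, window=3)
print("expanding SMA :", expanding_sma)
print("rolling WMA(3):", rolling_wma)
```

Because the increments grow every year, an average that favours recent observations (a rolling window or increasing weights) lags the true increment less than an expanding simple average does.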
\[P_t = \beta_0 + \beta_1 * P_{t-1}+ \beta_2 * P_{t-2} +...+ \beta_p * P_{t-p} + \epsilon_t\]
where:
\(P_t\) : The estimate of the population increment at time t.
\(P_{t-i}\) : The population increments in the year \(t-i\).
\(\beta_i\) : The coefficient of the model for the \(i^{th}\) year.
\(\epsilon_t\) : The random error component, which simply accounts for unexplained variation of the estimate \(P_t\).
The model fitted in this section is an AR(1) model, which implies that the population change now is a function of the most recent year’s population change plus some constant and a random shock. The AR(p) models are fitted with an intercept, and the analysis employs both an expanding window AR(p) process and a rolling window type.
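As an illustration of fitting an AR(1) model with an intercept, the sketch below estimates \(\beta_0\) and \(\beta_1\) by ordinary least squares, regressing each increment on the previous one. The estimation method, the function names and the minimum of three observations per fit are assumptions; the analysis does not state how its AR models were estimated. The increments are derived from the actual populations for 1960-1965 in the comparison table at the end of this article.

```python
def fit_ar1(series):
    """OLS fit of x[t] = b0 + b1 * x[t-1]; returns (b0, b1)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def ar1_forecasts(series, window=None, min_obs=3):
    """One-step-ahead forecasts, refitting the model at every step.

    window=None refits on all earlier values (expanding window);
    window=k refits on the k most recent values (rolling window).
    """
    if window is not None and window < min_obs:
        raise ValueError("window must contain at least min_obs values")
    forecasts = []
    for t in range(min_obs, len(series)):
        history = series[:t] if window is None else series[t - window:t]
        b0, b1 = fit_ar1(history)
        forecasts.append(b0 + b1 * history[-1])
    return forecasts


# Annual increments for 1961-1965.
increments = [257616, 269315, 281500, 294181, 307481]
print("expanding:", ar1_forecasts(increments))
print("rolling  :", ar1_forecasts(increments, window=3))
```

With so few observations the fits are illustrative only; the point is the mechanics of refitting on an expanding or rolling window.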
This section covers modelling population increments using a Poisson model. (Note 25: The Poisson distribution is a common discrete distribution used for counting processes and other processes, with the unique characteristic that the distribution’s mean and variance are equal, and that it is strictly defined for non-negative integers only.) We make a strong assumption that over the period, Kenya has not had a population decrease and will not have one in the near future, since it is a developing nation; thus we use the Poisson distribution to model population changes. The Poisson distribution is a discrete distribution for non-negative integers, given by the following probability mass function: \[f(x;\lambda) = \frac{e^{- \lambda} \lambda^x}{x!}, \hspace{1 mm}x = 0,1,2,...\] whereby for our analysis, \(x\) is the population increment.
The poisson distribution is fitted to the population increments both in an expanding window and a rolling window: a 5-year and a 10-year window. An estimate for the population sizes over the years is calculated and compared to the actual population.
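A short Python sketch of the fitting step, assuming the rate \(\lambda\) is estimated by maximum likelihood, which for a Poisson distribution is simply the sample mean of the increments in the window. The function names are illustrative, and the log of the mass function is computed with `math.lgamma` because \(x!\) is far too large to evaluate directly for increments in the hundreds of thousands. The example window (1961-1965) is chosen for illustration and is not meant to reproduce the figures in the comparison table.

```python
import math


def poisson_mle(increments):
    """Maximum-likelihood estimate of lambda: the sample mean."""
    return sum(increments) / len(increments)


def poisson_log_pmf(x, lam):
    """log f(x; lambda) = -lambda + x*log(lambda) - log(x!)."""
    return -lam + x * math.log(lam) - math.lgamma(x + 1)


# Annual increments for 1961-1965, from the actual population figures.
increments = [257616, 269315, 281500, 294181, 307481]
lam_hat = poisson_mle(increments)

# Expected increment under the fitted model is lambda itself, so a
# one-step population estimate is last year's population plus lambda.
population_1965 = 9530173
estimate_1966 = population_1965 + lam_hat

print("lambda estimate        :", lam_hat)
print("1966 estimate          :", estimate_1966)
print("log-pmf at last value  :", poisson_log_pmf(increments[-1], lam_hat))
```

Since the estimate of \(\lambda\) is a plain average, an expanding-window Poisson fit yields the same forecasts as an expanding simple moving average.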
In this section we explore the population dynamics using a linear regression model where the population growth/changes are a function of several independent variables such as: Fertility rates, mortality rates, migration rates, Life expectancy, HDI etc. The linear regression models fitted are of the form:
\[P_t = \beta_0 + \beta_1 * X_{1(t-1)} + ... + \beta_k * X_{k(t-1)} + \epsilon_t\]
where:
\(P_t\) : The population increment(estimate) for time t.
\(\beta_i\) : The coefficients of the regression models.
\(X_{i(t-1)}\) : The explanatory variables chosen, which are lagged by one period.
\(\epsilon_t\) : The random error component, which accounts for unexplained variation in \(P_t\).
We fit several models while comparing their explanatory power, and their goodness of fit(using the R-squared statistic). To select good models we resort to using step-wise selection in order to get the variables which are good predictors for the population increments while keeping the model parsimonious.
The first model is illustrated below:
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 0.0133013 | 0.0005364 | 24.796367 | 0.0000000 |
Fertility Rate | 0.0048569 | 0.0001570 | 30.940720 | 0.0000000 |
Mortality Rate | -0.0009798 | 0.0000674 | -14.526138 | 0.0000000 |
Net Migration Rate | 0.0007144 | 0.0002749 | 2.598868 | 0.0118404 |
From the model above, it is evident that the fertility rate and net migration rate have a positive linear relationship with the population growth rate, while the mortality rate has a negative relationship with the population growth rate, as shown by their model coefficients. This agrees with our previous visualizations, where we illustrated the relationships graphically. In this model, all the terms are significant at the 5% level, and the R squared statistic is 0.9589, indicating that ~96% of the total variability in the population growth rate is explained by this model, which makes it a decent starting point.
We now include a fourth variable, the Human Development Index (HDI), which the exploratory analysis above showed to have a positive linear relationship with growth rates. We include the HDI as a predictor in our model and perform step-wise selection, which leaves only fertility rates and HDI as the best predictors for our model.
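To show how the fitted coefficients translate into a prediction, the sketch below plugs predictor values into the first model. The coefficients are taken from the table above. The input values (a fertility rate of 7.0, a mortality rate of 15.0 and a net migration rate of 0.0 for the previous year) are hypothetical, chosen only for illustration and not taken from the dataset, and the units are assumed to match those used when the model was fitted.

```python
# Coefficients from the first fitted model.
COEFFICIENTS = {
    "intercept": 0.0133013,
    "fertility_rate": 0.0048569,
    "mortality_rate": -0.0009798,
    "net_migration_rate": 0.0007144,
}


def predicted_growth_rate(fertility_rate, mortality_rate, net_migration_rate):
    """Linear predictor using the previous year's explanatory variables."""
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["fertility_rate"] * fertility_rate
        + COEFFICIENTS["mortality_rate"] * mortality_rate
        + COEFFICIENTS["net_migration_rate"] * net_migration_rate
    )


# Hypothetical inputs, for illustration only.
growth = predicted_growth_rate(7.0, 15.0, 0.0)
print(f"predicted growth rate: {growth:.4f}")
```

Raising the fertility or net migration input increases the prediction and raising the mortality input lowers it, matching the signs of the coefficients.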
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.0273351 | 0.0032585 | -8.38882 | 0 |
Fertility Rate | 0.0063553 | 0.0002654 | 23.94238 | 0 |
HDI | 0.0480980 | 0.0039560 | 12.15832 | 0 |
The step-wise selection procedure applied during model fitting selects the best predictors while keeping the model parsimonious, which leads to the omission of the net migration rate and the mortality rate from the model. The fitted model shows that the HDI and fertility rates have a positive linear relationship with the population changes. The model has an adjusted R squared statistic of ~0.98, indicating that ~98% of the variability in population changes is explained by the model. Thus the HDI is a good predictor of population changes.
In this section we compare all the models fitted in order to gauge which model best explains population increments with the highest precision. We will use the mean squared error(MSE), and the tracking error which is simply the standard deviation of the difference between the population estimate and the actual population to determine the best population process model.
The tracking error is given by:
\[\sqrt{Var(\hat{P_t} - P_t)}\] where
\(P_t\) : The actual population.
\(\hat{P_t}\) : The estimated population.
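Both comparison metrics are straightforward to compute. The Python sketch below is a minimal version that assumes the sample standard deviation (`statistics.stdev`, with an \(n-1\) denominator) for the tracking error, since the text does not say whether the sample or population variance is used. The example values are the Martingale and Actual Population columns for 1961-1964 from the comparison table that follows.

```python
import statistics


def mean_squared_error(estimated, actual):
    """Average of the squared differences between estimate and actual."""
    return sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual)


def tracking_error(estimated, actual):
    """Standard deviation of the differences (sample version, an assumption)."""
    return statistics.stdev(e - a for e, a in zip(estimated, actual))


# Martingale estimates and actual population, 1961-1964.
martingale = [8366537, 8635312, 8916326, 9210011]
actual = [8377696, 8647011, 8928511, 9222692]

print("MSE           :", mean_squared_error(martingale, actual))
print("tracking error:", tracking_error(martingale, actual))
```

The two metrics answer different questions: a model that is wrong by the same amount every year has a tracking error of zero but a non-zero MSE, so reporting both separates systematic bias from year-to-year variability in the errors.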
We will not include the expanding SMA population estimates, since they are equivalent to the expanding Poisson distribution estimates.
The linear model used is the first linear model, where the population changes are a function of fertility rates, mortality rates and net migration rates.
Year | Expected Geometric | Martingale | Expanding WMA | Rolling SMA(5) | Rolling WMA(5) | AR(1) | AR(1)-weighted | Expanding Poisson | Rolling(5) Poisson | Rolling(10) Poisson | Estimated LM | Actual Population |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1960 | 8116820 | 8109533 | 8084924 | 8090606 | 8096759 | 8114645 | 8118911 | 8073274 | 8086198 | NA | NA | 8120080 |
1961 | 8374251 | 8366537 | 8337773 | 8346564 | 8353041 | 8371062 | 8375264 | 8324412 | 8341975 | 8324412 | 8377680 | 8377696 |
1962 | 8643485 | 8635312 | 8602042 | 8614190 | 8621034 | 8639223 | 8643356 | 8586872 | 8609369 | 8586872 | 8649849 | 8647011 |
1963 | 8924984 | 8916326 | 8878276 | 8894060 | 8901289 | 8919695 | 8923756 | 8861198 | 8888975 | 8865639 | 8934495 | 8928511 |
1964 | 9219175 | 9210011 | 9166952 | 9186670 | 9194273 | 9212892 | 9216877 | 9147876 | 9181302 | 9156924 | 9232154 | 9222692 |
1965 | 9526566 | 9516873 | 9468565 | 9492505 | 9500461 | 9519256 | 9523162 | 9447401 | 9486855 | 9461284 | 9540878 | 9530173 |
1966 | 9847905 | 9837654 | 9783747 | 9812191 | 9820498 | 9839415 | 9843239 | 9760400 | 9806264 | 9779397 | 9863641 | 9851444 |
1967 | 10183545 | 10172715 | 10112982 | 10146193 | 10154853 | 10173984 | 10177721 | 10087361 | 10140004 | 10111788 | 10201002 | 10187478 |
1968 | 10534974 | 10523512 | 10457293 | 10495571 | 10504648 | 10523805 | 10527454 | 10429285 | 10489108 | 10459500 | 10553956 | 10539894 |
1969 | 10904501 | 10892310 | 10818404 | 10862170 | 10871838 | 10890982 | 10894534 | 10787846 | 10855374 | 10824269 | 10924236 | 10910675 |
1970 | 11294500 | 11281456 | 11198412 | 11248271 | 11258787 | 11278142 | 11281588 | 11165091 | 11241035 | 11208217 | 11311974 | 11301394 |
1971 | 11706105 | 11692113 | 11598939 | 11655638 | 11667214 | 11687222 | 11690549 | 11562625 | 11647844 | 11613009 | 11720631 | 11713048 |
1972 | 12139697 | 12124702 | 12020966 | 12085368 | 12098005 | 12118811 | 12122008 | 11981442 | 12076860 | 12039681 | 12151244 | 12146068 |
1973 | 12595096 | 12579088 | 12464865 | 12537786 | 12551258 | 12572762 | 12575824 | 12421945 | 12528505 | 12488647 | 12604298 | 12600797 |
1974 | 13072550 | 13055526 | 12930921 | 13012977 | 13026990 | 13048853 | 13051776 | 12884450 | 13003016 | 12960232 | 13080150 | 13077341 |
1975 | 13571907 | 13553885 | 13419179 | 13510674 | 13524989 | 13547102 | 13549885 | 13369031 | 13500248 | 13454507 | 13575764 | 13575907 |
1976 | 14093481 | 14074473 | 13929801 | 14030809 | 14045299 | 14067479 | 14070120 | 13875872 | 14020112 | 13971653 | 14094309 | 14096263 |
1977 | 14636564 | 14616619 | 14462487 | 14572906 | 14587473 | 14609853 | 14612351 | 14404705 | 14562074 | 14511362 | 14635553 | 14638890 |
1978 | 15202405 | 15181517 | 15017715 | 15137454 | 15152094 | 15174266 | 15176623 | 14956006 | 15126530 | 15074112 | 15199994 | 15205374 |
1979 | 15793779 | 15771858 | 15597141 | 15726289 | 15741218 | 15763017 | 15765229 | 15531396 | 15715258 | 15661546 | 15789285 | 15797776 |
1980 | 16413258 | 16390178 | 16202918 | 16341863 | 16357449 | 16379271 | 16381329 | 16132983 | 16330605 | 16275765 | 16403481 | 16417197 |
1981 | 17060905 | 17036618 | 16836164 | 16985455 | 17001982 | 17024605 | 17026496 | 16761878 | 16973839 | 16917789 | 17045643 | 17063876 |
1982 | 17736028 | 17710555 | 17497075 | 17657398 | 17674801 | 17698298 | 17700013 | 17418299 | 17645204 | 17587738 | 17716141 | 17736326 |
1983 | 18435276 | 18408776 | 18184025 | 18355813 | 18373560 | 18398001 | 18399540 | 18100687 | 18343003 | 18283896 | 18413205 | 18431761 |
1984 | 19154464 | 19127196 | 18894032 | 19077038 | 19094311 | 19119203 | 19120574 | 18806155 | 19063906 | 19003187 | 19134067 | 19146400 |
1985 | 19888747 | 19861039 | 19623092 | 19816124 | 19832070 | 19856822 | 19858044 | 19530801 | 19803237 | 19741454 | 19865028 | 19877083 |
1986 | 20635651 | 20607766 | 20367886 | 20569060 | 20583073 | 20606706 | 20607803 | 20271378 | 20556967 | 20495241 | 20611889 | 20622560 |
1987 | 21395996 | 21368037 | 21127129 | 21334296 | 21346383 | 21368224 | 21369217 | 21026610 | 21323453 | 21263164 | 21373161 | 21382112 |
1988 | 22169639 | 22141664 | 21900101 | 22111269 | 22121874 | 22142567 | 22143465 | 21795770 | 22101818 | 22044461 | 22148261 | 22153676 |
1989 | 22953082 | 22925240 | 22684669 | 22898059 | 22907573 | 22928203 | 22929010 | 22576752 | 22889901 | 22836838 | 22934850 | 22935092 |
1990 | 23744071 | 23716508 | 23478606 | 23692830 | 23701333 | 23721629 | 23722358 | 23367356 | 23685647 | 23637793 | 23726627 | 23724579 |
1991 | 24541242 | 24514066 | 24280092 | 24494078 | 24501403 | 24520966 | 24521631 | 24165774 | 24487608 | 24445197 | 24525691 | 24521703 |
1992 | 25345610 | 25318827 | 25088721 | 25301531 | 25307736 | 25326160 | 25326772 | 24971579 | 25295806 | 25258476 | 25331348 | 25326078 |
1993 | 26156839 | 26130453 | 25904136 | 26114871 | 26120293 | 26138170 | 26138733 | 25784395 | 26109997 | 26077187 | 26143419 | 26136216 |
1994 | 26972269 | 26946354 | 26724823 | 26932724 | 26937546 | 26955558 | 26956074 | 26602715 | 26928566 | 26899842 | 26960109 | 26950513 |
1995 | 27790180 | 27764810 | 27549151 | 27753597 | 27757772 | 27775617 | 27776096 | 27424916 | 27749985 | 27724945 | 27776300 | 27768296 |
1996 | 28610894 | 28586079 | 28376462 | 28577039 | 28580455 | 28597558 | 28598010 | 28250330 | 28573830 | 28552104 | 28594644 | 28589451 |
1997 | 29434889 | 29410606 | 29206680 | 29403000 | 29405747 | 29422199 | 29422628 | 29078857 | 29400263 | 29381484 | 29415007 | 29415659 |
1998 | 30265744 | 30241867 | 30041595 | 30233575 | 30236175 | 30251778 | 30252185 | 29912231 | 30231318 | 30215031 | 30239122 | 30250488 |
1999 | 31109010 | 31085317 | 30884951 | 31073342 | 31076641 | 31091659 | 31092033 | 30754107 | 31071223 | 31056704 | 31070635 | 31098757 |
2000 | 31970813 | 31947026 | 31741772 | 31928405 | 31933382 | 31948548 | 31948866 | 31609410 | 31925847 | 31911946 | 31933684 | 31964557 |
2001 | 32854461 | 32830357 | 32616309 | 32803809 | 32811232 | 32827785 | 32828016 | 32482312 | 32800231 | 32785417 | 32814282 | 32848564 |
2002 | 33757019 | 33732571 | 33509248 | 33700386 | 33710157 | 33729320 | 33729437 | 33373501 | 33695275 | 33678017 | 33713076 | 33751739 |
2003 | 34679747 | 34654914 | 34421574 | 34618955 | 34630450 | 34650698 | 34650698 | 34283950 | 34612120 | 34590833 | 34631102 | 34678779 |
2004 | 35631281 | 35605819 | 35358140 | 35564437 | 35577431 | 35596903 | 35596778 | 35218439 | 35555965 | 35529024 | 35573130 | 35635271 |
2005 | 36618144 | 36591763 | 36324710 | 36542573 | 36557534 | 36577255 | 36576976 | 36182650 | 36532734 | 36498821 | 36560928 | 36624895 |
2006 | 37642002 | 37614519 | 37325054 | 37556962 | 37574599 | 37596326 | 37595856 | 37180315 | 37545918 | 37504384 | 37583082 | 37649033 |
2007 | 38701809 | 38673171 | 38360560 | 38609126 | 38629427 | 38653590 | 38652905 | 38212823 | 38596445 | 38547281 | 38641027 | 38705932 |
2008 | 39792501 | 39762831 | 39429368 | 39696770 | 39718594 | 39744997 | 39744088 | 39278373 | 39682160 | 39625612 | 39732984 | 39791981 |
2009 | 40908503 | 40878030 | 40527709 | 40814621 | 40836380 | 40863801 | 40862679 | 40373277 | 40798688 | 40735283 | 40855264 | 40901792 |
2010 | 42042556 | 42011603 | 41649990 | 41955096 | 41975248 | 42002756 | 42001446 | 41492046 | 41938960 | 41870092 | 41991804 | 42030676 |
2011 | 43190717 | 43159560 | 42791355 | 43111832 | 43129325 | 43155398 | 43153933 | 42629907 | 43096576 | 43024486 | 43147764 | 43178274 |
2012 | 44357206 | 44325872 | 43951435 | 44284122 | 44299071 | 44322065 | 44320477 | 43786495 | 44270503 | 44197702 | 44322804 | 44343467 |
2013 | 45540103 | 45508660 | 45129073 | 45470974 | 45484045 | 45505969 | 45504259 | 44960671 | 45459206 | 45388458 | 45515710 | 45519981 |
2014 | 46727710 | 46696495 | 46317803 | 46665581 | 46676895 | 46700075 | 46698251 | 46146063 | 46655655 | 46589821 | 46720068 | 46700055 |
2015 | 47910722 | 47880129 | 47509638 | 47859707 | 47868460 | 47891467 | 47889571 | 47334794 | 47851400 | 47792898 | 47917127 | 47878336 |
2016 | 49086346 | 49056617 | 48699092 | 49047868 | 49052951 | 49073308 | 49071388 | 48521437 | 49041093 | 48991341 | 49111633 | 49051534 |
2017 | 50253480 | 50224732 | 49882811 | 50226186 | 50227371 | 50244713 | 50242805 | 49702666 | 50221677 | 50181228 | 50300166 | 50221142 |
2018 | 51418639 | 51390750 | 51062370 | 51396677 | 51395297 | 51409239 | 51407364 | 50880013 | 51394953 | 51364061 | 51484356 | 51392565 |
2019 | 52591312 | 52563988 | 52243363 | 52567081 | 52565350 | 52577073 | 52575221 | 52058973 | 52567414 | 52545895 | 52669698 | 52573973 |
2020 | 53782539 | 53755381 | 53434217 | 53748756 | 53749055 | 53760295 | 53758431 | 53247845 | 53749638 | 53735972 | 53869140 | 53771296 |
2021 | 54995887 | 54968619 | 54641036 | 54949888 | 54953891 | 54967602 | 54965673 | 54452646 | 54949836 | 54941250 | 55084376 | 54985698 |
2022 | 56227527 | 56200100 | 55865012 | 56172530 | 56180230 | 56197916 | 56195884 | 55674556 | 56170258 | 56163427 | 56316308 | 56215221 |
The mean squared error plot for the various population processes is shown below:
The tracking error for the population processes is shown below:
The mean squared error comparison plot ranks the population process models chosen for this analysis from largest to smallest estimation error. The population estimated under the 1-year expected geometric rate, followed by the martingale process, has the lowest mean squared error, while the process with the highest error is the expanding Poisson process (equivalent to the expanding SMA(q) process).
For the tracking error statistic, the process that tracks the population most closely, with the least volatility, is the martingale population process, followed by the expected geometric population process.
Thus, for simple explanatory relationships of population changes, we prefer a linear model with the fertility, mortality and net migration rates as explanatory variables, since the relationship was not significant for the other variables. It is also important to note that variables such as unemployment rates do not have a direct relationship with population changes, but they do have direct relationships with other explanatory variables such as fertility rates (which have a direct relationship with population growth). In addition, the data for certain variables, such as poverty rates and suicide rates, was incomplete, so they could not be included in the study.
Thus, for simple population projection, we prefer the geometric mathematical model, closely followed by the martingale model for population changes.
Future work could explore constructing linear models of population changes with other variables of interest that affect population. The linear models used could be extended to polynomial effects models and log-transformed models, which could possibly have more explanatory power than the simple linear model used.
Other variables, such as a political stability index, development index, wealth index, religion indexes and literacy rate indexes, could be included in future work once sufficient data is available for meaningful analysis.
Ross, S. M. (1996). Stochastic processes (2nd ed.). New York: Wiley.
Olive, D. J. (2017). Multiple linear regression. In Linear regression (pp. 17-83). Springer, Cham.