Supplementary Information from Exploring the High-Resolution Mapping of Gender Disaggregated Development Indicators. Bosco, C.; Alegana, V. A.; Bird, T.; Pezzulo, C.; Bengtsson, L.; Sorichetta, A.; Steele, J.; Hornby, G.; Ruktanonchai, C. W.; Ruktanonchai, N. W.; Wetter, E.; and Tatem, A. J.
[Excerpt: Datasets] The Demographic and Health Surveys (DHS) is a program of national household surveys implemented across a large number of LMICs. The DHS Program collects and analyses data on population demographic and health characteristics through more than 300 surveys in over 90 countries. The gender-disaggregated data we investigated in this report come from DHS datasets.

[] [...]

[Models specification]

[::Bayesian model specification] The Gaussian Field (GF) in INLA is represented as a Gaussian Markov Random Field (GMRF). Computations in INLA are carried out using the GMRF by approximating a spatio-temporal random field with a weighted sum of basis functions. The advantage of computing with the GMRF as an approximation to a GF with Matérn covariance comes from the Markovian property of the former, which results in sparse matrices that are computationally efficient. [...]

[::Artificial Neural Networks specification] An artificial neuron is a computational model inspired by natural neurons. Natural neurons receive signals through synapses located on the dendrites or membrane of the neuron. When the signals received are strong enough (surpass a certain threshold), the neuron is activated and emits a signal through the axon. This signal might be sent to another synapse, and might activate other neurons. [] The complexity of real neurons is highly abstracted when modelling artificial neurons. These basically consist of inputs (like synapses), which are multiplied by weights (strength of the respective signals), and then computed by a mathematical function which determines the activation of the neuron. Another function computes the output of the artificial neuron. An ANN is implemented by a system of interconnected nodes. Information propagates through the nodes, transforming the inputs into intermediate derived signals until the final outputs are generated. The internal nodes are called neurons and define the ANN hidden layers. Each node is a processing element propagating weighted inputs received from other nodes [...]

[Selection of geospatial covariate layers]

To obtain a more appropriate combination of covariates for producing high-resolution prediction maps of each modelled indicator, a sensitivity analysis using a jackknife approach was carried out [...]. The jackknife analysis consists of dropping one observation at a time from a set of data and recalculating the estimate each time. It was developed by Maurice Quenouille (1949, 1956); John Tukey (1958) expanded on the technique and proposed the name "jackknife".

[] Within the modelling architectures, categorical covariates with more than two levels were recoded into a number of separate dichotomous variables in order for the results to be interpretable. All covariates were also normalized to have a mean of zero and unit variance. [...]

[Semantic Array programming]

Managing heterogeneous arrays of data and data transformation models in a systematic and structured way is a challenging task. The multiplicity of model families, covariates and modelled quantities in this work required the support of a common, flexible and scalable modelling architecture. The applied modelling architecture is based on the Semantic Array Programming (SemAP) paradigm (de Rigo, 2012; 2015). Array programming (AP) emerged as a way to reduce the gap between mathematical notation and algorithm implementations by promoting arrays (vectors, matrices, tensors) as atomic quantities with compact manipulating operators (Iverson, 1980). Atomicity here implies that even a large array of data is managed as a single logical piece of information. For example, a regional-scale gridded layer may be managed by AP languages as if it were a single variable instead of a large matrix of elements. A disciplined use of AP (Iverson, 1980) may allow nontrivial data-processing to be expressed with very concise expressions (Taylor, 2003) and a potentially simpler control flow. However, this capability for abstraction and simplification of AP may be limited by the very same generality of AP data structures: multi-dimensional arrays where the value of some elements may be infinite or not-a-number (IEEE 754 standard) or even complex-valued (de Rigo, 2015). The Semantic Array Programming paradigm has been introduced for supporting a disciplined semantics-aware implementation of AP concepts and methods, with additional systematic semantic checks for the semantic correctness of the chain of modelling blocks (de Rigo, 2012).

[] This is why our computational modelling methodology follows the SemAP paradigm by combining a concise implementation of the model with its conceptual subdivision into semantically enhanced abstract modules. [...]

[Results]

This section presents the results for the gender-disaggregated indicator mapping addressed in this project. We organize the presentation of results by indicator, at gender disaggregated level, in the following order: literacy, stunting in children, use of modern contraception methods. For each indicator, the results of a first exploratory analysis are presented with gender disaggregated histograms showing the basic statistical distribution of the indicator at cluster level and a scatter plot of the predicted versus observed data both in training and validation. We then present the results of the covariate selection exercise, detailing which covariates were selected as the optimum performing set for the given indicator for each country at gender disaggregated level and, for each indicator having an associated modelling explained variance higher than 0.3, we show maps of the survey clusters and the indicator value at each cluster, maps of the predicted proportion of modelled indicators and the level of uncertainty associated with these maps in each pixel, and finally the quantile-quantile (QQ) plot in training and validation. The maps reported in the following paragraphs are: male and female literacy rate in Nigeria and Kenya, female literacy rate in Tanzania, male and female stunting in Nigeria and the proportion of women using modern contraception methods in Nigeria and Tanzania. [...]

[] [...]
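The covariate preparation and the jackknife named in the excerpt can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`one_hot`, `standardize`, `jackknife`), the choice of one dichotomous column per level, the use of the population variance, and the toy values are all assumptions made for illustration. The standard-error formula is the textbook jackknife one and is not stated in the excerpt.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Recode a categorical covariate into one 0/1 column per level (assumed scheme)."""
    levels = sorted(set(values))
    return {level: [1.0 if v == level else 0.0 for v in values] for level in levels}


def standardize(values):
    """Rescale a covariate to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("a constant covariate cannot be standardized")
    return [(v - mu) / sigma for v in values]


def jackknife(values, estimator=mean):
    """Drop one observation at a time, re-estimate, and return (estimates, std. error)."""
    n = len(values)
    estimates = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    centre = mean(estimates)
    # Textbook jackknife variance: (n - 1) / n times the sum of squared deviations.
    variance = (n - 1) / n * sum((e - centre) ** 2 for e in estimates)
    return estimates, variance ** 0.5


# Hypothetical toy data, not taken from the report.
land_cover = ["urban", "crop", "forest", "crop"]
elevation = [120.0, 340.0, 560.0, 780.0]
literacy = [0.62, 0.48, 0.55, 0.71]

dummies = one_hot(land_cover)        # three dichotomous columns
z = standardize(elevation)           # zero mean, unit variance
loo_means, se = jackknife(literacy)  # four leave-one-out means and their spread
```

For the sample mean, the jackknife standard error computed this way coincides with the usual standard error of the mean, which makes the sketch easy to check by hand.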
@report{boscoSupplementaryInformationExploring2017,
  title = {Supplementary {{Information}} from {{Exploring}} the High-Resolution Mapping of Gender Disaggregated Development Indicators},
  author = {Bosco, Claudio and Alegana, Victor A. and Bird, Tomas and Pezzulo, Carla and Bengtsson, Linus and Sorichetta, Alessandro and Steele, Jessica and Hornby, Graeme and Ruktanonchai, Corrine W. and Ruktanonchai, Nick W. and Wetter, Erik and Tatem, Andrew J.},
  date = {2017},
  institution = {{figshare, Digital Science}},
  location = {{Cambridge, Massachusetts, United States}},
  doi = {10.6084/m9.figshare.4775374},
  url = {https://doi.org/10.6084/m9.figshare.4775374},
  abstract = {[Excerpt: Datasets] The Demographic and Health Surveys (DHS) is a program of national household surveys implemented across a large number of LMICs. The DHS Program collects and analyses data on population demographic and health characteristics through more than 300 surveys in over 90 countries. The gender-disaggregated data we investigated in this report come from DHS datasets.

[] [...]

[Models specification]

[::Bayesian model specification] The Gaussian Field (GF) in INLA is represented as a Gaussian Markov Random Field (GMRF). Computations in INLA are carried out using the GMRF by approximating a spatio-temporal random field with a weighted sum of basis functions. The advantage of computing with the GMRF as an approximation to a GF with Matérn covariance comes from the Markovian property of the former, which results in sparse matrices that are computationally efficient. [...]

[::Artificial Neural Networks specification] An artificial neuron is a computational model inspired by natural neurons. Natural neurons receive signals through synapses located on the dendrites or membrane of the neuron. When the signals received are strong enough (surpass a certain threshold), the neuron is activated and emits a signal through the axon. This signal might be sent to another synapse, and might activate other neurons. [] The complexity of real neurons is highly abstracted when modelling artificial neurons. These basically consist of inputs (like synapses), which are multiplied by weights (strength of the respective signals), and then computed by a mathematical function which determines the activation of the neuron. Another function computes the output of the artificial neuron. An ANN is implemented by a system of interconnected nodes. Information propagates through the nodes, transforming the inputs into intermediate derived signals until the final outputs are generated. The internal nodes are called neurons and define the ANN hidden layers. Each node is a processing element propagating weighted inputs received from other nodes [...]

[Selection of geospatial covariate layers]

To obtain a more appropriate combination of covariates for producing high-resolution prediction maps of each modelled indicator, a sensitivity analysis using a jackknife approach was carried out [...]. The jackknife analysis consists of dropping one observation at a time from a set of data and recalculating the estimate each time. It was developed by Maurice Quenouille (1949, 1956); John Tukey (1958) expanded on the technique and proposed the name "jackknife".

[] Within the modelling architectures, categorical covariates with more than two levels were recoded into a number of separate dichotomous variables in order for the results to be interpretable. All covariates were also normalized to have a mean of zero and unit variance. [...]

[Semantic Array programming]

Managing heterogeneous arrays of data and data transformation models in a systematic and structured way is a challenging task. The multiplicity of model families, covariates and modelled quantities in this work required the support of a common, flexible and scalable modelling architecture. The applied modelling architecture is based on the Semantic Array Programming (SemAP) paradigm (de Rigo, 2012; 2015). Array programming (AP) emerged as a way to reduce the gap between mathematical notation and algorithm implementations by promoting arrays (vectors, matrices, tensors) as atomic quantities with compact manipulating operators (Iverson, 1980). Atomicity here implies that even a large array of data is managed as a single logical piece of information. For example, a regional-scale gridded layer may be managed by AP languages as if it were a single variable instead of a large matrix of elements. A disciplined use of AP (Iverson, 1980) may allow nontrivial data-processing to be expressed with very concise expressions (Taylor, 2003) and a potentially simpler control flow. However, this capability for abstraction and simplification of AP may be limited by the very same generality of AP data structures: multi-dimensional arrays where the value of some elements may be infinite or not-a-number (IEEE 754 standard) or even complex-valued (de Rigo, 2015). The Semantic Array Programming paradigm has been introduced for supporting a disciplined semantics-aware implementation of AP concepts and methods, with additional systematic semantic checks for the semantic correctness of the chain of modelling blocks (de Rigo, 2012).

[] This is why our computational modelling methodology follows the SemAP paradigm by combining a concise implementation of the model with its conceptual subdivision into semantically enhanced abstract modules. [...]

[Results]

This section presents the results for the gender-disaggregated indicator mapping addressed in this project. We organize the presentation of results by indicator, at gender disaggregated level, in the following order: literacy, stunting in children, use of modern contraception methods. For each indicator, the results of a first exploratory analysis are presented with gender disaggregated histograms showing the basic statistical distribution of the indicator at cluster level and a scatter plot of the predicted versus observed data both in training and validation. We then present the results of the covariate selection exercise, detailing which covariates were selected as the optimum performing set for the given indicator for each country at gender disaggregated level and, for each indicator having an associated modelling explained variance higher than 0.3, we show maps of the survey clusters and the indicator value at each cluster, maps of the predicted proportion of modelled indicators and the level of uncertainty associated with these maps in each pixel, and finally the quantile-quantile (QQ) plot in training and validation. The maps reported in the following paragraphs are: male and female literacy rate in Nigeria and Kenya, female literacy rate in Tanzania, male and female stunting in Nigeria and the proportion of women using modern contraception methods in Nigeria and Tanzania. [...]

[] [...]},
  keywords = {*imported-from-citeulike-INRMM,~INRMM-MiD:c-14332189,artificial-neural-networks,education,food-security,indicators,kenya,literacy,mapping,nigeria,population-growth,poverty,semantic-array-programming,spatial-disaggregation,statistics,stunting,tanzania}
}