Can we predict the burden of acute malnutrition in crisis-affected countries? Findings from Somalia and South Sudan

Abstract

Background

Sample surveys are the mainstay of surveillance for acute malnutrition in settings affected by crises but are burdensome and have limited geographical coverage due to insecurity and other access issues. As a possible complement to surveys, we explored a statistical approach to predict the prevalent burden of acute malnutrition for small population strata in two crisis-affected countries, Somalia (2014–2018) and South Sudan (2015–2018).

Methods

For each country, we sourced datasets generated by humanitarian actors or other entities on insecurity, displacement, food insecurity, access to services, epidemic occurrence and other factors on the causal pathway to malnutrition. We merged these with datasets of sample household anthropometric surveys done at administrative level 2 (district, county) as part of nutritional surveillance. For each of several outcomes, including binary and continuous indices based on either weight-for-height or middle-upper arm circumference, we fitted generalised linear models and, as an alternative, machine learning random forests, and evaluated their predictive performance.

Results

We developed models based on 85 ground surveys in Somalia and 175 in South Sudan. Livelihood type, armed conflict intensity, measles incidence, vegetation index and water price were important predictors in Somalia, and livelihood, measles incidence, rainfall and terms of trade (purchasing power) in South Sudan. However, both generalised linear models and random forests had low performance for both binary and continuous anthropometric outcomes.

Conclusions

Predictive models had disappointing performance and are not usable for action. The range of data used and their quality probably limited our analysis. The predictive approach remains theoretically attractive and deserves further evaluation with larger datasets across multiple settings.

Background

In settings affected by crises due to armed conflict, community violence, displacement and/or food insecurity, acute malnutrition is a prominent public health threat that, at the individual level, presents a short-term mortality risk, exacerbates endemic and epidemic infectious diseases and worsens long-term developmental outcomes. Acute malnutrition prevalence among children is also a key summative indicator of crisis severity, as it reflects the wider situation of food security, livelihoods and the public health and social environment [1]. For the purpose of this paper, and in accordance with current Unicef guidance, we refer to acute malnutrition (also commonly known as wasting) as the occurrence of two partially overlapping presentations: marasmus, characterised by a recent and severe weight loss, and the rarer but more lethal oedematous form (kwashiorkor). Anthropometric indices including weight-for-height or -length, middle-upper arm circumference (MUAC) and presence of bilateral pitting oedema may be combined into continuous indicators (e.g. weight-for-height/length Z-score, relative to the mean of a well-nourished reference population: WHZ) or dichotomised based on thresholds to classify children as severely or moderately acutely malnourished (SAM, MAM), and, at the population level, compute prevalence estimates [2]. Such information helps to assess progress towards national and global targets, identify an appropriate package of food security and nutritional services, estimate resources needed (e.g. treatment caseload), monitor the performance of services and detect changes in crisis severity as part of early warning systems such as the integrated food security phase classification (IPC) [3,4,5].

Cross-sectional anthropometric surveys among children 6 to 59 months old (mo) are an important component of nutritional surveillance in crisis settings, along with facility-based and programmatic data [6]. Over the past decade, considerable progress has been made to standardise methods and analysis of these surveys. In particular, the Standardised Monitoring and Assessment of Relief and Transitions (SMART) project [7] provides generic study protocols and aids for survey design, training and quality control, as well as the bespoke Emergency Nutrition Assessment (ENA) software for sample selection, data entry and analysis. SMART surveys, usually implemented at a small geographic scale (e.g. districts or individual camps), are the most common population-based method to measure malnutrition burden in humanitarian response. However, SMART surveys are somewhat burdensome in terms of human and financial resources, require several weeks to plan, implement and report on, and may have limited geographic reach due to insecurity or other access constraints, thereby resulting in potentially biased, untimely, and/or insufficiently granular information. In other words, surveys alone may not adequately support early detection of deteriorating situations and efficient resource allocation [8]. More recently, COVID-19 related restrictions temporarily curtailed SMART survey implementation, just as the pandemic was expected to contribute to a projected doubling in the global population facing food insecurity crisis conditions, and, consequently, a substantial increase in acute malnutrition burden [9].

To complement small-scale nutrition surveys and other surveillance data, and in order to reduce the burden of repeated surveys while also generating timely information on a more regular basis at operationally useful geographical resolution, we explored the performance of predictive statistical models of acute malnutrition burden in Somalia and South Sudan, two crisis-affected countries prominently affected by service access constraints, food insecurity and malnutrition.

Methods

Study design

We used a combination of existing datasets collected for programmatic purposes by humanitarian and government actors (see below) to develop and evaluate country-specific models to predict various anthropometric indicators at the resolution of one month and a single administrative level 2 unit (district in Somalia, county in South Sudan), hereafter referred to as a ‘stratum’.

Drawing from an a priori causal framework of factors leading to acute malnutrition (Additional file 1, Figure S5), we identified potential predictor variables collected at the desired resolution and merged these with individual child-level data from SMART surveys designed to be representative of single strata. We fitted various candidate models to a training data subset, and evaluated their predictive accuracy on a validation data subset, as well as on cross-validation.

Study population and timeframe

For Somalia (including Somaliland and Puntland), we sourced predictor and anthropometric survey data from January 2014 to December 2018 inclusive. During this period, Somalia’s population rose from about 12.8 M to 14.5 M [10]. Surveys were done in 22 (29%) of Somalia’s 75 districts. For South Sudan, the analysis spanned January 2015 to April 2018, and featured surveys from 63 (80%) of the country’s 79 counties, as per 2013 administrative borders. South Sudan’s population declined from 10.2 M to 9.7 M during the period, reflecting refugee movements to neighbouring countries [11].

Data sources

Anthropometric surveys

We accessed reports and raw datasets of 177 SMART surveys from South Sudan (two were excluded due to very unusual values, leaving 175 analysis-eligible), and 167 from Somalia (82 were excluded: 76, mainly done before 2016, were representative of livelihood zones rather than districts, and thus could not be coupled with predictor data; five appeared to have followed a non-representative sampling design; one had no available dataset, leaving 85 analysis-eligible). For each survey, we inspected the report to identify any possible bias sources and, in particular, any reported restriction of the effective sampling frame due to insecurity or inaccessibility (e.g. if a report stated that two out of 12 boma, South Sudan’s administrative level 3 unit, could not be included in the sample, we approximated the sampling coverage as 10/12 ≈ 83%). We also rescaled the ENA software-reported quality score for the survey (a composite of several indicators including proportion of outlier values, digit preference and properties of the distribution of observed values, ranging from 0% = best to 50% = worst [12]) to a 0–100% range, where best = 100%. We reanalysed all surveys by converting the raw anthropometric readings (weight, height or length, age, MUAC) into z-score indices as per the World Health Organization 2006 standardised anthropometric distributions using the anthro package in R, flagging and excluding all observations with missing values, values more than 5 z-scores below or above the mean and/or ages outside the allowed range (6–59 mo). Lastly, we classified all children into severe acute malnutrition (SAM) or global acute malnutrition (GAM) according to two alternative definitions: (i) bilateral oedema and/or weight-for-height (WHZ) < −3Z (SAM) or < −2Z (GAM); (ii) bilateral oedema and/or MUAC < 115 mm (SAM) or < 125 mm (GAM) [13]. We fitted generalised linear models (binomial for SAM and GAM, gaussian otherwise) with standard errors adjusted for cluster design to verify concordance with point estimates and 95% confidence intervals (CI) contained in the survey reports.
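As an illustration only, the two alternative case definitions can be written as a short Python sketch. The analysis itself was done in R; the function names and the example children below are hypothetical, and the sketch assumes that observations with missing or flagged values have already been excluded.

```python
def classify_acute_malnutrition(whz, muac_mm, oedema):
    """Return SAM/GAM flags under the WHZ-based and MUAC-based definitions.

    whz: weight-for-height z-score; muac_mm: MUAC in millimetres;
    oedema: True if bilateral pitting oedema is present.
    """
    return {
        "sam_whz": oedema or whz < -3.0,
        "gam_whz": oedema or whz < -2.0,
        "sam_muac": oedema or muac_mm < 115.0,
        "gam_muac": oedema or muac_mm < 125.0,
    }


def prevalence(flags):
    """Unweighted proportion of children flagged as cases."""
    flags = list(flags)
    return sum(flags) / len(flags)


# Hypothetical children: (WHZ, MUAC in mm, bilateral oedema)
children = [(-3.4, 112.0, False), (-2.5, 121.0, False),
            (-1.0, 130.0, True), (0.2, 140.0, False)]
results = [classify_acute_malnutrition(*child) for child in children]
sam_whz_prevalence = prevalence(r["sam_whz"] for r in results)
gam_whz_prevalence = prevalence(r["gam_whz"] for r in results)
```

Note that a child with oedema counts as both SAM and GAM under either definition, whatever the WHZ or MUAC reading.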

Predictors

We developed a causal framework of acute malnutrition (Additional file 1, Figure S5) based on existing evidence and plausibility reasoning. We used this framework to identify factors potentially predicting the outcomes of interest. We searched for candidate predictor data representing these factors online and through contacts with humanitarian actors in both Somalia and South Sudan. The main desirable characteristics of datasets were stratification by stratum and month, and routine generation for programmatic purposes, i.e. realistic availability without further primary data collection. Most datasets had already been sourced as part of similar projects to retrospectively estimate mortality in both countries [10, 11]. Candidate predictors for Somalia and South Sudan are detailed in Tables 1 and 2, respectively. Each predictor dataset was subjected to data cleaning to remove obvious errors. We excluded predictors that were missing for ≥ 30% of strata or ≥ 30% of months. Remaining completeness problems were resolved through interpolation (humanitarian presence), manual imputation (missing market data points were attributed a weighted average of the geographically nearest market’s value and the mean of all other non-missing markets, with 0.7 and 0.3 weights respectively) and automatic imputation using the mice R package [14] (water price, SAM and MAM treatment quality). To reduce stochastic noise in the time series, we computed three-month window rolling means for all time-varying predictors and applied moderate local spline smoothing to terms of trade or market price variables. Where appropriate, we computed per-population rates using stratum-month population figures previously estimated as part of mortality estimation projects for each country. Briefly, these combine available base estimates (census projections in South Sudan; quality-weighted averages of four alternative sources in Somalia), natural growth assumptions and data on refugee as well as internal displacement to and from each stratum, by month.
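The manual market imputation and the rolling mean described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used for the analysis: the function names are hypothetical, and the text does not say whether the three-month window is trailing or centred, so a trailing window is assumed here.

```python
def impute_market_value(nearest_value, other_values, w_nearest=0.7, w_others=0.3):
    """Weighted average of the nearest market's value (weight 0.7) and the
    mean of all other non-missing markets (weight 0.3)."""
    others = [v for v in other_values if v is not None]
    return w_nearest * nearest_value + w_others * (sum(others) / len(others))


def rolling_mean(series, window=3):
    """Trailing rolling mean (assumption); the first months use whatever
    shorter history is available."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical prices: nearest market 10, other markets 20, missing, 40
imputed_price = impute_market_value(10.0, [20.0, None, 40.0])
# Hypothetical monthly series for one stratum
smoothed_series = rolling_mean([3, 6, 9, 12])
```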

Table 1 Candidate predictor datasets, Somalia
Table 2 Candidate predictor datasets, South Sudan

While for both countries data on food security and nutritional therapeutic services were available (Tables 1 and 2) and moderately predictive (data not shown), we ultimately decided to exclude them as candidate predictors for two reasons: (i) we considered that improved prediction could plausibly result in better targeting of these humanitarian services, which in turn would result in improved nutrition, a reverse-causal effect whose future size the model might fail to predict; and (ii) we assumed that end-users would benefit from a model that could be used to predict malnutrition burden even where none of these services were available, e.g. due to access constraints.

Predictive models

We explored two prediction approaches, as follows.

Generalised linear modelling

We first split the data by period into a training set (consisting of approximately the chronologically first 70% of the data) and a ‘holdout’ (i.e. validation) set (the most recent 30%). For each anthropometric indicator, we fitted generalised linear models (GLM) to individual child observations in the training dataset, with robust standard errors to account for the cluster sampling design of most surveys, a quasi-binomial distribution for binary outcomes (SAM, GAM) and a gaussian distribution for continuous outcomes (WHZ, MUAC), which we did not transform as they were normally distributed. We specified model weights as the product of survey quality score and survey sample coverage.
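A minimal Python sketch of the chronological split and the model weights follows. Several details are assumptions made for illustration: the text does not state whether the 70% cut-off was applied to surveys or to child observations (surveys are used here), the rescaling of the ENA score is assumed to be linear, and all names are hypothetical.

```python
def rescale_quality(ena_score_pct):
    """Map the ENA score (0 = best, 50 = worst) to a 0-1 scale where
    1 = best. A linear mapping is assumed."""
    return 1.0 - ena_score_pct / 50.0


def survey_weight(ena_score_pct, sampling_coverage):
    """Model weight: product of rescaled quality score and sampling coverage."""
    return rescale_quality(ena_score_pct) * sampling_coverage


def chronological_split(surveys, train_fraction=0.7):
    """Earliest ~70% of surveys form the training set, the rest the holdout."""
    ordered = sorted(surveys, key=lambda s: s["date"])
    cut = round(train_fraction * len(ordered))
    return ordered[:cut], ordered[cut:]


# Hypothetical surveys, one per month (ISO dates sort chronologically)
surveys = [{"id": i, "date": f"2016-{i:02d}"} for i in range(1, 11)]
train, holdout = chronological_split(surveys)
# Hypothetical survey: ENA score 10%, 10 of 12 bomas reachable
weight = survey_weight(10.0, 10 / 12)
```

Splitting by time rather than at random means that the holdout set mimics the intended use of the model, namely prediction for periods after those used for fitting.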

After visual inspection, we categorised continuous predictors, and selected categorical versus continuous versions of these based on linearity of the association and the smallest-possible Chi-square (for binary outcomes) or F-test (continuous outcomes) p-value testing whether the univariate model provided better fit than a null model. We also used this p-value to select among candidate lags for each predictor; however, we modelled climate variables (rainfall, Normalised Difference Vegetation Index or NDVI) as either the means of the two trimesters, or the mean over the semester prior to each survey observation. We then fitted models consisting of all possible combinations of predictors, and shortlisted the best 10% based on predictive accuracy (lowest mean square error, MSE) of model predictions, relative to observations in the holdout dataset. Predictions were compared with observations by first aggregating all individual-child predictions as yielded by the models to the stratum-month level (as a mean SAM or GAM prevalence, or the mean of continuous anthropometric outcomes, in that stratum-month).
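The aggregation of child-level predictions to stratum-months, and the MSE used to rank models, can be sketched as below. This is an illustrative Python sketch with hypothetical names and data; an unweighted mean within each stratum-month is assumed, which the text does not specify.

```python
from collections import defaultdict


def aggregate_by_stratum_month(rows):
    """rows: iterable of (stratum, month, value), one row per child.
    Returns {(stratum, month): mean value}."""
    totals = defaultdict(lambda: [0.0, 0])
    for stratum, month, value in rows:
        acc = totals[(stratum, month)]
        acc[0] += value
        acc[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


def mean_square_error(predicted, observed):
    """MSE over the stratum-months present in both dictionaries."""
    keys = predicted.keys() & observed.keys()
    return sum((predicted[k] - observed[k]) ** 2 for k in keys) / len(keys)


# Hypothetical child-level predicted probabilities of GAM
child_predictions = [("A", "2017-01", 0.10), ("A", "2017-01", 0.20),
                     ("B", "2017-01", 0.30)]
predicted = aggregate_by_stratum_month(child_predictions)
# Hypothetical observed GAM prevalence from the corresponding surveys
observed = {("A", "2017-01"): 0.10, ("B", "2017-01"): 0.20}
mse = mean_square_error(predicted, observed)
```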

We manually selected the best fixed effects model among these based on relative accuracy on holdout data, accuracy on external data simulated through leave-one-out cross-validation (LOOCV) [18], the plausibility of observed associations, and model parsimony (while the latter characteristic is relatively unimportant for prediction, in practice we wished to avoid users of the model having to collect a large amount of predictor data). Lastly, we explored plausible two-way interactions.

We also fitted mixed models (with stratum as a random effect, given that in both countries surveys were repeated in many districts / counties). The latter, however, offered inconsistent accuracy advantages over fixed effects models on either cross-validation or holdout datasets. Furthermore, we assumed that end users would be most interested in predicting malnutrition prevalence in hard-to-survey districts / counties, i.e. where no a priori random effects would be estimable. For these reasons, we discarded mixed models altogether.

Machine learning

After splitting data as above, we used the ranger package [19] to grow random forest (RF) regression models on the training dataset, aggregated at stratum-month level. This approach makes minimal assumptions about data structure. Briefly, it partitions the data according to various randomly generated ‘trees’, where each node is defined by a particular value of one of the predictor variables, and the branches are the resulting splits in the data. The ‘depth’ of each tree is defined by the number of variables that are used to create nodes. Randomness is introduced by the choice of variables to build any given tree, the values at which splits occur, and the order of variables in the tree structure. The distribution of the outcome arising from the partitions in each tree is compared to the observed data to determine accuracy. RF averages predictions across a large ensemble of trees. We grew RFs with 1000 trees, using all candidate predictors as above, and computed prediction CIs using a jack-knife estimator [20].

Performance evaluation

For both the GLM and RF approaches, we present various metrics of predictive accuracy. For estimation, these are: (i) effective coverage, defined here as the proportion of stratum-months for which the predicted point estimate fell within the 95% or 80% CIs of the observed data; (ii) relative bias, defined as \(\frac{1}{n}\sum_{i=1}^{n}\frac{{\widehat{y}}_{i}-{y}_{i}}{{y}_{i}}\), where \(n\) is the number of stratum-months, \({\widehat{y}}_{i}\) the prediction and \({y}_{i}\) the observation for stratum-month \(i\); and (iii) relative precision, namely the mean ratio of predicted stratum-month one-sided 95% CIs to point estimate. For classification, they are: (iv) sensitivity and (v) specificity of predictions against SAM or GAM prevalence thresholds commonly used in humanitarian response, adopting observed point estimates as the gold standard. While it is recommended to avoid over-reliance on thresholds and instead examine changes in malnutrition burden over time in light of contextual factors [6], in practice these arbitrary thresholds, introduced about two decades ago [21], are considered when the baseline is unclear to make initial decisions on the most appropriate nutritional and food security interventions package (e.g. management of SAM only versus of SAM and MAM; targeted versus ‘blanket’ or generalised food distributions / cash transfers).
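The five metrics can be computed as in the following Python sketch, offered as an illustration rather than as the code used for the analysis. Function names and example values are hypothetical. Two interpretive assumptions are made: the ‘one-sided’ CI is taken to be the half-width of the prediction interval, and a stratum-month is classified as above the threshold when prevalence is greater than or equal to it.

```python
def coverage(predicted, ci_lower, ci_upper):
    """Share of predictions falling within the observed survey CI."""
    hits = [lo <= p <= hi for p, lo, hi in zip(predicted, ci_lower, ci_upper)]
    return sum(hits) / len(hits)


def relative_bias(predicted, observed):
    """Mean of (prediction - observation) / observation."""
    return sum((p - o) / o for p, o in zip(predicted, observed)) / len(observed)


def relative_precision(predicted, half_widths):
    """Mean ratio of CI half-width (assumed meaning of 'one-sided') to prediction."""
    return sum(h / p for p, h in zip(predicted, half_widths)) / len(predicted)


def sensitivity_specificity(predicted, observed, threshold):
    """Classification against a prevalence threshold; observed = gold standard."""
    tp = fn = tn = fp = 0
    for p, o in zip(predicted, observed):
        if o >= threshold:
            if p >= threshold:
                tp += 1
            else:
                fn += 1
        elif p >= threshold:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical GAM prevalence for four stratum-months
predicted = [0.12, 0.18, 0.20, 0.09]
observed = [0.10, 0.16, 0.14, 0.12]
obs_lower = [0.07, 0.12, 0.10, 0.08]
obs_upper = [0.13, 0.20, 0.18, 0.16]
pred_half_widths = [0.03, 0.04, 0.05, 0.03]

cov = coverage(predicted, obs_lower, obs_upper)
bias = relative_bias(predicted, observed)
precision = relative_precision(predicted, pred_half_widths)
sens, spec = sensitivity_specificity(predicted, observed, threshold=0.15)
```

With few stratum-months above a threshold, sensitivity rests on a very small denominator, which is why the classification metrics reported in the Results should be read cautiously.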

For brevity we present only best models for ‘now-casting’ (i.e. prediction of malnutrition based on data collected up to the present). We also explored models for forecasting malnutrition 3 months into the future (i.e. prediction based on data collected up to 3 months previously), but found that these had low performance (data not shown). All analysis was done using R software [22] through the RStudio [23] platform.

Results

Anthropometric survey patterns

Details of eligible surveys from Somalia are reported in Table 3 and Fig. 1. Most surveys were done in 2016 and 2018 and the majority relied on multi-stage cluster sampling, with a fairly constant sample size range over time. The highest SAM and GAM prevalence, but also the lowest quality scores, were noted in 2017, during a drought-triggered food insecurity crisis. In South Sudan, all surveys relied on cluster sampling, and there was minimal change in average SAM and GAM prevalence over time; quality scores and the proportion of flagged observations suggested higher survey quality in South Sudan than in Somalia (Table 4, Fig. 2).

Table 3 Characteristics of analysis-eligible anthropometric surveys from Somalia. Medians are reported unless noted. Numbers in parentheses indicate the interquartile range
Fig. 1 Trends in key survey indicators, Somalia. Each dot represents the point estimate of a single survey. Box plots indicate the median and inter-quartile range, and whiskers the 95% percentile interval

Table 4 Characteristics of analysis-eligible anthropometric surveys from South Sudan. Medians are reported unless noted. Numbers in parentheses indicate the interquartile range
Fig. 2 Trends in key survey indicators, South Sudan. Each dot represents the point estimate of a single survey. Box plots indicate the median and inter-quartile range, and whiskers the 95% percentile interval

Performance of Somalia models

GLM coefficients and performance metrics for Somalia are shown in Table 5: odds ratios (OR) < 1 and linear coefficients > 0 indicate a protective effect, and vice versa. One predictor (livelihood) consistently featured in the most predictive models (displaced and pastoralist livelihoods were generally associated with better anthropometric status than for agriculturalists). Armed conflict intensity, measles occurrence over the previous trimester, terms of trade, NDVI over the previous semester and average market price of water were useful predictors for some but not all anthropometric outcomes. Generally, predictive performance was low: models yielded mostly upward-biased predictions that fell within the observed survey 95% CIs for only 17% to 80% of stratum-months, depending on the outcome. While denominators were very small, only the model for GAM (WFH + oedema) reached a moderate combination of sensitivity and specificity to classify prevalence as per the 15% threshold. Graphs of predictions versus observations support this pattern; Fig. 3 shows results for SAM (WFH + oedema), while remaining graphs are in the Additional file 1.

Table 5 Performance of predictive generalised linear models in Somalia for real-time estimation, by acute malnutrition outcome
Fig. 3 GLM-predicted versus observed SAM (WFH + oedema) prevalence, Somalia, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate an absolute deviation of predictions of up to ±1% (darkest shade), ±2% and ±3% (lightest shade). Vertical dotted lines denote commonly used SAM prevalence thresholds

RF models had similar performance to the GLM approach. For GAM (WFH + oedema: binary outcome), relative bias, relative precision and 95% CI coverage were +10.1% and +31.6%, ±23.0% and ±17.7%, and 59.6% and 56.7% on LOOCV and holdout data, respectively, with a sensitivity and specificity on LOOCV of 72.0% and 59.1% for the 15% prevalence threshold. The most important variables for prediction were measles incidence, NDVI, terms of trade and water price (Additional file 1). For WFH (continuous outcome), relative bias, relative precision and 95% CI coverage were +7.1% and +29.5%, ±19.1% and ±13.1%, and 57.4% and 30.0% on LOOCV and holdout data, respectively (Additional file 1).

Performance of South Sudan models

Table 6 shows GLM predictions for South Sudan. Here, the most significant associations were with livelihood type, total rainfall and terms of trade. Predictive performance was also low (Fig. 4), with coverage no better than 82% across all outcomes and no instance of high sensitivity and specificity for classification.

Table 6 Performance of predictive generalised linear models in South Sudan, by acute malnutrition outcome
Fig. 4 GLM-predicted versus observed SAM (WFH + oedema) prevalence, South Sudan, by county-month, on training data, LOOCV and holdout data. Shaded channels indicate an absolute deviation of predictions of up to ±1% (darkest shade), ±2% and ±3% (lightest shade). Vertical dotted lines denote commonly used SAM prevalence thresholds

RF models had far better fit to the training data than GLMs, but performed similarly on cross-validation and holdout data. The most important variables were livelihood, terms of trade, uptake of measles vaccination and total rainfall (Additional file 1).

Discussion

In this study we combined previously collected anthropometric household survey data with a range of potential population-level predictor datasets quantifying theoretical factors causally associated with acute malnutrition burden in crisis settings, to explore whether key quantities such as SAM or GAM prevalence could be estimated through prediction, as a complement to ground surveys. Resulting predictive models based on either GLM or machine learning approaches had disappointing performance in both Somalia and South Sudan across several anthropometric outcomes. Generally, predictive accuracy was better for outcomes based on WFH than on MUAC, but even for the former our models would not, in our opinion, provide actionable information.

Models to predict acute malnutrition risk at the individual or household level exist [24, 25]. While we did not search the literature systematically due to insufficient resources, we are aware of only two other population-level predictive studies. Osgood-Zimmerman et al. [26] produced gridded maps of various anthropometric indicators for all of Sub-Saharan Africa based on periodic countrywide surveys (e.g. Demographic and Health Surveys) and > 20 geospatial remotely sensed or previously estimated predictors; Mude et al. [27] predicted with reasonable accuracy MUAC across time and space in northern Kenya based on village-level data collected for food security surveillance by the Arid Lands Resource Management Project, with predictors including the characteristics of observed MUAC data themselves, cattle herd dynamics, extent of food aid, climate and season. At least one further research project is ongoing (https://www.actionagainsthunger.org/meriam). Bosco et al. [28] have used geospatial and remotely sensed covariates to map stunting prevalence, while Lentz et al. [29] have also demonstrated the potential of a GLM-based approach for predicting food insecurity in Malawi. We have previously used the same datasets as in this study to develop reasonably predictive models of population-level death rate (a farther-downstream and thus potentially even more multifactorial outcome), albeit only for retrospective estimation [10, 11].

Given the above, we expected better predictive performance. It is plausible that additional data on factors causally associated with acute malnutrition, including infant and young child feeding practices, use of food security coping strategies, dietary diversity, access to water, sanitation and hygiene services and health service utilisation would have improved prediction: these data are sometimes generated in crisis settings through cross-sectional surveys, but to our knowledge are not typically available at the granular level required for our predictive problem. It is also likely that problems with available data quality constrained model accuracy. Non-differential error or misclassification arising from measurement problems (e.g. imprecise child anthropometric measurements) and data entry errors would generally reduce model goodness-of-fit and bias estimated associations towards the null: observed-versus-predicted graphs generally suggest ‘regression dilution’ [30], a phenomenon whereby predictions align around an underestimated linear slope, consistent with high noise in predictor variables. Differential error may also have affected model accuracy in various ways. For example, the predictive value of certain variables would have been dampened if anthropometric surveys had systematically underestimated acute malnutrition in the very locations where those predictors exhibited their most extreme values, as might be plausible for surveys done in very remote, insecure locations and thus constrained by time, local staff competency or the need to exclude unreachable communities from the effective sampling frame. We attempted to mitigate such bias by down-weighting lower-quality surveys with evidence of sampling frame selection bias, but models without this weight were not substantively different (data not shown). Pragmatically, these data quality limitations illustrate the challenges of prediction based on data not collected for research.
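The regression dilution mechanism mentioned above can be illustrated with a small simulation. The Python sketch below is not part of the study's analysis: all names and parameter values are hypothetical. It relies on the standard result for classical measurement error, whereby adding noise of variance \(\sigma_u^2\) to a predictor of variance \(\sigma_x^2\) multiplies the expected fitted slope by \(\sigma_x^2/(\sigma_x^2+\sigma_u^2)\).

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_dilution(n=20000, true_slope=1.0, noise_sd=1.0, seed=1):
    """Fit the same outcome on a predictor measured without and with error.

    The true predictor has variance 1, so the expected attenuation factor
    is 1 / (1 + noise_sd ** 2), i.e. 0.5 with the default noise_sd.
    """
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_slope * x + rng.gauss(0.0, 0.5) for x in x_true]
    x_noisy = [x + rng.gauss(0.0, noise_sd) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_noisy, y)


slope_clean, slope_noisy = simulate_dilution()
```

With these settings the slope fitted on the noisy predictor should be roughly half the true slope, so predictions are compressed towards the mean: high observed values tend to be under-predicted and low ones over-predicted.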

Our study aim was not to explore associations: as such, we focussed on accuracy and, for example, ignored significant effect modifications that did not improve prediction. Observed GLM associations and variable importance metrics for RF are nonetheless informative. Measles incidence and rainfall or NDVI had plausible associations with most outcomes in both countries, while water price had a very strong association in Somalia. Terms of trade, however, were important in South Sudan but marginal in Somalia. We saw inconsistent associations with forced displacement or armed conflict intensity, though these have been documented elsewhere [31], and, critically, rainfall abnormalities (as opposed to total precipitation) were not an important predictor in any model. A recent review of 90 studies concludes that acute malnutrition is understudied relative to chronic malnutrition (stunting); the review also finds that, while adequate rainfall during the growing season has been associated with less acute malnutrition, relationships with drought and armed conflict are inconclusive [32]. Indeed, the interplay of unusual climate events and armed conflict has proved challenging for food security prediction [33]. More generally, our and others’ findings underscore the context-specific complexity of causal pathways leading to acute malnutrition. They may also reflect the relative noisiness of different datasets, i.e. their accuracy.

Aside from data limitations, our analysis does not thoroughly explore available predictive methods. Among GLM-based approaches, it is possible that different transformations of outcomes or predictors, as well as methods to identify the most informative variables, such as lasso regression, could have yielded improved performance. Among machine learning methods, boosted regression trees could have reduced bias. We note however that these methods would need to yield very considerable improvements over those we used in order to produce useful predictions.

Conclusions

This analysis suggests that predictive modelling for acute malnutrition burden in crisis settings may not be an immediately viable alternative to ground surveys, at least in the countries studied. Given the potential benefit of such an approach [5], we nonetheless recommend further study, possibly in other settings, using larger datasets and more advanced machine learning methods (boosted regression trees, support vectors, neural networks) and/or Bayesian frameworks. To facilitate such research, as well as other publicly beneficial analyses, humanitarian actors should systematically make key datasets, including but not limited to anthropometric surveys, publicly available in curated, accessible form [34]. Beyond anthropometric surveys, such datasets include service data from different sectors (e.g. outpatient consultations; vaccination coverage; anthropometric screening data among outpatient children and pregnant women; admissions and exit outcomes for management of acute malnutrition; water availability and quality; coverage of excreta disposal; food security service beneficiaries and Kcal equivalents); market data (e.g. staple prices); morbidity and mortality surveillance data; cross-sectional surveys measuring food security, dietary diversity and infant and young child feeding practices; protection assessments; surveys of perceptions of affected populations; humanitarian presence and activity who-does-what-where matrices; and alternative data on insecurity (e.g. incidents monitored by the UN country team) or humanitarian access (e.g. road safety). A simple principle could be to publish all data barring any whose public availability could place humanitarian actors or affected people at unacceptable risk; aggregation and anonymisation may mitigate such risks. Lastly, studies done to date to predict population-level nutrition burden should be synthesised to identify actionable evidence and guide further analysis.

Availability of data and materials

The data that support the findings of this study are available from various United Nations and non-governmental agencies, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not all publicly available. Data are however available from the authors upon reasonable request and with permission of the above agencies. We have uploaded curated R scripts and all Somalia data on https://github.com/francescochecchi/acute_malnutrition_predictive_models.

Abbreviations

ACLED: Armed Conflict Location & Event Data Project

AWSD: Aid Worker Security Database

CI: Confidence interval

ENA: Emergency Nutritional Assessment software

FEWS NET: Famine Early Warning Systems Network

FSNAU: Food Security and Nutrition Analysis Unit – Somalia

GAM: Global acute malnutrition

GLM: Generalised linear model

IPC: Integrated Food Security Phase Classification

LOOCV: Leave-one-out cross-validation

MAM: Moderate acute malnutrition

mo: Months old

MUAC: Middle-upper arm circumference

MSE: Mean square error

NDVI: Normalised Difference Vegetation Index

RF: Random forest

SAM: Severe acute malnutrition

SMART: Standardised Monitoring and Assessment of Relief and Transitions

UN: United Nations

WHZ: Weight-for-height Z score

References

  1. Young H, Borrel A, Holland D, Salama P. Public nutrition in complex emergencies. Lancet. 2004;364:1899–909.

  2. Young H, Jaspars S. The meaning and measurement of acute malnutrition in emergencies. Humanitarian Practice Network. 2006;44:1–60.

  3. Tuffrey V. A perspective on the development and sustainability of nutrition surveillance in low-income countries. BMC Nutr. 2016;2(15):1–18. https://doi.org/10.1186/s40795-016-0054-x.

  4. Tuffrey V, Hall A. Methods of nutrition surveillance in low-income countries. Emerg Themes Epidemiol. 2016;13:4.

  5. Maxwell D, Hailey P. Towards Anticipatory Information Systems and Action. Tufts - Feinstein International Center.

  6. Checchi F, Warsame A, Treacy-Wong V, Polonsky J, van Ommeren M, Prudhon C. Public health information in crisis-affected populations: a review of methods and their use for advocacy and action. Lancet. 2017;390:2297–313.

  7. Standardised Monitoring and Assessment of Relief and Transitions (SMART). Measuring Mortality, Nutritional Status, and Food Security in Crisis Situations: SMART Methodology. https://smartmethodology.org/. Accessed 14 Feb 2021.

  8. Maxwell D, Hailey P, Spainhour Baker L, Kim JJ. Constraints and complexities of information and analysis in humanitarian emergencies: evidence from Yemen. Feinstein International Center: Tufts University and Centre for Humanitarian Change; 2019.

  9. Global Report on Food Crises 2021. Global Network against Food Crises, Food Security Information Network. 2021.

  10. Warsame A, Frison S, Gimma A, Checchi F. Retrospective estimation of mortality in Somalia, 2014–2018: a statistical analysis - Somalia. ReliefWeb. 2020. https://reliefweb.int/report/somalia/retrospective-estimation-mortality-somalia-2014-2018-statistical-analysis. Accessed 11 Jan 2021.

  11. Checchi F, Testa A, Warsame A, Quach L, Burns R. Estimates of crisis-attributable mortality in South Sudan, December 2013–April 2018: A statistical analysis - South Sudan. ReliefWeb. 2018. https://reliefweb.int/report/south-sudan/estimates-crisis-attributable-mortality-south-sudan-december-2013-april-2018. Accessed 11 Jan 2021.

  12. Erhardt J. Emergency Nutrition Assessment (ENA) Software for SMART. 2020.

  13. Frison S, Kerac M, Checchi F, Prudhon C. Anthropometric indices and measures to assess change in the nutritional status of a population: a systematic literature review. BMC Nutr. 2016;2:76.

  14. van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011;45:1–67.

  15. Funk C, Peterson P, Landsfeld M, Pedreros D, Verdin J, Shukla S, et al. The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes. Scientific Data. 2015;2:150066.

  16. Raleigh C, Linke A, Hegre H, Karlsen J. Introducing ACLED: an armed conflict location and event dataset: special data feature. J Peace Res. 2010;47:651–60.

  17. South Sudan Livelihood Zones and Descriptions. Washington, DC: Famine Early Warning Systems Network. https://fews.net/sites/default/files/documents/reports/Livelihoods%20Zone%20Map%20and%20Descriptions%20for%20South%20Sudan.pdf. Accessed 13 Sep 2021.

  18. Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics. 2005;21:3301–7.

  19. Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 2017;77:1.

  20. Wager S, Hastie T, Efron B. Confidence intervals for random forests: The Jackknife and the Infinitesimal Jackknife. J Mach Learn Res. 2014;15:1625–51.

  21. World Health Organization. The management of nutrition in major emergencies. Geneva: WHO; 2000.

  22. R Core Team. A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2020.

  23. RStudio Team. RStudio: Integrated Development Environment for R. Boston: RStudio, PBC; 2020.

  24. Mukuku O, Mutombo AM, Kamona LK, Lubala TK, Mawaw PM, Aloni MN, et al. Predictive model for the risk of severe acute malnutrition in children. J Nutr Metab. 2019;2019:1–7.

  25. Islam MM, Alam M, Tariquzaman M, Kabir MA, Pervin R, Begum M, et al. Predictors of the number of under-five malnourished children in Bangladesh: application of the generalized poisson regression model. BMC Public Health. 2013;13:11.

  26. Osgood-Zimmerman A, Millear AI, Stubbs RW, Shields C, Pickering BV, Earl L, et al. Mapping child growth failure in Africa between 2000 and 2015. Nature. 2018;555:41–7.

  27. Mude AG, Barrett CB, McPeak JG, Kaitho R, Kristjanson P. Empirical forecasting of slow-onset disasters for improved emergency response: An application to Kenya’s arid north. Food Policy. 2009;34:329–39.

  28. Bosco C, Alegana V, Bird T, Pezzulo C, Bengtsson L, Sorichetta A, et al. Exploring the high-resolution mapping of gender-disaggregated development indicators. J R Soc Interface. 2017;14:20160825.

  29. Lentz EC, Michelson H, Baylis K, Zhou Y. A data-driven approach improves food insecurity crisis prediction. World Dev. 2019;122:399–409.

  30. Hutcheon JA, Chiolero A, Hanley JA. Random measurement error and regression dilution bias. BMJ. 2010;340:c2289.

  31. Iacoella F, Tirivayi N. Child nutrition during conflict and displacement: evidence from areas affected by the Boko Haram insurgency in Nigeria. Public Health. 2020;183:132–7.

  32. Brown ME, Backer D, Billing T, White P, Grace K, Doocy S, et al. Empirical studies of factors associated with child malnutrition: highlighting the evidence about climate and conflict shocks. Food Sec. 2020;12:1241–52.

  33. Krishnamurthy PK, Choularton RJ, Kareiva P. Dealing with uncertainty in famine predictions: how complex events affect food security early warning skill in the Greater Horn of Africa. Glob Food Sec. 2020;26:100374.

  34. Maxwell D, Gottlieb G, Coates J, Radday A, Kim J, Venkat A, et al. Humanitarian Information Systems: Anticipating, Analyzing, and Acting in Crisis. Tufts - Feinstein International Center. https://fic.tufts.edu/research-item/the-constraints-and-complexities-of-information-and-analysis/. Accessed 14 Feb 2021.

Acknowledgements

We are grateful to Anna Carnegie for project management support, to Adrienne Testa for collection of South Sudan datasets used for a separate study and to Claire Dooley for analysis script review and spotting an important error. In both countries, we express gratitude to governmental agencies and partners who collected and/or contributed data used for this analysis.

Disclaimer

Geographical names and boundaries presented in this report are used solely for the purpose of producing scientific estimates, and do not necessarily represent the views or official positions of the authors, the London School of Hygiene and Tropical Medicine, any of the agencies that have supplied data for this analysis, or the donor. The authors are solely responsible for the analyses presented here, and acknowledgment of data sources does not imply that the agencies or individuals providing data endorse the results of the analysis.

Funding

The analysis was funded by the United Nations Children’s Fund (UNICEF). SF, AW and FC were also partly funded by UK Research and Innovation as part of the Global Challenges Research Fund, grant number ES/P010873/1. Collection of predictor data was funded by the United States Institute of Peace (USIP; South Sudan) and the UK Foreign, Commonwealth and Development Office (FCDO; formerly Department for International Development) through the Research for Evidence Division (RED) for the benefit of developing countries (Somalia). The views expressed and information contained in this paper are solely those of the authors and are not necessarily those of or endorsed by UNICEF, FCDO or USIP, none of which can accept any responsibility for such views or information or for any reliance placed on them.

Author information

Authors and Affiliations

Authors

Contributions

FC designed the methods, managed data, did statistical analysis and wrote this paper. SF collected and managed data and wrote this paper. MN helped design the study and coordinated data collection. All other authors collected and managed data. All authors reviewed, edited and approved the final manuscript.

Corresponding author

Correspondence to Francesco Checchi.

Ethics declarations

Ethics approval and consent to participate

All data were previously collected for routine humanitarian response and/or public health service provision purposes, and were either in the public domain or shared in fully anonymised format. Caregivers of anthropometric survey participants provided verbal informed consent as per SMART protocol standard procedures. The study was approved by the Ethics Committee of the London School of Hygiene & Tropical Medicine (ref. 15334/RR/14437) and the Somali Ministry of Health and Human Services’ Research and Ethics Committee (ref. MOH&HS/DGO/0701/May/2019). No reply was received to a related application to the ethics committee of South Sudan’s Ministry of Health. All analyses were performed in accordance with relevant guidelines and regulations, including the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

  • Figure S5. Causal framework for acute malnutrition among children, used to identify potential predictors.
  • Figure S6. GLM-predicted versus observed SAM (MUAC + oedema) prevalence, Somalia, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote commonly used SAM prevalence thresholds.
  • Figure S7. GLM-predicted versus observed GAM (WFH + oedema) prevalence, Somalia, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote commonly used GAM prevalence thresholds.
  • Figure S8. GLM-predicted versus observed GAM (MUAC + oedema) prevalence, Somalia, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote commonly used GAM prevalence thresholds.
  • Figure S9. GLM-predicted versus observed mean WFH, Somalia, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote potentially useful thresholds.
  • Figure S10. GLM-predicted versus observed mean MUAC, Somalia, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote potentially useful thresholds.
  • Table S7. Performance of random forest models in Somalia, by acute malnutrition outcome.
  • Figure S11. RF-predicted versus observed GAM (WFH + oedema) prevalence, Somalia, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote commonly used GAM prevalence thresholds.
  • Figure S12. RF-predicted versus observed mean WFH, Somalia, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote potentially useful thresholds.
  • Figure S13. GLM-predicted versus observed SAM (MUAC + oedema) prevalence, South Sudan, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote commonly used SAM prevalence thresholds.
  • Figure S14. GLM-predicted versus observed GAM (WFH + oedema) prevalence, South Sudan, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote commonly used GAM prevalence thresholds.
  • Figure S15. GLM-predicted versus observed GAM (MUAC + oedema) prevalence, South Sudan, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote commonly used GAM prevalence thresholds.
  • Figure S16. GLM-predicted versus observed mean WFH, South Sudan, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote potentially useful thresholds.
  • Figure S17. GLM-predicted versus observed mean MUAC, South Sudan, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote potentially useful thresholds.
  • Table S8. Performance of random forest models in South Sudan, by acute malnutrition outcome.
  • Figure S18. RF-predicted versus observed GAM (WFH + oedema) prevalence, South Sudan, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote commonly used GAM prevalence thresholds.
  • Figure S19. RF-predicted versus observed mean WFH, South Sudan, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate different absolute deviance of predictions. Vertical dotted lines denote potentially useful thresholds.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Checchi, F., Frison, S., Warsame, A. et al. Can we predict the burden of acute malnutrition in crisis-affected countries? Findings from Somalia and South Sudan. BMC Nutr 8, 92 (2022). https://doi.org/10.1186/s40795-022-00563-2

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40795-022-00563-2

Keywords

  • Malnutrition
  • Acute malnutrition
  • Wasting
  • Undernutrition
  • South Sudan
  • Somalia
  • Food insecurity
  • Crisis
  • Humanitarian
  • Prediction
  • Statistical model