Record linkage
Data were manually cleaned, and duplicates were removed from each key informant’s list. Linkage criteria (Additional file 1: Table S1) were developed to establish, for each site, which lists a given decedent was included in. Individuals' names in Yemen usually consist of first name, father’s name, grandfather’s name and lastly tribe name or area/ district of origin. Records with missing first name, gender, age, and/or year of death were excluded from analysis. The year and age at death were averaged if informants reported discordant values.
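The exclusion and averaging rules above can be illustrated with a minimal sketch. The field names (`first_name`, `gender`, `age`, `year_of_death`) and the example records are hypothetical. Whether averaged years were rounded is not stated in the text, so they are left unrounded here.

```python
from statistics import mean

# Hypothetical field names for the variables named in the text.
REQUIRED = ("first_name", "gender", "age", "year_of_death")

def usable(record):
    """True if none of the required fields is missing (None or empty string)."""
    return all(record.get(field) not in (None, "") for field in REQUIRED)

def reconcile(reports):
    """Merge several informants' reports of one linked decedent.

    Age and year of death are averaged, so discordant values are reconciled
    and concordant values are left unchanged.
    """
    merged = dict(reports[0])
    merged["age"] = mean(r["age"] for r in reports)
    merged["year_of_death"] = mean(r["year_of_death"] for r in reports)
    return merged

# Two hypothetical reports of the same decedent from different informants.
reports = [
    {"first_name": "A", "gender": "m", "age": 60, "year_of_death": 2019},
    {"first_name": "A", "gender": "m", "age": 64, "year_of_death": 2020},
]
merged = reconcile([r for r in reports if usable(r)])
```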
Capture-recapture analysis
Capture-recapture analysis examines the overlap among lists \(L\) to estimate the number of individuals (in this case decedents) who have not been captured by any list. This estimate, added to the number of individuals appearing on at least one list, gives the total. In this study, each site’s lists consisted of the records collected from specific categories of informants. After record linkage, site data consisted of two, three or four lists (see below and Table 1).
In a two-list scenario, each decedent \(x\in \{1,2,3,\dots ,N\}\) has status \({x}_{10}\) if named within \({L}_{1}\) only, \({x}_{01}\) if named within \({L}_{2}\) only, \({x}_{11}\) if named by both lists and \({x}_{00}\) if not captured by either list. The resulting contingency table consists of four cells, \(n_{10}\), \(n_{01}\), \(n_{11}\) and \(n_{00}\), the last of which is unknown. We used the simple Chapman estimator to estimate the total number of deaths as \(\hat{N} = n_{10} + n_{01} + n_{11} + \hat{n}_{00} = \frac{\left(n_{10} + 1\right)\left(n_{01} + 1\right)}{n_{11} + 1} - 1\); we computed a confidence interval (CI) as \(\left[e^{-z_{\alpha /2}\hat{\sigma }_{0.5}}\varphi ,\; e^{z_{\alpha /2}\hat{\sigma }_{0.5}}\varphi \right]\), where \(\varphi ={n}_{10}+{n}_{01}-{n}_{11}-0.5+\frac{\left({n}_{10}-{n}_{11}+0.5\right)\left({n}_{01}-{n}_{11}+0.5\right)}{{n}_{11}+0.5}\), \({z}_{\alpha /2}\) is the normal distribution quantile for a given significance level of interest (1.96 for \(\alpha =0.05\), i.e. a 95% CI) and \({\widehat{\sigma }}_{0.5}=\sqrt{\frac{1}{{n}_{11}+0.5}+\frac{1}{{n}_{10}-{n}_{11}+0.5}+\frac{1}{{n}_{01}-{n}_{11}+0.5}+\frac{{n}_{11}+0.5}{\left({n}_{10}-{n}_{11}+0.5\right)\left({n}_{01}-{n}_{11}+0.5\right)}}\), as per Sadinle [23].
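As an illustration of the point estimate only, the sketch below uses the classical form of the Chapman estimator, in which `n1` and `n2` are the total numbers of decedents on each list and `m` is the number appearing on both. Mapping these arguments onto the cell counts above is an assumption of this example, and the input values are hypothetical.

```python
def chapman(n1, n2, m):
    """Chapman estimate of the total number of individuals from two lists.

    n1, n2: total records on list 1 and on list 2 (assumed meaning).
    m: records appearing on both lists.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def unlisted(n1, n2, m):
    """Estimated number of individuals captured by neither list."""
    observed = n1 + n2 - m  # individuals appearing on at least one list
    return chapman(n1, n2, m) - observed

# Hypothetical counts: 50 on list 1, 40 on list 2, 20 on both.
n_hat = chapman(50, 40, 20)
n00_hat = unlisted(50, 40, 20)
```

With these hypothetical counts, 70 decedents are listed at least once and the remainder of the estimate is attributed to the unobserved cell.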
In a three-list scenario, the overlap among lists \(L\) may be represented by eight alternative candidate log-linear Poisson models, each of which features terms for the probability of appearing on any given list, as well as two-way interaction terms representing potential dependencies among lists: these models range from one with no interaction terms to one featuring all the two-way interactions \({L}_{1}\times {L}_{2}\), \({L}_{2}\times {L}_{3}\) and \({L}_{1}\times {L}_{3}\). We also wished to include in the models an exposure (period before and during the COVID-19 pandemic in Yemen) and potential confounding variables (age, gender). To allow for continuous covariates, we used Rossi et al.’s [24] parametrisation of log-linear models, whereby the dataset is expanded to feature, for each individual, all potential list statuses (\({x}_{000}, {x}_{100}, {x}_{010}, {x}_{001}, {x}_{110}, {x}_{101}, {x}_{011}, {x}_{111}\)); an outcome of 1 for the actual status, missing for status \({x}_{000}\) and 0 otherwise; and any covariate values. The model, once fit, is used to predict \({\widehat{x}}_{000,i}\) for each individual \(i\), interpretable as that individual’s contribution to \({\widehat{n}}_{000}\), the estimate of uncaptured deaths (i.e. \({\widehat{n}}_{000}=\sum_{i=1}^{N}{\widehat{x}}_{000,i}\)); this quantity may be stratified by exposure stratum. This estimation framework extends readily to the four-list scenario, which however entails a larger set of models, featuring three-list and/or (hierarchically non-redundant) two-list interactions.
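The dataset expansion described above can be sketched as follows. This covers the data-preparation step only, not model fitting. The function name `expand`, the row layout and the example covariates are hypothetical.

```python
from itertools import product

def expand(individual_id, captured, covariates):
    """Expand one decedent into one row per potential list status.

    captured: tuple of 0/1 flags, one per list, e.g. (1, 0, 1).
    The outcome is 1 for the observed status, None (missing) for the
    all-zero status, and 0 for every other status.
    """
    rows = []
    for status in product((0, 1), repeat=len(captured)):
        if not any(status):
            outcome = None  # on no list: unobservable, to be predicted
        elif status == tuple(captured):
            outcome = 1
        else:
            outcome = 0
        rows.append({"id": individual_id, "status": status,
                     "outcome": outcome, **covariates})
    return rows

# Hypothetical decedent named on lists 1 and 3 only.
rows = expand(1, (1, 0, 1), {"age": 63, "gender": "f", "period": "pandemic"})
```

For three lists, each individual contributes eight rows. Covariate values are repeated on every row so that they can enter the model as main effects or interactions.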
While conventional capture-recapture analysis selects the best-fitting among candidate models, we adapted Rossi et al.’s suggested approach for averaging multiple models [25]. First, we screened out models that did not fit (e.g. due to sparse overlap among lists), yielded an implausible \({\widehat{n}}_{000(0)}\) (defined as ≥ 10 times the number of listed deaths) or featured a likelihood-ratio test p-value ≥ 0.60 when compared to the saturated model (indicating potential overfitting). At this stage, we also assessed whether to retain any potential confounder covariates, based on likelihood-ratio tests compared to the no-confounder model and inspection of estimates with and without the confounder. For each shortlisted model \(i\in \left\{1,2,3,\dots ,K\right\}\), we then computed a weight between 0 and 1 (equivalent to a Bayesian posterior probability), \({w}_{i}=\frac{{e}^{-{\Delta }_{i}/2}}{\sum_{j=1}^{K}{e}^{-{\Delta }_{j}/2}}\), where \({\Delta }_{i}={AIC}_{i}-{AIC}_{\mathrm{min}}\), i.e. the difference between the model’s Akaike Information Criterion (AIC) and the lowest AIC among all shortlisted models (the AIC is a goodness-of-fit indicator that rewards predictive accuracy and parsimony, i.e. the fewest possible model terms). Finally, we computed a weighted average estimate \({\widehat{n}}_{000(0)}=\sum_{i=1}^{K}{w}_{i}{\widehat{n}}_{000\left(0\right),i}\). We present results overall and by period.
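The weighting and averaging step can be written out as a short sketch. The AIC values and per-model estimates below are hypothetical, and the function names are illustrative only.

```python
import math

def akaike_weights(aics):
    """Weights w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    where delta_i is each model's AIC minus the lowest AIC."""
    aic_min = min(aics)
    raw = [math.exp(-(aic - aic_min) / 2) for aic in aics]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_estimate(aics, estimates):
    """AIC-weighted average of the per-model estimates of unlisted deaths."""
    weights = akaike_weights(aics)
    return sum(w * e for w, e in zip(weights, estimates))

aics = [100.0, 102.0, 110.0]    # hypothetical AICs of shortlisted models
estimates = [30.0, 45.0, 80.0]  # hypothetical per-model unlisted deaths
weights = akaike_weights(aics)
n000_avg = averaged_estimate(aics, estimates)
```

Because the weights decay exponentially with \({\Delta }_{i}\), a model whose AIC is 10 units above the minimum contributes very little to the average.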
Alternative list groupings
Capture-recapture analysis is infeasible when the overlap among the lists is very poor, causing models not to fit (see above). To avoid this problem, for our main analysis we grouped together lists that plausibly reflected similar sources of community information: community leaders’ and their spouses’ lists were combined in sites A4, T1 and T3; community leaders’ and senior citizens’ lists were combined in sites T1, T2, T3 and T4; community leaders’ and imams’ lists were combined in sites T1, T2 and T3. This yielded between two and four grouped lists per site (Table 1).
As an alternative analysis, we (i) combined imams’ and burial preparers’ lists across all sites to form one list and (ii) added records obtained from teachers and senior citizens to the community leaders’ lists.
Population estimates and death rates
We previously estimated the all-age population of Yemen between 2014 and 2021, by month and sub-district (administrative level 3), using a combination of pre-crisis census data, geospatial projections by WorldPop (available at 100 m² resolution, and resulting from extensively validated predictive statistical models [26]) and displacement flow data. Details are provided in Checchi et al. [27]. However, only two sites (T2, T5) consisted of entire subdistricts. In the remaining sites, researchers attempted to collect GPS coordinates of all corners of the site (e.g. road boundaries) using their phones. We overlaid these polygons, and the boundaries of the surrounding subdistricts, onto WorldPop projections for 2017 (the mid-point of the data collection period) to compute the approximate proportion of the subdistrict’s population that fell within the site. We then multiplied our subdistrict estimates by this proportion to estimate the site’s population. Site boundaries, and thus populations, were unresolved for two sites (A1, T3). We used population estimates to compute average death rates by period (pre-pandemic and pandemic). We specifically present crude death rates among people aged ≥ 15 years (CDR15+), per all-age population: this may be thought of as an age-specific fraction of the all-age CDR.
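The arithmetic behind the site population and death rate can be sketched as below. All input values are hypothetical. The rate unit (deaths per 1000 person-years) and the use of a single average population over the period are assumptions of this example, not taken from the text.

```python
def site_population(subdistrict_population, proportion_in_site):
    """Scale a subdistrict population estimate to the site's share of it."""
    return subdistrict_population * proportion_in_site

def death_rate(deaths, population, years, per=1000):
    """Deaths per `per` person-years; the unit is an assumption of this sketch."""
    return deaths / (population * years) * per

# Hypothetical inputs: a quarter of a 20,000-person subdistrict lies in the site.
population = site_population(20000, 0.25)

# Hypothetical numerator: deaths among people aged 15 years or older over
# 3 years, divided by the all-age population as described for CDR15+.
cdr15_plus = death_rate(60, population, 3)
```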