A systematic review of monitoring and evaluation indicators for sexual and reproductive health in humanitarian settings

Objective To conduct a comprehensive mapping of published indicators for monitoring and evaluation (M&E) of sexual and reproductive health (SRH) services and outcomes in humanitarian settings. Methods A systematic search of the peer-reviewed and grey literature published between January 2008 and May 2018 was conducted to identify all references describing indicator sets for M&E of SRH services and outcomes in humanitarian settings. The databases MEDLINE, Web of Science, and Global Health, as well as 85 websites of relevant organizations involved in humanitarian response, were searched. Characteristics of identified indicator sets and data from individual indicators were extracted. Findings Of 3278 records identified, 20 met the review’s inclusion criteria and 9 existing indicator sets were identified. A total of 179 relevant indicators were included in the mapping, and removal of duplicates yielded 132 unique indicators. Twenty-seven percent fell within the maternal health domain, followed by the HIV/AIDS domain (26%) and the gender-based violence domain (23%). The distribution of indicators by type (process/output, outcome, impact) was balanced overall but varied substantially across domains. The most commonly used data collection platforms were facility-based systems or population-based surveys. Domains covered and indicator definitions were inconsistent across indicator sets. Conclusion Results demonstrate the need to standardize data collection efforts for M&E of SRH services and outcomes in humanitarian settings and to critically appraise the extent to which different domains should be covered. A core list of indicators is essential for assessing response status over time as well as across countries. Electronic supplementary material The online version of this article (10.1186/s13031-019-0221-1) contains supplementary material, which is available to authorized users.


Background
In line with target 3.7 of the Sustainable Development Goals (SDGs), access to sexual and reproductive health (SRH) services, including maternal health services, is crucial to ensure health and well-being of all people at all ages, and is a human right [1]. Yet ensuring access to SRH services is particularly challenging in humanitarian settings, given the collapse of health systems, limited quality of care and availability of human resources, as well as the increased vulnerabilities associated with conflict and displacement.
According to the Inter-agency Field Manual for Reproductive Health in Crisis, a humanitarian setting is "... one in which an event or series of events has resulted in a critical threat to the health, safety, security or well-being of a community or other large group of people. The coping capacity of the affected community is overwhelmed and external assistance is required. This can be the result of events such as armed conflicts, natural disasters, epidemics or famine, and often involves population displacement [2]".
The Inter-Agency Working Group (IAWG) for reproductive health in crises provides guidance on six main objectives around the minimum initial service package (MISP) for reproductive health in crisis [2]. The MISP is a set of priority activities intended to be implemented immediately at the onset of crisis. The MISP also forms part of the Sphere Project's minimum standards for humanitarian assistance [3]. Despite these established international standards for basic service provision in humanitarian settings, there remains no consensus around monitoring and evaluation (M&E) frameworks or sets of indicators to assess the adequacy of SRH service provision in humanitarian settings as well as their respective impacts on associated morbidity and mortality. Moreover, as time passes after the initial onset of an emergency and the setting passes into extended (or protracted) stages of crisis, service provision should move towards more comprehensive coverage of SRH needs [2]. Although M&E indicators and standards play an important role in guiding the transition to more comprehensive service provision, there are currently no widespread standards regarding which core indicators should be collected in extended stages of emergencies versus acute stages.
Valid, timely, and reliable monitoring and evaluation data is essential for guiding effective humanitarian response as well as ensuring the accountability of all actors involved. Yet, often even the minimal needed data is unavailable [4]. Improving data availability and quality in humanitarian settings will require the commitment and willingness of the humanitarian actors across diverse agencies and organizations to invest in the time, effort and platforms to allow for the needed data to be collected. It will also require an openness for greater consistency in data collection, analysis, and use [4], in order to ensure comparability across settings and to demonstrate performance expectations for implementing organizations [5].
Given the need for increased focus on and consistency in the M&E of SRH services in humanitarian settings, the World Health Organization's (WHO) Department of Reproductive Health and Research, in collaboration with the Department of Maternal, Child and Adolescent Health as well as numerous partner organizations and agencies, has committed to guide a collaborative and consultative review process. Ultimately, the goal is to propose a standardized set of core indicators for M&E of SRH services and outcomes in acute and extended humanitarian settings, and to provide guidance on their use. Initiated in April 2018 and expected to conclude in 2020, the review process consists of identifying current M&E indicators and mechanisms for SRH in humanitarian settings and convening in-depth stakeholder consultations to: assess their adequacy; standardize definitions and data collection procedures; and select and prioritize indicators for inclusion in a set of recommended indicators.
The process began with a systematic literature review conducted to identify current M&E indicators. An initial technical consultation which convened a wide variety of experts and other stakeholders was then held in December of 2018. The final step in the review process will involve field testing of standardized indicators and accompanying implementation recommendations in a variety of settings impacted by differing types and stages of humanitarian crises (April 2019-June 2020). Field testing will assess feasibility and allow for finalization of the core indicator sets across the different SRH domains, including establishing subsets specific to acute and extended stages of emergency.

Main text
This paper seeks to describe the systematic literature review, which began this multi-year process and is intended to improve quality and consistency in the M&E of SRH services in humanitarian settings. This literature review served as the first step in the broader process and was conducted to describe and assess existing indicators published in the peer-reviewed and grey literature for SRH services and outcomes in humanitarian settings. Thus, it aimed to achieve the following objectives:
1. Identify existing indicator sets described within the peer-reviewed and grey literature, which are intended for the monitoring and evaluation of SRH services and outcomes in humanitarian settings.
2. Examine all relevant individual indicators within each set in order to assess the relative coverage of different SRH domains and topics, the relative frequency of indicator types (i.e. process, output, outcome, or impact), and to identify commonly occurring indicators.

Methods
This review was conducted in accordance with the preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) guidance [6].

Eligibility criteria
References  [2]. Only references that addressed multiple (two or more) domains were included because even the most minimal SRH service provision in humanitarian settings (such as the service package described in the MISP) must cover multiple domains. This inclusion criterion ensured that indicator sets identified in the review were those intended for assessing multi-domain SRH service packages, as opposed to siloed programs focused on a single domain. Date criteria were applied to ensure that materials retrieved reflected up-to-date practices and perspectives on monitoring and evaluation as well as on SRH.

Information sources
Databases searched for peer-reviewed literature included: MEDLINE/PubMed, Web of Science, and Global Health. To identify grey-literature and online resources, a manual search was conducted of the websites of organizations that work extensively in humanitarian settings and/or do extensive work in the area of SRH.

Search strategy
For the database search, search terms were selected by identifying relevant medical subject headings (MeSH) and keyword terms for the following concepts: sexual, reproductive, and maternal health; humanitarian settings; and M&E. The initial search was constructed in PubMed using "OR" to link terms for the same concept and "AND" to link the groups of terms for different concepts. This was then translated into the correct syntax for the other two databases. Filters were applied to all searches to retrieve articles published in English since January 1st, 2008. The full search syntax for each database is available in Additional file 1.
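The Boolean structure described above can be sketched in a few lines of Python. This is only an illustration of the OR-within-concept, AND-between-concepts pattern; the term lists and the `build_query` helper are hypothetical placeholders and do not reproduce the actual search syntax, which is given in Additional file 1.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for terms in concepts.values():
        # Quote multi-word terms so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical example terms, NOT the review's real search strategy.
concepts = {
    "srh": ["reproductive health", "maternal health"],
    "setting": ["humanitarian", "refugees"],
    "m_and_e": ["indicator", "monitoring"],
}

query = build_query(concepts)
print(query)
```

A real strategy would also need database-specific field tags (for example MeSH qualifiers in PubMed), which is why the review translated the PubMed search into the syntax of each of the other databases.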
For the online search, an initial list of 60 organizations was compiled based on a list of participating agencies within the WHO Global Health Cluster. As potentially relevant web content and documents were identified while searching the websites of these organizations, the names of additional organizations mentioned (for example, collaborating partners on an initiative, or co-authors on a document) were recorded. The websites of these additional organizations were then searched as well. In total, 85 websites were searched (see Additional file 1 for complete list).

Data management and selection process
Titles, abstracts and other reference information for hits identified via the database search were downloaded to EndNote, and then exported in spreadsheet format. During the online search, all potentially relevant references were either downloaded as PDFs or saved as screenshots, and the bibliographic information for each (title, date, author, etc.) was entered into a spreadsheet. Two reviewers then independently screened the titles and abstracts of all peer-reviewed references and screened online references. Discrepancies in decisions about whether to include or exclude a particular reference were resolved through discussion. Next, the full texts of all references included during the initial round of screening were retrieved and reviewed. During this round of screening, reasons for exclusion were recorded and the list of references to include in the review was finalized.

Data extraction & synthesis
First, metadata for indicator sets described was extracted from all references selected for inclusion during screening. This included: citation and name of indicator set, intended setting and stage of emergency, SRH domains examined, data sources used for indicators, and supporting resources available. Data for individual indicators were then extracted only for indicators that met the following criteria: 1) were specific to the health sector, 2) fell into one of the six SRH domains addressed by MISP objectives, and 3) could be defined in terms of specific, objective, and comparable numerators and denominators. These criteria were applied because the goal of this review was to identify indicators that would be comparable over time, across settings, and across emergency types. Next, detailed information was extracted for each relevant indicator within the indicator sets identified. This included: source, domain, topic, name of indicator, definition, data source, and data collection method. Additionally, indicators were compared to those included in the monitoring and evaluation frameworks for the SDGs, the Global Strategy (GS) for Women's, Children's and Adolescents' Health, and WHO's 100 Core Health Indicators [7–9]. Finally, indicators were classified by type (process/output, outcome, and impact), in line with the WHO Health Emergencies Program (work stream 4 on standardized indicator sets for acute and protracted event monitoring).
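The three extraction criteria lend themselves to a simple record-and-filter representation. The sketch below is illustrative only: the `Indicator` class, its field names, the `is_eligible` helper and the example records are all hypothetical, and the domain set simply lists the domain abbreviations used in this review (how they map onto the six MISP objectives is an assumption here).

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: domain abbreviations as they appear in this review.
ELIGIBLE_DOMAINS = {"MH", "HIV", "GBV", "STI", "FP", "ARH", "CAC"}


@dataclass
class Indicator:
    source: str                      # indicator set the record came from
    domain: str                      # SRH domain abbreviation
    topic: str
    name: str
    health_sector: bool              # criterion 1
    numerator: Optional[str] = None  # criterion 3
    denominator: Optional[str] = None


def is_eligible(ind: Indicator) -> bool:
    """Apply the three criteria: health sector, eligible domain, defined ratio."""
    return (
        ind.health_sector
        and ind.domain in ELIGIBLE_DOMAINS
        and bool(ind.numerator)
        and bool(ind.denominator)
    )


# Hypothetical records for illustration only.
candidates = [
    Indicator("Set A", "MH", "Antenatal care", "ANC coverage", True,
              "women with the required number of ANC visits", "live births"),
    Indicator("Set A", "GBV", "Coordination", "Referral pathway exists", True),
    Indicator("Set B", "WASH", "Water", "Litres per person", False,
              "litres supplied", "population"),
]

eligible = [c.name for c in candidates if is_eligible(c)]
print(eligible)
```

Requiring both a numerator and a denominator reflects the stated goal of keeping only indicators that are comparable over time, across settings, and across emergency types.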

Search results
As shown in Fig. 1, 3470 records were retrieved from the database search, which resulted in 3155 unique hits after duplicates were removed. An additional 123 potentially relevant records were identified through online searching, yielding a total of 3278 records for screening. Of these, 3237 were excluded during the initial round of screening, and another 21 were excluded during full-text screening. In total, 20 references were included in the analysis [3, 10–28]. From these 20 references, 9 existing indicator sets were identified. Finally, 179 relevant indicators from the indicator sets identified were included in the mapping. Removal of duplicates yielded 132 unique indicators. Table 1 describes the 9 indicator sets that were identified from references included in the review (details for each reference are available in Additional file 1). Table 2 provides the full list of unique indicators identified, organized by domain.
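The screening counts reported above can be cross-checked with simple arithmetic. The following sketch uses only the figures given in this section; the variable names are ours.

```python
# Counts as reported in the search results.
retrieved_from_databases = 3470
unique_database_hits = 3155
online_records = 123
excluded_initial_screening = 3237
excluded_full_text = 21

database_duplicates = retrieved_from_databases - unique_database_hits
records_screened = unique_database_hits + online_records
full_text_reviewed = records_screened - excluded_initial_screening
references_included = full_text_reviewed - excluded_full_text

# Indicator-level counts.
indicators_mapped = 179
unique_indicators = 132
duplicate_indicators = indicators_mapped - unique_indicators

print(database_duplicates)   # 315
print(records_screened)      # 3278
print(full_text_reviewed)    # 41
print(references_included)   # 20
print(duplicate_indicators)  # 47
```

The derived totals (3278 records screened and 20 references included) match the figures reported in the text.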

Coverage of SRH domains and topics within domains
As shown in Table 1, all indicator sets included indicators on MH, and all but one included indicators on GBV. Domains with the least coverage were those reporting on ARH and CAC. When looking at individual indicators, a similar trend emerged. The largest share (27%) of the 132 unique indicators identified fell within the MH domain, followed by the HIV domain (26%) and the GBV domain (23%). Domains with the least coverage were ARH (3%) and CAC (3%).
For all domains other than ARH and CAC, indicators were also broken down by topic. Distributions by topic are shown in Fig. 2. Topics with the greatest coverage overall were prevention of mother-to-child transmission (PMTCT) (n = 16) within the HIV domain, and occurrence of violence (n = 11) within the GBV domain. Within other domains topics with the most coverage were STI service availability and STI incidence and prevalence (both n = 4), MH emergency care (n = 5), and use of contraception (n = 6).
The domain with the greatest breadth (number of different topics) was MH, which had indicators covering 11 different topics. The number of indicators per topic was low, however, with each topic covered by between 2 and 5 indicators. In contrast, the domains of GBV and HIV each included fewer topics (8 and 7, respectively) but had more indicators clustered within specific topics (occurrence of violence and PMTCT). The STI and FP domains had the fewest topics: 3 and 4, respectively.

Indicator types
Overall, the distribution of indicators by type (i.e. process/output, outcome, or impact) was fairly balanced, with the largest share classified as outcome (41%), followed by impact (30%), and then by process/output (30%). When disaggregated by domain, as shown in Fig. 3, distributions of indicators by type varied substantially across domains. The greatest number of impact indicators were in the GBV domain (n = 16), followed by the MH domain (n = 12). Numbers of outcome indicators were greatest in the HIV domain (n = 20) and in the MH domain (n = 15). These two domains also included the greatest number of process/output indicators (n = 11 and n = 9, respectively).

Intended context for use
Of the 9 indicator sets identified, 6 were intended for use in all humanitarian settings, 1 was designed specifically for conflict-affected settings, 1 was designed for post-disaster settings in the United States, and 1 was intended for use with displaced populations in both camp and urban settings, with separate versions available for the two settings. Regarding stage of emergency, 6 indicator sets were intended for use during both acute and extended stages, 2 were intended specifically for the acute stage, and 1 was intended for extended or protracted stages. Interestingly, of the 6 indicator sets that indicated that they were appropriate for both acute and extended emergency stages, none specified which of the indicators included were appropriate during which stages.

Data sources used
The largest group of indicators (n = 65) used data only from facilities, meaning data obtained directly from facility records, entered into reporting systems by facility staff, or collected during facility assessments. Fifty indicators used data only from the affected population, obtained via population-based surveys. Five indicators could be calculated using data from either facilities or affected populations, depending on which definition was used for the indicator. For example, 'complete antenatal care (ANC) coverage' could be obtained using facility data when defined as, "percentage of total number of live births in which the mother made at least four ANC visits during the antenatal period at the time of delivery at facility," but would require data from a population-based survey to calculate when defined as, "percentage of all women whose most recent pregnancy ended in a live birth or stillbirth in the last two years who received at least three ANC care visits by a trained provider." Aside from the indicators drawing on facility or population data, three indicators used data from program records. This includes, for example, the indicator on clean delivery kit coverage, which is intended to be calculated using data from the program distributing the kits on the total number distributed. Two indicators used data obtained directly from service providers regarding their knowledge and training. For seven indicators, it was unclear what data source should be used, and the set they were included in did not specify.
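As a quick consistency check, the data-source counts above can be tallied against the 132 unique indicators and converted to shares. The counts come from this section; the category labels and variable names are ours.

```python
# Number of unique indicators by data source, as reported above.
counts = {
    "facility only": 65,
    "population survey only": 50,
    "facility or population": 5,
    "program records": 3,
    "service providers": 2,
    "unclear": 7,
}

total = sum(counts.values())

# Percentage share of each data source, rounded to one decimal place.
shares = {source: round(100 * n / total, 1) for source, n in counts.items()}

print(total)  # 132
for source, pct in shares.items():
    print(f"{source}: {pct}%")
```

The categories sum to 132, and facility-only indicators account for just under half of the total (about 49%), with population-survey indicators making up roughly a further 38%.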

Frequently occurring indicators & overlap with priority indicators
As shown in Tables 2 and 3, a total of 33 indicators appear in multiple sets. Of these, however, only 20 have definitions which are consistent across sets. As shown in Tables 2 and 4, 28 indicators overlapped with those included in the monitoring frameworks for the SDGs or the Global Strategy (GS), or in the WHO's 100 Core Health Indicators (Core WHO). Less than half (only 11)                  [15]. Regardless, these discrepant findings in the numbers and types of M&E indicators across the different SRH domains suggest the need for critically appraising the extent to which these domains should be covered during routine monitoring and evaluation, and whether development of additional indicators may be needed for adequate coverage for SRH in humanitarian settings. Notable in their absence from the literature were indicator sets from many of the organizations that commonly implement relief efforts in emergency settings. Despite searching the websites of 85 organizations (many of which implement relief efforts), only one indicator set published by an implementing agency was identified [26]. This indicates that many organizations that provide SRH services in humanitarian settings do not make their M&E frameworks or indicator sets available in the public domain. Consequently, it is difficult to know which indicators are actually regularly used and reported on [4].
Broadly, our findings concur with the conclusions of Checchi et al. [4] in their review of public health information methods for crisis-affected populations. They assert the need for a common set of crisis-specific public health indicators, as well as establishment of a single health information platform for use in emergencies and a global data repository to store and analyze the data collected [4]. These needs underlie the consultative review process led by the WHO's Department of Reproductive Health and Research (HRP) which aims to establish a recommended core set of SRH indicators for humanitarian settings.
Findings from this literature review have fed directly into the WHO's consultative review process. The identified indicators, especially the 28 that overlapped with one or more of the priority indicator sets (i.e. either the SDG, GS or Core WHO indicators) (Table 4), served as a starting basis for the review process during a Technical Consultation with experts and stakeholders convened in December 2018. A report describing the progress of the consultative process, including results regarding indicator prioritization and standardization, was circulated to all partners who participated in the Technical Consultation and will be made available on the HRP website (https://www.who.int/reproductivehealth/publications/) upon incorporation of input from partners.
The prioritization process during the Technical Consultation revolved around selecting those indicators identified in this review which appeared in multiple sets and also overlapped with SDG, and/or GS, and/or WHO 100 Core Indicators. Along with prioritization, the review process focused on standardization: resolving inconsistencies across indicator sets and establishing clearly defined numerators, denominators, and data collection guidance for each indicator based on input from SRH experts and other stakeholders. Additionally, given the lack of indicator sets from implementing agencies identified in this review (as noted above), representatives from key implementing agencies participated in the Technical Consultation and contributed information about their internal M&E indicators and processes.
Ongoing areas of focus during the WHO's consultative review process are indicator coverage and feasibility of usage. As this literature review demonstrates, coverage of existing indicators across domains varies substantially, and differs by indicator set. This raises questions regarding what the coverage of a core set of indicators should be, and what is most realistic. For example, this literature review identified few indicators in domains such as ARH and CAC, domains often associated with socio-political challenges that might prevent or hamper data collection. A major feasibility question is therefore not only whether data collection for certain indicators would be logistically possible, but also whether it would be politically and bureaucratically feasible; such constraints make harmonization of indicators across settings difficult. Results from this literature review also indicate an uneven balance of indicators by data source, with the vast majority drawing on data from facilities or population-based surveys. Yet indicators drawing on other data sources, such as community-based indicators, may be more appropriate and informative for assessing services provided at levels beyond the health facility. Finally, for some SRH domains, such as HIV and GBV, crucial services are often provided by separate programs specific to these domains which are distinct from SRH services and programs. Therefore, ensuring appropriate coverage of the HIV and GBV domains within a core set of indicators will require multi-sector collaboration on the indicator selection process. These and other issues related to establishing a core SRH indicator set for humanitarian settings will continue to be explored during stakeholder consultations and via field-testing to assess indicator feasibility via collection of real-time data across varying humanitarian contexts.
In addition to the indicators identified, this review's descriptions of the data collection tools, processes, and guidance that currently exist in association with each indicator set could be useful for identifying data collection platforms to scale up and harmonize data collection and reporting of indicators across agencies, settings, and time, as called for by Checchi et al. [4]. The extent to which supporting resources are available for data collection, analysis and reporting currently varies substantially across indicator sets. For instance, the indicator set identified to have the most extensive set of supporting resources is the UNHCR's Health Information System Standards and Indicators, which is part of the Twine system (accessible at http://twine.unhcr.org/app/). The Twine system not only includes data collection tools and data entry templates, but also provides a mechanism for centralized reporting and automatic analysis. This is also the only set of indicators that is associated with an established system for ongoing data collection.
There is a need for greater emphasis on monitoring and evaluating SRH in humanitarian settings comprehensively, rather than taking a siloed approach. Only a few M&E studies examining a multi-domain set of SRH indicators were found in the peer-reviewed or grey literature [14, 17, 22–25]. Instead, many studies focused on a single domain, such as MH or GBV, which hinders a general understanding of the status of SRH services and outcomes in humanitarian settings as a whole. The exceptions were those studies which examined MISP implementation [14, 17, 23–25]. More specifically, the MISP Process Evaluation toolkit could be considered a valuable tool, given its broad coverage of multiple SRH domains across the six main MISP objectives. It should be noted, however, that although this toolkit is valuable, the data generated is focused on assessment of implementation processes, rather than M&E of SRH services and outcomes over time, or across settings.
Several strengths can be attributed to this review. These include its rigorous adherence to the PRISMA guidelines and the in-depth mapping process undertaken to synthesize key information from the 179 indicators identified. Additionally, focusing on the indicators themselves as the unit of analysis allowed for a unique and illuminating analysis. Several limitations should also be noted. First, most of the indicator sets identified were either from guidance bodies (i.e. the Sphere Project, or the IAWG) or from the peer-reviewed literature, rather than directly reported by implementing agencies. As discussed above, this makes it difficult to accurately reflect the realities of M&E data collection efforts by the different implementing agencies in the field. Additionally, this review does not indicate the feasibility and practicality of collecting particular indicators in particular settings. Instead, feasibility will be assessed via field-testing at a later stage in the WHO's consultative review process. Finally, we only included English-language studies. However, considering the global nature of this topic, we expect that only very few eligible studies were missed by excluding non-English literature.
In addition to the consultative process currently underway, further research is needed to address these gaps, such as supplementing this information with field experience on what is being collected at the field level as well as seeking global consensus and a process of prioritization of a core list of M&E SRH indicators in humanitarian settings. Future studies should systematically examine the extent to which indicators are measuring what should be measured, vs. what can be measured, and which indicators and data collection methods are appropriate for use in which settings. Additionally, iterative, participatory consultative processes that engage a wide variety of stakeholders involved in humanitarian response (particularly those most connected to on-the-ground realities), coupled with feasibility assessments, will be essential to bring these efforts to standardize and harmonize indicators to completion and to ensure scale-up, accountability and commitment of partners to collecting some or all of the recommended M&E indicators.

Conclusions
The results of this review underscore the need for standardizing data collection efforts for M&E of SRH services and outcomes in humanitarian settings. A core list of indicators is essential for assessing response status over time as well as across and within countries. The 28 indicators identified via this review which overlap with either the SDGs, the Global Strategy or the 100 WHO Core indicators have provided the starting basis for an extensive consultative review process which aims to establish a standardized core indicator list. Rigorous reporting on a core list of indicators is a prerequisite for making the investment case that SRH response in humanitarian settings saves lives. Efforts are underway to conceptualize a core set of SRH indicators as well as to test their measurement feasibility. A standardized definition of accountability is a crucial by-product of these efforts. A commitment by agencies to a core set of indicators requires a more conscious effort as well as willingness to share information and coordinate efforts. This could be made possible by scaling up M&E of SRH efforts within the WHO's Global Health Cluster, as it could ensure measurement sustainability, especially for protracted crises.