Assessing risk of bias
It is critical that the strengths and limitations of the research are appraised.
The questions posed by the guideline will often determine the most appropriate study design to answer them. It is not enough to make assumptions about the trustworthiness of the evidence based purely on the type of study, such as trusting the evidence of randomised trials or systematic reviews over observational studies (Viswanathan, Patnode et al. 2017). Depending on the type of research question, strong observational studies can at times provide more reliable evidence than flawed randomised trials.
Several different terms are used to describe the assessment of studies underpinning a guideline — critical appraisal, quality assessment, internal validity — but in this module we use the concept of ‘risk of bias’. Bias refers to factors that can systematically affect the observations and conclusions of a study and cause them to differ from the truth (Higgins, Altman et al. 2011). Risk of bias is the likelihood that features of the design or conduct of a study will give misleading results. Studies affected by bias can be inaccurate — for example, finding false positive or false negative effects or associations by over- or under-estimating the true effect. This can, in turn, lead to inappropriate guideline recommendations, which can result in wasted resources, lost opportunities for effective interventions or harm to consumers.
Risk of bias assessment requires a degree of methodological expertise and may be conducted by the guideline development group or by experienced researchers as part of a commissioned evidence review. Once complete, the risk of bias assessment can be used to inform the synthesis of the studies’ findings and integrated into the overall assessment of the certainty of the body of evidence. For example, the use of GRADE also incorporates considerations like precision and applicability. See the Synthesising evidence and Assessing certainty of evidence modules for more information.
Factors that can introduce bias are common to many areas of research, including:
- problems with the comparability of participants or populations in a study — selection bias
- factors other than the intervention or exposure of interest that influence the effect estimate — performance bias or confounding
- problems with measurement or classification of exposure or outcomes — detection bias
- missing information — attrition bias or reporting bias.
The specific factors will depend on the kind of studies you are considering in your guideline — for example, clinical trials, cohort studies or animal studies. It is usually impossible to measure directly whether a particular study has been affected by bias, so risk of bias is frequently assessed by looking for features of the design and conduct of the study that empirical evidence has shown to minimise the risk. These features may include randomisation in a randomised trial, management of confounding in an observational study, completeness of follow-up and so on (Higgins, Altman et al. 2011; National Research Council 2014; Viswanathan, Patnode et al. 2017).
Systematic reviews are a common source of evidence for guidelines, whether commissioned specifically for the guideline or identified in the published literature. While systematic reviews represent the gold standard in synthesising the available body of evidence, the findings of a systematic review depend strongly on the validity of its included studies. A critical step within a systematic review is to assess the individual included studies for risk of bias. These assessments must inform the findings and interpretation of the review (Higgins, Altman et al. 2011) and, following on from this, the recommendations within the guideline.
In addition, the process of conducting a systematic review may itself introduce bias, so the review process should be assessed separately, in addition to the risk of bias of its included studies. Reviews at high risk of bias might, for example, have problems with the completeness of the search for relevant studies; with systematically and transparently selecting studies for inclusion; or with following their planned analysis of the available results (Whiting, Savović et al. 2016; Shea, Reeves et al. 2017).
Guidelines may also consider overviews of systematic reviews — also called ‘umbrella reviews’ or ‘reviews of reviews’ — which have their own unique features requiring assessment. For example, there may be overlap between systematic reviews that include some of the same primary studies, or multiple systematic reviews of similar topics with discordant results. Discordant results may arise for a number of reasons, such as differences in methods of synthesis and quality assessment (Pollock, Fernandes et al. 2016; Ballard and Montgomery 2017).
You should plan for all the steps discussed in this module.
What to do
1. Plan your approach
When planning to collect evidence to inform a guideline, as with any research project, the assessment of risk of bias should be planned ahead of time as part of a research protocol. This will reduce the risk of inadvertently introducing bias into the systematic review process by ensuring that you have selected appropriate methods, that they are applied consistently and fairly across all the evidence and that you avoid post hoc decisions based on seeing the studies’ results (Higgins, Altman et al. 2011; Viswanathan, Patnode et al. 2017).
In order to plan the risk of bias assessment, you will need a clear idea of which study designs you are likely to be considering. This may be decided early in the guideline development process (for example, a very large guideline may determine at the scoping stage that it will only consider existing systematic reviews) or only after the specific, answerable questions for the guideline have been formed (see the Scoping the guideline, Forming the questions and Deciding what evidence to include modules). The need for additional study designs may arise later in the process, at which point good judgement should be used to select appropriate methods.
Remember that you may need to exercise judgement in identifying which study design a particular paper is using. Studies can use varying terminology and unusual design features that may mean the design is not what you expect. Randomised trials can introduce features such as cross-over, cluster-randomisation and within-person comparisons (for example, between body parts) that require additional consideration in assessing risk of bias (Higgins, Deeks et al. 2011). Ensure you have determined the study design based on the reported methods rather than the label assigned by the authors before selecting the most appropriate tool (Reeves, Deeks et al. 2011).
2. Identify an appropriate risk of bias assessment tool
How risk of bias is assessed depends on the questions you are asking and the types of studies available (see the Deciding what evidence to include module). It is important to assess the risk of bias for all included studies, whether this includes systematic reviews, overviews, randomised trials, observational studies, studies investigating exposure, causation or environmental toxicology, animal studies, health economics studies, qualitative studies or any other source of evidence.
To ensure that the assessments are thorough, consistent and as objective as possible, it is not recommended that you create your own assessment tool. Instead, use a published, structured and ideally validated risk of bias assessment tool. While there are many published tools available, some are more useful than others. Using an inappropriate tool, or not using a structured tool at all (for example, using your own custom tool), can lead to assessments that include concepts not relevant to bias, that fail to consider important aspects of bias, or that are inconsistent from study to study.
Where possible, seek methodological advice to select a tool that:
- has been designed to assess the study designs you are including — you may need more than one to consider different designs
- has been developed using a transparent process that draws on the theory and evidence of the impact of bias on research findings and has been tested in practice
- is focused on items related to risk of bias rather than other characteristics, such as applicability
- allows you to show transparency by providing descriptive information to support your assessments
- does not present assessments as a summary numerical score (Viswanathan, Patnode et al. 2017; Whiting, Wolff et al. 2017).
Some good practice tools and sources for further guidance are listed below in Table 1. Others can be found in the general sources of advice on guideline development in the Useful resources section below. There are some study designs for which research into risk of bias assessment is less well established and for which tools meeting the above criteria are not yet available. If that is the case for your guideline, seek methodological advice to select or adapt an alternative tool that will cover most key concepts and provide a consistent structure.
Simple checklists and, in particular, numerical scores can lead to superficial assessments or assessments that are not meaningful to readers. Readers include not only guideline users but also guideline developers relying on technical reports prepared by external research groups. It is preferable to conduct a considered assessment and report it in detail. For example, an assessment of attrition might include considering not only the percentage of participants who completed follow-up, but also:
- the reasons for attrition or the length of follow up of participants
- whether enough data are missing to impact the statistical analysis in this particular case
- whether the intention of the study was to assess receiving the intervention or assignment to receive the intervention. This may include receiving a specific screening test compared to receiving an invitation to a population-level screening program, for which an incomplete take-up rate would be expected (Higgins, Altman et al. 2011; Higgins, Savović et al. 2016).
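A simple way to judge the second point above, whether enough data are missing to affect the analysis, is to recompute the result under extreme assumptions about the missing participants. The sketch below is illustrative only and uses invented numbers; it is not part of any published risk of bias tool.

```python
def risk_with_bounds(events, observed, missing):
    """Observed event risk, plus the extreme risks obtained by assuming
    either none or all of the missing participants had the event."""
    randomised = observed + missing
    low = events / randomised               # none of the missing had the event
    high = (events + missing) / randomised  # all of the missing had the event
    return events / observed, low, high

# Invented arm: 40 events among 100 observed participants, 15 lost to follow-up
obs, low, high = risk_with_bounds(events=40, observed=100, missing=15)
print(f"observed risk {obs:.2f}, plausible range {low:.2f} to {high:.2f}")
```

If the plausible range straddles a clinically important threshold, attrition may be enough to change the conclusions for that outcome.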
Table 1: Risk of bias assessment tools

| Question/study type | Tool | Year | Source of guidance |
| --- | --- | --- | --- |
| Systematic reviews | ROBIS | 2016 | www.bristol.ac.uk/population-health-sciences/projects/robis/ |
| | AMSTAR 2 | 2017 | amstar.ca |
| | SIGN checklist for SRs and meta-analyses | 2014 | https://www.sign.ac.uk/checklists-and-notes.html |
| Overviews of reviews | Ballard & Montgomery checklist | 2017 | (Ballard and Montgomery 2017) |
| Randomised trials | Cochrane RoB 2.0 tool | 2016 | www.riskofbias.info |
| | SIGN checklist for randomised controlled trials | 2014 | https://www.sign.ac.uk/checklists-and-notes.html |
| Non-randomised studies of interventions (case-control, cohort, etc.) | ROBINS-I | 2016 | www.riskofbias.info |
| | Newcastle-Ottawa Scale (NOS) | 1999 | www.ohri.ca/programs/clinical_epidemiology/oxford.asp |
| | SIGN checklist for case-control and cohort studies | 2014 | https://www.sign.ac.uk/checklists-and-notes.html |
| Prognostic | QUIPS | 2013 | methods.cochrane.org/prognosis/our-publications |
| | PROBAST | 2014 | www.systematic-reviews.com/probast |
| | JBI checklist for prevalence studies | 2017 | http://joannabriggs.org/research/critical-appraisal-tools.html |
| Diagnostic | QUADAS-2 | 2011 | www.bristol.ac.uk/population-health-sciences/projects/quadas/quadas-2 |
| | SIGN checklist | 2014 | https://www.sign.ac.uk/checklists-and-notes.html |
| Qualitative | CASP Qualitative Checklist | 2018 | casp-uk.net/casp-tools-checklists |
| | JBI Checklist for Qualitative Research | 2017 | http://joannabriggs.org/research/critical-appraisal-tools.html |
| | GRADE-CERQual | | https://www.cerqual.org/ |
| Observational studies of exposures* (human epidemiology, wildlife) | Navigation Guide risk of bias checklist | 2018 | (Woodruff and Sutton 2014) |
| | OHAT tool | 2015 | Office of Health Assessment and Translation |
| Measurement properties | COSMIN | 2010 | https://www.cosmin.nl/ (Mokkink, Terwee et al. 2010) |
| In vivo animal studies | Navigation Guide | | See above |
| | OHAT tool | | See above |
| | ORoC Handbook | | See above |
| | SciRAP tool | 2018 | Science in Risk Assessment and Policy |
| | SYRCLE tool | 2014 | Systematic Review Centre for Laboratory Animal Experimentation |
| | CRED | 2016 | www.ecotoxcentre.ch/projects/risk-assessment/cred |
| In vitro studies | OHAT | | See above |
| | TOXR tool | | https://eurl-ecvam.jrc.ec.europa.eu/about-ecvam/archive-publications/toxrtool |
*A review of tools for assessing risk of bias in observational studies of exposures is available.
Selecting the right risk of bias tool
Ongoing research is helping to make it easier for developers to find good practice tools for assessing risk of bias.
A systematic review of tools to assess risk of bias is available. This review is complemented by an interactive online data visualisation instrument that allows you to find and compare tools.
Developers still need to make a decision about which tool is best suited for their purpose; however, this resource provides an overview of an individual tool’s strengths and limitations so that guideline developers can make an informed choice. The review assesses the types of study designs for which the tool is applicable, the items in the tool and which domains of bias they assess, the method used to develop the tool and whether and how the tool has been tested.
3. Be aware of related issues
There are a number of related issues that you should be aware of (Table 2, immediately below), including areas of bias that require specific approaches, and issues that are not causes of bias but are often conflated with the risk of bias assessment. Further reading is available on all these factors (Higgins, Altman et al. 2011; National Research Council 2014; Viswanathan, Patnode et al. 2017).
Table 2: Related issues to be aware of in risk of bias assessment

| Issue | Description | What to do |
| --- | --- | --- |
| Funding | Industry sponsorship of included studies (e.g. from pharmaceutical companies or device manufacturers) is associated with more favourable estimates of the intervention effect (Lundh, Lexchin et al. 2017). | Consensus has not yet been reached on how funding should be addressed within risk of bias assessment: either as a stand-alone bias or through its impact on other methods (e.g. the selection of comparators or selective reporting of results) (Bero 2013; Sterne 2013; Viswanathan, Patnode et al. 2017). Whether or not funding is included in the assessment tool you are using, always report the sources of funding for each study, conduct a thorough search for unpublished evidence (see Identifying the evidence) and consider how funding sources may influence the findings (see Assessing certainty of evidence). |
| Applicability | Not a risk of bias. Studies whose results are unlikely to be broadly applicable or useful (e.g. using an unrealistic population with no comorbidities, or a placebo control rather than current best practice). | Do not include in risk of bias assessment; results may be unbiased even if not widely applicable. Consider later, as part of the overall certainty of the evidence (GRADE) (see Assessing certainty of evidence). |
| Precision | Not a risk of bias. Studies may not have enough participants (insufficient power) to detect a statistically significant effect, or may have imprecise results if the results for individual participants are highly variable. Imprecise results have wide confidence intervals or large P values. | Do not include in risk of bias assessment, as the central effect estimate is not biased. Meta-analysis with other studies may increase power. Consider later, as part of the overall certainty of the evidence (GRADE) (see Assessing certainty of evidence). |
| Reporting biases | Results are more likely to be reported and available for synthesis if they find a positive and/or statistically significant effect. | At the study level: consider as part of risk of bias assessment, and request additional information from study authors. At the body of evidence level: do not include in risk of bias assessment, as these factors apply to studies and outcomes that are missing, not those you are assessing. Consider later, as part of assessment of certainty of evidence (e.g. using GRADE), as syntheses that are missing negative or uncertain results may be biased (see Assessing certainty of evidence). Request additional information from study authors and conduct a thorough search for unpublished studies. |
| Poor reporting | Lack of clarity around methodological features related to bias. | May lead to more ‘unclear’ ratings, but does not require specific assessment. Request additional information from study authors. Note that reporting standards have been developed to guide the reporting of different study designs (e.g. CONSORT, ARRIVE). These tools may be helpful for identifying key features, but are not designed to assess risk of bias. |
4. Appraise each study
Before you begin your appraisal process, ensure that everyone involved in the assessments has been trained in the use of the tool(s) and has read any detailed guidance available. It is helpful to pilot the assessment process on a few studies to identify any differences in understanding or difficulties with the process before proceeding (Higgins, Altman et al. 2011; Viswanathan, Patnode et al. 2017).
Best practice is for at least two people to independently assess each included study and then to reach consensus about the final assessment. Different assessors often reach different conclusions about the risk of bias (Hartling 2012), so discussion is helpful to ensure that all the relevant factors have been considered appropriately. For systematic reviews or overviews, the review itself should be appraised, not the individual included studies.
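When piloting the assessment process, agreement between the two assessors can be quantified before the consensus discussion. The sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic, for paired judgements; the judgements shown are invented for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c]
                   for c in set(rater_a) | set(rater_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented domain-level judgements from two assessors across ten items
a = ["low", "low", "high", "unclear", "low", "high", "low", "low", "unclear", "high"]
b = ["low", "high", "high", "unclear", "low", "high", "low", "unclear", "unclear", "high"]
print(round(cohens_kappa(a, b), 2))  # prints 0.7
```

A low kappa during piloting suggests the assessors understand the tool differently and that further training or clearer decision rules are needed before the full assessment proceeds.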
Although some tools conduct assessments at the whole-study level, you should consider performing assessments separately for different outcomes or specific statistical results within a study. This allows for the fact that biases can affect individual outcomes differently. For example, some outcomes may have more incomplete follow-up than others, such as information on harms that has been inconsistently collected, outcomes requiring invasive testing, outcomes requiring participants to report sensitive personal information, or measures that are more subjective and susceptible to bias (Higgins, Altman et al. 2011; Viswanathan, Patnode et al. 2017).
Once the detailed assessment is complete, determine the overall risk of bias, considering all the relevant sources of bias for each outcome within the study. Some tools will assist in reaching this overall conclusion; for others you will need to use your own judgement. Plan your definition of an overall high risk of bias in advance. Some authors will consider an outcome to be at high risk of bias when any of the items within a tool is assessed as high risk. Others will select specific factors that they consider to be particularly important to define an overall high risk. Consider whether the likely direction and magnitude of the effect of bias can be predicted; for example, whether there is likely to be a large or small impact and whether it is likely to cause over- or under-estimation of the effect. It will not always be possible to predict this impact.
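The pre-specified decision rules described above can be written down explicitly so they are applied consistently across studies. The function below is a hypothetical sketch, not part of any published tool, showing the two rules mentioned: treating any high-risk domain as decisive, or only a pre-specified list of critical domains.

```python
def overall_risk(domains, critical=None):
    """domains maps domain name -> 'low' | 'some concerns' | 'high'.
    With critical=None, any 'high' domain makes the outcome high risk;
    otherwise only the listed critical domains can do so."""
    decisive = domains if critical is None else {d: domains[d] for d in critical}
    if any(v == "high" for v in decisive.values()):
        return "high"
    if any(v != "low" for v in domains.values()):
        return "some concerns"
    return "low"

# Invented judgements for one outcome in one study
judgements = {"randomisation": "low", "missing data": "high", "measurement": "some concerns"}
print(overall_risk(judgements))                              # any-domain rule: high
print(overall_risk(judgements, critical=["randomisation"]))  # critical-domain rule: some concerns
```

Whichever rule is chosen, it should be set out in the protocol before the assessments begin.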
Machine learning in risk of bias assessment
New tools are emerging that use machine learning and text mining to automate the process of risk of bias assessment. For example, RobotReviewer (www.robotreviewer.net) automatically extracts and synthesises study data, but only from randomised controlled trials. To date, these tools have shown a good level of accuracy in identifying relevant text to inform decisions about risk of bias in randomised trials. However, human judgement is still needed to reach conclusions about the level of risk. Supporting human judgement with machine learning could increase the efficiency and speed of the risk of bias assessment process (Marshall, Noel-Storr et al. 2016; Millard, Flach et al. 2016).
5. Report the assessment process
Include details of how you conducted the risk of bias assessment and the findings of your assessments in the guideline documents:
- Describe the methods used: A brief summary of the assessment process can be included in the methods/development section of the guideline itself. This should note that all studies were assessed, the tools used, and how the assessments were used to inform inclusion or synthesis decisions. In the technical report, a more detailed description could include copies of, or links to, the tools used for each study design, as well as definitions of an overall high risk of bias at the individual study/outcome and synthesis levels.
- Report the risk of bias assessment: Include individual risk of bias assessment tables or graphs for each included study in the technical report. Where this is not feasible, include summary-level information; for example, the overall risk of bias for each study and across studies for each outcome. Most systematic review software packages include structured tables to support risk of bias assessment and reporting, and often incorporate standard tools. Identify where risk of bias assessments were used to make choices about the synthesis of results.
- Report how risk of bias assessments influenced recommendations: In the technical report, identify where risk of bias influenced the assessment of the certainty of the evidence (see the Assessing certainty of evidence module). This information may also appear in the evidence tables in the completed guideline, depending on the level of detail presented.
6. Use your appraisals to inform the guideline
Conducting detailed risk of bias assessments is only useful if they are used to inform the guideline (Higgins, Savović et al. 2016; Viswanathan, Patnode et al. 2017). This can be done in a number of ways, depending on the nature of the literature available and the structure of the guideline. Options available include:
- Conduct sensitivity analysis to consider the potential impact of studies at high risk of bias on your overall conclusions. This can be done quantitatively using meta-analysis, or qualitatively if you are using narrative or qualitative synthesis.
- Exclude studies at high risk of bias from the evidence synthesis. This should be done with caution — provide a rationale and clear criteria for excluding studies. Ensure this is pre-specified in your protocol to avoid introducing bias through post hoc exclusion decisions based on knowledge of a study’s findings.
- Reach an overall conclusion for each outcome, based on all the studies contributing data to that outcome, as to whether the synthesised result is at high risk of bias. This can be informed by pre-specified criteria, such as focusing on specific risk items, or informed by the sensitivity analysis referred to in the previous point.
- Use the overall conclusion to inform the overall assessment of the certainty of the evidence; for example, the GRADE approach summarises the risk of bias, precision, consistency and other factors (see the Assessing certainty of evidence module). This will then be used to inform the strength of the guideline development group’s recommendations.
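The first option in the list above, a quantitative sensitivity analysis, can be sketched as a fixed-effect inverse-variance meta-analysis run with and without the studies judged to be at high risk of bias. The data below are invented for illustration, and dedicated meta-analysis software should be used in practice.

```python
import math

def fixed_effect_meta(results):
    """Fixed-effect inverse-variance pooled estimate with a 95% CI.
    results: list of (effect estimate, standard error) pairs."""
    weights = [1 / se ** 2 for _, se in results]
    pooled = sum(w * e for (e, _), w in zip(results, weights)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

# Invented studies: (effect estimate, standard error, overall risk of bias)
studies = [(0.5, 0.20, "low"), (0.4, 0.25, "low"), (1.1, 0.30, "high")]

all_studies = fixed_effect_meta([(e, se) for e, se, _ in studies])
low_rob_only = fixed_effect_meta([(e, se) for e, se, rob in studies if rob != "high"])
print("all studies:", all_studies)
print("excluding high risk of bias:", low_rob_only)
```

If the pooled estimate shifts materially when high risk of bias studies are removed, this supports restricting the synthesis to the remaining studies or downgrading the certainty of the evidence.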
Updating, adopting or adapting guidelines
Risk of bias assessment is a factor when considering whether to update, adopt or adapt an existing guideline (see the Adopt, adapt or start from scratch module). Consider whether the original guideline considered the risk of bias of the included studies in forming its recommendations. If not, the recommendations may need to be reviewed and the assessment conducted again; however, this is labour-intensive and requires obtaining all the original papers. The methods used in the original critical appraisal may not be clearly reported, which can make this judgement even more difficult. Of 1,046 guidelines assessed for inclusion in the NHMRC Portal to 2013, only 17% linked their guideline recommendations to levels of evidence and references; and only 7% included a replicable description of the evidence review (NHMRC 2014). You may be able to obtain more information by contacting the original guideline developers or publishing organisation.
Case studies: using risk of bias assessments to inform analysis
The Australian Clinical Guidelines for Stroke Management 2017 provide recommendations across the course of stroke care, from preclinical care through management, rehabilitation, discharge planning and care in the community. Conducting new systematic reviews to answer each clinical question was not feasible. For each question in the guideline, the best available study design was selected; that is, a systematic review, followed by a randomised controlled trial, followed by an observational study (depending on the type of question). Where more than one primary study of the same design was available, the study assessed to be at the lowest risk of bias was used. Risk of bias assessments were then used to inform the overall assessment of the certainty of the evidence using GRADE.
The Australian RACGP Guideline for the management of knee and hip osteoarthritis included randomised trials and synthesised these to inform the guideline. Each trial was assessed for risk of bias using the Cochrane risk of bias tool. Definitions were provided about how the individual risk of bias domains would be translated to overall ratings and how the overall risk of bias ratings of multiple studies would be interpreted as part of the GRADE assessment for the overall certainty of the evidence (see the guideline's Technical report).
The NICE guideline on People’s experience in adult social care services (2018) included primary literature relevant to the care and support of adults receiving social care in their own homes, residential care and community settings. Much of the evidence was from qualitative or mixed-methods studies. A structured tool was used to assess each study and the detailed assessment of each study was presented in the evidence tables supporting the guideline (see Appendix B of the guideline). The evidence was then qualitatively synthesised to inform overall recommendations.
The NICE Guideline on Physical activity and the environment (2018) used three tools to assess each category of evidence — controlled, uncontrolled, qualitative. They presented the tools in full; provided definitions of how the individual items would be translated to overall ratings; and reported ratings against each item and the overall rating for each included study (see Appendix 3). The overall ratings were then used to inform GRADE assessments of the overall certainty of evidence supporting each clinical question (see Appendix 4).
NHMRC Standards
The following of the NHMRC standards apply to the Assessing risk of bias module:
2. To be transparent guidelines will make publicly available:
2.1. The details and procedures used to develop the guideline.
6. To be evidence informed guidelines will:
6.2. Consider the body of evidence for each outcome (including the quality of that evidence) and other factors that influence the process of making recommendations including benefits and harms, values and preferences, resource use and acceptability.
Guidelines approved by NHMRC must meet the requirements outlined in the Procedures and requirements for meeting the NHMRC standard.
Useful resources
Guidelines for Guidelines: Tools to assess risk of bias
McMaster University GRADE online learning module on risk of bias
Cochrane Handbook for Systematic Reviews of Interventions (includes information on bias in randomised and non-randomised studies, qualitative studies, economics studies and overviews)
Developing NICE guidelines: the manual, Appendix H
SYRINA Framework for integrated assessment of chemical exposure (Vandenberg, Ågerstrand et al. 2016)
QUADAS-2 tool to evaluate the risk of bias and applicability of primary diagnostic accuracy studies
References
Hartling, L. (2012). Validity and Inter-Rater Reliability Testing of Quality Assessment Instruments.
Higgins, J., J. Deeks, et al. (2011). Chapter 16: Special topics in statistics. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). J. Higgins and S. Green (Eds), The Cochrane Collaboration. Available from http://handbook.cochrane.org.
Higgins, J. P., J. Savović, et al., Eds. (2016). Revised Cochrane risk of bias tool for randomized trials (RoB 2.0), 20 October 2016. www.riskofbias.info, accessed on 24 May 2018.
Reeves, B., J. Deeks, et al. (2011). Chapter 13: Including non-randomized studies. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). J. Higgins and S. Green (Eds), The Cochrane Collaboration. Available from http://handbook.cochrane.org.
Acknowledgements
NHMRC would like to acknowledge and thank Miranda Cumpston (author), Professor Lisa Bero from the Charles Perkins Centre, The University of Sydney (editor), Professor Sally Green from
Cochrane Australia (editor) and Associate Professor Philippa Middleton from the
South Australian Health and Medical Research Institute (editor) for their contributions to this module.
Version 5.1. Last updated 29 August 2019.
Suggested citation: NHMRC. Guidelines for Guidelines: Assessing risk of bias.
https://nhmrc.gov.au/guidelinesforguidelines/develop/assessing-risk-bias. Last published 29 August 2019