A Bland-Altman plot (difference plot) in analytical chemistry or biomedicine is a method of plotting data to analyze the agreement between two different assays. It is identical to a Tukey mean-difference plot,[1] the name by which it is known in other fields, but it was popularized in medical statistics by J. Martin Bland and Douglas G. Altman.[2][3]

Suppose that \( X_1, \ldots, X_N \) is a sample from a population N(μ, σ²) with unknown mean μ and variance σ², where N > 1. The sample mean \( \bar{X} \) and the sample variance \( S^2 \) are defined as \( \bar{X} = \sum_{i=1}^{N} X_i / N \) and \( S^2 = \sum_{i=1}^{N} (X_i - \bar{X})^2 / (N - 1) \). The 100p percentile of the distribution N(μ, σ²) is denoted by θ, where \( \theta = \mu + z_p \sigma \) and \( z_p \) is the 100p percentile of the standard normal distribution N(0, 1).

Stevens et al. [14, 29] developed the probability of agreement (PoA) method as an alternative to the limits of agreement approach. Its advantage is that it takes into account two different types of bias as well as unequal precision between devices. Proportional bias, where the extent of disagreement depends on the true value for each subject, is considered in addition to additive bias, and this information can be used to clarify the different sources of disagreement when the devices do not agree. The PoA method provides a flexible and informative summary of agreement, but the methodology does not currently adjust for confounding factors (e.g. activity in our COPD study) and is therefore not yet as widely applicable as other alternatives. For more information about this method, see the supplementary file.

Barnhart HX, Lokhnygina Y, Kosinski AS, Haber M. Comparison of concordance correlation coefficient and coefficient of individual agreement in assessing agreement. J Biopharm Stat. 2007;17(4):721–38.

On the other hand, the limits of agreement (LoA) and TDI methods have the advantage of being expressed in the original unit of measurement, so they can be compared directly with a clinically acceptable difference (CAD) [43]. In the papers by Barnhart et al. [11] and Barnhart [12], the authors point out that 95% of the differences may lie within the CAD while the LoA still do not support a conclusion of agreement (for example, if one of the limits lies outside the CAD). This can happen with skewed data or with other departures from the normality assumption. We agree that this can be a problem when interpreting LoA, and that checking the assumptions is especially important when calculating LoA.
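The sample definitions of \( \bar{X} \) and \( S^2 \), together with the normal percentile \( \theta = \mu + z_p \sigma \) (where \( z_p \) is the 100p percentile of N(0, 1)), translate directly into code. The following is a minimal sketch using only the Python standard library. The function names `sample_summary` and `percentile_estimate` are hypothetical, and the point estimate \( \bar{X} + z_p S \) is an assumed plug-in estimator. It reproduces the familiar mean ± 1.96 SD limits when p = 0.025 or 0.975, but it is not necessarily the exact estimator used by the cited authors.

```python
from statistics import NormalDist, mean, variance


def sample_summary(xs):
    """Return (X-bar, S^2): the sample mean and the sample variance with divisor N - 1."""
    if len(xs) < 2:
        raise ValueError("need N > 1 observations")
    # statistics.variance already uses the N - 1 divisor, matching S^2 above.
    return mean(xs), variance(xs)


def percentile_estimate(xs, p):
    """Assumed plug-in estimate X-bar + z_p * S of theta = mu + z_p * sigma."""
    x_bar, s2 = sample_summary(xs)
    z_p = NormalDist().inv_cdf(p)  # 100p percentile of N(0, 1)
    return x_bar + z_p * s2 ** 0.5


# Hypothetical paired differences, for illustration only.
diffs = [0.4, -1.2, 0.9, 2.1, -0.3, 1.5, 0.0, -0.8]
x_bar, s2 = sample_summary(diffs)
upper = percentile_estimate(diffs, 0.975)  # upper 95% limit of agreement
lower = percentile_estimate(diffs, 0.025)  # lower 95% limit of agreement
```

With p = 0.975 and p = 0.025, the two estimates are the upper and lower 95% limits of agreement.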

However, we consider the ability of the LoA methodology (and in particular the Bland-Altman plot) to reveal relative mean bias, patterns in the data and hence sources of disagreement to be valuable, and a single TDI or CP summary index can hide this detail. When calculating TDI or CP, we therefore recommend also producing a Bland-Altman plot of the paired differences between devices against their mean, showing the overall mean bias and the CAD. We suggest that this provides a robust way to evaluate agreement. In particular, outliers or biases in the data can easily be examined relative to the CAD.

The LINE and CHARACTER commands can be used to control the appearance of the plot. Setting certain LINE or CHARACTER parameters to BLANK omits the corresponding reference lines. By default, reference lines are drawn for the mean difference and for the lower and upper limits of agreement, but each of them can be controlled individually.

Bland and Altman first proposed the limits of agreement (LoA) method more than 30 years ago, in their 1986 paper [5], as an alternative to correlation-based methods, which they argued did not properly characterize agreement [19]. The 95% limits of agreement are simply calculated as m ± 2 × SD (more precisely, m ± 1.96 × SD), where m is the mean of the paired differences in readings (e.g. the differences in respiratory rate measured simultaneously on the same participant with two different devices) and SD is the standard deviation of the paired differences.
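The limits-of-agreement calculation and the CAD comparison can be sketched in a few lines. This is an illustrative sketch, not a reference implementation. The function names are hypothetical, the multiplier `k` defaults to 1.96 (`k=2` gives the rounded rule), and `within_cad` assumes the CAD is a symmetric range [−CAD, CAD].

```python
from statistics import mean, stdev


def limits_of_agreement(a, b, k=1.96):
    """Return (bias, lower, upper) for paired readings a[i], b[i] from two devices.

    bias is the mean paired difference m; the limits are m - k*SD and m + k*SD.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired readings")
    diffs = [x - y for x, y in zip(a, b)]
    m, sd = mean(diffs), stdev(diffs)  # stdev uses the N - 1 divisor
    return m, m - k * sd, m + k * sd


def bland_altman_points(a, b):
    """(mean of the pair, difference) coordinates for a Bland-Altman plot."""
    return [((x + y) / 2, x - y) for x, y in zip(a, b)]


def within_cad(lower, upper, cad):
    """Hypothetical decision rule: agreement if both limits lie inside [-cad, cad]."""
    return -cad <= lower and upper <= cad


# Hypothetical respiratory-rate readings (breaths/min) from two devices.
device_a = [14, 18, 20, 16, 22, 19]
device_b = [15, 17, 21, 16, 20, 19]
bias, lower, upper = limits_of_agreement(device_a, device_b)
points = bland_altman_points(device_a, device_b)
agree = within_cad(lower, upper, cad=3.0)  # a CAD of 3 breaths/min is an assumption
```

To draw the plot, scatter `points` and add horizontal lines at `bias`, `lower` and `upper` (and, optionally, at ±CAD).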

The limits of agreement are intended to quantify the spread of the paired differences. The wider the limits of agreement, the more the devices' measurements are likely to differ, which indicates a lack of agreement between the devices. To assess the degree of agreement formally, the limits are compared with a clinically acceptable difference (CAD): a range within which differences are considered clinically negligible. If the limits lie within the CAD range, it is concluded that the devices agree and could be used interchangeably. The CAD should be decided before data analysis to avoid biasing the decision, although, strictly speaking, the statistical validity of the method does not require it. The limits of agreement are usually shown on a Bland-Altman plot of the paired differences against the means of the paired measurements.

We provide a tutorial to help practitioners choose between different methods of evaluating agreement under a linear mixed model assumption. We illustrate the use of five methods in a direct comparison using real data from a study of patients with chronic obstructive pulmonary disease (COPD) with matched repeated observations of respiratory rate. The methods used were the concordance correlation coefficient, the limits of agreement, the total deviation index, the coverage probability and the coefficient of individual agreement.

Barnhart HX, Yow E, Crowley AL, Daubert MA, Rabineau D, Bigelow R, Pencina M, Douglas PS. Choice of agreement indices for assessing and improving measurement reproducibility in a core laboratory setting. Stat Methods Med Res. 2016;25(6):2939–58. doi:10.1177/0962280214534651.

In particular, Bland and Altman [1, 2] proposed 95% limits of agreement to assess differences between measurements made by two methods. The parameters of the Bland-Altman 95% limits of agreement are the 2.5th and 97.5th percentiles of the distribution of the difference between paired measurements. To reflect the uncertainty due to sampling error, approximate interval formulas were provided for estimating the two individual percentiles. The large number of citations shows that Bland-Altman analysis has become the main technique for evaluating agreement between two clinical measurement methods. However, recent work by Carkeet [19] and Carkeet and Goh [20] has discussed in detail the advantages of the exact confidence interval over the approximate procedure of Bland and Altman [1, 2], especially when sample sizes are small. Further discussion and reviews of measurement agreement in method comparison studies are available in Barnhart, Haber and Lin [21], Choudhary and Nagaraja [22] and Lin et al. [23].
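As a rough illustration of the approximate approach, the sketch below attaches a confidence interval to each limit of agreement. It uses the large-sample standard error \( \sqrt{3S^2/N} \) given by Bland and Altman (1986) and a t quantile with N − 1 degrees of freedom. It is not the exact interval discussed by Carkeet. To stay within the standard library, the t quantile is computed by numerical integration and bisection; with SciPy available, `scipy.stats.t.ppf` would be the usual choice. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(q, df):
    """Quantile of Student's t by bisection on t_cdf (stdlib stand-in for SciPy)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def loa_approx_ci(diffs, k=1.96, conf=0.95):
    """Approximate CIs for both limits of agreement.

    Returns {"lower": (limit, ci_low, ci_high), "upper": (...)} using the
    large-sample SE(limit) ~= sqrt(3 * S^2 / N); not the exact interval.
    """
    n = len(diffs)
    m, s = mean(diffs), stdev(diffs)
    se = math.sqrt(3 * s * s / n)
    t = t_ppf(1 - (1 - conf) / 2, n - 1)  # two-sided t critical value
    out = {}
    for name, limit in (("lower", m - k * s), ("upper", m + k * s)):
        out[name] = (limit, limit - t * se, limit + t * se)
    return out
```

For example, `loa_approx_ci(diffs)["upper"]` returns the upper limit together with its approximate 95% confidence interval. The exact alternative replaces this normal-theory approximation with quantiles of a noncentral t distribution.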

Carkeet A. Exact parametric confidence intervals for Bland-Altman limits of agreement. Optom Vis Sci. 2015;92(3):e71–80.

Barnhart HX, Haber MJ, Lin LI. An overview on assessing agreement with continuous measurements. J Biopharm Stat. 2007;17(4):529–69.

In particular, Monte Carlo simulation studies were conducted with 10,000 replications to calculate the simulated coverage probabilities of the exact and approximate confidence intervals for percentiles of the standard normal distribution N(0, 1). Six sample sizes were considered: N = 10, 20, 30, 50, 100 and 200. In addition, eight percentile probabilities were examined: p = 0.025, 0.05, 0.10, 0.20, 0.80, 0.90, 0.95 and 0.975. For each replication, the lower and upper confidence limits \( \{\widehat{\theta}_L, \widehat{\theta}_U\} \), \( \{\widehat{\theta}_{AL}, \widehat{\theta}_{AU}\} \) and \( \{\widehat{\theta}_{BAL}, \widehat{\theta}_{BAU}\} \) were calculated to construct one-sided 95% and 97.5% confidence intervals and the corresponding two-sided 90% and 95% confidence intervals. The simulated coverage probability was the proportion of the 10,000 replications whose confidence interval contained the population normal percentile. The accuracy of the one-sided and two-sided interval procedures is then assessed by the error = simulated coverage probability − nominal coverage probability. The results are summarized in Tables 1, 2, 3 and 4 for the exact and approximate confidence intervals with two-sided confidence coefficients 1 − α = 0.90 and 0.95, respectively.

Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135–60.

The plot is a scatter plot of the differences against the means of the two measurements. Horizontal lines are drawn at the mean difference and at the limits of agreement.
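The coverage experiment described for the percentile confidence intervals can be reproduced on a small scale. The sketch below is a simplified miniature built on stated assumptions, not the authors' code. It covers only the Bland-Altman-style approximate interval (not the exact or Chakraborti-Li intervals), uses 2,000 rather than 10,000 replications, and uses the \( \sqrt{3S^2/N} \) standard error, which is only appropriate for p near 0.025 or 0.975. The error is reported as simulated minus nominal coverage, as in the text. The helper and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_ppf(q, df, steps=2000):
    """Quantile of Student's t: bisection on a Simpson's-rule CDF."""
    def cdf(x):
        h = abs(x) / steps
        total = t_pdf(0.0, df) + t_pdf(abs(x), df)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * t_pdf(i * h, df)
        area = total * h / 3
        return 0.5 + area if x >= 0 else 0.5 - area

    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < q else (lo, mid)
    return (lo + hi) / 2


def simulate_ba_coverage(n, p=0.975, conf=0.95, reps=2000, seed=1):
    """Simulated coverage of the approximate interval for the 100p percentile of N(0, 1)."""
    rng = random.Random(seed)
    z_p = NormalDist().inv_cdf(p)
    theta = z_p  # true 100p percentile of N(0, 1)
    t = t_ppf(1 - (1 - conf) / 2, n - 1)
    below = above = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar, s = mean(xs), stdev(xs)
        est = x_bar + z_p * s
        half = t * math.sqrt(3 * s * s / n)  # approximate SE, assumes |z_p| ~ 1.96
        if theta < est - half:
            below += 1  # interval lies entirely above theta
        elif theta > est + half:
            above += 1  # interval lies entirely below theta
    two_sided = 1 - (below + above) / reps
    return {
        "two_sided": two_sided,
        "lower_bound_one_sided": 1 - below / reps,  # coverage of [lower, inf)
        "upper_bound_one_sided": 1 - above / reps,  # coverage of (-inf, upper]
        "error_two_sided": two_sided - conf,
    }
```

Running it for several values of `n` shows how the two-sided and one-sided coverage behave as N grows. No specific coverage values are claimed here.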

The resulting errors for all three types of confidence intervals show that the exact approach performs extremely well in all 96 cases presented in Tables 1, 2, 3 and 4. For the two approximate methods of Chakraborti and Li [24] and Bland and Altman [2], the coverage probabilities of their two-sided intervals remain quite close to the nominal confidence levels. However, the corresponding approximate one-sided interval procedures do not maintain the same accuracy unless the sample size is large. Because of the different degrees of simplification involved, the interval method of Bland and Altman [2] is inferior to that of Chakraborti and Li [24], especially for small sample sizes.