文章来源:https://www.r-bloggers.com/making-youden-plots-in-r/

Making Youden Plots in R

October 15, 2015

By dtholmes@mail.ubc.ca

(This article was first published on The Lab-R-torian, and kindly contributed to R-bloggers)
118

SHARES

ShareTweet

Background

I was honoured by a site visit by Drs. Yeo-Min Yun and Junghan Song of the Korean Society for Clinical Chemistry a few weeks ago. As both professors are on the organizing committee of the Cherry Blossom Symposium for Lab Automation in Seoul in Spring 2016, their primary motivation for visiting was to discuss mass spectrometry sample prep automation but later we got on to the topic of automated reporting for quality assurance schemes.

Naturally, I was promoting R, R-Markdown and knitr as a good pipeline for automated Quality Assurance (QA) report.

This brought to mind Youden Plots which are used by DGKL in their reports. I like DGKL reports for three reasons:

  1. They are accuracy based against GC-MS when it comes to steroids.

  2. I can see all the other LC-MS/MS methods immediately.

  3. Youden plots look like a target and can be assessed rapidly.

The data for a Youden plot is generated by providing a number of laboratories aliquots from two separate unknown samples, which we will call A and B. Every lab analyzes both samples and a scatter plot of the A and B results are generated–the A results on the (x)–axis and the B results on the (y)–axis. Once this is completed, limits of acceptability are plotted and outliers can be identified.

In Youden's original formulation of the plot (see page 133-1 this online document) he required that the concentrations of the A and B samples be close to one another. As you might guess, in clinical medicine, this is not all that useful because we often want to test more than one part of the analytical range in an external quality assurance (EQA) scheme. One workaround for this is the make a Youden plot of the standard normal variates for the A and B samples, that is to plot (z_b = frac{b_i-bar{b}} {sigma_b}) vs (z_a = frac{a_i-bar{a}} {sigma_a}), where (a_i) and (b_i) are the individual values of the A and B samples from the (i) labs. This has the disadvantage of representing the results in a manner that is not easily assessed from a clinical perspective.

While there are published approaches to coping with this problem, these are out of scope here but I will show you a couple of other ways I have seen Youden plots represented. If you want to see R code to generate the classic Youden Plot, it can be found in thisthis stackoverflow post and below.

Random Data

Let's start by generating some data. For the sake of argument, let's say we are looking at testosterone results in males and measured in nmol/L. Suppose that the A sample has a true concentration of 5.3 nmol/L and the B sample has a true concentration of 16.2 nmol/L. Let's also assume that they are all performed by the same analytical method. If you have looked at EQA reports, you will know that a scatter plot of results for the A and B samples does not typically look like this.

plot of chunk unnamed-chunk-1

The (mock) data abaove are bivariate Gaussian and uncorrelated. In reality we often see something that looks a little more like this:

plot of chunk unnamed-chunk-2

That is, the A and B values are usually correlated.

Rectangular Youden Plots

The most common manner in which you will see a Youden plot prepared is just a box with mean (pm) 2SD and (pm) 3SD limits.

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")
grid()A.mu <- mean(A)
A.sd <- sd(A)
B.mu <- mean(B)
B.sd <- sd(B)#draw a box around the 2SD limit
rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")
#draw a box around the 3SD limit
rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")#draw a diagonal line - which is unnecessary but you will see people do it
lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))
#draw horizontal line
lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))
#draw vertical line
lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))#add a legend
legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-3

Non Parametric

Obviously, if you prefer, you could prepare this Youden plot in a non-parametric fashion by plotting the medians and the non-parametrically calculated 1st, 2.5th, 97.5th, and 99th percentiles. In this case, the code would be:

plot(A,B, xlim = c(0,10), ylim = c(0,30), pch=19, col="#00000080")
grid()A.med <- median(A)
A.quants <- quantile(A, probs=c(0.01,0.025,0.975,0.99))
B.med <- median(B)
B.quants <- quantile(B, probs=c(0.01,0.025,0.975,0.99))#draw a box around the central 95% limit
rect(xleft = A.quants[2], ybottom = B.quants[2], xright = A.quants[3], ytop = B.quants[3], lwd = 2, border = "orange")
#draw a box around the central 99% limit
rect(xleft = A.quants[1], ybottom = B.quants[1], xright = A.quants[4], ytop = B.quants[4], lwd = 2, border = "red")#draw a vertical line
lines(c(A.quants[1],A.quants[4]),c(B.med,B.med))
#draw vertical line
lines(c(A.med,A.med),c(B.quants[1],B.quants[4]))legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-4

Note that if you increase the number of points, this non-parametric plot will not quite converge to look like the parametric one shown above because (mu pm 2sigma) actually encompases 95.45% of the data in a univariate normal distribution, and (mu pm 3sigma) actually encompases 99.73% of the data.

FYI

Be ye not deceived. Even with the non-parametric approach or a parametric approach with the correct (z)-scores, the orange (“Central 95%” or “2SD”) boxes shown above do not house 95% of the data and the red (“Central 99%” or “3SD”) boxes do not house 99% of the data. You can see this pretty easily if you consider the case of uncorrelated data. Let's take 100000 random pairs of uncorrelated Gaussian data with (mu = 10) and (sigma = 1).

Five percent of data points are excluded by the vertical orange lines shown below at the (mu_A pm 1.96 sigma_A) and 5% of data are excluded by the horizontal orange lines positioned at (mu_B pm 1.96 sigma_B).

Points will fall into the one of the 4 areas shaded yellow 0.95 x 0.025 or 2.375 % of the time and points will fall into one of the 4 areas shaded purple 0.025 x 0.025 or 0.0625 % of the time. This means that the “2SD” box actually encloses 100 – 4×2.375 – 4×0.0625 % = 90.25 % of the data.

plot of chunk unnamed-chunk-5

Much more directly, the probability of both A and B falling inside the center square is 0.95×0.95 = 0.9025 = 90.25%.

You can do a random simulation to prove this to yourself:

#generate the data, n=10000
A <- rnorm(10^5,10,1)
B <- rnorm(10^5,10,1)
# convert results to z-scores
scale.df <- data.frame(scA = scale(A), scB = scale(B))
#calculate how many samples are inside the centre box
nrow(subset(scale.df, abs(scA)<1.96 & abs(scB)<1.96))
## [1] 90245

This is pretty darn close to the 90.25% we were expecting.

Elliptical Youden Plots

The rectangular plot shown works (with caveats described) but there is something slightly undesirable about it because a point could be off in the corner, far away from the other data, but still inside the 3SD box. It seems much preferable to encircle the data with an ellipse. Fortunately, there is a built in function to achieve this in the car package which makes the code is very simple. The other nice thing is that the ellipses are actually calculated to house 95% and 99% of the data respectively.

library(car)
dataEllipse(A, B, levels=c(0.95, 0.99), fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col=c("#00000080", "red"), center.pch = FALSE)

plot of chunk unnamed-chunk-8

To generate a two–colour equivalent to what we have above we draw the Youden plot in two stages.

dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE)dataEllipse(A, B, levels = 0.99, fill = FALSE, col = c(NA, "red"), center.pch = FALSE, plot.points = FALSE)legend("topleft", c("Central 95%","Central 99%"), col = c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-9

Now we might want to add the horizontal, vertical lines.

#store the points of the outer ellipse in a matrix
outer.ellipse <- dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), draw = TRUE, center.pch = FALSE)
#plot the innter ellipse
dataEllipse(A, B, levels = 0.95, fill = FALSE, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)#find the value of x that is closest to the mean value of A
vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))
#calculate the vertical distance to centre
vert.dist <- outer.ellipse[,2][vert.index] - B.mu
#draw the line one end of ellipse to the other
lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))#find the value of y that is closest to the mean valye of B
horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))
#calculate the horizonal distance to centre
horiz.dist <- outer.ellipse[,1][horiz.index] - A.mu
#draw the line one end of ellipse to the other
lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))
#add a legend
legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-10

And if you wanted a little more groove you can add shading.

dataEllipse(A, B, levels = 0.99, xlim = c(0,10), ylim = c(0,30), pch = 19, col = c("#00000080", "red"), center.pch = FALSE, plot.points = TRUE, fill = TRUE, fill.alpha = 0.1)
dataEllipse(A, B, levels=0.95 ,xlim = c(0,10), ylim = c(0,30), pch = 19, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE, fill = TRUE, fill.alpha = 0.3)
lines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist), col = "blue")
lines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu), col = "blue")
legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))

plot of chunk unnamed-chunk-11

Build a Function

We can also fold all of this into a function:

youden <- function(A,B,shape){A.mu <- mean(A)A.sd <- sd(A)B.mu <- mean(B)B.sd <- sd(B)plot(A,B, pch=19, col = "#00000080", xlim = c(A.mu - 5*A.sd,A.mu + 5*A.sd), ylim = c(B.mu - 5*B.sd,B.mu + 5*B.sd))grid()if (missing(shape)){shape <- "ellipse"}if (shape == "rectangle"){rect(xleft = A.mu - 2*A.sd, ybottom = B.mu - 2*B.sd, xright = A.mu + 2*A.sd, ytop = B.mu + 2*B.sd, lwd = 2, border = "orange")rect(xleft = A.mu - 3*A.sd, ybottom = B.mu - 3*B.sd, xright = A.mu + 3*A.sd, ytop = B.mu + 3*B.sd, lwd = 2, border = "red")lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd),c(B.mu - 3*B.sd, B.mu + 3*B.sd))lines(c(A.mu - 3*A.sd, A.mu + 3*A.sd), c(B.mu, B.mu))lines(c(A.mu, A.mu), c(B.mu - 3*B.sd, B.mu + 3*B.sd))legend("topleft", c("2SD limit","3SD limit"), col=c("orange","red"), lty=c(1,1))} else if (shape=="ellipse") {outer.ellipse <- dataEllipse(A, B, levels = 0.99, pch = 19, col=c("#00000080", "red"), draw = TRUE, center.pch = FALSE, plot.points = FALSE)dataEllipse(A, B, levels = 0.95, fill = FALSE, pch = 19, col=c("#00000080", "orange"), center.pch = FALSE, plot.points = FALSE, add = TRUE)vert.index <- which.min(abs(A.mu-outer.ellipse[,1]))vert.dist <- outer.ellipse[,2][vert.index] - B.mulines(c(A.mu, A.mu), c(outer.ellipse[,2][vert.index], outer.ellipse[,2][vert.index] - 2*vert.dist))horiz.index <- which.min(abs(B.mu-outer.ellipse[,2]))horiz.dist <- outer.ellipse[,1][horiz.index] - A.mulines(c(outer.ellipse[,1][horiz.index], outer.ellipse[,1][horiz.index] - 2*horiz.dist), c(B.mu,B.mu))legend("topleft", c("Central 95%","Central 99%"), col=c("orange","red"), lty=c(1,1))}
}#use the function
par(mfrow=c(1,2))
youden(A,B, shape = "rectangle")
youden(A,B, shape = "ellipse")

plot of chunk unnamed-chunk-12

Comparison with the Classic Youden Plot

If the data happens to be uncorrelated, and has identical average A and B values, the elliptical approach will generate a (nearly) circular Youden plot according to the original description. The youden.classic() function code shown below is borrowed from the stackoverflow post mentioned above.

youden.classic <- function(A, B){plot(A,B,asp = 1, xlab = "A", ylab = "B", pch=".") mB <- median(B)mA <- median(A)abline(h = mB, v = mA)curve(x-(mA-mB),add=TRUE)d <- mean(A-B)d_prime <- A-B-dr <- 2.45*mean(abs(d_prime))*sqrt(pi)/2t <- seq(0,2*pi,by=0.01)x <- r*cos(t)+mAy <- r*sin(t)+mBlines(x,y)
}#generate 10000 pairs of bivariate data with a mean of 10 for both A and B and 15% CV
A <- rnorm(10000,10,1.5)
B <- rnorm(10000,10,1.5)
youden.classic(A,B)#overlay ellipse
dataEllipse(A, B, levels = 0.95, fill = TRUE, fill.alpha = 0.3, col="orange", center.pch = FALSE, plot.points = FALSE, add = TRUE)

plot of chunk unnamed-chunk-13

Conclusion

There you have it. With a Youden Plot it is easy to separate the sheep from the goats. There are lots of ways that you can dress up your plot to suit your needs. Of course, this could be embedded into an automated EQA report generated with R, Rmarkdown and knitr.

I hope that was helpful.

-Dan

Making Youden Plots in R相关推荐

  1. R可视化ggplot2绘制重叠密度图(Overlay Density Plots)

    R可视化ggplot2绘制重叠密度图(Overlay Density Plots) 目录 R可视化ggplot2绘制重叠密度图(Overlay Density Plots) 创建仿真数据 数据格式变换 ...

  2. unity3d 可视化编程_R编程系列:R中的3D可视化

    unity3d 可视化编程 In the last blog, we have learned how to create "Dynamic Maps Using ggplot2" ...

  3. 在R中创建晶须和盒图

    Box plots in R are a good way to measure and visualize how closely your data is distributed. These a ...

  4. 如何创建多个条形图_在R中创建条形图

    如何创建多个条形图 Bar plots in R are the most frequently used plots in elementary statistics. These consist ...

  5. R语言绘制棒棒糖图(火柴杆图)

    本博客介绍几种利用R语言绘制棒棒糖图(火柴杆图)的方法. 2. 使用原生ggplot方法 最容易也是最简单想到的方法是直接使用ggplot2包进行更新,这里需要使用ggplot本身的特性,通过图层叠加 ...

  6. 如何在R中画出高效美观的相关性分析图

    转载自:http://site.douban.com/182577/widget/notes/12866356/note/267793990/ 没有前情的前情提要: 承蒙船长大人提携 有机会在小站写些 ...

  7. r语言时间序列图_R中的时间序列图

    r语言时间序列图 In this tutorial, we'll be going over how to create time series plots in R. Time series dat ...

  8. r语言ggplot合并图形_R中带有ggplot2的图形

    r语言ggplot合并图形 介绍 (Introduction) R is known to be a really powerful programming language when it come ...

  9. python教学视频r_R Tutorial

    教程列表: Customizing The Look of R Studio (R Tutorial 1.11) Correlation and Covariance in R (R Tutorial ...

最新文章

  1. Linux之软件安装 apt-get
  2. /* * 编程第三题(20分) 打印所有的水仙花数。所谓水仙花数是指一个三位数,其各位数字的立方和等于该数本身。(例153=1*1*1+3*3*3+5*5*5) */
  3. Spring的环绕通知
  4. ba+ii+plus模拟+android,财务计算器(BAII PLUS)
  5. python如何判断一段代码运行是否超出一定时间,如果超出则抛出异常?(检测函数运行是否超时,规定时间内执行,限制时间)eventlet模块 (eventlet.timeout.Timeout)
  6. CodeForces 615C
  7. http代理的脚本http_proxy.py
  8. 前端:CSS/10/伪类选择器,CSS列表属性,CSS边框属性,CSS内边距属性,CSS背景属性
  9. LibreOffice/Calc:带条件判断的求和
  10. angular之service、factory预provider区别
  11. python基础-软件目录开发规范
  12. 进销存管理系统基本功能
  13. apr内存池简单应用
  14. k8s 部署 xxl-job-admin:2.3.0
  15. IO与文件读写---使用Apache commons io包提高读写效率
  16. 卡尔曼滤波算法及C语言实现_源代码
  17. Git学习笔记之时光穿梭机
  18. java web开发(和vue联合开发)
  19. C语言解析wav文件格式
  20. 自学前端简历怎么写?项目怎么学?

热门文章

  1. NLP自然语言处理-英文文本电影影评分类2-pytorch版本
  2. TouchDesigner学习 -TOPs
  3. 调试ASP程序时,遇到程序运行错误时怎么查看具体错误位置呢?
  4. 五招破解跨部门协作难|跨部门协作实践与总结
  5. P2P通信中的NAT/FW穿越
  6. 【电源】之【常用稳压IC大全】
  7. matlab车牌识别错误,求解决Matlab车牌识别
  8. ThinkPad各型号Win7系统恢复光盘镜像下载【官方下载】绝对原版
  9. 802.11 帧封装细节
  10. 计算机专业发展1500字,【关于计算机起源及发展的论文1500字左右,论文形式.】...