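Introduction to `ggstatsplot`

ggstatsplot is an extension of the ggplot2 package. It creates polished plots that carry the results of statistical tests, including their details, directly in the figure, which makes it very useful for researchers who run statistical analyses regularly.

Data visualization and statistical modeling are usually treated as two separate phases. The core idea of ggstatsplot is simple: merge the two into a single plot annotated with statistical details, so that data exploration becomes simpler and faster.

On the statistics side, ggstatsplot currently supports the most common types of tests: t-test / ANOVA, nonparametric tests, correlation analysis, contingency table analysis, and regression analysis. On the plotting side, it offers: (1) violin plots (for comparing continuous data between groups); (2) pie charts (for testing the distribution of categorical data); (3) bar charts (for testing the distribution of categorical data); (4) scatterplots (for the correlation between two variables); (5) correlation matrices (for the correlations among several variables); (6) histograms and dot plots/charts (for hypothesis tests about distributions); (7) dot-and-whisker plots (for regression models).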

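Common functions in the `ggstatsplot` package

The ggbetweenstats function

This function creates violin plots, box plots, or a mix of the two, and is mainly used to compare continuous data between groups or conditions. The simplest call looks like this: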

    # loading needed libraries
    library(ggstatsplot)

    # for reproducibility
    set.seed(123)

    # plot
    ggstatsplot::ggbetweenstats(
      data = iris,
      x = Species,
      y = Sepal.Length,
      messages = FALSE
    ) + # further modification outside of ggstatsplot
      ggplot2::coord_cartesian(ylim = c(3, 8)) +
      ggplot2::scale_y_continuous(breaks = seq(3, 8, by = 1))

The plot shows that Sepal.Length differs significantly across the iris species. The function's arguments can also be adjusted to make the plot more informative.

    library(ggplot2)

    # for reproducibility
    set.seed(123)

    # let's leave out one of the factor levels and see if instead of anova, a t-test will be run
    iris2 <- dplyr::filter(.data = iris, Species != "setosa")

    # let's change the levels of our factors, a common routine in data analysis
    # pipeline, to see if this function respects the new factor levels
    iris2$Species <-
      base::factor(
        x = iris2$Species,
        levels = c("virginica", "versicolor")
      )

    # plot
    ggstatsplot::ggbetweenstats(
      data = iris2,
      x = Species,
      y = Sepal.Length,
      notch = TRUE, # show notched box plot
      mean.plotting = TRUE, # whether mean for each group is to be displayed
      mean.ci = TRUE, # whether to display confidence interval for means
      mean.label.size = 2.5, # size of the label for mean
      type = "p", # which type of test is to be run
      k = 3, # number of decimal places for statistical results
      outlier.tagging = TRUE, # whether outliers need to be tagged
      outlier.label = Sepal.Width, # variable to be used for the outlier tag
      outlier.label.color = "darkgreen", # changing the color for the text label
      xlab = "Type of Species", # label for the x-axis variable
      ylab = "Attribute: Sepal Length", # label for the y-axis variable
      title = "Dataset: Iris flower data set", # title text for the plot
      ggtheme = ggthemes::theme_fivethirtyeight(), # choosing a different theme
      ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
      package = "wesanderson", # package from which color palette is to be taken
      palette = "Darjeeling1", # choosing a different color palette
      messages = FALSE
    )
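To get a feel for the tests behind the two subtitles, the comparisons can be approximated in base R. This is a rough sketch rather than the package's internal code. It assumes the parametric option corresponds to Welch's tests (no equal-variance assumption), which may not hold for every ggstatsplot version.

```r
# Three groups: one-way ANOVA without assuming equal variances (Welch)
anova_res <- stats::oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)

# Two groups: once setosa is dropped, the comparison reduces to a t-test
iris2 <- droplevels(subset(iris, Species != "setosa"))
ttest_res <- stats::t.test(Sepal.Length ~ Species, data = iris2, var.equal = FALSE)

print(anova_res)
print(ttest_res)
```

The switch from an omnibus test to a two-sample test follows only from the number of levels left in the grouping factor.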

The ggwithinstats function

ggwithinstats works almost identically to ggbetweenstats, but is intended for within-subjects (repeated measures) designs.

    # for reproducibility and data
    set.seed(123)
    data("iris")

    ggstatsplot::ggwithinstats(
      data = iris,
      x = Species,
      y = Sepal.Length,
      messages = FALSE
    )

    # plot
    ggstatsplot::ggwithinstats(
      data = iris,
      x = Species,
      y = Sepal.Length,
      sort = "descending", # ordering groups along the x-axis based on
      sort.fun = median, # values of `y` variable
      pairwise.comparisons = TRUE,
      pairwise.display = "s",
      pairwise.annotation = "p",
      title = "iris",
      caption = "Data from: iris",
      ggtheme = ggthemes::theme_fivethirtyeight(),
      ggstatsplot.layer = FALSE,
      messages = FALSE
    )
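What separates a within-subjects analysis from a between-subjects one is that the same subjects are measured under every condition. The base R sketch below illustrates that difference with the built-in `sleep` dataset, chosen here only because it is a genuine repeated measures design. It does not reproduce what ggwithinstats computes internally.

```r
# `sleep` (built into R): extra hours of sleep for the same 10 subjects under two drugs
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Within-subjects analysis: each subject serves as their own control
paired_res <- stats::t.test(drug1, drug2, paired = TRUE)

# Between-subjects analysis of the same numbers ignores the pairing
unpaired_res <- stats::t.test(drug1, drug2, paired = FALSE)

c(paired = paired_res$p.value, unpaired = unpaired_res$p.value)
```

The paired test removes subject-to-subject variability, so here it yields a smaller p-value than the unpaired test on identical numbers.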

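The ggscatterstats function

This function creates a scatterplot with marginal histograms, boxplots, density plots, violin plots, or densigrams from ggExtra::ggMarginal, and displays the results of the statistical analysis in the subtitle: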

    ggstatsplot::ggscatterstats(
      data = ggplot2::msleep,
      x = sleep_rem,
      y = awake,
      xlab = "REM sleep (in hours)",
      ylab = "Amount of time spent awake (in hours)",
      title = "Understanding mammalian sleep",
      messages = FALSE
    )

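The plot shows that sleep_rem and awake are correlated, with sleep_rem on the x-axis and awake on the y-axis. The histograms along the top and right edges show the distribution of each variable: the more observations fall into a bin, the taller its bar.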
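The correlation reported in the subtitle can be checked with a plain correlation test. The sketch below uses base R's `cor.test` with Pearson's method, on the assumption that this matches the default parametric analysis. The exact statistics ggscatterstats prints may differ by version.

```r
# Pearson correlation between REM sleep and time awake (rows with NA are dropped)
msleep <- ggplot2::msleep
cor_res <- stats::cor.test(msleep$sleep_rem, msleep$awake, method = "pearson")

cor_res$estimate  # correlation coefficient r
cor_res$p.value   # p-value for H0: r = 0
cor_res$conf.int  # 95% confidence interval for r
```

The coefficient is negative: mammals that spend more time in REM sleep spend less time awake.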

    # for reproducibility
    set.seed(123)

    # plot
    ggstatsplot::ggscatterstats(
      data = dplyr::filter(.data = ggstatsplot::movies_long, genre == "Action"),
      x = budget,
      y = rating,
      type = "robust", # type of test that needs to be run
      conf.level = 0.99, # confidence level
      xlab = "Movie budget (in million/ US$)", # label for x axis
      ylab = "IMDB rating", # label for y axis
      label.var = "title", # variable for labeling data points
      label.expression = "rating < 5 & budget > 100", # expression that decides which points to label
      line.color = "yellow", # changing regression line color line
      title = "Movie budget and IMDB rating (action)", # title text for the plot
      caption = expression( # caption text for the plot
        paste(italic("Note"), ": IMDB stands for Internet Movie DataBase")
      ),
      ggtheme = ggplot2::theme_bw(), # choosing a different theme
      ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
      marginal.type = "density", # type of marginal distribution to be displayed
      xfill = "#0072B2", # color fill for x-axis marginal distribution
      yfill = "#009E73", # color fill for y-axis marginal distribution
      xalpha = 0.6, # transparency for x-axis marginal distribution
      yalpha = 0.6, # transparency for y-axis marginal distribution
      centrality.para = "median", # central tendency lines to be displayed
      messages = FALSE # turn off messages and notes
    )

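ggbarstats bar charts

ggbarstats is mainly used to show how a categorical variable is distributed across groups, for example whether the male-to-female ratio among patients in group A differs from that among patients in group B.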

    # for reproducibility
    set.seed(123)

    # plot
    ggstatsplot::ggbarstats(
      data = ggstatsplot::movies_long,
      main = mpaa,
      condition = genre,
      sampling.plan = "jointMulti",
      title = "MPAA Ratings by Genre",
      xlab = "movie genre",
      perc.k = 1,
      x.axis.orientation = "slant",
      ggtheme = hrbrthemes::theme_modern_rc(),
      ggstatsplot.layer = FALSE,
      ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text(face = "italic")),
      palette = "Set2",
      messages = FALSE
    )

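This plot compares whether the distribution of a categorical variable differs between groups. As before, the arguments can be adjusted to make the plot more elaborate and visually appealing.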
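The patient example (sex ratio in group A versus group B) comes down to a test of independence on a contingency table. Below is a minimal base R sketch with made-up counts. The numbers are hypothetical and serve only to show the mechanics, and Pearson's chi-squared test is assumed as the analysis.

```r
# Hypothetical counts, invented for illustration only
counts <- matrix(
  c(30, 20,   # group A: male, female
    15, 35),  # group B: male, female
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), sex = c("male", "female"))
)

# Row-wise proportions: the quantities a stacked bar chart displays
prop.table(counts, margin = 1)

# Pearson's chi-squared test of independence between group and sex
chisq_res <- stats::chisq.test(counts)
chisq_res$p.value
```

A small p-value indicates that the sex ratio differs between the two groups.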

gghistostats

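If you want to look at the distribution of a single variable and check, with a one-sample test, whether it differs significantly from a specified value, this function lets you do that.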

    ggstatsplot::gghistostats(
      data = ToothGrowth, # dataframe from which variable is to be taken
      x = len, # numeric variable whose distribution is of interest
      title = "Distribution of tooth length", # title for the plot
      fill.gradient = TRUE, # use color gradient
      test.value = 10, # the comparison value for t-test
      test.value.line = TRUE, # display a vertical line at test value
      type = "bf", # bayes factor for one sample t-test
      bf.prior = 0.8, # prior width for calculating the bayes factor
      messages = FALSE # turn off the messages
    )
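The call above requests a Bayes factor (`type = "bf"`). For comparison, here is the frequentist counterpart of the same question, a one-sample t-test against the value 10, sketched in base R. It illustrates the hypothesis being tested and does not reproduce the function's output.

```r
# One-sample t-test: is the mean tooth length different from 10?
tt <- stats::t.test(ToothGrowth$len, mu = 10)

tt$estimate  # sample mean of len
tt$statistic # t statistic
tt$p.value   # two-sided p-value
```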

ggdotplotstats

This function is similar to gghistostats, but is meant for cases where each value of the numeric variable also has a label.

    # for reproducibility
    set.seed(123)

    # plot
    ggstatsplot::ggdotplotstats(
      data = dplyr::filter(.data = gapminder::gapminder, continent == "Asia"),
      y = country,
      x = lifeExp,
      test.value = 55,
      test.value.line = TRUE,
      test.line.labeller = TRUE,
      test.value.color = "red",
      centrality.para = "median",
      centrality.k = 0,
      title = "Distribution of life expectancy in Asian continent",
      xlab = "Life expectancy",
      messages = FALSE,
      caption = substitute(
        paste(
          italic("Source"),
          ": Gapminder dataset from https://www.gapminder.org/"
        )
      )
    )

ggcorrmat

ggcorrmat is mainly used to analyze the correlations among several variables.

    # for reproducibility
    set.seed(123)

    # as a default this function outputs a correlogram plot
    ggstatsplot::ggcorrmat(
      data = ggplot2::msleep,
      corr.method = "robust", # correlation method
      sig.level = 0.001, # threshold of significance
      p.adjust.method = "holm", # p-value adjustment method for multiple comparisons
      cor.vars = c(sleep_rem, awake:bodywt), # a range of variables can be selected
      cor.vars.names = c(
        "REM sleep", # variable names
        "time awake",
        "brain weight",
        "body weight"
      ),
      matrix.type = "upper", # type of visualization matrix
      colors = c("#B2182B", "white", "#4D4D4D"),
      title = "Correlogram for mammals sleep dataset",
      subtitle = "sleep units: hours; weight units: kilograms"
    )
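The two ingredients of such a plot, a correlation matrix and p-values adjusted for multiple comparisons, can be sketched in base R. Note the substitution: the call above uses a robust correlation, whereas this sketch uses ordinary Pearson correlations, so the numbers will not match the plot.

```r
# Same four variables as in the call above
vars <- c("sleep_rem", "awake", "brainwt", "bodywt")
dat <- as.data.frame(ggplot2::msleep)[, vars]

# Pearson correlation matrix using pairwise complete observations
cor_mat <- stats::cor(dat, use = "pairwise.complete.obs")

# Raw p-value for each unique pair of variables
pairs <- utils::combn(vars, 2)
p_raw <- apply(pairs, 2, function(v) stats::cor.test(dat[[v[1]]], dat[[v[2]]])$p.value)

# Holm adjustment for the six comparisons
p_holm <- stats::p.adjust(p_raw, method = "holm")
data.frame(var1 = pairs[1, ], var2 = pairs[2, ], p_raw, p_holm)
```

Holm-adjusted p-values are never smaller than the raw ones, which is why fewer cells reach a strict threshold such as 0.001 after adjustment.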

ggcoefstats

ggcoefstats plots the point estimates of regression coefficients as dots with confidence intervals (a dot-and-whisker plot).

    # for reproducibility
    set.seed(123)

    # model
    mod <- stats::lm(
      formula = mpg ~ am * cyl,
      data = mtcars
    )

    # plot
    ggstatsplot::ggcoefstats(x = mod)
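The quantities drawn in a dot-and-whisker plot can be extracted from the fitted model directly. The base R sketch below assumes 95% confidence intervals. The level ggcoefstats uses by default is not stated here.

```r
# Same model as above
mod <- stats::lm(mpg ~ am * cyl, data = mtcars)

# Point estimates, standard errors, t statistics, and p-values
est <- summary(mod)$coefficients

# 95% confidence intervals: the "whiskers" around each dot
ci <- stats::confint(mod, level = 0.95)

cbind(estimate = est[, "Estimate"], ci)
```

Each row corresponds to one term of the model (intercept, am, cyl, and the am:cyl interaction). An interval that excludes zero matches a coefficient that is significant at the 5% level.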

Plotting with other packages while showing statistical results from ggstatsplot

    # for reproducibility
    set.seed(123)

    # loading the needed libraries
    library(yarrr)
    library(ggstatsplot)

    # using `ggstatsplot` to get call with statistical results
    stats_results <-
      ggstatsplot::ggbetweenstats(
        data = ChickWeight,
        x = Time,
        y = weight,
        return = "subtitle",
        messages = FALSE
      )

    # using `yarrr` to create plot
    yarrr::pirateplot(
      formula = weight ~ Time,
      data = ChickWeight,
      theme = 1,
      main = stats_results
    )

As shown, the plot is drawn with the yarrr package, while its title uses the stats_results obtained from ggstatsplot.
