Package 'microViz' reference manual

Title:	Microbiome Data Analysis and Visualization
Description:	Microbiome data visualization and statistics tools built upon phyloseq.
Authors:	David Barnett [aut, cre]
Maintainer:	David Barnett <[email protected]>
License:	GPL (>= 3)
Version:	0.12.6
Built:	2025-03-16 05:53:09 UTC
Source:	https://github.com/david-barnett/microViz

Add paths connecting points on a ggplot scatterplot

Description

Useful for tracing a few select individuals over time on an ordination plot. Samples in phyloseq must be arranged in order of timepoint for the path connections to be drawn in the correct order! You can arrange the samples in timepoint order with ps_arrange.

Usage

add_paths(
  ggplot,
  id_var,
  id_values,
  mapping = NULL,
  arrow = grid::arrow(length = grid::unit(2, units = "mm")),
  ...
)
add_paths(
  ggplot,
  id_var,
  id_values,
  mapping = NULL,
  arrow = grid::arrow(length = grid::unit(2, units = "mm")),
  ...
)

Arguments

`ggplot`	ggplot scatterplot such as the output of ord_plot
`id_var`	name of variable used to identify grouping of points
`id_values`	values of id_var variable used to identify groups of points to draw
`mapping`	ggplot aesthetics created by aes(), e.g. aes(colour = ?) - group is already set to id_var internally!
`arrow`	arrowhead to add to path, NULL for none
`...`	additional arguments passed to geom_path

Value

ggplot with added layer with geom_path

Examples

library(ggplot2)
data("dietswap", package = "microbiome")

# arrange by timepoint first (or whatever your own time variable is)
dietswap %>%
  ps_arrange(timepoint) %>%
  tax_fix() %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(colour = "timepoint", alpha = 0.5, size = 2) %>%
  add_paths(
    id_var = "subject", id_values = c("azl", "byn"),
    mapping = aes(colour = timepoint), linewidth = 1.5
    # size = 1.5 # size instead of linewidth in older ggplot2 versions
  )

# paths do NOT connect points in the correct order without arranging first
dietswap %>%
  tax_fix() %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(colour = "timepoint", alpha = 0.5) %>%
  add_paths(
    id_var = "subject", id_values = c("azl", "byn"),
    mapping = aes(colour = timepoint), linewidth = 1.5
    # size = 1.5 # size instead of linewidth in older ggplot2 versions
  ) +
  ggtitle("WRONG PATH ORDER", "use ps_arrange first!")
library(ggplot2)
data("dietswap", package = "microbiome")

# arrange by timepoint first (or whatever your own time variable is)
dietswap %>%
  ps_arrange(timepoint) %>%
  tax_fix() %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(colour = "timepoint", alpha = 0.5, size = 2) %>%
  add_paths(
    id_var = "subject", id_values = c("azl", "byn"),
    mapping = aes(colour = timepoint), linewidth = 1.5
    # size = 1.5 # size instead of linewidth in older ggplot2 versions
  )

# paths do NOT connect points in the correct order without arranging first
dietswap %>%
  tax_fix() %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(colour = "timepoint", alpha = 0.5) %>%
  add_paths(
    id_var = "subject", id_values = c("azl", "byn"),
    mapping = aes(colour = timepoint), linewidth = 1.5
    # size = 1.5 # size instead of linewidth in older ggplot2 versions
  ) +
  ggtitle("WRONG PATH ORDER", "use ps_arrange first!")

Simple heatmap helper to get a default adjacent side for another annotation

Description

Simple heatmap helper to get a default adjacent side for another annotation

Usage

adjacent_side(side = c("top", "right", "bottom", "left"))
adjacent_side(side = c("top", "right", "bottom", "left"))

Arguments

side

one of "top", "right", "bottom", or "left"

Value

character

Examples

adjacent_side("top")
adjacent_side("top")

Create colored rectangle annotations for categorical data

Description

Similar to anno_simple but with individual boxes!

Usage

anno_cat(
  x,
  which,
  renamer = identity,
  col = distinct_palette(),
  width = NULL,
  height = NULL,
  box_col = "white",
  box_lwd = 0.5,
  border_col = NA,
  border_lwd = 1,
  legend = TRUE,
  legend_title = ""
)
anno_cat(
  x,
  which,
  renamer = identity,
  col = distinct_palette(),
  width = NULL,
  height = NULL,
  box_col = "white",
  box_lwd = 0.5,
  border_col = NA,
  border_lwd = 1,
  legend = TRUE,
  legend_title = ""
)

Arguments

`x`	data vector, treated as categorical
`which`	Whether it is a column annotation or a row annotation?
`renamer`	function renaming variable values for legend
`col`	colors vector, at least as long as unique(x), optionally named by x levels
`width`	grid unit object or NULL
`height`	grid unit object or NULL
`box_col`	colour of boxes around individual cells
`box_lwd`	line width of boxes around individual cells
`border_col`	colour of border around all cells
`border_lwd`	line width of border around all cells
`legend`	generate legend for this annotation (attached as attribute of heatmap, and not automatically included in plot)
`legend_title`	title for legend, if drawn

Value

AnnotationFunction

Examples

library(ComplexHeatmap)
# draw the annotation without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)

grid::grid.newpage()
pushViewport(vp)
cats <- letters[1:4]
draw(anno_cat(cats, which = "row"))

grid::grid.newpage()
pushViewport(vp)
draw(
  anno_cat(
    x = cats, col = structure(names = cats, 1:4), which = "column",
    box_col = "black", box_lwd = 5
  )
)

# developer note #
# list of annotations can be split and ordered (adding NULL makes a list)
# https://jokergoo.github.io/ComplexHeatmap-reference/book/a-list-of-heatmaps.html
# (section #4.8 concatenate-only-the-annotations)
grid::grid.newpage()
pushViewport(vp)
annoList <- rowAnnotation(
  hi = anno_cat(cats, which = "row", border_col = "black")
) +
  NULL
draw(object = annoList, row_split = c(1, 1:3), row_order = 4:1)
pushViewport(viewport(x = 0.6))
draw(anno_cat(cats, "row", legend_title = "abcd") %>% attr("Legend"))
library(ComplexHeatmap)
# draw the annotation without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)

grid::grid.newpage()
pushViewport(vp)
cats <- letters[1:4]
draw(anno_cat(cats, which = "row"))

grid::grid.newpage()
pushViewport(vp)
draw(
  anno_cat(
    x = cats, col = structure(names = cats, 1:4), which = "column",
    box_col = "black", box_lwd = 5
  )
)

# developer note #
# list of annotations can be split and ordered (adding NULL makes a list)
# https://jokergoo.github.io/ComplexHeatmap-reference/book/a-list-of-heatmaps.html
# (section #4.8 concatenate-only-the-annotations)
grid::grid.newpage()
pushViewport(vp)
annoList <- rowAnnotation(
  hi = anno_cat(cats, which = "row", border_col = "black")
) +
  NULL
draw(object = annoList, row_split = c(1, 1:3), row_order = 4:1)
pushViewport(viewport(x = 0.6))
draw(anno_cat(cats, "row", legend_title = "abcd") %>% attr("Legend"))

Convenience function for generating a legend for anno_cat annotations.

Description

Convenience function for generating a legend for anno_cat annotations.

Usage

anno_cat_legend(col, x = NULL, renamer = identity, title = "", ...)
anno_cat_legend(col, x = NULL, renamer = identity, title = "", ...)

Arguments

`col`	vector of colors, named by all levels of data (e.g. x) or not named
`x`	optional: vector of data to pair with unnamed col or check against named col
`renamer`	function applied to generate labels: from names(col) or levels of x
`title`	title of legend
`...`	Arguments passed on to `ComplexHeatmap::Legend` `labels` Labels corresponding to `at`. If it is not specified, the values of `at` are taken as labels. `nrow` For legend which is represented as grids, `nrow` controls number of rows of the grids if the grids are arranged into multiple rows. `ncol` Similar as `nrow`, `ncol` controls number of columns of the grids if the grids are arranged into multiple columns. Note at a same time only one of `nrow` and `ncol` can be specified. `by_row` Are the legend grids arranged by rows or by columns? `grid_height` The height of legend grid. It can also control the height of the continuous legend if it is horizontal. `grid_width` The width of legend grid. It can also control the width of the continuous legend if it is vertical. `gap` If legend grids are put into multiple rows or columns, this controls the gap between neighbouring rows or columns, measured as a `unit` object. `labels_gp` Graphic parameters for labels. `labels_rot` Text rotation for labels. It should only be used for horizontal continuous legend. `border` Color of legend grid borders. It also works for the ticks in the continuous legend. `type` Type of legends. The value can be one of `grid`, `points`, `lines` and `boxplot`. `direction` Direction of the legend, vertical or horizontal? `title_position` Position of title relative to the legend. `topleft`, `topcenter`, `leftcenter-rot` and `lefttop-rot` are only for vertical legend and `leftcenter`, `lefttop` are only for horizontal legend. `title_gap` Gap between title and the legend body.

Value

a ComplexHeatmap Legend class object

Examples

grid::grid.newpage()
ComplexHeatmap::draw(
  anno_cat_legend(
    col = c("ibd" = "blue", "nonibd" = "grey90"),
    renamer = toupper, title = "Hi there, I'm a title"
  )
)
grid::grid.newpage()
ComplexHeatmap::draw(
  anno_cat_legend(
    col = c("ibd" = "blue", "nonibd" = "grey90"),
    renamer = toupper, title = "Hi there, I'm a title"
  )
)

Helper to specify simple comp_heatmap annotation for other sample data

Description

Use this as an argument to sampleAnnotation(), which itself is used by comp_heatmap() as sample_anno argument.

This creates a vector, which sampleAnnotation() interprets as a simple annotation, so then you set colours and legend parameters for each simple annotation as further arguments in sampleAnnotation.

Usage

anno_sample(var, fun = identity, data = NULL, samples = NULL)
anno_sample(var, fun = identity, data = NULL, samples = NULL)

Arguments

`var`	name of variable to use for annotation data
`fun`	function to transform variable `var`
`data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`samples`	OPTIONAL selection vector of sample names, only set this if providing data argument to override default

Value

vector of values

Examples

# see `?sampleAnnotation()`
# see `?sampleAnnotation()`

Helper to specify comp_heatmap annotation for categorical sample data

Description

Use this as an argument to sampleAnnotation(), which itself is used by comp_heatmap() as sample_anno argument.

Usage

anno_sample_cat(
  var,
  col = distinct_palette(),
  renamer = identity,
  size = grid::unit(5, "mm"),
  legend = TRUE,
  legend_title = "",
  box_col = "white",
  box_lwd = 0.5,
  border_col = NA,
  border_lwd = 1,
  data = NULL,
  samples = NULL,
  which = NULL,
  ...
)
anno_sample_cat(
  var,
  col = distinct_palette(),
  renamer = identity,
  size = grid::unit(5, "mm"),
  legend = TRUE,
  legend_title = "",
  box_col = "white",
  box_lwd = 0.5,
  border_col = NA,
  border_lwd = 1,
  data = NULL,
  samples = NULL,
  which = NULL,
  ...
)

Arguments

`var`	name of variable to use for annotation data
`col`	colors vector, at least as long as unique(x), optionally named by x levels
`renamer`	function to rename levels of variable `var`
`size`	width or height as a grid unit object
`legend`	generate legend for this annotation (attached as attribute of heatmap, and not automatically included in plot)
`legend_title`	title for legend, if drawn
`box_col`	colour of boxes around individual cells
`box_lwd`	line width of boxes around individual cells
`border_col`	colour of border around all cells
`border_lwd`	line width of border around all cells
`data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`samples`	OPTIONAL selection vector of sample names, only set this if providing data argument to override default
`which`	OPTIONAL indicating if it is a 'column' or a 'row' annotation, only set this if providing data argument to override default
`...`	Arguments passed on to `anno_cat` `x` data vector, treated as categorical `width` grid unit object or NULL `height` grid unit object or NULL

Value

vector of values

Examples

library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- ibd
samples <- phyloseq::sample_names(psq)

# makes a function that takes data, taxa and which (at minimum)
fun <- anno_sample_cat(var = "ibd")

# manually specify the prevalence barplot function by giving it data etc.
heatmapAnnoFunction <- fun(data = psq, which = "row", samples = samples)

# draw the barplot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)

grid::grid.newpage()
pushViewport(vp)
draw(heatmapAnnoFunction)
# A legend is attached by default to anno_cat() output, let's plot that.
pushViewport(viewport(x = 0.75))
draw(attr(heatmapAnnoFunction, "Legend"))

# change some options and specify the data up front
grid::grid.newpage()
pushViewport(vp)
anno_sample_cat(
  data = psq, var = "DiseaseState", samples = samples, which = "column",
  size = grid::unit(5, "cm"), col = distinct_palette(pal = "kelly")
) %>%
  draw()
library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- ibd
samples <- phyloseq::sample_names(psq)

# makes a function that takes data, taxa and which (at minimum)
fun <- anno_sample_cat(var = "ibd")

# manually specify the prevalence barplot function by giving it data etc.
heatmapAnnoFunction <- fun(data = psq, which = "row", samples = samples)

# draw the barplot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)

grid::grid.newpage()
pushViewport(vp)
draw(heatmapAnnoFunction)
# A legend is attached by default to anno_cat() output, let's plot that.
pushViewport(viewport(x = 0.75))
draw(attr(heatmapAnnoFunction, "Legend"))

# change some options and specify the data up front
grid::grid.newpage()
pushViewport(vp)
anno_sample_cat(
  data = psq, var = "DiseaseState", samples = samples, which = "column",
  size = grid::unit(5, "cm"), col = distinct_palette(pal = "kelly")
) %>%
  draw()

Helper to specify heatmap annotation for showing taxa abundance on boxplot

Description

Use this as an argument to taxAnnotation(), which itself is used by cor_heatmap and comp_heatmap as tax_anno argument.

Usage

anno_tax_box(
  undetected = 0,
  only_detected = TRUE,
  trans = "compositional",
  zero_replace = 0,
  use_counts = TRUE,
  size = grid::unit(30, "mm"),
  border = TRUE,
  gp = grid::gpar(fill = "#CCCCCC"),
  ylim = NULL,
  extend = 0.05,
  outline = TRUE,
  box_width = 0.6,
  pch = 1,
  pointsize = grid::unit(0.5, "mm"),
  axis = TRUE,
  ...,
  data = NULL,
  taxa = NULL,
  which = NULL
)
anno_tax_box(
  undetected = 0,
  only_detected = TRUE,
  trans = "compositional",
  zero_replace = 0,
  use_counts = TRUE,
  size = grid::unit(30, "mm"),
  border = TRUE,
  gp = grid::gpar(fill = "#CCCCCC"),
  ylim = NULL,
  extend = 0.05,
  outline = TRUE,
  box_width = 0.6,
  pch = 1,
  pointsize = grid::unit(0.5, "mm"),
  axis = TRUE,
  ...,
  data = NULL,
  taxa = NULL,
  which = NULL
)

Arguments

`undetected`	the value above which taxa are classed as detected/present in a sample
`only_detected`	only plot values for samples where the taxon abundance is > undetected
`trans`	name of transformation suitable for tax_transform, or a function calling tax_transform, and/or tax_scale, (a function must take a phyloseq or psExtra, and return one)
`zero_replace`	zero_replace value for for tax_transform, ignored if trans is a function
`use_counts`	try to retrieve counts from data object?
`size`	width or height as a grid unit object
`border`	Wether draw borders of the annotation region?
`gp`	Graphic parameters for the boxes. The length of the graphic parameters should be one or the number of observations.
`ylim`	Data ranges.
`extend`	The extension to both side of `ylim`. The value is a percent value corresponding to `ylim[2] - ylim[1]`.
`outline`	Whether draw outline of boxplots?
`box_width`	Relative width of boxes. The value should be smaller than one.
`pch`	Point style.
`pointsize`	size of outlier points, as grid::unit() object
`axis`	Whether to add axis?
`...`	Arguments passed on to `ComplexHeatmap::anno_boxplot` `axis_param` parameters for controlling axis. See `default_axis_param` for all possible settings and default parameters.
`data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`taxa`	OPTIONAL selection vector of taxa (names, numbers or logical), only set this if providing data argument to override default
`which`	OPTIONAL indicating if it is a 'column' or a 'row' annotation, only set this if providing data argument to override default

Value

function or ComplexHeatmap AnnotationFunction object

Examples

library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")
# makes a function that takes data, taxa and which (at minimum)
fun <- anno_tax_box()
# manually specify the prevalence barplot function by giving it data etc.
heatmapAnnoFunction <- fun(data = psq, which = "column", taxa = taxa)
# draw the barplot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(heatmapAnnoFunction)

# let's change some style options and specify the data up front
grid::grid.newpage()
pushViewport(vp)
draw(anno_tax_box(
  data = psq, taxa = taxa, which = "row", pointsize = grid::unit(1, "mm"),
  gp = grid::gpar(fill = "red"), border = FALSE, box_width = 0.2
))

# clear drawings
grid::grid.newpage()
library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")
# makes a function that takes data, taxa and which (at minimum)
fun <- anno_tax_box()
# manually specify the prevalence barplot function by giving it data etc.
heatmapAnnoFunction <- fun(data = psq, which = "column", taxa = taxa)
# draw the barplot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(heatmapAnnoFunction)

# let's change some style options and specify the data up front
grid::grid.newpage()
pushViewport(vp)
draw(anno_tax_box(
  data = psq, taxa = taxa, which = "row", pointsize = grid::unit(1, "mm"),
  gp = grid::gpar(fill = "red"), border = FALSE, box_width = 0.2
))

# clear drawings
grid::grid.newpage()

Helper to specify heatmap annotation for showing taxa abundance density plot

Description

Use this as an argument to taxAnnotation(), which itself is used by cor_heatmap and comp_heatmap as tax_anno argument.

Usage

anno_tax_density(
  undetected = 0,
  only_detected = TRUE,
  trans = "log10p",
  zero_replace = 0,
  use_counts = TRUE,
  size = grid::unit(30, "mm"),
  type = c("lines", "violin", "heatmap"),
  xlim = NULL,
  heatmap_colors = c("white", "forestgreen"),
  joyplot_scale = 1.5,
  border = TRUE,
  gp = grid::gpar(fill = "lightgrey"),
  axis = TRUE,
  ...,
  data = NULL,
  taxa = NULL,
  which = NULL
)
anno_tax_density(
  undetected = 0,
  only_detected = TRUE,
  trans = "log10p",
  zero_replace = 0,
  use_counts = TRUE,
  size = grid::unit(30, "mm"),
  type = c("lines", "violin", "heatmap"),
  xlim = NULL,
  heatmap_colors = c("white", "forestgreen"),
  joyplot_scale = 1.5,
  border = TRUE,
  gp = grid::gpar(fill = "lightgrey"),
  axis = TRUE,
  ...,
  data = NULL,
  taxa = NULL,
  which = NULL
)

Arguments

`undetected`	the value above which taxa are classed as detected/present in a sample
`only_detected`	only plot values for samples where the taxon abundance is > undetected
`trans`	name of transformation suitable for tax_transform, or a function calling tax_transform, and/or tax_scale, (a function must take a phyloseq or psExtra, and return one)
`zero_replace`	zero_replace value for for tax_transform, ignored if trans is a function
`use_counts`	try to retrieve counts from data object?
`size`	width or height as a grid unit object
`type`	Type of graphics to represent density distribution. "lines" for normal density plot; "violine" for violin plot and "heatmap" for heatmap visualization of density distribution.
`xlim`	Range on x-axis.
`heatmap_colors`	A vector of colors for interpolating density values.
`joyplot_scale`	Relative height of density distribution. A value higher than 1 increases the height of the density distribution and the plot will represented as so-called "joyplot".
`border`	Wether draw borders of the annotation region?
`gp`	Graphic parameters for the boxes. The length of the graphic parameters should be one or the number of observations.
`axis`	Whether to add axis?
`...`	Arguments passed on to `ComplexHeatmap::anno_density` `axis_param` parameters for controlling axis. See `default_axis_param` for all possible settings and default parameters.
`data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`taxa`	OPTIONAL selection vector of taxa (names, numbers or logical), only set this if providing data argument to override default
`which`	OPTIONAL indicating if it is a 'column' or a 'row' annotation, only set this if providing data argument to override default

Value

function or ComplexHeatmap AnnotationFunction object

Examples

library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")
# makes a function that takes data, taxa and which (at minimum)
fun <- anno_tax_density()
# manually specify the density plot function by giving it data etc.
heatmapAnnoFunction <- fun(data = psq, which = "column", taxa = taxa)

# draw the density plot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(heatmapAnnoFunction)

# let's change some style options and specify the data up front
grid.newpage()
pushViewport(vp)
draw(anno_tax_density(
  data = psq, taxa = taxa, which = "row",
  gp = grid::gpar(fill = "red"), border = FALSE
))

# heatmap type, with alternative transformation and axis_param
grid.newpage()
pushViewport(vp)
draw(anno_tax_density(
  data = psq, taxa = taxa, which = "row", type = "heatmap",
  trans = "log2", zero_replace = "halfmin", axis_param = list(labels_rot = 0)
))

grid.newpage()
library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")
# makes a function that takes data, taxa and which (at minimum)
fun <- anno_tax_density()
# manually specify the density plot function by giving it data etc.
heatmapAnnoFunction <- fun(data = psq, which = "column", taxa = taxa)

# draw the density plot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(heatmapAnnoFunction)

# let's change some style options and specify the data up front
grid.newpage()
pushViewport(vp)
draw(anno_tax_density(
  data = psq, taxa = taxa, which = "row",
  gp = grid::gpar(fill = "red"), border = FALSE
))

# heatmap type, with alternative transformation and axis_param
grid.newpage()
pushViewport(vp)
draw(anno_tax_density(
  data = psq, taxa = taxa, which = "row", type = "heatmap",
  trans = "log2", zero_replace = "halfmin", axis_param = list(labels_rot = 0)
))

grid.newpage()

Helper to specify heatmap annotation for showing taxa prevalence as barplot

Description

Use this as an argument to taxAnnotation(), which itself is used by cor_heatmap and comp_heatmap as tax_anno argument.

Usage

anno_tax_prev(
  undetected = 0,
  use_counts = TRUE,
  size = grid::unit(20, "mm"),
  baseline = 0,
  border = TRUE,
  bar_width = 0.6,
  gp = grid::gpar(fill = "#CCCCCC"),
  ylim = NULL,
  extend = 0.05,
  axis = TRUE,
  ...,
  data = NULL,
  taxa = NULL,
  which = NULL
)
anno_tax_prev(
  undetected = 0,
  use_counts = TRUE,
  size = grid::unit(20, "mm"),
  baseline = 0,
  border = TRUE,
  bar_width = 0.6,
  gp = grid::gpar(fill = "#CCCCCC"),
  ylim = NULL,
  extend = 0.05,
  axis = TRUE,
  ...,
  data = NULL,
  taxa = NULL,
  which = NULL
)

Arguments

`undetected`	the value above which taxa are classed as detected/present in a sample
`use_counts`	try to retrieve counts from data object?
`size`	width or height as a grid unit object
`baseline`	baseline of bars. The value should be "min" or "max", or a numeric value. It is enforced to be zero for stacked barplots.
`border`	Wether draw borders of the annotation region?
`bar_width`	Relative width of the bars. The value should be smaller than one.
`gp`	Graphic parameters for bars. The length of each graphic parameter can be 1, length of `x` if `x` is a vector, or number of columns of `x` is `x` is a matrix.
`ylim`	Data ranges. By default it is `range(x)` if `x` is a vector, or `range(rowSums(x))` if `x` is a matrix.
`extend`	The extension to both side of `ylim`. The value is a percent value corresponding to `ylim[2] - ylim[1]`.
`axis`	Whether to add axis?
`...`	Arguments passed on to `ComplexHeatmap::anno_barplot` `axis_param` parameters for controlling axis. See `default_axis_param` for all possible settings and default parameters.
`data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`taxa`	OPTIONAL selection vector of taxa (names, numbers or logical), only set this if providing data argument to override default
`which`	OPTIONAL indicating if it is a 'column' or a 'row' annotation, only set this if providing data argument to override default

Value

function or ComplexHeatmap AnnotationFunction object

Examples

library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")

# makes a function that takes data, taxa and which (at minimum)
fun <- anno_tax_prev()

# manually specify the prevalence barplot function by giving it data etc.
heatmapAnnoFunction <- fun(data = psq, which = "row", taxa = taxa)

# draw the barplot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)

grid::grid.newpage()
pushViewport(vp)
draw(heatmapAnnoFunction)

# let's change some style options and specify the data up front
grid::grid.newpage()
pushViewport(vp)
anno_tax_prev(
  data = psq, taxa = taxa, which = "column",
  gp = grid::gpar(fill = "red", lwd = 3, alpha = 0.5),
  border = FALSE, bar_width = 1
) %>%
  draw()

# clear drawings
grid::grid.newpage()
library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")

# makes a function that takes data, taxa and which (at minimum)
fun <- anno_tax_prev()

# manually specify the prevalence barplot function by giving it data etc.
heatmapAnnoFunction <- fun(data = psq, which = "row", taxa = taxa)

# draw the barplot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)

grid::grid.newpage()
pushViewport(vp)
draw(heatmapAnnoFunction)

# let's change some style options and specify the data up front
grid::grid.newpage()
pushViewport(vp)
anno_tax_prev(
  data = psq, taxa = taxa, which = "column",
  gp = grid::gpar(fill = "red", lwd = 3, alpha = 0.5),
  border = FALSE, bar_width = 1
) %>%
  draw()

# clear drawings
grid::grid.newpage()

Helper to specify heatmap annotation for variable distribution boxplots

Description

Use this as an argument to varAnnotation(), which itself is used by cor_heatmap as var_anno() argument.

Usage

anno_var_box(
  fun = identity,
  size = grid::unit(30, "mm"),
  border = TRUE,
  gp = grid::gpar(fill = "#CCCCCC"),
  ylim = NULL,
  extend = 0.05,
  outline = TRUE,
  box_width = 0.6,
  pch = 1,
  pointsize = grid::unit(0.5, "mm"),
  axis = TRUE,
  ...,
  data = NULL,
  vars = NULL,
  which = NULL
)
anno_var_box(
  fun = identity,
  size = grid::unit(30, "mm"),
  border = TRUE,
  gp = grid::gpar(fill = "#CCCCCC"),
  ylim = NULL,
  extend = 0.05,
  outline = TRUE,
  box_width = 0.6,
  pch = 1,
  pointsize = grid::unit(0.5, "mm"),
  axis = TRUE,
  ...,
  data = NULL,
  vars = NULL,
  which = NULL
)

Arguments

`fun`	function applied to all variables, with apply()
`size`	width or height as a grid unit object
`border`	Wether draw borders of the annotation region?
`gp`	Graphic parameters for the boxes. The length of the graphic parameters should be one or the number of observations.
`ylim`	Data ranges.
`extend`	The extension to both side of `ylim`. The value is a percent value corresponding to `ylim[2] - ylim[1]`.
`outline`	Whether draw outline of boxplots?
`box_width`	Relative width of boxes. The value should be smaller than one.
`pch`	Point style.
`pointsize`	size of outlier points, as grid::unit() object
`axis`	Whether to add axis?
`...`	Arguments passed on to `ComplexHeatmap::anno_boxplot` `axis_param` parameters for controlling axis. See `default_axis_param` for all possible settings and default parameters.
`data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`vars`	OPTIONAL selection vector of variable names, only set this if providing data argument to override default
`which`	OPTIONAL indicating if it is a 'column' or a 'row' annotation, only set this if providing data argument to override default

Value

function or ComplexHeatmap AnnotationFunction object

Examples

library(ComplexHeatmap)
set.seed(123)
fakeData <- as.data.frame.matrix(matrix(rnorm(500, 10, 3), ncol = 10))
names(fakeData) <- paste0("var_", 1:10)

# draw the boxplot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(
  anno_var_box(data = fakeData, vars = names(fakeData), which = "column")
)

grid.newpage()
pushViewport(vp)
draw(
  anno_var_box(
    data = fakeData, fun = function(x) log(x + 1),
    vars = rev(names(fakeData)),
    which = "row"
  )
)
library(ComplexHeatmap)
set.seed(123)
fakeData <- as.data.frame.matrix(matrix(rnorm(500, 10, 3), ncol = 10))
names(fakeData) <- paste0("var_", 1:10)

# draw the boxplot without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(
  anno_var_box(data = fakeData, vars = names(fakeData), which = "column")
)

grid.newpage()
pushViewport(vp)
draw(
  anno_var_box(
    data = fakeData, fun = function(x) log(x + 1),
    vars = rev(names(fakeData)),
    which = "row"
  )
)

Helper to specify heatmap annotation for variable distribution density plot

Description

Use this as an argument to varAnnotation(), which itself is used by cor_heatmap var_anno argument.

Usage

anno_var_density(
  fun = identity,
  size = grid::unit(30, "mm"),
  type = c("lines", "violin", "heatmap"),
  xlim = NULL,
  heatmap_colors = c("white", "forestgreen"),
  joyplot_scale = 1.5,
  border = TRUE,
  gp = grid::gpar(fill = "lightgrey"),
  axis = TRUE,
  ...,
  data = NULL,
  vars = NULL,
  which = NULL
)
anno_var_density(
  fun = identity,
  size = grid::unit(30, "mm"),
  type = c("lines", "violin", "heatmap"),
  xlim = NULL,
  heatmap_colors = c("white", "forestgreen"),
  joyplot_scale = 1.5,
  border = TRUE,
  gp = grid::gpar(fill = "lightgrey"),
  axis = TRUE,
  ...,
  data = NULL,
  vars = NULL,
  which = NULL
)

Arguments

`fun`	function applied to all variables, with apply()
`size`	width or height as a grid unit object
`type`	Type of graphics to represent density distribution. "lines" for normal density plot; "violine" for violin plot and "heatmap" for heatmap visualization of density distribution.
`xlim`	Range on x-axis.
`heatmap_colors`	A vector of colors for interpolating density values.
`joyplot_scale`	Relative height of density distribution. A value higher than 1 increases the height of the density distribution and the plot will represented as so-called "joyplot".
`border`	Wether draw borders of the annotation region?
`gp`	Graphic parameters for the boxes. The length of the graphic parameters should be one or the number of observations.
`axis`	Whether to add axis?
`...`	Arguments passed on to `ComplexHeatmap::anno_density` `axis_param` parameters for controlling axis. See `default_axis_param` for all possible settings and default parameters.
`data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`vars`	OPTIONAL selection vector of variable names, only set this if providing data argument to override default
`which`	OPTIONAL indicating if it is a 'column' or a 'row' annotation, only set this if providing data argument to override default

Value

function or ComplexHeatmap AnnotationFunction object

Examples

library(ComplexHeatmap)
set.seed(123)
fakeData <- as.data.frame.matrix(matrix(rnorm(500, 10, 3), ncol = 10))
names(fakeData) <- paste0("var_", 1:10)

# draw the plots without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(
  anno_var_density(data = fakeData, vars = names(fakeData), which = "row")
)

grid.newpage()
pushViewport(vp)
draw(
  anno_var_density(
    data = fakeData, fun = function(x) log(x + 1),
    vars = rev(names(fakeData)), type = "heatmap",
    which = "column"
  )
)
library(ComplexHeatmap)
set.seed(123)
fakeData <- as.data.frame.matrix(matrix(rnorm(500, 10, 3), ncol = 10))
names(fakeData) <- paste0("var_", 1:10)

# draw the plots without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(
  anno_var_density(data = fakeData, vars = names(fakeData), which = "row")
)

grid.newpage()
pushViewport(vp)
draw(
  anno_var_density(
    data = fakeData, fun = function(x) log(x + 1),
    vars = rev(names(fakeData)), type = "heatmap",
    which = "column"
  )
)

Helper to specify heatmap annotation for variable distribution histograms

Description

Use this as an argument to varAnnotation(), which itself is used by cor_heatmap var_anno argument.

Usage

anno_var_hist(
  fun = identity,
  size = grid::unit(30, "mm"),
  n_breaks = 11,
  border = FALSE,
  gp = grid::gpar(fill = "#CCCCCC"),
  axis = TRUE,
  ...,
  data = NULL,
  vars = NULL,
  which = NULL
)
anno_var_hist(
  fun = identity,
  size = grid::unit(30, "mm"),
  n_breaks = 11,
  border = FALSE,
  gp = grid::gpar(fill = "#CCCCCC"),
  axis = TRUE,
  ...,
  data = NULL,
  vars = NULL,
  which = NULL
)

Arguments

`fun`	function applied to all variables, with apply()
`size`	width or height as a grid unit object
`n_breaks`	number of breaks
`border`	Wether draw borders of the annotation region?
`gp`	Graphic parameters for the boxes. The length of the graphic parameters should be one or the number of observations.
`axis`	Whether to add axis?
`...`	Arguments passed on to `ComplexHeatmap::anno_density` `axis_param` parameters for controlling axis. See `default_axis_param` for all possible settings and default parameters.
`data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`vars`	OPTIONAL selection vector of variable names, only set this if providing data argument to override default
`which`	OPTIONAL indicating if it is a 'column' or a 'row' annotation, only set this if providing data argument to override default

Value

function or ComplexHeatmap AnnotationFunction object

Examples

library(ComplexHeatmap)
set.seed(123)
fakeData <- as.data.frame.matrix(matrix(rnorm(500, 10, 3), ncol = 10))
names(fakeData) <- paste0("var_", 1:10)

# draw the histograms without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(
  anno_var_hist(data = fakeData, vars = names(fakeData), which = "row")
)

grid.newpage()
pushViewport(vp)
draw(
  anno_var_hist(
    data = fakeData, fun = sqrt,
    vars = rev(names(fakeData)), n_breaks = 5,
    which = "column", gp = grid::gpar(fill = 2:6, lwd = c(0.9, 2.5))
  )
)
library(ComplexHeatmap)
set.seed(123)
fakeData <- as.data.frame.matrix(matrix(rnorm(500, 10, 3), ncol = 10))
names(fakeData) <- paste0("var_", 1:10)

# draw the histograms without a heatmap, you will never normally do this!
vp <- viewport(width = 0.75, height = 0.75)
grid.newpage()
pushViewport(vp)
draw(
  anno_var_hist(data = fakeData, vars = names(fakeData), which = "row")
)

grid.newpage()
pushViewport(vp)
draw(
  anno_var_hist(
    data = fakeData, fun = sqrt,
    vars = rev(names(fakeData)), n_breaks = 5,
    which = "column", gp = grid::gpar(fill = 2:6, lwd = c(0.9, 2.5))
  )
)

Plot (grouped and ordered) compositional barplots

Description

Stacked barplots showing composition of phyloseq samples for a specified number of coloured taxa. Normally your phyloseq object should contain counts data, as by default comp_barplot() performs the "compositional" taxa transformation for you, and requires count input for some sample_order methods!

Usage

comp_barplot(
  ps,
  tax_level,
  n_taxa = 8,
  tax_order = sum,
  merge_other = TRUE,
  taxon_renamer = function(x) identity(x),
  sample_order = "bray",
  order_with_all_taxa = FALSE,
  label = "SAMPLE",
  group_by = NA,
  facet_by = NA,
  bar_width = 1,
  bar_outline_colour = "grey5",
  bar_outline_width = 0.1,
  palette = distinct_palette(n_taxa),
  tax_transform_for_ordering = "identity",
  tax_transform_for_plot = "compositional",
  seriate_method = "OLO_ward",
  keep_all_vars = TRUE,
  interactive = FALSE,
  max_taxa = 10000,
  other_name = "Other",
  x = "SAMPLE",
  counts_warn = TRUE,
  ...
)
comp_barplot(
  ps,
  tax_level,
  n_taxa = 8,
  tax_order = sum,
  merge_other = TRUE,
  taxon_renamer = function(x) identity(x),
  sample_order = "bray",
  order_with_all_taxa = FALSE,
  label = "SAMPLE",
  group_by = NA,
  facet_by = NA,
  bar_width = 1,
  bar_outline_colour = "grey5",
  bar_outline_width = 0.1,
  palette = distinct_palette(n_taxa),
  tax_transform_for_ordering = "identity",
  tax_transform_for_plot = "compositional",
  seriate_method = "OLO_ward",
  keep_all_vars = TRUE,
  interactive = FALSE,
  max_taxa = 10000,
  other_name = "Other",
  x = "SAMPLE",
  counts_warn = TRUE,
  ...
)

Arguments

`ps`	phyloseq object
`tax_level`	taxonomic aggregation level (from rank_names(ps))
`n_taxa`	how many taxa to show distinct colours for (all others grouped into "Other")
`tax_order`	order of taxa within the bars, either a function for tax_sort (e.g. sum), or a vector of (all) taxa names at tax_level to set order manually
`merge_other`	if FALSE, taxa coloured/filled as "other" remain distinct, and so can have bar outlines drawn around them
`taxon_renamer`	function that takes taxon names and returns modified names for legend
`sample_order`	vector of sample names; or any distance measure in dist_calc that doesn't require phylogenetic tree; or "asis" for the current order as is returned by phyloseq::sample_names(ps)
`order_with_all_taxa`	if TRUE, this will always use all taxa (not just the top n_taxa) to calculate any distances needed for sample ordering
`label`	name of variable to use for labelling samples, or "SAMPLE" for sample names
`group_by`	splits dataset by this variable (must be categorical) resulting in a list of plots, one for each level of the group_by variable.
`facet_by`	facets plots by this variable (must be categorical). If group_by is also set the faceting will occur separately in the plot for each group.
`bar_width`	default 1 avoids random gapping otherwise seen with many samples (set to less than 1 to introduce gaps between samples)
`bar_outline_colour`	line colour separating taxa and samples (use NA for no outlines)
`bar_outline_width`	width of line separating taxa and samples (for no outlines set bar_outline_colour = NA)
`palette`	palette for taxa fill colours
`tax_transform_for_ordering`	transformation of taxa values used before ordering samples by similarity
`tax_transform_for_plot`	default "compositional" draws proportions of total counts per sample, but you could reasonably use another transformation, e.g. "identity", if you have truly quantitative microbiome profiling data
`seriate_method`	name of any ordering method suitable for distance matrices (see ?seriation::seriate)
`keep_all_vars`	FALSE may speed up internal melting with ps_melt for large phyloseq objects but TRUE is required for some post-hoc plot customisation
`interactive`	creates plot suitable for use with ggiraph
`max_taxa`	maximum distinct taxa groups to show (only really useful for limiting complexity of interactive plots e.g. within ord_explore)
`other_name`	name for other taxa after N
`x`	name of variable to use as x aesthetic: it probably only makes sense to change this when also using facets (check only one sample is represented per bar!)
`counts_warn`	should a warning be issued if counts are unavailable? TRUE, FALSE, or "error" (passed to ps_get)
`...`	extra arguments passed to facet_wrap() (if facet_by is not NA)

Details

sample_order: Either specify a list of sample names to order manually, or the bars/samples can/will be sorted by similarity, according to a specified distance measure (default 'bray'-curtis),
seriate_method specifies a seriation/ordering algorithm (default Ward hierarchical clustering with optimal leaf ordering, see seriation::list_seriation_methods())
group_by: You can group the samples on distinct plots by levels of a variable in the phyloseq object. The list of ggplots produced can be arranged flexibly with the patchwork package functions. If you want to group by several variables you can create an interaction variable with interaction(var1, var2) in the phyloseq sample_data BEFORE using comp_barplot.
facet_by can allow faceting of your plot(s) by a grouping variable. Using this approach is less flexible than using group_by but means you don't have to arrange a list of plots yourself like with the group_by argument. Using facet_by is equivalent to adding a call to facet_wrap(facets = facet_by, scales = "free") to your plot(s). Calling facet_wrap() yourself is itself a more flexible option as you can add other arguments like the number of rows etc. However you must use keep_all_vars = TRUE if you will add faceting manually.
bar_width: No gaps between bars, unless you want them (decrease width argument to add gaps between bars).
bar_outline_colour: Bar outlines default to "grey5" for almost black outlines. Use NA if you don't want outlines.
merge_other: controls whether bar outlines can be drawn around individual (lower abundance) taxa that are grouped in "other" category. If you want to see the diversity of taxa in "other" use merge_taxa = FALSE, or use TRUE if you prefer the cleaner merged look
palette: Default colouring is consistent across multiple plots if created with the group_by argument, and the defaults scheme retains the colouring of the most abundant taxa irrespective of n_taxa

Value

ggplot or list of harmonised ggplots

Examples

library(ggplot2)
data(dietswap, package = "microbiome")

# illustrative simple customised example
dietswap %>%
  ps_filter(timepoint == 1) %>%
  comp_barplot(
    tax_level = "Family", n_taxa = 8,
    bar_outline_colour = NA,
    sample_order = "bray",
    bar_width = 0.7,
    taxon_renamer = toupper
  ) + coord_flip()

# change colour palette with the distinct_palette() function
# remember to set the number of colours to the same as n_taxa argument!
dietswap %>%
  ps_filter(timepoint == 1) %>%
  comp_barplot(
    tax_level = "Family", n_taxa = 8,
    bar_outline_colour = NA,
    sample_order = "bray",
    bar_width = 0.7,
    palette = distinct_palette(8, pal = "kelly"),
    taxon_renamer = toupper
  ) + coord_flip()

# Order samples by the value of one of more sample_data variables.
# Use ps_arrange and set sample_order = "default" in comp_barplot.
# ps_mutate is also used here to create an informative variable for axis labelling
dietswap %>%
  ps_mutate(subject_timepoint = interaction(subject, timepoint)) %>%
  ps_filter(nationality == "AAM", group == "DI", sex == "female") %>%
  ps_arrange(desc(subject), desc(timepoint)) %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12,
    sample_order = "default",
    bar_width = 0.7,
    bar_outline_colour = "black",
    order_with_all_taxa = TRUE,
    label = "subject_timepoint"
  ) + coord_flip()

# Order taxa differently:
# By default, taxa are ordered by total sum across all samples
# You can set a different function for calculating the order, e.g. median
dietswap %>%
  ps_filter(timepoint == 1) %>%
  comp_barplot(tax_level = "Genus", tax_order = median) +
  coord_flip()

# Or you can set the taxa order up front, with tax_sort() and use it as is
dietswap %>%
  ps_filter(timepoint == 1) %>%
  tax_sort(at = "Genus", by = sum) %>%
  comp_barplot(tax_level = "Genus", tax_order = "asis") +
  coord_flip()

# how many taxa are in those light grey "other" bars?
# set merge_other to find out (& remember to set a bar_outline_colour)
dietswap %>%
  ps_filter(timepoint == 1) %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12, merge_other = FALSE, bar_outline_colour = "grey50",
  ) +
  coord_flip()


# Often to compare groups, average compositions are presented
p1 <- phyloseq::merge_samples(dietswap, group = "group") %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12,
    sample_order = c("ED", "HE", "DI"),
    bar_width = 0.8
  ) +
  coord_flip() + labs(x = NULL, y = NULL)
p1

# However that "group-averaging" approach hides a lot of within-group variation
p2 <- comp_barplot(dietswap,
  tax_level = "Genus", n_taxa = 12, group_by = "group",
  sample_order = "euclidean", bar_outline_colour = NA
) %>%
  patchwork::wrap_plots(nrow = 3, guides = "collect") &
  coord_flip() & labs(x = NULL, y = NULL) &
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
p2

# Only from p2 you can see that the apparently higher average relative abundance
# of Oscillospira in group DI is probably driven largely by a subgroup
# of DI samples with relatively high Oscillospira.

# make a list of 2 harmonised composition plots (grouped by sex)
p <- comp_barplot(dietswap,
  n_taxa = 15, tax_level = "Genus",
  bar_outline_colour = "black", merge_other = TRUE,
  sample_order = "aitchison", group_by = "sex"
)

# plot them side by side with patchwork package
patch <- patchwork::wrap_plots(p, ncol = 2, guides = "collect")
patch & coord_flip() # make bars in all plots horizontal (note: use & instead of +)

# beautifying tweak #
# modify one plot in place (flip the order of the samples in the 2nd plot)
# notice that the scaling is for the x-axis
# (that's because coord_flip is used afterwards when displaying the plots
patch[[2]] <- patch[[2]] + scale_x_discrete(limits = rev)
# Explainer: rev() function takes current limits and reverses them.
# You could also pass a completely arbitrary order, naming all samples

# you can theme all plots with the & operator
patch & coord_flip() &
  theme(axis.text.y = element_text(size = 5), legend.text = element_text(size = 6))
# See https://patchwork.data-imaginist.com/index.html
library(ggplot2)
data(dietswap, package = "microbiome")

# illustrative simple customised example
dietswap %>%
  ps_filter(timepoint == 1) %>%
  comp_barplot(
    tax_level = "Family", n_taxa = 8,
    bar_outline_colour = NA,
    sample_order = "bray",
    bar_width = 0.7,
    taxon_renamer = toupper
  ) + coord_flip()

# change colour palette with the distinct_palette() function
# remember to set the number of colours to the same as n_taxa argument!
dietswap %>%
  ps_filter(timepoint == 1) %>%
  comp_barplot(
    tax_level = "Family", n_taxa = 8,
    bar_outline_colour = NA,
    sample_order = "bray",
    bar_width = 0.7,
    palette = distinct_palette(8, pal = "kelly"),
    taxon_renamer = toupper
  ) + coord_flip()

# Order samples by the value of one of more sample_data variables.
# Use ps_arrange and set sample_order = "default" in comp_barplot.
# ps_mutate is also used here to create an informative variable for axis labelling
dietswap %>%
  ps_mutate(subject_timepoint = interaction(subject, timepoint)) %>%
  ps_filter(nationality == "AAM", group == "DI", sex == "female") %>%
  ps_arrange(desc(subject), desc(timepoint)) %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12,
    sample_order = "default",
    bar_width = 0.7,
    bar_outline_colour = "black",
    order_with_all_taxa = TRUE,
    label = "subject_timepoint"
  ) + coord_flip()

# Order taxa differently:
# By default, taxa are ordered by total sum across all samples
# You can set a different function for calculating the order, e.g. median
dietswap %>%
  ps_filter(timepoint == 1) %>%
  comp_barplot(tax_level = "Genus", tax_order = median) +
  coord_flip()

# Or you can set the taxa order up front, with tax_sort() and use it as is
dietswap %>%
  ps_filter(timepoint == 1) %>%
  tax_sort(at = "Genus", by = sum) %>%
  comp_barplot(tax_level = "Genus", tax_order = "asis") +
  coord_flip()

# how many taxa are in those light grey "other" bars?
# set merge_other to find out (& remember to set a bar_outline_colour)
dietswap %>%
  ps_filter(timepoint == 1) %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12, merge_other = FALSE, bar_outline_colour = "grey50",
  ) +
  coord_flip()


# Often to compare groups, average compositions are presented
p1 <- phyloseq::merge_samples(dietswap, group = "group") %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12,
    sample_order = c("ED", "HE", "DI"),
    bar_width = 0.8
  ) +
  coord_flip() + labs(x = NULL, y = NULL)
p1

# However that "group-averaging" approach hides a lot of within-group variation
p2 <- comp_barplot(dietswap,
  tax_level = "Genus", n_taxa = 12, group_by = "group",
  sample_order = "euclidean", bar_outline_colour = NA
) %>%
  patchwork::wrap_plots(nrow = 3, guides = "collect") &
  coord_flip() & labs(x = NULL, y = NULL) &
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
p2

# Only from p2 you can see that the apparently higher average relative abundance
# of Oscillospira in group DI is probably driven largely by a subgroup
# of DI samples with relatively high Oscillospira.

# make a list of 2 harmonised composition plots (grouped by sex)
p <- comp_barplot(dietswap,
  n_taxa = 15, tax_level = "Genus",
  bar_outline_colour = "black", merge_other = TRUE,
  sample_order = "aitchison", group_by = "sex"
)

# plot them side by side with patchwork package
patch <- patchwork::wrap_plots(p, ncol = 2, guides = "collect")
patch & coord_flip() # make bars in all plots horizontal (note: use & instead of +)

# beautifying tweak #
# modify one plot in place (flip the order of the samples in the 2nd plot)
# notice that the scaling is for the x-axis
# (that's because coord_flip is used afterwards when displaying the plots
patch[[2]] <- patch[[2]] + scale_x_discrete(limits = rev)
# Explainer: rev() function takes current limits and reverses them.
# You could also pass a completely arbitrary order, naming all samples

# you can theme all plots with the & operator
patch & coord_flip() &
  theme(axis.text.y = element_text(size = 5), legend.text = element_text(size = 6))
# See https://patchwork.data-imaginist.com/index.html

Draw heatmap of microbiome composition across samples

Description

Heatmap made with ComplexHeatmap::Heatmap(), with optional annotation of taxa prevalence/abundance, and/or other sample data.

Transform your data with tax_transform() prior to plotting (and/or scale with tax_scale()).

See the heatmaps vignette for more examples of use.

Plotting "compositional" data can give an idea of the dominant taxa in each sample. Plotting some form of log or clr transformed (or scaled) microbial features can highlight other patterns.

The data will be ordered via your selected seriation methods and distances on either the transformed data (default) or the original count data (or with any other transformation).

Any cell numbers printed can be transformed independently of the colour scheme, and do not affect ordering.

Usage

comp_heatmap(
  data,
  taxa = NA,
  taxa_side = "right",
  tax_anno = NULL,
  taxon_renamer = identity,
  samples = NA,
  sample_side = adjacent_side(taxa_side),
  sample_anno = NULL,
  sample_names_show = FALSE,
  colors = heat_palette(palette = "Rocket", rev = TRUE),
  numbers = NULL,
  sample_seriation = "OLO_ward",
  sample_ser_dist = "euclidean",
  sample_ser_counts = !sample_ser_dist %in% c("euclidean", "maximum", "manhattan",
    "canberra", "binary"),
  sample_ser_trans = NULL,
  tax_seriation = "OLO_ward",
  tax_ser_dist = "euclidean",
  tax_ser_counts = FALSE,
  tax_ser_trans = NULL,
  numbers_trans = NULL,
  numbers_zero_replace = 0,
  numbers_use_counts = TRUE,
  grid_col = "white",
  grid_lwd = 0.5,
  name = "Abd.",
  anno_tax = NULL,
  ...
)
comp_heatmap(
  data,
  taxa = NA,
  taxa_side = "right",
  tax_anno = NULL,
  taxon_renamer = identity,
  samples = NA,
  sample_side = adjacent_side(taxa_side),
  sample_anno = NULL,
  sample_names_show = FALSE,
  colors = heat_palette(palette = "Rocket", rev = TRUE),
  numbers = NULL,
  sample_seriation = "OLO_ward",
  sample_ser_dist = "euclidean",
  sample_ser_counts = !sample_ser_dist %in% c("euclidean", "maximum", "manhattan",
    "canberra", "binary"),
  sample_ser_trans = NULL,
  tax_seriation = "OLO_ward",
  tax_ser_dist = "euclidean",
  tax_ser_counts = FALSE,
  tax_ser_trans = NULL,
  numbers_trans = NULL,
  numbers_zero_replace = 0,
  numbers_use_counts = TRUE,
  grid_col = "white",
  grid_lwd = 0.5,
  name = "Abd.",
  anno_tax = NULL,
  ...
)

Arguments

`data`	phyloseq or phyloseq extra
`taxa`	list of taxa to include, or NA for all
`taxa_side`	"top"/"right"/"bottom"/"left": controls heatmap orientation and where any annotations specified in tax_anno are placed
`tax_anno`	NULL or annotation function for taxa: taxAnnotation() output.
`taxon_renamer`	function to rename taxa before plotting
`samples`	list of samples to include on plot
`sample_side`	which side to show any sample annotation on, must be adjacent to taxa_side
`sample_anno`	NULL or annotation function for samples: sampleAnnotation() output.
`sample_names_show`	show sample names? (you can control side and rotation of names with other ComplexHeatmap::Heatmap arguments)
`colors`	output of heat_palette() to set heatmap fill color scheme
`numbers`	output of heat_numbers() to draw numbers on heatmap cells
`sample_seriation`	name of method used to order the samples (from seriation::seriate)
`sample_ser_dist`	name of distance to use with sample_seriation method (if needed)
`sample_ser_counts`	insist on using count data for sample seriation?
`sample_ser_trans`	function for transformation of data used for sample seriation (such as a call to `tax_transform()`)
`tax_seriation`	name of method used to order the taxa (from seriation::seriate)
`tax_ser_dist`	name of distance to use with tax_seriation method (if needed)
`tax_ser_counts`	insist on using count data for taxa seriation?
`tax_ser_trans`	function for transformation of data used for taxa seriation (such as a call to `tax_transform()`)
`numbers_trans`	name of tax_transform transformation, or a function for transformation of data used for drawing numbers on cells
`numbers_zero_replace`	zero replacement method used if named transformation given to number_trans
`numbers_use_counts`	insist on using count data for number drawing? (if TRUE, any numbers_trans transformation would be applied to count data)
`grid_col`	colour of gridlines, or NA for none
`grid_lwd`	width of gridlines
`name`	used as legend title (colourbar)
`anno_tax`	DEPRECATED: optional annotation of taxa distributions: tax_anno() list output, or a pre-made ComplexHeatmap HeatmapAnnotation
`...`	Arguments passed on to `ComplexHeatmap::Heatmap` `row_dend_side` Should the row dendrogram be put on the left or right of the heatmap? `row_dend_width` Width of the row dendrogram, should be a `unit` object. `show_row_dend` Whether show row dendrogram? `row_dend_gp` Graphic parameters for the dendrogram segments. If users already provide a `dendrogram` object with edges rendered, this argument will be ignored. `show_row_names` Whether show row names. `row_names_gp` Graphic parameters for row names. `row_names_rot` Rotation of row names. `row_names_centered` Should row names put centered?

Examples

library(dplyr)
data("dietswap", package = "microbiome")
# create a couple of numerical variables to use
psq <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  )
psq <- tax_filter(psq, min_prevalence = 1 / 10, min_sample_abundance = 1 / 10)
psq <- tax_agg(psq, "Genus")

# randomly select 20 taxa from the 40 top taxa, and 40 random samples

set.seed(123)
taxa <- sample(tax_top(psq, n = 40), size = 20)
samples <- sample(1:122, size = 40)

comp_heatmap(data = psq, taxa = taxa, samples = samples)

# transforming taxon abundances #

# NOTE: if you plan on transforming taxa (e.g. to compositional data or clr)
# but only want to plot a subset of the taxa (e.g. most abundant)
# you should NOT subset the original phyloseq before transformation!
# Instead, choose the subset of taxa plotted with:

# Note 2, choose a symmetrical palette for clr-transformed data
psq %>%
  tax_transform("clr", zero_replace = "halfmin") %>%
  comp_heatmap(
    taxa = taxa, samples = samples, colors = heat_palette(sym = TRUE)
  )

# Almost all the taxa have high values (>> 0) because they are a highly
# abundant subset taken after clr transformation was calculated on all taxa

# See how just taking the first 30 taxa from the dataset gives more balance
psq %>%
  tax_transform("clr", zero_replace = "halfmin") %>%
  comp_heatmap(
    taxa = 1:30, samples = samples, colors = heat_palette(sym = TRUE)
  )

# annotating taxa #

# Notes:
# - Unlike cor_heatmap, taxa are not annotated by default
# - Detection threshold set to 50 as HITchip example data seems to have background noise

comp_heatmap(
  data = psq, taxa = taxa, samples = samples,
  tax_anno = taxAnnotation(Prev = anno_tax_prev(undetected = 50))
)

# annotating samples #

htmp <- psq %>%
  tax_transform("clr", zero_replace = "halfmin") %>%
  comp_heatmap(
    taxa = taxa, samples = samples, colors = heat_palette(sym = TRUE),
    sample_anno = sampleAnnotation(
      Nation. = anno_sample_cat("nationality", legend_title = "Nation.")
    )
  )
htmp

# legends from `anno_sample_cat()` are stored as an attribute of the Heatmap
ComplexHeatmap::draw(
  object = htmp,
  annotation_legend_list = attr(htmp, "AnnoLegends"), merge_legends = TRUE
)
library(dplyr)
data("dietswap", package = "microbiome")
# create a couple of numerical variables to use
psq <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  )
psq <- tax_filter(psq, min_prevalence = 1 / 10, min_sample_abundance = 1 / 10)
psq <- tax_agg(psq, "Genus")

# randomly select 20 taxa from the 40 top taxa, and 40 random samples

set.seed(123)
taxa <- sample(tax_top(psq, n = 40), size = 20)
samples <- sample(1:122, size = 40)

comp_heatmap(data = psq, taxa = taxa, samples = samples)

# transforming taxon abundances #

# NOTE: if you plan on transforming taxa (e.g. to compositional data or clr)
# but only want to plot a subset of the taxa (e.g. most abundant)
# you should NOT subset the original phyloseq before transformation!
# Instead, choose the subset of taxa plotted with:

# Note 2, choose a symmetrical palette for clr-transformed data
psq %>%
  tax_transform("clr", zero_replace = "halfmin") %>%
  comp_heatmap(
    taxa = taxa, samples = samples, colors = heat_palette(sym = TRUE)
  )

# Almost all the taxa have high values (>> 0) because they are a highly
# abundant subset taken after clr transformation was calculated on all taxa

# See how just taking the first 30 taxa from the dataset gives more balance
psq %>%
  tax_transform("clr", zero_replace = "halfmin") %>%
  comp_heatmap(
    taxa = 1:30, samples = samples, colors = heat_palette(sym = TRUE)
  )

# annotating taxa #

# Notes:
# - Unlike cor_heatmap, taxa are not annotated by default
# - Detection threshold set to 50 as HITchip example data seems to have background noise

comp_heatmap(
  data = psq, taxa = taxa, samples = samples,
  tax_anno = taxAnnotation(Prev = anno_tax_prev(undetected = 50))
)

# annotating samples #

htmp <- psq %>%
  tax_transform("clr", zero_replace = "halfmin") %>%
  comp_heatmap(
    taxa = taxa, samples = samples, colors = heat_palette(sym = TRUE),
    sample_anno = sampleAnnotation(
      Nation. = anno_sample_cat("nationality", legend_title = "Nation.")
    )
  )
htmp

# legends from `anno_sample_cat()` are stored as an attribute of the Heatmap
ComplexHeatmap::draw(
  object = htmp,
  annotation_legend_list = attr(htmp, "AnnoLegends"), merge_legends = TRUE
)

Microbe-to-sample-data correlation heatmap

Description

Plot correlations between (transformed) microbial abundances and (selected) numeric-like sample_data variables from a phyloseq object.

Lots of customisation options available through the listed arguments, and you can pass any other argument from ComplexHeatmap::Heatmap() too.

Usage

cor_heatmap(
  data,
  taxa = NA,
  tax_anno = taxAnnotation(Prev. = anno_tax_prev(), Abun. = anno_tax_box()),
  taxon_renamer = identity,
  vars = NA,
  var_anno = NULL,
  cor = c("pearson", "kendall", "spearman"),
  cor_use = "everything",
  colors = heat_palette(palette = "Blue-Red 2", sym = TRUE),
  numbers = heat_numbers(decimals = 1, col = "black", fontface = "plain"),
  taxa_side = "right",
  vars_side = adjacent_side(taxa_side),
  seriation_method = "OLO_ward",
  seriation_dist = "euclidean",
  seriation_method_col = seriation_method,
  seriation_dist_col = seriation_dist,
  var_fun = "identity",
  grid_col = "white",
  grid_lwd = 0.5,
  anno_tax = NULL,
  anno_vars = NULL,
  ...
)
cor_heatmap(
  data,
  taxa = NA,
  tax_anno = taxAnnotation(Prev. = anno_tax_prev(), Abun. = anno_tax_box()),
  taxon_renamer = identity,
  vars = NA,
  var_anno = NULL,
  cor = c("pearson", "kendall", "spearman"),
  cor_use = "everything",
  colors = heat_palette(palette = "Blue-Red 2", sym = TRUE),
  numbers = heat_numbers(decimals = 1, col = "black", fontface = "plain"),
  taxa_side = "right",
  vars_side = adjacent_side(taxa_side),
  seriation_method = "OLO_ward",
  seriation_dist = "euclidean",
  seriation_method_col = seriation_method,
  seriation_dist_col = seriation_dist,
  var_fun = "identity",
  grid_col = "white",
  grid_lwd = 0.5,
  anno_tax = NULL,
  anno_vars = NULL,
  ...
)

Arguments

`data`	phyloseq or phyloseq extra
`taxa`	list of taxa to include, or NA for all
`tax_anno`	NULL or annotation function for taxa: taxAnnotation() output.
`taxon_renamer`	function to rename taxa before plotting
`vars`	selection of variable names from sample_data
`var_anno`	NULL or annotation function for variables: varAnnotation() output.
`cor`	correlation coefficient. pearson/kendall/spearman, can be abbreviated (used as legend title)
`cor_use`	passed to cor(use = cor_use)
`colors`	output of heat_palette() to set heatmap fill color scheme
`numbers`	output of heat_numbers() to draw numbers on heatmap cells
`taxa_side`	"top"/"right"/"bottom"/"left": controls heatmap orientation and where any annotations specified in tax_anno are placed
`vars_side`	which side to place any variable annotations specified in var_anno, must be an adjacent side to taxa_side
`seriation_method`	method to order the rows (in seriation::seriate)
`seriation_dist`	distance to use in seriation_method (if needed)
`seriation_method_col`	method to order the columns (in seriation::seriate)
`seriation_dist_col`	distance to use in seriation_method_col (if needed)
`var_fun`	a function (or name of) to be applied to columns of a matrix of vars before correlating (but not used in any variable annotations)
`grid_col`	colour of gridlines, or NA for none
`grid_lwd`	width of gridlines
`anno_tax`	DEPRECATED: optional annotation of taxa distributions: tax_anno() list output, or a pre-made ComplexHeatmap HeatmapAnnotation
`anno_vars`	DEPRECATED: use var_anno argument instead. Optional annotation of variable distributions: var_anno() list output, or a pre-made ComplexHeatmap HeatmapAnnotation
`...`	Arguments passed on to `ComplexHeatmap::Heatmap` `row_dend_width` Width of the row dendrogram, should be a `unit` object. `show_row_dend` Whether show row dendrogram? `row_dend_gp` Graphic parameters for the dendrogram segments. If users already provide a `dendrogram` object with edges rendered, this argument will be ignored. `show_row_names` Whether show row names. `row_names_gp` Graphic parameters for row names. `row_names_rot` Rotation of row names. `row_names_centered` Should row names put centered? `show_heatmap_legend` Whether show heatmap legend?

Details

Using a data.frame for the data argument is also possible, in which case the (selected) numeric-like variables will be correlated with each other, and all arguments relating to taxa will be ignored.

Examples

library(dplyr)
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use
psq <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  )
psq <- tax_filter(psq, min_prevalence = 1 / 10, min_sample_abundance = 1 / 10)
psq <- tax_agg(psq, "Genus")

# randomly select 20 taxa from the 50 most abundant taxa
set.seed(123)
taxa <- sample(tax_top(psq, n = 50), size = 20)

# NOTE: detection threshold set to 50 as HITchip example data seems to have background noise
ud <- 50

# make simple correlation heatmap with all numeric-like variables
cor_heatmap(
  data = psq, taxa = taxa,
  tax_anno = taxAnnotation(
    Prv. = anno_tax_prev(undetected = ud),
    Abd. = anno_tax_box(undetected = ud)
  )
)

# You can create an annotation object separately in advance
taxAnno <- taxAnnotation(
  Prv. = anno_tax_prev(undetected = ud), Abd. = anno_tax_box(undetected = ud)
)
class(taxAnno) # "function"

# You can select which numeric-like variables to correlate taxa with
cor_heatmap(
  psq, taxa,
  vars = c("african", "female", "weight"), tax_anno = taxAnno
)

# Also you can choose alternative correlation measures
cor_heatmap(psq, taxa, cor = "spearman", tax_anno = taxAnno)

# Annotating variables is possible, and easy with varAnnotation()
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno,
  var_anno = varAnnotation(Val. = anno_var_box(size = grid::unit(2, "cm")))
)

# you can transform the variables before correlating by var_fun
# notice this does not affect the data used for annotations
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno, var_fun = "exp",
  var_anno = varAnnotation(Val. = anno_var_box(size = grid::unit(2, "cm")))
)

# other and multiple annotations
cor_heatmap(
  data = psq, taxa = taxa[1:10], vars = c("african", "weight", "female"),
  tax_anno = taxAnno,
  var_anno = varAnnotation(
    value = anno_var_hist(size = grid::unit(15, "mm")),
    log10p = anno_var_box(function(x) log10(x + 1))
  )
)

# make the same heatmap, but rotated
cor_heatmap(
  data = psq, taxa = taxa[1:10], vars = c("african", "weight", "female"),
  tax_anno = taxAnno, taxa_side = "top",
  var_anno = varAnnotation(
    value = anno_var_hist(size = grid::unit(15, "mm")),
    log10p = anno_var_box(function(x) log10(x + 1))
  )
)

# You can change the colour scheme used, using heat_palette()
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno,
  colors = heat_palette("Green-Orange", rev = TRUE, sym = TRUE)
)

# You can hide or change the style of the numbers with heat_numbers()
cor_heatmap(data = psq, taxa = taxa, tax_anno = taxAnno, numbers = NULL)
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno,
  colors = heat_palette("Berlin", rev = TRUE, sym = TRUE),
  numbers = heat_numbers(decimals = 2, col = "white", fontface = "bold")
)

# You can hide or change the style of the grid lines with grid_col & grid_lwd
cor_heatmap(psq, taxa = taxa, tax_anno = taxAnno, grid_col = NA) # hidden
cor_heatmap(psq, taxa = taxa, tax_anno = taxAnno, grid_lwd = 3) # bigger

# You can pass any other argument from `ComplexHeatmap::Heatmap()` to `...`

# e.g. You can set the absolute width and height of the heatmap body
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno,
  width = grid::unit(40, "mm"), height = grid::unit(10, "cm")
)

# e.g. You can suppress the legend
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno, show_heatmap_legend = FALSE,
  width = grid::unit(40, "mm"), height = grid::unit(10, "cm")
)
library(dplyr)
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use
psq <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  )
psq <- tax_filter(psq, min_prevalence = 1 / 10, min_sample_abundance = 1 / 10)
psq <- tax_agg(psq, "Genus")

# randomly select 20 taxa from the 50 most abundant taxa
set.seed(123)
taxa <- sample(tax_top(psq, n = 50), size = 20)

# NOTE: detection threshold set to 50 as HITchip example data seems to have background noise
ud <- 50

# make simple correlation heatmap with all numeric-like variables
cor_heatmap(
  data = psq, taxa = taxa,
  tax_anno = taxAnnotation(
    Prv. = anno_tax_prev(undetected = ud),
    Abd. = anno_tax_box(undetected = ud)
  )
)

# You can create an annotation object separately in advance
taxAnno <- taxAnnotation(
  Prv. = anno_tax_prev(undetected = ud), Abd. = anno_tax_box(undetected = ud)
)
class(taxAnno) # "function"

# You can select which numeric-like variables to correlate taxa with
cor_heatmap(
  psq, taxa,
  vars = c("african", "female", "weight"), tax_anno = taxAnno
)

# Also you can choose alternative correlation measures
cor_heatmap(psq, taxa, cor = "spearman", tax_anno = taxAnno)

# Annotating variables is possible, and easy with varAnnotation()
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno,
  var_anno = varAnnotation(Val. = anno_var_box(size = grid::unit(2, "cm")))
)

# you can transform the variables before correlating by var_fun
# notice this does not affect the data used for annotations
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno, var_fun = "exp",
  var_anno = varAnnotation(Val. = anno_var_box(size = grid::unit(2, "cm")))
)

# other and multiple annotations
cor_heatmap(
  data = psq, taxa = taxa[1:10], vars = c("african", "weight", "female"),
  tax_anno = taxAnno,
  var_anno = varAnnotation(
    value = anno_var_hist(size = grid::unit(15, "mm")),
    log10p = anno_var_box(function(x) log10(x + 1))
  )
)

# make the same heatmap, but rotated
cor_heatmap(
  data = psq, taxa = taxa[1:10], vars = c("african", "weight", "female"),
  tax_anno = taxAnno, taxa_side = "top",
  var_anno = varAnnotation(
    value = anno_var_hist(size = grid::unit(15, "mm")),
    log10p = anno_var_box(function(x) log10(x + 1))
  )
)

# You can change the colour scheme used, using heat_palette()
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno,
  colors = heat_palette("Green-Orange", rev = TRUE, sym = TRUE)
)

# You can hide or change the style of the numbers with heat_numbers()
cor_heatmap(data = psq, taxa = taxa, tax_anno = taxAnno, numbers = NULL)
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno,
  colors = heat_palette("Berlin", rev = TRUE, sym = TRUE),
  numbers = heat_numbers(decimals = 2, col = "white", fontface = "bold")
)

# You can hide or change the style of the grid lines with grid_col & grid_lwd
cor_heatmap(psq, taxa = taxa, tax_anno = taxAnno, grid_col = NA) # hidden
cor_heatmap(psq, taxa = taxa, tax_anno = taxAnno, grid_lwd = 3) # bigger

# You can pass any other argument from `ComplexHeatmap::Heatmap()` to `...`

# e.g. You can set the absolute width and height of the heatmap body
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno,
  width = grid::unit(40, "mm"), height = grid::unit(10, "cm")
)

# e.g. You can suppress the legend
cor_heatmap(
  data = psq, taxa = taxa, tax_anno = taxAnno, show_heatmap_legend = FALSE,
  width = grid::unit(40, "mm"), height = grid::unit(10, "cm")
)

Simple wrapper around cor.test for y ~ x style formula input

Description

Intended for use within the function tax_model

Usage

cor_test(formula, data, ...)
cor_test(formula, data, ...)

Arguments

`formula`	a formula in form y ~ x
`data`	dataframe
`...`	passed to cor.test

Examples

data("shao19")
ps <- shao19 %>%
  ps_filter(family_role == "mother") %>%
  tax_filter(min_prevalence = 20) %>%
  tax_agg("family")

cors <- ps %>% tax_model(
  rank = "family", variables = list("age", "number_reads"), type = cor_test
)

tax_models_get(cors)

data("shao19")
ps <- shao19 %>%
  ps_filter(family_role == "mother") %>%
  tax_filter(min_prevalence = 20) %>%
  tax_agg("family")

cors <- ps %>% tax_model(
  rank = "family", variables = list("age", "number_reads"), type = cor_test
)

tax_models_get(cors)

DEPRECATED Heatmap annotations helpers

Description

Functions to easily define ComplexHeatmap annotations for taxa and/or variables

tax_anno creates list describing taxa annotation (for cor_heatmap or comp_heatmap)
var_anno creates list describing variable annotation (for cor_heatmap)

Usage

tax_anno(
  undetected = 0,
  which = NA,
  prev = 1,
  abund = 2,
  size = 30,
  gap = 2,
  rel_sizes = NA,
  args = NULL,
  ...
)

anno_prev(
  data,
  taxa,
  undetected = 0,
  which = "row",
  size = 15,
  bar_width = 0.6,
  gp = grid::gpar(fill = "grey85"),
  ...
)

anno_abund(
  data,
  taxa,
  undetected = 0,
  which = "row",
  size = 15,
  point_size = 0.75,
  box_width = 0.6,
  gp = grid::gpar(fill = "grey85"),
  ...
)

var_anno(
  annos = "var_box",
  funs = "identity",
  names = NA,
  which = "column",
  size = 15 * length(annos),
  gap = 2,
  rel_sizes = NA,
  args = NULL,
  ...
)

old_anno_var_hist(data, vars = NA, which = "column", size = 15, ...)

old_anno_var_box(data, vars = NA, which = "column", size = 15, ...)
tax_anno(
  undetected = 0,
  which = NA,
  prev = 1,
  abund = 2,
  size = 30,
  gap = 2,
  rel_sizes = NA,
  args = NULL,
  ...
)

anno_prev(
  data,
  taxa,
  undetected = 0,
  which = "row",
  size = 15,
  bar_width = 0.6,
  gp = grid::gpar(fill = "grey85"),
  ...
)

anno_abund(
  data,
  taxa,
  undetected = 0,
  which = "row",
  size = 15,
  point_size = 0.75,
  box_width = 0.6,
  gp = grid::gpar(fill = "grey85"),
  ...
)

var_anno(
  annos = "var_box",
  funs = "identity",
  names = NA,
  which = "column",
  size = 15 * length(annos),
  gap = 2,
  rel_sizes = NA,
  args = NULL,
  ...
)

old_anno_var_hist(data, vars = NA, which = "column", size = 15, ...)

old_anno_var_box(data, vars = NA, which = "column", size = 15, ...)

Arguments

`undetected`	value above which taxa are considered present/detected in a sample
`which`	"row" or "column" annnotation
`prev`	order in which prevalence annotation shown (number, or NA to not show)
`abund`	order in which abundance annotation shown (number, or NA to not show)
`size`	total size (mm) of annotations (width/height depending on which)
`gap`	gap in mm between annotations
`rel_sizes`	relative sizes of annotations (NA for equal sizes, or same length as annos)
`args`	extra args passed to each annotation: give as list of lists (one inner list per arg, named, e.g. list(prev = list(whatever = whatever))
`...`	further named args to be passed on (to list)
`data`	phyloseq or ps-extra (or a data.frame or matrix for anno_var_* functions)
`taxa`	names of taxa to plot
`bar_width`	relative width of barchart bars
`gp`	a grid::gpar() object for graphics parameter settings like fill or lwd
`point_size`	size of outlier points in mm
`box_width`	relative width of boxplot boxes
`annos`	name(s) of annotation(s) to show, in order (e.g. 'var_box', 'var_hist')
`funs`	function(s) to transform matrix of variable values before plotting (length must be 1 or same length as annos)
`names`	names to use for each annotation in annos
`vars`	names of variables to plot

Wrapper for vegan::betadisper()

Description

Takes the output of dist_calc function. Or use with the result of the permanova function to ensure the results correspond to exactly the same input data. Runs betadisper for all categorical variables in variables argument. See help('betadisper', package = 'vegan').

Usage

dist_bdisp(
  data,
  variables,
  method = c("centroid", "median")[[1]],
  complete_cases = TRUE,
  verbose = TRUE
)
dist_bdisp(
  data,
  variables,
  method = c("centroid", "median")[[1]],
  complete_cases = TRUE,
  verbose = TRUE
)

Arguments

`data`	psExtra output from dist_calc
`variables`	list of variables to use as group
`method`	centroid or median
`complete_cases`	drop samples with NAs in any of the variables listed
`verbose`	sends messages about progress if true

Value

psExtra containing betadisper results

Examples

library(phyloseq)
library(vegan)
data("dietswap", package = "microbiome")

# add some missings to demonstrate automated removal
sample_data(dietswap)$sex[3:6] <- NA
# create a numeric variable to show it will be skipped with a warning
dietswap <- ps_mutate(dietswap, timepoint = as.numeric(timepoint))

# straight to the betadisp
bd1 <- dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("aitchison") %>%
  dist_bdisp(variables = c("sex", "bmi_group", "timepoint")) %>%
  bdisp_get()
bd1$sex
# quick vegan plotting methods
plot(bd1$sex$model, label.cex = 0.5)
boxplot(bd1$sex$model)

# compute distance and use for both permanova and dist_bdisp
testDist <- dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("bray")

PERM <- testDist %>%
  dist_permanova(
    variables = c("sex", "bmi_group"),
    n_processes = 1, n_perms = 99
  )
str(PERM, max.level = 1)

bd <- PERM %>% dist_bdisp(variables = c("sex", "bmi_group"))
bd
library(phyloseq)
library(vegan)
data("dietswap", package = "microbiome")

# add some missings to demonstrate automated removal
sample_data(dietswap)$sex[3:6] <- NA
# create a numeric variable to show it will be skipped with a warning
dietswap <- ps_mutate(dietswap, timepoint = as.numeric(timepoint))

# straight to the betadisp
bd1 <- dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("aitchison") %>%
  dist_bdisp(variables = c("sex", "bmi_group", "timepoint")) %>%
  bdisp_get()
bd1$sex
# quick vegan plotting methods
plot(bd1$sex$model, label.cex = 0.5)
boxplot(bd1$sex$model)

# compute distance and use for both permanova and dist_bdisp
testDist <- dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("bray")

PERM <- testDist %>%
  dist_permanova(
    variables = c("sex", "bmi_group"),
    n_processes = 1, n_perms = 99
  )
str(PERM, max.level = 1)

bd <- PERM %>% dist_bdisp(variables = c("sex", "bmi_group"))
bd

Calculate distances between pairs of samples in phyloseq object

Description

Can compute various sample-sample distances using the microbiota composition of your samples:

Bray Curtis ('bray') or any other ecological distance from phyloseq::distance() / vegan::vegdist()
UniFrac distances (using the GUniFrac package)
- generalised: 'gunifrac' (optionally set weighting alpha in gunifrac alpha)
- unweighted: 'unifrac'
- weighted: 'wunifrac'
Aitchison distance (Euclidean distance after centered log ratio transform clr, see details)
Euclidean distance

Use dist_calc with psExtra output of tax_transform (or tax_agg). It returns a psExtra object containing the phyloseq and the name of the distance used in addition to the distance matrix itself. The resulting object is intended to be piped into ord_calc or dist_permanova functions. Alternatively you can directly access the distance matrix with dist_get().

Usage

dist_calc(data, dist = "bray", gunifrac_alpha = 0.5, ...)
dist_calc(data, dist = "bray", gunifrac_alpha = 0.5, ...)

Arguments

`data`	psExtra object, e.g. output from tax_transform()
`dist`	name of distance to calculate between pairs of samples
`gunifrac_alpha`	setting alpha value only relevant if gunifrac distance used
`...`	optional distance-specific named arguments passed to phyloseq::distance()

Value

psExtra object including distance matrix and name of distance used

Aitchison distance note

You should EITHER:

skip the dist_calc function and call ord_calc(method = "PCA") directly on an object with taxa transformed with tax_transform(trans = "clr")
pass an object with untransformed (or 'identity' transformed) taxa to the data argument of dist_calc() and specify dist = "aitchison".

If ordination plots with taxon loading vectors are desired, users require option 1. If the distance matrix is required for permanova, users require option 2.

Binary Jaccard distance note

Jaccard distance can be computed on abundances, but often in microbiome research it is the Binary Jaccard distance that is desired. So remember to first perform a "binary" transformation with tax_transform("binary"), OR pass an additional argument to dist_calc("jaccard", binary = TRUE)

Examples

# bray curtis distance on genera-level features
data("dietswap", package = "microbiome")
bc <- dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("bray")
bc
class(bc)

# gunifrac distance using phyloseq input
data("esophagus", package = "phyloseq")
gunifrac <- esophagus %>%
  dist_calc("gunifrac") %>%
  dist_get()
class(gunifrac)
# bray curtis distance on genera-level features
data("dietswap", package = "microbiome")
bc <- dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("bray")
bc
class(bc)

# gunifrac distance using phyloseq input
data("esophagus", package = "phyloseq")
gunifrac <- esophagus %>%
  dist_calc("gunifrac") %>%
  dist_get()
class(gunifrac)

Calculate distances between sequential samples in ps_extra/phyloseq object

Description

Calculate distances between sequential samples in ps_extra/phyloseq object

Usage

dist_calc_seq(
  data,
  dist,
  group,
  seq,
  unequal = "warn",
  start_value = NaN,
  return = "data",
  var_name = paste0(dist, "_DistFromLast")
)
dist_calc_seq(
  data,
  dist,
  group,
  seq,
  unequal = "warn",
  start_value = NaN,
  return = "data",
  var_name = paste0(dist, "_DistFromLast")
)

Arguments

`data`	psExtra object, e.g. output from tax_transform()
`dist`	name of distance to calculate between pairs of sequential samples
`group`	name of variable in phyloseq sample_data used to define groups of samples
`seq`	name of variable in phyloseq sample_data used to define order of samples within groups
`unequal`	"error" or "warn" or "ignore" if groups of samples, defined by group argument, are of unequal size
`start_value`	value returned for the first sample in each group, which has no preceding sample in the group's sequence, and so has no obvious value
`return`	format of return object: "data" returns psExtra with sorted samples and additional variable. "vector" returns only named vector of sequential distances.
`var_name`	name of variable created in psExtra if return arg = "data"

Value

psExtra object sorted and with new sequential distance variable or a named vector of that variable

Examples

library(ggplot2)
library(dplyr)
data("dietswap", package = "microbiome")

pseq <- dietswap %>%
  tax_transform("identity", rank = "Genus") %>%
  dist_calc_seq(
    dist = "aitchison", group = "subject", seq = "timepoint",
    # group sizes are unequal because some subjects are missing a timepoint
    unequal = "ignore"
  )

pseq %>%
  samdat_tbl() %>%
  dplyr::select(1, subject, timepoint, dplyr::last_col())

# ggplot heatmap - unsorted
pseq %>%
  samdat_tbl() %>%
  filter(timepoint != 1) %>%
  ggplot(aes(x = timepoint, y = subject)) +
  geom_tile(aes(fill = aitchison_DistFromLast)) +
  scale_fill_viridis_c(na.value = NA, name = "dist") +
  theme_minimal(base_line_size = NA) +
  scale_y_discrete(limits = rev(levels(samdat_tbl(pseq)$subject)))

# ComplexHeatmap plotting with clustering #
library(tidyr)
library(ComplexHeatmap)

# make data matrix
heatmat <- pseq %>%
  samdat_tbl() %>%
  filter(timepoint != 1) %>%
  pivot_wider(
    id_cols = subject,
    names_from = timepoint, names_prefix = "t",
    values_from = aitchison_DistFromLast
  ) %>%
  tibble::column_to_rownames("subject")

heatmat <- as.matrix.data.frame(heatmat)

heatmap <- Heatmap(
  name = "dist",
  matrix = heatmat, col = viridisLite::viridis(12), na_col = "white",
  cluster_columns = FALSE,
  cluster_rows = hclust(dist(heatmat), method = "ward.D"),
  width = unit(1.5, "in"), rect_gp = gpar(col = "black"),
  row_names_side = "left", row_names_gp = gpar(fontsize = 8)
)
heatmap

# comparison with subject tracking on PCA
pca <- pseq %>%
  # already sorted data
  dist_calc("aitchison") %>%
  ord_calc("PCoA") %>%
  ord_plot(alpha = 0.1, shape = "nationality", size = 2) %>%
  add_paths(
    mapping = aes(colour = subject, alpha = timepoint, size = timepoint),
    id_var = "subject", id_values = c(
      "eve", "hsf", # low variation
      "vem", # medium
      "ufm", # high variation
      "pku" # starts high
    )
  ) +
  scale_alpha_continuous(range = c(0.3, 1), breaks = c(2, 4, 6)) +
  scale_size_continuous(range = c(1, 2), breaks = c(2, 4, 6))

heatmap
pca
library(ggplot2)
library(dplyr)
data("dietswap", package = "microbiome")

pseq <- dietswap %>%
  tax_transform("identity", rank = "Genus") %>%
  dist_calc_seq(
    dist = "aitchison", group = "subject", seq = "timepoint",
    # group sizes are unequal because some subjects are missing a timepoint
    unequal = "ignore"
  )

pseq %>%
  samdat_tbl() %>%
  dplyr::select(1, subject, timepoint, dplyr::last_col())

# ggplot heatmap - unsorted
pseq %>%
  samdat_tbl() %>%
  filter(timepoint != 1) %>%
  ggplot(aes(x = timepoint, y = subject)) +
  geom_tile(aes(fill = aitchison_DistFromLast)) +
  scale_fill_viridis_c(na.value = NA, name = "dist") +
  theme_minimal(base_line_size = NA) +
  scale_y_discrete(limits = rev(levels(samdat_tbl(pseq)$subject)))

# ComplexHeatmap plotting with clustering #
library(tidyr)
library(ComplexHeatmap)

# make data matrix
heatmat <- pseq %>%
  samdat_tbl() %>%
  filter(timepoint != 1) %>%
  pivot_wider(
    id_cols = subject,
    names_from = timepoint, names_prefix = "t",
    values_from = aitchison_DistFromLast
  ) %>%
  tibble::column_to_rownames("subject")

heatmat <- as.matrix.data.frame(heatmat)

heatmap <- Heatmap(
  name = "dist",
  matrix = heatmat, col = viridisLite::viridis(12), na_col = "white",
  cluster_columns = FALSE,
  cluster_rows = hclust(dist(heatmat), method = "ward.D"),
  width = unit(1.5, "in"), rect_gp = gpar(col = "black"),
  row_names_side = "left", row_names_gp = gpar(fontsize = 8)
)
heatmap

# comparison with subject tracking on PCA
pca <- pseq %>%
  # already sorted data
  dist_calc("aitchison") %>%
  ord_calc("PCoA") %>%
  ord_plot(alpha = 0.1, shape = "nationality", size = 2) %>%
  add_paths(
    mapping = aes(colour = subject, alpha = timepoint, size = timepoint),
    id_var = "subject", id_values = c(
      "eve", "hsf", # low variation
      "vem", # medium
      "ufm", # high variation
      "pku" # starts high
    )
  ) +
  scale_alpha_continuous(range = c(0.3, 1), breaks = c(2, 4, 6)) +
  scale_size_continuous(range = c(1, 2), breaks = c(2, 4, 6))

heatmap
pca

Calculate PERMANOVA after dist_calc()

Description

dist_permanova runs a Permutational Multivariate ANOVA (aka Non-parametric MANOVA). This is a way to test for the statistical significance of (independent) associations between variables in your phyloseq::sample_data(), and a microbiota distance matrix you have already calculated with dist_calc().

This function is a wrapper around vegan's adonis2() function. See ?vegan::adonis2() for more insight.

You can also read this excellent book chapter on PERMANOVA by Marti Anderson: doi:10.1002/9781118445112.stat07841

Or this NPMANOVA page on GUSTA ME: https://sites.google.com/site/mb3gustame/hypothesis-tests/manova/npmanova

Usage

dist_permanova(
  data,
  variables = NULL,
  interactions = NULL,
  complete_cases = TRUE,
  n_processes = 1,
  n_perms = 999,
  seed = NULL,
  by = "margin",
  verbose = TRUE,
  ...
)
dist_permanova(
  data,
  variables = NULL,
  interactions = NULL,
  complete_cases = TRUE,
  n_processes = 1,
  n_perms = 999,
  seed = NULL,
  by = "margin",
  verbose = TRUE,
  ...
)

Arguments

`data`	psExtra output from dist_calc()
`variables`	character vector of variables to include in model or character representation of the right-hand side of a formula, e.g "varA + varB + varA:varB"
`interactions`	optional argument to define any interactions between variables, written in the style of e.g. "var_a * var_b"
`complete_cases`	if TRUE, drops observations if they contain missing values (otherwise stops if missings are detected)
`n_processes`	how many parallel processes to use? (on windows this uses parallel::makePSOCKcluster())
`n_perms`	how many permutations? e.g. 9999. Less is faster but more is better!
`seed`	set a random number generator seed to ensure you get the same results each run
`by`	passed to vegan::adonis2() `by` argument: what type of sums of squares to calculate? "margin" or "terms"
`verbose`	sends messages about progress if TRUE
`...`	additional arguments are passed directly to vegan::adonis2() (e.g. strata, add, sqrt.dist etc.)

Details

The variables argument will be collapsed into one string (if length > 1) by pasting together, separated by "+". Any interaction terms described in the interactions argument will be pasted onto the end of the pasted variables argument. Alternatively, you can supply the complete right hand side of the formula yourself e.g variables = "varA + varB + varC\*varD"

Watch out, if any of your variable names contain characters that would normally separate variables in a formula then you should rename the offending variable (e.g. avoid any of "+" "\*" "|" or ":" ) otherwise permanova will split that variable into pieces.

Value

psExtra list containing permanova results and (filtered) input objects

Examples

data("dietswap", package = "microbiome")

# add some missings to demonstrate automated removal
phyloseq::sample_data(dietswap)$sex[3:6] <- NA

# compute distance
testDist <- dietswap %>%
  tax_agg("Genus") %>%
  tax_transform("identity") %>%
  dist_calc("bray")

PERM <- testDist %>%
  dist_permanova(
    seed = 1,
    variables = c("sex", "bmi_group"),
    n_processes = 1,
    n_perms = 99 # only 99 perms used in examples for speed (use 9999+!)
  )
PERM
str(PERM, max.level = 1)

# try permanova with interaction terms
PERM2 <- testDist %>%
  dist_permanova(
    seed = 1,
    variables = "nationality + sex * bmi_group",
    n_processes = 1, n_perms = 99
  )
perm_get(PERM2)

# specify the same model in alternative way
PERM3 <- testDist %>%
  dist_permanova(
    seed = 1,
    variables = c("nationality", "sex", "bmi_group"),
    interactions = "sex * bmi_group",
    n_processes = 1, n_perms = 99
  )
perm_get(PERM3)

identical(PERM3, PERM2) # TRUE

# take same distance matrix used for the permanova and plot an ordination
PERM2 %>%
  ord_calc(method = "PCoA") %>%
  ord_plot(color = "bmi_group")
# this trick ensures any samples dropped from the permanova
# for having missing values in the covariates are NOT included
# in the corresponding ordination plot
data("dietswap", package = "microbiome")

# add some missings to demonstrate automated removal
phyloseq::sample_data(dietswap)$sex[3:6] <- NA

# compute distance
testDist <- dietswap %>%
  tax_agg("Genus") %>%
  tax_transform("identity") %>%
  dist_calc("bray")

PERM <- testDist %>%
  dist_permanova(
    seed = 1,
    variables = c("sex", "bmi_group"),
    n_processes = 1,
    n_perms = 99 # only 99 perms used in examples for speed (use 9999+!)
  )
PERM
str(PERM, max.level = 1)

# try permanova with interaction terms
PERM2 <- testDist %>%
  dist_permanova(
    seed = 1,
    variables = "nationality + sex * bmi_group",
    n_processes = 1, n_perms = 99
  )
perm_get(PERM2)

# specify the same model in alternative way
PERM3 <- testDist %>%
  dist_permanova(
    seed = 1,
    variables = c("nationality", "sex", "bmi_group"),
    interactions = "sex * bmi_group",
    n_processes = 1, n_perms = 99
  )
perm_get(PERM3)

identical(PERM3, PERM2) # TRUE

# take same distance matrix used for the permanova and plot an ordination
PERM2 %>%
  ord_calc(method = "PCoA") %>%
  ord_plot(color = "bmi_group")
# this trick ensures any samples dropped from the permanova
# for having missing values in the covariates are NOT included
# in the corresponding ordination plot

Colour palettes suitable for 20+ categories

Description

Available palettes (max colors) are "brewerPlus" (41), "kelly" (20) and "greenArmytage" (25).

"brewerPlus" is an arbitrary expansion of the "Paired" and "Dark2" colorbrewer palettes. The philosophy behind this expansion was to ensure that similar colours are far apart, and the earlier colours are attractive.
"kelly" is based on the 22-colour palette developed by Kenneth Kelly but with white and black starting colours removed. This palette is ordered such that the first colours are most distinct.
"greenArmytage" is based on a 26-colour palette proposed by Paul Green-Armytage, with black removed. This palette is not ordered by maximum contrast.

Usage

distinct_palette(n = NA, pal = "brewerPlus", add = "lightgrey")
distinct_palette(n = NA, pal = "brewerPlus", add = "lightgrey")

Arguments

`n`	number of colours to return
`pal`	palette name, one of "brewerPlus", "kelly", "greenArmytage"
`add`	colour to append to end of palette, as colour n+1, lightgrey by default for the use as "other" taxa in comp_barplot, or NA for no additional colour.

Details

Hex color codes for 'kelly' and 'greenArmytage' palettes are copied and slightly modified from the Polychrome R package: i.e. Polychrome::kelly.colors() and Polychrome::green.armytage.colors()

Please consider also citing Coombes 2019 doi:10.18637/jss.v090.c01 if you use either of these palettes.

See the Polychrome reference manual for more information: https://CRAN.R-project.org/package=Polychrome

Value

vector of colours

Examples

brewerPlus <- distinct_palette()
scales::show_col(brewerPlus)

kelly <- distinct_palette(pal = "kelly")
scales::show_col(kelly)

greenArmytage <- distinct_palette(pal = "greenArmytage")
scales::show_col(greenArmytage)
brewerPlus <- distinct_palette()
scales::show_col(brewerPlus)

kelly <- distinct_palette(pal = "kelly")
scales::show_col(kelly)

greenArmytage <- distinct_palette(pal = "greenArmytage")
scales::show_col(greenArmytage)

set options for drawing gridlines on heatmaps

Description

set options for drawing gridlines on heatmaps

Usage

heat_grid(
  col = "white",
  alpha = 1,
  lty = 1,
  lwd = 0.5,
  lex = 1,
  lineend = "round",
  linejoin = "round"
)
heat_grid(
  col = "white",
  alpha = 1,
  lty = 1,
  lwd = 0.5,
  lex = 1,
  lineend = "round",
  linejoin = "round"
)

Arguments

`col`	Colour for lines and borders.
`alpha`	Alpha channel for transparency
`lty`	Line type
`lwd`	Line width
`lex`	Multiplier applied to line width
`lineend`	Line end style (round, butt, square)
`linejoin`	Line join style (round, mitre, bevel)

Aesthetic settings for drawing numbers on heatmap tiles

Description

Works with comp_heatmap() and cor_heatmap(). See the help for those functions.

Usage

heat_numbers(
  decimals = 0,
  fontsize = 7,
  col = "darkgrey",
  fontface = "bold",
  fmt = NULL,
  ...
)
heat_numbers(
  decimals = 0,
  fontsize = 7,
  col = "darkgrey",
  fontface = "bold",
  fmt = NULL,
  ...
)

Arguments

`decimals`	number of decimal places to print
`fontsize`	fontsize specification,
`col`	colour of font
`fontface`	plain, bold, italic
`fmt`	NULL or number print format, see ?sprintf, overrides decimals arg if set
`...`	passed to grid::gpar() for grid.text

Value

list

Easy palettes for ComplexHeatmap

Description

Pass a named colorspace hcl palette to circlize::colorRamp2.

If you do not specify a range this function returns a function and the heatmap color palette will use the range of the data automatically
If you do specify a range, this returns a colour palette with that range

Usage

heat_palette(
  palette = ifelse(sym, "Blue-Red 3", "Rocket"),
  breaks = "auto",
  range = NA,
  sym = FALSE,
  rev = FALSE
)
heat_palette(
  palette = ifelse(sym, "Blue-Red 3", "Rocket"),
  breaks = "auto",
  range = NA,
  sym = FALSE,
  rev = FALSE
)

Arguments

`palette`	named palette from colorspace::hcl_palettes() diverging/sequential or a vector of colour names/hexcodes
`breaks`	number of breaks, "auto" is 11 for a named palette, or uses palette length
`range`	NA to return palette generating function that takes range or numeric vector indicating the range, to return a palette
`sym`	makes palette range symmetrical around 0 if TRUE
`rev`	reverse the palette?

Value

circlize::colorRamp2 palette if range = NA, or function returning a palette when given a range

IBD study data in phyloseq object.

Description

A phyloseq object with an OTU table and sample data from an IBD microbiome study. Originally released as ibd example data in the corncob package.

Usage

ibd
ibd

Format

A phyloseq-class experiment-level object with an OTU table and sample data.

References

Papa, E., Docktor, M., Smillie, C., Weber, S., Preheim, S. P., Gevers, D., Giannoukos, G., Ciulla, D., Tabbaa, D., Ingram, J., Schauer, D. B., Ward, D. V., Korzenik, J. R., Xavier, R. J., Bousvaros, A., Alm, E. J. & Schauer, D. B. (2012). Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PloS One, 7(6), e39242. <doi.org/10.1371/journal.pone.0039242>.

Duvallet, C., Gibbons, S., Gurry, T., Irizarry, R., & Alm, E. (2017). MicrobiomeHD: the human gut microbiome in health and disease [Data set]. Zenodo. <doi.org/10.5281/zenodo.1146764>.

microViz: microbiome data analysis and visualization

Description

microViz provides functions for statistics and visualization of microbiome sequencing data. microViz wraps, extends and complements popular microbial ecology packages like phyloseq, vegan, and microbiome.

Check out the website for tutorials and illustrated help pages.

https://david-barnett.github.io/microViz/

Author(s)

David Barnett (ORCID) (GitHub)

Ordinate samples (arrange by similarity in multiple dimensions)

Description

Used before plotting with ord_plot() or explorating interactively with ord_explore(). Use method = "auto" to automatically pick an appropriate method from:

"PCA" (Principal Components Analysis) combines taxa abundances into new dimensions. The first axes display the greatest variation in your microbial data.
"RDA" (Redundancy Analysis) is constrained PCA, roughly speaking. It finds variation in your data that can be explained both by the constraints variables, and the microbial data.
"PCoA" (Principal Coordinates Analysis) finds a coordinate system that best preserves the original distances between samples.
"CAP" (Constrained Analysis of Principal Coordinates) is also known as distance-based Redundancy Analysis.

Alternatively to leaving method = "auto", you can explicitly specify any of the above methods, or choose one of the following:

"CCA" (Canonical Correspondence Analysis) - NOT canonical correlation analysis!
"NMDS" (Non-metric Multidimensional Scaling)

You are strongly recommended to check out this useful website for introductory explanations of these methods the "GUide to STatistical Analysis in Microbial Ecology": https://sites.google.com/site/mb3gustame/

Usage

ord_calc(
  data,
  method = "auto",
  constraints = NULL,
  conditions = NULL,
  scale_cc = TRUE,
  verbose = TRUE,
  ...
)
ord_calc(
  data,
  method = "auto",
  constraints = NULL,
  conditions = NULL,
  scale_cc = TRUE,
  verbose = TRUE,
  ...
)

Arguments

`data`	psExtra object: output from dist_calc(), or tax_transform() if no distance calculation required for method e.g. for RDA
`method`	which ordination method to use? "auto" means automatically determined from psExtra and other args. If you really know what you want: manually set one of 'PCoA', 'PCA', 'CCA', 'CAP' or 'RDA'
`constraints`	(a vector of) valid sample_data name(s) to constrain analyses, or leave as NULL for unconstrained ordination. Non-NULL values are compatible with method = "auto"/"RDA"/"CAP"
`conditions`	(a vector of) valid sample_data name(s) to partial these out of analyses with Condition(), or leave as NULL
`scale_cc`	If TRUE (default) ensures any constraints and conditions variables are scaled before use, to ensure their effects are comparable. If set to FALSE you must ensure you have already set the variables on a similar scale yourself! If there are no constraints or conditions, scale_cc does nothing.
`verbose`	If TRUE or "max", show any warnings and messages about constraint and conditions scaling and missings etc. FALSE suppresses warnings!
`...`	optional arguments passed on to phyloseq::ordinate()

Details

Extends functionality of phyloseq::ordinate(). Results can be used directly in ord_plot(). You can extract the ordination object for other analyses with ord_get()

Value

psExtra object

Examples

library(phyloseq)
library(vegan)
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use as constraints
dietswap <- ps_mutate(
  dietswap,
  female = dplyr::if_else(sex == "female", true = 1, false = 0),
  weight = dplyr::recode(bmi_group, obese = 3, overweight = 2, lean = 1)
)

# add a couple of missing values to demo automated dropping of observations with missings
sample_data(dietswap)$female[c(3, 4)] <- NA

# compute ordination
test <- dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("bray") %>%
  ord_calc(constraints = c("weight", "female"))

# familiarise yourself with the structure of the returned psExtra object
test
str(test, max.level = 1)

# compute RDA with centre-log-ratio transformed taxa
test2 <- dietswap %>%
  tax_agg("Genus") %>%
  tax_transform("clr") %>%
  ord_calc(constraints = c("weight", "female"))

# plot with vegan package graphics to show it returns a standard ordination object
ord_get(test2) %>% vegan::ordiplot()

# This is equivalent to CAP with "aitchison" distance
ord_plot(test2, plot_taxa = 8:1)
# but the latter (below) doesn't allow plotting taxa loadings with ord_plot
dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("aitchison") %>%
  ord_calc(constraints = c("weight", "female")) %>%
  ord_plot()
library(phyloseq)
library(vegan)
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use as constraints
dietswap <- ps_mutate(
  dietswap,
  female = dplyr::if_else(sex == "female", true = 1, false = 0),
  weight = dplyr::recode(bmi_group, obese = 3, overweight = 2, lean = 1)
)

# add a couple of missing values to demo automated dropping of observations with missings
sample_data(dietswap)$female[c(3, 4)] <- NA

# compute ordination
test <- dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("bray") %>%
  ord_calc(constraints = c("weight", "female"))

# familiarise yourself with the structure of the returned psExtra object
test
str(test, max.level = 1)

# compute RDA with centre-log-ratio transformed taxa
test2 <- dietswap %>%
  tax_agg("Genus") %>%
  tax_transform("clr") %>%
  ord_calc(constraints = c("weight", "female"))

# plot with vegan package graphics to show it returns a standard ordination object
ord_get(test2) %>% vegan::ordiplot()

# This is equivalent to CAP with "aitchison" distance
ord_plot(test2, plot_taxa = 8:1)
# but the latter (below) doesn't allow plotting taxa loadings with ord_plot
dietswap %>%
  tax_agg("Genus") %>%
  dist_calc("aitchison") %>%
  ord_calc(constraints = c("weight", "female")) %>%
  ord_plot()

Interactively explore microbial compositions of ordinated samples

Description

A Shiny app used to create and explore an interactive version of ord_plot(). You can select samples on an ordination plot to view their composition with stacked barplots.

The ord_explore() data argument takes either:

the output of ord_calc() (i.e. a psExtra with an ordination)
a plain phyloseq object: ord_explore() will help you build an ordination

Once the app is running (in your browser), you can:

Create/edit the ordination if required
- look at the R console error messages if your chosen options don't build
Style the ordination plot (e.g. choose dimensions; set colour and size; ...)
- Taxa loading arrows can be added only to PCA, RDA and CCA plots
- Convex hulls or ellipses can only be drawn if Colour is set to a variable
- To track individuals over time with the path plotter, your data MUST already be sorted by time (e.g. with ps_arrange)!
Click on or use the lasso tool to select 1 or more samples to view their compositions
- By default samples can be selected individually
- Set the "Select" option to another variable to select by level of that variable
Style the taxonomic compositions barplot
- The samples are ordered using the seriate_method argument and the same transformation and distance as used in the ordination plot
- The app may lag if you select 100s of samples and ungroup the "Other" category
- To avoid this lag: either reduce the number of taxa or samples, or deselect "Interactive" barplot
Stop the app by clicking the red stop button in the R console
- Closing the web browser window doesn't stop the app, (you can find the app again at the local http address shown in the R console)
- Don't forget to copy the ordination plot code before you close the app

See the Details section for some known limitations of the app. Please report any other app problems on the microViz GitHub issues page.

Usage

ord_explore(
  data,
  sample_id = NULL,
  seriate_method = "OLO_ward",
  app_options = list(launch.browser = TRUE),
  plot_widths = c(7, 9),
  modal_fade = TRUE,
  notification_durations = list(2, 20),
  counts_warn = TRUE,
  ...
)
ord_explore(
  data,
  sample_id = NULL,
  seriate_method = "OLO_ward",
  app_options = list(launch.browser = TRUE),
  plot_widths = c(7, 9),
  modal_fade = TRUE,
  notification_durations = list(2, 20),
  counts_warn = TRUE,
  ...
)

Arguments

`data`	a phyloseq, or the psExtra output of ord_calc
`sample_id`	name of sample ID variable to use as default for selecting samples
`seriate_method`	seriation method to order phyloseq samples by similarity
`app_options`	passed to shinyApp() options argument
`plot_widths`	widths of plots in inches, including any legends (first number is ordination, second is composition barplot)
`modal_fade`	should the popover menus (modals) have a fade animation?
`notification_durations`	length 2 list giving duration in seconds of short and long notifications or NULL for notifications that do not disappear automatically
`counts_warn`	should a warning be issued if accounts are unavailable?
`...`	additional arguments passed to ord_plot

Details

Limitations:

If a "Select:" grouping variable is NA for some samples, then that grouping variable cannot be used to select those samples
"Shape:" can only be mapped to variables with maximum 5 distinct levels, not including NAs. NAs in the shape variable are shown as hollow circles.

On some web browsers, e.g. older versions of Firefox, the numeric inputs' buttons are sometimes hard to click. As a workaround, click the box and type a number or use the arrow keys. This problem occurs in all Shiny apps, not just microViz.

Value

nothing, opens default browser

Examples

# example code only runs in interactive R session
if (interactive()) {
  library(phyloseq)
  library(dplyr)

  # example of quickstart approach with interactive ordination calculation #
  microViz::ibd %>%
    # filtering makes subsequent calculations faster
    tax_filter(min_prevalence = 2) %>%
    tax_fix() %>%
    ord_explore()

  # simple example with precalculated ordination #
  data("enterotype")
  taxa_names(enterotype)[1] <- "unclassified" # replaces the "-1" taxon name
  ps <- tax_fix(enterotype) # remove NA taxa
  ord1 <- ps %>%
    tax_transform("identity", rank = "Genus") %>%
    dist_calc("bray") %>%
    ord_calc(method = "PCoA")

  ord_explore(data = ord1, auto_caption = 6)

  # constrained ordination example #
  data("dietswap", package = "microbiome")

  # create a couple of numerical variables to use as constraints
  dietswap <- dietswap %>%
    ps_mutate(
      weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
      female = if_else(sex == "female", true = 1, false = 0)
    ) %>%
    tax_agg("Genus")

  constrained_aitchison_rda <- dietswap %>%
    tax_transform("clr") %>%
    ord_calc(constraints = c("weight", "female"))

  # label style arguments can be passed to ord_explore
  constrained_aitchison_rda %>%
    ord_explore(
      tax_lab_style = list(size = 3),
      constraint_lab_style = list(size = 4), auto_caption = 6
    )
  # Try changing the point colour to bmi_group or similar
  # Style points interactively!
  # (setting colour/shape/etc as arguments doesn't work)

  # dietswap is actually a longitudinal dataset, with multiple samples per
  # subject. If we arrange by timepoint first (!!!), we can use the "paths"
  # additional plot layer from the ord_explore "Add:" menu to track
  # individual subjects over time.
  dietswap %>%
    ps_arrange(timepoint) %>%
    tax_fix() %>%
    ord_explore()


  # Another dataset, where "size" variable drives gradient on PC1
  # Try setting size and/or alpha to correspond to "size"!
  # Then edit the ordination to use "size" as a condition, see what happens
  # hmp2 <- microbiomeutilities::hmp2
  hmp2 %>%
    tax_fix() %>%
    tax_transform(rank = "Genus", "identity") %>%
    dist_calc("aitchison") %>%
    ord_calc() %>%
    ord_explore()

  # another dataset
  data("soilrep", package = "phyloseq")
  # test auto creation of SAMPLE var
  ps <- soilrep %>% ps_select(-Sample)
  # The barplot is actually quite useless with the 16000+ anonymous OTUs
  # in this dataset, but the 1000s of unmerged "Other" categories do render
  phyloseq_validate(ps) %>%
    tax_fix() %>%
    dist_calc("aitchison") %>%
    ord_calc() %>%
    ord_explore()
}
# example code only runs in interactive R session
if (interactive()) {
  library(phyloseq)
  library(dplyr)

  # example of quickstart approach with interactive ordination calculation #
  microViz::ibd %>%
    # filtering makes subsequent calculations faster
    tax_filter(min_prevalence = 2) %>%
    tax_fix() %>%
    ord_explore()

  # simple example with precalculated ordination #
  data("enterotype")
  taxa_names(enterotype)[1] <- "unclassified" # replaces the "-1" taxon name
  ps <- tax_fix(enterotype) # remove NA taxa
  ord1 <- ps %>%
    tax_transform("identity", rank = "Genus") %>%
    dist_calc("bray") %>%
    ord_calc(method = "PCoA")

  ord_explore(data = ord1, auto_caption = 6)

  # constrained ordination example #
  data("dietswap", package = "microbiome")

  # create a couple of numerical variables to use as constraints
  dietswap <- dietswap %>%
    ps_mutate(
      weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
      female = if_else(sex == "female", true = 1, false = 0)
    ) %>%
    tax_agg("Genus")

  constrained_aitchison_rda <- dietswap %>%
    tax_transform("clr") %>%
    ord_calc(constraints = c("weight", "female"))

  # label style arguments can be passed to ord_explore
  constrained_aitchison_rda %>%
    ord_explore(
      tax_lab_style = list(size = 3),
      constraint_lab_style = list(size = 4), auto_caption = 6
    )
  # Try changing the point colour to bmi_group or similar
  # Style points interactively!
  # (setting colour/shape/etc as arguments doesn't work)

  # dietswap is actually a longitudinal dataset, with multiple samples per
  # subject. If we arrange by timepoint first (!!!), we can use the "paths"
  # additional plot layer from the ord_explore "Add:" menu to track
  # individual subjects over time.
  dietswap %>%
    ps_arrange(timepoint) %>%
    tax_fix() %>%
    ord_explore()


  # Another dataset, where "size" variable drives gradient on PC1
  # Try setting size and/or alpha to correspond to "size"!
  # Then edit the ordination to use "size" as a condition, see what happens
  # hmp2 <- microbiomeutilities::hmp2
  hmp2 %>%
    tax_fix() %>%
    tax_transform(rank = "Genus", "identity") %>%
    dist_calc("aitchison") %>%
    ord_calc() %>%
    ord_explore()

  # another dataset
  data("soilrep", package = "phyloseq")
  # test auto creation of SAMPLE var
  ps <- soilrep %>% ps_select(-Sample)
  # The barplot is actually quite useless with the 16000+ anonymous OTUs
  # in this dataset, but the 1000s of unmerged "Other" categories do render
  phyloseq_validate(ps) %>%
    tax_fix() %>%
    dist_calc("aitchison") %>%
    ord_calc() %>%
    ord_explore()
}

Customisable ggplot2 ordination plot

Description

Draw ordination plot. Utilises psExtra object produced by of ord_calc.

For an extensive tutorial see the ordination vignette.
For interpretation see the the relevant pages on PCA, PCoA, RDA, or CCA on the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME) website: https://sites.google.com/site/mb3gustame/

Usage

ord_plot(
  data,
  axes = 1:2,
  plot_taxa = FALSE,
  tax_vec_length = NA,
  tax_vec_style_all = vec_tax_all(),
  tax_vec_style_sel = vec_tax_sel(),
  tax_lab_length = tax_vec_length * 1.1,
  tax_lab_style = list(),
  taxon_renamer = function(x) identity(x),
  constraint_vec_length = NA,
  constraint_vec_style = vec_constraint(),
  constraint_lab_length = constraint_vec_length * 1.1,
  constraint_lab_style = list(),
  var_renamer = function(x) identity(x),
  plot_samples = TRUE,
  scaling = 2,
  auto_caption = 8,
  center = FALSE,
  clip = "off",
  expand = !center,
  interactive = FALSE,
  ...
)
ord_plot(
  data,
  axes = 1:2,
  plot_taxa = FALSE,
  tax_vec_length = NA,
  tax_vec_style_all = vec_tax_all(),
  tax_vec_style_sel = vec_tax_sel(),
  tax_lab_length = tax_vec_length * 1.1,
  tax_lab_style = list(),
  taxon_renamer = function(x) identity(x),
  constraint_vec_length = NA,
  constraint_vec_style = vec_constraint(),
  constraint_lab_length = constraint_vec_length * 1.1,
  constraint_lab_style = list(),
  var_renamer = function(x) identity(x),
  plot_samples = TRUE,
  scaling = 2,
  auto_caption = 8,
  center = FALSE,
  clip = "off",
  expand = !center,
  interactive = FALSE,
  ...
)

Arguments

`data`	psExtra object with ordination attached, i.e. output from ord_calc
`axes`	which axes to plot: numerical vector of length 2, e.g. 1:2 or c(3,5)
`plot_taxa`	if ord_calc method was "PCA/RDA" draw the taxa loading vectors (see details)
`tax_vec_length`	taxon arrow vector scale multiplier. NA = auto-scaling, or provide a numeric multiplier yourself.
`tax_vec_style_all`	list of named aesthetic attributes for all (background) taxon vectors
`tax_vec_style_sel`	list of named aesthetic attributes for taxon vectors for the taxa selected by plot_taxa
`tax_lab_length`	scale multiplier for label distance/position for any selected taxa
`tax_lab_style`	list of style options for the taxon labels, see tax_lab_style() function.
`taxon_renamer`	function that takes any plotted taxon names and returns modified names for labels
`constraint_vec_length`	constraint arrow vector scale multiplier. NA = auto-scaling, or provide a numeric multiplier yourself.
`constraint_vec_style`	list of aesthetics/arguments (colour, alpha etc) for the constraint vectors
`constraint_lab_length`	label distance/position for any constraints (relative to default position which is proportional to correlations with each axis)
`constraint_lab_style`	list of aesthetics/arguments (colour, size etc) for the constraint labels
`var_renamer`	function to rename constraining variables for plotting their labels
`plot_samples`	if TRUE, plot sample points with geom_point
`scaling`	Type 2, or type 1 scaling. For more info, see https://sites.google.com/site/mb3gustame/constrained-analyses/redundancy-analysis. Either "species" or "site" scores are scaled by (proportional) eigenvalues, and the other set of scores is left unscaled (from ?vegan::scores.cca)
`auto_caption`	size of caption with info about the ordination, NA for none
`center`	expand plot limits to center around origin point (0,0)
`clip`	clipping of labels that extend outside plot limits?
`expand`	expand plot limits a little bit further than data range?
`interactive`	creates plot suitable for use with ggiraph (used in ord_explore)
`...`	pass aesthetics arguments for sample points, drawn with geom_point using aes_string

Details

How to specify the plot_taxa argument (when using PCA, CCA or RDA):

FALSE –> plot no taxa vectors or labels
integer vector e.g. 1:3 –> plot labels for top 3 taxa (by longest line length)
single numeric value e.g. 0.75 –> plot labels for taxa with line length > 0.75
character vector e.g. c('g__Bacteroides', 'g__Veillonella') –> plot labels for the exactly named taxa

Value

ggplot

Examples

library(ggplot2)
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use as constraints or conditions
dietswap <- dietswap %>%
  ps_mutate(
    weight = dplyr::recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = dplyr::if_else(sex == "female", true = 1, false = 0)
  )

# unconstrained PCA ordination
unconstrained_aitchison_pca <- dietswap %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc() # method = "auto" --> picks PCA as no constraints or distances

unconstrained_aitchison_pca %>%
  ord_plot(colour = "bmi_group", plot_taxa = 1:5) +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group))

# you can generate an interactive version of the plot by specifying
# interactive = TRUE, and passing a variable name to another argument
# called `data_id` which is required for interactive point selection
interactive_plot <- unconstrained_aitchison_pca %>%
  ord_plot(
    colour = "bmi_group", plot_taxa = 1:5,
    interactive = TRUE, data_id = "sample"
  )

# to start the html viewer, and allow selecting points, we must use a
# ggiraph function called girafe and set some options and css
ggiraph::girafe(
  ggobj = interactive_plot,
  options = list(
    ggiraph::opts_selection(
      css = ggiraph::girafe_css(
        css = "fill:orange;stroke:black;",
        point = "stroke-width:1.5px"
      ),
      type = "multiple", # this activates lasso selection (click top-right)
      only_shiny = FALSE # allows interactive plot outside of shiny app
    )
  )
)


# remove effect of weight with conditions arg
# scaling weight with scale_cc is not necessary as only 1 condition is used
dietswap %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(conditions = "weight", scale_cc = FALSE) %>%
  ord_plot(colour = "bmi_group") +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group))

# alternatively, constrain variation on weight and female
constrained_aitchison_rda <- dietswap %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(constraints = c("weight", "female")) # constraints --> RDA

constrained_aitchison_rda %>%
  ord_plot(colour = "bmi_group", constraint_vec_length = 2) +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group))

# ggplot allows additional customisation of the resulting plot
p <- constrained_aitchison_rda %>%
  ord_plot(colour = "bmi_group", plot_taxa = 1:3) +
  lims(x = c(-5, 6), y = c(-5, 5)) +
  scale_colour_brewer(palette = "Set1")

p + stat_ellipse(aes(linetype = bmi_group, colour = bmi_group))
p + stat_density2d(aes(colour = bmi_group))

# you can rename the taxa on the labels with any function that
# takes and modifies a character vector
constrained_aitchison_rda %>%
  ord_plot(
    colour = "bmi_group",
    plot_taxa = 1:3,
    taxon_renamer = function(x) stringr::str_extract(x, "^.")
  ) +
  lims(x = c(-5, 6), y = c(-5, 5)) +
  scale_colour_brewer(palette = "Set1")

# You can plot PCoA and constrained PCoA plots too.
# You don't typically need/want to use transformed taxa variables for PCoA
# But it is good practice to call tax_transform("identity") so that
# the automatic caption can record that no transformation was applied
dietswap %>%
  tax_agg("Genus") %>%
  tax_transform("identity") %>%
  # so caption can record (lack of) transform
  dist_calc("bray") %>%
  # bray curtis
  ord_calc() %>%
  # guesses you want an unconstrained PCoA
  ord_plot(colour = "bmi_group")

# it is possible to facet these plots
# (although I'm not sure it makes sense to)
# but only unconstrained ordination plots and with plot_taxa = FALSE
unconstrained_aitchison_pca %>%
  ord_plot(color = "sex", auto_caption = NA) +
  facet_wrap("sex") +
  theme(line = element_blank()) +
  stat_density2d(aes(colour = sex)) +
  guides(colour = "none")

unconstrained_aitchison_pca %>%
  ord_plot(color = "bmi_group", plot_samples = FALSE, auto_caption = NA) +
  facet_wrap("sex") +
  theme(line = element_blank(), axis.text = element_blank()) +
  stat_density2d_filled(show.legend = FALSE) +
  geom_point(size = 1, shape = 21, colour = "black", fill = "white")
library(ggplot2)
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use as constraints or conditions
dietswap <- dietswap %>%
  ps_mutate(
    weight = dplyr::recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = dplyr::if_else(sex == "female", true = 1, false = 0)
  )

# unconstrained PCA ordination
unconstrained_aitchison_pca <- dietswap %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc() # method = "auto" --> picks PCA as no constraints or distances

unconstrained_aitchison_pca %>%
  ord_plot(colour = "bmi_group", plot_taxa = 1:5) +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group))

# you can generate an interactive version of the plot by specifying
# interactive = TRUE, and passing a variable name to another argument
# called `data_id` which is required for interactive point selection
interactive_plot <- unconstrained_aitchison_pca %>%
  ord_plot(
    colour = "bmi_group", plot_taxa = 1:5,
    interactive = TRUE, data_id = "sample"
  )

# to start the html viewer, and allow selecting points, we must use a
# ggiraph function called girafe and set some options and css
ggiraph::girafe(
  ggobj = interactive_plot,
  options = list(
    ggiraph::opts_selection(
      css = ggiraph::girafe_css(
        css = "fill:orange;stroke:black;",
        point = "stroke-width:1.5px"
      ),
      type = "multiple", # this activates lasso selection (click top-right)
      only_shiny = FALSE # allows interactive plot outside of shiny app
    )
  )
)


# remove effect of weight with conditions arg
# scaling weight with scale_cc is not necessary as only 1 condition is used
dietswap %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(conditions = "weight", scale_cc = FALSE) %>%
  ord_plot(colour = "bmi_group") +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group))

# alternatively, constrain variation on weight and female
constrained_aitchison_rda <- dietswap %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(constraints = c("weight", "female")) # constraints --> RDA

constrained_aitchison_rda %>%
  ord_plot(colour = "bmi_group", constraint_vec_length = 2) +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group))

# ggplot allows additional customisation of the resulting plot
p <- constrained_aitchison_rda %>%
  ord_plot(colour = "bmi_group", plot_taxa = 1:3) +
  lims(x = c(-5, 6), y = c(-5, 5)) +
  scale_colour_brewer(palette = "Set1")

p + stat_ellipse(aes(linetype = bmi_group, colour = bmi_group))
p + stat_density2d(aes(colour = bmi_group))

# you can rename the taxa on the labels with any function that
# takes and modifies a character vector
constrained_aitchison_rda %>%
  ord_plot(
    colour = "bmi_group",
    plot_taxa = 1:3,
    taxon_renamer = function(x) stringr::str_extract(x, "^.")
  ) +
  lims(x = c(-5, 6), y = c(-5, 5)) +
  scale_colour_brewer(palette = "Set1")

# You can plot PCoA and constrained PCoA plots too.
# You don't typically need/want to use transformed taxa variables for PCoA
# But it is good practice to call tax_transform("identity") so that
# the automatic caption can record that no transformation was applied
dietswap %>%
  tax_agg("Genus") %>%
  tax_transform("identity") %>%
  # so caption can record (lack of) transform
  dist_calc("bray") %>%
  # bray curtis
  ord_calc() %>%
  # guesses you want an unconstrained PCoA
  ord_plot(colour = "bmi_group")

# it is possible to facet these plots
# (although I'm not sure it makes sense to)
# but only unconstrained ordination plots and with plot_taxa = FALSE
unconstrained_aitchison_pca %>%
  ord_plot(color = "sex", auto_caption = NA) +
  facet_wrap("sex") +
  theme(line = element_blank()) +
  stat_density2d(aes(colour = sex)) +
  guides(colour = "none")

unconstrained_aitchison_pca %>%
  ord_plot(color = "bmi_group", plot_samples = FALSE, auto_caption = NA) +
  facet_wrap("sex") +
  theme(line = element_blank(), axis.text = element_blank()) +
  stat_density2d_filled(show.legend = FALSE) +
  geom_point(size = 1, shape = 21, colour = "black", fill = "white")

Circular compositional barplot sorted by ordination angle

Description

Use with ord_calc output as data argument. Order of samples extracted from ordination axes in data. Best paired with ordination plot made from same ord_calc output.

Usage

ord_plot_iris(
  data,
  tax_level,
  axes = 1:2,
  n_taxa = 10,
  ord_plot = "none",
  taxon_renamer = function(x) identity(x),
  palette = distinct_palette(n_taxa),
  anno_colour = NULL,
  anno_colour_style = list(),
  anno_binary = NULL,
  anno_binary_style = list(),
  keep_all_vars = FALSE,
  scaling = 2,
  count_warn = TRUE,
  ...
)
ord_plot_iris(
  data,
  tax_level,
  axes = 1:2,
  n_taxa = 10,
  ord_plot = "none",
  taxon_renamer = function(x) identity(x),
  palette = distinct_palette(n_taxa),
  anno_colour = NULL,
  anno_colour_style = list(),
  anno_binary = NULL,
  anno_binary_style = list(),
  keep_all_vars = FALSE,
  scaling = 2,
  count_warn = TRUE,
  ...
)

Arguments

`data`	psExtra output of ord_calc
`tax_level`	taxonomic aggregation level (from rank_names(ps))
`axes`	which 2 axes of ordination to use for ordering bars
`n_taxa`	how many taxa to colour show distinct colours for (all other taxa grouped into "other").
`ord_plot`	add a matching ordination plot to your iris plot ('list' returns separate plots in a list, 'above'/'below' uses patchwork to pair plots together into one)
`taxon_renamer`	function to rename taxa in the legend
`palette`	colour palette
`anno_colour`	name of sample_data variable to use for colouring geom_segment annotation ring
`anno_colour_style`	list of further arguments passed to geom_segment e.g. size
`anno_binary`	name(s) of binary sample_data variable(s) (levels T/F or 1/0) to use for filtered geom_point annotation ring(s) (annotates at TRUE values)
`anno_binary_style`	list of further arguments passed to geom_point e.g. colour, size, y, etc.
`keep_all_vars`	slows down processing but is required for any post-hoc plot customisation options
`scaling`	Type 2, or type 1 scaling. For more info, see https://sites.google.com/site/mb3gustame/constrained-analyses/redundancy-analysis. Either "species" or "site" scores are scaled by (proportional) eigenvalues, and the other set of scores is left unscaled (from ?vegan::scores.cca)
`count_warn`	warn if count data are not available? i.e. phyloseq otu_table is not positive integers and psExtra counts slot is NULL
`...`	Arguments passed on to `comp_barplot` `merge_other` if FALSE, taxa coloured/filled as "other" remain distinct, and so can have bar outlines drawn around them `bar_width` default 1 avoids random gapping otherwise seen with many samples (set to less than 1 to introduce gaps between samples) `bar_outline_colour` line colour separating taxa and samples (use NA for no outlines) `bar_outline_width` width of line separating taxa and samples (for no outlines set bar_outline_colour = NA) `tax_transform_for_plot` default "compositional" draws proportions of total counts per sample, but you could reasonably use another transformation, e.g. "identity", if you have truly quantitative microbiome profiling data `interactive` creates plot suitable for use with ggiraph `max_taxa` maximum distinct taxa groups to show (only really useful for limiting complexity of interactive plots e.g. within ord_explore) `other_name` name for other taxa after N

Details

data must also contain counts table if taxa were transformed (e.g. for clr PCA ordination) (i.e. you must have used tax_transform with keep_counts = TRUE, if transformation was not "identity")

You cannot set a variable fill aesthetic (only fixed) for the annotation points, as the fill is used for the taxonomic composition bars

Value

ggplot

Examples

library(dplyr)
library(ggplot2)
data("dietswap", package = "microbiome")

# although these iris plots are great for 100s of samples
# we'll take a subset of the data (for speed in this example)
ps <- dietswap %>%
  ps_filter(timepoint %in% c(1, 2)) %>%
  # copy an otu to the sample data
  ps_otu2samdat("Prevotella melaninogenica et rel.") %>%
  # create a couple of useful variables
  ps_mutate(
    female = sex == "female",
    african = nationality == "AFR",
    log_P.melaninogenica = log10(`Prevotella melaninogenica et rel.` + 1)
  )

# define a function for taking the end off the long genus names in this dataset
tax_renamer <- function(tax) {
  stringr::str_remove(tax, " [ae]t rel.")
}

ord <- ps %>%
  tax_agg("Genus") %>%
  dist_calc("aitchison") %>%
  ord_calc(method = "PCoA")

# ordination plot for comparison
ord %>% ord_plot(color = "log_P.melaninogenica", size = 3)

ord_plot_iris(
  data = ord,
  tax_level = "Genus",
  n_taxa = 10,
  anno_colour = "nationality",
  anno_colour_style = list(size = 3),
  anno_binary = "female",
  anno_binary_style = list(shape = "F", size = 2.5),
  taxon_renamer = tax_renamer
) +
  scale_colour_brewer(palette = "Dark2")

# It is also possible to use comp_barplot customisation arguments
# like bar_width and bar_outline_colour, and to make interactive iris plots
# using ggiraph:

if (interactive()) {
  hover_over_me <- ord_plot_iris(
    data = ord,
    tax_level = "Genus",
    n_taxa = 10,
    anno_colour = "nationality",
    anno_colour_style = list(size = 3),
    anno_binary = "female",
    anno_binary_style = list(shape = "F", size = 2.5),
    taxon_renamer = tax_renamer,
    interactive = TRUE,
    bar_width = 0.8, bar_outline_colour = "black"
  ) +
    scale_colour_brewer(palette = "Dark2")

  ggiraph::girafe(ggobj = hover_over_me)
}

# Using PCA for ordination after transformations (e.g. clr) means the untransformed taxonomic
# data are only available for plotting as compositions if you transformed with
# tax_transform(keep_counts = TRUE) and your original data were in fact counts.
# Compositional data will also work, and you can set count_warn to FALSE to avoid the warning

clr_pca <- ps %>%
  tax_agg("Genus") %>%
  tax_transform("clr") %>%
  ord_calc(method = "PCA")

# you can generate a simple paired layout of ord_plot and iris plot
# or separately create and pair the plots yourself, for more control

# simple pairing
ord_plot_iris(
  data = clr_pca, n_taxa = 12,
  tax_level = "Genus",
  taxon_renamer = tax_renamer,
  ord_plot = "below",
  bar_width = 0.8, bar_outline_colour = "black",
  anno_binary = "african",
  anno_binary_style = list(
    y = 1.08, colour = "gray50", shape = "circle open", size = 1, stroke = 1.5
  )
)

# manual pairing
plot1 <- clr_pca %>% ord_plot(
  plot_taxa = 6:1, tax_vec_length = 0.6,
  colour = "gray50", shape = "nationality",
  taxon_renamer = tax_renamer,
  auto_caption = NA, center = TRUE,
) +
  scale_shape_manual(values = c(AFR = "circle", AAM = "circle open"))

iris <- ord_plot_iris(
  data = clr_pca, n_taxa = 15,
  tax_level = "Genus",
  taxon_renamer = tax_renamer,
  anno_binary = "african",
  anno_binary_style = list(y = 1.05, colour = "gray50", shape = "circle", size = 1)
) +
  # shrink legend text size
  theme(legend.text = element_text(size = 7))

cowplot::plot_grid(plot1, iris, nrow = 1, align = "h", axis = "b", rel_widths = 3:4)

# you can add multiple rings of binary annotations
ord_plot_iris(
  data = clr_pca, n_taxa = 15,
  tax_level = "Genus",
  taxon_renamer = tax_renamer,
  anno_binary = c("african", "female"),
  anno_binary_style = list(
    colour = c("gray50", "coral"),
    shape = c("circle", "F"), size = c(0.5, 2)
  )
) +
  theme(legend.text = element_text(size = 7))
library(dplyr)
library(ggplot2)
data("dietswap", package = "microbiome")

# although these iris plots are great for 100s of samples
# we'll take a subset of the data (for speed in this example)
ps <- dietswap %>%
  ps_filter(timepoint %in% c(1, 2)) %>%
  # copy an otu to the sample data
  ps_otu2samdat("Prevotella melaninogenica et rel.") %>%
  # create a couple of useful variables
  ps_mutate(
    female = sex == "female",
    african = nationality == "AFR",
    log_P.melaninogenica = log10(`Prevotella melaninogenica et rel.` + 1)
  )

# define a function for taking the end off the long genus names in this dataset
tax_renamer <- function(tax) {
  stringr::str_remove(tax, " [ae]t rel.")
}

ord <- ps %>%
  tax_agg("Genus") %>%
  dist_calc("aitchison") %>%
  ord_calc(method = "PCoA")

# ordination plot for comparison
ord %>% ord_plot(color = "log_P.melaninogenica", size = 3)

ord_plot_iris(
  data = ord,
  tax_level = "Genus",
  n_taxa = 10,
  anno_colour = "nationality",
  anno_colour_style = list(size = 3),
  anno_binary = "female",
  anno_binary_style = list(shape = "F", size = 2.5),
  taxon_renamer = tax_renamer
) +
  scale_colour_brewer(palette = "Dark2")

# It is also possible to use comp_barplot customisation arguments
# like bar_width and bar_outline_colour, and to make interactive iris plots
# using ggiraph:

if (interactive()) {
  hover_over_me <- ord_plot_iris(
    data = ord,
    tax_level = "Genus",
    n_taxa = 10,
    anno_colour = "nationality",
    anno_colour_style = list(size = 3),
    anno_binary = "female",
    anno_binary_style = list(shape = "F", size = 2.5),
    taxon_renamer = tax_renamer,
    interactive = TRUE,
    bar_width = 0.8, bar_outline_colour = "black"
  ) +
    scale_colour_brewer(palette = "Dark2")

  ggiraph::girafe(ggobj = hover_over_me)
}

# Using PCA for ordination after transformations (e.g. clr) means the untransformed taxonomic
# data are only available for plotting as compositions if you transformed with
# tax_transform(keep_counts = TRUE) and your original data were in fact counts.
# Compositional data will also work, and you can set count_warn to FALSE to avoid the warning

clr_pca <- ps %>%
  tax_agg("Genus") %>%
  tax_transform("clr") %>%
  ord_calc(method = "PCA")

# you can generate a simple paired layout of ord_plot and iris plot
# or separately create and pair the plots yourself, for more control

# simple pairing
ord_plot_iris(
  data = clr_pca, n_taxa = 12,
  tax_level = "Genus",
  taxon_renamer = tax_renamer,
  ord_plot = "below",
  bar_width = 0.8, bar_outline_colour = "black",
  anno_binary = "african",
  anno_binary_style = list(
    y = 1.08, colour = "gray50", shape = "circle open", size = 1, stroke = 1.5
  )
)

# manual pairing
plot1 <- clr_pca %>% ord_plot(
  plot_taxa = 6:1, tax_vec_length = 0.6,
  colour = "gray50", shape = "nationality",
  taxon_renamer = tax_renamer,
  auto_caption = NA, center = TRUE,
) +
  scale_shape_manual(values = c(AFR = "circle", AAM = "circle open"))

iris <- ord_plot_iris(
  data = clr_pca, n_taxa = 15,
  tax_level = "Genus",
  taxon_renamer = tax_renamer,
  anno_binary = "african",
  anno_binary_style = list(y = 1.05, colour = "gray50", shape = "circle", size = 1)
) +
  # shrink legend text size
  theme(legend.text = element_text(size = 7))

cowplot::plot_grid(plot1, iris, nrow = 1, align = "h", axis = "b", rel_widths = 3:4)

# you can add multiple rings of binary annotations
ord_plot_iris(
  data = clr_pca, n_taxa = 15,
  tax_level = "Genus",
  taxon_renamer = tax_renamer,
  anno_binary = c("african", "female"),
  anno_binary_style = list(
    colour = c("gray50", "coral"),
    shape = c("circle", "F"), size = c(0.5, 2)
  )
) +
  theme(legend.text = element_text(size = 7))

Create ordination plot vector styling lists

Description

Used by ord_plot, see examples there.

Usage

vec_constraint(
  linewidth = 1,
  alpha = 0.8,
  colour = "brown",
  arrow = grid::arrow(length = grid::unit(0.005, units = "npc"), type = "closed", angle =
    30),
  lineend = "round",
  linejoin = "mitre",
  ...
)

vec_tax_sel(
  linewidth = 0.5,
  alpha = 1,
  colour = "black",
  arrow = grid::arrow(length = grid::unit(0.005, units = "npc"), type = "closed", angle =
    30),
  lineend = "round",
  linejoin = "mitre",
  ...
)

vec_tax_all(linewidth = 0.5, alpha = 0.25, arrow = NULL, ...)
vec_constraint(
  linewidth = 1,
  alpha = 0.8,
  colour = "brown",
  arrow = grid::arrow(length = grid::unit(0.005, units = "npc"), type = "closed", angle =
    30),
  lineend = "round",
  linejoin = "mitre",
  ...
)

vec_tax_sel(
  linewidth = 0.5,
  alpha = 1,
  colour = "black",
  arrow = grid::arrow(length = grid::unit(0.005, units = "npc"), type = "closed", angle =
    30),
  lineend = "round",
  linejoin = "mitre",
  ...
)

vec_tax_all(linewidth = 0.5, alpha = 0.25, arrow = NULL, ...)

Arguments

`linewidth`	width of vector
`alpha`	opacity of vector
`colour`	colour of vector
`arrow`	arrow style specified with grid::arrow() or NULL for no arrow
`lineend`	Line end style (round, butt, square).
`linejoin`	Line join style (round, mitre, bevel).
`...`	further arguments passed to geom_segment

Value

list

Create list for ord_plot() *_lab_style arguments

Description

Customise taxa and constraint labels on your ordination plots. Choose 'text' or 'label' type, rotate and/or justify the text/labels and set aesthetic appearances using tax_lab_style() or constraint_lab_style().

Usage

tax_lab_style(
  type = "label",
  max_angle = 0,
  perpendicular = FALSE,
  aspect_ratio = 1,
  justify = "auto",
  size = 2,
  alpha = 1,
  colour = "black",
  ...
)

constraint_lab_style(
  type = "label",
  max_angle = 0,
  perpendicular = FALSE,
  aspect_ratio = 1,
  justify = "auto",
  size = 2.5,
  alpha = 1,
  colour = "brown",
  ...
)
tax_lab_style(
  type = "label",
  max_angle = 0,
  perpendicular = FALSE,
  aspect_ratio = 1,
  justify = "auto",
  size = 2,
  alpha = 1,
  colour = "black",
  ...
)

constraint_lab_style(
  type = "label",
  max_angle = 0,
  perpendicular = FALSE,
  aspect_ratio = 1,
  justify = "auto",
  size = 2.5,
  alpha = 1,
  colour = "brown",
  ...
)

Arguments

`type`	'label', 'text' or 'richtext' ('richtext' also used if 'label' type are rotated, when max_angle > 0)
`max_angle`	maximum angle of rotation to allow to match vector angle (requires ggtext package to rotate "label" type)
`perpendicular`	if TRUE, sets rotated labels perpendicular to desired angle, not parallel
`aspect_ratio`	aspect ratio of plot (y/x) must also be used in coord_fixed() ratio argument (must be set when rotated labels are used, to ensure match to arrow angles)
`justify`	"center", "side", or "auto"? Should the text/label align with the arrows at the text center or text sides (uses hjust, if 'auto', picks based on whether max_angle is greater than 0)
`size`	fixed size of text or label
`alpha`	fixed alpha of text or label
`colour`	fixed colour of text or label
`...`	further named arguments passed to geom_text, geom_label or geom_richtext

Value

named list

Examples

# These examples show styling of taxa labels with tax_lab_style().
# The same options are available for constraint labels in constrained
# ordinations. constraint_lab_style() just has different default settings.

library(ggplot2)

# get example inflammatory bowel disease stool dataset from corncob package
data("ibd", package = "microViz")

# filter out rare taxa and clean up names etc
ibd <- ibd %>%
  tax_filter(min_prevalence = 3) %>%
  tax_fix() %>%
  phyloseq_validate()

# calculate a centered-log-ratio transformed PCA ordination
ibd_ord <- ibd %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc("PCA")

# basic plot with default label style
ibd_ord %>% ord_plot(color = "ibd", plot_taxa = 1:10)

# Rotating labels: requires the ggtext package #
# A fixed coordinate ratio must be set to ensure label rotation
# matches the vectors. It is also helpful to set the vector and label length
# multipliers manually for a good look. Rotated labels are justified to the
# 'sides' automatically by tax_lab_style() with justify = 'auto'
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.3,
    tax_lab_style = tax_lab_style(max_angle = 90)
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# You can use text instead of labels
# - a bold fontface helps text to stand out
# - see ?ggplot2::geom_text for all settings available
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.4,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, size = 2.5, fontface = "bold.italic"
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# With text you can prevent overlaps with check_overlap = TRUE
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:12,
    tax_vec_length = 1.3, tax_lab_length = 1.4,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, size = 3, fontface = "bold.italic",
      check_overlap = TRUE
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# With labels, you can reduce the padding and line weight to free space
# but check_overlap is not available
# see ?ggtext::geom_richtext for more possibilities
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.35,
    tax_lab_style = tax_lab_style(
      max_angle = 90, fontface = "italic", size = 2.5, fill = "grey95",
      label.size = 0.1, # width outline
      label.padding = unit(0.1, "lines"),
      label.r = unit(0, "lines") # reduces rounding of corners to radius 0
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# Perpendicular angled labels/text are possible
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:12,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, perpendicular = TRUE, size = 3,
      check_overlap = TRUE
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))


# You can limit and/or attenuate the angle of rotation by:
#  - setting a lower max_angle
#  - decreasing the aspect_ratio in the tax_lab_style call
ibd_ord %>%
  ord_plot(
    shape = "circle", color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.3,
    tax_lab_style = tax_lab_style(
      max_angle = 10, size = 2, label.size = 0.1,
      label.padding = unit(0.1, "lines"), label.r = unit(0, "lines")
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

ibd_ord %>%
  ord_plot(
    shape = "circle", color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.3,
    tax_lab_style = tax_lab_style(
      max_angle = 90, size = 2, label.size = 0.1, aspect_ratio = 0.5,
      label.padding = unit(0.1, "lines"), label.r = unit(0, "lines")
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# another example with some extras #
ibd_ord %>%
  ord_plot(
    shape = "circle filled", fill = "ibd",
    plot_taxa = 1:10,
    taxon_renamer = function(x) stringr::str_replace_all(x, "_", " "),
    tax_vec_length = 2, tax_lab_length = 2.1,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, size = 2.5,
      fontface = "bold.italic", check_overlap = TRUE
    )
  ) +
  coord_fixed(1, clip = "off", xlim = c(-5, 5)) +
  theme(legend.position = c(0.8, 0.2), legend.background = element_rect()) +
  stat_chull(mapping = aes(colour = ibd, fill = ibd), alpha = 0.1)
# These examples show styling of taxa labels with tax_lab_style().
# The same options are available for constraint labels in constrained
# ordinations. constraint_lab_style() just has different default settings.

library(ggplot2)

# get example inflammatory bowel disease stool dataset from corncob package
data("ibd", package = "microViz")

# filter out rare taxa and clean up names etc
ibd <- ibd %>%
  tax_filter(min_prevalence = 3) %>%
  tax_fix() %>%
  phyloseq_validate()

# calculate a centered-log-ratio transformed PCA ordination
ibd_ord <- ibd %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc("PCA")

# basic plot with default label style
ibd_ord %>% ord_plot(color = "ibd", plot_taxa = 1:10)

# Rotating labels: requires the ggtext package #
# A fixed coordinate ratio must be set to ensure label rotation
# matches the vectors. It is also helpful to set the vector and label length
# multipliers manually for a good look. Rotated labels are justified to the
# 'sides' automatically by tax_lab_style() with justify = 'auto'
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.3,
    tax_lab_style = tax_lab_style(max_angle = 90)
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# You can use text instead of labels
# - a bold fontface helps text to stand out
# - see ?ggplot2::geom_text for all settings available
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.4,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, size = 2.5, fontface = "bold.italic"
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# With text you can prevent overlaps with check_overlap = TRUE
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:12,
    tax_vec_length = 1.3, tax_lab_length = 1.4,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, size = 3, fontface = "bold.italic",
      check_overlap = TRUE
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# With labels, you can reduce the padding and line weight to free space
# but check_overlap is not available
# see ?ggtext::geom_richtext for more possibilities
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.35,
    tax_lab_style = tax_lab_style(
      max_angle = 90, fontface = "italic", size = 2.5, fill = "grey95",
      label.size = 0.1, # width outline
      label.padding = unit(0.1, "lines"),
      label.r = unit(0, "lines") # reduces rounding of corners to radius 0
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# Perpendicular angled labels/text are possible
ibd_ord %>%
  ord_plot(
    color = "ibd", plot_taxa = 1:12,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, perpendicular = TRUE, size = 3,
      check_overlap = TRUE
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))


# You can limit and/or attenuate the angle of rotation by:
#  - setting a lower max_angle
#  - decreasing the aspect_ratio in the tax_lab_style call
ibd_ord %>%
  ord_plot(
    shape = "circle", color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.3,
    tax_lab_style = tax_lab_style(
      max_angle = 10, size = 2, label.size = 0.1,
      label.padding = unit(0.1, "lines"), label.r = unit(0, "lines")
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

ibd_ord %>%
  ord_plot(
    shape = "circle", color = "ibd", plot_taxa = 1:7,
    tax_vec_length = 1.3, tax_lab_length = 1.3,
    tax_lab_style = tax_lab_style(
      max_angle = 90, size = 2, label.size = 0.1, aspect_ratio = 0.5,
      label.padding = unit(0.1, "lines"), label.r = unit(0, "lines")
    )
  ) +
  coord_fixed(ratio = 1, clip = "off", xlim = c(-3.5, 3.5))

# another example with some extras #
ibd_ord %>%
  ord_plot(
    shape = "circle filled", fill = "ibd",
    plot_taxa = 1:10,
    taxon_renamer = function(x) stringr::str_replace_all(x, "_", " "),
    tax_vec_length = 2, tax_lab_length = 2.1,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, size = 2.5,
      fontface = "bold.italic", check_overlap = TRUE
    )
  ) +
  coord_fixed(1, clip = "off", xlim = c(-5, 5)) +
  theme(legend.position = c(0.8, 0.2), legend.background = element_rect()) +
  stat_chull(mapping = aes(colour = ibd, fill = ibd), alpha = 0.1)

Check for (and fix) common problems with phyloseq objects

Description

It checks for, and messages about, common uninformative entries in the tax_table, which often cause unwanted results
If there is no sample_data, it creates a sample_data dataframe with the sample_names (as "SAMPLE" variable)
If there is no tax_table, it creates a 1-column tax_table matrix with the taxa_names, and calls the rank "unique"
If remove_undetected = TRUE, it removes taxa where phyloseq::taxa_sums() is equal to zero, with a warning

Usage

phyloseq_validate(
  ps,
  remove_undetected = FALSE,
  min_tax_length = 4,
  verbose = TRUE
)
phyloseq_validate(
  ps,
  remove_undetected = FALSE,
  min_tax_length = 4,
  verbose = TRUE
)

Arguments

`ps`	phyloseq object
`remove_undetected`	if TRUE, removes taxa that sum to zero across all samples
`min_tax_length`	minimum number of characters to not consider a tax_table entry suspiciously short
`verbose`	print informative messages if true

Value

possibly modified phyloseq object

Examples

data(dietswap, package = "microbiome")

# expect warning about taxa summing to zero
phyloseq_validate(dietswap, remove_undetected = TRUE, verbose = TRUE)

# verbose = FALSE will suppress messages and warnings but still:
# replace NULL sample_data and remove taxa that sum to 0 across all samples
# (if remove_undetected = TRUE)
phyloseq_validate(dietswap, verbose = FALSE)

# Sometimes you might have a phyloseq with no sample_data
# This isn't compatible with some microViz functions, like comp_barplot
# So some functions internally use phyloseq_validate to fix this
dietswap@sam_data <- NULL
phyloseq_validate(dietswap)

# Sometimes you might have a phyloseq with no tax_table
# This isn't compatible with some microViz functions, like tax_top,
# so this is another reason to start your analyses with phyloseq_validate!
data("soilrep", package = "phyloseq")
soilrep # has NULL tax_table
phyloseq_validate(soilrep)

# If no messages or warnings are emitted,
# this means no problems were detected, and nothing was changed
# (but only if verbose = TRUE)
data(dietswap, package = "microbiome")

# expect warning about taxa summing to zero
phyloseq_validate(dietswap, remove_undetected = TRUE, verbose = TRUE)

# verbose = FALSE will suppress messages and warnings but still:
# replace NULL sample_data and remove taxa that sum to 0 across all samples
# (if remove_undetected = TRUE)
phyloseq_validate(dietswap, verbose = FALSE)

# Sometimes you might have a phyloseq with no sample_data
# This isn't compatible with some microViz functions, like comp_barplot
# So some functions internally use phyloseq_validate to fix this
dietswap@sam_data <- NULL
phyloseq_validate(dietswap)

# Sometimes you might have a phyloseq with no tax_table
# This isn't compatible with some microViz functions, like tax_top,
# so this is another reason to start your analyses with phyloseq_validate!
data("soilrep", package = "phyloseq")
soilrep # has NULL tax_table
phyloseq_validate(soilrep)

# If no messages or warnings are emitted,
# this means no problems were detected, and nothing was changed
# (but only if verbose = TRUE)

Calculate prevalence from numeric vector

Description

Useful as helper for taxon prevalence calculation

Usage

prev(x, undetected = 0)
prev(x, undetected = 0)

Arguments

`x`	numeric vector (of taxon counts or proportions)
`undetected`	value above which a taxon is considered present or detected

Value

numeric value

Examples

prev(c(0, 0, 1, 2, 4))
prev(c(0, 0, 1, 2, 4), undetected = 1.5)
prev(c(0, 0, 1, 2, 4))
prev(c(0, 0, 1, 2, 4), undetected = 1.5)

Print method for psExtraInfo object

Description

Print method for psExtraInfo object

Usage

## S3 method for class 'psExtraInfo'
print(
  x,
  ...,
  which = c("tax_agg", "tax_trans", "tax_scale", "dist_method", "ord_info")
)
## S3 method for class 'psExtraInfo'
print(
  x,
  ...,
  which = c("tax_agg", "tax_trans", "tax_scale", "dist_method", "ord_info")
)

Arguments

`x`	psExtraInfo object
`...`	ignored
`which`	which elements of psExtraInfo list to print

Arrange samples in phyloseq by sample_data variables or taxon abundance

Description

Uses information in the sample_data or tax_table of phyloseq object to set the order of the samples (sample_data or tax_table specified by .target arg)

Give this function arguments in the same way you would use dplyr::arrange()

Usage

ps_arrange(ps, ..., .target = "sample_data")
ps_arrange(ps, ..., .target = "sample_data")

Arguments

`ps`	phyloseq object
`...`	dots passed directly to dplyr::arrange()
`.target`	arrange samples by "sample_data" variables or "otu_table" taxa abundances

Value

phyloseq

Examples

data("dietswap", package = "microbiome")

dietswap %>%
  ps_arrange(subject, timepoint) %>%
  phyloseq::sample_data() %>%
  head(8)

ps <- dietswap %>% ps_arrange(subject, desc(timepoint))
phyloseq::sample_data(ps) %>% head(8)
phyloseq::otu_table(ps)[1:8, 1:8]

# you can also arrange samples by the abundances of taxa in the otu tables
pst <- dietswap %>% ps_arrange(desc(Akkermansia), .target = "otu_table")
phyloseq::otu_table(pst)[1:8, 1:8]
phyloseq::sample_data(pst) %>% head(8)
data("dietswap", package = "microbiome")

dietswap %>%
  ps_arrange(subject, timepoint) %>%
  phyloseq::sample_data() %>%
  head(8)

ps <- dietswap %>% ps_arrange(subject, desc(timepoint))
phyloseq::sample_data(ps) %>% head(8)
phyloseq::otu_table(ps)[1:8, 1:8]

# you can also arrange samples by the abundances of taxa in the otu tables
pst <- dietswap %>% ps_arrange(desc(Akkermansia), .target = "otu_table")
phyloseq::otu_table(pst)[1:8, 1:8]
phyloseq::sample_data(pst) %>% head(8)

Calculate diversity index and add to phyloseq sample data

Description

Wrapper around microbiome::diversity() function. Takes and returns a phyloseq object. Calculates an alpha diversity index at a given taxonomic rank. Returns phyloseq unaggregated, with an additional variable. Variable name is by default created by pasting the index and rank.

Usage

ps_calc_diversity(
  ps,
  rank,
  index = "shannon",
  exp = FALSE,
  varname = paste0(ifelse(exp, "exp_", ""), paste0(index, "_", rank))
)
ps_calc_diversity(
  ps,
  rank,
  index = "shannon",
  exp = FALSE,
  varname = paste0(ifelse(exp, "exp_", ""), paste0(index, "_", rank))
)

Arguments

`ps`	phyloseq
`rank`	taxonomic rank name, or "unique"
`index`	name of diversity index from microbiome::diversity(). One of: "inverse_simpson", "gini_simpson", "shannon", "fisher", "coverage"
`exp`	exponentiate the result? (i.e. return e^index) - see details
`varname`	name of the variable to be added to phyloseq sample data

Details

Don't filter taxa before calculating alpha diversity.

See the following resources for a discussion of exponentiated diversity indices http://www.loujost.com/Statistics%20and%20Physics/Diversity%20and%20Similarity/Diversity%20of%20a%20single%20community.htm http://www.loujost.com/Statistics%20and%20Physics/Diversity%20and%20Similarity/EffectiveNumberOfSpecies.htm

Value

phyloseq

Examples

data(ibd, package = "microViz")
ibd %>%
  ps_filter(abx == "abx") %>%
  tax_fix() %>%
  ps_calc_diversity("Genus", index = "shannon", exp = TRUE) %>%
  ps_calc_diversity("Family", index = "inverse_simpson") %>%
  tax_transform(rank = "Genus", transform = "clr") %>%
  ord_calc("PCA") %>%
  ord_plot(
    colour = "exp_shannon_Genus", size = "inverse_simpson_Family"
  ) +
  ggplot2::scale_colour_viridis_c()

data(ibd, package = "microViz")
ibd %>%
  ps_filter(abx == "abx") %>%
  tax_fix() %>%
  ps_calc_diversity("Genus", index = "shannon", exp = TRUE) %>%
  ps_calc_diversity("Family", index = "inverse_simpson") %>%
  tax_transform(rank = "Genus", transform = "clr") %>%
  ord_calc("PCA") %>%
  ord_plot(
    colour = "exp_shannon_Genus", size = "inverse_simpson_Family"
  ) +
  ggplot2::scale_colour_viridis_c()

Calculate dominant taxon in each phyloseq sample

Description

Which taxon is most abundant in each sample of your phyloseq object? This function adds this information as a new variable in your phyloseq sample_data.

If the most abundant taxon is below the proportional abundance threshold, the dominant taxon will be "none" for that sample
If there are more than n_max dominant taxa across all samples (not including "none") the dominant taxon will be "other" for those samples

Usage

ps_calc_dominant(
  ps,
  rank,
  threshold = 0.3,
  n_max = 6,
  var = paste("dominant", rank, sep = "_"),
  none = "none",
  other = "other"
)
ps_calc_dominant(
  ps,
  rank,
  threshold = 0.3,
  n_max = 6,
  var = paste("dominant", rank, sep = "_"),
  none = "none",
  other = "other"
)

Arguments

`ps`	phyloseq object
`rank`	taxonomic rank to calculate dominance at
`threshold`	minimum proportion at which to consider a sample dominated by a taxon
`n_max`	maximum number of taxa that can be listed as dominant taxa
`var`	name of variable to add to phyloseq object sample data
`none`	character value to use when no taxon reaches threshold
`other`	character value to use when another taxon (>n_max) dominates

Details

Thanks to Vitor Heidrich for the idea and a draft implementation

Value

phyloseq object

Examples

library(ggplot2)
ps <- microViz::ibd %>%
  tax_filter(min_prevalence = 3) %>%
  tax_fix() %>%
  phyloseq_validate()

ps %>%
  ps_filter(DiseaseState == "CD") %>%
  ps_calc_dominant(rank = "Genus") %>%
  comp_barplot(tax_level = "Genus", label = "dominant_Genus", n_taxa = 12) +
  coord_flip()

ps %>%
  ps_calc_dominant(rank = "Genus") %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc("PCA") %>%
  ord_plot(colour = "dominant_Genus", size = 3, alpha = 0.6) +
  scale_colour_brewer(palette = "Dark2")

# customise function options
ps %>%
  ps_calc_dominant(
    rank = "Family", other = "Other", none = "Not dominated",
    threshold = 0.4, n_max = 3
  ) %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc("PCA") %>%
  ord_plot(colour = "dominant_Family", size = 3, alpha = 0.6) +
  scale_colour_manual(values = c(
    Bacteroidaceae = "forestgreen", Lachnospiraceae = "darkblue",
    Ruminococcaceae = "darkorange", Other = "red", "Not dominated" = "grey"
  ))
library(ggplot2)
ps <- microViz::ibd %>%
  tax_filter(min_prevalence = 3) %>%
  tax_fix() %>%
  phyloseq_validate()

ps %>%
  ps_filter(DiseaseState == "CD") %>%
  ps_calc_dominant(rank = "Genus") %>%
  comp_barplot(tax_level = "Genus", label = "dominant_Genus", n_taxa = 12) +
  coord_flip()

ps %>%
  ps_calc_dominant(rank = "Genus") %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc("PCA") %>%
  ord_plot(colour = "dominant_Genus", size = 3, alpha = 0.6) +
  scale_colour_brewer(palette = "Dark2")

# customise function options
ps %>%
  ps_calc_dominant(
    rank = "Family", other = "Other", none = "Not dominated",
    threshold = 0.4, n_max = 3
  ) %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc("PCA") %>%
  ord_plot(colour = "dominant_Family", size = 3, alpha = 0.6) +
  scale_colour_manual(values = c(
    Bacteroidaceae = "forestgreen", Lachnospiraceae = "darkblue",
    Ruminococcaceae = "darkorange", Other = "red", "Not dominated" = "grey"
  ))

Calculate richness estimate and add to phyloseq sample data

Description

Wrapper around microbiome::richness() function. Takes and returns a phyloseq object. Calculates a richness estimate at a given taxonomic rank. Returns phyloseq unaggregated, with an additional variable. Variable name is by default created by pasting the index and rank.

Usage

ps_calc_richness(
  ps,
  rank,
  index = "observed",
  detection = 0,
  varname = paste0(index, "_", rank)
)
ps_calc_richness(
  ps,
  rank,
  index = "observed",
  detection = 0,
  varname = paste0(index, "_", rank)
)

Arguments

`ps`	phyloseq
`rank`	taxonomic rank name, or "unique"
`index`	"observed" or "chao1" - name of richness estimate from microbiome::richness()
`detection`	Detection threshold. Used for the "observed" index.
`varname`	name of the variable to be added to phyloseq sample data

Details

Don't filter taxa before calculating richness.

These richness indices are estimates. For a discussion of the uncertainty and bias of these estimates see e.g. work by Amy Willis https://doi.org/10.3389/fmicb.2019.02407

Value

phyloseq

Examples

data(ibd, package = "microViz")
ibd %>%
  ps_filter(abx == "abx") %>%
  tax_fix() %>%
  ps_calc_richness("Genus", index = "observed") %>%
  ps_calc_richness("Family", index = "chao1") %>%
  tax_transform(rank = "Genus", transform = "clr") %>%
  ord_calc("PCA") %>%
  ord_plot(
    colour = "observed_Genus", size = "chao1_Family"
  ) +
  ggplot2::scale_colour_viridis_c()

data(ibd, package = "microViz")
ibd %>%
  ps_filter(abx == "abx") %>%
  tax_fix() %>%
  ps_calc_richness("Genus", index = "observed") %>%
  ps_calc_richness("Family", index = "chao1") %>%
  tax_transform(rank = "Genus", transform = "clr") %>%
  ord_calc("PCA") %>%
  ord_plot(
    colour = "observed_Genus", size = "chao1_Family"
  ) +
  ggplot2::scale_colour_viridis_c()

De-duplicate phyloseq samples

Description

Use one or more variables in the sample_data to identify and remove duplicate samples (leaving one sample per group).

methods:

method = "readcount" keeps the one sample in each duplicate group with the highest total number of reads (phyloseq::sample_sums)
method = "first" keeps the first sample in each duplicate group encountered in the row order of the sample_data
method = "last" keeps the last sample in each duplicate group encountered in the row order of the sample_data
method = "random" keeps a random sample from each duplicate group (set.seed for reproducibility)

More than one "duplicate" sample can be kept per group by setting n samples > 1.

Usage

ps_dedupe(
  ps,
  vars,
  method = "readcount",
  verbose = TRUE,
  n = 1,
  .keep_group_var = FALSE,
  .keep_readcount = FALSE,
  .message_IDs = FALSE,
  .label_only = FALSE,
  .keep_all_taxa = FALSE
)
ps_dedupe(
  ps,
  vars,
  method = "readcount",
  verbose = TRUE,
  n = 1,
  .keep_group_var = FALSE,
  .keep_readcount = FALSE,
  .message_IDs = FALSE,
  .label_only = FALSE,
  .keep_all_taxa = FALSE
)

Arguments

`ps`	phyloseq object
`vars`	names of variables, whose (combined) levels identify groups from which only 1 sample is desired
`method`	keep sample with max "readcount" or the "first" or "last" or "random" samples encountered in given sample_data order for each duplicate group
`verbose`	message about number of groups, and number of samples dropped?
`n`	number of 'duplicates' to keep per group, defaults to 1
`.keep_group_var`	keep grouping variable .GROUP. in phyloseq object?
`.keep_readcount`	keep readcount variable .READCOUNT. in phyloseq object?
`.message_IDs`	message sample names of dropped variables?
`.label_only`	if TRUE, the samples will NOT be filtered, just labelled with a new logical variable .KEEP_SAMPLE.
`.keep_all_taxa`	keep all taxa after removing duplicates? If FALSE, the default, taxa are removed if they never occur in any of the retained samples

Details

What happens when duplicated samples have exactly equal readcounts in method = "readcount"? The first encountered maximum is kept (in sample_data row order, like method = "first")

Value

phyloseq object

Examples

data("dietswap", package = "microbiome")

dietswap
# let's pretend the dietswap data contains technical replicates from each subject
# we want to keep only one of them
ps_dedupe(dietswap, vars = "subject", method = "readcount", verbose = TRUE)

# contrived example to show identifying "duplicates" via the interaction of multiple columns
ps1 <- ps_dedupe(
  ps = dietswap, method = "readcount", verbose = TRUE,
  vars = c("timepoint", "group", "bmi_group")
)
phyloseq::sample_data(ps1)

ps2 <- ps_dedupe(
  ps = dietswap, method = "first", verbose = TRUE,
  vars = c("timepoint", "group", "bmi_group")
)
phyloseq::sample_data(ps2)
data("dietswap", package = "microbiome")

dietswap
# let's pretend the dietswap data contains technical replicates from each subject
# we want to keep only one of them
ps_dedupe(dietswap, vars = "subject", method = "readcount", verbose = TRUE)

# contrived example to show identifying "duplicates" via the interaction of multiple columns
ps1 <- ps_dedupe(
  ps = dietswap, method = "readcount", verbose = TRUE,
  vars = c("timepoint", "group", "bmi_group")
)
phyloseq::sample_data(ps1)

ps2 <- ps_dedupe(
  ps = dietswap, method = "first", verbose = TRUE,
  vars = c("timepoint", "group", "bmi_group")
)
phyloseq::sample_data(ps2)

Deselect phyloseq samples with sample_data missings

Description

Check phyloseq object sample_data for missing values (NAs)

specify which variables to check with vars argument, or check all
drop samples with any missings

Usage

ps_drop_incomplete(ps, vars = NA, verbose = FALSE)
ps_drop_incomplete(ps, vars = NA, verbose = FALSE)

Arguments

`ps`	phyloseq with sample_data
`vars`	vector of variable names to check for missings (or NA, which uses all variables in sample data)
`verbose`	message about number of samples dropped if verbose not FALSE, (and only if > 0 samples dropped) and message about number of missing per variable in vars if verbose = "max" (and message even if 0 samples dropped)

Details

This is a wrapper for stats::complete.cases function.

Value

phyloseq

Examples

library(phyloseq)
data("enterotype", package = "phyloseq")

enterotype
ps_drop_incomplete(enterotype)
ps_drop_incomplete(enterotype, vars = "Enterotype", verbose = TRUE)
ps_drop_incomplete(enterotype, vars = "Sample_ID", verbose = TRUE)
ps_drop_incomplete(enterotype, vars = c("Enterotype", "Sample_ID"))
ps_drop_incomplete(enterotype, verbose = "max")
library(phyloseq)
data("enterotype", package = "phyloseq")

enterotype
ps_drop_incomplete(enterotype)
ps_drop_incomplete(enterotype, vars = "Enterotype", verbose = TRUE)
ps_drop_incomplete(enterotype, vars = "Sample_ID", verbose = TRUE)
ps_drop_incomplete(enterotype, vars = c("Enterotype", "Sample_ID"))
ps_drop_incomplete(enterotype, verbose = "max")

Filter phyloseq samples by sample_data variables

Description

Keep only samples with sample_data matching one or more conditions. By default this function also removes taxa which never appear in any of the remaining samples, by running tax_filter(min_prevalence = 1). You can prevent this taxa filtering with .keep_all_taxa = TRUE.

Usage

ps_filter(ps, ..., .target = "sample_data", .keep_all_taxa = FALSE)
ps_filter(ps, ..., .target = "sample_data", .keep_all_taxa = FALSE)

Arguments

`ps`	phyloseq object
`...`	passed directly to dplyr::filter (see examples and ?dplyr::filter)
`.target`	which slot of phyloseq to use for filtering by, currently only "sample_data" supported
`.keep_all_taxa`	if FALSE (the default), remove taxa which are no longer present in the dataset after filtering

Details

Use ps_filter as you would use use dplyr::filter(), but with a phyloseq object!

Value

phyloseq object (with filtered sample_data)

Examples

library(phyloseq)
library(dplyr)

data("enterotype", package = "phyloseq")
enterotype
sample_data(enterotype)[1:10, 1:5]

# keep only samples with seqtech not equal to sanger
ps1 <- ps_filter(enterotype, SeqTech != "Sanger")
ps1
sample_data(ps1)[1:10, 1:5]

# keep only samples with no NAs in any variables
ps2 <- enterotype %>% ps_filter(!if_any(everything(), is.na))
ps2
sample_data(ps2)[1:8, 1:8]

# ps2 is equivalent to dropping samples with incomplete sample_variables and tax_filtering 0s
ps3 <- enterotype %>%
  ps_drop_incomplete() %>%
  tax_filter(undetected = 0, use_counts = FALSE)
# we needed to set a low detection threshold because this example data is proportions
identical(ps2, ps3) # TRUE

# function will give warning if some of the otu_values are negative
# (which may happen when filtering data that has e.g. clr-transformed taxa abundances)
# as it attempts to discard any taxa that become always absent/0 after filtering (by default)
# set .keep_all_taxa = TRUE to avoid this filtering behaviour, which is unwanted in this case
enterotype %>%
  tax_transform("clr") %>%
  ps_get() %>%
  ps_filter(SeqTech == "Sanger", .keep_all_taxa = TRUE)
library(phyloseq)
library(dplyr)

data("enterotype", package = "phyloseq")
enterotype
sample_data(enterotype)[1:10, 1:5]

# keep only samples with seqtech not equal to sanger
ps1 <- ps_filter(enterotype, SeqTech != "Sanger")
ps1
sample_data(ps1)[1:10, 1:5]

# keep only samples with no NAs in any variables
ps2 <- enterotype %>% ps_filter(!if_any(everything(), is.na))
ps2
sample_data(ps2)[1:8, 1:8]

# ps2 is equivalent to dropping samples with incomplete sample_variables and tax_filtering 0s
ps3 <- enterotype %>%
  ps_drop_incomplete() %>%
  tax_filter(undetected = 0, use_counts = FALSE)
# we needed to set a low detection threshold because this example data is proportions
identical(ps2, ps3) # TRUE

# function will give warning if some of the otu_values are negative
# (which may happen when filtering data that has e.g. clr-transformed taxa abundances)
# as it attempts to discard any taxa that become always absent/0 after filtering (by default)
# set .keep_all_taxa = TRUE to avoid this filtering behaviour, which is unwanted in this case
enterotype %>%
  tax_transform("clr") %>%
  ps_get() %>%
  ps_filter(SeqTech == "Sanger", .keep_all_taxa = TRUE)

Join a dataframe to phyloseq sample data

Description

You can use most types of join from the dplyr::*_join function family, including e.g. "inner", "left", "semi", "anti" (see details below). Defaults to type = "left" which calls left_join(), this supports x as a phyloseq and y as a dataframe. Most of the time you'll want "left" (adds variables with no sample filtering), or "inner" (adds variables and filters samples). This function simply:

extracts the sample_data from the phyloseq as a dataframe
performs the chosen type of join (with the given arguments)
filters the phyloseq if type = inner, semi or anti
reattaches the modified sample_data to the phyloseq and returns the phyloseq

Usage

ps_join(
  x,
  y,
  by = NULL,
  match_sample_names = NULL,
  keep_sample_name_col = TRUE,
  sample_name_natural_join = FALSE,
  type = "left",
  .keep_all_taxa = FALSE
)
ps_join(
  x,
  y,
  by = NULL,
  match_sample_names = NULL,
  keep_sample_name_col = TRUE,
  sample_name_natural_join = FALSE,
  type = "left",
  .keep_all_taxa = FALSE
)

Arguments

`x`	phyloseq (or dataframe)
`y`	dataframe (or phyloseq for e.g. type = "right")
`by`	A character vector of variables to join by (col must be present in both x and y or paired via a named vector like c("xname" = "yname", etc.))
`match_sample_names`	match against the phyloseq sample_names by naming a variable in the additional dataframe (this is in addition to any variables named in by)
`keep_sample_name_col`	should the column named in match_sample_names be kept in the returned phyloseq's sample_data? (only relevant if match_sample_names is not NULL)
`sample_name_natural_join`	if TRUE, use sample_name AND all shared colnames to match rows (only relevant if match_sample_names is not NULL, this arg takes precedence over anything also entered in `by` arg)
`type`	name of type of join e.g. "left", "right", "inner", "semi" (see dplyr help pages)
`.keep_all_taxa`	if FALSE (the default), remove taxa which are no longer present in the dataset after filtering

Details

Mutating joins, which will add columns from a dataframe to phyloseq sample data, matching rows based on the key columns named in the by argument:

"inner": includes all rows in present in both x and y.
"left": includes all rows in x. (so x must be the phyloseq)
"right": includes all rows in y. (so y must be the phyloseq)
"full": includes all rows present in x or y. (will likely NOT work, as additional rows cannot be added to sample_data!)

If a row in x matches multiple rows in y (based on variables named in the by argument), all the rows in y will be added once for each matching row in x. This will cause this function to fail, as additional rows cannot be added to the phyloseq sample_data!

Filtering joins filter rows from x based on the presence or absence of matches in y:

"semi": return all rows from x with a match in y.
"anti": return all rows from x without a match in y.

Value

phyloseq with modified sample_data (and possibly filtered)

Examples

library(phyloseq)
data("enterotype", package = "phyloseq")

x <- enterotype
y <- data.frame(
  ID_var = sample_names(enterotype)[c(1:50, 101:150)],
  SeqTech = sample_data(enterotype)[c(1:50, 101:150), "SeqTech"],
  arbitrary_info = rep(c("A", "B"), 50)
)

# simply match the new data to samples that exist in x, as default is a left_join
# where some sample names of x are expected to match variable ID_var in dataframe y
out1A <- ps_join(x = x, y = y, match_sample_names = "ID_var")
out1A
sample_data(out1A)[1:6, ]


# use sample_name and all shared variables to join
# (a natural join is not a type of join per se,
# but it indicates that all shared variables should be used for matching)
out1B <- ps_join(
  x = x, y = y, match_sample_names = "ID_var",
  sample_name_natural_join = TRUE, keep_sample_name_col = FALSE
)
out1B
sample_data(out1B)[1:6, ]

# if you only want to keep phyloseq samples that exist in the new data, try an inner join
# this will add the new variables AND filter the phyloseq
# this example matches sample names to ID_var and by matching the shared SeqTech variable
out1C <- ps_join(x = x, y = y, type = "inner", by = "SeqTech", match_sample_names = "ID_var")
out1C
sample_data(out1C)[1:6, ]

# the id variable is named Sample_ID in x and ID_var in y
# semi_join is only a filtering join (doesn't add new variables but just filters samples in x)
out2A <- ps_join(x = x, y = y, by = c("Sample_ID" = "ID_var"), type = "semi")
out2A
sample_data(out2A)[1:6, ]

# anti_join is another type of filtering join
out2B <- ps_join(x = x, y = y, by = c("Sample_ID" = "ID_var"), type = "anti")
out2B
sample_data(out2B)[1:6, ]

# semi and anti joins keep opposite sets of samples
intersect(sample_names(out2A), sample_names(out2B))

# you can mix and match named and unnamed values in the `by` vector
# inner is like a combination of left join and semi join
out3 <- ps_join(x = x, y = y, by = c("Sample_ID" = "ID_var", "SeqTech"), type = "inner")
out3
sample_data(out3)[1:6, ]
library(phyloseq)
data("enterotype", package = "phyloseq")

x <- enterotype
y <- data.frame(
  ID_var = sample_names(enterotype)[c(1:50, 101:150)],
  SeqTech = sample_data(enterotype)[c(1:50, 101:150), "SeqTech"],
  arbitrary_info = rep(c("A", "B"), 50)
)

# simply match the new data to samples that exist in x, as default is a left_join
# where some sample names of x are expected to match variable ID_var in dataframe y
out1A <- ps_join(x = x, y = y, match_sample_names = "ID_var")
out1A
sample_data(out1A)[1:6, ]


# use sample_name and all shared variables to join
# (a natural join is not a type of join per se,
# but it indicates that all shared variables should be used for matching)
out1B <- ps_join(
  x = x, y = y, match_sample_names = "ID_var",
  sample_name_natural_join = TRUE, keep_sample_name_col = FALSE
)
out1B
sample_data(out1B)[1:6, ]

# if you only want to keep phyloseq samples that exist in the new data, try an inner join
# this will add the new variables AND filter the phyloseq
# this example matches sample names to ID_var and by matching the shared SeqTech variable
out1C <- ps_join(x = x, y = y, type = "inner", by = "SeqTech", match_sample_names = "ID_var")
out1C
sample_data(out1C)[1:6, ]

# the id variable is named Sample_ID in x and ID_var in y
# semi_join is only a filtering join (doesn't add new variables but just filters samples in x)
out2A <- ps_join(x = x, y = y, by = c("Sample_ID" = "ID_var"), type = "semi")
out2A
sample_data(out2A)[1:6, ]

# anti_join is another type of filtering join
out2B <- ps_join(x = x, y = y, by = c("Sample_ID" = "ID_var"), type = "anti")
out2B
sample_data(out2B)[1:6, ]

# semi and anti joins keep opposite sets of samples
intersect(sample_names(out2A), sample_names(out2B))

# you can mix and match named and unnamed values in the `by` vector
# inner is like a combination of left join and semi join
out3 <- ps_join(x = x, y = y, by = c("Sample_ID" = "ID_var", "SeqTech"), type = "inner")
out3
sample_data(out3)[1:6, ]

Melt phyloseq data object into large data.frame (tibble)

Description

The ps_melt function is a specialized melt function for melting phyloseq objects (instances of the phyloseq class), usually for producing graphics with ggplot2. The naming conventions used in downstream phyloseq graphics functions have reserved the following variable names that should not be used as the names of sample_variables or taxonomic rank_names. These reserved names are c("Sample", "Abundance", "OTU"). Also, you should not have identical names for sample variables and taxonomic ranks. That is, the intersection of the output of the following two functions sample_variables, rank_names should be an empty vector (e.g. intersect(sample_variables(ps), rank_names(ps))). All of these potential name collisions are checked-for and renamed automatically with a warning. However, if you (re)name your variables accordingly ahead of time, it will reduce confusion and eliminate the warnings.

NOTE: Code and documentation copied (and very slightly modified) from an old version of speedyseq by Michael McLaren. speedyseq reimplements psmelt from phyloseq to use functions from the tidyr and dplyr packages. The name in microViz is changed to ps_melt for consistency with other functions.

Usage

ps_melt(ps)
ps_melt(ps)

Arguments

`ps`	(Required). An `otu_table-class` or `phyloseq-class`. Function most useful for phyloseq-class.

Details

Note that “melted” phyloseq data is stored much less efficiently, and so RAM storage issues could arise with a smaller dataset (smaller number of samples/OTUs/variables) than one might otherwise expect. For common sizes of graphics-ready datasets, however, this should not be a problem. Because the number of OTU entries has a large effect on the RAM requirement, methods to reduce the number of separate OTU entries – for instance by agglomerating OTUs based on phylogenetic distance using tip_glom – can help alleviate RAM usage problems. This function is made user-accessible for flexibility, but is also used extensively by plot functions in phyloseq.

Value

A tibble class data frame.

Examples

library(ggplot2)
library(phyloseq)
data("GlobalPatterns")
gp_ch <- subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")
mdf <- ps_melt(gp_ch)
mdf2 <- psmelt(gp_ch) # slower
# same dataframe, except with somewhat different row orders
dplyr::all_equal(tibble::as_tibble(mdf), mdf2, convert = TRUE) # TRUE
nrow(mdf2)
ncol(mdf)
colnames(mdf)
head(rownames(mdf))
p <- ggplot(mdf, aes(x = SampleType, y = Abundance, fill = Genus))
p <- p + geom_bar(color = "black", stat = "identity", position = "stack")
# This example plot doesn't make any sense
print(p + coord_flip())
# TODO replace this...
library(ggplot2)
library(phyloseq)
data("GlobalPatterns")
gp_ch <- subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")
mdf <- ps_melt(gp_ch)
mdf2 <- psmelt(gp_ch) # slower
# same dataframe, except with somewhat different row orders
dplyr::all_equal(tibble::as_tibble(mdf), mdf2, convert = TRUE) # TRUE
nrow(mdf2)
ncol(mdf)
colnames(mdf)
head(rownames(mdf))
p <- ggplot(mdf, aes(x = SampleType, y = Abundance, fill = Genus))
p <- p + geom_bar(color = "black", stat = "identity", position = "stack")
# This example plot doesn't make any sense
print(p + coord_flip())
# TODO replace this...

Modify or compute new sample_data in phyloseq object

Description

Add or compute new phyloseq sample_data variables. Uses dplyr::mutate() syntax.

Usage

ps_mutate(ps, ...)
ps_mutate(ps, ...)

Arguments

`ps`	phyloseq object with sample data
`...`	passed straight to dplyr::mutate (see examples and dplyr::mutate help)

Value

phyloseq object with modified sample_data

Examples

library(phyloseq)
library(dplyr)
data("enterotype")

sample_data(enterotype)[1:10, ]

months_in_year <- 12
ps <- enterotype %>%
  ps_mutate(
    age_months = Age * months_in_year,
    IDs_match = as.character(Sample_ID) == as.character(SampleID),
    placeholder = "Word"
  )

sample_data(ps)[1:10, ]

# Using the dplyr::across functionality is also possible
ps <- ps %>%
  ps_mutate(
    dplyr::across(where(is.factor), toupper),
    another_var = TRUE,
    SeqTech = NULL # deletes SeqTech variable
  )

head(sample_data(ps))
library(phyloseq)
library(dplyr)
data("enterotype")

sample_data(enterotype)[1:10, ]

months_in_year <- 12
ps <- enterotype %>%
  ps_mutate(
    age_months = Age * months_in_year,
    IDs_match = as.character(Sample_ID) == as.character(SampleID),
    placeholder = "Word"
  )

sample_data(ps)[1:10, ]

# Using the dplyr::across functionality is also possible
ps <- ps %>%
  ps_mutate(
    dplyr::across(where(is.factor), toupper),
    another_var = TRUE,
    SeqTech = NULL # deletes SeqTech variable
  )

head(sample_data(ps))

Copy phyloseq otu_table data to sample_data

Description

Copy phyloseq otu_table data to sample_data

Usage

ps_otu2samdat(ps, taxa = NULL)
ps_otu2samdat(ps, taxa = NULL)

Arguments

`ps`	phyloseq with sample_data
`taxa`	list of taxa_names to copy to sample_data, or NULL (which selects all with `phyloseq::taxa_names()`)

Value

phyloseq with augmented sample_data

Examples

library(phyloseq)
data("dietswap", package = "microbiome")

ps <- dietswap %>% ps_otu2samdat("Akkermansia")
sample_variables(ps)

# or if you do not specify any taxa, all are copied
ps_all <- dietswap %>% ps_otu2samdat()
sample_variables(ps_all)[1:15]

# this could be useful for colouring ordination plots, for example
ps %>%
  ps_mutate(log_akkermansia = log(Akkermansia)) %>%
  dist_calc("bray") %>%
  ord_calc(method = "PCoA") %>%
  ord_plot(
    colour = "log_akkermansia",
    size = 3, shape = "nationality"
  )
library(phyloseq)
data("dietswap", package = "microbiome")

ps <- dietswap %>% ps_otu2samdat("Akkermansia")
sample_variables(ps)

# or if you do not specify any taxa, all are copied
ps_all <- dietswap %>% ps_otu2samdat()
sample_variables(ps_all)[1:15]

# this could be useful for colouring ordination plots, for example
ps %>%
  ps_mutate(log_akkermansia = log(Akkermansia)) %>%
  dist_calc("bray") %>%
  ord_calc(method = "PCoA") %>%
  ord_plot(
    colour = "log_akkermansia",
    size = 3, shape = "nationality"
  )

Set order of samples in phyloseq object

Description

Manually set order of samples by specifying samples names in desired order.

Usage

ps_reorder(ps, sample_order)
ps_reorder(ps, sample_order)

Arguments

`ps`	phyloseq
`sample_order`	names or current numerical indices of samples in desired order

Details

Ordering of samples in a phyloseq is controlled from the otu_table slot!

Value

phyloseq

Examples

library(phyloseq)
data("dietswap", package = "microbiome")

dietswap %>%
  sample_data() %>%
  head(8)

new_order <- rev(sample_names(dietswap))
dietswap %>%
  ps_reorder(new_order) %>%
  sample_data() %>%
  head(8)

# random ordering with numbers
set.seed(1000)
random_order <- sample(1:nsamples(dietswap))
dietswap %>%
  ps_reorder(random_order) %>%
  sample_data() %>%
  head(8)
library(phyloseq)
data("dietswap", package = "microbiome")

dietswap %>%
  sample_data() %>%
  head(8)

new_order <- rev(sample_names(dietswap))
dietswap %>%
  ps_reorder(new_order) %>%
  sample_data() %>%
  head(8)

# random ordering with numbers
set.seed(1000)
random_order <- sample(1:nsamples(dietswap))
dietswap %>%
  ps_reorder(random_order) %>%
  sample_data() %>%
  head(8)

Select phyloseq sample_data using dplyr::select syntax

Description

Simple selection of phyloseq sample_data variables, might be useful for printing reduced sample_data, or inside other functions

Usage

ps_select(ps, ...)
ps_select(ps, ...)

Arguments

`ps`	phyloseq with sample_data
`...`	passed straight to dplyr::select

Value

phyloseq object

Examples

library(phyloseq)
library(dplyr)
data("enterotype", package = "phyloseq")

head(sample_data(enterotype))

enterotype %>%
  ps_select(!contains("Sample")) %>%
  sample_data() %>%
  head()
library(phyloseq)
library(dplyr)
data("enterotype", package = "phyloseq")

head(sample_data(enterotype))

enterotype %>%
  ps_select(!contains("Sample")) %>%
  sample_data() %>%
  head()

Arrange samples in a phyloseq by microbiome similarity

Description

Uses seriation methods from seriation::seriate and often dist_calc (depending on if seriation method requires a distance matrix)

Usage

ps_seriate(
  ps,
  method = "OLO_ward",
  dist = "bray",
  tax_transform = "identity",
  add_variable = FALSE,
  rank = NA
)
ps_seriate(
  ps,
  method = "OLO_ward",
  dist = "bray",
  tax_transform = "identity",
  add_variable = FALSE,
  rank = NA
)

Arguments

`ps`	phyloseq object
`method`	seriation method for ordering samples, from seriation::seriate
`dist`	distance method for dist_calc (only used if required for particular seriation method!)
`tax_transform`	transformation to apply before seriation or any distance calculation
`add_variable`	add a variable to the sample data indicating seriation order
`rank`	taxonomic rank to aggregate at, before seriation, NA for no aggregation

Value

phyloseq

Examples

library(phyloseq)
data("dietswap", package = "microbiome")

dietswap %>%
  sample_data() %>%
  head(8)

dietswap %>%
  tax_agg("Genus") %>%
  ps_get() %>%
  ps_seriate(method = "OLO_ward", dist = "bray") %>%
  sample_data() %>%
  head(8)
library(phyloseq)
data("dietswap", package = "microbiome")

dietswap %>%
  sample_data() %>%
  head(8)

dietswap %>%
  tax_agg("Genus") %>%
  ps_get() %>%
  ps_seriate(method = "OLO_ward", dist = "bray") %>%
  sample_data() %>%
  head(8)

Sort phyloseq samples by ordination axes scores

Description

ps_sort_ord reorders samples in a phyloseq object based on their relative position on 1 or 2 ordination axes.

ord_order_samples gets the sample_names in order from the ordination contained in a psExtra. This is used internally by ps_sort_ord

If 2 axes given, the samples are sorted by anticlockwise rotation around the selected ordination axes, starting on first axis given, upper right quadrant. (This is used by ord_plot_iris.)

If 1 axis is given, samples are sorted by increasing value order along this axis. This could be used to arrange samples on a rectangular barplot in order of appearance along a parallel axis of a paired ordination plot.

Usage

ps_sort_ord(ps, ord, axes = 1:2, scaling = 2)

ord_order_samples(ord, axes = 1:2, scaling = 2)
ps_sort_ord(ps, ord, axes = 1:2, scaling = 2)

ord_order_samples(ord, axes = 1:2, scaling = 2)

Arguments

`ps`	phyloseq object to be sorted
`ord`	psExtra with ordination object
`axes`	which axes to use for sorting? numerical vector of length 1 or 2
`scaling`	Type 2, or type 1 scaling. For more info, see https://sites.google.com/site/mb3gustame/constrained-analyses/redundancy-analysis. Either "species" or "site" scores are scaled by (proportional) eigenvalues, and the other set of scores is left unscaled (from ?vegan::scores.cca)

Value

ps_sort_ord returns a phyloseq

ord_order_samples returns a character vector

Examples

# attach other necessary packages
library(ggplot2)

# example data
ibd <- microViz::ibd %>%
  tax_filter(min_prevalence = 2) %>%
  tax_fix() %>%
  phyloseq_validate()

# create numeric variables for constrained ordination
ibd <- ibd %>%
  ps_mutate(
    ibd = as.numeric(ibd == "ibd"),
    steroids = as.numeric(steroids == "steroids"),
    abx = as.numeric(abx == "abx"),
    female = as.numeric(gender == "female"),
    # and make a shorter ID variable
    id = stringr::str_remove_all(sample, "^[0]{1,2}|-[A-Z]+")
  )

# create an ordination
ordi <- ibd %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc()

ord_order_samples(ordi, axes = 1) %>% head(8)
ps_sort_ord(ibd, ordi, axes = 1) %>%
  phyloseq::sample_names() %>%
  head(8)

p1 <- ord_plot(ordi, colour = "grey90", plot_taxa = 1:8, tax_vec_length = 1) +
  geom_text(aes(label = id), size = 2.5, colour = "red")

b1 <- ibd %>%
  ps_sort_ord(ord = ordi, axes = 1) %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12, label = "id",
    order_taxa = ord_order_taxa(ordi, axes = 1),
    sample_order = "asis"
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

patchwork::wrap_plots(p1, b1, ncol = 1)

# constrained ordination example (and match vertical axis) #
cordi <- ibd %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(
    constraints = c("steroids", "abx", "ibd"), conditions = "female",
    scale_cc = FALSE
  )

cordi %>% ord_plot(plot_taxa = 1:6, axes = 2:1)
# attach other necessary packages
library(ggplot2)

# example data
ibd <- microViz::ibd %>%
  tax_filter(min_prevalence = 2) %>%
  tax_fix() %>%
  phyloseq_validate()

# create numeric variables for constrained ordination
ibd <- ibd %>%
  ps_mutate(
    ibd = as.numeric(ibd == "ibd"),
    steroids = as.numeric(steroids == "steroids"),
    abx = as.numeric(abx == "abx"),
    female = as.numeric(gender == "female"),
    # and make a shorter ID variable
    id = stringr::str_remove_all(sample, "^[0]{1,2}|-[A-Z]+")
  )

# create an ordination
ordi <- ibd %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc()

ord_order_samples(ordi, axes = 1) %>% head(8)
ps_sort_ord(ibd, ordi, axes = 1) %>%
  phyloseq::sample_names() %>%
  head(8)

p1 <- ord_plot(ordi, colour = "grey90", plot_taxa = 1:8, tax_vec_length = 1) +
  geom_text(aes(label = id), size = 2.5, colour = "red")

b1 <- ibd %>%
  ps_sort_ord(ord = ordi, axes = 1) %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12, label = "id",
    order_taxa = ord_order_taxa(ordi, axes = 1),
    sample_order = "asis"
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

patchwork::wrap_plots(p1, b1, ncol = 1)

# constrained ordination example (and match vertical axis) #
cordi <- ibd %>%
  tax_transform("clr", rank = "Genus") %>%
  ord_calc(
    constraints = c("steroids", "abx", "ibd"), conditions = "female",
    scale_cc = FALSE
  )

cordi %>% ord_plot(plot_taxa = 1:6, axes = 2:1)

Extract elements from psExtra class

Description

ps_get returns phyloseq
info_get returns psExtraInfo object
dist_get returns distance matrix (or NULL)
ord_get returns ordination object (or NULL)
perm_get returns adonis2() permanova model (or NULL)
bdisp_get returns results of betadisper() (or NULL)
otu_get returns phyloseq otu_table matrix with taxa as columns
tt_get returns phyloseq tax_table
tax_models_get returns list generated by tax_model or NULL
tax_stats_get returns dataframe generated by tax_models2stats or NULL
taxatree_models_get returns list generated by taxatree_models or NULL
taxatree_stats_get returns dataframe generated by taxatree_models2stats or NULL
samdat_tbl returns phyloseq sample_data as a tibble with sample_names as new first column called .sample_name

Usage

ps_get(psExtra, ps_extra, counts = FALSE, warn = TRUE)

dist_get(psExtra, ps_extra)

ord_get(psExtra, ps_extra)

info_get(psExtra, ps_extra)

perm_get(psExtra, ps_extra)

bdisp_get(psExtra, ps_extra)

tax_models_get(psExtra)

tax_stats_get(psExtra)

taxatree_models_get(psExtra)

taxatree_stats_get(psExtra)

otu_get(data, taxa = NA, samples = NA, counts = FALSE, warn = TRUE)

tt_get(data)

samdat_tbl(data, sample_names_col = ".sample_name")
ps_get(psExtra, ps_extra, counts = FALSE, warn = TRUE)

dist_get(psExtra, ps_extra)

ord_get(psExtra, ps_extra)

info_get(psExtra, ps_extra)

perm_get(psExtra, ps_extra)

bdisp_get(psExtra, ps_extra)

tax_models_get(psExtra)

tax_stats_get(psExtra)

taxatree_models_get(psExtra)

taxatree_stats_get(psExtra)

otu_get(data, taxa = NA, samples = NA, counts = FALSE, warn = TRUE)

tt_get(data)

samdat_tbl(data, sample_names_col = ".sample_name")

Arguments

`psExtra`	psExtra S4 class object
`ps_extra`	deprecated! don't use this
`counts`	should ps_get or otu_get attempt to return counts? if present in object
`warn`	if counts = TRUE, should a warning be emitted if counts are not available? set warn = "error" to stop if counts are not available
`data`	phyloseq or psExtra
`taxa`	subset of taxa to return, NA for all (default)
`samples`	subset of samples to return, NA for all (default)
`sample_names_col`	name of column where sample_names are put. if NA, return data.frame with rownames (sample_names)

Value

element(s) from psExtra object (or NULL)

Examples

data("dietswap", package = "microbiome")

psx <- tax_transform(dietswap, "compositional", rank = "Genus")

psx

ps_get(psx)

ps_get(psx, counts = TRUE)

info_get(psx)

dist_get(psx) # this psExtra has no dist_calc result

ord_get(psx) # this psExtra has no ord_calc result

perm_get(psx) # this psExtra has no dist_permanova result

bdisp_get(psx) # this psExtra has no dist_bdisp result

# these can be returned from phyloseq objects too
otu_get(psx, taxa = 6:9, samples = c("Sample-9", "Sample-1", "Sample-6"))

otu_get(psx, taxa = 6:9, samples = c(9, 1, 6), counts = TRUE)

tt_get(psx) %>% head()

samdat_tbl(psx)

samdat_tbl(psx, sample_names_col = "SAMPLENAME")
data("dietswap", package = "microbiome")

psx <- tax_transform(dietswap, "compositional", rank = "Genus")

psx

ps_get(psx)

ps_get(psx, counts = TRUE)

info_get(psx)

dist_get(psx) # this psExtra has no dist_calc result

ord_get(psx) # this psExtra has no ord_calc result

perm_get(psx) # this psExtra has no dist_permanova result

bdisp_get(psx) # this psExtra has no dist_bdisp result

# these can be returned from phyloseq objects too
otu_get(psx, taxa = 6:9, samples = c("Sample-9", "Sample-1", "Sample-6"))

otu_get(psx, taxa = 6:9, samples = c(9, 1, 6), counts = TRUE)

tt_get(psx) %>% head()

samdat_tbl(psx)

samdat_tbl(psx, sample_names_col = "SAMPLENAME")

Define psExtra class S4 object

Description

Define psExtra class S4 object

Slots

info: list.
counts: otu_table.
dist: dist.
ord: ANY.
permanova: ANY.
bdisp: ANY.
taxatree_models: list.
taxatree_stats: data.frame.
tax_models: list.
tax_stats: data.frame.

Examples

library(phyloseq)
data("shao19")

ps <- shao19 %>% ps_filter(infant_age == 12)
ps %>% tax_agg("genus")
library(phyloseq)
data("shao19")

ps <- shao19 %>% ps_filter(infant_age == 12)
ps %>% tax_agg("genus")

Helper to specify a HeatmapAnnotation for samples in comp_heatmap

Description

Helper to specify a HeatmapAnnotation for samples in comp_heatmap

Usage

sampleAnnotation(
  ...,
  name,
  annotation_legend_param = list(),
  show_legend = TRUE,
  gp = grid::gpar(col = NA),
  border = FALSE,
  gap = grid::unit(2, "mm"),
  show_annotation_name = TRUE,
  annotation_label = NULL,
  annotation_name_gp = grid::gpar(),
  annotation_name_offset = NULL,
  annotation_name_rot = NULL,
  annotation_name_align = FALSE,
  annotation_name_side = "auto",
  .data = NULL,
  .samples = NULL,
  .side = NULL
)
sampleAnnotation(
  ...,
  name,
  annotation_legend_param = list(),
  show_legend = TRUE,
  gp = grid::gpar(col = NA),
  border = FALSE,
  gap = grid::unit(2, "mm"),
  show_annotation_name = TRUE,
  annotation_label = NULL,
  annotation_name_gp = grid::gpar(),
  annotation_name_offset = NULL,
  annotation_name_rot = NULL,
  annotation_name_align = FALSE,
  annotation_name_side = "auto",
  .data = NULL,
  .samples = NULL,
  .side = NULL
)

Arguments

`...`	Name-value pairs where the names correspond to annotation names and values are the output of sample annotation functions such as anno_sample(), or manually specified AnnotationFunction objects
`name`	Name of the heatmap annotation, optional.
`annotation_legend_param`	A list which contains parameters for annotation legends. See `color_mapping_legend,ColorMapping-method` for all possible options.
`show_legend`	Whether show annotation legends. The value can be one single value or a vector.
`gp`	Graphic parameters for simple annotations (with `fill` parameter ignored).
`border`	border of single annotations.
`gap`	Gap between annotations. It can be a single value or a vector of `unit` objects.
`show_annotation_name`	Whether show annotation names? For column annotation, annotation names are drawn either on the left or the right, and for row annotations, names are draw either on top or at the bottom. The value can be a vector.
`annotation_label`	Labels for the annotations. By default it is the same as individual annotation names.
`annotation_name_gp`	Graphic parameters for annotation names. Graphic parameters can be vectors.
`annotation_name_offset`	Offset to the annotation names, a `unit` object. The value can be a vector.
`annotation_name_rot`	Rotation of the annotation names. The value can be a vector.
`annotation_name_align`	Whether to align the annotation names.
`annotation_name_side`	Side of the annotation names.
`.data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`.samples`	OPTIONAL selection vector of sample names, only set this if providing .data argument to override default
`.side`	OPTIONAL string, indicating the side for the variable annotations: only set this to override default

Value

HeatmapAnnotation object

Examples

library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")
samples <- phyloseq::sample_names(psq)

set.seed(42) # random colours used in first example
# sampleAnnotation returns a function that takes data, samples, and which
fun <- sampleAnnotation(
  gap = grid::unit(2.5, "mm"),
  Dis1 = anno_sample(var = "DiseaseState"),
  IBD = anno_sample_cat(var = "ibd"),
  Dis2 = anno_sample_cat(var = "DiseaseState", col = 1:4)
)

# manually specify the sample annotation function by giving it data etc.
heatmapAnnoFunction <- fun(.data = psq, .side = "top", .samples = samples)

# draw the annotation without a heatmap, you will never normally do this!
grid.newpage()
vp <- viewport(width = 0.65, height = 0.75)
pushViewport(vp)
draw(heatmapAnnoFunction)
pushViewport(viewport(x = 0.7, y = 0.6))
draw(attr(heatmapAnnoFunction, "Legends"))
library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")
samples <- phyloseq::sample_names(psq)

set.seed(42) # random colours used in first example
# sampleAnnotation returns a function that takes data, samples, and which
fun <- sampleAnnotation(
  gap = grid::unit(2.5, "mm"),
  Dis1 = anno_sample(var = "DiseaseState"),
  IBD = anno_sample_cat(var = "ibd"),
  Dis2 = anno_sample_cat(var = "DiseaseState", col = 1:4)
)

# manually specify the sample annotation function by giving it data etc.
heatmapAnnoFunction <- fun(.data = psq, .side = "top", .samples = samples)

# draw the annotation without a heatmap, you will never normally do this!
grid.newpage()
vp <- viewport(width = 0.65, height = 0.75)
pushViewport(vp)
draw(heatmapAnnoFunction)
pushViewport(viewport(x = 0.7, y = 0.6))
draw(attr(heatmapAnnoFunction, "Legends"))

Filled shapes for ggiraph interactive plots

Description

Generates a custom ggplot2 shape scale, as used in ord_explore's ordination. Uses filled shapes, therefore fill aesthetic must be set, in addition to colour, to have filled shapes. Points with NA values for the shape variable are shown as hollow circles.

Usage

scale_shape_girafe_filled()
scale_shape_girafe_filled()

Details

Composite shapes e.g. number 7 "square cross" cause ggiraph interactive plots to fail when a variable shape and tooltip is set.

Shapes used are, in order: "circle filled", "triangle filled", "square filled", "diamond filled", and "triangle down filled"

Value

ggplot2 Scale object

Examples

microViz::ibd %>%
  tax_fix() %>%
  phyloseq_validate() %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc(
    method = "PCA"
  ) %>%
  ord_plot(
    axes = c(1, 2),
    plot_taxa = 1:6,
    colour = "DiseaseState", fill = "DiseaseState",
    shape = "circle", alpha = 0.5,
    size = 3
  ) +
  scale_shape_girafe_filled()
microViz::ibd %>%
  tax_fix() %>%
  phyloseq_validate() %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc(
    method = "PCA"
  ) %>%
  ord_plot(
    axes = c(1, 2),
    plot_taxa = 1:6,
    colour = "DiseaseState", fill = "DiseaseState",
    shape = "circle", alpha = 0.5,
    size = 3
  ) +
  scale_shape_girafe_filled()

Gut microbiota relative abundance data from Shao et al. 2019

Description

A phyloseq dataset containing 1644 gut microbiome samples. Fecal samples were collected from 596 infants. 175 mothers also provided fecal samples.

Usage

shao19
shao19

Format

large phyloseq object

Data acquisition

The processed (metaphlan3) relative abundance data were obtained using curatedMetagenomicData package version 3.4.2. https://waldronlab.io/curatedMetagenomicData/articles/available-studies.html

A small amount of data cleaning was performed after download:

to make sample metadata more user friendly
to fill gaps in the taxonomy table.

Source

Stunted microbiota and opportunistic pathogen colonization in caesarean-section birth. doi:10.1038/s41586-019-1560-1

Draw convex hull for a set of points on a ggplot

Description

Draws a (convex) polygon around the outermost points of a set of points. Useful as a visual aid for identifying groups of points on a scatterplot, such as an ordination plot.

Usage

stat_chull(
  mapping = NULL,
  data = NULL,
  geom = "polygonHollow",
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  ...
)
stat_chull(
  mapping = NULL,
  data = NULL,
  geom = "polygonHollow",
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  ...
)

Arguments

`mapping`	Set of aesthetic mappings created by `aes()`. If specified and `inherit.aes = TRUE` (the default), it is combined with the default mapping at the top level of the plot. You must supply `mapping` if there is no plot mapping.
`data`	The data to be displayed in this layer. There are three options: If `NULL`, the default, the data is inherited from the plot data as specified in the call to `ggplot()`. A `data.frame`, or other object, will override the plot data. All objects will be fortified to produce a data frame. See `fortify()` for which variables will be created. A `function` will be called with a single argument, the plot data. The return value must be a `data.frame`, and will be used as the layer data. A `function` can be created from a `formula` (e.g. `~ head(.x, 10)`).
`geom`	The geometric object to use to display the data for this layer. When using a `⁠stat_*()⁠` function to construct a layer, the `geom` argument can be used to override the default coupling between stats and geoms. The `geom` argument accepts the following: A `Geom` ggproto subclass, for example `GeomPoint`. A string naming the geom. To give the geom as a string, strip the function name of the `geom_` prefix. For example, to use `geom_point()`, give the geom as `"point"`. For more information and other ways to specify the geom, see the layer geom documentation.
`position`	A position adjustment to use on the data for this layer. This can be used in various ways, including to prevent overplotting and improving the display. The `position` argument accepts the following: The result of calling a position function, such as `position_jitter()`. This method allows for passing extra arguments to the position. A string naming the position adjustment. To give the position as a string, strip the function name of the `position_` prefix. For example, to use `position_jitter()`, give the position as `"jitter"`. For more information and other ways to specify the position, see the layer position documentation.
`na.rm`	If `FALSE`, the default, missing values are removed with a warning. If `TRUE`, missing values are silently removed.
`show.legend`	logical. Should this layer be included in the legends? `NA`, the default, includes if any aesthetics are mapped. `FALSE` never includes, and `TRUE` always includes. It can also be a named logical vector to finely select the aesthetics to display.
`inherit.aes`	If `FALSE`, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. `borders()`.
`...`	Other arguments passed on to `layer()`'s `params` argument. These arguments broadly fall into one of 4 categories below. Notably, further arguments to the `position` argument, or aesthetics that are required can not be passed through `...`. Unknown arguments that are not part of the 4 categories below are ignored. Static aesthetics that are not mapped to a scale, but are at a fixed value and apply to the layer as a whole. For example, `colour = "red"` or `linewidth = 3`. The geom's documentation has an Aesthetics section that lists the available options. The 'required' aesthetics cannot be passed on to the `params`. Please note that while passing unmapped aesthetics as vectors is technically possible, the order and required length is not guaranteed to be parallel to the input data. When constructing a layer using a `⁠stat_()⁠` function, the `...` argument can be used to pass on parameters to the `geom` part of the layer. An example of this is `stat_density(geom = "area", outline.type = "both")`. The geom's documentation lists which parameters it can accept. Inversely, when constructing a layer using a `⁠geom_()⁠` function, the `...` argument can be used to pass on parameters to the `stat` part of the layer. An example of this is `geom_area(stat = "density", adjust = 0.5)`. The stat's documentation lists which parameters it can accept. The `key_glyph` argument of `layer()` may also be passed on through `...`. This can be one of the functions described as key glyphs, to change the display of the layer in the legend.

Details

This is a ggplot2 extension - slightly modified from the original code found here:

https://CRAN.r-project.org/package=ggplot2/vignettes/extending-ggplot2.html

Examples

library(ggplot2)
microViz::ibd %>%
  tax_fix() %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(colour = "DiseaseState", shape = "DiseaseState", alpha = 0.5) +
  stat_chull(aes(colour = DiseaseState))

microViz::ibd %>%
  tax_fix() %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(colour = "DiseaseState", shape = "DiseaseState", alpha = 0.5) +
  stat_chull(aes(colour = DiseaseState, fill = DiseaseState), alpha = 0.1)
library(ggplot2)
microViz::ibd %>%
  tax_fix() %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(colour = "DiseaseState", shape = "DiseaseState", alpha = 0.5) +
  stat_chull(aes(colour = DiseaseState))

microViz::ibd %>%
  tax_fix() %>%
  tax_transform(rank = "Genus", trans = "clr") %>%
  ord_calc(method = "PCA") %>%
  ord_plot(colour = "DiseaseState", shape = "DiseaseState", alpha = 0.5) +
  stat_chull(aes(colour = DiseaseState, fill = DiseaseState), alpha = 0.1)

Aggregate taxa and track aggregation in psExtra

Description

tax_agg sums the abundances of the phyloseq taxa at the given rank. It records the tax_agg rank argument in the info of the psExtra object output. This psExtra object tracks aggregation, and any further transformations and scaling, to help you keep track of what you have done with your phyloseq object and automatically caption ordination plots.

Instead of tax_agg, consider using tax_transform() with a rank argument instead, to both aggregate and transform the taxa. This is also useful when you want to aggregate but not transform the taxa, and yet still log the "identity" transformation in psExtra for captioning your ordination plots. e.g. tax_transform(rank = "Genus", trans = "identity")

tax_agg allows you to pass NA or "unique" to the rank argument which will NOT aggregate the taxa. If you use rank = "unique" or add_unique = TRUE, it will add a new rank called unique, identical to the taxa_names (after any aggregation)

Be aware: you should not use the top_N argument yourself without good reason. top_N provides a feature inspired by the deprecated microbiome function aggregate_top_taxa which is primarily useful for decluttering compositional barplots. microViz comp_barplot (and ord_plot_iris) already run tax_agg with a top_N argument for you, so you should not. The tax_table produced when using top_N is otherwise INVALID FOR MOST OTHER ANALYSES.

Usage

tax_agg(
  ps,
  rank = NA,
  sort_by = NA,
  top_N = NA,
  force = FALSE,
  add_unique = FALSE
)
tax_agg(
  ps,
  rank = NA,
  sort_by = NA,
  top_N = NA,
  force = FALSE,
  add_unique = FALSE
)

Arguments

`ps`	phyloseq object
`rank`	NA (for tax_names level) or name of valid taxonomic rank (try phyloseq::rank_names(ps)) or "unique"
`sort_by`	if not NA, how should the taxa be sorted, uses tax_sort(), takes same options as `by` arg
`top_N`	NA does nothing, but if top_N is a number, it creates an extra tax_table column called top, which is the same as the unique column for the first top_N number of taxa, and "other" otherwise.
`force`	If TRUE, this forces aggregation at chosen rank to occur regardless of if the output will be sensible! This avoids the "Taxa not unique at rank: ..." error, but may allow very inappropriate aggregation to occur. Do not use force = TRUE unless you know why you are doing this, and what the result will be. If you are getting an error with force = FALSE, it is almost certainly better to examine the tax_table and fix the problem. force = TRUE is similar to microbiome::aggregate_taxa, which also does not check that the taxa are uniquely defined by only the aggregation level.
`add_unique`	if TRUE, adds a rank named unique, identical to the rownames after aggregation

Details

This function is inspired by microbiome::aggregate_taxa. However if microbiome::aggregate_taxa is used, microViz cannot track this aggregation.

Comparing aggregate_taxa and tax_agg:

Except for the ordering of taxa, and the addition of a "unique" rank being optional (in tax_agg), the resulting phyloseq objects are identical for aggregating a phyloseq with no ambiguous taxa. Taxa are ambiguous when the tax_table converges at a lower rank after branching, such as if two different genera share the same species (e.g. "s__"). microbiome::aggregate_taxa handles ambiguous taxa by creating a "unique" rank with all of the taxonomic rank info pasted together into one, often very long, name. tax_agg throws an error, and directs the user to tax_fix() to fix the ambiguous taxa before aggregation, which should then result in (much) shorter unique names at the aggregation rank.

Value

psExtra object including phyloseq and tax_agg rank info

Examples

library(phyloseq)
library(dplyr)
data("dietswap", package = "microbiome")

tax_agg(ps = dietswap, "Phylum") %>%
  ps_get() %>%
  tax_table()
tax_agg(ps = dietswap, "Family") %>%
  ps_get() %>%
  tax_table()

# create some missing values
tax_table(dietswap)[3:7, "Genus"] <- "g__"

# this will produce an error, instructing the user to use tax_fix
# tax_agg(ps = dietswap, "Genus")

# this will then work:
dietswap %>%
  tax_fix() %>%
  tax_agg("Genus")

# you can replace unknown values with `tax_fix()`
# which will fix most problems, like the common "g__" and "s__"
# but default tax_fix settings won't catch this long unknown
tax_table(dietswap)[13:17, "Family"] <- "some_unknown_family"
dietswap %>%
  tax_fix(unknowns = "some_unknown_family") %>%
  tax_agg("Family")

# try tax_fix_interactive() to help you find and fix all the uninformative
# and converging values in your taxonomy table.

# the code below won't aggregate taxa,
# but just adds a new rank called unique, equal to taxa_names
tax_agg(ps = dietswap, rank = NA, add_unique = TRUE)
identical(tax_agg(dietswap, NA, add_unique = TRUE), tax_agg(dietswap, "unique")) # TRUE
library(phyloseq)
library(dplyr)
data("dietswap", package = "microbiome")

tax_agg(ps = dietswap, "Phylum") %>%
  ps_get() %>%
  tax_table()
tax_agg(ps = dietswap, "Family") %>%
  ps_get() %>%
  tax_table()

# create some missing values
tax_table(dietswap)[3:7, "Genus"] <- "g__"

# this will produce an error, instructing the user to use tax_fix
# tax_agg(ps = dietswap, "Genus")

# this will then work:
dietswap %>%
  tax_fix() %>%
  tax_agg("Genus")

# you can replace unknown values with `tax_fix()`
# which will fix most problems, like the common "g__" and "s__"
# but default tax_fix settings won't catch this long unknown
tax_table(dietswap)[13:17, "Family"] <- "some_unknown_family"
dietswap %>%
  tax_fix(unknowns = "some_unknown_family") %>%
  tax_agg("Family")

# try tax_fix_interactive() to help you find and fix all the uninformative
# and converging values in your taxonomy table.

# the code below won't aggregate taxa,
# but just adds a new rank called unique, equal to taxa_names
tax_agg(ps = dietswap, rank = NA, add_unique = TRUE)
identical(tax_agg(dietswap, NA, add_unique = TRUE), tax_agg(dietswap, "unique")) # TRUE

Filter rare and/or low abundance taxa from a phyloseq object

Description

Removes taxa (from all samples) that do not meet a given criterion or combination of criteria. If a value for min_prevalence, min_total_abundance or min_sample_abundance is 1 or greater, then it is treated as an absolute minimum number of samples/reads. If <1, it is treated as proportion of all samples/reads. This function is designed to work with counts. otu_table must contain counts particularly if you want to set a non-zero value for min_total_abundance.

Usage

tax_filter(
  ps,
  min_prevalence = 1,
  prev_detection_threshold = 1,
  min_total_abundance = 0,
  min_sample_abundance = 0,
  tax_level = NA,
  names_only = FALSE,
  use_counts = TRUE,
  undetected = NULL,
  verbose = TRUE
)
tax_filter(
  ps,
  min_prevalence = 1,
  prev_detection_threshold = 1,
  min_total_abundance = 0,
  min_sample_abundance = 0,
  tax_level = NA,
  names_only = FALSE,
  use_counts = TRUE,
  undetected = NULL,
  verbose = TRUE
)

Arguments

`ps`	phyloseq or psExtra (ideally with count data available)
`min_prevalence`	number or proportion of samples that a taxon must be present in
`prev_detection_threshold`	min required counts (or value) for a taxon to be considered present in that sample (or set undetected arg)
`min_total_abundance`	minimum total readcount of a taxon, summed across all samples (can be proportion of all counts)
`min_sample_abundance`	taxa must have at least this many reads in one or more samples
`tax_level`	if given, aggregates data at named taxonomic rank before filtering, but returns phyloseq at the ORIGINAL level of aggregation!
`names_only`	if names_only is true return only names of taxa, not the phyloseq
`use_counts`	expect count data in phyloseq otu_table? default is TRUE
`undetected`	e.g. 0, value at (or below) which a taxon is considered not present in that sample. If set, this overrides prev_detection_threshold.
`verbose`	message about proportional prevalence calculations?

Value

filtered phyloseq object AT ORIGINAL LEVEL OF AGGREGATION (not at the level in tax_level)

Examples

data("dietswap", package = "microbiome")
# Dropping rare and low abundance taxa #
# Filter at unique taxa level, keeping only those with a prevalence of 10% or more
# and at least 10 thousand reads when summed across all samples.
# Then aggregate to Family level taxonomy.
dietswap %>%
  tax_filter(min_prevalence = 0.1, min_total_abundance = 10000) %>%
  tax_agg("Family")

# Keeping ubiquitous families #
# keep only families that have at least 1000 counts present in 90% of samples
# then aggregate the remaining taxa at 'Genus' level
dietswap %>%
  tax_filter(
    tax_level = "Family", min_prevalence = 0.90,
    prev_detection_threshold = 1000
  ) %>%
  tax_agg("Genus")
data("dietswap", package = "microbiome")
# Dropping rare and low abundance taxa #
# Filter at unique taxa level, keeping only those with a prevalence of 10% or more
# and at least 10 thousand reads when summed across all samples.
# Then aggregate to Family level taxonomy.
dietswap %>%
  tax_filter(min_prevalence = 0.1, min_total_abundance = 10000) %>%
  tax_agg("Family")

# Keeping ubiquitous families #
# keep only families that have at least 1000 counts present in 90% of samples
# then aggregate the remaining taxa at 'Genus' level
dietswap %>%
  tax_filter(
    tax_level = "Family", min_prevalence = 0.90,
    prev_detection_threshold = 1000
  ) %>%
  tax_agg("Genus")

Replace unknown, NA, or short tax_table values

Description

Identifies phyloseq tax_table values as unknown or uninformative and replaces them with the first informative value from a higher taxonomic rank.

Short values in phyloseq tax_table are typically empty strings or " ", or "g__" etc. so it is helpful to replace them. (If this is unwanted: set min_length = 0 to avoid filtering on length.)
Values in unknowns are also removed, even if longer than min_length. It is up to the user to specify sensible values in unknowns if their dataset has other unwanted values.
NA values are also replaced.

See this article for an extended discussion of tax_table fixing. https://david-barnett.github.io/microViz/articles/web-only/tax-fixing.html

Usage

tax_fix(
  ps,
  min_length = 4,
  unknowns = NA,
  suffix_rank = "classified",
  sep = " ",
  anon_unique = TRUE,
  verbose = TRUE
)
tax_fix(
  ps,
  min_length = 4,
  unknowns = NA,
  suffix_rank = "classified",
  sep = " ",
  anon_unique = TRUE,
  verbose = TRUE
)

Arguments

`ps`	phyloseq or tax_table (taxonomyTable)
`min_length`	replace strings shorter than this, must be integer > 0
`unknowns`	also replace strings matching any in this vector, NA default vector shown in details!
`suffix_rank`	"classified" (default) or "current", when replacing an entry, should the suffix be taken from the lowest classified rank for that taxon, "classified", or the "current" unclassified rank?
`sep`	character(s) separating new name and taxonomic rank level suffix (see suffix_rank)
`anon_unique`	make anonymous taxa unique by replacing unknowns with taxa_name? otherwise they are replaced with paste("unknown", first_rank_name), which is therefore the same for every anonymous taxon, meaning they will be merged if tax_agg is used. (anonymous taxa are taxa with all unknown values in their tax_table row, i.e. cannot be classified even at highest rank available)
`verbose`	emit warnings when cannot replace with informative name?

Details

By default (unknowns = NA), unknowns is set to a vector containing:

's__' 'g__' 'f__' 'o__' 'c__' 'p__' 'k__' 'S__' 'G__' 'F__' 'O__' 'C__' 'P__' 'K__' 'NA' 'NaN' ' ' ” 'unknown' 'Unknown' 's__unknown' 's__Unknown' 's__NA' 'g__unknown' 'g__Unknown' 'g__NA' 'f__unknown' 'f__Unknown' 'f__NA' 'o__unknown' 'o__Unknown' 'o__NA' 'c__unknown' 'c__Unknown' 'c__NA' 'p__unknown' 'p__Unknown' 'p__NA' 'k__unknown' 'k__Unknown' 'k__NA' 'S__unknown' 'S__Unknown' 'S__NA' 'G__unknown' 'G__Unknown' 'G__NA' 'F__unknown' 'F__Unknown' 'F__NA' 'O__unknown' 'O__Unknown' 'O__NA' 'C__unknown' 'C__Unknown' 'C__NA' 'P__unknown' 'P__Unknown' 'P__NA' 'K__unknown' 'K__Unknown' 'K__NA'

Value

object same class as ps

Examples

library(dplyr)
library(phyloseq)

data(dietswap, package = "microbiome")
ps <- dietswap

# create unknowns to test filling
tt <- tax_table(ps)
ntax <- ntaxa(ps)
set.seed(123)
g <- sample(1:ntax, 30)
f <- sample(g, 10)
p <- sample(f, 3)
tt[g, 3] <- "g__"
tt[f, 2] <- "f__"
tt[p, 1] <- "p__"
tt[sample(1:ntax, 10), 3] <- "unknown"
# create a row with only NAs
tt[1, ] <- NA

tax_table(ps) <- tax_table(tt)

ps
# tax_fix with defaults should solve most problems
tax_table(ps) %>% head(50)

# this will replace "unknown"s as well as short values including "g__" and "f__"
tax_fix(ps) %>%
  tax_table() %>%
  head(50)

# This will only replace short entries, and so won't replace literal "unknown" values
ps %>%
  tax_fix(unknowns = NULL) %>%
  tax_table() %>%
  head(50)

# Change rank suffix and separator settings
tax_fix(ps, suffix_rank = "current", sep = " - ") %>%
  tax_table() %>%
  head(50)

# by default, completely unclassified (anonymous) taxa are named by their
# taxa_names / rownames at all ranks.
# This makes anonymous taxa distinct from each other,
# and so they won't be merged on aggregation with tax_agg.
# If you think your anonymous taxa should merge on tax_agg,
# or you just want them to be named the all same for another reason,
# set anon_unique = FALSE (compare the warning messages)
tax_fix(ps, anon_unique = FALSE)
tax_fix(ps, anon_unique = TRUE)

# here's a larger example tax_table shows its still fast with 1000s rows,
# from microbiomeutilities package
# library(microbiomeutilities)
# data("hmp2")
# system.time(tax_fix(hmp2, min_length = 1))
library(dplyr)
library(phyloseq)

data(dietswap, package = "microbiome")
ps <- dietswap

# create unknowns to test filling
tt <- tax_table(ps)
ntax <- ntaxa(ps)
set.seed(123)
g <- sample(1:ntax, 30)
f <- sample(g, 10)
p <- sample(f, 3)
tt[g, 3] <- "g__"
tt[f, 2] <- "f__"
tt[p, 1] <- "p__"
tt[sample(1:ntax, 10), 3] <- "unknown"
# create a row with only NAs
tt[1, ] <- NA

tax_table(ps) <- tax_table(tt)

ps
# tax_fix with defaults should solve most problems
tax_table(ps) %>% head(50)

# this will replace "unknown"s as well as short values including "g__" and "f__"
tax_fix(ps) %>%
  tax_table() %>%
  head(50)

# This will only replace short entries, and so won't replace literal "unknown" values
ps %>%
  tax_fix(unknowns = NULL) %>%
  tax_table() %>%
  head(50)

# Change rank suffix and separator settings
tax_fix(ps, suffix_rank = "current", sep = " - ") %>%
  tax_table() %>%
  head(50)

# by default, completely unclassified (anonymous) taxa are named by their
# taxa_names / rownames at all ranks.
# This makes anonymous taxa distinct from each other,
# and so they won't be merged on aggregation with tax_agg.
# If you think your anonymous taxa should merge on tax_agg,
# or you just want them to be named the all same for another reason,
# set anon_unique = FALSE (compare the warning messages)
tax_fix(ps, anon_unique = FALSE)
tax_fix(ps, anon_unique = TRUE)

# here's a larger example tax_table shows its still fast with 1000s rows,
# from microbiomeutilities package
# library(microbiomeutilities)
# data("hmp2")
# system.time(tax_fix(hmp2, min_length = 1))

Shiny app to help you use tax_fix

Description

Try this app if you get errors with tax_fix() that are tricky to work past, or suggestions to use tax_fix() that you don't understand.

The app shows you the tax_table of your data (searchable) with unknown values highlighted.

It allows you to interactively modify minimum allowed length and to select further values to be defined as unknown.

It will show you the correct tax_fix() code to copy paste into your script to reproduce the interactive filtering.

Usage

tax_fix_interactive(data, app_options = list(launch.browser = TRUE))
tax_fix_interactive(data, app_options = list(launch.browser = TRUE))

Arguments

`data`	a phyloseq object
`app_options`	options list passed to shinyApp()

Value

nothing

Examples

library(dplyr)
library(phyloseq)

# create some problem-filled example tax_table data
data(dietswap, package = "microbiome")
ps <- dietswap
# create unknowns to test filling
tt <- tax_table(ps)
ntax <- ntaxa(ps)
set.seed(123)
g <- sample(1:ntax, 30)
f <- sample(g, 10)
p <- sample(f, 3)
tt[g, 3] <- "g__"
tt[f, 2] <- "f__"
tt[p, 1] <- "p__"
tt[sample(1:ntax, 10), 3] <- "unknown"
# create a row with only NAs
tt[1, ] <- NA
tax_table(ps) <- tax_table(tt)

# function takes a phyloseq and shows code for how to fix the tax_table
# tax_fix_interactive(data = ps)
library(dplyr)
library(phyloseq)

# create some problem-filled example tax_table data
data(dietswap, package = "microbiome")
ps <- dietswap
# create unknowns to test filling
tt <- tax_table(ps)
ntax <- ntaxa(ps)
set.seed(123)
g <- sample(1:ntax, 30)
f <- sample(g, 10)
p <- sample(f, 3)
tt[g, 3] <- "g__"
tt[f, 2] <- "f__"
tt[p, 1] <- "p__"
tt[sample(1:ntax, 10), 3] <- "unknown"
# create a row with only NAs
tt[1, ] <- NA
tax_table(ps) <- tax_table(tt)

# function takes a phyloseq and shows code for how to fix the tax_table
# tax_fix_interactive(data = ps)

Statistical modelling for individual taxa in a phyloseq

Description

tax_model provides a simple framework to statistically model the abundance of individual taxa in your data. You can choose which type of statistical model you want to fit, and you can choose at which rank and (optionally) which specific taxa to fit statistical models for. tax_model takes a phyloseq and returns a list of statistical models, one model for each taxon. The same independent variables are used for all models, as specified in variables or formula argument (latter takes precedence).

taxatree_models runs tax_model on every taxon at multiple taxonomic ranks (you choose which ranks with the plural ranks argument), and returns the results as a named nested list designed for use with taxatree_plots. One list per rank, one model per taxon at each rank.

type = "bbdml" will run beta binomial regression model(s) using the corncob package. For bbdml the same formula/variables is/are used for modelling both the abundance and dispersion parameters.

Usage

tax_model(
  ps,
  rank,
  type = "lm",
  variables = NULL,
  formula = NULL,
  taxa = NULL,
  use_future = FALSE,
  return_psx = TRUE,
  checkVars = TRUE,
  checkNA = "warning",
  verbose = TRUE,
  trans = "identity",
  trans_args = list(),
  ...
)
tax_model(
  ps,
  rank,
  type = "lm",
  variables = NULL,
  formula = NULL,
  taxa = NULL,
  use_future = FALSE,
  return_psx = TRUE,
  checkVars = TRUE,
  checkNA = "warning",
  verbose = TRUE,
  trans = "identity",
  trans_args = list(),
  ...
)

Arguments

`ps`	phyloseq object
`rank`	name of taxonomic rank to aggregate to and model taxa at
`type`	name of modelling function to use, or the function itself
`variables`	vector of variable names, to be used as model formula right hand side. If variables is a list, not a vector, a model is fit for each entry in list.
`formula`	Right hand side of a formula, as a formula object or character string. Or a list of these. (alternative to variables argument, do not provide both)
`taxa`	taxa to model (named, numbered, logical selection, or defaulting to all if NULL)
`use_future`	if TRUE parallel processing with future is possible, see details.
`return_psx`	if TRUE, list of results will be returned attached to psExtra object
`checkVars`	should the predictor variables be checked for zero variance?
`checkNA`	One of "stop", "warning", "message", or "allow", which indicates whether to check predictor variables for NAs, and how to report any NAs if present?
`verbose`	message about progress and any taxa name modifications
`trans`	name of tax_transform transformation to apply to aggregated taxa before fitting statistical models
`trans_args`	named list of any additional arguments to tax_transform e.g. list(zero_replace = "halfmin")
`...`	extra args passed directly to modelling function

Details

tax_model and taxatree_models can use parallel processing with the future package. This can speed up analysis if you have many taxa to model. Set use_future = TRUE and run a line like this before doing your modelling: future::plan(future::multisession, workers = 3) (This requires the future and future.apply packages to be installed.)

Value

psExtra with named list of model objects attached. Or a list of lists of models (if multiple models per taxon). Or just a list (of lists), if return_psx is FALSE.

Examples

library(corncob)
library(dplyr)

data(dietswap, package = "microbiome")
ps <- dietswap

# create some binary variables for easy visualization
ps <- ps %>% ps_mutate(
  female = if_else(sex == "female", 1, 0, NaN),
  overweight = if_else(bmi_group == "overweight", 1, 0, NaN),
  obese = if_else(bmi_group == "obese", 1, 0, NaN)
)

# This example HITChip dataset has some taxa with the same name for phylum and family...
# We can fix problems like this with the tax_prepend_ranks function
ps <- tax_prepend_ranks(ps)

# filter out rare taxa (it is often difficult to fit multivariable models to rare taxa)
ps <- ps %>% tax_filter(min_prevalence = 0.1, min_total_abundance = 10000)

# specify variables used for modelling
VARS <- c("female", "overweight", "obese")

# Model first 3 genera using all VARS as predictors (just for a quick test)
models <- tax_model(ps, type = "bbdml", rank = "Genus", taxa = 1:3, variables = VARS)

# Alternative method using formula arg instead of variables to produce identical results
models2 <- tax_model(
  ps = ps, rank = "Genus", type = "bbdml",
  taxa = 1:3, formula = ~ female + overweight + obese, return_psx = FALSE
)
all.equal(models, models2) # should be TRUE

# Model only one genus, NOTE the modified name,
# which was returned by tax_prepend_ranks defaults
models3 <- ps %>%
  tax_model(
    rank = "Genus", type = "bbdml",
    taxa = "G: Bacteroides fragilis et rel.", variables = VARS
  )

# Model all taxa at multiple taxonomic ranks (ranks 1 and 2)
# using only female variable as predictor
models4 <- taxatree_models(
  ps = ps, type = "bbdml", ranks = 1:2, formula = ~female, verbose = FALSE
)

# modelling proportions with simple linear regression is also possible via type = lm
# and transforming the taxa to compositional first
models_lm <- ps %>%
  tax_transform("compositional") %>%
  tax_model(rank = "Genus", taxa = 1:3, variables = VARS, type = "lm")
library(corncob)
library(dplyr)

data(dietswap, package = "microbiome")
ps <- dietswap

# create some binary variables for easy visualization
ps <- ps %>% ps_mutate(
  female = if_else(sex == "female", 1, 0, NaN),
  overweight = if_else(bmi_group == "overweight", 1, 0, NaN),
  obese = if_else(bmi_group == "obese", 1, 0, NaN)
)

# This example HITChip dataset has some taxa with the same name for phylum and family...
# We can fix problems like this with the tax_prepend_ranks function
ps <- tax_prepend_ranks(ps)

# filter out rare taxa (it is often difficult to fit multivariable models to rare taxa)
ps <- ps %>% tax_filter(min_prevalence = 0.1, min_total_abundance = 10000)

# specify variables used for modelling
VARS <- c("female", "overweight", "obese")

# Model first 3 genera using all VARS as predictors (just for a quick test)
models <- tax_model(ps, type = "bbdml", rank = "Genus", taxa = 1:3, variables = VARS)

# Alternative method using formula arg instead of variables to produce identical results
models2 <- tax_model(
  ps = ps, rank = "Genus", type = "bbdml",
  taxa = 1:3, formula = ~ female + overweight + obese, return_psx = FALSE
)
all.equal(models, models2) # should be TRUE

# Model only one genus, NOTE the modified name,
# which was returned by tax_prepend_ranks defaults
models3 <- ps %>%
  tax_model(
    rank = "Genus", type = "bbdml",
    taxa = "G: Bacteroides fragilis et rel.", variables = VARS
  )

# Model all taxa at multiple taxonomic ranks (ranks 1 and 2)
# using only female variable as predictor
models4 <- taxatree_models(
  ps = ps, type = "bbdml", ranks = 1:2, formula = ~female, verbose = FALSE
)

# modelling proportions with simple linear regression is also possible via type = lm
# and transforming the taxa to compositional first
models_lm <- ps %>%
  tax_transform("compositional") %>%
  tax_model(rank = "Genus", taxa = 1:3, variables = VARS, type = "lm")

Modify or compute new taxonomic ranks in phyloseq

Description

Add or overwrite tax_table ranks. Use dplyr::mutate() syntax.

Usage

tax_mutate(ps, ...)
tax_mutate(ps, ...)

Arguments

`ps`	phyloseq object with a tax_table, or just a tax_table
`...`	passed straight to dplyr::mutate (see examples and dplyr::mutate help)

Value

phyloseq object with modified tax_table

Examples

library(phyloseq)
library(dplyr)
data("dietswap", package = "microbiome")

# compute new rank
tax_mutate(dietswap, loud_genus = toupper(Genus)) %>%
  tt_get() %>%
  head()

# overwrite a current rank
tax_mutate(dietswap, Genus = toupper(Genus)) %>%
  tt_get() %>%
  head()

# overwrite all ranks
tax_mutate(dietswap, across(everything(), .fns = toupper)) %>%
  tt_get() %>%
  head()

# add a new rank at the beginning
tax_mutate(dietswap, Root = "Bacteria", .before = 1) %>%
  tt_get() %>%
  head()

# this is an error as ranks can't be any other class than character
# tax_mutate(dietswap, Genus = 1:ntaxa(dietswap))
library(phyloseq)
library(dplyr)
data("dietswap", package = "microbiome")

# compute new rank
tax_mutate(dietswap, loud_genus = toupper(Genus)) %>%
  tt_get() %>%
  head()

# overwrite a current rank
tax_mutate(dietswap, Genus = toupper(Genus)) %>%
  tt_get() %>%
  head()

# overwrite all ranks
tax_mutate(dietswap, across(everything(), .fns = toupper)) %>%
  tt_get() %>%
  head()

# add a new rank at the beginning
tax_mutate(dietswap, Root = "Bacteria", .before = 1) %>%
  tt_get() %>%
  head()

# this is an error as ranks can't be any other class than character
# tax_mutate(dietswap, Genus = 1:ntaxa(dietswap))

Simple way to set unique taxa_names for phyloseq object

Description

If your current taxa_names aren't what you want (e.g. they are long DNA sequences), this function will help you set sensible unique names.

It combines:

a prefix like tax, asv, or otu (pick an appropriate prefix or set your own)
a unique (sequential) number
classification information from a chosen taxonomic rank (optional)

Usage

tax_name(
  ps,
  prefix = c("tax", "asv", "otu")[1],
  rank = NA,
  pad_number = TRUE,
  sep = "_"
)
tax_name(
  ps,
  prefix = c("tax", "asv", "otu")[1],
  rank = NA,
  pad_number = TRUE,
  sep = "_"
)

Arguments

`ps`	phyloseq object
`prefix`	e.g. 'tax', 'asv', or 'otu' (or set your own)
`rank`	name of taxonomic rank from which to use classifications in new names
`pad_number`	should unique numbers have zeros added to the front (e.g. 001, 002) to be made the same number of characters?
`sep`	character to separate the unique number and any taxonomic classification info (relevant if rank given)

Details

Don't confuse this with the phyloseq function taxa_names() or the newer microViz function tax_rename().

Value

phyloseq object

Examples

library(phyloseq)
# get example data
data("enterotype")
ps <- enterotype
head(taxa_names(ps)) # these are mostly fine (except the -1), but imagine you wanted new names

# consider storing the original names for reference (e.g. if they are DNA sequences)
old_taxa_names <- taxa_names(ps)

ps <- tax_name(ps)
taxa_names(ps) %>% head()

# probably better to include the genus info to make these names more informative
ps <- tax_name(ps, rank = "Genus")
taxa_names(ps) %>% head()

# store new names with old names in dataframe for reference
names_df <- tibble::tibble(old = old_taxa_names, new = taxa_names(ps))

# alternative settings
tax_name(ps, pad_number = FALSE) %>%
  taxa_names() %>%
  head()
tax_name(ps, prefix = "whateveryoulike") %>%
  taxa_names() %>%
  head()
tax_name(ps, rank = "Genus", sep = "-") %>%
  taxa_names() %>%
  head()
library(phyloseq)
# get example data
data("enterotype")
ps <- enterotype
head(taxa_names(ps)) # these are mostly fine (except the -1), but imagine you wanted new names

# consider storing the original names for reference (e.g. if they are DNA sequences)
old_taxa_names <- taxa_names(ps)

ps <- tax_name(ps)
taxa_names(ps) %>% head()

# probably better to include the genus info to make these names more informative
ps <- tax_name(ps, rank = "Genus")
taxa_names(ps) %>% head()

# store new names with old names in dataframe for reference
names_df <- tibble::tibble(old = old_taxa_names, new = taxa_names(ps))

# alternative settings
tax_name(ps, pad_number = FALSE) %>%
  taxa_names() %>%
  head()
tax_name(ps, prefix = "whateveryoulike") %>%
  taxa_names() %>%
  head()
tax_name(ps, rank = "Genus", sep = "-") %>%
  taxa_names() %>%
  head()

Add taxa_names as last column in phyloseq tax_table

Description

The taxa names in your phyloseq may specify a further unique classification of your taxa, e.g. ASVs, that is not otherwise represented in the tax_table itself. This function fixes that, and allows you to include this level in taxatree_plots for example.

Usage

tax_names2rank(data, colname = "unique")
tax_names2rank(data, colname = "unique")

Arguments

`data`	phyloseq object, or psExtra or tax_table (taxonomyTable)
`colname`	name of new rank to add at right side of tax_table

Value

same class object as passed in to data

Make a fixed taxa colour palette e.g. for comp_barplot

Description

Makes a named palette vector from your phyloseq dataset considering overall abundance to assign colours (or some other sorting)

Usage

tax_palette(
  data,
  rank,
  n,
  by = sum,
  pal = "brewerPlus",
  add = c(other = "lightgrey"),
  ...
)
tax_palette(
  data,
  rank,
  n,
  by = sum,
  pal = "brewerPlus",
  add = c(other = "lightgrey"),
  ...
)

Arguments

`data`	phyloseq or psExtra
`rank`	taxonomic rank name or "unique"
`n`	number of colours / taxa (not including "other")
`by`	tax sorting method for tax_sort e.g. sum
`pal`	palette name from distinct_palette function
`add`	name = value pairs appended to end of output, or NA for none
`...`	other args are passed to tax_sort

Value

named character vector

Examples

library(ggplot2)
data(dietswap, package = "microbiome")

myPal <- tax_palette(dietswap, rank = "Genus", pal = "brewerPlus", n = 40)
myPal %>% tax_palette_plot() # just to check the palette

# plot one subset of data
dietswap %>%
  ps_filter(nationality == "AFR", timepoint == 1, sex == "male") %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 15,
    bar_outline_colour = NA, bar_width = 0.7,
    palette = myPal, label = NULL
  )

# plot a different subset of data (top taxa differ but colours are the same)
dietswap %>%
  ps_filter(nationality != "AFR", timepoint == 1, sex == "male") %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 15,
    bar_outline_colour = NA, bar_width = 0.7,
    palette = myPal, label = NULL
  )

library(ggplot2)
data(dietswap, package = "microbiome")

myPal <- tax_palette(dietswap, rank = "Genus", pal = "brewerPlus", n = 40)
myPal %>% tax_palette_plot() # just to check the palette

# plot one subset of data
dietswap %>%
  ps_filter(nationality == "AFR", timepoint == 1, sex == "male") %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 15,
    bar_outline_colour = NA, bar_width = 0.7,
    palette = myPal, label = NULL
  )

# plot a different subset of data (top taxa differ but colours are the same)
dietswap %>%
  ps_filter(nationality != "AFR", timepoint == 1, sex == "male") %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 15,
    bar_outline_colour = NA, bar_width = 0.7,
    palette = myPal, label = NULL
  )

tax_palette plotting helper function

Description

Check the named palette colour vector you created with tax_palette()

Usage

tax_palette_plot(named_pal_vec, max_n = NA)
tax_palette_plot(named_pal_vec, max_n = NA)

Arguments

`named_pal_vec`	vector of colours named by taxa (e.g. tax_palette output)
`max_n`	NA to display all colours, or limit this

Value

ggplot

Examples

library(ggplot2)
data(dietswap, package = "microbiome")

myPal <- tax_palette(dietswap, rank = "Genus", pal = "brewerPlus", n = 40)
myPal %>% tax_palette_plot() # just to check the palette

library(ggplot2)
data(dietswap, package = "microbiome")

myPal <- tax_palette(dietswap, rank = "Genus", pal = "brewerPlus", n = 40)
myPal %>% tax_palette_plot() # just to check the palette

Add rank prefixes to phyloseq tax_table values

Description

Prepend the start of rank names to each taxon at each rank (useful particularly in case of duplicated taxa names across ranks, e.g. dietswap dataset)

Usage

tax_prepend_ranks(ps, sep = ": ", nchar = 1)
tax_prepend_ranks(ps, sep = ": ", nchar = 1)

Arguments

`ps`	phyloseq object
`sep`	characters to paste in between rank initial and taxon name
`nchar`	number of characters to use from start of rank_names

Value

phyloseq

Examples

data("dietswap", package = "microbiome")
phyloseq::tax_table(dietswap) %>% head()
dietswap %>%
  tax_prepend_ranks() %>%
  phyloseq::tax_table() %>%
  head()
data("dietswap", package = "microbiome")
phyloseq::tax_table(dietswap) %>% head()
dietswap %>%
  tax_prepend_ranks() %>%
  phyloseq::tax_table() %>%
  head()

Make new phyloseq taxa names from classification and taxon abundance info

Description

Pairs classification at the given rank with a numeric ranking suffix (based on abundance or prevalence data) to automatically create informative taxa names.

Usage

tax_rename(
  ps,
  rank,
  sort_by = sum,
  transform_for_sort = "identity",
  pad_digits = "auto",
  sep = " ",
  ...
)
tax_rename(
  ps,
  rank,
  sort_by = sum,
  transform_for_sort = "identity",
  pad_digits = "auto",
  sep = " ",
  ...
)

Arguments

`ps`	phyloseq object
`rank`	name of rank to use in new taxa names
`sort_by`	how to sort taxa for numbering within rank-based groups (a tax_sort option)
`transform_for_sort`	named of transformation to apply to taxa before sorting
`pad_digits`	how long should the numeric suffixes be? see details
`sep`	character to separate the rank prefixes from numeric suffixes
`...`	additional arguments passed to tax_sort

Details

e.g. "Bacteroides 003" for the third most abundant Bacteroides OTU or ASV.

Taxa are returned in original order, and otu_table is returned un-transformed.

pad_digits options:

"auto" –> minimum digits to have equal length numbers within groups
"max" –> minimum digits to have equal length numbers across all groups
A number: e.g.
- 3 –> 001, 002, ..., 042, ..., 180, ...
- 1 –> 1, 2, ..., 42, ..., 180, ...

Value

phyloseq object

Examples

library(phyloseq)
data("ibd", package = "microViz")

ps <- ibd %>%
  tax_filter(min_prevalence = 3) %>%
  tax_fix()

# show a few of the current, uninformative names
taxa_names(ps) %>% head(15)
taxa_names(ps) %>% tail(15)

# change names to genus classification plus number
psNewNames <- ps %>% tax_rename(rank = "Genus")

taxa_names(psNewNames) %>% head(15)
taxa_names(psNewNames) %>% tail(15)

# demonstrate some alternative argument settings
psNewNames2 <- ps %>% tax_rename(
  rank = "Family", sort_by = prev, pad_digits = "max", sep = "-"
)

taxa_names(psNewNames2) %>% head(15)
taxa_names(psNewNames2) %>% tail(15)

ps %>%
  tax_rename(rank = "Genus", pad_digits = 2) %>%
  taxa_names() %>%
  head(15)

# naming improvement on plots example
library(ggplot2)
library(patchwork)

# Overly aggressive OTU filtering to simplify and speed up example
psExample <- ps %>% tax_filter(min_prevalence = 0.4)

# before OTU renaming
before <- psExample %>%
  ps_filter(activity == "inactive") %>%
  tax_names2rank("Taxon") %>%
  comp_barplot(
    tax_level = "Taxon", n_taxa = 12, other_name = "Other",
    merge_other = FALSE, bar_outline_colour = "grey60"
  ) +
  coord_flip() +
  ggtitle("Original taxon names :(")

# after OTU renaming
after <- psExample %>%
  ps_filter(activity == "inactive") %>%
  tax_rename(rank = "Genus", pad_digits = "max") %>%
  tax_names2rank("Taxon") %>%
  comp_barplot(
    tax_level = "Taxon", n_taxa = 12, other_name = "Other",
    merge_other = FALSE, bar_outline_colour = "grey60"
  ) +
  coord_flip() +
  ggtitle("New taxon names :)", "tax_rename(rank = 'Genus', sort_by = sum)")

before + after & theme(legend.text = element_text(size = 8))

# ordination example
psExample %>%
  tax_rename(rank = "Genus", sort_by = sum) %>%
  tax_names2rank("otu") %>%
  tax_transform("clr", rank = "otu") %>%
  ord_calc() %>%
  ord_plot(
    size = 2, colour = "ibd", shape = "circle", alpha = 0.5,
    plot_taxa = 1:10,
    tax_vec_length = 0.5,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, check_overlap = TRUE,
      size = 2.5, fontface = "bold"
    ),
    tax_vec_style_all = vec_tax_all(alpha = 0.1)
  ) +
  coord_fixed(clip = "off") +
  stat_chull(aes(colour = ibd)) +
  scale_colour_brewer(palette = "Dark2") +
  theme(panel.grid = element_line(size = 0.1))
library(phyloseq)
data("ibd", package = "microViz")

ps <- ibd %>%
  tax_filter(min_prevalence = 3) %>%
  tax_fix()

# show a few of the current, uninformative names
taxa_names(ps) %>% head(15)
taxa_names(ps) %>% tail(15)

# change names to genus classification plus number
psNewNames <- ps %>% tax_rename(rank = "Genus")

taxa_names(psNewNames) %>% head(15)
taxa_names(psNewNames) %>% tail(15)

# demonstrate some alternative argument settings
psNewNames2 <- ps %>% tax_rename(
  rank = "Family", sort_by = prev, pad_digits = "max", sep = "-"
)

taxa_names(psNewNames2) %>% head(15)
taxa_names(psNewNames2) %>% tail(15)

ps %>%
  tax_rename(rank = "Genus", pad_digits = 2) %>%
  taxa_names() %>%
  head(15)

# naming improvement on plots example
library(ggplot2)
library(patchwork)

# Overly aggressive OTU filtering to simplify and speed up example
psExample <- ps %>% tax_filter(min_prevalence = 0.4)

# before OTU renaming
before <- psExample %>%
  ps_filter(activity == "inactive") %>%
  tax_names2rank("Taxon") %>%
  comp_barplot(
    tax_level = "Taxon", n_taxa = 12, other_name = "Other",
    merge_other = FALSE, bar_outline_colour = "grey60"
  ) +
  coord_flip() +
  ggtitle("Original taxon names :(")

# after OTU renaming
after <- psExample %>%
  ps_filter(activity == "inactive") %>%
  tax_rename(rank = "Genus", pad_digits = "max") %>%
  tax_names2rank("Taxon") %>%
  comp_barplot(
    tax_level = "Taxon", n_taxa = 12, other_name = "Other",
    merge_other = FALSE, bar_outline_colour = "grey60"
  ) +
  coord_flip() +
  ggtitle("New taxon names :)", "tax_rename(rank = 'Genus', sort_by = sum)")

before + after & theme(legend.text = element_text(size = 8))

# ordination example
psExample %>%
  tax_rename(rank = "Genus", sort_by = sum) %>%
  tax_names2rank("otu") %>%
  tax_transform("clr", rank = "otu") %>%
  ord_calc() %>%
  ord_plot(
    size = 2, colour = "ibd", shape = "circle", alpha = 0.5,
    plot_taxa = 1:10,
    tax_vec_length = 0.5,
    tax_lab_style = tax_lab_style(
      type = "text", max_angle = 90, check_overlap = TRUE,
      size = 2.5, fontface = "bold"
    ),
    tax_vec_style_all = vec_tax_all(alpha = 0.1)
  ) +
  coord_fixed(clip = "off") +
  stat_chull(aes(colour = ibd)) +
  scale_colour_brewer(palette = "Dark2") +
  theme(panel.grid = element_line(size = 0.1))

Reorder taxa in phyloseq object using vector of names

Description

Reorder taxa in phyloseq object using vector of names

Usage

tax_reorder(
  ps,
  tax_order,
  tree_warn = TRUE,
  unmatched_warn = TRUE,
  ignore = c("other", "Other")
)
tax_reorder(
  ps,
  tax_order,
  tree_warn = TRUE,
  unmatched_warn = TRUE,
  ignore = c("other", "Other")
)

Arguments

`ps`	phyloseq object
`tax_order`	Names of taxa in desired order; at least some must match. (Numerical indices are also possible)
`tree_warn`	If phylogenetic tree is present in phyloseq phy_tree slot, taxa cannot be reordered. Default behaviour of tax_sort is to remove the phylogenetic tree and warn about this. tree_warn = FALSE will suppress the warning message, but still remove the tree!
`unmatched_warn`	Warn if any names (or indices) given in tax_order are not found within (range of) taxa_names(ps) - these will be ignored
`ignore`	Values that you do not want to be used for reordering taxa (useful for comp_barplot when custom palette names are used to set tax_order)

Value

phyloseq object (always without phy_tree)

Examples

data("dietswap", package = "microbiome")
new_order <- c(
  "Fusobacteria", "Cyanobacteria", "Verrucomicrobia", "Spirochaetes",
  "Actinobacteria", "Firmicutes", "Proteobacteria", "Bacteroidetes"
)
tax_agg(dietswap, rank = "Phylum") %>%
  ps_get() %>%
  phyloseq::taxa_names()

tax_agg(dietswap, rank = "Phylum") %>%
  ps_get() %>%
  tax_reorder(tax_order = new_order) %>%
  phyloseq::taxa_names()

# partial reordering (of the frontmost positions only) is possible
tax_agg(dietswap, rank = "Phylum") %>%
  ps_get() %>%
  tax_reorder(tax_order = c("Cyanobacteria", "Bacteroidetes")) %>%
  phyloseq::taxa_names()

data("dietswap", package = "microbiome")
new_order <- c(
  "Fusobacteria", "Cyanobacteria", "Verrucomicrobia", "Spirochaetes",
  "Actinobacteria", "Firmicutes", "Proteobacteria", "Bacteroidetes"
)
tax_agg(dietswap, rank = "Phylum") %>%
  ps_get() %>%
  phyloseq::taxa_names()

tax_agg(dietswap, rank = "Phylum") %>%
  ps_get() %>%
  tax_reorder(tax_order = new_order) %>%
  phyloseq::taxa_names()

# partial reordering (of the frontmost positions only) is possible
tax_agg(dietswap, rank = "Phylum") %>%
  ps_get() %>%
  tax_reorder(tax_order = c("Cyanobacteria", "Bacteroidetes")) %>%
  phyloseq::taxa_names()

Mean-center and SD-scale taxa in phyloseq

Description

Wrapper for applying base scale function to phyloseq otu_table

Usage

tax_scale(data, center = TRUE, scale = TRUE, do = NA, keep_counts = TRUE)
tax_scale(data, center = TRUE, scale = TRUE, do = NA, keep_counts = TRUE)

Arguments

`data`	phyloseq or psExtra or otu_table
`center`	if TRUE: center each taxon by subtracting its mean
`scale`	if TRUE, divide each centred taxon by its standard deviation (or divide by RMS if not centred!)
`do`	alternative argument that overrides center and scale options! takes "both", "scale", "center" or "neither"
`keep_counts`	if TRUE, retain the original count data in psExtra counts slot

Examples

data("dietswap", package = "microbiome")
ps <- dietswap
ps %>%
  otu_get() %>%
  .[1:6, 1:6]

# standard use (mean center and SD scale)
tax_scale(ps) %>%
  otu_get() %>%
  .[1:6, 1:6] # Aerococcus is NaN as standard deviation = 0 (0 prevalence)

# RMS scale only (directly on otu_table)
otu_get(ps) %>%
  tax_scale(center = FALSE) %>%
  .[1:6, 1:6] # Aerococcus is NaN as standard deviation = 0 (0 prevalence)

# example using alternative `do` argument (to center only, no scaling)
tax_scale(ps, do = "center") %>%
  otu_get() %>%
  .[1:6, 1:6]

# preserves existing info
tax_transform(ps, "compositional", rank = "Genus") %>% tax_scale()

# drops other psExtra objects previously calculated with unscaled data
psxDist <- tax_agg(ps, "Genus") %>% dist_calc()
psxDist
psxDist %>% tax_scale()
tax_scale(psxDist) %>% info_get()
data("dietswap", package = "microbiome")
ps <- dietswap
ps %>%
  otu_get() %>%
  .[1:6, 1:6]

# standard use (mean center and SD scale)
tax_scale(ps) %>%
  otu_get() %>%
  .[1:6, 1:6] # Aerococcus is NaN as standard deviation = 0 (0 prevalence)

# RMS scale only (directly on otu_table)
otu_get(ps) %>%
  tax_scale(center = FALSE) %>%
  .[1:6, 1:6] # Aerococcus is NaN as standard deviation = 0 (0 prevalence)

# example using alternative `do` argument (to center only, no scaling)
tax_scale(ps, do = "center") %>%
  otu_get() %>%
  .[1:6, 1:6]

# preserves existing info
tax_transform(ps, "compositional", rank = "Genus") %>% tax_scale()

# drops other psExtra objects previously calculated with unscaled data
psxDist <- tax_agg(ps, "Genus") %>% dist_calc()
psxDist
psxDist %>% tax_scale()
tax_scale(psxDist) %>% info_get()

Subset phyloseq object by (partial) taxa names

Description

Convenient name-based taxa selection/filtering of phyloseq object, including approximate name matching. Takes a phyloseq with tax table and a (partial) taxonomic name, or a list/vector of taxonomic names (full or partial matches).

Usage

tax_select(
  ps,
  tax_list,
  ranks_searched = "all",
  strict_matches = FALSE,
  n_typos = 1,
  deselect = FALSE
)
tax_select(
  ps,
  tax_list,
  ranks_searched = "all",
  strict_matches = FALSE,
  n_typos = 1,
  deselect = FALSE
)

Arguments

`ps`	phyloseq object
`tax_list`	e.g. c('g__Bifidobacterium', 'g__Akkermansia', 'g__Bacteroides', 'g__Streptococcus')
`ranks_searched`	'all' or a list of which taxonomic ranks should be searched for the names in tax_list?
`strict_matches`	only perfect full name matches allowed if TRUE
`n_typos`	how many typos to allow in each name? uses agrep approximate matching if > 0
`deselect`	if TRUE, the matching taxa will be REMOVED instead!

Details

tax_select will also search the otu names/rownames, BUT only for perfect matches.

Value

phyloseq object with fewer taxa

Examples

# Get example phyloseq object data
data("dietswap", package = "microbiome")
pSeq <- dietswap

# SELECTION EXAMPLES #
a <- pSeq %>% tax_select(tax_list = "Bif", n_typos = 0, ranks_searched = "Genus")
b <- pSeq %>% tax_select(tax_list = "Bifidobacterium", n_typos = 0)
c <- pSeq %>% tax_select(tax_list = "Bif", n_typos = 1)
identical(a, b) # TRUE
identical(a, c) # FALSE

pSeq %>% tax_select(tax_list = "Bifidobactrium") # default 1 typo allowed
one <- pSeq %>% tax_select(tax_list = "Akkarmensia", n_typos = 2)
two <- pSeq %>% tax_select(tax_list = "Akkermansia", n_typos = 0)
identical(one, two) # TRUE

# DESELECTION EXAMPLE # #
pSeq %>% tax_select(tax_list = "Bif", strict_matches = FALSE, deselect = TRUE)
# Incorrect example
# pSeq %>% tax_select(tax_list = "Bif", strict_matches = TRUE) # fails
# Get example phyloseq object data
data("dietswap", package = "microbiome")
pSeq <- dietswap

# SELECTION EXAMPLES #
a <- pSeq %>% tax_select(tax_list = "Bif", n_typos = 0, ranks_searched = "Genus")
b <- pSeq %>% tax_select(tax_list = "Bifidobacterium", n_typos = 0)
c <- pSeq %>% tax_select(tax_list = "Bif", n_typos = 1)
identical(a, b) # TRUE
identical(a, c) # FALSE

pSeq %>% tax_select(tax_list = "Bifidobactrium") # default 1 typo allowed
one <- pSeq %>% tax_select(tax_list = "Akkarmensia", n_typos = 2)
two <- pSeq %>% tax_select(tax_list = "Akkermansia", n_typos = 0)
identical(one, two) # TRUE

# DESELECTION EXAMPLE # #
pSeq %>% tax_select(tax_list = "Bif", strict_matches = FALSE, deselect = TRUE)
# Incorrect example
# pSeq %>% tax_select(tax_list = "Bif", strict_matches = TRUE) # fails

Sort taxa in phyloseq otu_table and tax_table

Description

Multiple ways of sorting taxa are possible and determined by the by argument. The by argument must be one of:

'rev' to reverse the current order
'name' (sort alphabetically by at)
'asis' to keep current order as is
a sample name (descending abundance sorting within that sample)
summary stat. function e.g. sum or mean

The at argument must be "names" for sorting unique taxa, or a rank name, for sorting at that rank. at is ignored when by is "rev".

Usage

tax_sort(
  data,
  by = "name",
  at = "names",
  ...,
  tree_warn = TRUE,
  verbose = TRUE,
  trans = "identity",
  use_counts = TRUE,
  counts_warn = TRUE
)
tax_sort(
  data,
  by = "name",
  at = "names",
  ...,
  tree_warn = TRUE,
  verbose = TRUE,
  trans = "identity",
  use_counts = TRUE,
  counts_warn = TRUE
)

Arguments

`data`	psExtra or phyloseq
`by`	how to sort, see description
`at`	"names" or a taxonomic rank to apply sorting method to, as specified in `by`.
`...`	used if summary function given, or pass `undetected` arg for tax_transform("binary") if by = "prev" or "prevalence"
`tree_warn`	If phylogenetic tree is present in phyloseq phy_tree slot, taxa cannot be reordered. Default behaviour of tax_sort is to remove the phylogenetic tree and warn about this. tree_warn = FALSE will suppress the warning message, but still remove the tree!
`verbose`	passed to phyloseq_validate verbose (if TRUE: message about suspicious values in tax_table, and how to fix)
`trans`	name of transformation to apply to taxa before sorting (taxa are returned un-transformed)
`use_counts`	use count data if available, instead of transformed data
`counts_warn`	warn if count data are not available?

Details

Don't forget to pass na.rm = TRUE to ... if using a summary stat function in by

Value

sorted phyloseq or psExtra

Examples

library(phyloseq)
data("dietswap", package = "microbiome")
dietswap

# reverse current order
dietswap %>%
  tax_sort("rev") %>%
  tax_table() %>%
  head(30)

# sort alphabetically by a taxonomic rank (or "names" for taxa_names)
dietswap %>%
  tax_sort(by = "name", at = "Phylum") %>%
  tax_table() %>%
  head(30)

# sequentially sorting by higher ranks
# sets tax_table in nested alphabetical order
dietswap %>%
  tax_sort(at = "names") %>%
  tax_sort(at = "Genus") %>%
  tax_sort(at = "Family") %>%
  tax_sort(at = "Phylum") %>%
  tax_table() %>%
  head(30)

# sort by function e.g. total sum or median abundance
dietswap %>%
  tax_sort(by = sum) %>%
  taxa_names() %>%
  head(20)

# transform to compositional data (proportions) before sorting
# note that abundances are returned untransformed
dietswap %>%
  tax_sort(by = sum, trans = "compositional") %>%
  taxa_names() %>%
  head(20)

# order by descending abundance in a single named sample
dietswap %>%
  tax_sort(by = "Sample-1") %>%
  otu_table() %>%
  .[1:8, 1:4]


# sum order should always equal mean order if non-negative abundances
# don't forget to add na.rm = TRUE if you expect NAs in otu_table somehow
dietswap %>%
  tax_sort(by = sum, na.rm = TRUE) %>%
  taxa_names() %>%
  head(20)

# if your phyloseq object has a phylogenetic tree,
# tax_sort will remove the tree, and warn you about this
# unless you disable that warning with tree_warn = FALSE

# You can sort by abundance at higher taxonomic ranks,
# without losing lower rank info
# e.g. sort (descending) by phyla abundances
dietswap %>%
  tax_sort(by = sum, at = "Phylum") %>%
  tax_table() %>%
  head()

# You can sort by ascending abundance (or prevalence etc) by reversing after
dietswap %>%
  tax_sort(by = "prev", at = "Phylum") %>%
  tax_sort(by = "rev") %>%
  tax_table() %>%
  head()
library(phyloseq)
data("dietswap", package = "microbiome")
dietswap

# reverse current order
dietswap %>%
  tax_sort("rev") %>%
  tax_table() %>%
  head(30)

# sort alphabetically by a taxonomic rank (or "names" for taxa_names)
dietswap %>%
  tax_sort(by = "name", at = "Phylum") %>%
  tax_table() %>%
  head(30)

# sequentially sorting by higher ranks
# sets tax_table in nested alphabetical order
dietswap %>%
  tax_sort(at = "names") %>%
  tax_sort(at = "Genus") %>%
  tax_sort(at = "Family") %>%
  tax_sort(at = "Phylum") %>%
  tax_table() %>%
  head(30)

# sort by function e.g. total sum or median abundance
dietswap %>%
  tax_sort(by = sum) %>%
  taxa_names() %>%
  head(20)

# transform to compositional data (proportions) before sorting
# note that abundances are returned untransformed
dietswap %>%
  tax_sort(by = sum, trans = "compositional") %>%
  taxa_names() %>%
  head(20)

# order by descending abundance in a single named sample
dietswap %>%
  tax_sort(by = "Sample-1") %>%
  otu_table() %>%
  .[1:8, 1:4]


# sum order should always equal mean order if non-negative abundances
# don't forget to add na.rm = TRUE if you expect NAs in otu_table somehow
dietswap %>%
  tax_sort(by = sum, na.rm = TRUE) %>%
  taxa_names() %>%
  head(20)

# if your phyloseq object has a phylogenetic tree,
# tax_sort will remove the tree, and warn you about this
# unless you disable that warning with tree_warn = FALSE

# You can sort by abundance at higher taxonomic ranks,
# without losing lower rank info
# e.g. sort (descending) by phyla abundances
dietswap %>%
  tax_sort(by = sum, at = "Phylum") %>%
  tax_table() %>%
  head()

# You can sort by ascending abundance (or prevalence etc) by reversing after
dietswap %>%
  tax_sort(by = "prev", at = "Phylum") %>%
  tax_sort(by = "rev") %>%
  tax_table() %>%
  head()

Order taxa in phyloseq by their loading vectors

Description

tax_sort_ord reorders taxa in a phyloseq object based on the relative length of their taxa scores / "loading" vector lengths on 1 or 2 ordination axes.

ord_order_taxa gets the taxa names in order from the ordination contained in a psExtra object. This is used internally by tax_sort_ord.

Usage

tax_sort_ord(ps, ord, axes = 1:2, scaling = 2)

ord_order_taxa(ord, axes = 1:2, scaling = 2)
tax_sort_ord(ps, ord, axes = 1:2, scaling = 2)

ord_order_taxa(ord, axes = 1:2, scaling = 2)

Arguments

`ps`	phyloseq object to be sorted
`ord`	psExtra with ordination object
`axes`	which axes to use for sorting? numerical vector of length 1 or 2
`scaling`	Type 2, or type 1 scaling. For more info, see https://sites.google.com/site/mb3gustame/constrained-analyses/redundancy-analysis. Either "species" or "site" scores are scaled by (proportional) eigenvalues, and the other set of scores is left unscaled (from ?vegan::scores.cca)

Get names of "top" n taxa

Description

Simple wrapper around tax_sort that:

optionally aggregates taxa at rank
sorts the aggregated taxa according to by
returns the top n number of taxa names

Usage

tax_top(data, n = 10, by = sum, rank = "unique", use_counts = FALSE, ...)
tax_top(data, n = 10, by = sum, rank = "unique", use_counts = FALSE, ...)

Arguments

`data`	phyloseq object or psExtra
`n`	how many taxa names to return, or NA for all (can return fewer than n values, if there are fewer to return)
`by`	how to sort taxa (see `?tax_sort()`), defaults to `sum` which sorts by total abundance across all samples
`rank`	taxonomic rank to aggregate at before calculating ("unique" = no aggregation)
`use_counts`	use count data if available, instead of transformed data
`...`	Arguments passed on to `tax_sort` `verbose` passed to phyloseq_validate verbose (if TRUE: message about suspicious values in tax_table, and how to fix) `trans` name of transformation to apply to taxa before sorting (taxa are returned un-transformed)

Value

vector of taxa names at chosen rank

Examples

data("dietswap", package = "microbiome")
tax_top(dietswap)
tax_top(dietswap, n = 4, by = "prev", rank = "Phylum", undetected = 30)
data("dietswap", package = "microbiome")
tax_top(dietswap)
tax_top(dietswap, n = 4, by = "prev", rank = "Phylum", undetected = 30)

Transform taxa in phyloseq object and record transformation

Description

Transform taxa features, and optionally aggregate at specified taxonomic rank beforehand. You can pipe the results of tax_agg into tax_transform, or equivalently set the rank argument in tax_transform.

Usage

tax_transform(
  data,
  trans,
  rank = NA,
  keep_counts = TRUE,
  chain = FALSE,
  zero_replace = 0,
  add = 0,
  transformation = NULL,
  ...
)
tax_transform(
  data,
  trans,
  rank = NA,
  keep_counts = TRUE,
  chain = FALSE,
  zero_replace = 0,
  add = 0,
  transformation = NULL,
  ...
)

Arguments

`data`	a phyloseq object or psExtra output from `tax_agg`
`trans`	any valid taxa transformation (e.g. from `microbiome::transform`)
`rank`	If data is phyloseq: data are aggregated at this rank before transforming. If NA, runs tax_agg(data, rank = NA). If rank is NA and data is already psExtra, any preceding aggregation is left as is.
`keep_counts`	if TRUE, store the pre-transformation count data in psExtra counts slot
`chain`	if TRUE, transforming again is possible when data are already transformed i.e. multiple transformations can be chained with multiple tax_transform calls
`zero_replace`	Replace any zeros with this value before transforming. Either a numeric, or "halfmin" which replaces zeros with half of the smallest value across the entire dataset. Beware: the choice of zero replacement is not tracked in the psExtra output.
`add`	Add this value to the otu_table before transforming. If `add` != 0, `zero_replace` does nothing. Either a numeric, or "halfmin". Beware: this choice is not tracked in the psExtra output.
`transformation`	deprecated, use `trans` instead!
`...`	any extra arguments passed to `microbiome::transform` or pass undetected = `⁠a number⁠` when using trans = "binary"

Details

This function often uses microbiome::transform internally and can perform the same transformations, including many from vegan::decostand (where the default MARGIN = 2). See below for notes about some of the available transformations.

tax_transform returns a psExtra containing the transformed phyloseq object and extra info (used for annotating ord_plot ordinations):

tax_transform (a string recording the transformation),
tax_agg (a string recording the taxonomic aggregation rank if specified here or earlier in tax_agg).

A few commonly used transformations:

"clr", or "rclr", perform the centered log ratio transformation, or the robust clr, using microbiome::transform
"compositional" converts the data into proportions, from 0 to 1.
"identity" does not transform the data, and records this choice for ord_plot
"binary" can be used to transform tax abundances into presence/abundance data.
"log2" which performs a log base 2 transformation (don't forget to set zero_replace if there are any zeros in your data)

Value

psExtra object including phyloseq and extra info

(r)clr transformation note

If any values are zero, the clr transform routine first adds a small pseudocount of min(relative abundance)/2 to all values. To avoid this, you can replace any zeros in advance by setting zero_replace to a number > 0.

The rclr transform does not replace zeros. Instead, only non-zero features are transformed, using the geometric mean of non-zero features as denominator.

Binary transformation notes

By default, otu_table values of 0 are kept as 0, and all positive values are converted to 1 (like decostand(method = "pa")). You can set a different threshold, by passing e.g. undetected = 10, for example, in which case all abundances of 10 or below would be converted to 0. All abundances above 10 would be converted to 1s.

Beware that the choice of detection threshold is not tracked in the psExtra.

Examples

data("dietswap", package = "microbiome")

# aggregate taxa at Phylum level and center log ratio transform the phyla counts
tax_transform(dietswap, trans = "clr", rank = "Phylum")

# this is equivalent to the two-step method (agg then transform)
tax_agg(dietswap, rank = "Phylum") %>% tax_transform("clr")

# does nothing except record tax_agg as "unique" and tax_transform as "identity" in psExtra info
dietswap %>% tax_transform("identity", rank = NA)

# binary transformation (convert abundances to presence/absence or detected/undetected)
tax_transform(dietswap, trans = "binary", rank = "Genus")
# change detection threshold by setting undetected argument (default is 0)
tax_transform(dietswap, trans = "binary", rank = "Genus", undetected = 50) %>%
  otu_get() %>%
  .[1:6, 1:4]

# log2 transformation after replacing all zeros with a pseudocount of 1
tax_transform(dietswap, trans = "log2", rank = "Family", zero_replace = 1)

# log2 transformation after replacing all zeros with a pseudocount of half
# the minimum non-zero count value in the aggregated dataset
tax_transform(dietswap, trans = "log2", rank = "Family", zero_replace = "halfmin")
data("dietswap", package = "microbiome")

# aggregate taxa at Phylum level and center log ratio transform the phyla counts
tax_transform(dietswap, trans = "clr", rank = "Phylum")

# this is equivalent to the two-step method (agg then transform)
tax_agg(dietswap, rank = "Phylum") %>% tax_transform("clr")

# does nothing except record tax_agg as "unique" and tax_transform as "identity" in psExtra info
dietswap %>% tax_transform("identity", rank = NA)

# binary transformation (convert abundances to presence/absence or detected/undetected)
tax_transform(dietswap, trans = "binary", rank = "Genus")
# change detection threshold by setting undetected argument (default is 0)
tax_transform(dietswap, trans = "binary", rank = "Genus", undetected = 50) %>%
  otu_get() %>%
  .[1:6, 1:4]

# log2 transformation after replacing all zeros with a pseudocount of 1
tax_transform(dietswap, trans = "log2", rank = "Family", zero_replace = 1)

# log2 transformation after replacing all zeros with a pseudocount of half
# the minimum non-zero count value in the aggregated dataset
tax_transform(dietswap, trans = "log2", rank = "Family", zero_replace = "halfmin")

Helper to specify a HeatmapAnnotation for taxa

Description

Helper to specify a HeatmapAnnotation for taxa

Usage

taxAnnotation(
  ...,
  name,
  annotation_legend_param = list(),
  show_legend = TRUE,
  gp = grid::gpar(col = NA),
  border = FALSE,
  gap = grid::unit(2, "mm"),
  show_annotation_name = TRUE,
  annotation_label = NULL,
  annotation_name_gp = grid::gpar(),
  annotation_name_offset = NULL,
  annotation_name_rot = NULL,
  annotation_name_align = TRUE,
  annotation_name_side = "auto",
  .data = NULL,
  .taxa = NULL,
  .side = NULL
)
taxAnnotation(
  ...,
  name,
  annotation_legend_param = list(),
  show_legend = TRUE,
  gp = grid::gpar(col = NA),
  border = FALSE,
  gap = grid::unit(2, "mm"),
  show_annotation_name = TRUE,
  annotation_label = NULL,
  annotation_name_gp = grid::gpar(),
  annotation_name_offset = NULL,
  annotation_name_rot = NULL,
  annotation_name_align = TRUE,
  annotation_name_side = "auto",
  .data = NULL,
  .taxa = NULL,
  .side = NULL
)

Arguments

`...`	Name-value pairs where the names correspond to annotation names and values are the output of taxon annotation functions such as anno_tax_prev() or manually specified AnnotationFunction objects
`name`	Name of the heatmap annotation, optional.
`annotation_legend_param`	A list which contains parameters for annotation legends. See `color_mapping_legend,ColorMapping-method` for all possible options.
`show_legend`	Whether show annotation legends. The value can be one single value or a vector.
`gp`	Graphic parameters for simple annotations (with `fill` parameter ignored).
`border`	border of single annotations.
`gap`	Gap between annotations. It can be a single value or a vector of `unit` objects.
`show_annotation_name`	Whether show annotation names? For column annotation, annotation names are drawn either on the left or the right, and for row annotations, names are draw either on top or at the bottom. The value can be a vector.
`annotation_label`	Labels for the annotations. By default it is the same as individual annotation names.
`annotation_name_gp`	Graphic parameters for annotation names. Graphic parameters can be vectors.
`annotation_name_offset`	Offset to the annotation names, a `unit` object. The value can be a vector.
`annotation_name_rot`	Rotation of the annotation names. The value can be a vector.
`annotation_name_align`	Whether to align the annotation names.
`annotation_name_side`	Side of the annotation names.
`.data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`.taxa`	OPTIONAL selection vector of taxa (names, numbers or logical), only set this if providing .data argument to override default
`.side`	OPTIONAL string, indicating the side the taxa annotation should be placed: only set this to override default

Value

HeatmapAnnotation object

Examples

library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")

customAxis <- list(labels_rot = 0, at = c(0, 0.5, 1))

# makes a function that takes data, taxa and which (at minimum)
fun <- taxAnnotation(
  gap = grid::unit(2.5, "mm"),
  Prev. = anno_tax_prev(axis_param = customAxis, ylim = c(0, 1), extend = 0),
  `Prop. Abd.` = anno_tax_box(size = unit(40, "mm"), axis_param = customAxis),
  `Log10p Abd.` = anno_tax_density(type = "heatmap")
)

# manually specify the prevalence barplot function by giving it data etc.
heatmapAnnoFunction <- fun(.data = psq, .side = "top", .taxa = taxa)

# draw the annotation without a heatmap, you will never normally do this!
grid.newpage()
vp <- viewport(width = 0.65, height = 0.75)
pushViewport(vp)
draw(heatmapAnnoFunction)

# try again as a row annotation
grid.newpage()
pushViewport(vp)
draw(fun(.data = psq, .side = "right", .taxa = rev(taxa)))
library("ComplexHeatmap")
data("ibd", package = "microViz")
psq <- tax_filter(ibd, min_prevalence = 5)
psq <- tax_mutate(psq, Species = NULL)
psq <- tax_fix(psq)
psq <- tax_agg(psq, rank = "Family")
taxa <- tax_top(psq, n = 15, rank = "Family")

customAxis <- list(labels_rot = 0, at = c(0, 0.5, 1))

# makes a function that takes data, taxa and which (at minimum)
fun <- taxAnnotation(
  gap = grid::unit(2.5, "mm"),
  Prev. = anno_tax_prev(axis_param = customAxis, ylim = c(0, 1), extend = 0),
  `Prop. Abd.` = anno_tax_box(size = unit(40, "mm"), axis_param = customAxis),
  `Log10p Abd.` = anno_tax_density(type = "heatmap")
)

# manually specify the prevalence barplot function by giving it data etc.
heatmapAnnoFunction <- fun(.data = psq, .side = "top", .taxa = taxa)

# draw the annotation without a heatmap, you will never normally do this!
grid.newpage()
vp <- viewport(width = 0.65, height = 0.75)
pushViewport(vp)
draw(heatmapAnnoFunction)

# try again as a row annotation
grid.newpage()
pushViewport(vp)
draw(fun(.data = psq, .side = "right", .taxa = rev(taxa)))

Create node and edge dataframes for taxatree_plots

Description

Mostly you will not have to use these functions directly: instead call taxatree_plots with the output of taxatree_stats

taxatree_nodes creates taxon nodes and calculates a summary statistic about each taxon (given by fun). Takes a psExtra or phyloseq object.
taxatree_edges uses the output of taxatree_nodes to create a dataframe of edges.

Usage

taxatree_nodes(
  ps,
  fun = list(sum = sum),
  ranks = "all",
  .sort = NULL,
  .use_counts = TRUE
)

taxatree_edges(nodes_df)
taxatree_nodes(
  ps,
  fun = list(sum = sum),
  ranks = "all",
  .sort = NULL,
  .use_counts = TRUE
)

taxatree_edges(nodes_df)

Arguments

`ps`	phyloseq object or psExtra
`fun`	function to calculate for each taxon/node
`ranks`	selection of taxonomic ranks to make nodes for ("all", or names)
`.sort`	sort nodes by "increasing" or "decreasing" values of fun function result
`.use_counts`	use count data if available (instead of transformed data)
`nodes_df`	dataframe output from taxatree_nodes

Details

taxatree_nodes makes nodes for taxa at all ranks or for a list of consecutive ranks (plus a root rank if tree is not rooted).

Add logical label column to taxatree_stats dataframe

Description

taxatree_label is used internally by taxatree_plotkey, but can also be used prior to taxatree_plots to label those plots directly

... arguments are passed to dplyr::filter(), so all filter syntax can be used.

Usage

taxatree_label(
  data,
  ...,
  .label_var = "label",
  .node_fun = list(prevalence = prev)
)
taxatree_label(
  data,
  ...,
  .label_var = "label",
  .node_fun = list(prevalence = prev)
)

Arguments

`data`	psExtra (or phyloseq)
`...`	REQUIRED logical conditions for labelling e.g. rank == "Phylum", p.value < 0.1 \| taxon %in% listOfTaxa
`.label_var`	name of label indicator variable to be created. If you change this, beware that taxatree_plotkey will not work, you will need to called taxatree_plot_label with
`.node_fun`	named list of length 1 providing `taxatree_nodes` `fun` arg. (name of list iterm is available for use in ...)

Details

If taxatree_stats missing (or if data is a phyloseq) it will create a plain taxatree_stats dataframe using only taxatree_nodes

node_fun can also be a precalculated dataframe (output of taxatree_nodes) but you should probably not use this option. This is used internally for efficiency inside taxatree_plotkey()

Value

psExtra with (modified) taxatree_stats dataframe

Examples

# simple example with plain phyloseq input
data("dietswap", package = "microbiome")
labelled <- dietswap %>%
  tax_prepend_ranks() %>%
  taxatree_label(rank == "Phylum", prevalence > 0.1)

# Note that "prevalence" column was available in data
# because it is created by `taxatree_nodes()` using the named function
# provided to the `node_fun` argument

# psExtra is returned
labelled

# notice how both conditions must be met for label column to be TRUE
labelled %>% taxatree_stats_get()
# simple example with plain phyloseq input
data("dietswap", package = "microbiome")
labelled <- dietswap %>%
  tax_prepend_ranks() %>%
  taxatree_label(rank == "Phylum", prevalence > 0.1)

# Note that "prevalence" column was available in data
# because it is created by `taxatree_nodes()` using the named function
# provided to the `node_fun` argument

# psExtra is returned
labelled

# notice how both conditions must be met for label column to be TRUE
labelled %>% taxatree_stats_get()

Statistical modelling for individual taxa across multiple ranks

Description

taxatree_models runs tax_model on every taxon at multiple taxonomic ranks (you choose which ranks with the plural ranks argument). It returns the results as a named nested list of models attached to a psExtra object. One list per rank, one model per taxon at each rank.

The result can then be used with taxatree_models2stats to extract a dataframe of statistics for use with taxatree_plots.

Usage

taxatree_models(
  ps,
  ranks = NULL,
  type = "lm",
  variables = NULL,
  formula = NULL,
  use_future = FALSE,
  checkVars = TRUE,
  checkNA = "warning",
  verbose = "rank",
  trans = "identity",
  trans_args = list(),
  ...
)
taxatree_models(
  ps,
  ranks = NULL,
  type = "lm",
  variables = NULL,
  formula = NULL,
  use_future = FALSE,
  checkVars = TRUE,
  checkNA = "warning",
  verbose = "rank",
  trans = "identity",
  trans_args = list(),
  ...
)

Arguments

`ps`	phyloseq object or psExtra
`ranks`	vector of rank names at which to aggregate taxa for modelling
`type`	name of regression modelling function, or the function itself
`variables`	vector of variable names, to be used as model formula right hand side. If variables is a list, not a vector, a model is fit for each entry in list.
`formula`	Right hand side of a formula, as a formula object or character string. Or a list of these. (alternative to variables argument, do not provide both)
`use_future`	if TRUE parallel processing with future is possible, see details of ?tax_model.
`checkVars`	check variance of variables?
`checkNA`	check variables for NAs?
`verbose`	message about progress: "rank" only notifies which rank is being processed; TRUE notifies you about each taxon being processed; FALSE for no messages.
`trans`	name of tax_transform transformation to apply to aggregated taxa before fitting statistical models
`trans_args`	named list of any additional arguments to tax_transform e.g. list(zero_replace = "halfmin")
`...`	extra arguments are passed directly to modelling function

Extract statistics from taxatree_models or tax_model output

Description

Runs a function e.g. broom::tidy on a list of models, i.e. the output of taxatree_models or tax_model

Usage

taxatree_models2stats(data, fun = "auto", ..., .keep_models = FALSE)

tax_models2stats(data, rank = NULL, fun = "auto", ..., .keep_models = FALSE)
taxatree_models2stats(data, fun = "auto", ..., .keep_models = FALSE)

tax_models2stats(data, rank = NULL, fun = "auto", ..., .keep_models = FALSE)

Arguments

`data`	psExtra with attached tax_models or taxatree_models list, or just the list of models
`fun`	function to assist extraction of stats dataframe from models, or "auto"
`...`	extra arguments passed to fun
`.keep_models`	should the models list be kept in the psExtra output?
`rank`	string naming rank at which tax_model was run (needed if data is a list)

Value

data.frame, attached to psExtra

Functions

taxatree_models2stats(): Extract stats from list or psExtra output of taxatree_models
tax_models2stats(): Extract stats from list or psExtra output of tax_model

Examples

# This example is an abbreviated excerpt from article on taxon modelling on
# the microViz documentation website

library(dplyr)
data("ibd", package = "microViz")

# We'll keep only the Ulcerative Colitis and Healthy Control samples, to
# simplify the analyses for this example. We'll also remove the Species
# rank information, as most OTUs in this dataset are not assigned to a
# species. We'll also use `tax_fix` to fill any gaps where the Genus is
# unknown, with the family name or whatever higher rank classification is
# known.

phylo <- ibd %>%
  ps_filter(DiseaseState %in% c("UC", "nonIBD")) %>%
  tax_mutate(Species = NULL) %>%
  tax_fix()

# Let's make some sample data variables that are easier to use and compare
# in the statistical modelling ahead. We will convert dichotomous
# categorical variables into similar binary variables (values: 1 for true,
# or 0 for false). We will also scale and center the numeric variable for
# age.

phylo <- phylo %>%
  ps_mutate(
    UC = ifelse(DiseaseState == "UC", yes = 1, no = 0),
    female = ifelse(gender == "female", yes = 1, no = 0),
    antibiotics = ifelse(abx == "abx", yes = 1, no = 0),
    steroids = ifelse(steroids == "steroids", yes = 1, no = 0),
    age_scaled = scale(age, center = TRUE, scale = TRUE)
  )

lm_models <- phylo %>%
  tax_fix() %>%
  tax_prepend_ranks() %>%
  tax_transform("compositional", rank = "Genus") %>%
  tax_filter(min_prevalence = 0.1, use_counts = TRUE) %>%
  taxatree_models(
    type = lm,
    trans = "log2", trans_args = list(zero_replace = "halfmin"),
    ranks = c("Phylum", "Class", "Genus"),
    variables = c("UC", "female", "antibiotics", "steroids", "age_scaled")
  )

lm_stats <- lm_models %>% taxatree_models2stats()

# inspect the psExtra returned, now with taxatree_stats dataframe added
lm_stats

# inspect the dataframe itself
lm_stats %>% taxatree_stats_get()

# keep the models on the psExtra object
lm_models %>% taxatree_models2stats(.keep_models = TRUE)

# you can adjust the p values with taxatree_stats_p_adjust

# you can plot the results with taxatree_plots
# This example is an abbreviated excerpt from article on taxon modelling on
# the microViz documentation website

library(dplyr)
data("ibd", package = "microViz")

# We'll keep only the Ulcerative Colitis and Healthy Control samples, to
# simplify the analyses for this example. We'll also remove the Species
# rank information, as most OTUs in this dataset are not assigned to a
# species. We'll also use `tax_fix` to fill any gaps where the Genus is
# unknown, with the family name or whatever higher rank classification is
# known.

phylo <- ibd %>%
  ps_filter(DiseaseState %in% c("UC", "nonIBD")) %>%
  tax_mutate(Species = NULL) %>%
  tax_fix()

# Let's make some sample data variables that are easier to use and compare
# in the statistical modelling ahead. We will convert dichotomous
# categorical variables into similar binary variables (values: 1 for true,
# or 0 for false). We will also scale and center the numeric variable for
# age.

phylo <- phylo %>%
  ps_mutate(
    UC = ifelse(DiseaseState == "UC", yes = 1, no = 0),
    female = ifelse(gender == "female", yes = 1, no = 0),
    antibiotics = ifelse(abx == "abx", yes = 1, no = 0),
    steroids = ifelse(steroids == "steroids", yes = 1, no = 0),
    age_scaled = scale(age, center = TRUE, scale = TRUE)
  )

lm_models <- phylo %>%
  tax_fix() %>%
  tax_prepend_ranks() %>%
  tax_transform("compositional", rank = "Genus") %>%
  tax_filter(min_prevalence = 0.1, use_counts = TRUE) %>%
  taxatree_models(
    type = lm,
    trans = "log2", trans_args = list(zero_replace = "halfmin"),
    ranks = c("Phylum", "Class", "Genus"),
    variables = c("UC", "female", "antibiotics", "steroids", "age_scaled")
  )

lm_stats <- lm_models %>% taxatree_models2stats()

# inspect the psExtra returned, now with taxatree_stats dataframe added
lm_stats

# inspect the dataframe itself
lm_stats %>% taxatree_stats_get()

# keep the models on the psExtra object
lm_models %>% taxatree_models2stats(.keep_models = TRUE)

# you can adjust the p values with taxatree_stats_p_adjust

# you can plot the results with taxatree_plots

Add labels to taxatree plots/key

Description

Finer control over label drawing for taxatree_plotkey (with .draw_label = FALSE), and label drawing for taxatree_plots output too.

Usage

taxatree_plot_labels(
  p,
  circular = TRUE,
  taxon_renamer = identity,
  fun = ggrepel::geom_text_repel,
  label_var = "label",
  x_nudge = 0.1,
  y_nudge = 0.025,
  rotate = 0,
  fontface = "bold",
  size = 2.5,
  colour = "grey15",
  max.overlaps = Inf,
  min.segment.length = 0,
  segment.size = 0.15,
  segment.color = "grey15",
  point.padding = 0.05,
  box.padding = 0.1,
  seed = NA,
  ...
)
taxatree_plot_labels(
  p,
  circular = TRUE,
  taxon_renamer = identity,
  fun = ggrepel::geom_text_repel,
  label_var = "label",
  x_nudge = 0.1,
  y_nudge = 0.025,
  rotate = 0,
  fontface = "bold",
  size = 2.5,
  colour = "grey15",
  max.overlaps = Inf,
  min.segment.length = 0,
  segment.size = 0.15,
  segment.color = "grey15",
  point.padding = 0.05,
  box.padding = 0.1,
  seed = NA,
  ...
)

Arguments

`p`	taxatree_plotkey or taxatree_plots output plot
`circular`	is the plot layout circular? labels are drawn differently for circular trees
`taxon_renamer`	function that takes taxon names and returns modified names for labels
`fun`	ggrepel labelling function: geom_text_repel or geom_label_repel
`label_var`	name of variable in taxatree_stats that indicates which taxa to label
`x_nudge`	absolute amount by which the initial position of taxon labels is nudged (relevant only for circular layouts, use nudge_x for other layouts)
`y_nudge`	absolute amount by which the initial position of taxon labels is nudged (relevant only for circular layouts, use nudge_y for other layouts)
`rotate`	angle to rotate labels' outer edges away from horizontal (relevant only for circular layouts, use angle for other layouts)
`fontface`	fontface of label text
`size`	size of labels
`colour`	colour of label outlines and text
`max.overlaps`	max number of overlapping labels tolerated
`min.segment.length`	min length of label line segment to bother drawing
`segment.size`	thickness of line segment
`segment.color`	colour of line segment
`point.padding`	padding around node points (for label positioning)
`box.padding`	padding around labels/text (for label positioning)
`seed`	set this for reproducible label positions
`...`	Arguments passed on to `ggrepel::geom_text_repel` `arrow` specification for arrow heads, as created by `arrow` `force` Force of repulsion between overlapping text labels. Defaults to 1. `force_pull` Force of attraction between a text label and its corresponding data point. Defaults to 1. `max.time` Maximum number of seconds to try to resolve overlaps. Defaults to 0.5. `max.iter` Maximum number of iterations to try to resolve overlaps. Defaults to 10000. `xlim,ylim` Limits for the x and y axes. Text labels will be constrained to these limits. By default, text labels are constrained to the entire plot area. `direction` "both", "x", or "y" – direction in which to adjust position of labels `verbose` If `TRUE`, some diagnostics of the repel algorithm are printed

Draw labelled key to accompany taxatree_plots

Description

Draw labelled key to accompany taxatree_plots

Usage

taxatree_plotkey(
  data,
  ...,
  size_stat = list(prevalence = prev),
  node_size_range = c(1.5, 5),
  edge_width_range = node_size_range * 0.8,
  size_guide = "none",
  size_trans = "identity",
  colour = "lightgrey",
  edge_alpha = 0.7,
  title = "Key",
  title_size = 14,
  taxon_renamer = identity,
  .combine_label = any,
  .draw_label = TRUE,
  .calc_label = TRUE,
  layout = "tree",
  layout_seed = NA,
  circular = identical(layout, "tree"),
  node_sort = NULL,
  add_circles = isTRUE(circular),
  drop_ranks = TRUE
)
taxatree_plotkey(
  data,
  ...,
  size_stat = list(prevalence = prev),
  node_size_range = c(1.5, 5),
  edge_width_range = node_size_range * 0.8,
  size_guide = "none",
  size_trans = "identity",
  colour = "lightgrey",
  edge_alpha = 0.7,
  title = "Key",
  title_size = 14,
  taxon_renamer = identity,
  .combine_label = any,
  .draw_label = TRUE,
  .calc_label = TRUE,
  layout = "tree",
  layout_seed = NA,
  circular = identical(layout, "tree"),
  node_sort = NULL,
  add_circles = isTRUE(circular),
  drop_ranks = TRUE
)

Arguments

`data`	psExtra (or phyloseq)
`...`	logical conditions for labelling e.g. rank == "Phylum", p.value < 0.1 \| taxon %in% listOfTaxa
`size_stat`	named list of length 1, giving function calculated for each taxon, to determine the size of nodes (and edges). Name used as size legend title.
`node_size_range`	min and max node sizes, decrease to avoid node overlap
`edge_width_range`	min and max edge widths
`size_guide`	guide for node sizes, try "none", "legend" or ggplot2::guide_legend()
`size_trans`	transformation for size scale you can use (the name of) any transformer from the scales package, such as "identity", "log1p", or "sqrt"
`colour`	fixed colour and fill of nodes and edges
`edge_alpha`	fixed alpha value for edges
`title`	title of plot (NULL for none)
`title_size`	font size of title
`taxon_renamer`	function that takes taxon names and returns modified names for labels
`.combine_label`	all or any: function to combine multiple logical "label" values for a taxon (relevant if taxatree_stats already present in data)
`.draw_label`	should labels be drawn, or just the bare tree, set this to FALSE if you want to draw customised labels with taxatree_plot_labels afterwards
`.calc_label`	if you already set labels with taxatree_label: set this to FALSE to use / avoid overwriting that data (ignores `...` if FALSE)
`layout`	any ggraph layout, default is "tree"
`layout_seed`	any numeric, required if a stochastic igraph layout is named
`circular`	should the layout be circular?
`node_sort`	sort nodes by "increasing" or "decreasing" size? NULL for no sorting. Use `tax_sort()` before `taxatree_plots()` for finer control.
`add_circles`	add faint concentric circles to plot, behind each rank?
`drop_ranks`	drop ranks from tree if not included in stats dataframe

Value

ggplot

Examples

library(ggplot2)
# Normally you make a key to accompany taxatree_plots showing stats results
# So see ?taxatree_plots examples too!
#
# You can also use the taxatree key visualization just to help understand your
# tax_table contents and hierarchical structure
shao19 %>%
  ps_filter(family_id %in% 1:5) %>%
  tax_filter(min_prevalence = 7) %>%
  tax_fix() %>%
  tax_agg("genus") %>%
  tax_prepend_ranks() %>%
  taxatree_plotkey(rank %in% c("phylum", "genus"))


# # Let's look at some of the most prevalent Actinobacteria
actinoOnlyPhylo <- shao19 %>%
  tax_select("Actinobacteria", ranks_searched = "phylum") %>%
  tax_filter(min_prevalence = 100)

actinoOnlyPhylo %>%
  tax_mutate(order = NULL) %>%
  tax_sort(by = "prev", at = "species", tree_warn = FALSE) %>%
  taxatree_plotkey(
    circular = FALSE, rank != "genus",
    .draw_label = FALSE, # suppress drawing now so we can make custom labels after
    size_guide = "legend", colour = "skyblue", edge_alpha = 0.2
  ) %>%
  taxatree_plot_labels(
    circular = FALSE, # be sure you match the plotkey layout
    fun = geom_text, fontface = "bold.italic", size = 2.5, hjust = 0,
    nudge_y = 0.1 # nudge y if you want to nudge x (due to coord_flip!)
  ) +
  coord_flip(clip = "off") +
  scale_y_reverse(expand = expansion(add = c(0.2, 2))) +
  theme(legend.position = c(0.15, 0.25), legend.background = element_rect())


# You can even choose other tree layouts from igraph (e.g. kk)
# and configure multiple styles of custom labels on one plot
set.seed(1) # set seed for reproducibility of ggrepel label positions
actinoOnlyPhylo %>%
  tax_mutate(order = NULL) %>%
  taxatree_label(rank == "family", .label_var = "family_lab") %>%
  taxatree_label(rank == "species", .label_var = "species_lab") %>%
  taxatree_plotkey(
    circular = FALSE, rank != "genus", layout = "kk",
    .draw_label = FALSE, # important not to draw the default labels
    colour = "skyblue", edge_alpha = 0.2
  ) %>%
  taxatree_plot_labels(
    circular = FALSE, label_var = "family_lab", fun = geom_label,
    fontface = "bold.italic", colour = "orange", fill = "black", size = 3
  ) %>%
  taxatree_plot_labels(
    circular = FALSE, label_var = "species_lab", fun = ggrepel::geom_text_repel,
    fontface = "bold.italic", size = 2, force = 20, max.time = 0.1
  )
library(ggplot2)
# Normally you make a key to accompany taxatree_plots showing stats results
# So see ?taxatree_plots examples too!
#
# You can also use the taxatree key visualization just to help understand your
# tax_table contents and hierarchical structure
shao19 %>%
  ps_filter(family_id %in% 1:5) %>%
  tax_filter(min_prevalence = 7) %>%
  tax_fix() %>%
  tax_agg("genus") %>%
  tax_prepend_ranks() %>%
  taxatree_plotkey(rank %in% c("phylum", "genus"))


# # Let's look at some of the most prevalent Actinobacteria
actinoOnlyPhylo <- shao19 %>%
  tax_select("Actinobacteria", ranks_searched = "phylum") %>%
  tax_filter(min_prevalence = 100)

actinoOnlyPhylo %>%
  tax_mutate(order = NULL) %>%
  tax_sort(by = "prev", at = "species", tree_warn = FALSE) %>%
  taxatree_plotkey(
    circular = FALSE, rank != "genus",
    .draw_label = FALSE, # suppress drawing now so we can make custom labels after
    size_guide = "legend", colour = "skyblue", edge_alpha = 0.2
  ) %>%
  taxatree_plot_labels(
    circular = FALSE, # be sure you match the plotkey layout
    fun = geom_text, fontface = "bold.italic", size = 2.5, hjust = 0,
    nudge_y = 0.1 # nudge y if you want to nudge x (due to coord_flip!)
  ) +
  coord_flip(clip = "off") +
  scale_y_reverse(expand = expansion(add = c(0.2, 2))) +
  theme(legend.position = c(0.15, 0.25), legend.background = element_rect())


# You can even choose other tree layouts from igraph (e.g. kk)
# and configure multiple styles of custom labels on one plot
set.seed(1) # set seed for reproducibility of ggrepel label positions
actinoOnlyPhylo %>%
  tax_mutate(order = NULL) %>%
  taxatree_label(rank == "family", .label_var = "family_lab") %>%
  taxatree_label(rank == "species", .label_var = "species_lab") %>%
  taxatree_plotkey(
    circular = FALSE, rank != "genus", layout = "kk",
    .draw_label = FALSE, # important not to draw the default labels
    colour = "skyblue", edge_alpha = 0.2
  ) %>%
  taxatree_plot_labels(
    circular = FALSE, label_var = "family_lab", fun = geom_label,
    fontface = "bold.italic", colour = "orange", fill = "black", size = 3
  ) %>%
  taxatree_plot_labels(
    circular = FALSE, label_var = "species_lab", fun = ggrepel::geom_text_repel,
    fontface = "bold.italic", size = 2, force = 20, max.time = 0.1
  )

Plot statistical model results for all taxa on a taxonomic tree

Description

Uses a psExtra object to make a tree graph structure from the taxonomic table.
Then adds statistical results stored in "taxatree_stats" of psExtra data
You must use taxatree_models() first to generate statistical model results.
You can adjust p-values with taxatree_stats_p_adjust()

Usage

taxatree_plots(
  data,
  colour_stat = "estimate",
  palette = "Green-Brown",
  reverse_palette = FALSE,
  colour_lims = NULL,
  colour_oob = scales::oob_squish,
  colour_trans = "abs_sqrt",
  size_stat = list(prevalence = prev),
  node_size_range = c(1, 4),
  edge_width_range = node_size_range * 0.8,
  size_guide = "legend",
  size_trans = "identity",
  sig_stat = "p.value",
  sig_threshold = 0.05,
  sig_shape = "circle filled",
  sig_size = 0.75,
  sig_stroke = 0.75,
  sig_colour = "white",
  edge_alpha = 0.7,
  vars = "term",
  var_renamer = identity,
  title_size = 10,
  layout = "tree",
  layout_seed = NA,
  circular = identical(layout, "tree"),
  node_sort = NULL,
  add_circles = isTRUE(circular),
  drop_ranks = TRUE,
  l1 = if (palette == "Green-Brown") 10 else NULL,
  l2 = if (palette == "Green-Brown") 85 else NULL,
  colour_na = "grey35"
)
taxatree_plots(
  data,
  colour_stat = "estimate",
  palette = "Green-Brown",
  reverse_palette = FALSE,
  colour_lims = NULL,
  colour_oob = scales::oob_squish,
  colour_trans = "abs_sqrt",
  size_stat = list(prevalence = prev),
  node_size_range = c(1, 4),
  edge_width_range = node_size_range * 0.8,
  size_guide = "legend",
  size_trans = "identity",
  sig_stat = "p.value",
  sig_threshold = 0.05,
  sig_shape = "circle filled",
  sig_size = 0.75,
  sig_stroke = 0.75,
  sig_colour = "white",
  edge_alpha = 0.7,
  vars = "term",
  var_renamer = identity,
  title_size = 10,
  layout = "tree",
  layout_seed = NA,
  circular = identical(layout, "tree"),
  node_sort = NULL,
  add_circles = isTRUE(circular),
  drop_ranks = TRUE,
  l1 = if (palette == "Green-Brown") 10 else NULL,
  l2 = if (palette == "Green-Brown") 85 else NULL,
  colour_na = "grey35"
)

Arguments

`data`	psExtra with taxatree_stats, e.g. output of `taxatree_models2stats()`
`colour_stat`	name of variable to scale colour/fill of nodes and edges
`palette`	diverging hcl colour palette name from `colorspace::hcl_palettes("diverging")`
`reverse_palette`	reverse direction of colour palette?
`colour_lims`	limits of colour and fill scale, NULL will infer lims from range of all data
`colour_oob`	scales function to handle colour_stat values outside of colour_lims (default simply squishes "out of bounds" values into the given range)
`colour_trans`	name of transformation for colour scale: default is "abs_sqrt", the square-root of absolute values, but you can use the name of any transformer from the `scales` package, such as "identity" or "exp"
`size_stat`	named list of length 1, giving function calculated for each taxon, to determine the size of nodes (and edges). Name used as size legend title.
`node_size_range`	min and max node sizes, decrease to avoid node overlap
`edge_width_range`	min and max edge widths
`size_guide`	guide for node sizes, try "none", "legend" or ggplot2::guide_legend()
`size_trans`	transformation for size scale you can use (the name of) any transformer from the scales package, such as "identity", "log1p", or "sqrt"
`sig_stat`	name of variable indicating statistical significance
`sig_threshold`	value of sig_stat variable indicating statistical significance (below this)
`sig_shape`	fixed shape for significance marker
`sig_size`	fixed size for significance marker
`sig_stroke`	fixed stroke width for significance marker
`sig_colour`	fixed colour for significance marker (used as fill for filled shapes)
`edge_alpha`	fixed alpha value for edges
`vars`	name of column indicating terms in models (one plot made per term)
`var_renamer`	function to rename variables for plot titles
`title_size`	font size of title
`layout`	any ggraph layout, default is "tree"
`layout_seed`	any numeric, required if a stochastic igraph layout is named
`circular`	should the layout be circular?
`node_sort`	sort nodes by "increasing" or "decreasing" size? NULL for no sorting. Use `tax_sort()` before `taxatree_plots()` for finer control.
`add_circles`	add faint concentric circles to plot, behind each rank?
`drop_ranks`	drop ranks from tree if not included in stats dataframe
`l1`	Luminance value at the scale endpoints, NULL for palette's default
`l2`	Luminance value at the scale midpoint, NULL for palette's default
`colour_na`	colour for NA values in tree. (if unused ranks are not dropped, they will have NA values for colour_stat)

Details

taxatree_plotkey plots same layout as taxatree_plots, but in a fixed colour

See website article for more examples of use: https://david-barnett.github.io/microViz/articles/web-only/modelling-taxa.html

Uses ggraph, see help for main underlying graphing function with ?ggraph::ggraph

It is possible to provide multiple significance markers for multiple thresholds, by passing vectors to the sig_shape, sig_threshold, etc. arguments. It is critically important that the thresholds are provided in decreasing order of severity, e.g. sig_threshold = c(0.001, 0.01, 0.1) and you must provide a shape value for each of them.

Value

list of ggraph ggplots

Examples

# Limited examples, see website article for more

library(dplyr)
library(ggplot2)

data(dietswap, package = "microbiome")
ps <- dietswap

# create some binary variables for easy visualisation
ps <- ps %>% ps_mutate(
  female = if_else(sex == "female", 1, 0, NaN),
  african = if_else(nationality == "AFR", 1, 0, NaN)
)

# This example dataset has some taxa with the same name for phylum and family...
# We can fix problems like this with the tax_prepend_ranks function
# (This will always happen with Actinobacteria!)
ps <- tax_prepend_ranks(ps)

# filter out rare taxa
ps <- ps %>% tax_filter(
  min_prevalence = 0.5, prev_detection_threshold = 100
)

# delete the Family rank as we will not use it for this small example
# this is necessary as taxatree_plots can only plot consecutive ranks
ps <- ps %>% tax_mutate(Family = NULL)

# specify variables used for modelling
models <- taxatree_models(
  ps = ps, type = corncob::bbdml, ranks = c("Phylum", "Genus"),
  formula = ~ female + african, verbose = TRUE
)
# models list stored as attachment in psExtra
models

# get stats from models
stats <- taxatree_models2stats(models, param = "mu")
stats

plots <- taxatree_plots(
  data = stats, colour_trans = "identity",
  size_stat = list(mean = mean),
  size_guide = "legend", node_size_range = c(1, 6)
)

# if you change the size_stat for the plots, do the same for the key!!
key <- taxatree_plotkey(
  data = stats,
  rank == "Phylum" | p.value < 0.05, # labelling criteria
  .combine_label = all, # label only taxa where criteria met for both plots
  size_stat = list(mean = mean),
  node_size_range = c(2, 7), size_guide = "none",
  taxon_renamer = function(x) {
    stringr::str_remove_all(x, "[PG]: | [ae]t rel.")
  }
)

# cowplot is powerful for arranging trees and key and colourbar legend
legend <- cowplot::get_legend(plots[[1]])
plot_col <- cowplot::plot_grid(
  plots[[1]] + theme(legend.position = "none"),
  plots[[2]] + theme(legend.position = "none"),
  ncol = 1
)
cowplot::plot_grid(key, plot_col, legend, nrow = 1, rel_widths = c(4, 2, 1))
# Limited examples, see website article for more

library(dplyr)
library(ggplot2)

data(dietswap, package = "microbiome")
ps <- dietswap

# create some binary variables for easy visualisation
ps <- ps %>% ps_mutate(
  female = if_else(sex == "female", 1, 0, NaN),
  african = if_else(nationality == "AFR", 1, 0, NaN)
)

# This example dataset has some taxa with the same name for phylum and family...
# We can fix problems like this with the tax_prepend_ranks function
# (This will always happen with Actinobacteria!)
ps <- tax_prepend_ranks(ps)

# filter out rare taxa
ps <- ps %>% tax_filter(
  min_prevalence = 0.5, prev_detection_threshold = 100
)

# delete the Family rank as we will not use it for this small example
# this is necessary as taxatree_plots can only plot consecutive ranks
ps <- ps %>% tax_mutate(Family = NULL)

# specify variables used for modelling
models <- taxatree_models(
  ps = ps, type = corncob::bbdml, ranks = c("Phylum", "Genus"),
  formula = ~ female + african, verbose = TRUE
)
# models list stored as attachment in psExtra
models

# get stats from models
stats <- taxatree_models2stats(models, param = "mu")
stats

plots <- taxatree_plots(
  data = stats, colour_trans = "identity",
  size_stat = list(mean = mean),
  size_guide = "legend", node_size_range = c(1, 6)
)

# if you change the size_stat for the plots, do the same for the key!!
key <- taxatree_plotkey(
  data = stats,
  rank == "Phylum" | p.value < 0.05, # labelling criteria
  .combine_label = all, # label only taxa where criteria met for both plots
  size_stat = list(mean = mean),
  node_size_range = c(2, 7), size_guide = "none",
  taxon_renamer = function(x) {
    stringr::str_remove_all(x, "[PG]: | [ae]t rel.")
  }
)

# cowplot is powerful for arranging trees and key and colourbar legend
legend <- cowplot::get_legend(plots[[1]])
plot_col <- cowplot::plot_grid(
  plots[[1]] + theme(legend.position = "none"),
  plots[[2]] + theme(legend.position = "none"),
  ncol = 1
)
cowplot::plot_grid(key, plot_col, legend, nrow = 1, rel_widths = c(4, 2, 1))

Adjust p values in taxatree_stats dataframe

Description

Apply a p value adjustment method from stats::p.adjust.methods, such as false-discovery rate adjustment with "BH", or more conservative family-wise error rate controlling methods such as "holm" or "bonferroni".

Usage

taxatree_stats_p_adjust(
  data,
  method,
  grouping = "rank",
  p = "p.value",
  new_var = NULL
)
taxatree_stats_p_adjust(
  data,
  method,
  grouping = "rank",
  p = "p.value",
  new_var = NULL
)

Arguments

`data`	psExtra with taxatree_stats dataframe, or just the dataframe
`method`	any method from `stats::p.adjust.methods`
`grouping`	defines grouping of p-values into families for adjustment, see details.
`p`	name of variable containing p values for adjustment
`new_var`	name of new variable created for adjusted p values (automatically inferred by default)

Details

Define how to group the p values for adjustment with the grouping argument. The default is to adjust the p values in groups at each taxonomic rank, but you could also adjust per "taxon" or per "term". Or even group by a combination of rank and term with c("rank", "term"). You should specify the name of the new variable containing the adjusted p values in the new_var argument. If left as NULL the new variable name will be created by pasting together p.adj, the method, and the grouping variable(s) separated by ".".

Value

psExtra with dataframe of statistics, or just the data.frame

Examples

# This example is an abbreviated excerpt from article on taxon modelling on
# the microViz documentation website

library(corncob)
library(dplyr)
data("ibd", package = "microViz")

# We'll keep only the Ulcerative Colitis and Healthy Control samples, to
# simplify the analyses for this example. We'll also remove the Species
# rank information, as most OTUs in this dataset are not assigned to a
# species. We'll also use `tax_fix` to fill any gaps where the Genus is
# unknown, with the family name or whatever higher rank classification is
# known.

phylo <- ibd %>%
  ps_filter(DiseaseState %in% c("UC", "nonIBD")) %>%
  tax_mutate(Species = NULL) %>%
  tax_fix()

# Let's make some sample data variables that are easier to use and compare
# in the statistical modelling ahead. We will convert dichotomous
# categorical variables into similar binary variables (values: 1 for true,
# or 0 for false). We will also scale and center the numeric variable for
# age.

phylo <- phylo %>%
  ps_mutate(
    UC = ifelse(DiseaseState == "UC", yes = 1, no = 0),
    female = ifelse(gender == "female", yes = 1, no = 0),
    antibiotics = ifelse(abx == "abx", yes = 1, no = 0),
    steroids = ifelse(steroids == "steroids", yes = 1, no = 0),
    age_scaled = scale(age, center = TRUE, scale = TRUE)
  )

bb_models <- phylo %>%
  tax_fix() %>%
  tax_prepend_ranks() %>%
  tax_filter(min_prevalence = 0.3) %>%
  taxatree_models(
    type = corncob::bbdml,
    ranks = c("Phylum", "Class", "Order"),
    variables = c("UC", "female", "antibiotics", "steroids", "age_scaled")
  )

bb_stats <- bb_models %>%
  taxatree_models2stats(param = "mu") %>%
  taxatree_stats_p_adjust(method = "BH", grouping = "rank")

bb_stats

bb_stats %>% taxatree_stats_get()

# you can also directly modify the dataframe,
# and choose a different variable name
bb_stats %>%
  taxatree_stats_get() %>%
  taxatree_stats_p_adjust(
    method = "holm", grouping = "taxon", new_var = "p_adj_holm"
  )

# see all available adjustment methods
stats::p.adjust.methods
# This example is an abbreviated excerpt from article on taxon modelling on
# the microViz documentation website

library(corncob)
library(dplyr)
data("ibd", package = "microViz")

# We'll keep only the Ulcerative Colitis and Healthy Control samples, to
# simplify the analyses for this example. We'll also remove the Species
# rank information, as most OTUs in this dataset are not assigned to a
# species. We'll also use `tax_fix` to fill any gaps where the Genus is
# unknown, with the family name or whatever higher rank classification is
# known.

phylo <- ibd %>%
  ps_filter(DiseaseState %in% c("UC", "nonIBD")) %>%
  tax_mutate(Species = NULL) %>%
  tax_fix()

# Let's make some sample data variables that are easier to use and compare
# in the statistical modelling ahead. We will convert dichotomous
# categorical variables into similar binary variables (values: 1 for true,
# or 0 for false). We will also scale and center the numeric variable for
# age.

phylo <- phylo %>%
  ps_mutate(
    UC = ifelse(DiseaseState == "UC", yes = 1, no = 0),
    female = ifelse(gender == "female", yes = 1, no = 0),
    antibiotics = ifelse(abx == "abx", yes = 1, no = 0),
    steroids = ifelse(steroids == "steroids", yes = 1, no = 0),
    age_scaled = scale(age, center = TRUE, scale = TRUE)
  )

bb_models <- phylo %>%
  tax_fix() %>%
  tax_prepend_ranks() %>%
  tax_filter(min_prevalence = 0.3) %>%
  taxatree_models(
    type = corncob::bbdml,
    ranks = c("Phylum", "Class", "Order"),
    variables = c("UC", "female", "antibiotics", "steroids", "age_scaled")
  )

bb_stats <- bb_models %>%
  taxatree_models2stats(param = "mu") %>%
  taxatree_stats_p_adjust(method = "BH", grouping = "rank")

bb_stats

bb_stats %>% taxatree_stats_get()

# you can also directly modify the dataframe,
# and choose a different variable name
bb_stats %>%
  taxatree_stats_get() %>%
  taxatree_stats_p_adjust(
    method = "holm", grouping = "taxon", new_var = "p_adj_holm"
  )

# see all available adjustment methods
stats::p.adjust.methods

Convert old format "ps_extra" objects to new "psExtra" objects

Description

This will only be necessary if you have saved old format "ps_extra" objects generated by an old microViz version (< 0.10.0), and you cannot or do not want to regenerate these old format objects from your original phyloseq object.

Usage

upgrade_ps_extra_to_psExtra(ps_extra)
upgrade_ps_extra_to_psExtra(ps_extra)

Arguments

ps_extra

an old format "ps_extra" object, as generated by old microViz versions (< 0.10.0)

Value

new format "psExtra" S4 object

Examples

# read your old saved 'ps_extra' object that you want to keep using
# oldObject <- readRDS("old-object-path.rds")
# newObject <- upgrade_ps_extra_to_psExtra(oldObject)
# continue with your next analysis or plotting steps...
# read your old saved 'ps_extra' object that you want to keep using
# oldObject <- readRDS("old-object-path.rds")
# newObject <- upgrade_ps_extra_to_psExtra(oldObject)
# continue with your next analysis or plotting steps...

Helper to specify a HeatmapAnnotation for variables in cor_heatmap

Description

Helper to specify a HeatmapAnnotation for variables in cor_heatmap

Usage

varAnnotation(
  ...,
  name,
  annotation_legend_param = list(),
  show_legend = TRUE,
  gp = grid::gpar(col = NA),
  border = FALSE,
  gap = grid::unit(2, "mm"),
  show_annotation_name = TRUE,
  annotation_label = NULL,
  annotation_name_gp = grid::gpar(),
  annotation_name_offset = NULL,
  annotation_name_rot = NULL,
  annotation_name_align = FALSE,
  annotation_name_side = "auto",
  .data = NULL,
  .vars = NULL,
  .side = NULL
)
varAnnotation(
  ...,
  name,
  annotation_legend_param = list(),
  show_legend = TRUE,
  gp = grid::gpar(col = NA),
  border = FALSE,
  gap = grid::unit(2, "mm"),
  show_annotation_name = TRUE,
  annotation_label = NULL,
  annotation_name_gp = grid::gpar(),
  annotation_name_offset = NULL,
  annotation_name_rot = NULL,
  annotation_name_align = FALSE,
  annotation_name_side = "auto",
  .data = NULL,
  .vars = NULL,
  .side = NULL
)

Arguments

`...`	Name-value pairs where the names correspond to annotation names and values are the output of variable annotation functions such as anno_var_box(), or manually specified AnnotationFunction objects
`name`	Name of the heatmap annotation, optional.
`annotation_legend_param`	A list which contains parameters for annotation legends. See `color_mapping_legend,ColorMapping-method` for all possible options.
`show_legend`	Whether show annotation legends. The value can be one single value or a vector.
`gp`	Graphic parameters for simple annotations (with `fill` parameter ignored).
`border`	border of single annotations.
`gap`	Gap between annotations. It can be a single value or a vector of `unit` objects.
`show_annotation_name`	Whether show annotation names? For column annotation, annotation names are drawn either on the left or the right, and for row annotations, names are draw either on top or at the bottom. The value can be a vector.
`annotation_label`	Labels for the annotations. By default it is the same as individual annotation names.
`annotation_name_gp`	Graphic parameters for annotation names. Graphic parameters can be vectors.
`annotation_name_offset`	Offset to the annotation names, a `unit` object. The value can be a vector.
`annotation_name_rot`	Rotation of the annotation names. The value can be a vector.
`annotation_name_align`	Whether to align the annotation names.
`annotation_name_side`	Side of the annotation names.
`.data`	OPTIONAL phyloseq or psExtra, only set this to override use of same data as in heatmap
`.vars`	OPTIONAL selection vector of variables (names, numbers or logical), only set this if providing .data argument to override default
`.side`	OPTIONAL string, indicating the side for the variable annotations: only set this to override default

Value

HeatmapAnnotation object

Package 'microViz'

Help Index

Add paths connecting points on a ggplot scatterplot

Description

Usage

Arguments

Value

Examples

Simple heatmap helper to get a default adjacent side for another annotation

Description

Usage

Arguments

Value

Examples

Create colored rectangle annotations for categorical data

Description

Usage

Arguments

Value

Examples

Convenience function for generating a legend for anno_cat annotations.

Description

Usage

Arguments

Value

Examples

Helper to specify simple comp_heatmap annotation for other sample data

Description

Usage

Arguments

Value

See Also

Examples

Helper to specify comp_heatmap annotation for categorical sample data

Description

Usage

Arguments

Value

Examples

Helper to specify heatmap annotation for showing taxa abundance on boxplot

Description

Usage

Arguments

Value

Examples

Helper to specify heatmap annotation for showing taxa abundance density plot

Description

Usage

Arguments

Value

Examples

Helper to specify heatmap annotation for showing taxa prevalence as barplot

Description

Usage

Arguments

Value

Examples

Helper to specify heatmap annotation for variable distribution boxplots

Description

Usage

Arguments

Value

Examples

Helper to specify heatmap annotation for variable distribution density plot

Description

Usage

Arguments

Value

Examples

Helper to specify heatmap annotation for variable distribution histograms

Description

Usage

Arguments

Value

Examples

Plot (grouped and ordered) compositional barplots

Description

Usage

Arguments

Details