Creating Heatmaps from Pairwise Comparisons: A Visual Approach to Multiple Hypothesis Testing in R

Introduction

In modern data analysis, it’s common to run many pairwise comparisons across the levels of a grouping factor. The resulting p-values can be hard to read as a raw table, especially when the number of comparisons is large. One approach is to display them as a heatmap, with color indicating how small each p-value is. In this article, we’ll build such a heatmap from the p-values produced by pairwise comparisons in R.

Background

Pairwise comparisons test every possible pair of groups against each other (e.g., comparing the mean of a variable between each pair of organisms at each site). Each resulting p-value is the probability of observing a difference at least as large as the one seen, assuming there is no real difference between the two groups. With multiple comparisons, however, the probability of obtaining at least one false positive (a Type I error) grows with the number of tests, which is why p-values are often adjusted before they are interpreted.
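
To make that inflation concrete, here is a small sketch in base R. It assumes the tests are independent and uses a significance level of 0.05; the p-values in p_raw are made up for illustration and do not come from any dataset in this article.

```r
# Family-wise error rate for m independent tests at level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^m
alpha <- 0.05
m <- c(1, 3, 10, 45)
fwer <- 1 - (1 - alpha)^m
round(fwer, 3)

# Made-up p-values (illustrative only)
p_raw <- c(0.001, 0.012, 0.030, 0.049, 0.200)

# Holm and Bonferroni corrections from base R's stats package
p_holm <- p.adjust(p_raw, method = "holm")
p_bonf <- p.adjust(p_raw, method = "bonferroni")
data.frame(p_raw, p_holm, p_bonf)
```

Adjusted p-values are never smaller than the raw ones, so a heatmap built from them will generally show fewer significant tiles.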

Tools and Methods

We’ll use R with the ggplot2 package for visualization. dplyr supplies the pipe and the grouping verbs, tidyr is used to complete the grid of group combinations, and rstatix provides the pairwise tests.

Step 1: Preparation and Data Wrangling

First, we need to prepare our data in a suitable format for the analysis and visualization.
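
The code in this article expects a data frame called data with a grouping column site, a column organism, and a numeric column variable. The original dataset is not shown, so the simulated stand-in below is purely an assumption: the site names, organism names, and means are invented so that the later steps have something to run on.

```r
set.seed(42)  # reproducible simulated values

# Hypothetical design: 2 sites x 4 organisms x 10 replicates
sites     <- c("site_A", "site_B")
organisms <- c("org1", "org2", "org3", "org4")
n_rep     <- 10

data <- expand.grid(
  replicate = seq_len(n_rep),
  organism  = organisms,
  site      = sites,
  stringsAsFactors = FALSE
)

# Give each organism a different (invented) mean so that some pairs differ
org_means <- c(org1 = 10, org2 = 10.5, org3 = 12, org4 = 15)
data$variable <- rnorm(nrow(data), mean = org_means[data$organism], sd = 1)

str(data)
```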

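library(dplyr)
library(rstatix)
library(ggplot2)

# Run pairwise t-tests between organisms within each site, then add a row
# (with NA results) for every group1/group2/site combination that is missing
d <- data %>%
  group_by(site) %>%
  t_test(variable ~ organism) %>%
  tidyr::complete(group1 = unique(data$organism), group2 = unique(data$organism), site = unique(data$site))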

This step groups the data by site and runs t_test, which performs a t-test for every pair of organisms within each site. tidyr::complete then adds a row for every combination of group1, group2, and site that is missing from the test output. These added rows hold NA for p, so the heatmap ends up with a full grid of tiles.
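
To see what the completion step does, here is a rough base R analogue on a tiny hand-made results table. It is not how tidyr::complete is implemented, and the res data frame and its p-values are invented. Because a pairwise test reports each pair only once, the full grid of combinations gains rows whose p is NA.

```r
# Invented pairwise results for one site: three organisms, three pairs
res <- data.frame(
  site   = "site_A",
  group1 = c("org1", "org1", "org2"),
  group2 = c("org2", "org3", "org3"),
  p      = c(0.20, 0.003, 0.04),
  stringsAsFactors = FALSE
)

# Every combination of group1 x group2 x site
lev  <- c("org1", "org2", "org3")
grid <- expand.grid(group1 = lev, group2 = lev, site = "site_A",
                    stringsAsFactors = FALSE)

# Left join: combinations without a test get p = NA
full <- merge(grid, res, by = c("site", "group1", "group2"), all.x = TRUE)
full
```

Of the nine cells in this grid, only three carry a p-value; the diagonal and the mirrored pairs stay empty.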

Step 2: Data Visualization

We’ll now create a heatmap that displays the p-values from our analysis. This involves:

Using geom_tile to represent the p-values as tiles
Adding the p-values as labels using geom_text
Setting a color scale for the tiles, with lower p-values represented by red and higher ones by green

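# Note: rev(group1) inside aes() would reverse the data values, not the axis,
# so tiles would end up under the wrong labels. Reverse the axis order instead.
ggplot(d, aes(group2, group1, fill = p)) +
  geom_tile() +
  geom_text(aes(label = scales::number(p, accuracy = 1e-6))) +
  scale_fill_gradient(low = "red", high = "green", na.value = NA) +
  scale_y_discrete(limits = rev) +
  facet_wrap(~site, ncol = 1)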

Step 3: Significance Subdivision

To further illustrate significance, we can subdivide the p-values into categories based on their significance levels (p<0.05, p<0.01, and p<0.001).
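
One way to do this binning is base R’s cut. The sketch below uses invented p-values, and the label strings are arbitrary. With right = FALSE each interval is closed on the left, so a value of exactly 0.01 falls into the "p < 0.05" bin rather than "p < 0.01".

```r
# Invented p-values (illustrative only)
p <- c(0.0004, 0.001, 0.008, 0.01, 0.03, 0.2)

# Intervals: [0, 0.001), [0.001, 0.01), [0.01, 0.05), [0.05, 1]
level <- cut(
  p,
  breaks = c(0, 0.001, 0.01, 0.05, 1),
  labels = c("p < 0.001", "p < 0.01", "p < 0.05", "n.s."),
  right = FALSE,
  include.lowest = TRUE
)

data.frame(p, level)
table(level)
```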

We’ll create a separate heatmap that only includes these significant groups.

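# Keep significant comparisons and bin them into significance levels
d_significant <- d %>%
  filter(p < 0.05) %>%
  mutate(level = cut(p, breaks = c(0, 0.001, 0.01, 0.05),
                     labels = c("p < 0.001", "p < 0.01", "p < 0.05"),
                     right = FALSE, include.lowest = TRUE))

# A discrete fill needs a discrete scale: one color per significance level
ggplot(d_significant, aes(group2, group1, fill = level)) +
  geom_tile() +
  geom_text(aes(label = scales::number(p, accuracy = 1e-6))) +
  scale_fill_manual(values = c("p < 0.001" = "red", "p < 0.01" = "orange", "p < 0.05" = "yellow")) +
  scale_y_discrete(limits = rev) +
  facet_wrap(~site, ncol = 1)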

Conclusion

In this article, we explored how to create a heatmap using p-values after pairwise comparisons. By leveraging the ggplot2 package for data visualization and tidyr for data manipulation, we were able to transform our data into a suitable format for analysis. We also demonstrated how to further illustrate significance by subdividing p-values into categories.

These methods can be applied to various scenarios involving multiple comparisons, providing insights into complex data sets in an engaging and intuitive manner.

Last modified on 2024-05-16