Customizing Pheatmap Annotations with colData() for Enhanced Visualization of DESeq2 Data


When working with DESeq2 data in R, it’s common to check how samples relate to one another by plotting their pairwise correlations as a heatmap with pheatmap. By default the heatmap is labeled with sample names, but it’s often more useful to annotate rows and columns with metadata such as cell type. In this article, we’ll explore how to build pheatmap annotations from the sample metadata returned by colData().

Background

pheatmap is a popular R package for drawing clustered heatmaps from any numeric matrix, including correlation matrices. By default it labels rows and columns with the matrix’s row and column names, which for a matrix derived from a DESeqDataSet are the sample names. Annotation tracks are separate from these labels: they are supplied through the annotation_row and annotation_col arguments as data frames whose row names match the row or column names of the matrix.

The Problem

In our example data, we have three cell types (alveolar macrophages, interstitial macrophages, and T cells) with three replicates each. We’ve created a DESeqDataSet object, dds, from the count matrix only_counts and the sample metadata in sample_info. After applying the variance stabilizing transformation (VST) and computing pairwise correlations between samples, we attempt to create a heatmap using pheatmap.
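For readers without the original data, the setup can be imitated with simulated counts. This is only a sketch: the sample names, gene names, cell type labels, and negative binomial parameters below are made up for illustration and are not taken from the dataset described above.

```r
# Hypothetical toy data: 200 genes x 9 samples (3 cell types x 3 replicates).
set.seed(1)
cell_types <- c("Alveolar_macrophage", "Interstitial_macrophage", "T_cell")

# Sample metadata, keyed by sample name.
sample_info <- data.frame(
  Cell_type = factor(rep(cell_types, each = 3)),
  row.names = paste0("sample", 1:9)
)

# Simulated count matrix with genes in rows and samples in columns.
only_counts <- matrix(
  rnbinom(200 * 9, mu = 100, size = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), rownames(sample_info))
)
storage.mode(only_counts) <- "integer"

# The count columns and the metadata rows should agree, in the same order.
stopifnot(identical(colnames(only_counts), rownames(sample_info)))
```

Keeping the sample names identical in both objects is what later lets the metadata be matched back to the columns of the correlation matrix.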

However, when we pass colData(dds)$Cell_type to annotation_col, we encounter an error reporting an incorrect number of dimensions. The issue arises because annotation_col and annotation_row expect a data frame whose row names match the column (or row) names of the matrix, whereas colData(dds)$Cell_type is a plain factor with no dimensions and no link back to the sample names.
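The error can be reproduced in base R without pheatmap. The sketch below assumes that pheatmap subsets the annotation by sample name with two indices, which is what the error message suggests; the sample names and cell type labels are hypothetical stand-ins for colData(dds).

```r
# Hypothetical sample names and cell types, standing in for colData(dds).
samples <- paste0("sample", 1:9)
cell_type <- factor(rep(c("AM", "IM", "Tcell"), each = 3))

# A bare factor has no dimensions, so two-index subsetting raises an error.
as_vector <- tryCatch(cell_type[samples, , drop = FALSE],
                      error = function(e) conditionMessage(e))
print(as_vector)

# A one-column data frame keyed by sample name subsets cleanly.
annotation_df <- data.frame(Cell_type = cell_type, row.names = samples)
as_frame <- annotation_df[samples, , drop = FALSE]
print(dim(as_frame))
```

The data frame version succeeds because it has both rows to index by name and a column dimension to preserve.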

Solution

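To resolve this problem, we pass pheatmap a data frame rather than a vector. Subsetting colData(dds) with drop = FALSE keeps Cell_type as a one-column table, and as.data.frame() converts it to an ordinary data frame whose row names are the sample names, which match the row and column names of the correlation matrix. Renaming the rows and columns of the matrix itself to the cell type is best avoided: replicates would then share a name, and pheatmap could no longer match annotations to individual samples. To display cell types in place of sample names, we use the labels_row and labels_col arguments instead.

Here’s the corrected code:

vsd_cor <- cor(vsd_mat)

# annotation_col needs a data frame whose row names match colnames(vsd_cor)
annotation_df <- as.data.frame(colData(dds)[, "Cell_type", drop = FALSE])

pheatmap(vsd_cor,
         main = "Hierarchical clustering",
         annotation_col = annotation_df,
         annotation_row = annotation_df,
         labels_row = as.character(annotation_df$Cell_type),
         labels_col = as.character(annotation_df$Cell_type))

Because the annotation data frame is keyed by sample name, pheatmap can draw a color-coded cell type track along both axes, while labels_row and labels_col replace the sample names printed next to the heatmap. This allows us to visualize the pairwise correlation values with meaningful annotations without altering the matrix itself.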
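Before plotting, it can help to confirm that every column of the matrix has a matching row in the annotation data frame. The helper below is a hypothetical convenience function, not part of pheatmap or DESeq2, and the toy matrix stands in for vsd_cor.

```r
# Hypothetical helper: confirm an annotation data frame can be matched to a matrix.
check_annotation <- function(mat, annotation) {
  stopifnot(is.data.frame(annotation))
  missing <- setdiff(colnames(mat), rownames(annotation))
  if (length(missing) > 0) {
    stop("No annotation for: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy correlation matrix and annotation, standing in for vsd_cor and colData(dds).
set.seed(1)
samples <- paste0("sample", 1:6)
toy_cor <- cor(matrix(rnorm(60), ncol = 6, dimnames = list(NULL, samples)))
toy_ann <- data.frame(Cell_type = rep(c("AM", "Tcell"), each = 3),
                      row.names = samples)

check_annotation(toy_cor, toy_ann)  # returns invisibly when all samples match
```

A check like this fails early with a readable message, rather than leaving the mismatch to surface as an error inside the plotting call.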

Additional Considerations

When working with large datasets or complex metadata structures, it’s essential to consider the following best practices:

  • Use meaningful and consistent naming conventions for your annotations to ensure ease of interpretation.
  • Verify that your annotation data is a data frame whose row names match the row or column names of the matrix passed to pheatmap.
  • Experiment with different visualization settings and parameters to optimize the appearance and content of your heatmap.
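As one example of adjusting the appearance, pheatmap’s annotation_colors argument takes a list with one named color vector per annotation column. The sketch below uses hypothetical cell type labels, arbitrary colors, and random placeholder data, and only draws the heatmap if pheatmap is installed.

```r
# Hypothetical annotation with three cell types.
annotation_df <- data.frame(
  Cell_type = factor(rep(c("AM", "IM", "Tcell"), each = 3)),
  row.names = paste0("sample", 1:9)
)

# One list entry per annotation column; vector names must match the factor levels.
annotation_colors <- list(
  Cell_type = c(AM = "#1b9e77", IM = "#d95f02", Tcell = "#7570b3")
)
stopifnot(setequal(names(annotation_colors$Cell_type),
                   levels(annotation_df$Cell_type)))

# Draw only if pheatmap is available; toy_cor is random placeholder data.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  set.seed(1)
  toy_cor <- cor(matrix(rnorm(90), ncol = 9,
                        dimnames = list(NULL, rownames(annotation_df))))
  pheatmap::pheatmap(toy_cor,
                     annotation_col = annotation_df,
                     annotation_colors = annotation_colors)
}
```

Fixing the colors explicitly keeps the cell type legend consistent across several heatmaps drawn from the same experiment.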

Conclusion

By building the annotation data frame from colData(), we’ve added cell type information to a sample-level correlation heatmap. This makes it easier to see at a glance whether replicates of the same cell type cluster together, especially when working with metadata-rich data. Remember that annotations must be supplied as data frames keyed by sample name, and use consistent naming for the metadata columns.

Step-by-Step Solution

To implement the solution outlined above, follow these steps:

  1. Load the necessary R packages: pheatmap and DESeq2.
  2. Prepare the sample metadata and choose a design formula.
  3. Create a DESeqDataSet from your count matrix and sample metadata using the DESeqDataSetFromMatrix() function.
  4. Apply the variance stabilizing transformation (VST) using the varianceStabilizingTransformation() function and extract the transformed matrix with assay().
  5. Compute pairwise correlation values from the VST matrix using the cor() function.
  6. Build an annotation data frame from colData(), keeping the sample names as row names.
  7. Create a heatmap using pheatmap, passing the annotation data frame and, optionally, cell type labels.

Here’s the complete code:

# Load necessary packages
library(pheatmap)
library(DESeq2)

# Create a DESeqDataSet from the count matrix and sample metadata
dds <- DESeqDataSetFromMatrix(countData = only_counts,
                              colData = sample_info,
                              design = ~ Cell_type)

# Apply the variance stabilizing transformation
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
vsd_mat <- assay(vsd)  # Extract the VST matrix

# Compute pairwise sample correlations
vsd_cor <- cor(vsd_mat)

# Build the annotation data frame; its row names are the sample names
annotation_df <- as.data.frame(colData(dds)[, "Cell_type", drop = FALSE])

# Create the heatmap with cell type annotations and labels
pheatmap(vsd_cor,
         main = "Hierarchical clustering",
         annotation_col = annotation_df,
         annotation_row = annotation_df,
         labels_row = as.character(annotation_df$Cell_type),
         labels_col = as.character(annotation_df$Cell_type))

By following these steps, you’ll be able to create a pheatmap with meaningful annotations based on your cell type metadata.


Last modified on 2023-12-17