Summary Statistics Table with the mlr Package for Handling Factor Variables

Summary Statistics Table with Factors and Continuous Variables

In this article, we will explore how to create a summary statistics table that handles both factor variables and continuous variables. We will use the mtcars dataset from R’s built-in datasets package and perform simple modifications to it in order to create a table that includes all values of factor variables.

Introduction

The stargazer and huxtable packages are popular choices for creating summary statistics tables, but they have limitations when dealing with factor variables. In this article, we convert each factor into dummy variables with mlr::createDummyFeatures() and pass the result to stargazer, so that the table includes every level of every factor.

Using Model Matrix

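One way to handle factor variables in stargazer is to use the model.matrix() function to create dummy variables for each factor variable, and then summarize the resulting numeric columns. Here mtcars_df holds mpg together with vs and am converted to factors.

library(stargazer)

# mpg is continuous; vs and am are converted to factors
mtcars_df <- transform(mtcars[, c("mpg", "vs", "am")], vs = factor(vs), am = factor(am))

options(na.action = "na.pass")  # so that we keep missing values in the data
X <- model.matrix(~ . - 1, data = mtcars_df)  # "- 1" removes the intercept column
X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
stargazer(X.df, type = "text")

However, model.matrix() uses treatment contrasts by default, which drop the base level of a factor. With the intercept removed (- 1), only the first factor (vs) keeps all of its levels; am still loses its base level, so the table has rows for vs0, vs1, and am1 but none for am0.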
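The missing level is easy to confirm by inspecting the column names of the model matrix. The following is a minimal base R check; it rebuilds the three-column data frame inline so that it runs on its own, and the name df is used only for this illustration.

```r
# Illustration only: mpg plus vs and am as factors, built inline
df <- data.frame(
  mpg = mtcars$mpg,
  vs  = factor(mtcars$vs),
  am  = factor(mtcars$am)
)

# Same formula as above: all columns, no intercept
X <- model.matrix(~ . - 1, data = df)

colnames(X)
# "mpg" "vs0" "vs1" "am1"   (there is no am0 column)
```

Because the intercept is removed, the first factor in the formula is expanded into all of its levels, while every later factor is still coded relative to its base level.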

Using mlr::createDummyFeatures()

The mlr package provides a function called createDummyFeatures() that creates a dummy variable for every level of a factor, including the base level. We can use this function to create a summary statistics table with all levels of the factor variables.
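As a rough illustration of what this full dummy coding produces, the base R sketch below builds one 0/1 column per factor level by hand on a toy data frame. The helper name one_hot_all_levels and the toy data are hypothetical; the helper is not part of mlr and makes no claim about how createDummyFeatures() is implemented. It only mimics the "variable.level" column layout that appears in the output later in this article.

```r
# Hypothetical helper (not from mlr): one 0/1 column per factor level,
# non-factor columns are passed through unchanged.
one_hot_all_levels <- function(df) {
  out <- list()
  for (nm in names(df)) {
    x <- df[[nm]]
    if (is.factor(x)) {
      for (lv in levels(x)) {
        # Indicator for "this row has level lv"; the base level gets a column too
        out[[paste(nm, lv, sep = ".")]] <- as.integer(x == lv)
      }
    } else {
      out[[nm]] <- x
    }
  }
  as.data.frame(out)
}

# Toy data, made up for this example
toy <- data.frame(
  score = c(1.5, 2.0, 3.5),
  group = factor(c("0", "1", "0"))
)

dummies <- one_hot_all_levels(toy)
names(dummies)
# "score" "group.0" "group.1"
```

Each row has exactly one 1 among the columns that belong to the same factor, which is why no level is lost.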

library(tidyverse)
library(stargazer)
library(mlr)

# Keep one continuous variable (mpg) and two factors (vs, am)
mtcars_df <- mtcars %>%
  mutate(vs = factor(vs), am = factor(am)) %>%
  select(mpg, vs, am)
head(mtcars_df)

# One 0/1 column per factor level, base level included
X <- mlr::createDummyFeatures(obj = mtcars_df)
X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
stargazer(X.df, type = "text")

Output

Using mlr::createDummyFeatures(), the summary statistics table includes a row for every level of each factor, including the base levels vs.0 and am.0:

======================================================
Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
------------------------------------------------------
mpg       32 20.091  6.027   10    15.4     22.8   34 
vs.0      32 0.562   0.504    0     0        1      1 
vs.1      32 0.438   0.504    0     0        1      1 
am.0      32 0.594   0.499    0     0        1      1 
am.1      32 0.406   0.499    0     0        1      1 
------------------------------------------------------
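For a 0/1 dummy variable, the Mean column is the share of observations at that level, and the standard deviation follows from that share alone. The short base R check below reproduces the vs.1 row from mtcars directly; the variable names are chosen for this illustration.

```r
# Dummy for vs == 1, computed directly from mtcars
d <- as.numeric(mtcars$vs == 1)
n <- length(d)   # 32 cars
p <- mean(d)     # share of cars with vs == 1

p
# 0.4375 (14 of 32 cars), shown as 0.438 in the table

# Sample standard deviation of a 0/1 variable, written in terms of p
sd_from_p <- sqrt(n / (n - 1) * p * (1 - p))

all.equal(sd(d), sd_from_p)
# TRUE; both are approximately 0.504
```

The two dummies of a binary factor therefore always share the same standard deviation, and their means sum to 1.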

Conclusion

In this article, we explored how to create a summary statistics table that handles both factor variables and continuous variables by combining mlr::createDummyFeatures() with stargazer.

Key Takeaways

  • The model.matrix() function can be used to create dummy variables for each factor variable.
  • However, it drops the base level of a factor (with the formula ~ . - 1, of every factor after the first), so the table does not show all levels.
  • The mlr::createDummyFeatures() function creates a dummy for every level, including the base level.
  • Passing its result to stargazer gives a summary statistics table with all levels of the factor variables.
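If adding mlr as a dependency is not desirable, a commonly used base R alternative is to pass full (identity) contrasts to model.matrix() through its contrasts.arg argument. This is a different technique from the one used in this article and is offered only as a sketch; the names df and is_fac are chosen for the illustration.

```r
# Illustration only: mpg plus vs and am as factors, built inline
df <- data.frame(
  mpg = mtcars$mpg,
  vs  = factor(mtcars$vs),
  am  = factor(mtcars$am)
)

# Which columns are factors
is_fac <- vapply(df, is.factor, logical(1))

# contrasts(x, contrasts = FALSE) returns an identity matrix,
# i.e. one indicator column per level with nothing dropped
X <- model.matrix(
  ~ . - 1,
  data = df,
  contrasts.arg = lapply(df[is_fac], contrasts, contrasts = FALSE)
)

colnames(X)
# "mpg" "vs0" "vs1" "am0" "am1"
```

The columns are named vs0, am0, and so on, rather than vs.0 and am.0, but data.frame(X) can be passed to stargazer in the same way as before.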

Last modified on 2024-04-01