Summary Statistics Table with Factors and Continuous Variables
In this article, we will explore how to create a summary statistics table that handles both factor variables and continuous variables. We will use the mtcars dataset from R’s built-in datasets package and perform simple modifications to it in order to create a table that includes all values of factor variables.
Introduction
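The stargazer and huxtable packages are popular choices for creating summary statistics tables, but they have limitations when dealing with factor variables. stargazer, for example, summarizes only numeric columns, so factor variables are left out of the table unless they are first converted to dummy variables. In this article, we will use the mlr package to create those dummy variables, so that the resulting stargazer table shows every value of each factor variable.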
Using Model Matrix
One way to handle factor variables in stargazer is to use the model.matrix() function to create dummy variables for each factor variable. This gets the factors into the table, but not every level of them.
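library(stargazer)
# One continuous variable (mpg) and two factors (vs, am)
mtcars_df <- transform(mtcars[, c("mpg", "vs", "am")], vs = factor(vs), am = factor(am))
options(na.action = "na.pass") # so that we keep missing values in the data
X <- model.matrix(~ . - 1, data = mtcars_df)
X.df <- data.frame(X) # stargazer only does summary tables of data.frame objects
stargazer(X.df, type = "text")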
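However, model.matrix() drops the base level of a factor when it builds dummy variables. With the intercept removed (- 1), only the first factor (vs) keeps all of its levels; am still loses its base level, so the table does not show every value of every factor variable.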
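To see which levels survive, it can help to look at the column names directly. The following is a minimal sketch in base R; the names df, cols_default, and cols_no_int are illustrative and do not appear in the original code.

```r
# Illustrative data: one continuous variable and two two-level factors
df <- data.frame(
  mpg = mtcars$mpg,
  vs  = factor(mtcars$vs),
  am  = factor(mtcars$am)
)

# Default coding with an intercept: the base level of every factor is dropped
cols_default <- colnames(model.matrix(~ ., data = df))
print(cols_default)

# Removing the intercept restores all levels of the first factor only
cols_no_int <- colnames(model.matrix(~ . - 1, data = df))
print(cols_no_int)
```

On this data, vs0 should reappear once the intercept is removed, while am0 is still missing, which is why the summary table built from model.matrix() is incomplete.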
Using mlr::createDummyFeatures()
The mlr package provides a function called createDummyFeatures() that creates a dummy for all values, even the base case. We can use this function to create a summary statistics table with all values of factor variables.
library(tidyverse)
library(stargazer)
library(mlr)

# Keep one continuous variable and convert vs and am to factors
mtcars_df <- mtcars %>%
  mutate(vs = factor(vs), am = factor(am)) %>%
  select(mpg, vs, am)
head(mtcars_df)

# One dummy column per factor level, including the base level
X <- mlr::createDummyFeatures(obj = mtcars_df)
X.df <- data.frame(X) # stargazer only does summary tables of data.frame objects
stargazer(X.df, type = "text")
Output
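Using mlr::createDummyFeatures(), we can create a summary statistics table with all values of factor variables. Each level of vs and am now has its own row, and within each factor the means of the dummies sum to one because every car belongs to exactly one level.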
======================================================
Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
------------------------------------------------------
mpg       32 20.091  6.027   10    15.4     22.8   34
vs.0      32  0.562  0.504    0     0        1      1
vs.1      32  0.438  0.504    0     0        1      1
am.0      32  0.594  0.499    0     0        1      1
am.1      32  0.406  0.499    0     0        1      1
------------------------------------------------------
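The dummy columns in the table above can also be produced without mlr. The sketch below is an alternative technique, not the method used in this article: it passes identity (non-contrast) coding to model.matrix() through its contrasts.arg argument, using only base R. The names df, is_fac, full_coding, X_full, and X_full_df are illustrative, and the resulting columns are named vs0, vs1, am0, am1 rather than vs.0, vs.1, am.0, am.1.

```r
# Illustrative data: one continuous variable and two two-level factors
df <- data.frame(
  mpg = mtcars$mpg,
  vs  = factor(mtcars$vs),
  am  = factor(mtcars$am)
)

# Build an identity (non-contrast) coding for every factor column,
# so that no level is treated as the base case
is_fac <- vapply(df, is.factor, logical(1))
full_coding <- lapply(df[is_fac], contrasts, contrasts = FALSE)

# Dummy matrix with one column per factor level
X_full <- model.matrix(~ . - 1, data = df, contrasts.arg = full_coding)
X_full_df <- as.data.frame(X_full)

# The mean of a 0/1 dummy is the share of rows in that level
print(round(colMeans(X_full_df), 3))
```

X_full_df is an ordinary data frame, so it could be passed to stargazer(X_full_df, type = "text") in the same way as the mlr result.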
Conclusion
In this article, we explored how to create a summary statistics table that handles both factor variables and continuous variables by combining mlr::createDummyFeatures() with stargazer.
Key Takeaways
- The model.matrix() function can be used to create dummy variables for each factor variable.
- However, it drops the base case when creating dummy variables, so we will not get all values of the factor variable.
- The mlr::createDummyFeatures() function creates a dummy for all values, even the base case.
- We can use this function to create a summary statistics table with all values of factor variables.
References
- model.matrix(): Creating a model matrix in R
- mlr::createDummyFeatures(): MLR: Creating dummy variables
Last modified on 2024-04-01