While I have some inkling of how {gt} works fresh in my mind, it would be good to also come to grips with {gtsummary}, so here we go on another journey! From a cursory reading of examples I have seen it is a promising package that may replace {stargazer} (one of the workhorses I have come to love).
While I have some inkling of how {gt} works fresh in my mind, it would be good to also come to grips with {gtsummary}, a package that " provides an elegant and flexible way to create publicationready analytical and summary tables using the R programming language. The {gtsummary} package summarizes data sets, regression models, and more, using sensible defaults with highly customizable capabilities." The package’s tutorials are available here. Let us get going and use the trial
data bundled with the package:
Variable  Class  Label 

trt  character  Chemotherapy Treatment 
age  numeric  Age 
marker  numeric  Marker Level (ng/mL) 
stage  factor  T Stage 
grade  factor  Grade 
response  integer  Tumor Response 
death  integer  Patient Died 
ttdeath  numeric  Months to Death/Censor 
library(gtsummary)
data(trial)
%>%
trial select(trt, age) %>%
tbl_summary()
Characteristic  N = 200^{1} 

Chemotherapy Treatment  
Drug A  98 (49%) 
Drug B  102 (51%) 
Age  47 (38, 57) 
Unknown  11 
^{
1
}
n (%); Median (IQR)

The tbl_summary()
command will parse the variable type and then generate either the count and relative frequency (as a percentage of the total) for each unique level of a character or factor, and the median and interquartile range (IQR) if it is a numeric or integer variable. Missing values will show up as Unknown
for each variable in question, unless the label is tweaked (as shown below).
If we want to view the summary statistics by treatment (or any other categorical) group, then the by =
switch can be used. The add_p()
switch will calculate and report pvalues, with the test(s) run added as footnote(s). The add_overall()
switch will give you an additional column with the statistics for the full dataset (i.e., ignoring the by =
switch). Similarly, add_n()
will give you the length of the dataset. If you want to customize the statistics and other elements, a lot is possible. Titles and subtitles are easy to add by leaning on gt
commands.
%>%
trial select(trt, age, stage, ttdeath) %>%
tbl_summary(
by = trt,
statistic = list(
all_continuous() ~ "{median}; {mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"
),digits = all_continuous() ~ 2,
label = c(age ~ "Patient's Age", stage ~ "T Stage"),
missing_text = "(Missing)"
%>%
) add_p() %>%
add_n() %>%
modify_header(label ~ "**Variable**") %>%
as_gt() %>%
::tab_header(
gttitle = "Table 1. Summary Statistics",
subtitle = "(N = 200)"
)
Table 1. Summary Statistics  

(N = 200)  
Variable  N  Drug A, N = 98^{1}  Drug B, N = 102^{1}  pvalue^{2} 
Patient's Age  189  46.00; 47.01 (14.71)  48.00; 47.45 (14.01)  0.7 
(Missing)  7  4  
T Stage  200  0.9  
T1  28 / 98 (29%)  25 / 102 (25%)  
T2  25 / 98 (26%)  29 / 102 (28%)  
T3  22 / 98 (22%)  21 / 102 (21%)  
T4  23 / 98 (23%)  27 / 102 (26%)  
Months to Death/Censor  200  23.51; 20.23 (4.99)  21.22; 19.04 (5.50)  0.14 
^{
1
}
Median; Mean (SD); n / N (%)
^{
2
}
Wilcoxon rank sum test; Pearson's Chisquared test

Summary tables are all well and good, but what about regression results? {stargazer}
has been the package I have used for presenting model results but {gtsummary}
may change that if it fits well in the workflow. I’ll switch datasets here and work with the diamonds
dataset. We can fit a simple linear regression model to have some estimates to work with.
library(tidyverse)
glimpse(diamonds)
Rows: 53,940
Columns: 10
$ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22…
$ cut <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very…
$ color <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J…
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, …
$ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1…
$ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, …
$ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 33…
$ x <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87…
$ y <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78…
$ z <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49…
%>%
diamonds mutate(cut.f = factor(cut, ordered = FALSE)) > diamonds
lm(price ~ carat + cut.f, data = diamonds) > lm01
%>%
lm01 tbl_regression(
intercept = TRUE
%>%
) add_n() %>%
bold_labels() %>%
italicize_levels() %>%
modify_header(label ~ "**Independent Variables**") %>%
add_glance_source_note(
label = list(
~ "Residual df",
df.residual ~ "Adjusted RSquared"
adj.r.squared
),fmt_fun = df ~ style_number#,
#include = c(adj.r.squared, df.residual)
%>%
) as_gt() %>%
::tab_header(
gttitle = "Table 2. Linear Regression Estimates",
subtitle = "The Diamonds Dataset"
)
Table 2. Linear Regression Estimates  

The Diamonds Dataset  
Independent Variables  N  Beta  95% CI^{1}  pvalue 
(Intercept)  53940  3,875  3,955, 3,796  <0.001 
carat  53940  7,871  7,844, 7,898  <0.001 
cut.f  53940  
Fair  —  —  
Good  1,120  1,035, 1,206  <0.001  
Very Good  1,510  1,431, 1,589  <0.001  
Premium  1,439  1,361, 1,517  <0.001  
Ideal  1,801  1,724, 1,878  <0.001  
R² = 0.856; Adjusted RSquared = 0.856; Sigma = 1,511; Statistic = 64,369; pvalue = 0.000; df = 5; Loglikelihood = 471,420; AIC = 942,854; BIC = 942,916; Deviance = 123,212,493,961; Residual df = 53,934; No. Obs. = 53,940  
^{
1
}
CI = Confidence Interval

What about a logit model, for example?
glm(
~ trt + age + grade, data = trial, family = binomial
response > glm01
)
%>%
glm01 tbl_regression(
intercept = TRUE,
exponentiate = TRUE
%>%
) add_n() %>%
bold_labels() %>%
italicize_levels() %>%
modify_header(label ~ "**Independent Variables**") %>%
add_glance_source_note() %>%
as_gt() %>%
::tab_header(
gttitle = "Table 3. Logit Estimates",
subtitle = "The Trial Dataset"
)
Table 3. Logit Estimates  

The Trial Dataset  
Independent Variables  N  OR^{1}  95% CI^{1}  pvalue 
(Intercept)  183  0.18  0.05, 0.61  0.007 
Chemotherapy Treatment  183  
Drug A  —  —  
Drug B  1.13  0.60, 2.13  0.7  
Age  183  1.02  1.00, 1.04  0.10 
Grade  183  
I  —  —  
II  0.85  0.39, 1.85  0.7  
III  1.01  0.47, 2.15  >0.9  
Null deviance = 229; Null df = 182; Loglikelihood = 113; AIC = 235; BIC = 251; Deviance = 225; Residual df = 178; No. Obs. = 183  
^{
1
}
OR = Odds Ratio, CI = Confidence Interval

So, just the basics and little more than a quick tour but things are looking good with {gtsummary}
thus far.
Text and figures are licensed under Creative Commons Attribution CC BYSA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ruhil (2021, Feb. 13). From an Attican Hollow ...: Making Friends with {gtsummary}. Retrieved from https://aniruhil.org/posts/20210213makingfriendswithgtsummary/
BibTeX citation
@misc{ruhil2021making, author = {Ruhil, Ani}, title = {From an Attican Hollow ...: Making Friends with {gtsummary}}, url = {https://aniruhil.org/posts/20210213makingfriendswithgtsummary/}, year = {2021} }