While I have some inkling of how {gt} works fresh in my mind, it would be good to also come to grips with {gtsummary}, so here we go on another journey! From a cursory reading of examples I have seen it is a promising package that may replace {stargazer} (one of the workhorses I have come to love).

While I have some inkling of how {gt} works fresh in my mind, it would be good to also come to grips with {gtsummary}, a package that ” provides an elegant and flexible way to create publication-ready analytical and summary tables using the R programming language. The {gtsummary} package summarizes data sets, regression models, and more, using sensible defaults with highly customizable capabilities.” The package’s tutorials are available here. Let us get going and use the `trial`

data bundled with the package:

Variable | Class | Label |
---|---|---|

trt | character | Chemotherapy Treatment |

age | numeric | Age |

marker | numeric | Marker Level (ng/mL) |

stage | factor | T Stage |

grade | factor | Grade |

response | integer | Tumor Response |

death | integer | Patient Died |

ttdeath | numeric | Months to Death/Censor |

```
library(gtsummary)
data(trial)
%>%
trial select(trt, age) %>%
tbl_summary()
```

Characteristic |
N = 200^{1} |
---|---|

Chemotherapy Treatment | |

Drug A | 98 (49%) |

Drug B | 102 (51%) |

Age | 47 (38, 57) |

Unknown | 11 |

^{1} n (%); Median (IQR) |

The `tbl_summary()`

command will parse the variable type and then generate either the count and relative frequency (as a percentage of the total) for each unique level of a character or factor, and the median and inter-quartile range (IQR) if it is a numeric or integer variable. Missing values will show up as `Unknown`

for each variable in question, unless the label is tweaked (as shown below).

If we want to view the summary statistics by treatment (or any other categorical) group, then the `by =`

switch can be used. The `add_p()`

switch will calculate and report p-values, with the test(s) run added as footnote(s). The `add_overall()`

switch will give you an additional column with the statistics for the full data-set (i.e., ignoring the `by =`

switch). Similarly, `add_n()`

will give you the length of the data-set. If you want to customize the statistics and other elements, a lot is possible. Titles and subtitles are easy to add by leaning on `gt`

commands.

```
%>%
trial select(trt, age, stage, ttdeath) %>%
tbl_summary(
by = trt,
statistic = list(
all_continuous() ~ "{median}; {mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"
),digits = all_continuous() ~ 2,
label = c(age ~ "Patient's Age", stage ~ "T Stage"),
missing_text = "(Missing)"
%>%
) add_p() %>%
add_n() %>%
modify_header(label ~ "**Variable**") %>%
as_gt() %>%
::tab_header(
gttitle = "Table 1. Summary Statistics",
subtitle = "(N = 200)"
)
```

Table 1. Summary Statistics | ||||

(N = 200) | ||||

Variable |
N |
Drug A, N = 98^{1} |
Drug B, N = 102^{1} |
p-value^{2} |
---|---|---|---|---|

Patient's Age | 189 | 46.00; 47.01 (14.71) | 48.00; 47.45 (14.01) | 0.7 |

(Missing) | 7 | 4 | ||

T Stage | 200 | 0.9 | ||

T1 | 28 / 98 (29%) | 25 / 102 (25%) | ||

T2 | 25 / 98 (26%) | 29 / 102 (28%) | ||

T3 | 22 / 98 (22%) | 21 / 102 (21%) | ||

T4 | 23 / 98 (23%) | 27 / 102 (26%) | ||

Months to Death/Censor | 200 | 23.51; 20.23 (4.99) | 21.22; 19.04 (5.50) | 0.14 |

^{1} Median; Mean (SD); n / N (%) |
||||

^{2} Wilcoxon rank sum test; Pearson’s Chi-squared test |

Summary tables are all well and good, but what about regression results? `{stargazer}`

has been the package I have used for presenting model results but `{gtsummary}`

may change that if it fits well in the work-flow. I’ll switch data-sets here and work with the `diamonds`

data-set. We can fit a simple linear regression model to have some estimates to work with.

```
library(tidyverse)
glimpse(diamonds)
```

```
Rows: 53,940
Columns: 10
$ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22…
$ cut <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very…
$ color <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J…
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, …
$ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1…
$ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, …
$ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 33…
$ x <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87…
$ y <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78…
$ z <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49…
```

```
%>%
diamonds mutate(cut.f = factor(cut, ordered = FALSE)) -> diamonds
lm(price ~ carat + cut.f, data = diamonds) -> lm01
%>%
lm01 tbl_regression(
intercept = TRUE
%>%
) add_n() %>%
bold_labels() %>%
italicize_levels() %>%
modify_header(label ~ "**Independent Variables**") %>%
add_glance_source_note(
label = list(
~ "Residual df",
df.residual ~ "Adjusted R-Squared"
adj.r.squared
),fmt_fun = df ~ style_number#,
#include = c(adj.r.squared, df.residual)
%>%
) as_gt() %>%
::tab_header(
gttitle = "Table 2. Linear Regression Estimates",
subtitle = "The Diamonds Data-set"
)
```

Table 2. Linear Regression Estimates | ||||

The Diamonds Data-set | ||||

Independent Variables |
N |
Beta |
95% CI^{1} |
p-value |
---|---|---|---|---|

(Intercept) | 53,940 | -3,875 | -3,955, -3,796 | <0.001 |

carat | 53,940 | 7,871 | 7,844, 7,898 | <0.001 |

cut.f | 53,940 | |||

Fair | — | — | ||

Good | 1,120 | 1,035, 1,206 | <0.001 | |

Very Good | 1,510 | 1,431, 1,589 | <0.001 | |

Premium | 1,439 | 1,361, 1,517 | <0.001 | |

Ideal | 1,801 | 1,724, 1,878 | <0.001 | |

R² = 0.856; Adjusted R-Squared = 0.856; Sigma = 1,511; Statistic = 64,369; p-value = <0.001; df = 5; Log-likelihood = -471,420; AIC = 942,854; BIC = 942,916; Deviance = 123,212,493,961; Residual df = 53,934; No. Obs. = 53,940 | ||||

^{1} CI = Confidence Interval |

What about a logit model, for example?

```
glm(
~ trt + age + grade, data = trial, family = binomial
response -> glm01
)
%>%
glm01 tbl_regression(
intercept = TRUE,
exponentiate = TRUE
%>%
) add_n() %>%
bold_labels() %>%
italicize_levels() %>%
modify_header(label ~ "**Independent Variables**") %>%
add_glance_source_note() %>%
as_gt() %>%
::tab_header(
gttitle = "Table 3. Logit Estimates",
subtitle = "The Trial Data-set"
)
```

Table 3. Logit Estimates | ||||

The Trial Data-set | ||||

Independent Variables |
N |
OR^{1} |
95% CI^{1} |
p-value |
---|---|---|---|---|

(Intercept) | 183 | 0.18 | 0.05, 0.61 | 0.007 |

Chemotherapy Treatment | 183 | |||

Drug A | — | — | ||

Drug B | 1.13 | 0.60, 2.13 | 0.7 | |

Age | 183 | 1.02 | 1.00, 1.04 | 0.10 |

Grade | 183 | |||

I | — | — | ||

II | 0.85 | 0.39, 1.85 | 0.7 | |

III | 1.01 | 0.47, 2.15 | >0.9 | |

Null deviance = 229; Null df = 182; Log-likelihood = -113; AIC = 235; BIC = 251; Deviance = 225; Residual df = 178; No. Obs. = 183 | ||||

^{1} OR = Odds Ratio, CI = Confidence Interval |

So, just the basics and little more than a quick tour but things are looking good with `{gtsummary}`

thus far.

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Ruhil (2021, Feb. 13). From an Attican Hollow: Making Friends with {gtsummary}. Retrieved from https://aniruhil.org/posts/2021-02-13-making-friends-with-gtsummary/

BibTeX citation

@misc{ruhil2021making, author = {Ruhil, Ani}, title = {From an Attican Hollow: Making Friends with {gtsummary}}, url = {https://aniruhil.org/posts/2021-02-13-making-friends-with-gtsummary/}, year = {2021} }