Structure
import pandas as pd
from great_tables import GT, from_column, style, loc
= pd.read_csv("data/coffee-sales.csv")
coffee coffee
product revenue_dollars revenue_pct profit_dollars profit_pct
0 Grinder 904500 0.03 567960 0.04
1 Moka pot 2045250 0.07 181080 0.01
2 Cold brew 288750 0.01 241770 0.02
3 Filter 404250 0.01 70010 0.00
4 Drip machine 2632000 0.09 1374450 0.09
5 AeroPress 2601500 0.09 1293780 0.09
6 Pour over 846000 0.03 364530 0.02
7 French press 1113250 0.04 748120 0.05
8 Cezve 2512500 0.09 1969520 0.13
9 Chemex 3137250 0.11 817680 0.06
10 Scale 3801000 0.13 2910290 0.20
11 Kettle 756250 0.03 617520 0.04
12 Espresso Machine 8406000 0.29 3636440 0.25
13 Total 29448500 1.00 14793150 1.00
Starting a GT Table
= GT(coffee)
gt_tbl1 gt_tbl1
product | revenue_dollars | revenue_pct | profit_dollars | profit_pct |
---|---|---|---|---|
Grinder | 904500 | 0.03 | 567960 | 0.04 |
Moka pot | 2045250 | 0.07 | 181080 | 0.01 |
Cold brew | 288750 | 0.01 | 241770 | 0.02 |
Filter | 404250 | 0.01 | 70010 | 0.0 |
Drip machine | 2632000 | 0.09 | 1374450 | 0.09 |
AeroPress | 2601500 | 0.09 | 1293780 | 0.09 |
Pour over | 846000 | 0.03 | 364530 | 0.02 |
French press | 1113250 | 0.04 | 748120 | 0.05 |
Cezve | 2512500 | 0.09 | 1969520 | 0.13 |
Chemex | 3137250 | 0.11 | 817680 | 0.06 |
Scale | 3801000 | 0.13 | 2910290 | 0.2 |
Kettle | 756250 | 0.03 | 617520 | 0.04 |
Espresso Machine | 8406000 | 0.29 | 3636440 | 0.25 |
Total | 29448500 | 1.0 | 14793150 | 1.0 |
Working with the Header
We use tab_header()
to modify the header of our table. We can supply both a title
and subtitle
for our Table.
Here we’re using the pipe |>
to chain our commands together.
Here we chain commands together using .
- we usually use this in Python to select methods.
= GT(coffee) \
gt_tbl_2 ="Sales of Coffee Equipment")
.tab_header(title
gt_tbl_2
Sales of Coffee Equipment | ||||
---|---|---|---|---|
product | revenue_dollars | revenue_pct | profit_dollars | profit_pct |
Grinder | 904500 | 0.03 | 567960 | 0.04 |
Moka pot | 2045250 | 0.07 | 181080 | 0.01 |
Cold brew | 288750 | 0.01 | 241770 | 0.02 |
Filter | 404250 | 0.01 | 70010 | 0.0 |
Drip machine | 2632000 | 0.09 | 1374450 | 0.09 |
AeroPress | 2601500 | 0.09 | 1293780 | 0.09 |
Pour over | 846000 | 0.03 | 364530 | 0.02 |
French press | 1113250 | 0.04 | 748120 | 0.05 |
Cezve | 2512500 | 0.09 | 1969520 | 0.13 |
Chemex | 3137250 | 0.11 | 817680 | 0.06 |
Scale | 3801000 | 0.13 | 2910290 | 0.2 |
Kettle | 756250 | 0.03 | 617520 | 0.04 |
Espresso Machine | 8406000 | 0.29 | 3636440 | 0.25 |
Total | 29448500 | 1.0 | 14793150 | 1.0 |
Selecting Columns
How do we specify columns to modify? We can use column selectors (in Great Tables
in Python) or tidy selectors in R.
= ["profit_dollars", "profit_pct"]
columns = [4, 5]
columns # anonymous function
=lambda x: "profit" in x columns
Much more info about column selectors in Python: https://posit-dev.github.io/great-tables/get-started/column-selection.html
Column Spanners
We can specify a spanner with tab_spanner()
, which takes two main arguments:
label
- what we want the spanner to saycolumns
- which columns we want to apply the spanner to.
= (GT(coffee) \
gt_tbl_3 ="Revenue", columns=["revenue_dollars", "revenue_pct"])
.tab_spanner(label="Profit", columns=["profit_dollars", "profit_pct"]))
.tab_spanner(label gt_tbl_3
product | Revenue | Profit | ||
---|---|---|---|---|
revenue_dollars | revenue_pct | profit_dollars | profit_pct | |
Grinder | 904500 | 0.03 | 567960 | 0.04 |
Moka pot | 2045250 | 0.07 | 181080 | 0.01 |
Cold brew | 288750 | 0.01 | 241770 | 0.02 |
Filter | 404250 | 0.01 | 70010 | 0.0 |
Drip machine | 2632000 | 0.09 | 1374450 | 0.09 |
AeroPress | 2601500 | 0.09 | 1293780 | 0.09 |
Pour over | 846000 | 0.03 | 364530 | 0.02 |
French press | 1113250 | 0.04 | 748120 | 0.05 |
Cezve | 2512500 | 0.09 | 1969520 | 0.13 |
Chemex | 3137250 | 0.11 | 817680 | 0.06 |
Scale | 3801000 | 0.13 | 2910290 | 0.2 |
Kettle | 756250 | 0.03 | 617520 | 0.04 |
Espresso Machine | 8406000 | 0.29 | 3636440 | 0.25 |
Total | 29448500 | 1.0 | 14793150 | 1.0 |
Column Labels
Change Column labels for readability. The column names are: product
, revenue_dollars
, revenue_pct
, profit_dollars
, profit_pct
, and monthly_sales
.
= (
gt_tbl_4
gt_tbl_3
.cols_label(="Product",
product="Amount",
revenue_dollars="Percent",
revenue_pct="Amount",
profit_dollars="Percent"
profit_pct
)
)
gt_tbl_4
Product | Revenue | Profit | ||
---|---|---|---|---|
Amount | Percent | Amount | Percent | |
Grinder | 904500 | 0.03 | 567960 | 0.04 |
Moka pot | 2045250 | 0.07 | 181080 | 0.01 |
Cold brew | 288750 | 0.01 | 241770 | 0.02 |
Filter | 404250 | 0.01 | 70010 | 0.0 |
Drip machine | 2632000 | 0.09 | 1374450 | 0.09 |
AeroPress | 2601500 | 0.09 | 1293780 | 0.09 |
Pour over | 846000 | 0.03 | 364530 | 0.02 |
French press | 1113250 | 0.04 | 748120 | 0.05 |
Cezve | 2512500 | 0.09 | 1969520 | 0.13 |
Chemex | 3137250 | 0.11 | 817680 | 0.06 |
Scale | 3801000 | 0.13 | 2910290 | 0.2 |
Kettle | 756250 | 0.03 | 617520 | 0.04 |
Espresso Machine | 8406000 | 0.29 | 3636440 | 0.25 |
Total | 29448500 | 1.0 | 14793150 | 1.0 |
Formatting the Stub
I have to use a separate table to show these. We have two columns in this dataset: row
, which serve as the row labels, and group
, which serve as the column labels.
from great_tables import data
= data.exibble
exibble
= (GT(exibble)\
out = "row")
.tab_stub(rowname_col
)
out
num | char | fctr | date | time | datetime | currency | group | |
---|---|---|---|---|---|---|---|---|
row_1 | 0.1111 | apricot | one | 2015-01-15 | 13:35 | 2018-01-01 02:22 | 49.95 | grp_a |
row_2 | 2.222 | banana | two | 2015-02-15 | 14:40 | 2018-02-02 14:33 | 17.95 | grp_a |
row_3 | 33.33 | coconut | three | 2015-03-15 | 15:45 | 2018-03-03 03:44 | 1.39 | grp_a |
row_4 | 444.4 | durian | four | 2015-04-15 | 16:50 | 2018-04-04 15:55 | 65100.0 | grp_a |
row_5 | 5550.0 | five | 2015-05-15 | 17:55 | 2018-05-05 04:00 | 1325.81 | grp_b | |
row_6 | fig | six | 2015-06-15 | 2018-06-06 16:11 | 13.255 | grp_b | ||
row_7 | 777000.0 | grapefruit | seven | 19:10 | 2018-07-07 05:22 | grp_b | ||
row_8 | 8880000.0 | honeydew | eight | 2015-08-15 | 20:20 | 0.44 | grp_b |
= (GT(exibble)\
out = "row", groupname_col = "group")
.tab_stub(rowname_col
)
out
num | char | fctr | date | time | datetime | currency | |
---|---|---|---|---|---|---|---|
grp_a | |||||||
row_1 | 0.1111 | apricot | one | 2015-01-15 | 13:35 | 2018-01-01 02:22 | 49.95 |
row_2 | 2.222 | banana | two | 2015-02-15 | 14:40 | 2018-02-02 14:33 | 17.95 |
row_3 | 33.33 | coconut | three | 2015-03-15 | 15:45 | 2018-03-03 03:44 | 1.39 |
row_4 | 444.4 | durian | four | 2015-04-15 | 16:50 | 2018-04-04 15:55 | 65100.0 |
grp_b | |||||||
row_5 | 5550.0 | five | 2015-05-15 | 17:55 | 2018-05-05 04:00 | 1325.81 | |
row_6 | fig | six | 2015-06-15 | 2018-06-06 16:11 | 13.255 | ||
row_7 | 777000.0 | grapefruit | seven | 19:10 | 2018-07-07 05:22 | ||
row_8 | 8880000.0 | honeydew | eight | 2015-08-15 | 20:20 | 0.44 |
We can make the group its own column with row_group_as_column = TRUE
. This does not appear to be the case in Python, however:
= (GT(exibble)\
out = "group")
.tab_stub(groupname_col
)
out
num | char | fctr | date | time | datetime | currency | row |
---|---|---|---|---|---|---|---|
0.1111 | apricot | one | 2015-01-15 | 13:35 | 2018-01-01 02:22 | 49.95 | row_1 |
2.222 | banana | two | 2015-02-15 | 14:40 | 2018-02-02 14:33 | 17.95 | row_2 |
33.33 | coconut | three | 2015-03-15 | 15:45 | 2018-03-03 03:44 | 1.39 | row_3 |
444.4 | durian | four | 2015-04-15 | 16:50 | 2018-04-04 15:55 | 65100.0 | row_4 |
5550.0 | five | 2015-05-15 | 17:55 | 2018-05-05 04:00 | 1325.81 | row_5 | |
fig | six | 2015-06-15 | 2018-06-06 16:11 | 13.255 | row_6 | ||
777000.0 | grapefruit | seven | 19:10 | 2018-07-07 05:22 | row_7 | ||
8880000.0 | honeydew | eight | 2015-08-15 | 20:20 | 0.44 | row_8 |
```