The challenge of removing hard dependencies
Removing hard dependencies on DataFrame libraries is worthwhile, but requires special handling for all DataFrame-specific actions. To illustrate, consider the Great Tables output below, which is produced from a Pandas DataFrame:
import pandas as pd
import polars as pl
from great_tables import GT
GT(df_pandas)
Getting column names
The code below shows the different methods required to get column names as a list from Pandas and Polars.
Notice that the two lines of code aren’t too different—Pandas just requires an extra .tolist() piece. We could create a special function that returns a list of names depending on the type of the input DataFrame.
def get_column_names(data) -> list[str]:
    # pandas specific ----
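One sketch of how such a helper could avoid importing either library is to look at the module of the input's type. This is illustrative only, not Great Tables' actual approach (which, as described below, uses databackend and singledispatch):

```python
def get_column_names(data) -> list[str]:
    mod = type(data).__module__

    # pandas specific ----
    if mod.startswith("pandas"):
        # pandas .columns is an Index, so convert it to a list
        return data.columns.tolist()

    # polars specific ----
    if mod.startswith("polars"):
        # polars .columns is already a list of strings
        return data.columns

    raise TypeError(f"Unsupported DataFrame type: {type(data)}")
```

The key property: neither pandas nor polars is imported, so the helper works even when only one of them is installed.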
Inverting dependency with databackend
Inverting dependency on DataFrame libraries means that we check whether something is a specific type of DataFrame, without using imports. This is done through the package databackend, which we copied into Great Tables.
It works by creating placeholder classes, which stand in for the DataFrames they’re detecting:
from great_tables._databackend import AbstractBackend
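databackend's actual implementation differs, but the underlying idea can be sketched with an abstract class whose subclass hook looks up the real class in sys.modules, so isinstance() succeeds only if the caller has already imported pandas (class name here is illustrative):

```python
import sys
from abc import ABC


class AbstractPdFrame(ABC):
    """Placeholder that stands in for pandas.DataFrame without importing pandas."""

    @classmethod
    def __subclasshook__(cls, subclass):
        # Only match when pandas has already been imported by the caller;
        # otherwise nothing can be a pandas DataFrame anyway.
        pd = sys.modules.get("pandas")
        if pd is None:
            return False
        return issubclass(subclass, pd.DataFrame)
```

With this in place, `isinstance(obj, AbstractPdFrame)` detects pandas DataFrames with no hard pandas dependency.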
Separating concerns with singledispatch
While databackend removes dependencies, the use of singledispatch from the built-in functools module separates the logic for handling Polars DataFrames from the logic for handling Pandas DataFrames. This makes it easier to reason about one DataFrame library at a time, and also gets us better type hinting.
Here’s a basic example, showing the get_column_names() function re-written using singledispatch:
from functools import singledispatch
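Filling that in, here is a self-contained sketch. Toy stand-in classes are used in place of the databackend placeholders (the real Great Tables code registers on those placeholders, and real pandas columns need .tolist()):

```python
from functools import singledispatch


class _Index:
    """Mimics a pandas Index: needs .tolist() to become a plain list."""
    def __init__(self, names):
        self._names = list(names)

    def tolist(self):
        return list(self._names)


# Toy stand-ins for the placeholder classes Great Tables defines via databackend
class PdDataFrame:
    def __init__(self, columns):
        self.columns = _Index(columns)


class PlDataFrame:
    def __init__(self, columns):
        self.columns = list(columns)


@singledispatch
def get_column_names(data) -> list[str]:
    raise TypeError(f"Unsupported type: {type(data)}")


@get_column_names.register
def _(data: PdDataFrame) -> list[str]:
    # pandas specific ----
    return data.columns.tolist()


@get_column_names.register
def _(data: PlDataFrame) -> list[str]:
    # polars specific ----
    return data.columns
```

Each registered implementation only has to think about one library's API, and unsupported inputs fail loudly in the base function.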
The use of PdDataFrame is what signifies “run this for Pandas DataFrames”.
With the get_column_names implementations defined, we can call it like a normal function:
get_column_names(df_pandas)  # pandas version
get_column_names(df_polars)  # polars version
Let’s look at an example of a simple table with actual data to tie this theory to practice.
Table Footer: a place for additional information pertaining to the table content
Here’s a table that takes advantage of the different components available in Great Tables. It contains the names and addresses of people.
a compact integer value (fmt_integer()): 134K
The problem grows worse when values need to be conveyed as images or plots. If you’re a medical analyst, for example, you might need to effectively convey whether test results for a patient are improving or worsening over time. Reading such data as a sequence of numbers across a row can slow interpretation. But by using nanoplots, available as the fmt_nanoplot() formatting method, readers can spot trends right away. Here’s an example that provides test results over a series of days.
Using fmt_flag() to incorporate country flag icons
When tables contain country-level data, having a more visual representation for a country can help the reader more quickly parse the table contents. The new fmt_flag() method makes this easy to accomplish. You just need to have either two-letter country codes or three-letter country codes in a column.
Here’s an example where country flags, shown as simplified circular icons, can be added to a table with fmt_flag():
This slice of the peeps dataset has country codes in their 3-letter form (i.e., "USA", "SVN", and "CAN") within the country column. So long as they are correct, fmt_flag() will perform the conversion to flag icons. Also, there’s a little bit of interactivity here: when hovering over a flag, the country name will appear as a tooltip!
We have the power to display multiple flag icons within a single cell. To make this happen, the country codes need to be combined in a single string where each code is separated by a comma (e.g., "US,DE,GB"). Here’s an example that uses a portion of the films dataset:
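The table itself is elided here, but building the combined code string that fmt_flag() expects for multi-flag cells is plain string work (the rows below are illustrative, not the films dataset):

```python
# Combine per-row country codes into a single comma-separated string per cell
rows = [["US", "DE", "GB"], ["USA"], ["SVN", "CAN"]]
countries = [",".join(codes) for codes in rows]
```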
The new fmt_icon() method gives you the ability to easily include Font Awesome icons in a table. It uses a similar input/output scheme to fmt_flag(): provide the short icon name (e.g., "table", "music", "globe", etc.) or a comma-separated list of them, and fmt_icon() will insert the corresponding icon in place. Let’s see it in action with an example that uses the metro dataset:
Here’s a comprehensive example table that demonstrates how this type of formatting (e.g., with fmt_percent()) looks.
People doing analytics in public transit are active in developing open data standards (like GTFS, GTFS-RT, and TIDES). These open data sources are complex—they cover schedules that change from week to week, buses moving in realtime, and passenger events. As people like me work more and more on open source tools, we start to lose touch with data analysis in realistic, complex settings. Working on open source transit data is an opportunity for me to ensure my open source tooling work helps people solve real, complex problems.
An inspiration for this angle is the book R for Data Science, which uses realistic datasets—like NYC flights data—to teach data analysis using an ecosystem of packages called the Tidyverse. The Tidyverse packages have dozens of example datasets, and I think this focus on working through examples is part of what made their design so great.
A few years ago, I worked with the Cal-ITP project to build out a warehouse for their GTFS schedule and realtime data. This left a profound impression on me: transit data is perfect for educating on data analyses in R and Python, as well as analytics engineering with tools like dbt or sqlmesh. Many analysts in public transit are querying warehouses, which opens up interesting use-cases with tools like dbplyr (in R) and ibis (in Python).
If you’ve seen the Great Tables documentation for GT.fmt_image(), then you’ve basked in this beautiful example from our Paris metro dataset.
Collaboration
I’m interested in understanding the major challenges that analytics teams working on public transit face, and the kinds of strategic and tooling support they’d benefit from most. If you’re working on analytics in public transit, I would love to hear about what you’re working on and the tools you use most.
One topic I’ve discussed with a few agencies is ghost buses, which is when a bus is scheduled but never shows up. This is an interesting analysis because it combines GTFS schedule data with GTFS-RT realtime bus data.
Another is passenger events (e.g., people tapping on or off a bus). This data is challenging because different vendors record and deliver it in different ways, which can make it hard for analysts across agencies to discuss analyses—every analysis is different in its own way.
Finding the Right Case for Your Needs
Preparations
For this demonstration, we’ll use the first five rows of the built-in metro dataset, specifically the name and lines columns.
To ensure a smooth walkthrough, we’ll manipulate the data (a Python dictionary) directly. In real-world applications, however, such operations would more likely be performed at the DataFrame level to leverage the benefits of vectorized operations.
Case 1: Local File Paths
Case 1 demonstrates how to simulate a column containing strings representing local file paths. We’ll use images stored in the data/metro_images directory of Great Tables:
Local file paths can vary depending on the operating system, which makes it easy to accidentally construct invalid paths. A good practice to mitigate this is to use Python’s built-in pathlib module to construct paths first and then convert them to strings. In this example, img_local_paths is actually an instance of pathlib.Path.
from pathlib import Path
isinstance(img_local_paths, Path)  # True
The case1 column is quite lengthy due to the inclusion of img_local_paths. In Case 3, we’ll share a useful trick to avoid repeating the directory name each time—stay tuned!
For now, let’s use GT.fmt_image() to render images by passing "case1" as the first argument:
Case 2 demonstrates how to simulate a column containing strings representing HTTP/HTTPS URLs. We’ll use the same images as in Case 1, but this time, retrieve them from the Great Tables GitHub repository:
Case 3 demonstrates how to use the path= argument to specify images relative to a base directory or URL. This approach eliminates much of the repetition in file names, offering a solution to the issues in Case 1 and Case 2.
Below is a Pandas DataFrame called metro_mini3, where the case3 column contains file names that we aim to render as images.
Now we can use GT.fmt_image() to render the images by passing "case3" as the first argument and specifying either img_local_paths or img_url_paths as the path= argument:
Case 4: Image Names Using Both the path= and file_pattern= Arguments
Case 4 demonstrates how to use path= and file_pattern= to specify images with names following a common pattern. For example, you could use file_pattern="metro_{}.svg" to reference images like metro_1.svg, metro_2.svg, and so on.
Below is a Pandas DataFrame called metro_mini4, where the case4 column contains a copy of data["lines"], which we aim to render as images.
First, define a string pattern to illustrate the file naming convention, using {} to indicate the variable portion:
file_pattern = "metro_{}.svg"
Next, pass "case4" as the first argument, along with img_local_paths or img_url_paths as the path= argument, and file_pattern as the file_pattern= argument. This allows GT.fmt_image() to render the images:
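Conceptually, the two arguments combine like this. The function below is an illustrative sketch of the resolution logic, not Great Tables' actual implementation:

```python
def resolve_image(value: str, path: str = "", file_pattern: str = "{}") -> str:
    # file_pattern fills in the variable part of the file name;
    # path supplies the shared directory or URL prefix.
    name = file_pattern.format(value)
    return f"{path.rstrip('/')}/{name}" if path else name


resolve_image("1", path="data/metro_images", file_pattern="metro_{}.svg")
```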
Using file_pattern= Independently
The file_pattern= argument is typically used in conjunction with the path= argument, but this is not a strict rule. If your local file paths or HTTP/HTTPS URLs follow a pattern, you can use file_pattern= alone without path=. This allows you to include the shared portion of the file paths or URLs directly in file_pattern, as shown below:
Remember, you can always use html() to manually construct your desired output. For example, the previous table can be created without relying on vals.fmt_image().
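The core of that manual approach is assembling <img> tags as strings and wrapping them with html() so they render rather than escape. A sketch of the string-building part (the helper name and dimensions are illustrative):

```python
def img_tag(url: str, height: int = 30) -> str:
    # The resulting string would be wrapped with great_tables' html()
    # so the table renders it as markup instead of escaping it.
    return f'<img src="{url}" style="height: {height}px;">'


img_tag("https://example.com/metro_1.svg")
```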
Generating a LaTeX table with Great Tables
We can use the GT.as_latex() method to generate LaTeX table code. This code includes important structural pieces like titles, spanners, and value formatting. For example, here’s a simple table output as LaTeX code:
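The generated code is elided here, but its general shape is a standard tabular wrapped with the table's structural pieces. An illustrative (not verbatim) example of what such output can look like:

```latex
% Illustrative shape only -- not exact as_latex() output
\begin{table}[!t]
\caption*{Table title}
\begin{tabular}{lrr}
\toprule
 & \multicolumn{2}{c}{Spanner label} \\
\cmidrule(lr){2-3}
name & num & currency \\
\midrule
apricot & 0.111 & \$49.95 \\
banana  & 2.222 & \$17.95 \\
\bottomrule
\end{tabular}
\end{table}
```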
Current limitations of LaTeX table output
The as_latex() method is still experimental and has some limitations. The following table lists the work epics that have been done and those planned:
Starting things off with a big GT table
The table we’ll make uses the nuclides dataset (available in the great_tables.data module). Through use of the tab_*() methods, quite a few table components (hence locations) will be added. We have hidden the code here because it is quite lengthy but you’re encouraged to check it out to glean some interesting GT tricks.
Make the values in the atomic_mass and half_life columns use a monospace font.
Give isotopes with STABLE half-lives a PaleTurquoise background fill.
Aside from decking out the loc module with all manner of location methods, we’ve added a little something to the style module: style.css()! What’s it for? It lets you supply style declarations to its single rule= argument.
As an example, I might want to indent some text in one or more table cells. You can’t really do that with the style.text() method since it doesn’t have an indent= argument. So, in Great Tables 0.13.0 you can manually indent the row label text for the ‘STABLE’ rows using a CSS style rule:
The combined location helpers: loc.column_header() and loc.footer()
Look, I know we brought up the expression fine-grained before—right in the first paragraph—but sometimes you need just the opposite. There are lots of little locations in a GT table and some make for logical groupings. To that end, we have the concept of combined location helpers.
Let’s set a grey background fill on the stubhead, column header, and footer:
Although it really doesn’t appear to have separate locations, the table header (produced by way of tab_header()) can have two of them: the title and the subtitle (the latter is optional). These can be targeted via loc.title() and loc.subtitle(). Let’s focus on the title location and set an aliceblue background fill on it, along with some font and border adjustments.
Looks good. Notice that the title location is separate from the subtitle one: the background fill reveals the extent of its area.
A subtitle is an optional part of the header. We do have one in our table example, so let’s style that as well. The style.css() method will be used to give the subtitle text some additional top and bottom padding, and we’ll put in a fancy background involving a linear gradient.
None of what was done above could be done prior to v0.13.0. The style.css() method makes this all possible.
The combined location helper for the title and the subtitle locations is loc.header(). As mentioned before, it can be used as a shorthand for locations=[loc.title(), loc.subtitle()] and it’s useful here, where we want to change the font for the title and subtitle text.
When it comes to styling, you can use tab_options() for some of the basics and use tab_style() for the more demanding styling tasks. And you could combine the usage of both in your table. Let’s set a default honeydew background fill on the body values:
Looking good! And we don’t have to apply the font to the entire table. We might want to use a Google Font in just the table body. For that use case, tab_style() is the preferred method. Here’s an example that uses the IBM Plex Mono typeface.