Grammar of Graphics & Plotnine

Authors

Kamble Pushkar Sidharth

Kathan Vishal Shah

Ramji Purwar

Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plotnine import *

1 Grammar of Graphics

1.1 What is Grammar of Graphics

Key Concept

The Grammar of Graphics is a systematic approach to creating data visualizations by breaking charts into different components.

The Grammar of Graphics breaks a chart into different components such as:

  • Data: The dataset being visualized.
  • Aesthetics: Mapping variables to visual properties.
  • Geometries: Shapes used to represent data (points, lines, bars, etc.).
  • Facets: Splitting data into multiple panels.
  • Statistics: Transformations applied to data before plotting.
  • Coordinates: The space where the plot is drawn (cartesian, polar, etc.).

Components of Grammar of Graphics

1.2 Why is Grammar of Graphics Important?

The Grammar of Graphics allows users to build complex plots easily by layering components instead of hardcoding each visualization.

Let’s see that with an example:

Code
mpg = sns.load_dataset("mpg")
mpg.head()
mpg cylinders displacement horsepower weight acceleration model_year origin name
0 18.0 8 307.0 130.0 3504 12.0 70 usa chevrolet chevelle malibu
1 15.0 8 350.0 165.0 3693 11.5 70 usa buick skylark 320
2 18.0 8 318.0 150.0 3436 11.0 70 usa plymouth satellite
3 16.0 8 304.0 150.0 3433 12.0 70 usa amc rebel sst
4 17.0 8 302.0 140.0 3449 10.5 70 usa ford torino

1.2.0.1 Traditional Matplotlib plot

Code
plt.figure(figsize=(8,5))
plt.scatter(mpg['weight'], mpg['mpg'], color='blue')

plt.xlabel("Car Weight")
plt.ylabel("Miles per Gallon (MPG)")
plt.title("Car Weight vs. MPG")
plt.show()

1.2.0.2 Grammar of Graphics plot with Plotnine

Code
p = (
    ggplot(mpg) +
    aes(x="weight", y="mpg", color="origin") +
    geom_point() +
    labs(x="Car Weight", y="Miles per Gallon (MPG)") +
    theme_minimal()
)

p.draw()
p.show()

Observation

The Grammar of Graphics approach allows us to easily add additional information (car origin) to our plot without significantly changing the code structure.

2 Plotnine: A Python Implementation of Grammar of Graphics

Plotnine is a Python library based on the Grammar of Graphics, providing a structured and systematic way to create data visualizations.

2.1 Key Features

  1. Grammar of Graphics Approach
  2. Simple and Concise Syntax
  3. Multiple Geometries
  4. Statistical Transformations
  5. Faceting (Subplots)
  6. Themes for Customization
  7. Supports Custom Labels & Titles
  8. Works with Pandas & DataFrames
  9. Exporting & Saving Plots

2.2 Creating Plots with Plotnine

2.2.1 How to Install plotnine?

Open command prompt or teminal and give this command:

pip install plotnine

2.2.2 Import and Version check

Code
import plotnine
print("Plotnine version:", plotnine.__version__)
Plotnine version: 0.14.5

2.2.3 Basic Scatter Plot

Code
df = pd.DataFrame({
    'x': range(10),
    'y': [4, 5, 9, 1, 7, 3, 2, 2, 8, 9]
})

p = (
    ggplot(df) +
    aes(x='x', y='y') +
    geom_point()
)

p.draw()
p.show()

2.2.4 Basic Histogram

Code
df_hist = pd.DataFrame({'data': np.random.randn(1000)})

p = (
    ggplot(df_hist) +
    aes(x='data') +
    geom_histogram(binwidth=0.5, fill='lightblue', color='black')
)

p.draw()
p.show()

2.2.5 Simple Line plot

Code
x = np.arange(1, 11)
y = [i*i for i in x]

df = pd.DataFrame({
'x': x,
'y': y
})

p = (
    ggplot(df, aes(x='x', y='y')) +
    geom_line(color='blue') +
    labs(title="Line Plot Example", x="X-Axis", y="Y-Axis")
)

p.draw()
p.show()

2.2.6 Simple Bar Plot

Code
df_bar = pd.DataFrame({
  'vegetable': ['Potata', 'Carrot', 'Peas', 'Tomata'], 
  'price': [40, 25, 20, 60]}
)

p = (
  ggplot(df_bar) +
  aes(x='vegetable', y='price') +
  geom_bar(stat='identity', fill='skyblue')
)

p.draw()
p.show()

3 Demonstration of Plotnine Key Features

In this section, we’ll demonstrate each of the key features of Plotnine, showing how they contribute to creating powerful and flexible visualizations using the Grammar of Graphics approach.

3.1 Grammar of Graphics Approach

The Grammar of Graphics approach allows us to build plots layer by layer. And as plotnine is build on the principles of Grammer of Graphics, it is easier to appply it with help of plotnine.

Let’s start with a basic scatter plot and then add layers to it.

Code
df = pd.DataFrame({
    'x': range(10),
    'y': np.random.randint(1, 11, 10)
})

p = (
    ggplot(df) +
    aes(x='x', y='y') +
    geom_point()
)

p.draw()
p.show()

3.2 Simple and Concise Syntax

Notice how we can easily add layers to our plot using the + operator. This makes the syntax simple and intuitive.

p = (ggplot(df, aes(x='x', y='y', color='category'))
     + geom_point()
     + geom_smooth(method='lm', se=False)
     + labs(title="Scatter Plot with Trend Lines",
            x="X-axis",
            y="Y-axis",
            color="Category")
     + theme_minimal()
)

3.3 Multiple Geometries

plotnine supports various geometries through its geom_* functions, which allow you to create different types of plots. These geometries are based on the Grammar of Graphics concept, similar to ggplot2 in R. Some key geometries include:

  • geom_point(): Creates scatter plots1.
  • geom_line(): Draws line plots1.
  • geom_bar(): Produces bar charts1.
  • geom_polygon(): Generates polygon shapes2.
  • geom_map(): Specifically designed for plotting geographic data and creating maps57.

These geometries can be combined and layered to create complex visualizations. Let’s combine points and lines in one plot.

Code
np.random.seed(42)
df = pd.DataFrame({
    'x': np.random.normal(0, 1, 100),
    'y': np.random.normal(0, 1, 100),
    'category': np.random.choice(['A', 'B', 'C'], 100)
})

p = (ggplot(df, aes(x='x', y='y', color='category'))
     + geom_point()
     + geom_smooth(method='lm', se=False)
     + labs(title="Scatter Plot with Trend Lines",
            x="X-axis",
            y="Y-axis",
            color="Category")
     + theme_minimal()
)

p.draw()
p.show()

3.4 Statistical Transformations

Statistical transformations in plotnine are an important feature that allow you to aggregate and transform your data before plotting. Statistical transformations can compute new values based on the input data, enabling you to display summary statistics or derived metrics instead of raw data points. Here are some common transformations:

  • stat_count(): Counts the number of cases at each x position
  • stat_bin(): Bins continuous data for histograms
  • stat_smooth(): Adds a smoothed conditional mean
  • stat_summary(): Summarizes y values for each unique x value

By leveraging statistical transformations, plotnine enables you to create informative visualizations that go beyond simply plotting raw data, allowing for more insightful data exploration and presentation. Here’s an example…

Code
x = range(1, 11)
y = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
df = pd.DataFrame({
    'x': x, 
    'y': y
})

p = (
    ggplot(df, aes(x='x', y='y')) +
    geom_point() +
    geom_smooth(method='lm')
) 

p.draw()
p.show()

3.5 Faceting (Subplots)

Faceting in plotnine is a powerful technique that allows you to create multiple subplots based on categorical variables in your dataset. This feature enables you to split your main plot into several smaller plots, each representing a different category or combination of categories.

Types of faceting:

  • facet_wrap(): Creates a wrapped layout of subplots based on a single categorical variable.
  • facet_grid(): Forms a grid of subplots based on two categorical variables (rows and columns).
Code
df = pd.DataFrame({
    'x': np.arange(1, 11),
    'y': np.random.randint(1, 26, 10), 
    'category': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'] 
})

p = (
    ggplot(df, aes(x='x', y='y')) +
    geom_point() +
    facet_wrap("~category")
)

p.draw()
p.show()

3.6 Themes for Customization

Plotnine offers a variety of themes for customizing the appearance of your plots.

  • Built-in themes: Plotnine provides several pre-defined themes, such as theme_void(), which creates a minimal base theme1.
  • Custom themes: You can create custom themes using the theme() function. This allows you to modify various elements of your plot, including axis titles, legend appearance, plot title, and more.
Code
from plotnine.themes import theme_minimal

p = (
    ggplot(df, aes(x='x', y='y')) +
    geom_point() +
    theme_minimal()
)
 
p.draw()
p.show()

3.7 Supports Custom Labels & Titles

We can easily customize labels and titles for our plots.

ggplot(df, aes(x='x', y='y')) + 
geom_point() + 
labs(title = "Scatter Plot", x = "X-Axis", y = "Y-Axis")

3.8 Works with Pandas & DataFrames

As you’ve seen in all examples, Plotnine works seamlessly with Pandas DataFrames.

3.9 Exporting & Saving Plots

Plotnine allows you to save plots easily. Here’s how you can save a plot:

p = ggplot(df, aes(x='x', y='y')) + geom_point()
p.save("plot.png")

This demonstration showcases the power and flexibility of Plotnine in implementing the Grammar of Graphics. Each feature contributes to making data visualization more intuitive, customizable, and powerful in Python.

4 Conclusion

Plotnine brings the Grammar of Graphics to Python, offering a clear and flexible way to create data visualizations. It breaks down plots into components like data, aesthetics, and geometries, allowing users to build visualizations layer by layer.

With seamless integration into Pandas and support for statistical transformations, Plotnine is a valuable tool for data scientists. From simple scatter plots to complex faceted charts, it enables customization and clarity in data presentation.

As data visualization remains key to analysis and communication, Plotnine helps create clear, reproducible, and visually appealing charts with ease.

5 Useful Resources