Grammar of Graphics & Plotnine

Authors

Kamble Pushkar Sidharth

Kathan Vishal Shah

Ramji Purwar

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plotnine import *

1 Grammar of Graphics

1.1 What is Grammar of Graphics

Key Concept

The Grammar of Graphics is a systematic approach to creating data visualizations by breaking charts into different components.

These components include:

Data: The dataset being visualized.
Aesthetics: Mapping variables to visual properties.
Geometries: Shapes used to represent data (points, lines, bars, etc.).
Facets: Splitting data into multiple panels.
Statistics: Transformations applied to data before plotting.
Coordinates: The space where the plot is drawn (cartesian, polar, etc.).

1.2 Why is Grammar of Graphics Important?

The Grammar of Graphics allows users to build complex plots easily by layering components instead of hardcoding each visualization.

Let’s see that with an example:

mpg = sns.load_dataset("mpg")
mpg.head()

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320
2	18.0	8	318.0	150.0	3436	11.0	70	usa	plymouth satellite
3	16.0	8	304.0	150.0	3433	12.0	70	usa	amc rebel sst
4	17.0	8	302.0	140.0	3449	10.5	70	usa	ford torino

1.2.0.1 Traditional Matplotlib plot

plt.figure(figsize=(8,5))
plt.scatter(mpg['weight'], mpg['mpg'], color='blue')

plt.xlabel("Car Weight")
plt.ylabel("Miles per Gallon (MPG)")
plt.title("Car Weight vs. MPG")
plt.show()

1.2.0.2 Grammar of Graphics plot with Plotnine

p = (
    ggplot(mpg) +
    aes(x="weight", y="mpg", color="origin") +
    geom_point() +
    labs(x="Car Weight", y="Miles per Gallon (MPG)") +
    theme_minimal()
)

p.draw()
p.show()

Observation

The Grammar of Graphics approach allows us to easily add additional information (car origin) to our plot without significantly changing the code structure.
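For comparison, here is a minimal sketch of one common way to color points by origin in plain Matplotlib: looping over the groups and drawing each one separately. The `cars` DataFrame is a small made-up stand-in for the mpg data, so the sketch runs without downloading anything.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the mpg dataset (values invented for this sketch)
cars = pd.DataFrame({
    "weight": [3504, 3693, 2372, 2130, 1835, 2672],
    "mpg": [18.0, 15.0, 24.0, 27.0, 26.0, 25.0],
    "origin": ["usa", "usa", "japan", "japan", "europe", "europe"],
})

fig, ax = plt.subplots(figsize=(8, 5))

# One scatter call per group; Matplotlib picks a new color for each call
for origin, group in cars.groupby("origin"):
    ax.scatter(group["weight"], group["mpg"], label=origin)

ax.set_xlabel("Car Weight")
ax.set_ylabel("Miles per Gallon (MPG)")
ax.legend(title="origin")
```

In the Plotnine version, the same grouping, coloring, and legend come from the single mapping `color="origin"`.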

2 Plotnine: A Python Implementation of Grammar of Graphics

Plotnine is a Python library based on the Grammar of Graphics, providing a structured and systematic way to create data visualizations.

2.1 Key Features

Grammar of Graphics Approach
Simple and Concise Syntax
Multiple Geometries
Statistical Transformations
Faceting (Subplots)
Themes for Customization
Supports Custom Labels & Titles
Works with Pandas & DataFrames
Exporting & Saving Plots

2.2 Creating Plots with Plotnine

2.2.1 How to Install plotnine?

Open a command prompt or terminal and run:

pip install plotnine

2.2.2 Import and Version check

import plotnine
print("Plotnine version:", plotnine.__version__)

Plotnine version: 0.14.5

2.2.3 Basic Scatter Plot

df = pd.DataFrame({
    'x': range(10),
    'y': [4, 5, 9, 1, 7, 3, 2, 2, 8, 9]
})

p = (
    ggplot(df) +
    aes(x='x', y='y') +
    geom_point()
)

p.draw()
p.show()

2.2.4 Basic Histogram

df_hist = pd.DataFrame({'data': np.random.randn(1000)})

p = (
    ggplot(df_hist) +
    aes(x='data') +
    geom_histogram(binwidth=0.5, fill='lightblue', color='black')
)

p.draw()
p.show()

2.2.5 Simple Line plot

x = np.arange(1, 11)
y = [i*i for i in x]

df = pd.DataFrame({
    'x': x,
    'y': y
})

p = (
    ggplot(df, aes(x='x', y='y')) +
    geom_line(color='blue') +
    labs(title="Line Plot Example", x="X-Axis", y="Y-Axis")
)

p.draw()
p.show()

2.2.6 Simple Bar Plot

df_bar = pd.DataFrame({
    'vegetable': ['Potato', 'Carrot', 'Peas', 'Tomato'],
    'price': [40, 25, 20, 60]
})

p = (
  ggplot(df_bar) +
  aes(x='vegetable', y='price') +
  geom_bar(stat='identity', fill='skyblue')
)

p.draw()
p.show()

3 Demonstration of Plotnine Key Features

In this section, we’ll demonstrate each of the key features of Plotnine, showing how they contribute to creating powerful and flexible visualizations using the Grammar of Graphics approach.

3.1 Grammar of Graphics Approach

The Grammar of Graphics approach allows us to build plots layer by layer. Because Plotnine is built on the principles of the Grammar of Graphics, it makes this approach easy to apply.

Let’s start with a basic scatter plot and then add layers to it.

df = pd.DataFrame({
    'x': range(10),
    'y': np.random.randint(1, 11, 10)
})

p = (
    ggplot(df) +
    aes(x='x', y='y') +
    geom_point()
)

p.draw()
p.show()

3.2 Simple and Concise Syntax

Notice how we can easily add layers to our plot using the + operator. This makes the syntax simple and intuitive.

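# Assumes df has a 'category' column, such as the DataFrame built in section 3.3;
# the df from section 3.1 only has 'x' and 'y'.
p = (ggplot(df, aes(x='x', y='y', color='category'))
     + geom_point()
     + geom_smooth(method='lm', se=False)
     + labs(title="Scatter Plot with Trend Lines",
            x="X-axis",
            y="Y-axis",
            color="Category")
     + theme_minimal()
)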

3.3 Multiple Geometries

plotnine supports various geometries through its geom_* functions, which allow you to create different types of plots. These geometries are based on the Grammar of Graphics concept, similar to ggplot2 in R. Some key geometries include:

geom_point(): Creates scatter plots.
geom_line(): Draws line plots.
geom_bar(): Produces bar charts.
geom_polygon(): Generates polygon shapes.
geom_map(): Specifically designed for plotting geographic data and creating maps.

These geometries can be combined and layered to create complex visualizations. Let’s combine points and fitted trend lines in one plot.

np.random.seed(42)
df = pd.DataFrame({
    'x': np.random.normal(0, 1, 100),
    'y': np.random.normal(0, 1, 100),
    'category': np.random.choice(['A', 'B', 'C'], 100)
})

p = (ggplot(df, aes(x='x', y='y', color='category'))
     + geom_point()
     + geom_smooth(method='lm', se=False)
     + labs(title="Scatter Plot with Trend Lines",
            x="X-axis",
            y="Y-axis",
            color="Category")
     + theme_minimal()
)

p.draw()
p.show()

3.4 Statistical Transformations

Statistical transformations (stats) in Plotnine aggregate or transform your data before it is drawn. A stat computes new values from the input data, so a plot can show summary statistics or derived metrics instead of raw data points. Here are some common transformations:

stat_count(): Counts the number of cases at each x position
stat_bin(): Bins continuous data for histograms
stat_smooth(): Adds a smoothed conditional mean
stat_summary(): Summarizes y values for each unique x value

Here’s an example that uses geom_smooth(method='lm'), which relies on stat_smooth() to fit a linear regression line, with a confidence band, through the points:

x = range(1, 11)
y = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
df = pd.DataFrame({
    'x': x, 
    'y': y
})

p = (
    ggplot(df, aes(x='x', y='y')) +
    geom_point() +
    geom_smooth(method='lm')
) 

p.draw()
p.show()

3.5 Faceting (Subplots)

Faceting in plotnine is a powerful technique that allows you to create multiple subplots based on categorical variables in your dataset. This feature enables you to split your main plot into several smaller plots, each representing a different category or combination of categories.

Types of faceting:

facet_wrap(): Creates a wrapped layout of subplots based on a single categorical variable.
facet_grid(): Forms a grid of subplots based on two categorical variables (rows and columns).

df = pd.DataFrame({
    'x': np.arange(1, 11),
    'y': np.random.randint(1, 26, 10), 
    'category': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'] 
})

p = (
    ggplot(df, aes(x='x', y='y')) +
    geom_point() +
    facet_wrap("~category")
)

p.draw()
p.show()

3.6 Themes for Customization

Plotnine offers a variety of themes for customizing the appearance of your plots.

Built-in themes: Plotnine provides several pre-defined themes, such as theme_minimal(), theme_bw(), and theme_void(), which removes axes, gridlines, and background.
Custom themes: You can create custom themes using the theme() function. This allows you to modify various elements of your plot, including axis titles, legend appearance, plot title, and more.

from plotnine.themes import theme_minimal

p = (
    ggplot(df, aes(x='x', y='y')) +
    geom_point() +
    theme_minimal()
)
 
p.draw()
p.show()

3.7 Supports Custom Labels & Titles

We can easily customize labels and titles for our plots.

p = (
    ggplot(df, aes(x='x', y='y')) +
    geom_point() +
    labs(title="Scatter Plot", x="X-Axis", y="Y-Axis")
)

3.8 Works with Pandas & DataFrames

As you’ve seen in all examples, Plotnine works seamlessly with Pandas DataFrames.

3.9 Exporting & Saving Plots

Plotnine allows you to save plots easily. Here’s how you can save a plot:

p = ggplot(df, aes(x='x', y='y')) + geom_point()
p.save("plot.png")

This demonstration showcases the power and flexibility of Plotnine in implementing the Grammar of Graphics. Each feature contributes to making data visualization more intuitive, customizable, and powerful in Python.

4 Conclusion

Plotnine brings the Grammar of Graphics to Python, offering a clear and flexible way to create data visualizations. It breaks down plots into components like data, aesthetics, and geometries, allowing users to build visualizations layer by layer.

With seamless integration into Pandas and support for statistical transformations, Plotnine is a valuable tool for data scientists. From simple scatter plots to complex faceted charts, it enables customization and clarity in data presentation.

As data visualization remains key to analysis and communication, Plotnine helps create clear, reproducible, and visually appealing charts with ease.


5 Useful Resources