Performance Guide
Performance characteristics and optimization tips for PlotSmith.
Performance Characteristics
PlotSmith is designed for efficiency with vectorized operations where possible. However, performance depends on several factors:
Data Size
- Small datasets (< 1,000 points): Very fast (< 10 ms)
- Medium datasets (1,000 - 10,000 points): Fast (< 100 ms)
- Large datasets (10,000 - 100,000 points): Moderate (< 1 s)
- Very large datasets (> 100,000 points): Consider sampling
Chart Type
- Simple plots (line, scatter): Fastest
- Statistical plots (box, violin): Moderate (requires computation)
- Heatmaps: Moderate (depends on matrix size)
- Complex charts (waterfall, waffle): Slower (more computation)
Optimization Tips
1. Sample Large Datasets
For very large datasets, consider sampling:
import pandas as pd
from plotsmith import plot_timeseries
# Original large dataset
large_data = pd.Series(...) # 1M+ points
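# Sample for plotting; sort_index() restores the original order,
# which random sampling would otherwise scramble
if len(large_data) > 10000:
    sample_data = large_data.sample(10000).sort_index()
    fig, ax = plot_timeseries(sample_data)
else:
    fig, ax = plot_timeseries(large_data)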
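Random sampling can miss short-lived features in ordered data. For a time series, taking every n-th point is one alternative. The sketch below uses only pandas and NumPy on synthetic data; `downsample_evenly` and `max_points` are hypothetical names for illustration, not part of PlotSmith.

```python
import numpy as np
import pandas as pd


def downsample_evenly(series: pd.Series, max_points: int = 10_000) -> pd.Series:
    """Return at most max_points evenly spaced values, preserving order."""
    if len(series) <= max_points:
        return series
    # Ceiling division so the result never exceeds max_points.
    step = -(-len(series) // max_points)
    return series.iloc[::step]


# Synthetic stand-in for a large dataset (one value per minute).
index = pd.date_range("2024-01-01", periods=1_000_000, freq="min")
large_data = pd.Series(np.arange(1_000_000, dtype=float), index=index)

sample_data = downsample_evenly(large_data)
```

The result keeps its original index order, so it can be passed to a plotting function without re-sorting.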
2. Use Appropriate Backends
For non-interactive use (CI, scripts), use the Agg backend:
import matplotlib
matplotlib.use('Agg') # Non-interactive, faster; call before importing matplotlib.pyplot
from plotsmith import plot_timeseries
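As a fuller illustration, the following sketch selects Agg and renders a figure to an in-memory PNG. It uses plain matplotlib rather than a PlotSmith function, so it runs without any assumptions about PlotSmith's API.

```python
import io

import matplotlib

matplotlib.use("Agg")  # select the backend before pyplot is imported

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Render to an in-memory PNG; no display or GUI toolkit is needed.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

png_bytes = buffer.getvalue()
```

The backend can also be chosen without code changes by setting the `MPLBACKEND` environment variable to `Agg`.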
3. Close Figures Explicitly
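Figures created through pyplot stay in memory until they are closed, so close each one when you are done with it:
import matplotlib.pyplot as plt
fig, ax = plot_timeseries(data)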
# ... use plot ...
plt.close(fig) # Free memory
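If plotting or saving raises an exception, a trailing `plt.close(fig)` is never reached. One way to guard against that is a `try`/`finally` block, sketched here with plain matplotlib; `save_plot` is a hypothetical helper, not a PlotSmith function.

```python
import io

import matplotlib

matplotlib.use("Agg")

import matplotlib.pyplot as plt


def save_plot(values, target):
    """Plot values, save to target (path or file-like), and release the figure."""
    fig, ax = plt.subplots()
    try:
        ax.plot(values)
        fig.savefig(target)
    finally:
        plt.close(fig)  # runs even if plotting or saving fails


# Creating several plots in a loop leaves no figures open afterwards.
for offset in range(3):
    save_plot([offset, offset + 1, offset + 2], io.BytesIO())
```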
4. Batch Operations
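When creating multiple related plots, draw them on one figure with several axes instead of creating a separate figure for each:
import matplotlib.pyplot as plt
from plotsmith import figure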
fig, axes = figure(nrows=2, ncols=2)
# Plot to each axis
plt.close(fig)
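The "plot to each axis" step might look like the sketch below. It uses `plt.subplots` from plain matplotlib as a stand-in, on the assumption that a grid of axes behaves the same way regardless of which helper created it; the curves are synthetic.

```python
import matplotlib

matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
curves = {
    "sin": np.sin(x),
    "cos": np.cos(x),
    "sin squared": np.sin(x) ** 2,
    "cos squared": np.cos(x) ** 2,
}

# One figure with four axes instead of four separate figures.
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True)
for ax, (name, y) in zip(axes.flat, curves.items()):
    ax.plot(x, y)
    ax.set_title(name)

fig.tight_layout()
n_axes = len(fig.axes)
plt.close(fig)  # a single close releases all four plots
```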
5. Avoid Redundant Computations
Cache expensive computations:
from plotsmith import plot_heatmap  # assumed importable like plot_timeseries
# Compute once
correlation_matrix = df.corr()
# Plot multiple times
fig1, ax1 = plot_heatmap(correlation_matrix)
fig2, ax2 = plot_heatmap(correlation_matrix, cmap='viridis')
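When the same computation is requested from several places, keeping a variable around is not always convenient. A minimal sketch using the standard library's `functools.lru_cache` follows; `expensive_summary` is a hypothetical stand-in for a costly computation.

```python
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=8)
def expensive_summary(values: tuple) -> float:
    """Stand-in for a costly computation; arguments must be hashable."""
    global call_count
    call_count += 1
    return sum(v * v for v in values) / len(values)


data = tuple(range(1, 1001))
first = expensive_summary(data)   # computed
second = expensive_summary(data)  # served from the cache
```

Note that `lru_cache` requires hashable arguments. A pandas DataFrame is not hashable, so it would need to be converted to a hashable key (or cached by name) first.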
Memory Usage
PlotSmith uses memory efficiently, but large plots can consume significant memory:
Base memory: ~10-50 MB
Per plot: ~1-5 MB (depends on data size)
Large plots: Up to 100+ MB for very large datasets
To reduce memory usage:
Close figures when done
Sample large datasets
Use vectorized operations
Avoid storing many figure objects
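The figures above are rough estimates. To measure a specific workload, the standard library's `tracemalloc` is one option, sketched below. `measure_peak_bytes` and `build_points` are hypothetical names, and the workload is a stand-in for a function that creates and closes a plot.

```python
import tracemalloc


def measure_peak_bytes(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_points(n):
    # Stand-in workload; replace with a function that creates and closes a plot.
    return [(i, i * 0.5) for i in range(n)]


points, peak = measure_peak_bytes(build_points, 100_000)
print(f"peak: {peak / 1e6:.1f} MB")
```

`tracemalloc` only sees allocations made through Python's memory allocator, so memory held by native rendering code may be undercounted.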
Benchmarking
PlotSmith includes performance benchmarks. Run them with:
pytest tests/test_performance.py --benchmark-only
This will show timing information for various operations.
Performance Regression Testing
The CI pipeline includes performance benchmarks. If performance degrades significantly, the benchmarks will fail.
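For local performance testing, compare against a previously saved benchmark run (for example, one recorded with --benchmark-autosave):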
pytest tests/test_performance.py -m slow --benchmark-compare
Known Performance Considerations
Waterfall Charts: Require cumulative calculations - slower for many categories
Waffle Charts: Require grid computation - slower for large grids
Violin Plots: Require kernel density estimation - slower for large datasets
Heatmaps: Annotation slows down large matrices significantly
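For heatmaps, one mitigation is to annotate cells only when the matrix is small. The sketch below shows the idea with plain matplotlib; `heatmap_with_optional_labels` and the `max_annotated_cells` threshold are hypothetical, not PlotSmith parameters.

```python
import matplotlib

matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np


def heatmap_with_optional_labels(matrix, max_annotated_cells=400):
    """Draw a heatmap; write cell values only when the matrix is small."""
    fig, ax = plt.subplots()
    image = ax.imshow(matrix, cmap="viridis")
    fig.colorbar(image, ax=ax)
    if matrix.size <= max_annotated_cells:
        # One Text artist per cell: cost grows with rows * columns.
        for (row, col), value in np.ndenumerate(matrix):
            ax.text(col, row, f"{value:.2f}", ha="center", va="center", fontsize=6)
    return fig, ax


rng = np.random.default_rng(0)
small_fig, small_ax = heatmap_with_optional_labels(rng.random((10, 10)))
large_fig, large_ax = heatmap_with_optional_labels(rng.random((200, 200)))
n_small_labels = len(small_ax.texts)
n_large_labels = len(large_ax.texts)
plt.close(small_fig)
plt.close(large_fig)
```

The 10 x 10 matrix gets a label per cell, while the 200 x 200 matrix is drawn without labels and relies on the colorbar instead.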
For production use with very large datasets, consider:
- Data preprocessing and sampling
- Asynchronous plotting
- Caching plot results
- Using specialized visualization libraries for extreme scales