Performance Guide
=================

Performance characteristics and optimization tips for PlotSmith.

Performance Characteristics
---------------------------

PlotSmith is designed for efficiency and uses vectorized operations where possible. Actual performance depends on several factors:

**Data Size**

- Small datasets (< 1,000 points): Very fast (< 10ms)
- Medium datasets (1,000 - 10,000 points): Fast (< 100ms)
- Large datasets (10,000 - 100,000 points): Moderate (< 1s)
- Very large datasets (> 100,000 points): Consider sampling

**Chart Type**

- Simple plots (line, scatter): Fastest
- Statistical plots (box, violin): Moderate (requires computation)
- Heatmaps: Moderate (depends on matrix size)
- Complex charts (waterfall, waffle): Slower (more computation)

Optimization Tips
-----------------

**1. Sample Large Datasets**

For very large datasets, consider plotting a sample instead of every point:

.. code-block:: python

   import pandas as pd
   from plotsmith import plot_timeseries

   # Original large dataset
   large_data = pd.Series(...)  # 1M+ points

   # Sample for plotting
   if len(large_data) > 10000:
       # Random sampling shuffles the rows, so restore the index order
       sample_data = large_data.sample(10000).sort_index()
       fig, ax = plot_timeseries(sample_data)
   else:
       fig, ax = plot_timeseries(large_data)

**2. Use Appropriate Backends**

For non-interactive use (CI, scripts), use the Agg backend:

.. code-block:: python

   import matplotlib
   matplotlib.use('Agg')  # Non-interactive, faster

   from plotsmith import plot_timeseries

**3. Close Figures Explicitly**

Always close figures once you are done with them to free memory:

.. code-block:: python

   import matplotlib.pyplot as plt
   from plotsmith import plot_timeseries

   fig, ax = plot_timeseries(data)
   # ... use plot ...
   plt.close(fig)  # Free memory

**4. Batch Operations**

When creating multiple plots, reuse a single figure with several axes instead of creating one figure per plot:

.. code-block:: python

   import matplotlib.pyplot as plt
   from plotsmith import figure

   fig, axes = figure(nrows=2, ncols=2)
   # Plot to each axis
   plt.close(fig)

**5. Avoid Redundant Computations**

Cache expensive computations and reuse the result:

.. code-block:: python

   from plotsmith import plot_heatmap

   # Compute once
   correlation_matrix = df.corr()

   # Plot multiple times
   fig1, ax1 = plot_heatmap(correlation_matrix)
   fig2, ax2 = plot_heatmap(correlation_matrix, cmap='viridis')

Memory Usage
------------

PlotSmith uses memory efficiently, but large plots can still consume significant memory:

- **Base memory**: ~10-50 MB
- **Per plot**: ~1-5 MB (depends on data size)
- **Large plots**: Up to 100+ MB for very large datasets

To reduce memory usage:

1. Close figures when done
2. Sample large datasets
3. Use vectorized operations
4. Avoid storing many figure objects

Benchmarking
------------

PlotSmith includes performance benchmarks. Run them with:

.. code-block:: bash

   pytest tests/test_performance.py --benchmark-only

This prints timing information for the benchmarked operations.

Performance Regression Testing
-------------------------------

The CI pipeline runs the performance benchmarks. If performance degrades significantly, the benchmarks fail.

For local performance testing:

.. code-block:: bash

   pytest tests/test_performance.py -m slow --benchmark-compare

Known Performance Considerations
---------------------------------

1. **Waterfall Charts**: Require cumulative calculations, so they are slower for many categories
2. **Waffle Charts**: Require grid computation, so they are slower for large grids
3. **Violin Plots**: Require kernel density estimation, so they are slower for large datasets
4. **Heatmaps**: Annotating cells slows down large matrices significantly

For production use with very large datasets, consider:

- Data preprocessing and sampling
- Asynchronous plotting
- Caching plot results
- Using specialized visualization libraries for extreme scales
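
The preprocessing-and-sampling suggestion can be sketched as a small helper that keeps evenly spaced points, which preserves the order of a time series (unlike random sampling). This is only an illustration under stated assumptions: ``downsample_evenly`` and its ``max_points`` parameter are hypothetical names, not part of the PlotSmith API, and the input is assumed to be a pandas ``Series``.

.. code-block:: python

   import numpy as np
   import pandas as pd


   def downsample_evenly(series: pd.Series, max_points: int = 10_000) -> pd.Series:
       """Return at most ``max_points`` evenly spaced values, keeping order.

       Hypothetical helper for illustration; not part of PlotSmith.
       """
       if max_points < 1:
           raise ValueError("max_points must be at least 1")
       if len(series) <= max_points:
           # Already small enough: return the input unchanged.
           return series
       # Evenly spaced integer positions from the first to the last element.
       positions = np.linspace(0, len(series) - 1, num=max_points).round().astype(int)
       return series.iloc[positions]


   # Synthetic random walk standing in for a large dataset.
   rng = np.random.default_rng(0)
   large_data = pd.Series(rng.normal(size=1_000_000).cumsum())

   plot_data = downsample_evenly(large_data)
   print(len(large_data), "->", len(plot_data))

The reduced series can then be passed to a plotting function in place of the full dataset. Even spacing can hide short spikes between the kept points, so a min/max aggregation per bucket may suit spiky data better.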