When something goes wrong, the data is already there.
Continuously samples PostgreSQL system state via pg_cron. No external agents, sidecars, or polling required.
postgres=# SELECT component, status, details FROM pgfr_record.health_check();
       component        | status  |             details
------------------------+---------+---------------------------------
 Flight Recorder System | ENABLED | Mode: normal
 Schema Size            | OK      | 186 MB / 1024 MB (18.2%)
 Circuit Breaker        | OK      | 0 trips in last hour
 Sample Collection      | OK      | Last: 00:00:34 ago
 Snapshot Collection    | OK      | Last: 00:00:42 ago
 Data Volume            | INFO    | Samples: 12483, Snapshots: 8762
 pg_stat_statements     | Healthy | Utilization: 28% (1387/5000)
 pg_cron Jobs           | OK      | 4/4 jobs active and running
(8 rows)
Flight Recorder samples PostgreSQL state every minute — wait events, locks, and active sessions in lightweight ring buffers; WAL, I/O, and query stats in durable snapshots — so when an incident happens, the data's already there.
Per-minute sampling into UNLOGGED ring buffers for low overhead. Automatically flushed to durable aggregates and archives.
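The ring-buffer pattern itself is plain PostgreSQL. A minimal sketch (an illustrative schema, not the extension's actual tables): an UNLOGGED table keyed by a modulo slot, so writes skip WAL and the table never grows past a fixed number of rows.

```sql
-- Illustrative only: one slot per minute over a 24-hour window.
CREATE UNLOGGED TABLE wait_sample_ring (
    slot        int PRIMARY KEY,        -- 0 .. 1439
    captured_at timestamptz NOT NULL,
    payload     jsonb NOT NULL
);

-- Each minute's sample overwrites the slot it wraps onto.
INSERT INTO wait_sample_ring (slot, captured_at, payload)
VALUES (
    (extract(epoch FROM now())::bigint / 60) % 1440,
    now(),
    '{}'::jsonb
)
ON CONFLICT (slot) DO UPDATE
SET captured_at = EXCLUDED.captured_at,
    payload     = EXCLUDED.payload;
```

UNLOGGED tables are truncated after a crash, which is why durable aggregates are flushed separately: the ring buffer holds only recent, replaceable detail.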
Captures full system state: WAL activity, checkpoint behavior, buffer I/O, table and index statistics, replication, and configuration.
Pure SQL. No C extensions, no external processes. Just pg_cron and PL/pgSQL capturing every dimension of PostgreSQL performance.
Sampled wait events with type classification. See exactly what backends are waiting on.
Blocked sessions with lock types, durations, and blocker identification.
WAL generation rates, checkpoint frequency and duration, forced checkpoint detection.
Per-backend I/O by checkpointer, autovacuum, clients, and bgwriter with timing.
pg_stat_statements deltas: execution time, rows, buffer hits, temp spills, WAL per query.
Sequential scans, index usage, HOT updates, dead tuples, bloat estimation, unused indexes.
Replica lag (write, flush, replay), LSN positions, sync state per connection.
Configuration snapshot diffs, GUC change detection, per-role overrides, health checks.
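These dimensions map onto PostgreSQL's standard statistics views, which is what makes a pure-SQL sampler possible. For example, the wait-event and replication dimensions above can be queried directly (generic catalog queries, not the extension's internals):

```sql
-- Wait events by type: what a per-minute wait sample aggregates.
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active' AND wait_event IS NOT NULL
GROUP BY 1, 2
ORDER BY count(*) DESC;

-- Per-connection replication lag (write, flush, replay) and sync state.
SELECT application_name, sync_state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
```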
Pre-built functions for every stage of database operations, from daily health checks to post-incident analysis.
One function call tells you whether everything is healthy; the full report summarizes the last hour's activity, anomalies, and trends.
-- Quick health check
SELECT * FROM pgfr_record.health_check();

-- Full diagnostic report
SELECT pgfr_analyze.report('1 hour');

-- Check for alerts
SELECT * FROM pgfr_analyze.check_alerts('1 hour');

-- Preflight check before deployment
SELECT * FROM pgfr_analyze.preflight_check();
Apply the troubleshooting profile to keep collecting under load, then use time-travel functions to reconstruct exactly what happened.
-- Apply troubleshooting profile
SELECT * FROM pgfr_record.apply_profile('troubleshooting');

-- What was happening at a specific time?
SELECT * FROM pgfr_analyze.what_happened_at(
    '2024-01-15 14:32'
);

-- Reconstruct an incident timeline
SELECT * FROM pgfr_analyze.incident_timeline(
    '2024-01-15 14:00'::timestamptz,
    '2024-01-15 15:00'::timestamptz
);

-- Return to normal after incident
SELECT * FROM pgfr_record.apply_profile('default');
Find regressions, query storms, table hotspots, and unused indexes across any time window.
-- Find performance regressions
SELECT * FROM pgfr_analyze.detect_regressions('1 day');

-- Find query storms
SELECT * FROM pgfr_analyze.detect_query_storms('1 hour');

-- Table hotspots
SELECT * FROM pgfr_analyze.table_hotspots(
    now() - '1 day', now()
);

-- Find unused indexes
SELECT * FROM pgfr_analyze.unused_indexes('7 days');
Track growth trends, generate quarterly reviews, and monitor vacuum health for long-term capacity management.
-- Capacity summary with growth trends
SELECT * FROM pgfr_analyze.capacity_summary('7 days');

-- Quarterly review
SELECT * FROM pgfr_analyze.quarterly_review();

-- Capacity dashboard view
SELECT * FROM pgfr_analyze.capacity_dashboard;
Automatic protections ensure Flight Recorder never impacts your production workload. Every collection run is guarded.
Automatically skips collection when recent runs have exceeded the time threshold.
Skips collection when the database is under heavy connection pressure.
Each data collection query has its own timeout to prevent catalog lock hangs.
Outer statement_timeout on pg_cron collector jobs provides a final safety net.
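The outer-timeout pattern is standard pg_cron usage: set a session statement_timeout at the start of the scheduled command so a stuck collector is cancelled rather than left holding locks. A minimal sketch, assuming a hypothetical job name and collector function (the extension schedules its own jobs for you):

```sql
-- Illustrative only: 'my_collector' and the job name are placeholders.
SELECT cron.schedule(
    'pgfr_sample_guard_demo',
    '* * * * *',
    $$SET statement_timeout = '30s'; SELECT my_collector()$$
);
```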
A modular architecture. The core recorder stands alone; add analysis as needed.
Tables, collection functions, scheduling, ring buffers, and safety mechanisms.
Reporting, anomaly detection, time travel, and capacity planning.
Requires PostgreSQL 15+, pg_cron, and superuser privileges. No compilation, no agents, no config files.
Download from GitHub Releases or clone the repo, then run the SQL scripts.
psql --single-transaction -f pgfr_record/install.sql
psql --single-transaction -f pgfr_analyze/install.sql
One function call schedules all pg_cron jobs and starts continuous collection.
SELECT pgfr_record.enable();
SELECT * FROM pgfr_record.health_check();
Your database is now recording. Use the built-in analysis functions or query the tables directly.
SELECT pgfr_analyze.report('1 hour');
SELECT * FROM pgfr_record.snapshots ORDER BY captured_at DESC;