When something goes wrong, the data is already there.
Continuously samples PostgreSQL system state via pg_cron. No external agents, sidecars, or polling required.
postgres=# SELECT component, status, details FROM pgfr_record.health_check();
       component        | status  |             details
------------------------+---------+---------------------------------
 Flight Recorder System | ENABLED | Mode: normal
 Schema Size            | OK      | 186 MB / 1024 MB (18.2%)
 Circuit Breaker        | OK      | 0 trips in last hour
 Sample Collection      | OK      | Last: 00:00:34 ago
 Snapshot Collection    | OK      | Last: 00:00:42 ago
 Data Volume            | INFO    | Samples: 12483, Snapshots: 8762
 pg_stat_statements     | Healthy | Utilization: 28% (1387/5000)
 pg_cron Jobs           | OK      | 4/4 jobs active and running
(8 rows)
Flight Recorder samples PostgreSQL state every minute — wait events, locks, and active sessions in lightweight ring buffers; WAL, I/O, and query stats in durable snapshots — so when an incident happens, the data's already there.
Per-minute sampling into UNLOGGED ring buffers for low overhead. Automatically flushed to durable aggregates and archives.
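The ring-buffer pattern itself is plain PostgreSQL. A minimal sketch (an illustrative schema, not the extension's actual tables): an UNLOGGED table keyed by a modulo slot, so writes skip WAL and the table never grows past a fixed number of rows.

```sql
-- Illustrative only: one slot per minute over a 24-hour window.
CREATE UNLOGGED TABLE wait_sample_ring (
    slot        int PRIMARY KEY,        -- 0 .. 1439
    captured_at timestamptz NOT NULL,
    payload     jsonb NOT NULL
);

-- Each minute's sample overwrites the slot it wraps onto.
INSERT INTO wait_sample_ring (slot, captured_at, payload)
VALUES (
    (extract(epoch FROM now())::bigint / 60) % 1440,
    now(),
    '{}'::jsonb
)
ON CONFLICT (slot) DO UPDATE
SET captured_at = EXCLUDED.captured_at,
    payload     = EXCLUDED.payload;
```

UNLOGGED tables are truncated after a crash, which is why durable aggregates are flushed separately: the ring buffer holds only recent, replaceable detail.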
Captures full system state: WAL activity, checkpoint behavior, buffer I/O, table and index statistics, replication, and configuration.
Pure SQL. No C extensions, no external processes. Just pg_cron and PL/pgSQL capturing every dimension of PostgreSQL performance.
Sampled wait events with type classification. See exactly what backends are waiting on.
Blocked sessions with lock types, durations, and blocker identification.
WAL generation rates, checkpoint frequency and duration, forced checkpoint detection.
Per-backend I/O by checkpointer, autovacuum, clients, and bgwriter with timing.
pg_stat_statements deltas: execution time, rows, buffer hits, temp spills, WAL per query.
Sequential scans, index usage, HOT updates, dead tuples, bloat estimation, unused indexes.
Replica lag (write, flush, replay), LSN positions, sync state per connection.
Configuration snapshot diffs, GUC change detection, per-role overrides, health checks.
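These dimensions map onto PostgreSQL's standard statistics views, which is what makes a pure-SQL sampler possible. For example, the wait-event and replication dimensions above can be queried directly (generic catalog queries, not the extension's internals):

```sql
-- Wait events by type: what a per-minute wait sample aggregates.
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active' AND wait_event IS NOT NULL
GROUP BY 1, 2
ORDER BY count(*) DESC;

-- Per-connection replication lag (write, flush, replay) and sync state.
SELECT application_name, sync_state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
```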
Pre-built functions for every stage of database operations, from daily health checks to post-incident analysis.
One function call tells you whether everything is healthy; the full report summarizes the last hour's activity, anomalies, and trends.
-- Quick health check
SELECT * FROM pgfr_record.health_check();

-- Full diagnostic report
SELECT pgfr_analyze.report('1 hour');

-- Check for alerts
SELECT * FROM pgfr_analyze.check_alerts('1 hour');

-- Preflight check before deployment
SELECT * FROM pgfr_analyze.preflight_check();
Apply the troubleshooting profile to keep collecting under load, then use time-travel functions to reconstruct exactly what happened.
-- Apply troubleshooting profile
SELECT * FROM pgfr_record.apply_profile('troubleshooting');

-- What was happening at a specific time?
SELECT * FROM pgfr_analyze.what_happened_at(
    '2024-01-15 14:32'
);

-- Reconstruct an incident timeline
SELECT * FROM pgfr_analyze.incident_timeline(
    '2024-01-15 14:00'::timestamptz,
    '2024-01-15 15:00'::timestamptz
);

-- Return to normal after incident
SELECT * FROM pgfr_record.apply_profile('default');
Find regressions, query storms, table hotspots, and unused indexes across any time window.
-- Find performance regressions
SELECT * FROM pgfr_analyze.detect_regressions('1 day');

-- Find query storms
SELECT * FROM pgfr_analyze.detect_query_storms('1 hour');

-- Table hotspots
SELECT * FROM pgfr_analyze.table_hotspots(
    now() - '1 day', now()
);

-- Find unused indexes
SELECT * FROM pgfr_analyze.unused_indexes('7 days');
Track growth trends, generate quarterly reviews, and monitor vacuum health for long-term capacity management.
-- Capacity summary with growth trends
SELECT * FROM pgfr_analyze.capacity_summary('7 days');

-- Quarterly review
SELECT * FROM pgfr_analyze.quarterly_review();

-- Capacity dashboard view
SELECT * FROM pgfr_analyze.capacity_dashboard;
Automatic protections ensure Flight Recorder never impacts your production workload. Every collection run is guarded.
Automatically skips collection when recent runs have exceeded the time threshold.
Skips collection when the database is under heavy connection pressure.
Each data collection query has its own timeout to prevent catalog lock hangs.
Outer statement_timeout on pg_cron collector jobs provides a final safety net.
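The outer-timeout pattern is standard pg_cron usage: set a session statement_timeout at the start of the scheduled command so a stuck collector is cancelled rather than left holding locks. A minimal sketch, assuming a hypothetical job name and collector function (the extension schedules its own jobs for you):

```sql
-- Illustrative only: 'my_collector' and the job name are placeholders.
SELECT cron.schedule(
    'pgfr_sample_guard_demo',
    '* * * * *',
    $$SET statement_timeout = '30s'; SELECT my_collector()$$
);
```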
A modular architecture. The core recorder stands alone; add analysis as needed.
Tables, collection functions, scheduling, ring buffers, and safety mechanisms.
Reporting, anomaly detection, time travel, and capacity planning.
Requires PostgreSQL 15+, pg_cron, and superuser privileges. No compilation, no agents, no config files.
Download from GitHub Releases or clone the repo, then run the SQL scripts.
psql --single-transaction -f pgfr_record/install.sql
psql --single-transaction -f pgfr_analyze/install.sql
One function call schedules all pg_cron jobs and starts continuous collection.
SELECT pgfr_record.enable();
SELECT * FROM pgfr_record.health_check();
Your database is now recording. Use the built-in analysis functions or query the tables directly.
SELECT pgfr_analyze.report('1 hour');
SELECT * FROM pgfr_record.snapshots ORDER BY captured_at DESC;