Description
Please complete the following information:
- OS: Debian GNU/Linux 12 (tested in containerized environments and bare metal)
- Kernel: 6.12.51-ts1-amd64
- libpq: 16.0.7 (runtime - verified via PQlibVersion())
- Psycopg version: 2.9.10
- Python version: 3.11.11
- PostgreSQL version: 16.7
- pip version: 24.0
- Authentication: Kerberos/GSSAPI
- Encryption: SSL
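(An aside on version reporting, since it affects the numbers below: since PostgreSQL 10, `PQlibVersion()` encodes the version as `major*10000 + minor`, so a value like `160007` decodes to 16.7, not "16.0.7". A minimal decoding sketch; the helper name is ours:)

```python
def decode_pq_version(v: int) -> str:
    """Decode a PQlibVersion() value (PostgreSQL 10+): major*10000 + minor."""
    major, minor = divmod(v, 10000)
    return f"{major}.{minor}"

print(decode_pq_version(160007))  # -> 16.7
print(decode_pq_version(130011))  # -> 13.11
```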
Describe the bug
psycopg2 2.9.10 exhibits a memory leak of ~800 KB/min when using libpq 16.x with repeated connection creation/destruction. The leak does NOT occur with libpq 13.x or when using psycopg3.
This appears to be psycopg2-specific:
Tom Lane (PostgreSQL core maintainer) tested pure C code using libpq 16.x with the same connection pattern and observed no leak. This indicates the issue is in how psycopg2 uses libpq 16.x, not in libpq itself.
(Reference: PostgreSQL bug report #19411)
We tested:
- Pure psycopg2 (no SQLAlchemy) with libpq 16.x: ~800 KB/min leak
- psycopg3 with libpq 15.x: only ~29 KB/min (about 28x lower)
Here's our minimal reproducer:
```python
#!/usr/bin/env python3
"""
Reproduces ~800 KB/min memory leak in psycopg2 2.9.10 with libpq 16.x
MUST monitor with 'ps aux', NOT Python's resource module!
"""
import ctypes
import datetime
import os
import subprocess
from time import sleep

import psycopg2

print(f"psycopg2: {psycopg2.__version__}")
print(f"PID: {os.getpid()}")
print("Monitor with: watch -n 10 'ps -p <PID> -o pid,rss,vsz,cmd'")
print()

# Verify runtime libpq version (adjust the .so filename for your Python build)
so_path = os.path.dirname(psycopg2.__file__) + "/_psycopg.cpython-311-x86_64-linux-gnu.so"
result = subprocess.run(["ldd", so_path], capture_output=True, text=True)
for line in result.stdout.split('\n'):
    if 'libpq.so.5 =>' in line:
        libpq_path = line.split('=>')[1].split('(')[0].strip()
        libpq = ctypes.CDLL(libpq_path)
        PQlibVersion = libpq.PQlibVersion
        PQlibVersion.restype = ctypes.c_int
        runtime_ver = PQlibVersion()
        # Note: since PostgreSQL 10 the encoding is major*10000 + minor,
        # so 160007 means libpq 16.7 (printed here in 3-part form as "16.0.7")
        major = runtime_ver // 10000
        minor = (runtime_ver % 10000) // 100
        patch = runtime_ver % 100
        print(f"libpq runtime: {major}.{minor}.{patch}")
        if major < 16:
            print("WARNING: This reproducer requires libpq 16.x to demonstrate the leak")
        break

# Connection string - adjust for your database
conn_string = "host=your-host dbname=your-database"

# Create test table
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS leak_test (
        id SERIAL PRIMARY KEY,
        updated_at TIMESTAMP WITH TIME ZONE
    )
""")
cursor.execute("INSERT INTO leak_test (updated_at) VALUES (NOW()) ON CONFLICT DO NOTHING")
conn.commit()
conn.close()

# Monitor RSS using ps aux (Python's resource module won't show the leak!)
def get_rss_kb():
    """Get RSS from ps aux - DO NOT use resource.getrusage()!"""
    pid = os.getpid()
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True)
    for line in result.stdout.split('\n'):
        fields = line.split()
        # Match the PID column exactly; RSS is the 6th column of ps aux output
        if len(fields) > 5 and fields[1] == str(pid):
            return int(fields[5])
    return None

print("Starting 10-minute test...")
print(f"{'Time(s)':<8} | {'Iteration':<10} | {'RSS(KB)':<10} | {'Growth(KB)':<10}")
print("-" * 55)

start_time = datetime.datetime.now()
baseline_rss = None
for i in range(1, 601):
    # Create connection, execute query, commit, close.
    # This pattern mimics SQLAlchemy NullPool or any connection-per-request pattern.
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    cursor.execute("UPDATE leak_test SET updated_at = %s WHERE id = 1",
                   (datetime.datetime.now(datetime.timezone.utc),))
    conn.commit()
    cursor.close()
    conn.close()
    if i % 30 == 0:
        rss = get_rss_kb()
        if rss:
            if i == 30:
                baseline_rss = rss
                growth = 0
            else:
                growth = rss - baseline_rss
            elapsed = (datetime.datetime.now() - start_time).total_seconds()
            print(f"{int(elapsed):<8} | {i:<10} | {rss:<10} | {growth:<10}", flush=True)
    sleep(1)

final_rss = get_rss_kb()
total_elapsed = (datetime.datetime.now() - start_time).total_seconds()
growth_total = final_rss - baseline_rss
leak_rate = growth_total / (total_elapsed / 60)
print()
print("=" * 60)
print(f"Baseline RSS: {baseline_rss} KB")
print(f"Final RSS:    {final_rss} KB")
print(f"Total Growth: {growth_total} KB")
print(f"Leak Rate:    {leak_rate:.1f} KB/min")
print("=" * 60)
```
Results with libpq 16.x (runtime):
- Baseline: ~22,000 KB
- Final: ~32,000 KB
- Growth: ~10,000 KB in 10 minutes
- Leak rate: ~800 KB/min
- Pattern: linear, continuous growth

Results with libpq 13.x (runtime):
- Baseline: ~29,000 KB
- Final: ~29,200 KB
- Growth: ~200 KB (stabilizes)
- Leak rate: ~17 KB/min
- Pattern: small initial growth, then flat
We conducted tests with different configurations:
| libpq (runtime) | SQLAlchemy | Connection config | Leak rate (KB/min) |
|-----------------|------------|-------------------|--------------------|
| 16.0.7 | Yes (NullPool) | Default | 803 |
| 16.0.7 | No | Default | 801 |
| 16.0.7 | No | gssencmode=disable | 858 |
| 16.0.7 | No | sslmode=require gssencmode=disable | 861 |
| 16.0.4 | Yes (NullPool) | Default | 797 |
| 16.0.4 | Yes (NullPool) | Default (PG 16.7 server) | 806 |
| 13.0.11 | Yes (NullPool) | Default | 14 |
| 13.0.11 | No | Default | 17 |
Conclusion: The leak is consistent at ~800 KB/min with libpq 16.x regardless of configuration.
Comparison with psycopg3 3.2.9 (with libpq 15.0.15):
- Same connection pattern (create/execute/commit/close in loop)
- Leak rate: 29 KB/min (28x better)
- Pattern: Stabilizes after initial growth
This strongly suggests the issue is specific to psycopg2's interaction with libpq 16.x.
Note for monitoring purposes:
The leak is NOT visible using Python's resource module!
This indicates the leak is in memory allocated by C code (libpq via psycopg2) that Python's resource module doesn't track.
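(One way to confirm this from inside the process: compare `tracemalloc`, which tracks only Python-level allocations, against the process RSS read from `/proc/self/status`, which includes memory allocated by C code. A Linux-only sketch; the helper names are ours:)

```python
import tracemalloc

def python_heap_kb():
    """Memory currently tracked by tracemalloc (Python-level allocations only)."""
    current, _peak = tracemalloc.get_traced_memory()
    return current // 1024

def process_rss_kb():
    """Resident set size from /proc/self/status (Linux); includes C-heap memory."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is reported in kB
    return None

tracemalloc.start()
print(f"python heap: {python_heap_kb()} KB, process RSS: {process_rss_kb()} KB")
```

In a leaking run, the RSS figure grows while the tracemalloc figure stays flat.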
Workarounds:
- Use psycopg3
- Use libpq 13.x
- Reuse connections (i.e., use connection pooling)
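(The connection-reuse workaround helps because it amortizes whatever per-connection state is leaking. The reuse idea, independent of any database, can be sketched as below; psycopg2 itself ships this pattern as `psycopg2.pool.SimpleConnectionPool`. The `TinyPool` class and `factory` parameter here are illustrative, not part of any library:)

```python
import queue

class TinyPool:
    """Minimal connection pool: create up to maxconn connections, then reuse them."""
    def __init__(self, factory, maxconn=5):
        self._factory = factory   # callable that creates a new connection
        self._idle = queue.Queue()
        self._created = 0
        self._maxconn = maxconn

    def getconn(self):
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            if self._created < self._maxconn:
                self._created += 1
                return self._factory()
            return self._idle.get()  # block until a connection is returned

    def putconn(self, conn):
        self._idle.put(conn)

# With serial use, only one "connection" is ever created and it is reused:
pool = TinyPool(factory=lambda: object(), maxconn=5)
c1 = pool.getconn(); pool.putconn(c1)
c2 = pool.getconn(); pool.putconn(c2)
assert c1 is c2  # reused, not re-created
```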
Questions:
- What changed in psycopg2's usage of libpq that could cause memory accumulation with libpq 16.x?
- Does psycopg2 2.9.10 properly free all libpq resources on connection close when using libpq 16.x?
- Prepared statements: Does psycopg2 handle libpq 16.x's prepared statement caching differently?
- Why doesn't psycopg3 leak? psycopg3 with libpq 15.x shows minimal leak. What's different in psycopg3's design?
- Is this a known issue?
- Is this already fixed in a newer psycopg2 version? (2.9.10 is latest we could test)
- Should we report to libpq? Tom Lane (PostgreSQL maintainer) tested pure C and saw no leak, suggesting it's psycopg2-specific. Do you agree?
All tests use this pattern:
```python
for i in range(600):  # 10 minutes
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    cursor.execute("UPDATE table SET column = %s WHERE id = 1", (value,))
    conn.commit()
    cursor.close()
    conn.close()
    sleep(1)
```
We monitored externally via ps aux RSS column.
Example output with libpq 16.0.7:

```
Time | RSS (KB) | Growth
36s  | 21,912   | baseline
150s | 23,552   | +1,640 KB
301s | 25,664   | +3,752 KB
452s | 27,780   | +5,868 KB
603s | 29,860   | +7,948 KB
755s | 31,956   | +10,044 KB
```

Leak rate: 801.4 KB/min
Pattern: Linear, continuous, no stabilization
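(The linearity claim can be quantified by fitting a least-squares slope to the (elapsed seconds, growth KB) samples above. A small sketch; the helper name `leak_rate_kb_per_min` is ours, and the fitted rate lands in the same ballpark as the ~800 KB/min reported:)

```python
def leak_rate_kb_per_min(samples):
    """Least-squares slope of RSS growth vs. elapsed time (KB/s), scaled to KB/min."""
    n = len(samples)
    sx = sum(t for t, _ in samples)
    sy = sum(g for _, g in samples)
    sxx = sum(t * t for t, _ in samples)
    sxy = sum(t * g for t, g in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope * 60

# Samples taken from the run above
samples = [(36, 0), (150, 1640), (301, 3752), (452, 5868), (603, 7948), (755, 10044)]
print(f"{leak_rate_kb_per_min(samples):.0f} KB/min")
```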
We tested multiple connection configurations with libpq 16.x to isolate the issue:
- Default connection (SSL negotiated): 801 KB/min
- gssencmode=disable (no GSSAPI encryption): 858 KB/min
- sslmode=require gssencmode=disable (SSL only): 861 KB/min
All leak similarly, ruling out encryption as the specific cause.
Given that PostgreSQL maintainer Tom Lane tested a simple C program with libpq 16.x running a tight loop of connect, BEGIN/UPDATE/COMMIT, disconnect and observed zero leakage, the issue appears to lie not in libpq itself but in how psycopg2 uses it.
Let me know if I can provide any other debugging information.
Thank you for maintaining psycopg2!