Performance Issue: N+1 Query Pattern in /api/v1/cognify endpoint
Environment
- Cognee version: main (Docker image cognee/cognee:main)
- Database: PostgreSQL with pgvector
- Python: 3.12.13
Problem
The /api/v1/cognify endpoint triggers slow database queries due to an N+1 query pattern: each data item triggers its own query for its datasets, instead of all items being loaded in one batch.
Slow Query (from Sentry traces)
SELECT data_1.id AS data_1_id,
datasets.id AS datasets_id,
datasets.name AS datasets_name,
datasets.created_at AS datasets_created_at,
datasets.updated_at AS datasets_updated_at,
datasets.owner_id AS datasets_owner_id,
datasets.tenant_id AS datasets_tenant_id
FROM data AS data_1
JOIN dataset_data AS dataset_data_1 ON data_1.id = dataset_data_1.data_id
JOIN datasets ON datasets.id = dataset_data_1.dataset_id
WHERE data_1.id IN ($1::UUID)
Impact
- 281 occurrences over 5 days
- Query executed per individual UUID instead of batched
- Causes noticeable latency on cognify operations
Expected Behavior
The query should batch multiple UUIDs: WHERE data_1.id IN ($1::UUID, $2::UUID, $3::UUID, ...)
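To illustrate the difference between the two query shapes, here is a minimal sketch using Python's built-in sqlite3 module in place of PostgreSQL. The simplified schema, the sample rows, and the helpers fetch_one_by_one and fetch_batched are assumptions made for illustration and are not taken from the Cognee codebase.

```python
import sqlite3
import uuid

# In-memory stand-in for the real database (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id TEXT PRIMARY KEY);
    CREATE TABLE datasets (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE dataset_data (data_id TEXT, dataset_id TEXT);
""")

# Seed three data items that all belong to one dataset.
data_ids = [str(uuid.uuid4()) for _ in range(3)]
dataset_id = str(uuid.uuid4())
conn.execute("INSERT INTO datasets VALUES (?, ?)", (dataset_id, "demo"))
for data_id in data_ids:
    conn.execute("INSERT INTO data VALUES (?)", (data_id,))
    conn.execute("INSERT INTO dataset_data VALUES (?, ?)", (data_id, dataset_id))

# Same join as the traced query, with the IN list left as a template.
QUERY = """
    SELECT data_1.id, datasets.id, datasets.name
    FROM data AS data_1
    JOIN dataset_data AS dataset_data_1 ON data_1.id = dataset_data_1.data_id
    JOIN datasets ON datasets.id = dataset_data_1.dataset_id
    WHERE data_1.id IN ({placeholders})
"""


def fetch_one_by_one(ids):
    """N+1 shape: one statement per id, as seen in the traces."""
    rows, statements = [], 0
    for data_id in ids:
        rows += conn.execute(QUERY.format(placeholders="?"), (data_id,)).fetchall()
        statements += 1
    return rows, statements


def fetch_batched(ids):
    """Batched shape: a single statement with one placeholder per id."""
    if not ids:
        return [], 0  # avoid emitting an empty IN () list
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(QUERY.format(placeholders=placeholders), list(ids)).fetchall()
    return rows, 1
```

Both helpers return the same rows; only the number of statements sent to the database differs.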
Suggested Fix
Use SQLAlchemy selectinload() or joinedload() instead of lazy loading when fetching data-dataset relationships.
Workaround
Add a composite index to improve JOIN performance:
CREATE INDEX CONCURRENTLY idx_dataset_data_composite
ON dataset_data(data_id, dataset_id);