Performance: N+1 query pattern in /api/v1/cognify causes slow DB queries #2532

@DrifterOne

Performance Issue: N+1 Query Pattern in /api/v1/cognify endpoint

Environment

  • Cognee version: main (Docker image cognee/cognee:main)
  • Database: PostgreSQL with pgvector
  • Python: 3.12.13

Problem

The /api/v1/cognify endpoint issues slow database queries because of an N+1 query pattern: each data item triggers its own query instead of being loaded in a single batch.

Slow Query (from Sentry traces)

SELECT data_1.id AS data_1_id, 
       datasets.id AS datasets_id, 
       datasets.name AS datasets_name, 
       datasets.created_at AS datasets_created_at, 
       datasets.updated_at AS datasets_updated_at, 
       datasets.owner_id AS datasets_owner_id, 
       datasets.tenant_id AS datasets_tenant_id 
FROM data AS data_1 
JOIN dataset_data AS dataset_data_1 ON data_1.id = dataset_data_1.data_id 
JOIN datasets ON datasets.id = dataset_data_1.dataset_id 
WHERE data_1.id IN ($1::UUID)

Impact

  • 281 occurrences over 5 days
  • Query is executed once per UUID instead of once per batch
  • Causes noticeable latency on cognify operations

Expected Behavior

The query should batch multiple UUIDs: WHERE data_1.id IN ($1::UUID, $2::UUID, $3::UUID, ...)
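To make the difference concrete, here is a minimal sketch of the two query shapes. It uses an in-memory SQLite database from the Python standard library as a stand-in for PostgreSQL, with a cut-down version of the data / dataset_data / datasets tables. The helper names (make_db, fetch_per_item, fetch_batched, count_statements) are made up for illustration and are not part of cognee.

```python
import sqlite3
import uuid


def make_db(n_items=3):
    """Build an in-memory stand-in for the data / dataset_data / datasets tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE datasets (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE data (id TEXT PRIMARY KEY);
        CREATE TABLE dataset_data (data_id TEXT, dataset_id TEXT);
        """
    )
    dataset_id = str(uuid.uuid4())
    conn.execute("INSERT INTO datasets VALUES (?, ?)", (dataset_id, "demo"))
    data_ids = [str(uuid.uuid4()) for _ in range(n_items)]
    for data_id in data_ids:
        conn.execute("INSERT INTO data VALUES (?)", (data_id,))
        conn.execute("INSERT INTO dataset_data VALUES (?, ?)", (data_id, dataset_id))
    conn.commit()
    return conn, data_ids


# Same joins as the slow query, with a variable number of placeholders.
QUERY = """
    SELECT data_1.id, datasets.id, datasets.name
    FROM data AS data_1
    JOIN dataset_data AS dataset_data_1 ON data_1.id = dataset_data_1.data_id
    JOIN datasets ON datasets.id = dataset_data_1.dataset_id
    WHERE data_1.id IN ({placeholders})
"""


def fetch_per_item(conn, data_ids):
    """N+1 shape: one query per id, i.e. WHERE data_1.id IN (?)."""
    rows = []
    for data_id in data_ids:
        rows += conn.execute(QUERY.format(placeholders="?"), (data_id,)).fetchall()
    return rows


def fetch_batched(conn, data_ids):
    """Batched shape: one query for all ids, i.e. WHERE data_1.id IN (?, ?, ...)."""
    if not data_ids:
        return []
    placeholders = ", ".join("?" for _ in data_ids)
    return conn.execute(QUERY.format(placeholders=placeholders), list(data_ids)).fetchall()


def count_statements(conn, fn, data_ids):
    """Run fn and count how many SELECT statements SQLite executed."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        rows = fn(conn, data_ids)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in seen if s.lstrip().upper().startswith("SELECT")]
    return rows, len(selects)
```

Both functions return the same rows; the per-item version issues one SELECT per UUID, while the batched version issues a single SELECT regardless of how many UUIDs are passed.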

Suggested Fix

Use SQLAlchemy selectinload() or joinedload() instead of lazy loading when fetching data-dataset relationships.

Workaround

Add a composite index to improve JOIN performance. This lowers the cost of each individual query but does not remove the per-item round trips:

CREATE INDEX CONCURRENTLY idx_dataset_data_composite 
ON dataset_data(data_id, dataset_id);
