Vector Databases

A vector DB stores embeddings and supports approximate nearest-neighbor (ANN) search. The space is fragmented; which DB you choose matters less than picking a serviceable one and writing portable code around it.

What a vector DB does

Three core operations:

  • Index vectors with metadata.
  • Query for nearest neighbors of a given vector.
  • Filter by metadata (e.g. only docs from 2025+).

Most also support: hybrid search (dense + sparse), full-text search, multi-tenancy, replication, snapshots.
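
A minimal sketch of the three core operations using ChromaDB's embedded Python client (the collection name, toy 4-dimensional vectors, and metadata fields are illustrative):

import chromadb

client = chromadb.Client()  # in-process, ephemeral client; fine for local experiments
coll = client.create_collection("docs")

# 1. Index vectors with metadata.
coll.add(
    ids=["a", "b"],
    embeddings=[[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]],
    metadatas=[{"year": 2025, "type": "faq"}, {"year": 2023, "type": "blog"}],
)

# 2. Query for nearest neighbors of a given vector.
hits = coll.query(query_embeddings=[[0.1, 0.2, 0.3, 0.35]], n_results=2)

# 3. Filter by metadata (only docs from 2025+).
recent = coll.query(
    query_embeddings=[[0.1, 0.2, 0.3, 0.35]],
    n_results=2,
    where={"year": {"$gte": 2025}},
)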

ANN algorithms underneath

  • HNSW (Hierarchical Navigable Small World): graph-based. Default in most modern DBs. Fast, accurate, memory-hungry.
  • IVF (Inverted File Index): partition the vector space into Voronoi cells; search only the relevant cells. Good for datasets larger than memory.
  • PQ (Product Quantization): compress vectors into small codes. Used to reduce memory.
  • DiskANN: disk-resident graph index, designed for billion-scale on a single node.

In practice, most DBs use HNSW or HNSW + PQ. For most apps the algorithm is invisible; just tune the M, ef_construction, and ef_search parameters when needed (see Index parameter tuning below).

The landscape (early 2026)

Postgres extensions

pgvector

  • Postgres extension. Default choice for most teams.
  • Supports HNSW and IVFFlat indexes.
  • Limitations: ~2,000-dimension cap on indexed vectors, single-node, slower than dedicated DBs at very large scale.
  • Wins because: same database as your app data, familiar tooling, transactions, joins.
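
If you go this route, the basics look like ordinary SQL. A sketch using psycopg and the pgvector Python helper; the connection string, table layout, and 1536 dimensions are assumptions, not requirements:

import numpy as np
import psycopg
from pgvector.psycopg import register_vector
from psycopg.types.json import Json

conn = psycopg.connect("postgresql://localhost/mydb", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # teaches psycopg to send/receive the vector type

conn.execute(
    "CREATE TABLE IF NOT EXISTS items (id text PRIMARY KEY, embedding vector(1536), meta jsonb)"
)

# Embeddings live next to ordinary relational data, so joins and transactions just work.
embedding = np.random.rand(1536).astype(np.float32)
conn.execute(
    "INSERT INTO items (id, embedding, meta) VALUES (%s, %s, %s) ON CONFLICT (id) DO NOTHING",
    ("doc-1", embedding, Json({"year": 2025})),
)

# <=> is cosine distance; <-> is L2 distance, <#> is negative inner product.
rows = conn.execute(
    "SELECT id, meta FROM items ORDER BY embedding <=> %s LIMIT 10",
    (embedding,),
).fetchall()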

pgvectorscale (Timescale)

  • Builds on pgvector with optimized index structures, better performance.

Dedicated vector DBs

Qdrant

  • Rust, self-hostable or managed.
  • Strong filtering, payload schemas, hybrid search.
  • Solid choice for self-hosted production.

Weaviate

  • Multimodal-first; good at combining text + image search.
  • GraphQL API; modular ML modules.

Milvus / Zilliz

  • Distributed, very large-scale (billion-vector).
  • More operational overhead than the simpler options.

LanceDB

  • Embedded (file-based), like SQLite for vectors.
  • Lightweight; good for prototypes and edge.

ChromaDB

  • Embedded, very simple Python API.
  • Great for hackathons, getting started.
  • Less battle-tested in production.

Managed services

Pinecone

  • Managed-only, easy. Good if you want zero infra.
  • Pricing scales with vector count and queries; can get expensive.

MongoDB Atlas Vector Search

  • Vector search on top of MongoDB. Integrated with the broader Atlas stack.

Elasticsearch / OpenSearch

  • Mature search engines with added vector capabilities. Strong hybrid search.

Vespa

  • Yahoo-origin; battle-tested at huge scale; complex to operate.

“I just want it to work”

For most teams, in 2026:

  1. Already on Postgres: use pgvector. Done.
  2. Want managed, simple: Pinecone or LanceDB Cloud.
  3. Need self-hosted, not on Postgres: Qdrant.
  4. Hackathon/prototype: ChromaDB or LanceDB.
  5. Billion-scale or multi-modal critical: Milvus, Vespa, or Weaviate depending on tradeoffs.

Decision factors

Scale

  • < 1M vectors: anything works.
  • 1M–100M: most DBs fine; configure HNSW carefully.
  • 100M–1B: dedicated DBs (Milvus, Vespa, managed Pinecone), or pgvector with sharding/replication.
  • > 1B: only a few real options; expect serious infra investment.

Filtering

If you need rich metadata filtering (“docs from 2025, owned by user X, type=invoice, embedding cluster Y”), check that the DB supports it efficiently. Some DBs filter post-retrieval (slow); others integrate filters into the index (fast).
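
For engines that push filters into the index, the filter is part of the query itself. A sketch with the Qdrant Python client; the URL, collection name, and payload fields are assumptions:

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")
query_embedding = [0.0] * 768  # placeholder; use the real query embedding

hits = client.search(
    collection_name="docs",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="year", range=Range(gte=2025)),
            FieldCondition(key="owner", match=MatchValue(value="user_x")),
            FieldCondition(key="type", match=MatchValue(value="invoice")),
        ]
    ),
    limit=10,
)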

Hybrid search

Some DBs natively support BM25 + dense retrieval (hybrid-search-and-reranking.md). Others require you to run two separate searches and combine the results yourself.

Native: Weaviate, Vespa, Elasticsearch, Qdrant (since 1.10). Requires plumbing: ChromaDB, LanceDB, pgvector + Postgres FTS.
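
When the fusion step is on you, reciprocal rank fusion is a common, model-free way to merge the two ranked lists. A generic sketch, not tied to any particular DB:

def rrf_fuse(result_lists: list[list[str]], k: int = 60, top_n: int = 10) -> list[str]:
    """Combine several ranked ID lists with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # Standard RRF: 1 / (k + rank); k dampens the influence of top ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# dense_ids from the vector search, sparse_ids from BM25 / full-text search.
fused = rrf_fuse([["a", "b", "c"], ["b", "d", "a"]])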

Operations

  • Backups and snapshots: production-grade DBs offer them.
  • Multi-tenancy: per-collection isolation, namespaces, or row-level security.
  • Updates / deletes: vectors that change should be efficiently re-indexable.
  • Replication and HA: varies by product and pricing tier.

Cost

Pricing models vary wildly:

  • pgvector: cost of your Postgres.
  • Pinecone: per pod-hour or serverless per query.
  • Open-source self-hosted: your infrastructure.

For 10M vectors at typical traffic, costs range $50/month (pgvector on a small box) to $500–$5000/month (managed services).

Index parameter tuning

For HNSW (most common):

  • M (graph connections per node): 16–64. Higher = better recall, more memory.
  • ef_construction: 100–500. Higher = better build quality, slower indexing.
  • ef_search: 50–500 at query time. Higher = better recall, slower queries.

Start with defaults; tune ef_search for the recall/latency tradeoff you want.
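
As a concrete example, here is roughly where those parameters plug in for pgvector (table and index names are illustrative; m, ef_construction, and hnsw.ef_search are pgvector's documented HNSW knobs):

import psycopg

conn = psycopg.connect("postgresql://localhost/mydb", autocommit=True)

# Build-time parameters: m (graph connections per node) and ef_construction.
conn.execute(
    "CREATE INDEX IF NOT EXISTS items_embedding_idx "
    "ON items USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 200)"
)

# Query-time parameter: hnsw.ef_search sets the recall/latency tradeoff for the session.
conn.execute("SET hnsw.ef_search = 100")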

Loading data efficiently

For initial ingest of millions of vectors:

  • Batch inserts (1k–10k vectors per call).
  • Disable index updates during bulk load if possible.
  • Build the index after loading.
  • Parallelize across multiple workers.
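
A generic batching sketch, assuming a store object with an upsert() like the portable interface in the next section (batch size and worker count are illustrative):

from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_SIZE = 1_000    # keep within your DB's request-size limits
NUM_WORKERS = 8       # tune against the server's ingest capacity

def batched(iterable, size):
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def bulk_load(store, records):
    # records: iterable of (id, vector, metadata) tuples.
    def load(batch):
        ids, vectors, metadata = zip(*batch)
        store.upsert(list(ids), list(vectors), list(metadata))

    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        list(pool.map(load, batched(records, BATCH_SIZE)))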

Code patterns

A portable retrieval interface:

from typing import Protocol

class VectorStore(Protocol):
    def upsert(self, ids: list[str], vectors: list[list[float]], metadata: list[dict]) -> None: ...
    def query(self, vector: list[float], k: int, filter: dict | None = None) -> list[dict]: ...
    def delete(self, ids: list[str]) -> None: ...

Implement this for whichever DB you start with; switch later if needed. The hard parts (chunking, embedding, prompt assembly) shouldn’t depend on the DB choice.
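
For example, a Chroma-backed implementation might look like this (the storage path and collection name are placeholders):

import chromadb

class ChromaVectorStore:
    """Satisfies the VectorStore protocol above, backed by a local Chroma collection."""

    def __init__(self, collection: str = "docs", path: str = "./chroma"):
        self._coll = chromadb.PersistentClient(path=path).get_or_create_collection(collection)

    def upsert(self, ids: list[str], vectors: list[list[float]], metadata: list[dict]) -> None:
        self._coll.upsert(ids=ids, embeddings=vectors, metadatas=metadata)

    def query(self, vector: list[float], k: int, filter: dict | None = None) -> list[dict]:
        res = self._coll.query(query_embeddings=[vector], n_results=k, where=filter)
        return [
            {"id": i, "metadata": m, "distance": d}
            for i, m, d in zip(res["ids"][0], res["metadatas"][0], res["distances"][0])
        ]

    def delete(self, ids: list[str]) -> None:
        self._coll.delete(ids=ids)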

Pitfalls

  • Mixing dimension counts in a single collection — some DBs error, some return garbage.
  • Forgetting to normalize vectors when using inner-product similarity — rankings quietly degrade (see the snippet after this list).
  • Filter applied after vector search when you needed it inside the index — slow queries.
  • Re-indexing during peak traffic — most DBs degrade during build.
  • No backup strategy — losing your index means re-embedding terabytes.
  • Treating it as a database for everything — vectors yes, document content sometimes, transactional data no.
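
For the normalization pitfall, a tiny helper run before upserting is enough; this assumes your embeddings arrive as a 2-D numpy array:

import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product behaves like cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)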

Re-embedding strategy

When your embedding model upgrades:

  1. Provision a parallel index with the new vectors.
  2. Migrate clients to the new index.
  3. Delete the old.

Don’t try to mix vectors from different models in one index; the embedding spaces aren’t comparable and retrieval quality collapses.
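
One way to make step 2 painless is to have clients query through an alias and repoint it at cutover. A sketch using Qdrant collection aliases; the collection and alias names are illustrative, and other DBs have their own equivalents:

from qdrant_client import QdrantClient
from qdrant_client.models import CreateAlias, CreateAliasOperation

client = QdrantClient(url="http://localhost:6333")

# 1. The new collection ("docs_v2") has already been filled with re-embedded vectors.
# 2. Repoint the alias the application queries ("docs") at the new collection.
client.update_collection_aliases(
    change_aliases_operations=[
        CreateAliasOperation(create_alias=CreateAlias(collection_name="docs_v2", alias_name="docs")),
    ]
)

# 3. Once traffic looks healthy, drop the old collection.
client.delete_collection("docs_v1")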

See also