Vector Databases
A vector DB stores embeddings and supports approximate nearest-neighbor (ANN) search. The space is fragmented; which DB you pick matters less than picking a serviceable one and writing portable code around it.
What a vector DB does
Three core operations:
- Index vectors with metadata.
- Query for nearest neighbors of a given vector.
- Filter by metadata (e.g. only docs from 2025+).
Most also support: hybrid search (dense + sparse), full-text search, multi-tenancy, replication, snapshots.
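For concreteness, here is a minimal sketch of those three operations using ChromaDB's embedded client (covered in the landscape section below); the collection name, vectors, and metadata fields are illustrative, and other DBs expose equivalents.

import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient(path=...) for disk
docs = client.get_or_create_collection("docs")

# 1. Index vectors with metadata.
docs.add(
    ids=["a", "b"],
    embeddings=[[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]],
    metadatas=[{"year": 2025}, {"year": 2023}],
)

# 2. Query for nearest neighbors of a given vector,
# 3. filtered by metadata (only docs from 2025+).
hits = docs.query(
    query_embeddings=[[0.1, 0.2, 0.25]],
    n_results=5,
    where={"year": {"$gte": 2025}},
)
print(hits["ids"], hits["distances"])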
ANN algorithms underneath
- HNSW (Hierarchical Navigable Small World): graph-based. Default in most modern DBs. Fast, accurate, memory-hungry.
- IVF (Inverted File Index): partition vector space into Voronoi cells; search only relevant cells. Good for larger-than-memory.
- PQ (Product Quantization): compress vectors into small codes. Used to reduce memory.
- DiskANN: disk-resident graph index, designed for billion-scale on a single node.
In practice, most DBs use HNSW or HNSW + PQ. For most apps the algorithm is invisible — just tune ef_construction, ef_search, M parameters when needed.
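To make those knobs concrete, here is a small sketch using the standalone hnswlib library (an assumption — most DBs expose the same parameters through their own index config rather than this API):

import hnswlib
import numpy as np

dim = 128
data = np.random.rand(10_000, dim).astype("float32")

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=10_000, M=16, ef_construction=200)  # build-time knobs
index.add_items(data, ids=np.arange(10_000))

index.set_ef(100)  # ef_search: query-time recall/latency tradeoff
labels, distances = index.knn_query(data[:1], k=5)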
The landscape (early 2026)
Postgres extensions
pgvector
- Postgres extension. Default choice for most teams.
- Supports HNSW + IVF.
- Limitations: ~2k-dim cap, single-node, slower than dedicated DBs at huge scale.
- Wins because: same database as your app data, familiar tooling, transactions, joins.
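A minimal pgvector sketch via psycopg 3; the table, columns, and dimensions are illustrative:

import psycopg

conn = psycopg.connect("dbname=app", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.execute("""
    CREATE TABLE IF NOT EXISTS docs (
        id bigserial PRIMARY KEY,
        content text,
        meta jsonb,
        embedding vector(1536)
    )
""")
# HNSW index on cosine distance.
conn.execute(
    "CREATE INDEX IF NOT EXISTS docs_embedding_idx "
    "ON docs USING hnsw (embedding vector_cosine_ops)"
)

# Nearest neighbors with a metadata filter, in one SQL query.
query_vec = "[" + ",".join(["0.1"] * 1536) + "]"
rows = conn.execute(
    "SELECT id, content FROM docs WHERE (meta->>'year')::int >= 2025 "
    "ORDER BY embedding <=> %s::vector LIMIT 10",
    (query_vec,),
).fetchall()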
pgvectorscale (Timescale)
- Builds on pgvector with optimized index structures, better performance.
Dedicated vector DBs
Qdrant
- Rust, self-hostable or managed.
- Strong filtering, payload schemas, hybrid search.
- Solid choice for self-hosted production.
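A minimal sketch with the qdrant-client Python package; the collection name, vector size, and payload fields are illustrative:

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=[0.0] * 768,
                        payload={"type": "invoice", "year": 2025})],
)
hits = client.search(
    collection_name="docs",
    query_vector=[0.0] * 768,
    limit=5,
    query_filter=Filter(must=[FieldCondition(key="type", match=MatchValue(value="invoice"))]),
)

Note that the filter is passed into the search itself rather than applied after retrieval, which is the distinction the Filtering section below draws.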
Weaviate
- Multimodal-first; good at combining text + image search.
- GraphQL API; modular ML modules.
Milvus / Zilliz
- Distributed, very large-scale (billion-vector).
- More operational overhead than the simpler options.
LanceDB
- Embedded (file-based), like SQLite for vectors.
- Lightweight; good for prototypes and edge.
ChromaDB
- Embedded, very simple Python API.
- Great for hackathons, getting started.
- Less battle-tested in production.
Managed services
Pinecone
- Managed-only, easy. Good if you want zero infra.
- Pricing scales with vector count and queries; can get expensive.
MongoDB Atlas Vector Search
- Vector search on top of MongoDB. Integrated with broader Atlas stack.
Elasticsearch / OpenSearch
- Mature search engines with added vector capabilities. Strong hybrid search.
Vespa
- Yahoo-origin; battle-tested at huge scale; complex to operate.
“I just want it to work”
For most teams, in 2026:
- Already on Postgres: use pgvector. Done.
- Want managed, simple: Pinecone or LanceDB Cloud.
- Need self-hosted, not on Postgres: Qdrant.
- Hackathon/prototype: ChromaDB or LanceDB.
- Billion-scale or multi-modal critical: Milvus, Vespa, or Weaviate depending on tradeoffs.
Decision factors
Scale
- < 1M vectors: anything works.
- 1M–100M: most DBs fine; configure HNSW carefully.
- 100M–1B: dedicated DBs (Milvus, Vespa, managed Pinecone), or pgvector with sharding/replication.
- > 1B: only a few real options; expect serious infra investment.
Filtering
If you need rich metadata filtering (“docs from 2025, owned by user X, type=invoice, embedding cluster Y”), check that the DB supports it efficiently. Some DBs filter post-retrieval (slow); others integrate filters into the index (fast).
Hybrid search
Some DBs natively support BM25 + dense (hybrid-search-and-reranking.md). Others require you to run two separate searches and combine.
Native: Weaviate, Vespa, Elasticsearch, Qdrant (since 1.10). Requires plumbing: ChromaDB, LanceDB, pgvector + Postgres FTS.
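If your DB is in the second camp, reciprocal rank fusion (RRF) is a common way to combine the two result lists; a minimal sketch:

# Merge a dense and a sparse/BM25 result list with reciprocal rank fusion.
# Inputs are lists of doc ids ordered by rank; k=60 is the conventional constant.
def rrf_merge(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(["a", "b", "c"], ["c", "a", "d"])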
Operations
- Backups and snapshots: production-grade DBs offer them.
- Multi-tenancy: per-collection isolation, namespaces, or row-level security.
- Updates / deletes: vectors that change should be efficiently re-indexable.
- Replication and HA: depends on your tier.
Cost
Pricing models vary wildly:
- pgvector: cost of your Postgres.
- Pinecone: per pod-hour or serverless per query.
- Open-source self-hosted: your infrastructure.
For 10M vectors at typical traffic, costs range $50/month (pgvector on a small box) to $500–$5000/month (managed services).
Index parameter tuning
For HNSW (most common):
- M (graph connections per node): 16–64. Higher = better recall, more memory.
- ef_construction: 100–500. Higher = better build quality, slower indexing.
- ef_search: 50–500 at query time. Higher = better recall, slower queries.
Start with defaults; tune ef_search for the recall/latency tradeoff you want.
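In pgvector, for example, the build-time knobs go on the index and ef_search is a session setting; a sketch, assuming the illustrative docs table from the earlier example:

import psycopg

with psycopg.connect("dbname=app", autocommit=True) as conn:
    conn.execute(
        "CREATE INDEX IF NOT EXISTS docs_hnsw "
        "ON docs USING hnsw (embedding vector_cosine_ops) "
        "WITH (m = 16, ef_construction = 200)"
    )
    conn.execute("SET hnsw.ef_search = 100")  # query-time recall/latency knob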
Loading data efficiently
For initial ingest of millions of vectors:
- Batch inserts (1k–10k vectors per call).
- Disable index updates during bulk load if possible.
- Build the index after loading.
- Parallelize across multiple workers.
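A minimal batching sketch of the above; upsert_batch is a hypothetical stand-in for whatever bulk-insert call your DB exposes:

from concurrent.futures import ThreadPoolExecutor

def chunked(items: list, size: int):
    for i in range(0, len(items), size):
        yield items[i:i + size]

def bulk_load(records: list[dict], upsert_batch, batch_size: int = 5_000, workers: int = 8) -> None:
    # upsert_batch(batch) is hypothetical: one call per batch of records.
    batches = list(chunked(records, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(upsert_batch, batches))  # list() surfaces worker exceptions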
Code patterns
A portable retrieval interface:
from typing import Protocol

class VectorStore(Protocol):
    def upsert(self, ids: list[str], vectors: list[list[float]], metadata: list[dict]) -> None: ...
    def query(self, vector: list[float], k: int, filter: dict | None = None) -> list[dict]: ...
    def delete(self, ids: list[str]) -> None: ...
Implement this for whichever DB you start with; switch later if needed. The hard parts (chunking, embedding, prompt assembly) shouldn’t depend on the DB choice.
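As one illustration, here is a brute-force in-memory implementation of the same interface — useful for tests and prototypes, not production. It assumes numpy, cosine similarity, and exact-match-only filtering:

import numpy as np

class InMemoryStore:
    def __init__(self) -> None:
        self._vectors: dict[str, np.ndarray] = {}
        self._metadata: dict[str, dict] = {}

    def upsert(self, ids: list[str], vectors: list[list[float]], metadata: list[dict]) -> None:
        for i, v, m in zip(ids, vectors, metadata):
            self._vectors[i] = np.asarray(v, dtype=np.float32)
            self._metadata[i] = m

    def query(self, vector: list[float], k: int, filter: dict | None = None) -> list[dict]:
        q = np.asarray(vector, dtype=np.float32)
        q = q / (np.linalg.norm(q) or 1.0)
        scored = []
        for i, v in self._vectors.items():
            meta = self._metadata[i]
            if filter and any(meta.get(key) != val for key, val in filter.items()):
                continue  # exact-match filtering only, for simplicity
            sim = float(np.dot(q, v / (np.linalg.norm(v) or 1.0)))
            scored.append({"id": i, "score": sim, "metadata": meta})
        return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]

    def delete(self, ids: list[str]) -> None:
        for i in ids:
            self._vectors.pop(i, None)
            self._metadata.pop(i, None)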
Pitfalls
- Mixing dimension counts in a single collection — some DBs error, some return garbage.
- Forgetting to normalize vectors when using inner-product similarity — results look weird.
- Filter applied after vector search when you needed it inside the index — slow queries.
- Re-indexing during peak traffic — most DBs degrade during build.
- No backup strategy — losing your index means re-embedding terabytes.
- Treating it as a database for everything — vectors yes, document content sometimes, transactional data no.
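The normalization pitfall above is cheap to avoid:

import numpy as np

def normalize(vectors: np.ndarray) -> np.ndarray:
    # Unit-length rows: inner product then equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)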
Re-embedding strategy
When your embedding model upgrades:
- Provision a parallel index with the new vectors.
- Migrate clients to the new index.
- Delete the old.
Don’t try to mix vectors from different models in one index; quality breaks.
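A sketch of what the parallel-index step can look like; embed_v2, iter_documents, and the store method are hypothetical placeholders:

OLD, NEW = "docs_v1", "docs_v2"  # hypothetical versioned collection names

def backfill(store, iter_documents, embed_v2, batch_size: int = 1_000) -> None:
    # Write new-model vectors into NEW while OLD keeps serving reads.
    batch = []
    for doc in iter_documents():
        batch.append((doc["id"], embed_v2(doc["text"]), doc["meta"]))
        if len(batch) >= batch_size:
            ids, vecs, meta = zip(*batch)
            store.upsert_into(NEW, list(ids), list(vecs), list(meta))  # hypothetical method
            batch.clear()
    if batch:
        ids, vecs, meta = zip(*batch)
        store.upsert_into(NEW, list(ids), list(vecs), list(meta))

# Reads stay on OLD until backfill completes, then flip a single config value
# to NEW and delete OLD once traffic has drained.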