Vector Databases

A vector DB stores embeddings and supports approximate nearest-neighbor (ANN) search. The space is fragmented; which DB you choose matters less than picking a serviceable one and writing portable code around it.

What a vector DB does

Three core operations:

  • Index vectors with metadata.
  • Query for nearest neighbors of a given vector.
  • Filter by metadata (e.g. only docs from 2025+).

Most also support: hybrid search (dense + sparse), full-text search, multi-tenancy, replication, snapshots.
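
A minimal sketch of the three core operations using ChromaDB's embedded Python client (the collection name, toy 4-dimensional vectors, and metadata fields are illustrative):

import chromadb

client = chromadb.Client()  # in-process, ephemeral client; fine for local experiments
coll = client.create_collection("docs")

# 1. Index vectors with metadata.
coll.add(
    ids=["a", "b"],
    embeddings=[[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]],
    metadatas=[{"year": 2025, "type": "faq"}, {"year": 2023, "type": "blog"}],
)

# 2. Query for nearest neighbors of a given vector.
hits = coll.query(query_embeddings=[[0.1, 0.2, 0.3, 0.35]], n_results=2)

# 3. Filter by metadata (only docs from 2025+).
recent = coll.query(
    query_embeddings=[[0.1, 0.2, 0.3, 0.35]],
    n_results=2,
    where={"year": {"$gte": 2025}},
)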

ANN algorithms underneath

  • HNSW (Hierarchical Navigable Small World): graph-based. Default in most modern DBs. Fast, accurate, memory-hungry.
  • IVF (Inverted File Index): partition the vector space into Voronoi cells; search only the relevant cells. Good for datasets larger than memory.
  • PQ (Product Quantization): compress vectors into small codes. Used to reduce memory.
  • DiskANN: disk-resident graph index, designed for billion-scale on a single node.

In practice, most DBs use HNSW or HNSW + PQ. For most apps the algorithm is invisible; just tune the M, ef_construction, and ef_search parameters when needed (see Index parameter tuning below).

The landscape (early 2026)

Postgres extensions

pgvector

  • Postgres extension. Default choice for most teams.
  • Supports HNSW and IVFFlat indexes.
  • Limitations: ~2,000-dimension cap on indexed vectors, single-node, slower than dedicated DBs at very large scale.
  • Wins because: same database as your app data, familiar tooling, transactions, joins.
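
If you go this route, the basics look like ordinary SQL. A sketch using psycopg and the pgvector Python helper; the connection string, table layout, and 1536 dimensions are assumptions, not requirements:

import numpy as np
import psycopg
from pgvector.psycopg import register_vector
from psycopg.types.json import Json

conn = psycopg.connect("postgresql://localhost/mydb", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # teaches psycopg to send/receive the vector type

conn.execute(
    "CREATE TABLE IF NOT EXISTS items (id text PRIMARY KEY, embedding vector(1536), meta jsonb)"
)

# Embeddings live next to ordinary relational data, so joins and transactions just work.
embedding = np.random.rand(1536).astype(np.float32)
conn.execute(
    "INSERT INTO items (id, embedding, meta) VALUES (%s, %s, %s) ON CONFLICT (id) DO NOTHING",
    ("doc-1", embedding, Json({"year": 2025})),
)

# <=> is cosine distance; <-> is L2 distance, <#> is negative inner product.
rows = conn.execute(
    "SELECT id, meta FROM items ORDER BY embedding <=> %s LIMIT 10",
    (embedding,),
).fetchall()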

pgvectorscale (Timescale)

  • Builds on pgvector with optimized index structures, better performance.

Dedicated vector DBs

Qdrant

  • Rust, self-hostable or managed.
  • Strong filtering, payload schemas, hybrid search.
  • Solid choice for self-hosted production.

Weaviate

  • Multimodal-first; good at combining text + image search.
  • GraphQL API; modular ML modules.

Milvus / Zilliz

  • Distributed, very large-scale (billion-vector).
  • More operational overhead than the simpler options.

LanceDB

  • Embedded (file-based), like SQLite for vectors.
  • Lightweight; good for prototypes and edge.

ChromaDB

  • Embedded, very simple Python API.
  • Great for hackathons, getting started.
  • Less battle-tested in production.

Managed services

Pinecone

  • Managed-only, easy. Good if you want zero infra.
  • Pricing scales with vector count and queries; can get expensive.

MongoDB Atlas Vector Search

  • Vector search on top of MongoDB. Integrated with the broader Atlas stack.

Elasticsearch / OpenSearch

  • Mature search engines with added vector capabilities. Strong hybrid search.

Vespa

  • Yahoo-origin; battle-tested at huge scale; complex to operate.

“I just want it to work”

For most teams, in 2026:

  1. Already on Postgres: use pgvector. Done.
  2. Want managed, simple: Pinecone or LanceDB Cloud.
  3. Need self-hosted, not on Postgres: Qdrant.
  4. Hackathon/prototype: ChromaDB or LanceDB.
  5. Billion-scale or multi-modal critical: Milvus, Vespa, or Weaviate depending on tradeoffs.

Decision factors

Scale

  • < 1M vectors: anything works.
  • 1M–100M: most DBs fine; configure HNSW carefully.
  • 100M–1B: dedicated DBs (Milvus, Vespa, managed Pinecone), or pgvector with sharding/replication.
  • > 1B: only a few real options; expect serious infra investment.

Filtering

If you need rich metadata filtering (“docs from 2025, owned by user X, type=invoice, embedding cluster Y”), check that the DB supports it efficiently. Some DBs filter post-retrieval (slow); others integrate filters into the index (fast).
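
For engines that push filters into the index, the filter is part of the query itself. A sketch with the Qdrant Python client; the URL, collection name, and payload fields are assumptions:

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")
query_embedding = [0.0] * 768  # placeholder; use the real query embedding

hits = client.search(
    collection_name="docs",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="year", range=Range(gte=2025)),
            FieldCondition(key="owner", match=MatchValue(value="user_x")),
            FieldCondition(key="type", match=MatchValue(value="invoice")),
        ]
    ),
    limit=10,
)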

Hybrid search

Some DBs natively support BM25 + dense retrieval (hybrid-search-and-reranking.md). Others require you to run two separate searches and combine the results yourself.

Native: Weaviate, Vespa, Elasticsearch, Qdrant (since 1.10). Requires plumbing: ChromaDB, LanceDB, pgvector + Postgres FTS.
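
When the fusion step is on you, reciprocal rank fusion is a common, model-free way to merge the two ranked lists. A generic sketch, not tied to any particular DB:

def rrf_fuse(result_lists: list[list[str]], k: int = 60, top_n: int = 10) -> list[str]:
    """Combine several ranked ID lists with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # Standard RRF: 1 / (k + rank); k dampens the influence of top ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# dense_ids from the vector search, sparse_ids from BM25 / full-text search.
fused = rrf_fuse([["a", "b", "c"], ["b", "d", "a"]])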

Operations

  • Backups and snapshots: production-grade DBs offer them.
  • Multi-tenancy: per-collection isolation, namespaces, or row-level security.
  • Updates / deletes: vectors that change should be efficiently re-indexable.
  • Replication and HA: varies by product and pricing tier.

Cost

Pricing models vary wildly:

  • pgvector: cost of your Postgres.
  • Pinecone: per pod-hour or serverless per query.
  • Open-source self-hosted: your infrastructure.

For 10M vectors at typical traffic, costs range $50/month (pgvector on a small box) to $500–$5000/month (managed services).

Index parameter tuning

For HNSW (most common):

  • M (graph connections per node): 16–64. Higher = better recall, more memory.
  • ef_construction: 100–500. Higher = better build quality, slower indexing.
  • ef_search: 50–500 at query time. Higher = better recall, slower queries.

Start with defaults; tune ef_search for the recall/latency tradeoff you want.
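
As a concrete example, here is roughly where those parameters plug in for pgvector (table and index names are illustrative; m, ef_construction, and hnsw.ef_search are pgvector's documented HNSW knobs):

import psycopg

conn = psycopg.connect("postgresql://localhost/mydb", autocommit=True)

# Build-time parameters: m (graph connections per node) and ef_construction.
conn.execute(
    "CREATE INDEX IF NOT EXISTS items_embedding_idx "
    "ON items USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 200)"
)

# Query-time parameter: hnsw.ef_search sets the recall/latency tradeoff for the session.
conn.execute("SET hnsw.ef_search = 100")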

Loading data efficiently

For initial ingest of millions of vectors:

  • Batch inserts (1k–10k vectors per call).
  • Disable index updates during bulk load if possible.
  • Build the index after loading.
  • Parallelize across multiple workers.
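
A generic batching sketch, assuming a store object with an upsert() like the portable interface in the next section (batch size and worker count are illustrative):

from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_SIZE = 1_000    # keep within your DB's request-size limits
NUM_WORKERS = 8       # tune against the server's ingest capacity

def batched(iterable, size):
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def bulk_load(store, records):
    # records: iterable of (id, vector, metadata) tuples.
    def load(batch):
        ids, vectors, metadata = zip(*batch)
        store.upsert(list(ids), list(vectors), list(metadata))

    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        list(pool.map(load, batched(records, BATCH_SIZE)))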

Code patterns

A portable retrieval interface:

from typing import Protocol

class VectorStore(Protocol):
    def upsert(self, ids: list[str], vectors: list[list[float]], metadata: list[dict]) -> None: ...
    def query(self, vector: list[float], k: int, filter: dict | None = None) -> list[dict]: ...
    def delete(self, ids: list[str]) -> None: ...

Implement this for whichever DB you start with; switch later if needed. The hard parts (chunking, embedding, prompt assembly) shouldn’t depend on the DB choice.
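
For example, a Chroma-backed implementation might look like this (the storage path and collection name are placeholders):

import chromadb

class ChromaVectorStore:
    """Satisfies the VectorStore protocol above, backed by a local Chroma collection."""

    def __init__(self, collection: str = "docs", path: str = "./chroma"):
        self._coll = chromadb.PersistentClient(path=path).get_or_create_collection(collection)

    def upsert(self, ids: list[str], vectors: list[list[float]], metadata: list[dict]) -> None:
        self._coll.upsert(ids=ids, embeddings=vectors, metadatas=metadata)

    def query(self, vector: list[float], k: int, filter: dict | None = None) -> list[dict]:
        res = self._coll.query(query_embeddings=[vector], n_results=k, where=filter)
        return [
            {"id": i, "metadata": m, "distance": d}
            for i, m, d in zip(res["ids"][0], res["metadatas"][0], res["distances"][0])
        ]

    def delete(self, ids: list[str]) -> None:
        self._coll.delete(ids=ids)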

Pitfalls

  • Mixing dimension counts in a single collection — some DBs error, some return garbage.
  • Forgetting to normalize vectors when using inner-product similarity — rankings quietly degrade (see the snippet after this list).
  • Filter applied after vector search when you needed it inside the index — slow queries.
  • Re-indexing during peak traffic — most DBs degrade during build.
  • No backup strategy — losing your index means re-embedding terabytes.
  • Treating it as a database for everything — vectors yes, document content sometimes, transactional data no.
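
For the normalization pitfall, a tiny helper run before upserting is enough; this assumes your embeddings arrive as a 2-D numpy array:

import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product behaves like cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)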

Re-embedding strategy

When your embedding model upgrades:

  1. Provision a parallel index with the new vectors.
  2. Migrate clients to the new index.
  3. Delete the old.

Don’t try to mix vectors from different models in one index; the embedding spaces aren’t comparable and retrieval quality collapses.
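
One way to make step 2 painless is to have clients query through an alias and repoint it at cutover. A sketch using Qdrant collection aliases; the collection and alias names are illustrative, and other DBs have their own equivalents:

from qdrant_client import QdrantClient
from qdrant_client.models import CreateAlias, CreateAliasOperation

client = QdrantClient(url="http://localhost:6333")

# 1. The new collection ("docs_v2") has already been filled with re-embedded vectors.
# 2. Repoint the alias the application queries ("docs") at the new collection.
client.update_collection_aliases(
    change_aliases_operations=[
        CreateAliasOperation(create_alias=CreateAlias(collection_name="docs_v2", alias_name="docs")),
    ]
)

# 3. Once traffic looks healthy, drop the old collection.
client.delete_collection("docs_v1")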

See also