Hybrid Search Architecture: Combining Keyword and Vector Retrieval

Hybrid search architecture combines two retrieval paths inside one search system: a keyword path for exact lexical matching and a vector path for semantic similarity. The architecture works best when these paths are designed as separate components that share metadata, filters, scoring, and observability.

The goal is not simply to run two searches. The goal is to build a retrieval pipeline where keyword indexes, vector indexes, filter indexes, fusion logic, and optional reranking work together without turning the system into a black box.

The Architecture at a High Level

A practical hybrid search architecture has five main stages, which the sections below break down into eight more detailed layers:

content ingestion
→ indexing layer
→ query planning
→ parallel retrieval
→ fusion, reranking, and response assembly

Each layer has a specific responsibility. Ingestion prepares content and metadata. Indexing creates the keyword and vector lookup structures. Query planning decides filters and weights. Retrieval collects candidates. Fusion and reranking decide the final order.

Layer 1: Content Ingestion

Hybrid search starts before query time. At ingestion, the system needs to prepare content for both keyword search and vector search.

For each document or chunk, the ingestion pipeline usually stores:

  • the searchable text
  • the embedding vector
  • metadata fields for filters
  • source and citation fields
  • version, timestamp, or freshness fields

This matters because hybrid search depends on both representations. A document that is only embedded but not keyword-indexed cannot fully participate in keyword retrieval. A document that is keyword-indexed but not embedded cannot fully participate in semantic retrieval.
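
The fields listed above can be sketched as a single record type. This is a minimal illustration, not a prescribed schema: the class name `ChunkRecord`, its field names, and the helper `missing_representations` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    """One ingested chunk carrying both representations plus metadata."""
    chunk_id: str
    text: str                    # searchable text for the keyword index
    embedding: list[float]       # vector for the vector index
    metadata: dict[str, str] = field(default_factory=dict)  # filter fields
    source_url: str = ""         # source and citation field
    version: str = ""            # version or freshness marker


def missing_representations(record: ChunkRecord) -> list[str]:
    """Return the retrieval paths this record cannot participate in."""
    missing = []
    if not record.text.strip():
        missing.append("keyword")
    if not record.embedding:
        missing.append("vector")
    return missing


record = ChunkRecord(
    chunk_id="doc-1#0",
    text="Reset the API key from the admin console.",
    embedding=[],  # the embedding step was skipped or failed
    metadata={"status": "published", "language": "en"},
)
print(missing_representations(record))
```

A check like this at ingestion time makes it visible when a chunk would silently drop out of one branch.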

Layer 2: Keyword Index

The keyword index supports exact and lexical matching. In many systems, this is an inverted index. It maps terms to the documents or chunks that contain them.

term → matching document IDs

This path is important for product names, error codes, API parameters, citations, field names, legal terms, and other exact phrases. BM25-style scoring can rank matches based on term frequency, rarity, and document length.

Architecturally, the keyword index should know which fields are searchable. Titles, headings, exact-term fields, and body text may deserve different weights.
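
As a rough sketch of how an inverted index and BM25-style scoring fit together, the toy class below maps terms to document IDs and scores matches by term frequency, rarity, and document length. The class name `InvertedIndex`, the whitespace tokenizer, and the default `k1` and `b` values are illustrative assumptions, and per-field weighting is left out to keep the example short.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class InvertedIndex:
    def __init__(self, k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        # term -> {doc_id: term frequency in that document}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_len: dict[str, int] = {}

    def add(self, doc_id: str, text: str) -> None:
        tokens = tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str) -> list[tuple[str, float]]:
        if not self.doc_len:
            return []
        n_docs = len(self.doc_len)
        avg_len = sum(self.doc_len.values()) / n_docs
        scores: dict[str, float] = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            # Rarer terms receive a higher inverse document frequency.
            idf = math.log(1 + (n_docs - len(docs) + 0.5) / (len(docs) + 0.5))
            for doc_id, tf in docs.items():
                # Longer-than-average documents are penalized via b.
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = InvertedIndex()
index.add("a", "error E1234 when rotating api key")
index.add("b", "how to rotate an api key safely in production environments")
index.add("c", "billing overview")
results = index.search("E1234 api key")
print(results)
```

Document "a" ranks first because it is the only one containing the rare term `E1234`, which is the behavior the keyword path exists to protect.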

Layer 3: Vector Index

The vector index supports semantic retrieval. It stores embeddings and returns nearby vectors for a query embedding. This path handles conceptual similarity, paraphrases, and meaning-based matches.

query embedding → nearest document embeddings

The vector index depends on the embedding model, chunking strategy, distance metric, and index type. If the embeddings are weak or the chunks are poorly designed, fusion alone cannot repair retrieval quality.

In a hybrid system, the vector index should be treated as one retrieval signal, not as the entire search engine.
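
The lookup itself can be illustrated with a brute-force nearest-neighbor search over cosine similarity. This is only a sketch: production vector indexes typically use approximate structures rather than scanning every vector, and the three-dimensional vectors and document IDs below are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query_vec, vectors, k=3):
    """Return the k most similar (doc_id, similarity) pairs, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in vectors.items()
    ]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]


# Toy embeddings; a real system would get these from an embedding model.
vectors = {
    "reset-password": [0.9, 0.1, 0.0],
    "rotate-api-key": [0.1, 0.9, 0.1],
    "billing-faq": [0.0, 0.1, 0.9],
}
top = nearest([0.2, 0.8, 0.0], vectors, k=2)
print(top)
```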

Layer 4: Filter Indexes and Metadata

Most production hybrid search systems need metadata filters. Filters decide which objects are eligible before results are ranked or returned.

Common filters include:

  • tenant or workspace
  • status such as published or active
  • permission groups and roles
  • source system
  • language
  • product version
  • date or freshness window

The architecture should apply these filters consistently to both keyword and vector paths. Otherwise, one branch may retrieve candidates that the other branch would never be allowed to return.
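
One way to picture shared eligibility is a single predicate that both branches must pass through. The sketch below filters candidate lists after retrieval for simplicity; a real engine would typically push the same filter into each index lookup. The function names and metadata values are hypothetical.

```python
def is_eligible(metadata: dict, filters: dict) -> bool:
    """True when every required filter field matches exactly."""
    return all(metadata.get(key) == value for key, value in filters.items())


def apply_filters(candidates, metadata_by_id, filters):
    """Keep only (doc_id, score) pairs whose metadata passes the filters."""
    return [
        (doc_id, score)
        for doc_id, score in candidates
        if is_eligible(metadata_by_id.get(doc_id, {}), filters)
    ]


metadata_by_id = {
    "doc-1": {"tenant": "acme", "status": "published"},
    "doc-2": {"tenant": "acme", "status": "draft"},
    "doc-3": {"tenant": "other", "status": "published"},
}
filters = {"tenant": "acme", "status": "published"}

keyword_hits = [("doc-2", 9.1), ("doc-1", 7.3)]
vector_hits = [("doc-3", 0.92), ("doc-1", 0.88)]

# The same filter object is applied to both branches.
print(apply_filters(keyword_hits, metadata_by_id, filters))
print(apply_filters(vector_hits, metadata_by_id, filters))
```

Documents with missing metadata are treated as ineligible here, which is the safer default for tenant and permission filters.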

Layer 5: Query Planning

At query time, the system should decide how the request will be executed. Query planning can be simple or advanced, but it usually answers these questions:

  • Which filters are mandatory?
  • Which fields should keyword search use?
  • Which vector space should be searched?
  • How many candidates should each branch retrieve?
  • How much weight should keyword and vector signals receive?
  • Should a reranker run after fusion?

A mature hybrid architecture may use different plans for different query types. A query with an error code may lean more toward keyword retrieval. A broad natural-language question may lean more toward vector retrieval. A general support query may use a balanced plan.
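
A simple rule-based planner along these lines might look like the sketch below. Everything in it is an assumption made for illustration: the `QueryPlan` fields, the regular expressions, the thresholds, and the convention that `alpha` 0.0 means keyword only and 1.0 means vector only.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryPlan:
    alpha: float                # 0.0 = keyword only, 1.0 = vector only (assumed)
    candidates_per_branch: int
    use_reranker: bool


# Heuristic patterns for exact-match signals (illustrative, not exhaustive).
ERROR_CODE = re.compile(r"\b[A-Z]+[-_]?\d{2,}\b")          # e.g. E1234, ERR-42
IDENTIFIER = re.compile(r"\b[a-z]+(?:_[a-z0-9]+)+\b")      # e.g. max_retries


def plan_query(query: str) -> QueryPlan:
    if ERROR_CODE.search(query) or IDENTIFIER.search(query):
        # Exact tokens present: lean toward the keyword branch.
        return QueryPlan(alpha=0.25, candidates_per_branch=50, use_reranker=False)
    if len(query.split()) >= 8 or query.rstrip().endswith("?"):
        # Long or question-shaped query: lean toward the vector branch.
        return QueryPlan(alpha=0.75, candidates_per_branch=100, use_reranker=True)
    # Everything else gets a balanced plan.
    return QueryPlan(alpha=0.5, candidates_per_branch=50, use_reranker=True)


for q in ["E1234 during login", "why are my results unrelated?", "reset password"]:
    print(q, "->", plan_query(q))
```

Rules like these are only a starting point; the weights should be checked against real query groups before they are trusted.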

Layer 6: Parallel Candidate Generation

Hybrid retrieval usually creates two candidate lists:

keyword branch → candidates ranked by lexical relevance
vector branch → candidates ranked by semantic similarity

Some candidates appear in both lists. Others appear in only one. A strong exact match may come from the keyword branch. A useful paraphrase may come from the vector branch. A highly relevant document may appear in both.

The architecture should keep enough candidates from each branch so fusion has room to work. If each branch retrieves too few candidates, good results may never reach the fusion stage.
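
The effect of the per-branch limit can be seen in a small sketch that runs two stand-in branches concurrently and reports the overlap. The functions `keyword_search` and `vector_search` are hypothetical stubs that return fixed rankings in place of real index calls.

```python
from concurrent.futures import ThreadPoolExecutor


def keyword_search(query: str, limit: int) -> list[str]:
    # Hypothetical stub: a real implementation would query the keyword index.
    return ["doc-7", "doc-2", "doc-9", "doc-4"][:limit]


def vector_search(query: str, limit: int) -> list[str]:
    # Hypothetical stub: a real implementation would query the vector index.
    return ["doc-2", "doc-5", "doc-7", "doc-1"][:limit]


def generate_candidates(query: str, limit_per_branch: int) -> dict:
    """Run both branches in parallel and report which candidates overlap."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        kw_future = pool.submit(keyword_search, query, limit_per_branch)
        vec_future = pool.submit(vector_search, query, limit_per_branch)
        keyword_hits = kw_future.result()
        vector_hits = vec_future.result()
    vector_ids = set(vector_hits)
    return {
        "keyword": keyword_hits,
        "vector": vector_hits,
        "both": [doc_id for doc_id in keyword_hits if doc_id in vector_ids],
    }


print(generate_candidates("rotate api key", limit_per_branch=1)["both"])
print(generate_candidates("rotate api key", limit_per_branch=3)["both"])
```

With a limit of 1 the two branches share no candidates, so fusion never sees that `doc-2` and `doc-7` are supported by both signals. With a limit of 3 both show up in the overlap.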

Layer 7: Score Normalization and Fusion

Keyword scores and vector scores are not naturally comparable. A BM25 score is unbounded and higher is better, while a vector distance depends on the chosen metric and lower is better. Fusion is the layer that turns these separate signals into one ranking.

Common fusion strategies include:

Fusion approach       | Architecture idea                                 | Best when
----------------------|---------------------------------------------------|----------------------------------------
Rank-based fusion     | Combine each candidate’s position in both lists.  | Raw scores are hard to compare.
Relative score fusion | Normalize each branch’s scores and combine them.  | Score gaps carry useful information.
Weighted fusion       | Give one branch more influence than the other.    | Query type or domain favors one signal.

This layer is where architecture becomes retrieval behavior. If fusion overweights keywords, exact matches dominate. If it overweights vectors, semantic matches dominate. Balanced fusion can support both, but it still needs evaluation.
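
To make the fusion strategies concrete, here is a minimal sketch of two of them: reciprocal rank fusion as one common rank-based method, and min-max normalization with an `alpha` weight as one form of relative, weighted score fusion. The function names, the constant `k=60`, and the convention that `alpha` weights the vector branch are assumptions for illustration. Vector scores are assumed to be similarities (higher is better); distances would need to be inverted first.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Rank-based fusion: each list contributes 1 / (k + rank) per candidate."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def min_max(scores):
    """Rescale one branch's scores to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def relative_score_fusion(keyword_scores, vector_scores, alpha=0.5):
    """Weighted sum of normalized scores; alpha is the vector branch weight."""
    kw, vec = min_max(keyword_scores), min_max(vector_scores)
    fused = {
        doc_id: (1 - alpha) * kw.get(doc_id, 0.0) + alpha * vec.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


keyword_scores = {"a": 12.0, "b": 6.0, "c": 2.0}   # BM25-style scores
vector_scores = {"b": 0.9, "d": 0.8, "a": 0.5}     # similarity scores

print(reciprocal_rank_fusion([["a", "b", "c"], ["b", "d", "a"]]))
print(relative_score_fusion(keyword_scores, vector_scores, alpha=0.5))
```

In both cases document "b" ends up first because it ranks well in both branches, while "a" is pulled down by its weak vector score. Moving `alpha` toward 0 restores "a" to the top, which is the overweighting behavior described above.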

Layer 8: Optional Reranking

Fusion produces a useful first ranking, but it may not be the final stage. Many RAG and enterprise search systems add a reranker after hybrid retrieval.

hybrid retrieval → top 50 candidates
reranker → top 10 final passages
RAG answer or search result page

Reranking is useful when the first-stage retriever has good recall but imperfect ordering. It is usually more expensive, so it should run on a limited candidate set rather than the entire corpus.
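
The shape of this stage can be sketched as a function that rescores a small candidate set with a more expensive scorer. The `overlap_score` function below is a toy stand-in for a real reranking model such as a cross-encoder, and all names here are hypothetical.

```python
def overlap_score(query: str, text: str) -> float:
    """Toy scorer: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query, candidates, score_fn, top_n=10):
    """Rescore (doc_id, text) candidates and keep the best top_n."""
    scored = [(doc_id, score_fn(query, text)) for doc_id, text in candidates]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_n]


# Candidates as they might arrive from hybrid retrieval, in fused order.
candidates = [
    ("doc-5", "billing overview and invoices"),
    ("doc-2", "how to rotate an api key"),
    ("doc-7", "api rate limits"),
]
print(rerank("rotate api key", candidates, overlap_score, top_n=2))
```

Because the scorer runs once per candidate, its cost grows with the candidate count, which is why it is applied after retrieval has narrowed the set.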

Architecture for RAG

In RAG, hybrid search architecture should be designed around context quality, not just search-result clicks.

user question
→ trusted filters
→ hybrid retrieval
→ reranking or thresholding
→ context packing
→ generation
→ citations and answer validation

The retriever should return chunks that are relevant, allowed, fresh, and citeable. Hybrid search helps with relevance, but the architecture still needs metadata filters, chunk quality, source tracking, and context-window discipline.
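
Context-window discipline can be illustrated with a greedy packing step that respects a token budget and keeps source information for citations. This is a sketch under simple assumptions: chunks arrive sorted best first, tokens are approximated by whitespace-separated words, and the dictionary keys `text` and `source` are hypothetical.

```python
def count_words(text: str) -> int:
    # Crude token estimate; a real system would use the model's tokenizer.
    return len(text.split())


def pack_context(chunks, token_budget, count_tokens=count_words):
    """Greedily keep chunks, best first, that still fit within the budget."""
    packed, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk["text"])
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try smaller ones
        packed.append(chunk)
        used += cost
    return packed


def format_context(packed):
    """Number each chunk so the generator can cite it as [1], [2], ..."""
    return "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(packed, start=1)
    )


chunks = [
    {"text": "Rotate keys monthly.", "source": "kb/security"},
    {"text": "Keys can be rotated from the admin console page.", "source": "kb/admin"},
    {"text": "Old keys expire.", "source": "kb/faq"},
]
packed = pack_context(chunks, token_budget=6)
print(format_context(packed))
```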

Observability and Debugging

A hybrid search architecture should expose enough information to debug results. Otherwise, teams cannot tell whether a failure came from keyword retrieval, vector retrieval, filters, fusion, reranking, or stale indexing.

Useful logs and diagnostics include:

  • query text
  • applied filters
  • keyword branch candidates
  • vector branch candidates
  • fusion scores
  • reranker scores
  • final context passed to the language model

Good observability turns hybrid search from a mysterious ranking system into a tunable retrieval pipeline.
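
One possible shape for such diagnostics is a per-query trace record that captures each stage and can answer where a document dropped out. The class `RetrievalTrace` and its field names are hypothetical, and the scores are made-up sample values.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RetrievalTrace:
    query: str
    filters: dict
    # Each stage holds (doc_id, score) pairs in ranked order.
    keyword_candidates: list = field(default_factory=list)
    vector_candidates: list = field(default_factory=list)
    fused: list = field(default_factory=list)
    reranked: list = field(default_factory=list)

    def stages_containing(self, doc_id: str) -> list:
        """List the stages where a document appeared, in pipeline order."""
        stages = {
            "keyword": self.keyword_candidates,
            "vector": self.vector_candidates,
            "fused": self.fused,
            "reranked": self.reranked,
        }
        return [
            name for name, hits in stages.items()
            if any(hit_id == doc_id for hit_id, _ in hits)
        ]


trace = RetrievalTrace(
    query="rotate api key",
    filters={"status": "published"},
    keyword_candidates=[("doc-7", 11.2), ("doc-2", 6.4)],
    vector_candidates=[("doc-2", 0.91), ("doc-5", 0.83)],
    fused=[("doc-2", 0.95), ("doc-7", 0.5), ("doc-5", 0.4)],
    reranked=[("doc-2", 0.88), ("doc-5", 0.61)],
)
print(json.dumps(asdict(trace)))        # one structured log line per query
print(trace.stages_containing("doc-7"))
```

Here `doc-7` was found only by the keyword branch, survived fusion, and was then dropped by the reranker, which is the kind of question final-result logs alone cannot answer.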

Implementation Example: Weaviate

Weaviate is a useful implementation example because it supports vector indexes, inverted indexes for keyword and filter behavior, hybrid queries, metadata filters, alpha weighting, fusion strategies, and score metadata.

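import weaviate
from weaviate.classes.query import Filter, HybridFusion, MetadataQuery

# Assumption: a local Weaviate instance; use the connection helper that fits your deployment.
client = weaviate.connect_to_local()

try:
    # Assumption: the collection has a vectorizer configured, so the query text can be embedded.
    collection = client.collections.use("KnowledgeChunks")

    response = collection.query.hybrid(
        query="metadata filtering architecture for RAG",
        alpha=0.5,  # 0 = keyword only, 1 = vector only
        fusion_type=HybridFusion.RELATIVE_SCORE,
        # "^N" boosts a property's weight in the keyword (BM25) branch.
        query_properties=["title^2", "chunk_text", "technical_terms^3"],
        limit=20,
        return_metadata=MetadataQuery(score=True, explain_score=True),
        # Only objects matching both conditions are eligible.
        filters=(
            Filter.by_property("status").equal("published") &
            Filter.by_property("source_type").equal("knowledge_base")
        ),
    )

    for obj in response.objects:
        print(obj.properties)
        print(obj.metadata.score)          # fused hybrid score
        print(obj.metadata.explain_score)  # breakdown of how the score was built
finally:
    client.close()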

This example shows the architectural pieces working together: keyword-searchable fields, vector retrieval, filters, score fusion, weighting, and score inspection. A production RAG system could then pass the top results to a reranker or context-packing step.

Common Architecture Mistakes

  • Indexing text for vectors but forgetting keyword-searchable fields.
  • Applying filters to only one retrieval branch.
  • Retrieving too few candidates before fusion.
  • Using one alpha or weighting setting for every query type without evaluation.
  • Adding reranking before verifying candidate recall.
  • Logging final results but not branch-level retrieval behavior.
  • Assuming hybrid search fixes poor chunking or stale indexes.

Best Practices

  1. Design ingestion so each object supports both keyword and vector retrieval.
  2. Keep exact-term fields searchable and boostable where possible.
  3. Use metadata filters as shared eligibility constraints across retrieval paths.
  4. Separate candidate generation, fusion, reranking, and context packing as clear stages.
  5. Track branch-level scores so failures can be diagnosed.
  6. Tune keyword/vector balance with real query groups, not single examples.
  7. Evaluate retrieval quality before evaluating generated answers.
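
Practices 6 and 7 can be made concrete with a small retrieval metric computed per query group. The sketch below uses recall at k as one common choice; the group names, document IDs, and relevance labels are invented for illustration.

```python
def recall_at_k(retrieved: list, relevant: list, k: int) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def mean_recall_by_group(runs: list, k: int) -> dict:
    """Average recall at k for each query group."""
    per_group = {}
    for run in runs:
        value = recall_at_k(run["retrieved"], run["relevant"], k)
        per_group.setdefault(run["group"], []).append(value)
    return {group: sum(vals) / len(vals) for group, vals in per_group.items()}


# Each run pairs one query's retrieved ranking with its labeled relevant docs.
runs = [
    {"group": "error_codes", "retrieved": ["a", "b", "c"], "relevant": ["a"]},
    {"group": "error_codes", "retrieved": ["x", "y", "a"], "relevant": ["a"]},
    {"group": "how_to", "retrieved": ["m", "n", "o"], "relevant": ["n", "o"]},
]
print(mean_recall_by_group(runs, k=2))
```

Running the same evaluation for several weighting settings shows whether a change that helps one query group hurts another.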

Summary

Hybrid search architecture combines keyword and vector retrieval by giving each retrieval method its own index and candidate path, then joining the signals through filters, score fusion, and optional reranking. Keyword retrieval protects exact terms. Vector retrieval captures semantic similarity. Fusion turns both into a single ranking.

The strongest architecture treats hybrid search as a pipeline, not a switch. Ingestion, indexing, filters, query planning, fusion, observability, and reranking all need to be designed together for reliable semantic search and RAG.