Dual-Index Migration Pattern for Embedding Model Changes

The dual-index migration pattern is a safe way to change embedding models without taking search offline. Instead of replacing vectors in place, you keep the current production index running while building a second index with the new embedding model. After backfill, evaluation, and shadow testing, traffic is switched to the new index with a rollback path still available.

This pattern is useful because embedding model changes are not ordinary configuration changes. A new model creates a new vector space. Queries embedded by the new model should search documents embedded by the same model, not a partially updated mix of old and new vectors.
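
One way to make this rule enforceable is to tag every index and every query embedder with the vector space it belongs to, and refuse mismatched searches. The sketch below is illustrative only: the `VectorSpace` class, the `assert_same_space` helper, and the model names are hypothetical and not part of any particular vector database. Note that two models with the same dimension still produce incompatible spaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorSpace:
    """Identifies the embedding space an index or query vector belongs to."""
    model_name: str  # hypothetical label such as "embed-v1"
    dimension: int


def assert_same_space(query_space: VectorSpace, index_space: VectorSpace) -> None:
    """Raise if a query would be compared against vectors from another model."""
    if query_space != index_space:
        raise ValueError(
            f"query embedded with {query_space} cannot search "
            f"an index built with {index_space}"
        )


old_space = VectorSpace("embed-v1", 768)
new_space = VectorSpace("embed-v2", 768)  # same dimension, different space

assert_same_space(old_space, old_space)  # same model on both sides: allowed
try:
    assert_same_space(new_space, old_space)
except ValueError as exc:
    print(f"rejected: {exc}")
```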

The Basic Idea

A dual-index migration has two live search targets for a period of time:

  • Index A: the current production index using the old embedding model.
  • Index B: the new candidate index using the new embedding model.

Index A keeps serving users. Index B is populated in the background, tested, and compared. When Index B is ready, production routing moves from A to B. Index A stays available for rollback until the migration is stable.

When to Use This Pattern

Use the dual-index pattern when a change affects the vector space or retrieval behavior enough that in-place updates are risky.

Good candidates include:

  • Changing embedding model provider or model version.
  • Changing vector dimension or distance metric.
  • Changing vectorizer configuration that cannot be mutated safely.
  • Changing chunking strategy at the same time as embeddings.
  • Moving from generic embeddings to domain-specific embeddings.
  • Rebuilding a production RAG index with a new retrieval strategy.

For small prototypes, an in-place rebuild may be acceptable. For production search or RAG, a separate candidate index is usually safer.

Architecture Components

A dual-index migration needs a few explicit components:

  • Stable route: an alias, feature flag, routing table, or config value used by the application.
  • Old index: the current production target.
  • New index: the candidate target built with the new model.
  • Backfill job: the process that copies source records and generates new embeddings.
  • Change sync: a way to keep new writes, updates, and deletes aligned during migration.
  • Evaluation set: representative queries and expected results.
  • Rollback plan: a tested path back to the old index.

The stable route is important. Application code should not need to know whether the old or new physical index is serving traffic.
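
A minimal sketch of a stable route, assuming the application calls one router object rather than a physical index. The `SearchRouter` class, the stub backends, and the index names are hypothetical stand-ins for whatever alias, flag, or config mechanism a real system uses.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str, int], List[str]]


class SearchRouter:
    """Sends every query to whichever physical index is currently active."""

    def __init__(self, backends: Dict[str, SearchFn], active: str) -> None:
        if active not in backends:
            raise KeyError(f"unknown index: {active}")
        self._backends = backends
        self.active = active

    def search(self, query: str, top_k: int = 3) -> List[str]:
        # Callers never name a physical index; they only call search().
        return self._backends[self.active](query, top_k)


def make_stub_index(name: str) -> SearchFn:
    """Stand-in for a real vector search call; returns fake result IDs."""
    def search(query: str, top_k: int) -> List[str]:
        return [f"{name}:{query}:{rank}" for rank in range(top_k)]
    return search


router = SearchRouter(
    backends={
        "index_a": make_stub_index("index_a"),
        "index_b": make_stub_index("index_b"),
    },
    active="index_a",
)
print(router.search("reset password", top_k=2))
```

Because callers only see `router.search`, changing `router.active` is the only thing that has to happen at cutover or rollback.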

Step 1: Freeze the Current Production Definition

Before creating the new index, document the current production setup. Record the embedding model, vector dimension, distance metric, chunking strategy, metadata schema, filters, hybrid search settings, reranker settings, and top-k values.

This snapshot gives you a baseline. If search quality changes, you need to know exactly what the old system did.
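
The snapshot can be as simple as a frozen record stored next to the migration plan. The sketch below assumes a hypothetical `IndexDefinition` with a subset of the settings listed above; all field values are made-up examples. The `changed_fields` helper shows which settings differ between the baseline and the candidate.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class IndexDefinition:
    embedding_model: str
    vector_dimension: int
    distance_metric: str
    chunking: str
    metadata_fields: Tuple[str, ...]
    hybrid_alpha: Optional[float]  # None when hybrid search is disabled
    reranker: Optional[str]        # None when no reranker is used
    top_k: int


def changed_fields(old: IndexDefinition, new: IndexDefinition) -> Dict[str, tuple]:
    """Return {field: (old_value, new_value)} for every setting that differs."""
    old_values, new_values = asdict(old), asdict(new)
    return {
        name: (old_values[name], new_values[name])
        for name in old_values
        if old_values[name] != new_values[name]
    }


baseline = IndexDefinition(
    embedding_model="embed-v1",
    vector_dimension=768,
    distance_metric="cosine",
    chunking="512-token windows, 64-token overlap",
    metadata_fields=("source_id", "tenant", "acl"),
    hybrid_alpha=0.5,
    reranker=None,
    top_k=10,
)
candidate = replace(baseline, embedding_model="embed-v2", vector_dimension=1024)

# Persist the baseline so it can be reviewed later.
snapshot_json = json.dumps(asdict(baseline), indent=2, sort_keys=True)
print(changed_fields(baseline, candidate))
```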

Step 2: Create the New Index

Create a second index, collection, namespace, or vector space for the new embedding model. Keep the old index untouched.

The new index should receive the same source IDs, chunk IDs, metadata, permissions, and source references where possible. If the chunking strategy changes, store a mapping back to the parent source document so results remain traceable.
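
As a rough illustration of traceable chunks, the sketch below derives each chunk ID from the index generation, the source ID, and the chunk position, and keeps the parent `source_id` on every chunk. The fixed-size character chunker and the ID format are assumptions made for the example, not a recommended chunking strategy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    chunk_id: str   # unique within one index generation
    source_id: str  # parent document, stable across generations
    position: int   # order of the chunk inside the parent
    text: str


def chunk_document(source_id: str, text: str, size: int, generation: str) -> List[Chunk]:
    """Split text into fixed-size character windows (a deliberately naive chunker)."""
    return [
        Chunk(f"{generation}:{source_id}:{position}", source_id, position, text[start:start + size])
        for position, start in enumerate(range(0, len(text), size))
    ]


chunks = chunk_document("doc-1", "abcdefghij", size=4, generation="index_b")
for chunk in chunks:
    print(chunk.chunk_id, "->", chunk.source_id, repr(chunk.text))
```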

In systems such as Weaviate, one production-friendly option is to create a new collection and later switch a collection alias to point to it. Another option for experimentation is to use named vectors, where the same object can hold more than one vector representation. The right choice depends on cleanup needs, storage cost, and rollback requirements.

Step 3: Backfill Existing Content

Backfill copies existing source content into the new index and generates embeddings with the new model. This job should be restartable, observable, and rate-limited.

Track:

  • Total source documents and chunks expected.
  • Objects successfully indexed.
  • Failed objects and retry counts.
  • Embedding provider errors and rate limits.
  • Indexing throughput and estimated completion time.
  • Metadata and permission-copy failures.

Do not cut over while the new index is partially populated unless the application is deliberately scoped to a completed subset.
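
A minimal sketch of a restartable backfill loop, assuming the checkpoint is a set of completed IDs plus per-record failure counts. `BackfillState`, `run_backfill`, and the fake embedder are hypothetical; a real job would persist the state durably and add rate limiting, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class BackfillState:
    done: Set[str] = field(default_factory=set)
    failures: Dict[str, int] = field(default_factory=dict)  # record ID -> failed attempts


def run_backfill(
    records: Iterable[Tuple[str, str]],
    embed: Callable[[str], List[float]],
    write: Callable[[str, List[float]], None],
    state: BackfillState,
    max_attempts: int = 3,
) -> BackfillState:
    """Embed and index every record not yet done; safe to call repeatedly."""
    for record_id, text in records:
        if record_id in state.done:
            continue  # already indexed in an earlier run
        if state.failures.get(record_id, 0) >= max_attempts:
            continue  # left for manual inspection
        try:
            write(record_id, embed(text))
        except Exception:
            state.failures[record_id] = state.failures.get(record_id, 0) + 1
        else:
            state.done.add(record_id)
            state.failures.pop(record_id, None)
    return state


failed_once: Set[str] = set()


def fake_embed(text: str) -> List[float]:
    """Simulated embedder that fails the first time it sees "beta"."""
    if text == "beta" and text not in failed_once:
        failed_once.add(text)
        raise RuntimeError("simulated rate limit")
    return [float(len(text))]


source = [("doc-1", "alpha"), ("doc-2", "beta"), ("doc-3", "gamma")]
new_index: Dict[str, List[float]] = {}
state = BackfillState()

run_backfill(source, fake_embed, new_index.__setitem__, state)
print(f"{len(state.done)}/{len(source)} indexed, failures: {state.failures}")
run_backfill(source, fake_embed, new_index.__setitem__, state)  # resume
print(f"{len(state.done)}/{len(source)} indexed, failures: {state.failures}")
```

Comparing `len(state.done)` with the expected total gives a simple completeness check to run before cutover.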

Step 4: Keep New Changes in Sync

If source content changes while the backfill is running, the new index can fall behind. The migration needs a change sync plan.

Common approaches include:

  • Double-write: new creates, updates, and deletes are written to both indexes.
  • Change log replay: all changes are recorded and replayed into the new index.
  • Incremental sync: a job repeatedly indexes records changed since a checkpoint.
  • Write pause: acceptable only for small systems or scheduled maintenance windows.

Double-writing is often simplest conceptually, but it must be monitored. A failed write to the new index should not silently create divergence.

Step 5: Evaluate Before Shadowing

Before using live query traffic, run offline evaluation. Compare the old and new indexes on a representative query set.

Measure retrieval quality with metrics such as precision@k, recall@k, mean reciprocal rank (MRR), and nDCG@k. For RAG, check whether the retrieved chunks contain enough evidence for grounded answers.

Segment results by query type. The new model may improve broad semantic queries while hurting exact identifiers, product names, legal terms, code symbols, or support error messages.
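
For reference, the metrics named above can be computed as follows. This is a sketch that assumes binary relevance labels (a document is either relevant or not); the document IDs and rankings are made-up examples. Running the same functions per query segment gives the segmented comparison.

```python
import math
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


relevant = {"d1", "d2"}
old_ranking = ["d3", "d1", "d7", "d2"]  # results from Index A
new_ranking = ["d1", "d2", "d3", "d7"]  # results from Index B

for name, ranking in (("old", old_ranking), ("new", new_ranking)):
    print(
        name,
        round(precision_at_k(ranking, relevant, 3), 3),
        round(recall_at_k(ranking, relevant, 3), 3),
        round(reciprocal_rank(ranking, relevant), 3),
        round(ndcg_at_k(ranking, relevant, 3), 3),
    )

# MRR is the mean of reciprocal ranks over all evaluation queries.
mrr_old = mean([reciprocal_rank(old_ranking, relevant)])
```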

Step 6: Shadow Production Queries

Shadow testing means production users still receive results from the old index, while the same queries are also sent to the new index in the background. The application logs both result sets for comparison.

Shadow testing helps catch real-world query patterns that offline evaluation missed. It can reveal latency spikes, filter mismatches, missing permissions, ranking shifts, and RAG context changes.

During shadowing, do not merge results from both indexes unless that is an intentional product design. The purpose is comparison, not user-facing blending.
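
A simplified sketch of a shadow query path: the user always receives the old index's results, and the new index's results are only logged. The function names and the Jaccard-style overlap score are assumptions for illustration. For simplicity the shadow call here is synchronous; a production system would more likely run it asynchronously so it cannot add user-facing latency.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str, int], List[str]]


def overlap_at_k(old: List[str], new: List[str], k: int) -> float:
    """Jaccard overlap of the two top-k result sets (1.0 means identical sets)."""
    top_old, top_new = set(old[:k]), set(new[:k])
    if not top_old and not top_new:
        return 1.0
    return len(top_old & top_new) / len(top_old | top_new)


def shadow_search(
    query: str,
    old_search: SearchFn,
    new_search: SearchFn,
    log: List[Dict[str, object]],
    k: int = 3,
) -> List[str]:
    """Serve results from the old index; log the new index's results for comparison."""
    served = old_search(query, k)
    entry: Dict[str, object] = {"query": query, "old": served}
    try:
        candidate = new_search(query, k)
        entry["new"] = candidate
        entry["overlap"] = overlap_at_k(served, candidate, k)
    except Exception as exc:
        # A failing shadow index must never affect the user.
        entry["shadow_error"] = repr(exc)
    log.append(entry)
    return served  # results are never blended


comparison_log: List[Dict[str, object]] = []
results = shadow_search(
    "refund policy",
    old_search=lambda q, k: ["a", "b", "c"][:k],
    new_search=lambda q, k: ["b", "c", "d"][:k],
    log=comparison_log,
)
print(results, comparison_log[0]["overlap"])
```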

Step 7: Cut Over With a Small Routing Change

The cutover should be a small routing update. Examples include switching an alias, changing a feature flag, updating a search target in configuration, or promoting an index generation in a control table.

In Weaviate, collection aliases are one implementation of this idea: application code queries a stable alias, and the alias target can move from the old collection to the new collection. If a problem appears, the alias can be switched back.

Avoid cutovers that require many application code changes. The more places you touch, the harder rollback becomes.
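
To illustrate how small the routing change can be, here is a sketch of the control-table variant: promoting a generation is one write, and rolling back is one write in the other direction. The in-memory `GenerationTable` is a hypothetical stand-in for a real table, alias API, or feature flag service.

```python
from typing import List


class GenerationTable:
    """In-memory stand-in for a control table of index generations."""

    def __init__(self, initial: str) -> None:
        self._history: List[str] = [initial]

    @property
    def active(self) -> str:
        return self._history[-1]

    def promote(self, generation: str) -> None:
        """Make the given generation the one that serves traffic."""
        if generation != self.active:
            self._history.append(generation)

    def rollback(self) -> str:
        """Return to the previously active generation."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier generation to roll back to")
        self._history.pop()
        return self.active


table = GenerationTable("index_a")
table.promote("index_b")   # cutover
print("serving:", table.active)
table.rollback()           # same mechanism in reverse
print("serving:", table.active)
```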

Step 8: Monitor and Keep Rollback

After cutover, monitor both technical and relevance signals:

  • Query latency and error rate.
  • Empty-result rate.
  • Top result click-through or success signals.
  • User feedback and support complaints.
  • RAG citation quality and answer grounding.
  • Exact-match query behavior.
  • Filter and permission correctness.

Keep the old index available until the rollback window closes. Deleting it immediately removes the easiest recovery path.
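
A simple way to turn these signals into a rollback decision is to compare post-cutover metrics against the pre-cutover baseline. The sketch below is an assumption-laden example: the metric names, values, and thresholds are invented, and it only handles metrics where lower is better.

```python
from typing import Dict, List


def regressed_metrics(
    baseline: Dict[str, float],
    current: Dict[str, float],
    max_relative_increase: Dict[str, float],
) -> List[str]:
    """Return the metrics whose relative increase exceeds the allowed limit.

    Assumes every metric is "lower is better" (latency, error rate, empty results).
    """
    flagged: List[str] = []
    for metric, limit in max_relative_increase.items():
        before, after = baseline[metric], current[metric]
        if before == 0:
            worse = after > 0  # any increase from zero counts as a regression
        else:
            worse = (after - before) / before > limit
        if worse:
            flagged.append(metric)
    return flagged


baseline = {"p95_latency_ms": 120.0, "error_rate": 0.0, "empty_result_rate": 0.02}
after_cutover = {"p95_latency_ms": 130.0, "error_rate": 0.0, "empty_result_rate": 0.05}
limits = {"p95_latency_ms": 0.25, "error_rate": 0.10, "empty_result_rate": 0.50}

flagged = regressed_metrics(baseline, after_cutover, limits)
print("consider rollback:" if flagged else "within limits:", flagged)
```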

Rollback Plan

A rollback should be fast and already tested. Ideally, it is the same mechanism as cutover in reverse: point the alias, route, or feature flag back to Index A.

Before migration day, confirm:

  • The old index is still complete and queryable.
  • The application can route back without redeploying large code changes.
  • Writes during the migration can be reconciled if needed.
  • Monitoring can confirm rollback success.

If rollback requires re-embedding the entire corpus again, it is not a practical rollback plan.

RAG-Specific Concerns

For RAG systems, dual-index migration changes the evidence sent to the language model. The new index may retrieve different chunks even when the final answer appears similar.

Evaluate whether the new index retrieves directly quotable evidence, preserves citations, respects permissions, and avoids semantically related but unsupported context. Log index generation with every answer so you can trace which retrieval generation produced a response.
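
Logging the index generation can be as light as one extra field on each answer record. The sketch below assumes JSON-lines logging; the `AnswerTrace` fields and example values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AnswerTrace:
    request_id: str
    index_generation: str  # which index produced the retrieved context
    query: str
    chunk_ids: List[str]   # evidence passed to the language model
    answer: str


def to_log_line(trace: AnswerTrace) -> str:
    """Serialize one trace as a single JSON line."""
    return json.dumps(asdict(trace), sort_keys=True)


def traces_for_generation(lines: Iterable[str], generation: str) -> List[Dict]:
    """Filter logged traces down to one index generation."""
    return [
        record
        for record in map(json.loads, lines)
        if record["index_generation"] == generation
    ]


log_lines = [
    to_log_line(AnswerTrace("req-1", "index_a", "refund window?", ["a:12:0"], "30 days")),
    to_log_line(AnswerTrace("req-2", "index_b", "refund window?", ["b:12:1"], "30 days")),
]
for record in traces_for_generation(log_lines, "index_b"):
    print(record["request_id"], record["chunk_ids"])
```

In this example both answers read the same, but the traces show they were grounded in different chunks.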

Dual Index vs Named Vectors

Named vectors can support multiple vector spaces on the same object. This is useful for experiments and side-by-side comparisons. However, separate indexes or collections are often cleaner for production migrations because they isolate configuration, simplify rollback, and make cleanup easier.

A practical rule: use named vectors when you need temporary comparison or permanent multi-vector retrieval. Use a dual physical index or alias-based migration when you need a clean production replacement.

Common Mistakes

The first mistake is skipping change sync. If the old index receives updates while the new index is backfilling, the two targets diverge.

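The second mistake is shadowing before the new index is complete enough to compare fairly. Results from a partially populated index understate its quality because relevant documents may simply be missing.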

The third mistake is treating cutover as the end of the migration. The rollback window and monitoring period are part of the migration.

The fourth mistake is deleting the old index before validating RAG answer quality, exact-match behavior, and permissions.

Practical Summary

The dual-index migration pattern keeps the old embedding index in production while a new embedding index is built, synced, evaluated, shadowed, and promoted. It avoids mixing incompatible vector spaces and gives the team a fast rollback path.

For embedding model changes in production vector search or RAG systems, this pattern is one of the safer ways to migrate: build beside production, compare honestly, cut over with a small route change, and clean up only after the new index has proven itself.