Vector Database Metadata Filtering Architecture Explained

Metadata filtering architecture decides how a vector search system combines semantic similarity with structured constraints. A query may ask for content similar to “pricing strategy,” but the system may also need to enforce region = "APAC", role = "manager", a date range, a tenant boundary, or a permission label.

This guide explains metadata filtering from beginner to advanced levels. It uses Weaviate as the concrete technology example because its filtering model shows the architecture clearly, but the concepts apply broadly to production vector search, semantic search, hybrid search, and RAG systems.

Beginner: What Is Metadata Filtering?

What Is a Filter?

A filter is a condition that narrows search results. Instead of considering every object, the system returns only objects that match specific criteria, such as region = "EMEA", role = "admin", or status = "published".

In vector search, filters are used alongside semantic similarity. The result should be similar to the query and also match the structured conditions.

What Is Metadata?

Metadata is data about your data. In a vector database, metadata usually includes object properties and system fields that describe, classify, or control access to an object. Common kinds include:

Object properties, such as product, region, role, category, or status
System metadata, such as creation time, update time, and object ID
Array states, such as whether a tag list is empty or how many values it contains

Why Combine Filters with Vector Search?

A pure vector search finds the most semantically similar objects, but real applications usually need structured constraints as well. For example:

Find documents similar to “pricing strategy,” but only for region = "APAC" and role = "manager".

Filters let the system enforce these requirements during retrieval. That is essential for permission-aware RAG, product search, multi-tenant applications, and enterprise search.
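
A minimal sketch of the idea, assuming a toy in-memory corpus. The `DOCS` list, the two-dimensional vectors, and the `filtered_search` helper are hypothetical and exist only to show how a similarity ranking and structured conditions combine. A real vector database would use indexes instead of scanning every object.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical corpus: each object carries a vector plus metadata properties.
DOCS = [
    {"id": 1, "vector": [0.9, 0.1], "region": "APAC", "role": "manager"},
    {"id": 2, "vector": [0.8, 0.2], "region": "EMEA", "role": "manager"},
    {"id": 3, "vector": [0.1, 0.9], "region": "APAC", "role": "manager"},
    {"id": 4, "vector": [0.95, 0.05], "region": "APAC", "role": "analyst"},
]

def filtered_search(query_vector, conditions, limit=10):
    """Return IDs of objects that satisfy every condition, most similar first."""
    eligible = [
        doc for doc in DOCS
        if all(doc.get(prop) == value for prop, value in conditions.items())
    ]
    ranked = sorted(
        eligible,
        key=lambda doc: cosine_similarity(query_vector, doc["vector"]),
        reverse=True,
    )
    return [doc["id"] for doc in ranked[:limit]]

print(filtered_search([1.0, 0.0], {"region": "APAC", "role": "manager"}))
```

Object 4 is the closest vector to the query, but it is excluded because its role does not match. Only objects 1 and 3 are eligible, and they are returned in similarity order.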

Intermediate: Post-Filtering vs Pre-Filtering

There are two common ways to combine filters with vector search: post-filtering and pre-filtering.

Approach	How it works	Problem or benefit
Post-filtering	Run vector search first, then remove non-matching results.	Restrictive filters can remove many top results and produce unstable or empty result sets.
Pre-filtering	Build an eligible object list first, then run vector search within that list.	Structured constraints shape retrieval before final results are selected.

Weaviate uses pre-filtering. This means the filter is resolved before vector search returns final results. Architecturally, this is important because filtering is not just a cleanup step after search. It participates in retrieval.
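
The difference between the two approaches can be sketched with a toy example. The `OBJECTS` data and the helper names below are hypothetical, and the exhaustive `nearest` function stands in for a real vector index. The point is only to show why a restrictive filter can leave post-filtering with too few results.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# (id, vector, region). The three objects closest to the query are all EMEA.
OBJECTS = [
    (1, [1.0, 0.0], "EMEA"),
    (2, [0.9, 0.1], "EMEA"),
    (3, [0.8, 0.2], "EMEA"),
    (4, [0.5, 0.5], "APAC"),
    (5, [0.2, 0.8], "APAC"),
]

def nearest(query, objects, k):
    """Exhaustive k-nearest search; a stand-in for a vector index."""
    return sorted(objects, key=lambda obj: squared_distance(query, obj[1]))[:k]

def post_filter(query, region, k):
    # Search first, then drop non-matching hits. May return fewer than k.
    return [obj[0] for obj in nearest(query, OBJECTS, k) if obj[2] == region]

def pre_filter(query, region, k):
    # Resolve the filter first, then search only among eligible objects.
    eligible = [obj for obj in OBJECTS if obj[2] == region]
    return [obj[0] for obj in nearest(query, eligible, k)]

query = [1.0, 0.0]
print("post-filter:", post_filter(query, "APAC", k=2))
print("pre-filter: ", pre_filter(query, "APAC", k=2))
```

With k = 2, post-filtering returns nothing because both top hits are EMEA objects, while pre-filtering returns the two closest APAC objects.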

Intermediate: The Two-Index Architecture

A filter-aware vector search system needs two kinds of lookup working together: one for structured filtering and one for vector similarity.

┌─────────────────────────────────────┐
│        Vector Database Shard        │
│                                     │
│  ┌──────────────┐  ┌─────────────┐  │
│  │ Inverted     │  │ HNSW Vector │  │
│  │ Index        │  │ Index       │  │
│  │ (Filtering)  │  │ (Search)    │  │
│  └──────┬───────┘  └──────┬──────┘  │
│         │                 │         │
│         └─── allow-list ──┘         │
└─────────────────────────────────────┘

In Weaviate, each shard contains an inverted index and an HNSW vector index side by side.

The inverted index resolves the metadata predicate and builds an allow-list of matching object IDs.
The HNSW index performs vector search while using that allow-list to constrain which objects can be returned.

Objects outside the allow-list are not eligible as final results, so the system does not need to discard non-matching hits after the search has finished.
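
The flow above can be sketched as a small in-memory model. `ToyShard` is a hypothetical class, not Weaviate's implementation: a dictionary of ID sets stands in for the inverted index, and an exhaustive scan over the allow-list stands in for the HNSW traversal. It supports equality conditions only.

```python
from collections import defaultdict

class ToyShard:
    """Hypothetical stand-in for a shard with two cooperating lookups."""

    def __init__(self):
        self.vectors = {}                 # object ID -> vector
        self.inverted = defaultdict(set)  # (property, value) -> object IDs

    def add(self, obj_id, vector, properties):
        self.vectors[obj_id] = vector
        for prop, value in properties.items():
            self.inverted[(prop, value)].add(obj_id)

    def allow_list(self, conditions):
        """AND together the ID sets of every equality condition."""
        id_sets = [self.inverted.get(item, set()) for item in conditions.items()]
        return set.intersection(*id_sets) if id_sets else set(self.vectors)

    def search(self, query, conditions, k):
        allowed = self.allow_list(conditions)
        # An exhaustive scan over eligible IDs replaces the graph search here.
        ranked = sorted(
            allowed,
            key=lambda i: sum((a - b) ** 2 for a, b in zip(query, self.vectors[i])),
        )
        return ranked[:k]

shard = ToyShard()
shard.add(1, [0.9, 0.1], {"region": "APAC", "role": "manager"})
shard.add(5, [0.2, 0.8], {"region": "APAC", "role": "manager"})
shard.add(7, [1.0, 0.0], {"region": "EMEA", "role": "manager"})
shard.add(9, [0.7, 0.3], {"region": "APAC", "role": "manager"})

wanted = {"region": "APAC", "role": "manager"}
print(sorted(shard.allow_list(wanted)))
print(shard.search([1.0, 0.0], wanted, k=2))
```

Object 7 is the closest vector to the query, but it never reaches the result set because it is not in the allow-list.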

Intermediate-Advanced: Inverted Index Types

Metadata filters do not all have the same shape. Equality filters, range filters, and keyword search need different index behavior. In Weaviate, individual inverted indexes can be created per property and per index type.

Index type	Purpose	Default	Typical data types
`indexFilterable`	Fast match-based filtering using a roaring bitmap style index.	`true`	Most property types except unsupported binary or special types
`indexSearchable`	Keyword search for BM25 and hybrid retrieval.	`true` for text fields	`text`, `text[]`
`indexRangeFilters`	Numeric and date range filtering.	`false`	`int`, `number`, `date`

When Match and Range Indexes Are Both Enabled

If both match-oriented and range-oriented indexes are enabled, the database can route different operators to the better index structure.

Operator	Preferred index
Equal / not equal	`indexFilterable`
Greater than / less than (and their or-equal variants)	`indexRangeFilters`

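This matters because equality filters and range filters have different performance needs. An equality lookup resolves one value to a set of object IDs, while a range predicate can span many distinct values, so a structure tuned for one is not ideal for the other.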
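
As a rough illustration of operator routing, the sketch below keeps two structures for one numeric property: a value-to-IDs map for equality and a sorted list for ranges. `PriceIndexes` and its method names are hypothetical, the operator strings are only labels, and the sorted list is a simple stand-in rather than the structure Weaviate actually uses for range filtering.

```python
import bisect
from collections import defaultdict

class PriceIndexes:
    """Hypothetical pair of indexes for a single numeric property."""

    def __init__(self):
        self.by_value = defaultdict(set)  # match index: value -> object IDs
        self.sorted_pairs = []            # range index: sorted (value, ID) pairs

    def add(self, obj_id, value):
        self.by_value[value].add(obj_id)
        bisect.insort(self.sorted_pairs, (value, obj_id))

    def equal(self, value):
        # One dictionary lookup, independent of how many values exist.
        return set(self.by_value.get(value, set()))

    def less_than(self, value):
        # (value,) sorts before every (value, id), so the cut excludes equals.
        cut = bisect.bisect_left(self.sorted_pairs, (value,))
        return {obj_id for _, obj_id in self.sorted_pairs[:cut]}

    def greater_than(self, value):
        # (value, inf) sorts after every (value, id), so the cut excludes equals.
        cut = bisect.bisect_right(self.sorted_pairs, (value, float("inf")))
        return {obj_id for _, obj_id in self.sorted_pairs[cut:]}

    def resolve(self, operator, value):
        """Route each operator to the structure that suits it."""
        routes = {
            "Equal": self.equal,
            "LessThan": self.less_than,
            "GreaterThan": self.greater_than,
        }
        return routes[operator](value)

prices = PriceIndexes()
for obj_id, price in [(1, 20), (2, 50), (3, 50), (4, 80)]:
    prices.add(obj_id, price)

print(sorted(prices.resolve("Equal", 50)))
print(sorted(prices.resolve("LessThan", 50)))
print(sorted(prices.resolve("GreaterThan", 50)))
```

The equality path never touches the sorted list, and the range path answers with one binary search plus a slice instead of visiting every distinct value.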

Intermediate-Advanced: Special Metadata Indexes

Some metadata filters require optional index settings. These should be enabled only when the application needs them.

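import weaviate
from weaviate import classes as wvc

# Assumption: a Weaviate instance is running locally. Use the connection
# helper that matches your deployment. Later snippets reuse this `client`.
client = weaviate.connect_to_local()

client.collections.create(
    name="KnowledgeAsset",
    inverted_index_config=wvc.config.Configure.inverted_index(
        index_timestamps=True,       # Filter by creation/update time
        index_null_state=True,       # Filter by null/non-null properties
        index_property_length=True,  # Filter by array/string length
    ),
)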

These indexes add storage and indexing overhead. Use them when the query pattern is real: time filters, missing-value filters, or array/string length filters. Do not enable them only because they exist.

Advanced: Filter Strategies for HNSW Search

Filtered vector search is difficult when filters are restrictive. If only a small part of the graph matches the filter, the vector search algorithm may spend time around objects that cannot be returned.

Strategy 1: Sweeping

Graph traversal → check filter at each node → skip if not matching

In a sweeping strategy, graph traversal proceeds normally and candidates are checked against the filter as they are encountered. This can work, but it may waste effort when the filter has low correlation with the vector query.

For example, a query for “luxury handbags” with a filter like price < 50 is a low-correlation case. The most semantically similar objects are likely to be expensive, so the filter removes them, and a naive traversal can spend much of its time around objects that are not eligible.
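
The cost of sweeping under a low-correlation filter can be sketched on a toy graph. Everything here is an assumption made for illustration: a single-layer chain graph instead of a real HNSW structure, one-dimensional vectors, and a hypothetical `sweeping_search` function that counts distance computations spent on objects the filter rejects.

```python
import heapq

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sweeping_search(graph, vectors, query, entry, allowed, k):
    """Best-first traversal that scores every node and filters when collecting.

    Returns (result IDs, distance computations spent on non-matching nodes).
    """
    visited = {entry}
    frontier = [(distance(query, vectors[entry]), entry)]
    wasted = 0 if entry in allowed else 1
    results = []  # sorted list of (distance, node), at most k entries
    while frontier:
        dist, node = heapq.heappop(frontier)
        if len(results) == k and dist > results[-1][0]:
            break  # no remaining candidate can improve the result set
        if node in allowed:
            results = sorted(results + [(dist, node)])[:k]
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                if neighbor not in allowed:
                    wasted += 1  # scored for navigation, but can never be returned
                heapq.heappush(frontier, (distance(query, vectors[neighbor]), neighbor))
    return [node for _, node in results], wasted

# Chain 0-1-2-3-4-5. The query sits next to node 0, but only 4 and 5 match.
GRAPH = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
VECTORS = {i: [float(i)] for i in range(6)}

ids, wasted = sweeping_search(GRAPH, VECTORS, [0.0], entry=0, allowed={4, 5}, k=1)
print(ids, wasted)
```

The search returns node 4, but only after scoring nodes 0 through 3, none of which can appear in the results.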

Strategy 2: ACORN

ACORN stands for ANN Constraint-Optimized Retrieval Network. It is a filter-aware strategy for HNSW-based vector search.

ACORN improves filtered traversal in three important ways:

It avoids distance calculations on non-matching objects.
It uses multi-hop neighborhood evaluation to reach filtered regions faster.
It can seed additional matching entry points so traversal is not trapped around irrelevant graph areas.

In Weaviate, ACORN is the default filtered vector search strategy for new collections from v1.34. This is useful for restrictive filters and low-correlation filters where simple sweeping can waste work.
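
The first two ideas in the list above can be sketched on a toy graph. This is a heavily simplified illustration, not the ACORN algorithm as published and not Weaviate's implementation: the chain graph, the one-dimensional vectors, and the `acorn_style_search` function are all hypothetical, entry-point seeding is left out, and the sketch looks exactly two hops ahead, so matches hidden behind longer runs of non-matching nodes are not reached.

```python
import heapq

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def acorn_style_search(graph, vectors, query, entry, allowed, k):
    """Filter-aware best-first traversal.

    Only nodes in `allowed` are scored. A non-matching neighbor is used as a
    bridge to its own neighbors (a second hop) instead of being scored.
    Returns (result IDs, number of distance computations).
    """
    visited = set()
    frontier = []
    computed = 0

    def consider(node):
        nonlocal computed
        visited.add(node)
        computed += 1
        heapq.heappush(frontier, (distance(query, vectors[node]), node))

    def expand(node):
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            if neighbor in allowed:
                consider(neighbor)
                continue
            visited.add(neighbor)  # bridge node: no distance is computed for it
            for second_hop in graph[neighbor]:
                if second_hop not in visited and second_hop in allowed:
                    consider(second_hop)

    if entry in allowed:
        consider(entry)
    else:
        visited.add(entry)
        expand(entry)

    results = []  # sorted list of (distance, node), at most k entries
    while frontier:
        dist, node = heapq.heappop(frontier)
        if len(results) == k and dist > results[-1][0]:
            break
        results = sorted(results + [(dist, node)])[:k]
        expand(node)
    return [node for _, node in results], computed

# Chain 0-1-2-3-4-5-6 where only the even nodes 2, 4, and 6 match the filter.
CHAIN = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
POINTS = {i: [float(i)] for i in range(7)}

ids, computed = acorn_style_search(CHAIN, POINTS, [6.0], entry=0, allowed={2, 4, 6}, k=1)
print(ids, computed)
```

In this toy run the traversal reaches node 6 after three distance computations, one per matching node. A traversal that scored every visited node would compute seven on the same chain.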

Enabling ACORN Explicitly

from weaviate.classes.config import Configure, VectorFilterStrategy

client.collections.create(
    name="KnowledgeAsset",
    vector_index_config=Configure.VectorIndex.hnsw(
        filter_strategy=VectorFilterStrategy.ACORN,  # or VectorFilterStrategy.SWEEPING
    ),
)

Expert: Property-Level Index Configuration

At scale, not every property should have every index enabled. Property-level index configuration lets you decide which fields are filterable, searchable, range-filtered, or excluded from vectorization.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Products",
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            index_filterable=True,    # Fast match filtering
            index_searchable=True,    # BM25/hybrid search
        ),
        Property(
            name="price",
            data_type=DataType.NUMBER,
            index_range_filters=True, # Range queries: >, <, >=, <=
        ),
        Property(
            name="region",
            data_type=DataType.TEXT,
            index_filterable=True,
            index_searchable=False,   # No keyword search needed
        ),
        Property(
            name="internal_id",
            data_type=DataType.TEXT,
            index_filterable=False,   # Never filtered
            index_searchable=False,   # Never searched
            skip_vectorization=True,  # Exclude from vector
        ),
    ]
)

The rule of thumb is simple: if a property will never be queried, do not index it for filtering or keyword search. If it is internal metadata, do not let it pollute the vector. This saves storage, speeds imports, and keeps retrieval behavior cleaner.
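
One way to make the rule of thumb explicit is to derive the flags from declared query patterns. The sketch below is a hypothetical planning helper, not part of any client library. The `Usage` dataclass and `index_settings` function are illustrative names, and the returned keys simply mirror the `Property` keyword arguments used in the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Usage:
    """How the application actually queries one property (assumed known)."""
    equality_filter: bool = False  # e.g. region = "APAC"
    range_filter: bool = False     # e.g. price < 50 (int, number, or date)
    keyword_search: bool = False   # BM25 or hybrid search (text only)
    vectorized: bool = True        # should the value influence the vector?

def index_settings(usage):
    """Enable an index only when a real query pattern needs it."""
    return {
        "index_filterable": usage.equality_filter,
        "index_range_filters": usage.range_filter,
        "index_searchable": usage.keyword_search,
        "skip_vectorization": not usage.vectorized,
    }

plan = {
    "title": Usage(equality_filter=True, keyword_search=True),
    "price": Usage(range_filter=True),
    "region": Usage(equality_filter=True),
    "internal_id": Usage(vectorized=False),
}
for name, usage in plan.items():
    print(name, index_settings(usage))
```

Writing the query patterns down first makes it easier to spot properties that carry indexes nobody uses.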

Architecture at a Glance

User Query:
near_text("pricing strategy") + Filter(region=APAC, role=manager)
                          │
              ┌───────────▼────────────┐
              │   Inverted Index       │
              │   Builds allow-list:   │
              │   [id1, id5, id9 ...]  │
              └───────────┬────────────┘
                          │ allow-list passed in
              ┌───────────▼────────────┐
              │   HNSW Vector Index    │
              │   Filter-aware search  │
              │   over eligible IDs    │
              └───────────┬────────────┘
                          │
              ┌───────────▼────────────┐
              │   Final Results        │
              │   Ranked by distance   │
              └────────────────────────┘

Level	Key concept
Beginner	Filters narrow results by metadata conditions.
Intermediate	Pre-filtering builds an allow-list before vector search returns final results.
Intermediate-advanced	Different inverted index types serve different filter needs.
Advanced	Filter strategies like ACORN help with restrictive, low-correlation filters.
Expert	Per-property index tuning balances performance, storage, and retrieval quality.

Summary

Vector database metadata filtering architecture is about more than syntax. A good system uses structured indexes to build an eligible set of objects, then combines that set with vector retrieval so results are both semantically relevant and structurally valid.

Using Weaviate as an example, the architecture includes an inverted index for filtering, an HNSW index for vector search, optional range and metadata indexes, filter strategies such as ACORN, and property-level index configuration. Together, these pieces make metadata filtering part of retrieval execution rather than a cleanup step after search.