Metadata filtering architecture decides how a vector search system combines semantic similarity with structured constraints. A query may ask for content similar to “pricing strategy,” but the system may also need to enforce region = "APAC", role = "manager", a date range, a tenant boundary, or a permission label.
This guide explains metadata filtering from beginner to advanced levels. It uses Weaviate as the concrete technology example because its filtering model shows the architecture clearly, but the concepts apply broadly to production vector search, semantic search, hybrid search, and RAG systems.
Beginner: What Is Metadata Filtering?
What Is a Filter?
A filter is a condition that narrows search results. Instead of considering every object, the system returns only objects that match specific criteria, such as region = "EMEA", role = "admin", or status = "published".
In vector search, filters are used alongside semantic similarity. The result should be similar to the query and also match the structured conditions.
What Is Metadata?
Metadata is data about your data. In a vector database, metadata usually includes object properties and system fields that describe, classify, or control access to an object.
- Object properties, such as product, region, role, category, or status
- System metadata, such as creation time, update time, and object ID
- Array states, such as whether a tag list is empty or how many values it contains
Why Combine Filters with Vector Search?
A pure vector search finds the most semantically similar objects. But real applications usually need constraints.
Find documents similar to “pricing strategy,” but only for region = "APAC" and role = "manager".
Filters let the system enforce these requirements during retrieval. That is essential for permission-aware RAG, product search, multi-tenant applications, and enterprise search.
Intermediate: Post-Filtering vs Pre-Filtering
There are two common ways to combine filters with vector search: post-filtering and pre-filtering.
| Approach | How it works | Problem or benefit |
|---|---|---|
| Post-filtering | Run vector search first, then remove non-matching results. | Restrictive filters can remove many top results and produce unstable or empty result sets. |
| Pre-filtering | Build an eligible object list first, then run vector search within that list. | Structured constraints shape retrieval before final results are selected. |
Weaviate uses pre-filtering. This means the filter is resolved before vector search returns final results. Architecturally, this is important because filtering is not just a cleanup step after search. It participates in retrieval.
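The difference between the two approaches can be seen in a minimal sketch. It uses a made-up five-document corpus and a brute-force ranking in place of a real ANN index; the names `DOCS`, `post_filter`, and `pre_filter` are illustrative and are not part of Weaviate.

```python
import math

# Hypothetical toy corpus: (id, vector, metadata). All values are made up.
DOCS = [
    ("d1", (1.0, 0.0), {"region": "EMEA"}),
    ("d2", (0.9, 0.1), {"region": "EMEA"}),
    ("d3", (0.8, 0.2), {"region": "EMEA"}),
    ("d4", (0.1, 0.9), {"region": "APAC"}),
    ("d5", (0.0, 1.0), {"region": "APAC"}),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query, docs, k):
    """Brute-force nearest neighbours, standing in for a real ANN index."""
    ranked = sorted(docs, key=lambda d: cosine(query, d[1]), reverse=True)
    return ranked[:k]

def post_filter(query, k, predicate):
    # Search first, then drop non-matching hits: may return fewer than k.
    return [d[0] for d in top_k(query, DOCS, k) if predicate(d[2])]

def pre_filter(query, k, predicate):
    # Resolve the filter first, then search only within eligible objects.
    eligible = [d for d in DOCS if predicate(d[2])]
    return [d[0] for d in top_k(query, eligible, k)]

def is_apac(meta):
    return meta["region"] == "APAC"

query = (1.0, 0.0)
print(post_filter(query, 2, is_apac))  # [] because both top hits are EMEA
print(pre_filter(query, 2, is_apac))   # ['d4', 'd5']
```

With a restrictive filter, post-filtering returns nothing here even though two matching documents exist, while pre-filtering returns both.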
Intermediate: The Two-Index Architecture
A filter-aware vector search system needs two kinds of lookup working together: one for structured filtering and one for vector similarity.
┌─────────────────────────────────────┐
│        Vector Database Shard        │
│                                     │
│  ┌──────────────┐  ┌─────────────┐  │
│  │   Inverted   │  │ HNSW Vector │  │
│  │    Index     │  │    Index    │  │
│  │ (Filtering)  │  │  (Search)   │  │
│  └──────┬───────┘  └──────┬──────┘  │
│         │                 │         │
│         └── allow-list ───┘         │
└─────────────────────────────────────┘
In Weaviate, each shard contains an inverted index and an HNSW vector index side by side.
- The inverted index resolves the metadata predicate and builds an allow-list of matching object IDs.
- The HNSW index performs vector search while using that allow-list to constrain which objects can be returned.
Objects outside the allow-list are not eligible as final results. Because eligibility is decided before results are assembled, the system does not have to discard non-matching hits after retrieval.
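The allow-list hand-off can be sketched in a few lines. This is a toy model under stated assumptions, not Weaviate's implementation: the `InvertedIndex` class and `filtered_search` function are hypothetical, the postings are plain Python sets rather than roaring bitmaps, and the vector step is a brute-force ranking rather than an HNSW traversal.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: (property, value) -> set of object IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, obj_id, properties):
        for name, value in properties.items():
            self.postings[(name, value)].add(obj_id)

    def allow_list(self, **equals):
        """AND equality conditions together by intersecting posting sets."""
        if not equals:
            raise ValueError("at least one condition is required")
        sets = [self.postings.get((n, v), set()) for n, v in equals.items()]
        return set.intersection(*sets)

def filtered_search(query, vectors, allowed, k):
    """Rank only allow-listed IDs by squared Euclidean distance."""
    def dist(obj_id):
        return sum((a - b) ** 2 for a, b in zip(query, vectors[obj_id]))
    return sorted(allowed, key=dist)[:k]

# Made-up objects and vectors for illustration.
index = InvertedIndex()
index.add(1, {"region": "APAC", "role": "manager"})
index.add(2, {"region": "APAC", "role": "analyst"})
index.add(3, {"region": "EMEA", "role": "manager"})
index.add(4, {"region": "APAC", "role": "manager"})
vectors = {1: (0.0, 0.0), 2: (0.1, 0.1), 3: (0.2, 0.0), 4: (1.0, 1.0)}

allowed = index.allow_list(region="APAC", role="manager")
print(allowed)                                          # {1, 4}
print(filtered_search((0.1, 0.1), vectors, allowed, 2))  # [1, 4]
```

Object 2 is the closest vector to the query, but it never becomes a candidate because the inverted index excluded it before ranking.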
Intermediate-Advanced: Inverted Index Types
Metadata filters do not all have the same shape. Equality filters, range filters, and keyword search need different index behavior. In Weaviate, individual inverted indexes can be created per property and per index type.
| Index type | Purpose | Default | Typical data types |
|---|---|---|---|
| indexFilterable | Fast match-based filtering using a roaring bitmap style index. | true | Most property types except unsupported binary or special types |
| indexSearchable | Keyword search for BM25 and hybrid retrieval. | true for text fields | text, text[] |
| indexRangeFilters | Numeric and date range filtering. | false | int, number, date |
When Match and Range Indexes Are Both Enabled
If both match-oriented and range-oriented indexes are enabled, the database can route different operators to the better index structure.
| Operator | Preferred index |
|---|---|
| Equal / not equal | indexFilterable |
| Greater than / less than | indexRangeFilters |
This matters because equality filters and range filters have different performance needs.
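To illustrate why the two operator families favor different structures, here is a small sketch in which equality is answered by a hash lookup and range comparisons by binary search over a sorted list. The `PropertyIndexes` class and the operator names passed to `resolve` are assumptions made for this example; Weaviate's actual index structures are more sophisticated.

```python
import bisect
from collections import defaultdict

class PropertyIndexes:
    """Toy indexes for one numeric property (object IDs are assumed to be ints)."""

    def __init__(self):
        self.by_value = defaultdict(set)  # equality lookups (filterable-style)
        self.sorted_pairs = []            # (value, id) kept sorted (range-style)

    def add(self, obj_id, value):
        self.by_value[value].add(obj_id)
        bisect.insort(self.sorted_pairs, (value, obj_id))

    def resolve(self, operator, value):
        """Route the operator to the structure that answers it cheaply."""
        if operator == "Equal":
            return set(self.by_value.get(value, ()))
        if operator == "LessThan":
            # (value,) sorts before every (value, id) pair with the same value.
            cut = bisect.bisect_left(self.sorted_pairs, (value,))
            return {i for _, i in self.sorted_pairs[:cut]}
        if operator == "GreaterThan":
            # (value, inf) sorts after every (value, id) pair with the same value.
            cut = bisect.bisect_right(self.sorted_pairs, (value, float("inf")))
            return {i for _, i in self.sorted_pairs[cut:]}
        raise ValueError(f"unsupported operator: {operator}")

# Made-up prices for illustration.
price = PropertyIndexes()
for obj_id, value in [(1, 20.0), (2, 45.0), (3, 50.0), (4, 120.0)]:
    price.add(obj_id, value)

print(price.resolve("Equal", 50.0))        # {3}
print(price.resolve("LessThan", 50.0))     # {1, 2}
print(price.resolve("GreaterThan", 50.0))  # {4}
```

The equality path never touches the sorted list, and the range paths never enumerate individual values, which is the point of keeping both structures when both query shapes occur.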
Intermediate-Advanced: Special Metadata Indexes
Some metadata filters require optional index settings. These should be enabled only when the application needs them.
from weaviate import classes as wvc

# Assumes `client` is an already connected Weaviate Python client (v4).
client.collections.create(
    name="KnowledgeAsset",
    inverted_index_config=wvc.config.Configure.inverted_index(
        index_timestamps=True,       # Filter by creation/update time
        index_null_state=True,       # Filter by null/non-null properties
        index_property_length=True,  # Filter by array/string length
    ),
)
These indexes add storage and indexing overhead. Enable them only when the application actually issues the corresponding queries: time filters, missing-value filters, or array/string length filters. Do not enable them only because they exist.
Advanced: Filter Strategies for HNSW Search
Filtered vector search is difficult when filters are restrictive. If only a small part of the graph matches the filter, the vector search algorithm may spend time around objects that cannot be returned.
Strategy 1: Sweeping
Graph traversal → check filter at each node → skip if not matching
In a sweeping strategy, graph traversal proceeds normally and candidates are checked against the filter as they are encountered. This can work, but it may waste effort when the filter has low correlation with the vector query.
For example, a query for “luxury handbags” with a filter like price < 50 may produce a low-correlation situation. The most semantically similar objects may be expensive, while the filter removes them. A naive traversal can spend too much time around objects that are not eligible.
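The cost of a low-correlation filter can be made concrete with a toy traversal. The sketch below is an assumption-laden simplification: the graph is a simple chain rather than an HNSW graph, and `sweeping_search` is a hypothetical function that counts how many distance computations the traversal performs.

```python
import heapq

def sweeping_search(graph, vectors, query, entry, allowed, k):
    """Best-first graph search that checks the filter as nodes are visited.

    Returns (result_ids, number_of_distance_computations).
    """
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(vectors[i], query))

    visited = {entry}
    d0 = dist(entry)
    computed = 1
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    results = [(d0, entry)] if entry in allowed else []
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once k matches are known and the frontier is farther than all of them.
        if len(results) >= k and d > max(results)[0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)  # distance is computed even for non-matching nodes
            computed += 1
            heapq.heappush(candidates, (dn, nb))
            if nb in allowed:
                results = sorted(results + [(dn, nb)])[:k]
    return [i for _, i in results], computed

# Eight points on a line, each linked to its immediate neighbours (made-up data).
vectors = {i: (float(i),) for i in range(8)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}

# The query sits next to node 0, but only the far end of the chain matches.
ids, computed = sweeping_search(graph, vectors, (0.0,), 0, {6, 7}, k=1)
print(ids, computed)  # [6] 8
```

To return a single matching result, the traversal computes a distance for every node in the graph, because the region nearest the query contains no eligible objects.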
Strategy 2: ACORN
ACORN stands for ANN Constraint-Optimized Retrieval Network. It is a filter-aware strategy for HNSW-based vector search.
ACORN improves filtered traversal in three important ways:
- It avoids distance calculations on non-matching objects.
- It uses multi-hop neighborhood evaluation to reach filtered regions faster.
- It can seed additional matching entry points so traversal is not trapped around irrelevant graph areas.
In Weaviate, ACORN is the default filtered vector search strategy for new collections from v1.34. This is useful for restrictive filters and low-correlation filters where simple sweeping can waste work.
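The three ideas above can be sketched on a toy graph. This is not Weaviate's ACORN implementation: `acorn_like_search` is a hypothetical function, the graph is a simple chain, and the matching entry points are passed in by the caller instead of being chosen by the index.

```python
import heapq

def acorn_like_search(graph, vectors, query, seeds, allowed, k):
    """Filter-aware best-first search over matching nodes only.

    Returns (result_ids, number_of_distance_computations).
    """
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(vectors[i], query))

    scored = {}      # id -> distance, filled only for matching nodes
    candidates = []  # min-heap of matching nodes still to expand
    for s in seeds:  # seed traversal with matching entry points
        if s in allowed and s not in scored:
            scored[s] = dist(s)
            heapq.heappush(candidates, (scored[s], s))
    while candidates:
        d, node = heapq.heappop(candidates)
        best = sorted(scored.values())[:k]
        if len(best) >= k and d > best[-1]:
            break
        reachable = []
        for nb in graph[node]:
            if nb in allowed:
                reachable.append(nb)
            else:
                # Hop over a non-matching neighbour without scoring it.
                reachable.extend(n2 for n2 in graph[nb] if n2 in allowed)
        for nb in reachable:
            if nb not in scored:
                scored[nb] = dist(nb)
                heapq.heappush(candidates, (scored[nb], nb))
    ranked = sorted(scored, key=scored.get)[:k]
    return ranked, len(scored)

# Eight points on a line, each linked to its immediate neighbours (made-up data).
vectors = {i: (float(i),) for i in range(8)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}

ids, computed = acorn_like_search(graph, vectors, (0.0,), [7], {3, 5, 7}, k=1)
print(ids, computed)  # [3] 3
```

Every distance computation in this run is spent on a matching object, and the two-hop expansion lets the traversal move from node 7 to node 5 to node 3 even though the nodes between them fail the filter.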
Enabling ACORN Explicitly
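from weaviate.classes.config import Configure, VectorFilterStrategy

# Assumes `client` is an already connected Weaviate Python client (v4).
client.collections.create(
    name="KnowledgeAsset",
    vector_index_config=Configure.VectorIndex.hnsw(
        filter_strategy=VectorFilterStrategy.ACORN,  # or VectorFilterStrategy.SWEEPING
    ),
)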
Expert: Property-Level Index Configuration
At scale, not every property should have every index enabled. Property-level index configuration lets you decide which fields are filterable, searchable, range-filtered, or excluded from vectorization.
from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Products",
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            index_filterable=True,     # Fast match filtering
            index_searchable=True,     # BM25/hybrid search
        ),
        Property(
            name="price",
            data_type=DataType.NUMBER,
            index_range_filters=True,  # Range queries: >, <, >=, <=
        ),
        Property(
            name="region",
            data_type=DataType.TEXT,
            index_filterable=True,
            index_searchable=False,    # No keyword search needed
        ),
        Property(
            name="internal_id",
            data_type=DataType.TEXT,
            index_filterable=False,    # Never filtered
            index_searchable=False,    # Never searched
            skip_vectorization=True,   # Exclude from vector
        ),
    ],
)
The rule of thumb is simple: if a property will never be queried, do not index it for filtering or keyword search. If it is internal metadata, do not let it pollute the vector. This saves storage, speeds imports, and keeps retrieval behavior cleaner.
Architecture at a Glance
User Query:
near_text("pricing strategy") + Filter(region=APAC, role=manager)
            │
┌───────────▼────────────┐
│     Inverted Index     │
│   Builds allow-list:   │
│  [id1, id5, id9 ...]   │
└───────────┬────────────┘
            │ allow-list passed in
┌───────────▼────────────┐
│   HNSW Vector Index    │
│  Filter-aware search   │
│   over eligible IDs    │
└───────────┬────────────┘
            │
┌───────────▼────────────┐
│     Final Results      │
│   Ranked by distance   │
└────────────────────────┘
| Level | Key concept |
|---|---|
| Beginner | Filters narrow results by metadata conditions. |
| Intermediate | Pre-filtering builds an allow-list before vector search returns final results. |
| Intermediate-advanced | Different inverted index types serve different filter needs. |
| Advanced | Filter strategies like ACORN help with restrictive, low-correlation filters. |
| Expert | Per-property index tuning balances performance, storage, and retrieval quality. |
Summary
Vector database metadata filtering architecture is about more than syntax. A good system uses structured indexes to build an eligible set of objects, then combines that set with vector retrieval so results are both semantically relevant and structurally valid.
Using Weaviate as an example, the architecture includes an inverted index for filtering, an HNSW index for vector search, optional range and metadata indexes, filter strategies such as ACORN, and property-level index configuration. Together, these pieces make metadata filtering part of retrieval execution rather than a cleanup step after search.