SQL Undefined and Null Semantics in Search Filters

SQL-style null semantics can confuse search filters because search systems do not always treat missing fields, explicit nulls, empty strings, empty arrays, and undefined application values the same way a relational database treats NULL.

For RAG, semantic search, and document retrieval, this matters because filters decide which records enter the retrieval set. If null and undefined behavior is unclear, the search system may silently exclude relevant documents or include documents that should not match.

Short Answer

SQL NULL means the value is unknown or absent, and SQL comparisons with NULL use three-valued logic: true, false, and unknown.

Search filters often use a simpler model. A field may be present, missing, null, empty, or unindexed. Some search systems merge missing and null into one null state. Others distinguish them. Some treat empty strings or empty arrays as null-like values.

The safest rule is to define null semantics at ingestion time, index the states you plan to filter on, and avoid assuming that field != value automatically includes null or missing records.

SQL NULL Is Not a Normal Value

In SQL, NULL is not equal to anything, including another NULL.

This is why SQL uses special predicates:

WHERE department IS NULL
WHERE department IS NOT NULL

A condition such as this does not behave like many beginners expect:

WHERE department != 'sales'

Rows where department is NULL do not simply pass the filter. The comparison evaluates to unknown, not true.

Three-Valued Logic Explained

SQL filters evaluate expressions as true, false, or unknown.

Only true rows are returned. False rows are excluded. Unknown rows are also excluded unless the query explicitly asks for them with IS NULL or a similar condition.

This is useful in relational systems, but it can surprise teams when they move metadata filtering into document search or vector search.

Search Filters Often Use Different Semantics

Search systems are usually built around indexes. If a field is missing, there may be no posting list entry for that field. If a field is null, the system may need a separate null-state index to find it efficiently.

That means null filtering is not only a logic question. It is also an indexing question.

A search system may need to know ahead of time whether it should track null states, property length, timestamps, or other filterable metadata.

Undefined Is Usually an Application Concept

SQL has NULL, but it does not have JavaScript-style undefined.

In application code, undefined often means a field was not sent at all. By the time the object reaches a database or search index, that may become a missing property, a null value, or no indexed value.

Those are not always equivalent.

For example, these three documents may produce different filter behavior depending on the system:

{ "department": "legal" }
{ "department": null }
{ }

The first has a value. The second has an explicit null. The third has no property.

Common Search Filter States

Search filters usually need to account for these states:

Known value: the field exists and has a usable value.
Explicit null: the field exists but has no value.
Missing field: the document does not contain the field.
Empty string: the field exists as text but contains no characters.
Empty array: the field exists as a list but contains no elements.
Unindexed field: the value may exist in storage but cannot be used by the filter index.

The last case is easy to overlook. A value can exist in the document but still be unavailable for filtering if the index is not configured for that property or state.

Why Not-Equal Filters Are Dangerous

Not-equal filters are a common source of null bugs.

A developer may expect this filter to mean “everything except sales”:

department != "sales"

But what should happen to documents with no department?

There are several possible interpretations:

exclude them because the value is unknown
include them because they are not sales
return them only when the user explicitly asks for missing departments
treat missing department as a data-quality problem

None of these is universally correct. The application must decide.

Null Handling in RAG Filters

In RAG systems, filters often run before vector search or alongside hybrid search. They may filter by tenant, product, region, role, language, document type, publication status, or access level.

If null behavior is wrong, retrieval quality suffers.

A strict filter may hide relevant documents with incomplete metadata. A loose filter may retrieve documents from the wrong product, region, or permission boundary.

The right behavior depends on the field. Missing summary may be acceptable. Missing tenant_id should usually be treated as unsafe.

Use Explicit Predicates

When filtering on nullable fields, avoid relying on equality and inequality alone.

Use explicit predicates such as:

is null
is not null
exists
does not exist
length = 0
status = unknown
status = not_applicable

The exact operator names vary by database, but the design principle is the same: make absence explicit.

Use Status Fields When SQL Semantics Are Not Enough

SQL NULL says the value is absent or unknown. It does not explain why.

Search applications often need that reason.

{
  "renewal_date": null,
  "renewal_date_status": "unknown"
}

This is different from:

{
  "renewal_date": null,
  "renewal_date_status": "not_applicable"
}

Both records have a null date, but they should not behave the same in filters, dashboards, or data cleanup workflows.

How Empty Strings and Empty Arrays Complicate Things

Some systems treat empty strings and zero-length arrays as null-like values for specific filter operations. Others keep them separate.

For text fields, an empty string may mean the source sent a blank value. For array fields, an empty array may mean the extraction system found no tags, or it may mean extraction has not run yet.

Do not assume these states are interchangeable. Normalize them or store companion status fields.

Implementation Example With Weaviate

In Weaviate, filtering by null state requires null-state indexing to be enabled for the collection. This lets filters query whether a property is null.

from weaviate.classes.config import Configure
from weaviate.classes.query import Filter

inverted_index_config = Configure.inverted_index(
    index_null_state=True,
    index_property_length=True
)

is_missing_department = Filter.by_property("department").is_none(True)
has_department = Filter.by_property("department").is_none(False)
empty_tags = Filter.by_property("tags", length=True).equal(0)

The important lesson is not the specific API. It is that null-state and property-length filtering may require explicit index support.

Design Rules for Search Filters

Decide whether missing and explicit null mean the same thing.
Do not rely on not-equal filters to handle nulls correctly.
Index null state if users or workflows need to filter by null values.
Use property length or count fields for empty arrays.
Use status fields when absence has business meaning.
Use strict defaults for access-control metadata.
Test filters with known, null, missing, empty, and unindexed examples.

Example: Safer Product Documentation Filters

Suppose a search system filters documentation by product line.

A naive rule might include every document where product_line != deprecated. But documents with missing product lines may slip into or out of results depending on filter semantics.

A safer model is:

{
  "product_line": "analytics",
  "product_line_status": "known",
  "visibility": "public"
}

Then the retrieval filter can require product_line_status = known and visibility = public, while a data-quality workflow separately reviews documents where product_line_status = unknown.

Summary

SQL NULL, application-level undefined, missing document fields, empty strings, and empty arrays are related but not identical.

SQL uses three-valued logic, where comparisons involving NULL often evaluate to unknown. Search systems may instead depend on index configuration and their own null-state rules.

For production search and RAG systems, the best approach is to model absence explicitly, index the states you need to filter, and test each nullable field with realistic examples before relying on it in retrieval.