Resumable Query - Upstash Documentation

Resumable queries let you start a nearest-neighbor search and continue fetching additional results later without restarting the search. This is useful for pagination or when you want to stream results incrementally instead of loading the whole result set into memory. Quick example (sync):

from upstash_vector import Index

index = Index()

# start a resumable query (returns initial results and a handle)
results, handle = index.resumable_query(
    vector=[0.1, 0.2],
    top_k=2,
    include_metadata=True,
    include_vectors=True,
    namespace="example-namespace",
)

with handle:
    # `results` contains the first batch
    for r in results:
        print(r.id, r.metadata, getattr(r, "vector", None))

    # fetch more results (fetch_next returns a list)
    more = handle.fetch_next(3)
    for r in more:
        print(r.id)

    # when the context block exits the handle is stopped automatically

Parameters

vector (List[float] | SupportsToList | None) — query vector (mutually exclusive with data).
top_k (int, default 10) — how many top matches to return in the initial batch.
include_vectors (bool, default False) — include full vector values on results.
include_metadata (bool, default False) — include metadata on results.
filter (str, default "") — filter expression to narrow results by metadata.
data (str | None) — text query for indexes using Upstash-hosted embedding models.
namespace (str, default DEFAULT_NAMESPACE) — namespace to search in.
include_data (bool, default False) — include stored data field (used for embedding indexes).
max_idle (int, default 3600) — how long the server keeps the resumable query alive (seconds).
sparse_vector (SparseVector | TupleAsSparseVectorT | None) — sparse vector for sparse/hybrid indexes.
weighting_strategy (WeightingStrategy | None) — weighting strategy for sparse vectors.
fusion_algorithm (FusionAlgorithm | None) — fusion algorithm for hybrid scoring.
query_mode (QueryMode | None) — query mode for hybrid embedding indexes (e.g. SPARSE).

How it works

The call to resumable_query returns a tuple (result, handle) where result is the first batch (list of QueryResult) and handle is a ResumableQueryHandle.
Use handle.fetch_next(n) (or await handle.fetch_next(n)) to retrieve the next n results. If no more results are available an empty list is returned.
Always stop the handle when finished (use with handle: / async with handle: or call handle.stop() / await handle.stop()). After the handle is stopped, further calls to fetch_next or stop raise an error (tests expect UpstashError).

Examples Simple paging (sync)

results, handle = index.resumable_query(vector=[0.1, 0.2], top_k=2, namespace="ns")

with handle:
    all_results = list(results)
    while True:
        next_batch = handle.fetch_next(2)
        if not next_batch:
            break
        all_results.extend(next_batch)

# handle is stopped after exiting the context manager

Sparse / hybrid example (showing advanced options)

from upstash_vector.types import SparseVector, WeightingStrategy, FusionAlgorithm

scores, handle = index.resumable_query(
    vector=[0.1, 0.1],
    sparse_vector=([0], [0.1]),
    top_k=2,
    include_vectors=True,
    include_metadata=True,
    include_data=True,
    weighting_strategy=WeightingStrategy.IDF,
    fusion_algorithm=FusionAlgorithm.DBSF,
    namespace="hybrid-ns",
)

with handle:
    for s in scores:
        print(s.id, getattr(s, "vector", None), getattr(s, "sparse_vector", None))
    more = handle.fetch_next(1)
    print("more:", more)

Notes

The server enforces a limit on the number of active resumable queries; keep them short-lived and call stop() when finished. The default max_idle is 3600 seconds.
After calling stop() (explicitly or via context manager), further fetch_next or stop calls will raise an error.