Elastic Architecture
Elasticsearch Chunking
for Agentic AI
Choosing the Right Strategy
"The chunking strategy you pick determines what your agents see — and how well they reason."
Elastic Serverless · semantic_text · dense_vector · Hybrid Search · Reranking

Why Chunking Matters

Document chunking breaks large documents into smaller segments before converting them to vectors for storage and retrieval.

Chunking helps an agent because retrieval can pinpoint the exact excerpt needed to answer a query, rather than returning the entire document.

The alternative — vectorizing the whole document — means important details get washed out by surrounding context. Signal is lost.

📶
Signal-to-Noise Ratio
An effective chunking strategy maximises the relevance of context delivered to an agent. The wrong strategy leaves agents reasoning over noise.
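The dilution effect can be illustrated with a toy model. The sketch below uses word-count vectors and cosine similarity as a stand-in for real embeddings; the three sample sentences and the query are invented for illustration and do not come from sample_doc.md.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sample text: one relevant chunk surrounded by unrelated ones.
chunks = [
    "Caching reduces latency for read heavy workloads.",
    "Failure recovery relies on retries and leader election.",
    "Sharding spreads data across many nodes.",
]
query = "failure recovery"

whole_document = " ".join(chunks)
best_chunk_score = max(cosine(bow(query), bow(c)) for c in chunks)
whole_document_score = cosine(bow(query), bow(whole_document))

print(f"best chunk: {best_chunk_score:.3f}")
print(f"whole doc:  {whole_document_score:.3f}")
```

The relevant chunk scores higher on its own than the concatenated document does, because the unrelated sentences enlarge the document vector without adding any overlap with the query.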

Architecture

High-level architecture
  • All logic executed from a Jupyter notebook
  • Elastic Serverless as the vector platform — provisioned via Terraform
  • Single sample markdown document — sample_doc.md — 2,427 words
  • Shared query: "How do distributed systems handle failures and recovery?"
  • Embedding: .jina-embeddings-v5-text-small (1024 dims · Elastic Inference Service, EIS)

Four Strategies

  • Strategy 1: Semantic Text (Automatic Chunking). ES handles chunking, embedding, and retrieval. Zero pipeline overhead.
  • Strategy 2: Semantic Text (Pre-chunked Arrays). Application controls chunk boundaries; ES still manages embedding.
  • Strategy 3: External Chunking (One Document Per Chunk). Each chunk is an independent ES document with dense_vector.
  • Strategy 4: External Chunking (Nested Chunks). Chunks stored as nested objects inside the parent document.
Strategy 1

Semantic Text — Automatic Chunking

Scenario 1

Index the raw document as a single string. Elasticsearch automatically:

  • Splits into chunks via the recursive strategy, aligned to markdown headings
  • Generates embeddings via .jina-embeddings-v5-text-small on EIS
  • Stores chunks internally per document
  • Returns ranked chunks via highlighting
Strategy 1

Mapping & Ingest

# semantic_text field with recursive chunking on markdown separators
scenario1_mapping = {
    "mappings": {
        "properties": {
            "content": {
                "type": "semantic_text",
                "chunking_settings": {
                    "strategy": "recursive",
                    "max_chunk_size": 200,  # measured in words
                    "separator_group": "markdown"
                }
            }
        }
    }
}
es.indices.create(index=S1_INDEX,
                  body=scenario1_mapping)
# Index the whole document as one string; ES chunks and embeds it
es.index(index=S1_INDEX, id=1,
         document={"content": doc_content})

18 chunks auto-produced — aligned to markdown headings. No external library. No local inference.

Chunk 0: 62 words, # Building Resilient...
Chunk 1: 199 words, ## Foundations of...
Chunk 2: 138 words, ### The Fallacies of...
Chunk 3: 108 words, ## Consensus and...
Chunk 4: 174 words, ### Leader Election...
...
Chunk 17: 177 words, ### Health Checks...
Strategy 1

Search & Results

response = es.search(
    index=S1_INDEX,
    body={
        "query": {
            "semantic": {
                "field": "content",
                "query": QUERY,
            }
        },
        "_source": False,
        "highlight": {
            "fields": {
                "content": {
                    "type": "semantic",
                    "number_of_fragments": 5,
                    "order": "score",
                }
            }
        },
        "size": 5
    },
)

Top-5 chunks returned in relevance order

Score: 0.8272
  • ## Handling Failure ### Failure Detection...
  • # Building Resilient Distributed Systems...
  • ## Foundations of Distribution...
  • ### The Fallacies of Distributed Computing...
  • ### Leaderless Replication...
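Because the chunks come back as highlight fragments rather than as hits, an agent needs a small step to pull them out of the response. The helper below is a minimal sketch, assuming the usual search response layout in which each hit carries a highlight object keyed by field name. The function name extract_fragments and the stand-in response are hypothetical.

```python
def extract_fragments(response, field="content"):
    """Return (doc_id, doc_score, fragments) for every hit that has highlights."""
    results = []
    for hit in response["hits"]["hits"]:
        fragments = hit.get("highlight", {}).get(field, [])
        if fragments:
            results.append((hit["_id"], hit["_score"], fragments))
    return results


# Stand-in response with placeholder values, shaped like a search response.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_score": 0.5,
                "highlight": {"content": ["first fragment", "second fragment"]},
            },
            {"_id": "2", "_score": 0.4},  # a hit without highlights is skipped
        ]
    }
}

for doc_id, score, fragments in extract_fragments(sample_response):
    print(doc_id, score, len(fragments))
```

With order set to score in the highlight request, the fragments list is already in relevance order, so it can be passed to the agent as-is.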
Strategy 1

Appraisal

✓ Pros

  • Simplest implementation — index raw text, ES does the rest
  • Zero pipeline overhead — no external libraries or local inference
  • Configurable via chunking_settings
  • Highlighting returns chunks ranked by similarity
  • Compatible with hybrid BM25+vector search

✗ Cons

  • No per-chunk metadata — can't attach tags or filters to individual chunks
  • Chunks are internal — can't update or delete a single chunk without reindexing the whole document

⚡ Good For

  • Rapid prototyping — working semantic search in minutes
  • Teams without dedicated ML ops who want vector search out of the box
Strategy 2

Semantic Text — Pre-chunked Arrays

Scenario 2

Disable ES auto-chunking with strategy: "none". The application pre-chunks the document and passes the array of chunks to the semantic_text field.

ES still handles embedding and retrieval — no dense_vector or inference pipeline needed.

Chunking strategy: regex split on markdown headings (H1 / H2 / H3)

Strategy 2

Code & Results

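import re

# Disable auto-chunking
mapping = {
    "mappings": {
        "properties": {
            "content": {
                "type": "semantic_text",
                "chunking_settings": {
                    "strategy": "none"
                }
            }
        }
    }
}
es.indices.create(index=S2_INDEX, body=mapping)

# Regex split on H1/H2/H3 markdown headings;
# the lookahead keeps each heading with its section
chunks = re.split(
    r'(?=^#{1,3} )',
    doc_content,
    flags=re.MULTILINE
)
# Drop empty pieces (e.g. the one before the first heading)
chunks = [c.strip() for c in chunks
          if c.strip()]
# → 24 chunks

# Pass chunk array to semantic_text
es.index(index=S2_INDEX,
         document={"content": chunks})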

24 chunks — one per heading section

Score: 0.8147
  • # Building Resilient Distributed Systems...
  • ### Failure Detection...
  • ### Why Distribute?...
  • ### The Fallacies of Distributed Computing...
  • ## Observability in Distributed Systems...
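To see what the heading split does at the edges, here is the same regex applied to a tiny markdown string. The sample text is invented for illustration; only the pattern comes from the slide.

```python
import re

# Hypothetical miniature document with H1, H2, H3, and H4 headings.
sample = (
    "# Title\n"
    "Intro text\n"
    "## Section A\n"
    "Body A\n"
    "### Subsection A1\n"
    "Body A1\n"
    "#### Deep heading\n"
    "Deep text\n"
)

# Zero-width lookahead: split *before* each H1-H3 heading, keeping the heading.
pieces = re.split(r"(?=^#{1,3} )", sample, flags=re.MULTILINE)

# The split at position 0 yields an empty first piece, so filter it out.
sections = [p.strip() for p in pieces if p.strip()]

for section in sections:
    print(repr(section.splitlines()[0]), "->", len(section.split()), "words")
```

Two behaviours are worth noting. H4 and deeper headings do not start a new chunk, so they stay inside their parent section. The pattern also knows nothing about code fences, so a line beginning with "# " inside a fenced block (a shell or Python comment, for example) would be treated as a heading.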
Strategy 2

Appraisal

✓ Pros

  • Application controls exact chunk boundaries
  • ES still manages embedding — no local inference
  • Same simple semantic_text mapping as Strategy 1
  • Highlighting retrieves the most relevant pre-defined chunks

✗ Cons

  • No per-chunk metadata
  • Can't independently update or delete a single chunk
  • Chunking quality is your responsibility
  • Each chunk must fit the inference model's token limit
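One way to guard against oversized chunks is a pre-flight check before indexing. The sketch below is an illustration under stated assumptions: it uses word count as a rough proxy for tokens (a real limit is measured in model tokens), the 200-word default is a hypothetical value, and enforce_limit is not part of the notebook.

```python
def enforce_limit(chunks, max_words=200):
    """Split any chunk longer than max_words into consecutive word windows.

    Chunks within the limit pass through unchanged. Oversized chunks are
    re-joined with single spaces, so their original line breaks are lost.
    """
    bounded = []
    for chunk in chunks:
        words = chunk.split()
        if len(words) <= max_words:
            bounded.append(chunk)
            continue
        for start in range(0, len(words), max_words):
            bounded.append(" ".join(words[start:start + max_words]))
    return bounded


# Tiny demonstration with an artificially small limit.
demo = enforce_limit(["one two", "a b c d e"], max_words=2)
print(demo)
```

A production version would count tokens with the embedding model's own tokenizer and would prefer to split on paragraph or sentence boundaries.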

⚡ Good For

  • Documents with clear structural boundaries — markdown headers, HTML sections, paragraphs
  • When you need chunking control without managing embedding infrastructure
Strategy 3

External Chunking — One Document Per Chunk

Scenario 3

Each chunk is indexed as an independent top-level document with a dense_vector field.

  • Chunking: semchunk library (chunk_size=200)
  • Embeddings: Elastic Inference API
  • Per-chunk metadata: parent_id, chunk_index
  • Vector quantization: int8_hnsw
  • Same index demonstrates hybrid search + reranking — no reindexing needed
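The slides do not show the index mapping for this scenario, so the following is a sketch of what it could look like given the bullets above. The field names parent_id, chunk_index, chunk_text, and embedding appear elsewhere in the deck; the field types, the cosine similarity setting, the document id scheme, and both function names are assumptions.

```python
def build_chunk_mapping(dims=1024):
    """Mapping for one-document-per-chunk storage with a quantized vector field."""
    return {
        "mappings": {
            "properties": {
                "parent_id": {"type": "keyword"},    # assumed type, enables term filters
                "chunk_index": {"type": "integer"},  # assumed type
                "chunk_text": {"type": "text"},      # BM25-searchable chunk body
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",          # assumption, not stated in the deck
                    "index_options": {"type": "int8_hnsw"},
                },
            }
        }
    }


def build_bulk_actions(index, parent_id, chunks, embeddings):
    """Yield one bulk action per chunk, pairing each chunk with its vector."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    for i, (text, vector) in enumerate(zip(chunks, embeddings)):
        yield {
            "_index": index,
            "_id": f"{parent_id}-{i}",  # hypothetical id scheme
            "_source": {
                "parent_id": parent_id,
                "chunk_index": i,
                "chunk_text": text,
                "embedding": vector,
            },
        }


actions = list(build_bulk_actions("demo-index", "doc1", ["a", "b"], [[0.1], [0.2]]))
print(actions[1]["_id"])
```

The generated actions have the shape that elasticsearch.helpers.bulk accepts, so they could be passed to it together with the client.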
Strategy 3

Ingest & kNN Search

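from semchunk import chunk

# token_counter: a callable that returns the token count
# of a string, as semchunk requires
chunks = chunk(doc_content,
               chunk_size=200,
               token_counter=token_counter)
# → 15 chunks

# Batch embed via Inference API
# (batch is a list of chunk strings)
result = es.inference.inference(
    inference_id=EMBEDDING_MODEL,
    task_type="text_embedding",
    body={"input": batch}
)

# kNN search
# (query_embedding is QUERY embedded with the same model)
response = es.search(
    index=S3_INDEX,
    body={
        "knn": {
            "field": "embedding",
            "query_vector": query_embedding,
            "k": 5,
            "num_candidates": 20
        },
        "size": 5
    }
)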

kNN results — top-5 chunks

  • Score: 0.8321, Chunk #7: This model offers high availability...
  • Score: 0.8024, Chunk #0: # Building Resilient Distributed Systems...
  • Score: 0.7957, Chunk #1: Availability is the second driver...
  • Score: 0.7740, Chunk #2: 1. The network is reliable...
  • Score: 0.7685, Chunk #4: Protocols like Raft and Paxos...
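The ingest snippet refers to a batch without showing how batches are formed or how vectors are read back. The sketch below fills that gap under two assumptions: that a text_embedding inference response has the shape {"text_embedding": [{"embedding": [...]}, ...]}, and that the embedding call is wrapped in a callable. The names batched, embed_all, and fake_embed_batch are hypothetical, and the fake embedder exists only so the example runs without a cluster.

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(texts, embed_batch, batch_size=8):
    """Embed texts batch by batch and return one vector per text, in order."""
    vectors = []
    for batch in batched(texts, batch_size):
        result = embed_batch(batch)
        # Assumed response shape: {"text_embedding": [{"embedding": [...]}, ...]}
        vectors.extend(item["embedding"] for item in result["text_embedding"])
    if len(vectors) != len(texts):
        raise RuntimeError("embedding count does not match input count")
    return vectors


def fake_embed_batch(batch):
    """Stand-in for the inference call: a one-dimensional 'vector' per text."""
    return {"text_embedding": [{"embedding": [float(len(t))]} for t in batch]}


vectors = embed_all(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
print(vectors)
```

In the notebook setting, embed_batch could be a small function that wraps the es.inference.inference call shown above and returns its result.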
Strategy 3

Hybrid Search + Reranking

response = es.options(
    request_timeout=120).search(
    index=S3_INDEX,   # same index as kNN
    body={
      "retriever": {
        "text_similarity_reranker": {
          "retriever": {
            "linear": {
              "retrievers": [
                {"retriever": {"standard": {
                    "query": {"match": {
                      "chunk_text": QUERY}}}},
                 "weight": 0.3},   # BM25
                {"retriever": {"knn": {
                    "field": "embedding",
                    "query_vector": query_embedding,
                    "k": 10}},
                 "weight": 0.7}    # vector
              ]}},
          "inference_id": ".jina-reranker-v3",
          "inference_text": QUERY,
          "field": "chunk_text"
        }
      }
    }
)

No reindexing required — query-level capability

  • Score: 1.2116, Chunk #7: This model offers high availability...
  • Score: 1.1384, Chunk #8: Circuit breakers prevent a failing dependency...
  • Score: 1.0229, Chunk #2: 1. The network is reliable...
  • Score: 1.0068, Chunk #1: Availability is the second driver...

★ Reranker surfaces Circuit Breakers (Chunk 8) — absent from pure kNN top-5
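The 0.3 and 0.7 weights in the linear retriever can be read as a weighted sum of the two retrievers' scores. The sketch below illustrates that idea in plain Python; it is not Elasticsearch's implementation. It assumes a document missing from one retriever contributes zero for that retriever and applies no score normalization. The sample scores are invented.

```python
def linear_combine(bm25_scores, knn_scores, w_bm25=0.3, w_knn=0.7):
    """Rank doc ids by w_bm25 * bm25 + w_knn * knn; a missing score counts as 0."""
    doc_ids = set(bm25_scores) | set(knn_scores)
    combined = {
        doc_id: w_bm25 * bm25_scores.get(doc_id, 0.0)
        + w_knn * knn_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: "a" only matches lexically, "c" only matches by vector.
bm25 = {"a": 2.0, "b": 1.0}
knn = {"b": 0.5, "c": 0.5}

ranking = linear_combine(bm25, knn)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Document "b" ranks first because it is found by both retrievers. Raw BM25 scores are unbounded while cosine-based scores are not, so without normalization the effective balance between the two can differ from what the weights suggest.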

Strategy 3

Appraisal

✓ Pros

  • Each chunk is independent — update, delete, or reindex without touching others
  • Per-chunk metadata for filtering
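As an illustration of metadata filtering, the kNN search can carry a filter clause so that only chunks from one parent are considered. This is a sketch that assumes parent_id is mapped as a keyword field; build_filtered_knn is a hypothetical helper that only builds the request body.

```python
def build_filtered_knn(query_vector, parent_id, k=5, num_candidates=20):
    """Build a kNN search body restricted to the chunks of one parent document."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            # Assumes parent_id is a keyword field, so an exact term match works.
            "filter": {"term": {"parent_id": parent_id}},
        },
        "size": k,
    }


body = build_filtered_knn([0.1, 0.2, 0.3], parent_id="doc1")
print(body["knn"]["filter"])
```

The resulting dictionary could be passed as the body of the same es.search call used for the unfiltered kNN query.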

✗ Cons

  • Requires an external chunking pipeline
  • More moving parts — chunking library, embedding generation
  • Chunk quality depends entirely on splitter configuration

⚡ Good For

  • Agentic AI apps needing filterable, retrievable chunks with metadata
  • Production systems requiring full control over model and retrieval
  • Multi-document corpora
Strategy 4

External Chunking — Nested Chunks

Scenario 4
  • Chunks stored as nested objects inside the parent — atomic indexing, strict cohesion
  • Chunking: LangChain RecursiveCharacterTextSplitter (chunk_size=800, chunk_overlap=100)
  • 28 chunks stored in one parent doc
  • Parent and all chunks indexed/deleted as a single atomic unit
  • Query via nested kNN with inner_hits
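The mapping and parent document for this scenario are not shown in the slides, so here is a sketch of one plausible layout. The chunks path and the chunk_index, chunk_text, and embedding sub-fields match the query on the next slide; the title field, the field types, the cosine similarity setting, and both function names are assumptions.

```python
def build_nested_mapping(dims=1024):
    """Mapping with chunks stored as nested objects inside the parent document."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},  # hypothetical parent-level field
                "chunks": {
                    "type": "nested",
                    "properties": {
                        "chunk_index": {"type": "integer"},
                        "chunk_text": {"type": "text"},
                        "embedding": {
                            "type": "dense_vector",
                            "dims": dims,
                            "index": True,
                            "similarity": "cosine",  # assumption
                        },
                    },
                },
            }
        }
    }


def build_parent_document(title, chunks, embeddings):
    """Assemble one parent document that carries every chunk and its vector."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    return {
        "title": title,
        "chunks": [
            {"chunk_index": i, "chunk_text": text, "embedding": vector}
            for i, (text, vector) in enumerate(zip(chunks, embeddings))
        ],
    }


parent = build_parent_document("Demo", ["a", "b"], [[0.1], [0.2]])
print(len(parent["chunks"]))
```

Because everything lives in one document, a single index call writes the parent and all of its chunks together, which is where the atomicity described above comes from.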
Strategy 4

Code & Results

response = es.search(
    index=S5_INDEX,
    body={
        "query": {
            "nested": {
                "path": "chunks",
                "query": {
                    "knn": {
                        "field": "chunks.embedding",
                        "query_vector": query_embedding,
                        "num_candidates": 20
                    }
                },
                "inner_hits": {
                    "size": 5,
                    "_source": [
                        "chunks.chunk_index",
                        "chunks.chunk_text"
                    ]
                }
            }
        },
        "size": 1
    },
)

Parent document + matched inner hits

Parent: Building Resilient Distributed Systems (Score: 0.8300)
  • Score: 0.8300, Chunk #12: ## Handling Failure ### Failure Detection...
  • Score: 0.8086, Chunk #0: # Building Resilient...
  • Score: 0.7946, Chunk #4: Every design decision...
  • Score: 0.7936, Chunk #6: The challenge is what happens when the leader...
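Reading the matched chunks means walking one level deeper than a normal hit. The helper below is a sketch that assumes inner hits appear under inner_hits, keyed by the nested path, and that each inner hit's _source holds the chunk's own fields. The function name and the stand-in response are hypothetical, and the response layout is worth verifying against your Elasticsearch version.

```python
def extract_inner_hits(response, path="chunks"):
    """Flatten nested inner hits into one row per matched chunk."""
    rows = []
    for parent in response["hits"]["hits"]:
        inner = (
            parent.get("inner_hits", {})
            .get(path, {})
            .get("hits", {})
            .get("hits", [])
        )
        for hit in inner:
            source = hit.get("_source", {})
            rows.append(
                {
                    "parent_id": parent["_id"],
                    "score": hit["_score"],
                    "chunk_index": source.get("chunk_index"),
                    "chunk_text": source.get("chunk_text"),
                }
            )
    return rows


# Stand-in response with placeholder values.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "parent-1",
                "_score": 0.9,
                "inner_hits": {
                    "chunks": {
                        "hits": {
                            "hits": [
                                {"_score": 0.9, "_source": {"chunk_index": 3, "chunk_text": "x"}},
                                {"_score": 0.7, "_source": {"chunk_index": 0, "chunk_text": "y"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

rows = extract_inner_hits(sample_response)
print([(r["chunk_index"], r["score"]) for r in rows])
```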
Strategy 4

Appraisal

✓ Pros

  • Document cohesion — chunks physically co-located with their parent
  • Atomic indexing — parent and all chunks indexed/deleted as one unit
  • Inner hits return matched chunks with scores while preserving parent context

✗ Cons

  • Nested kNN queries are more complex and slower than top-level kNN
  • Updating one chunk requires reindexing the entire parent
  • Cannot independently filter or paginate chunks across documents
  • Performance and doc size degrade as chunk count grows

⚡ Good For

  • Small-to-moderate collections with a modest number of chunks per document
  • Use cases where document integrity is paramount
  • Workloads that always retrieve chunks in the context of their parent

Conclusion

semantic_text provides the most automation — index raw text, Elasticsearch handles the rest. Start here.
dense_vector with external chunking gives full control — choose your chunker, your embedding model, and your retrieval strategy.
Hybrid BM25+vector search and reranking are query-level capabilities — no reindexing required to layer them in.
The right strategy depends on how much control your agents need over what they retrieve.
Full source & notebook: github.com/joeywhelan/es-chunking