
Tune prompts to your domain

GRAIL works well out of the box because its prompts are generalists: they handle narrative text, scientific papers, code, and legal contracts reasonably well. But the gap between reasonable and excellent lives in the prompts.

If you spend an afternoon tuning the two or three critical prompts for your domain, you'll see compounding improvements across every layer.

Why it matters (gains compound)

Prompts affect four cascading layers:

extraction prompts → better graph

report prompts → better community reports

search prompts → better context to the LLM

synthesis prompts → better final answers

A 20% improvement in extraction doesn't yield 20% better answers — it yields something closer to 60%, because each layer amplifies the previous.

What you can tune

GRAIL ships 10 overridable prompts. Group them mentally into three families:

Family 1 — Graph construction (highest impact)

| Prompt | What it does | When to tune |
|---|---|---|
| entity_relation | Single-pass extraction: entities, relationships, descriptions, anticipated queries | Almost always for non-generic corpora |
| community_report | Narrative summary per community; basis of global mode | When global answers feel vague |
| summarize_description | Consolidates descriptions when an entity appears many times | When you see contradictory descriptions |
| entity_dedup | Detects duplicate entities for merging | If your corpus has noisy naming |
| create_custom_entities | Discovers new entity types from a sample | When starting in a new domain |

Family 2 — Answer synthesis

| Prompt | What it does | When to tune |
|---|---|---|
| local_search | Assembles local context and answers | To pin voice, structure, mandatory citations |
| global_map | Relevance scoring of community reports | Advanced, rarely needed |
| global_reduce | Final synthesis of global mode | To control the shape of the thematic summary |

Family 3 — Support (rarely tune)

| Prompt | What it does | When to tune |
|---|---|---|
| json_correction | Repairs malformed JSON (fallback) | Almost never |
| claim_extraction | Extracts claims/covariates (optional) | Only if you enabled covariates |

How it works (1 minute of architecture)

Each prompt in GRAIL is a Python module with three exports:

NAME = "entity_relation"  # logical name (must match filename)
REQUIRED_PARAMS = ["entity_types", "input_text"]  # validated before calling

def build_messages(**params) -> list[dict]:
    return [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
    ]
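
To make the "validated before calling" step concrete, here is a minimal sketch of how a caller might check REQUIRED_PARAMS before invoking build_messages. The helper call_prompt and the placeholder prompt text are hypothetical and written for illustration; GRAIL's own validation may differ in error type and message.

```python
# Stand-in for a prompt module's three exports (names follow the doc).
NAME = "entity_relation"
REQUIRED_PARAMS = ["entity_types", "input_text"]


def build_messages(**params) -> list[dict]:
    # Placeholder prompt text; the real builtin wording is not reproduced here.
    return [
        {"role": "system", "content": f"Extract entities of types: {params['entity_types']}"},
        {"role": "user", "content": params["input_text"]},
    ]


def call_prompt(required: list[str], builder, **params) -> list[dict]:
    """Hypothetical helper: reject the call if any required param is missing."""
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"missing required params: {missing}")
    return builder(**params)


messages = call_prompt(
    REQUIRED_PARAMS,
    build_messages,
    entity_types=["DISEASE", "DRUG"],
    input_text="Temozolomide is used for glioblastoma.",
)
```

Checking parameters up front means a typo in a parameter name fails immediately with a readable error instead of surfacing as a KeyError inside the prompt template.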

The PromptRegistry resolves by name, in this order:

custom_paths (in the order you list them) → builtin → KeyError

If your custom directory doesn't have entity_relation.py, the builtin is used. If it does, yours wins.
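
The lookup order can be pictured with a short sketch. This is not the PromptRegistry implementation; resolve_prompt is a hypothetical function that only mirrors the documented order (custom paths in list order, then builtin, then KeyError), using throwaway directories.

```python
import tempfile
from pathlib import Path


def resolve_prompt(name: str, custom_paths: list[Path], builtin_dir: Path) -> Path:
    """Return the first <name>.py found, searching custom paths before builtins."""
    for directory in [*custom_paths, builtin_dir]:
        candidate = directory / f"{name}.py"
        if candidate.is_file():
            return candidate
    raise KeyError(name)


# Demo with temporary directories standing in for a real project layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    builtin, custom = root / "builtin", root / "my_prompts"
    for directory in (builtin, custom):
        directory.mkdir()
    (builtin / "entity_relation.py").write_text("NAME = 'entity_relation'\n")
    (builtin / "local_search.py").write_text("NAME = 'local_search'\n")
    (custom / "entity_relation.py").write_text("NAME = 'entity_relation'\n")

    # The custom copy shadows the builtin; untouched prompts fall through.
    overridden = resolve_prompt("entity_relation", [custom], builtin).parent.name
    fallback = resolve_prompt("local_search", [custom], builtin).parent.name
```

Here overridden is "my_prompts" and fallback is "builtin", which is the behaviour described above: your file wins when present, and everything else comes from the builtins.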

3-minute setup

1. Create the prompts directory

mkdir -p my-project/my_prompts

2. Copy the builtin you want to tune

cp grail/prompts/builtin/entity_relation.py my-project/my_prompts/entity_relation.py

(Or read the builtin source with grail prompt show entity_relation.)

3. Point at it in grail.yaml

prompts:
  custom_paths:
    - ./my_prompts
  strict: false  # true = requires all 10 prompts to be provided

4. Edit the prompt and re-index

grail index ./my-project

Done. GRAIL now uses your version for entity_relation and the builtins for everything else.

Tuning strategies (the ones that pay)

Strategy A — Change the prompt's examples

The builtin entity_relation ships three examples: narrative fiction, scientific paper, and code. These examples bias the LLM.

If your corpus is medical, replace the examples with real passages from your corpus plus their expected extractions. The LLM will imitate the pattern.

EXAMPLES = """
Sample text: "The patient with grade IV glioblastoma multiforme received
adjuvant temozolomide + radiotherapy per the Stupp protocol."

Expected output:
<extracted_data>
("entity"<|>GLIOBLASTOMA MULTIFORME<|>DISEASE<|>Primary brain tumor...)##
("entity"<|>TEMOZOLOMIDE<|>DRUG<|>Oral alkylating agent...)##
("entity"<|>STUPP PROTOCOL<|>GUIDELINE<|>Standard regimen...)##
("relationship"<|>TEMOZOLOMIDE<|>GLIOBLASTOMA MULTIFORME<|>first-line treatment<|>9)##
</extracted_data>
"""

Strategy B — Constrain relationship types

GRAIL can classify relationships with a controlled vocabulary:

indexing:
  extract_relationship_types: true
  relationship_types:
    - TREATS
    - CONTRAINDICATES
    - INTERACTS_WITH
    - METABOLIZES

This turns edges from RELATED (generic) into typed relationships, which makes structural queries dramatically more useful.

Strategy C — Pin the voice of local mode

By default, local_search answers in a neutral assistant tone. If your product needs a specific voice (formal-legal, technical-medical, conversational), edit the system prompt:

def build_messages(context_data, user_query, **kwargs):
    return [
        {"role": "system", "content":
            "You are a legal assistant specialised in Chilean law. "
            "Always cite the specific article and statute. "
            "Don't answer with generalities — only with verifiable text "
            "from the corpus. If you can't find an answer, say so explicitly."
        },
        {"role": "user", "content":
            f"Context:\n{context_data}\n\nQuestion: {user_query}"
        },
    ]

Strategy D — Full multilingual pack

For a 100% non-English experience (not just translating queries), create a pack that replaces all prompts with versions in your target language:

prompts:
  custom_paths:
    - ./prompts_es
  strict: true  # error if any are missing → forces a complete pack

We recommend strict: true only when you're sure all 10 files exist — fail fast at startup instead of mixing languages.
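
Before switching on strict mode, it can help to check the pack yourself. The sketch below is a hypothetical pre-flight script, not a GRAIL command: it takes the 10 prompt names from the tables under "What you can tune" and assumes each prompt lives in a file called <name>.py, as in the setup steps.

```python
import tempfile
from pathlib import Path

# The 10 prompt names listed under "What you can tune".
PROMPT_NAMES = [
    "entity_relation", "community_report", "summarize_description",
    "entity_dedup", "create_custom_entities", "local_search",
    "global_map", "global_reduce", "json_correction", "claim_extraction",
]


def missing_prompts(pack_dir: Path) -> list[str]:
    """Return the prompt names that have no <name>.py file in pack_dir."""
    return [name for name in PROMPT_NAMES if not (pack_dir / f"{name}.py").is_file()]


# Demo: a pack with only two translated prompts is incomplete.
with tempfile.TemporaryDirectory() as tmp:
    pack = Path(tmp)
    (pack / "entity_relation.py").write_text("")
    (pack / "local_search.py").write_text("")
    missing = missing_prompts(pack)

print(missing)
```

Point missing_prompts at ./prompts_es and only set strict: true once it returns an empty list.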

Iterative development workflow

Tuning prompts means iterating. GRAIL ships tools to make the loop cheap and reproducible:

1. Enable LLM cache

llm:
  cache_enabled: true

If you re-run with the same prompt and the same input, the provider isn't called again, so you aren't charged again. After the first pass, the "tweak → index → review" loop costs nothing for every call you didn't change.
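
Why an edited prompt triggers new calls while untouched ones stay cached can be illustrated with a content-addressed key. This is a sketch under the assumption that the cache is keyed on the model name plus the exact messages; GRAIL's real keying scheme is not documented here, and cache_key is a hypothetical name.

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict]) -> str:
    """Hash the model name and the exact messages into a stable key (assumed scheme)."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = [
    {"role": "system", "content": "Extract entities."},
    {"role": "user", "content": "chunk 1"},
]
edited = [
    {"role": "system", "content": "Extract medical entities."},
    {"role": "user", "content": "chunk 1"},
]

# Identical prompt + input -> identical key -> cache hit.
same_key = cache_key("m", base) == cache_key("m", [dict(m) for m in base])
# Any edit to the prompt text -> different key -> a fresh provider call.
changed_key = cache_key("m", base) != cache_key("m", edited)
```

Under this assumption, tweaking entity_relation only re-runs the extraction calls; calls made with prompts you left alone still hit the cache.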

2. Index a sample, not the full corpus

mkdir -p my-project-sample/input
cp my-project/input/{0,1,2}*.pdf my-project-sample/input/
grail index ./my-project-sample

Three PDFs give you enough feedback to iterate. Only once you're happy with the prompt should you index the full corpus.

3. Inspect the effective prompts

grail prompt list # all registered prompts
grail prompt show entity_relation # print the full prompt
grail prompt show entity_relation --project ./my-project # uses your custom pack

4. Trace a query to see the prompt live

grail query ./my-project-sample "..." --mode local --trace ./traces
cat ./traces/*.json | jq '.llm_calls[0].messages'

See Trace queries for the full story.
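
If you prefer Python to jq, the same inspection can be scripted. The sketch below assumes only what the jq command implies: each trace file is JSON with an llm_calls list whose items carry a messages field. The helper name and the demo trace are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def first_call_messages(trace_dir: Path) -> dict[str, list[dict]]:
    """Map each trace file name to the messages of its first LLM call."""
    result = {}
    for path in sorted(trace_dir.glob("*.json")):
        trace = json.loads(path.read_text(encoding="utf-8"))
        calls = trace.get("llm_calls", [])
        if calls:
            result[path.name] = calls[0]["messages"]
    return result


# Demo with a fabricated trace; real traces will contain more fields.
with tempfile.TemporaryDirectory() as tmp:
    trace_dir = Path(tmp)
    sample = {"llm_calls": [{"messages": [
        {"role": "system", "content": "You are a legal assistant."},
    ]}]}
    (trace_dir / "q1.json").write_text(json.dumps(sample), encoding="utf-8")
    found = first_call_messages(trace_dir)

system_prompt = found["q1.json"][0]["content"]
```

Comparing system_prompt against your custom file is a quick way to confirm that the override was actually picked up.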

Common pitfalls

1. You changed entity_relation delimiters and now extract 0 entities

The parser in grail.indexing.entities_relationships reads DEFAULT_DELIMITERS from the prompt module. If you change the delimiters in the prompt content without exporting DEFAULT_DELIMITERS, the parser uses the old ones and everything silently breaks.

Fix: export the new ones in your module:

DEFAULT_DELIMITERS = {
    "tuple_delimiter": "||",
    "record_delimiter": "@@",
    "start_delimiter": "<data>",
    "completion_delimiter": "</data>",
}
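
The failure mode is easy to reproduce with a toy parser. The sketch below is not the parser from grail.indexing.entities_relationships; parse_records is a hypothetical stand-in, and the OLD values are an assumption taken from the delimiters visible in the Strategy A example.

```python
# Delimiters as they appear in the Strategy A example (assumed to be the defaults).
OLD = {"tuple_delimiter": "<|>", "record_delimiter": "##",
       "start_delimiter": "<extracted_data>", "completion_delimiter": "</extracted_data>"}
# Delimiters after editing the prompt, as in the fix above.
NEW = {"tuple_delimiter": "||", "record_delimiter": "@@",
       "start_delimiter": "<data>", "completion_delimiter": "</data>"}


def parse_records(raw: str, d: dict[str, str]) -> list[list[str]]:
    """Toy parser: slice between start/completion markers, then split records and fields."""
    start = raw.find(d["start_delimiter"])
    end = raw.find(d["completion_delimiter"])
    if start == -1 or end == -1:
        return []  # markers not found: nothing is extracted, and nothing is raised
    body = raw[start + len(d["start_delimiter"]):end]
    records = []
    for chunk in body.split(d["record_delimiter"]):
        chunk = chunk.strip().strip("()")
        if chunk:
            fields = [f.strip().strip('"') for f in chunk.split(d["tuple_delimiter"])]
            records.append(fields)
    return records


llm_output = (
    '<data>("entity"||TEMOZOLOMIDE||DRUG||Oral alkylating agent)@@'
    '("entity"||STUPP PROTOCOL||GUIDELINE||Standard regimen)@@</data>'
)

parsed_new = parse_records(llm_output, NEW)  # two records
parsed_old = parse_records(llm_output, OLD)  # empty list, no error
```

The LLM follows the new prompt faithfully, but a parser still holding the old delimiters finds no start marker and returns an empty list without raising, which is why the symptom is "0 entities" rather than a crash.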

2. Thinking model (Qwen3.6, etc.) burns all max_tokens on <think> and the answer comes back truncated

Raise the limits:

indexing:
  extraction_max_tokens: 16384
community:
  max_report_length: 16384
search:
  response_max_tokens: 16384

3. strict: true and you forgot a prompt

GRAIL fails at startup with a clear message listing what's missing. Fixes:

  • Provide all 10 in your custom directory, or
  • Switch to strict: false to merge with builtins

4. Your prompt isn't being used

Check the order in custom_paths:

prompts:
  custom_paths:
    - ./prompts_v2  # has entity_relation.py
    - ./prompts_v1  # also has entity_relation.py

The first in the list wins. To revert to v1, swap the order.

When NOT to tune prompts

  • Your corpus is genuinely generic (Wikipedia, novels, mixed code without a pattern). The builtins will compete well.
  • You don't have time to iterate. Tuning without iterating tends to make things worse: domain expertise in your head doesn't automatically translate into the right prompt.
  • You're validating whether GRAIL fits. First run the quickstart with builtins. Only invest in prompts after the rest of the system works.

Internal reference

For the full technical deep-dive — including the exact parser contract for entity_relation, every parameter of every prompt, and advanced workflows (e.g. adaptive prompts per chunk type) — see the internal doc docs/prompt_customization.md in the repo.

Next step

  • Trace queries — verify which prompts the LLM saw on each answer.
  • Cost optimisation — LLM cache makes prompt iteration free.
  • Memory model — in memory mode prompts matter less (the agent declares directly), but community_report and local_search still apply to queries.