Self-Reflective Retrieval-Augmented Generation (SELF-RAG)

Published: October 4, 2024
Last updated: November 20, 2025

The SELF-RAG framework trains a single arbitrary language model to adaptively retrieve passages on demand, and to generate and reflect on both the retrieved passages and its own generations using special tokens called reflection tokens.

  1. It is interesting to note that RAG is following much the same trajectory as prompt engineering. RAG started off as a simple yet effective concept: prompt injection with contextual reference data.
  2. The primary objective of RAG is to leverage the ICL (In-Context Learning) capabilities of LLMs.
    Complexity and efficiency are now being introduced to RAG. Retrieval does not take place by default; a triage process first determines whether the LLM can fulfil the user request on its own.
  3. Efficiency and accuracy trade-off. There is always a balance to be found between efficiency and accuracy. Accuracy at the cost of efficiency degrades the user experience and limits practical use-cases. Efficiency at the cost of accuracy leads to a misleading and inaccurate solution.
  4. Triaging user input to choose between direct LLM inference and prompt injection via RAG requires a reference. In the case of SELF-RAG, that reference is a fine-tuned LLM making use of self-reflection.
  5. The principle of RAG triage can be applied in various forms. The most important aspect is the reference against which the decision is made to answer the question directly from an LLM or to make use of RAG. Where RAG is used, it is equally important to be able to assess the quality and correctness of the response.
  6. Generative AI-based applications can also take a wider view of triage, where options other than direct inference or RAG are available: for instance, human-in-the-loop, web search, or multi-LLM orchestration.
RAG triage diagram comparing efficiency versus accuracy and showing how user queries pass through a fine-tuned LLM to determine whether to retrieve documents or escalate to a larger model.
SELF-RAG framework for adaptive retrieval and reflection
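
The triage principle described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SELF-RAG implementation: the function names (`triage`, `needs_retrieval`, `answer_directly`, `retrieve`, `answer_with_context`) are hypothetical, and the keyword check stands in for the fine-tuned LLM that acts as the reference in SELF-RAG.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TriageResult:
    route: str   # "direct" or "rag"
    answer: str


def triage(
    query: str,
    needs_retrieval: Callable[[str], bool],
    answer_directly: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    answer_with_context: Callable[[str, List[str]], str],
) -> TriageResult:
    """Send the query to direct inference or to RAG, based on a reference decision."""
    if not needs_retrieval(query):
        # The reference judged that the LLM can answer without retrieval.
        return TriageResult("direct", answer_directly(query))
    passages = retrieve(query)
    return TriageResult("rag", answer_with_context(query, passages))


# Toy stand-ins (assumptions): a real system would call models and a retriever here.
def toy_needs_retrieval(query: str) -> bool:
    return "policy" in query.lower()


def toy_answer_directly(query: str) -> str:
    return f"direct answer to: {query}"


def toy_retrieve(query: str) -> List[str]:
    return ["Refunds are accepted within 30 days."]


def toy_answer_with_context(query: str, passages: List[str]) -> str:
    return f"answer grounded in {len(passages)} passage(s)"


result = triage(
    "What is our refund policy?",
    toy_needs_retrieval,
    toy_answer_directly,
    toy_retrieve,
    toy_answer_with_context,
)
print(result.route, "|", result.answer)
```

Passing the decision function in as a parameter keeps the reference swappable: a keyword rule, an intent classifier, or a fine-tuned model can all fill the same slot.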

Reflection tokens

Reflection tokens are categorised into retrieval tokens and critique tokens. SELF-RAG uses retrieval tokens to decide whether retrieval is needed and critique tokens to self-evaluate generation quality.
Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behaviour to diverse task requirements.
The study reports that SELF-RAG significantly outperforms both standalone LLMs and standard RAG approaches.
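
One way to picture the two categories is as a small set of enumerations. The sketch below is an assumption-laden illustration: the class names are hypothetical, and the token values are modelled on the groups described in the SELF-RAG study (a retrieval decision, plus relevance, support, and utility critiques), so the exact strings should be treated as illustrative rather than as an official vocabulary.

```python
from enum import Enum
from typing import Union


class Retrieve(Enum):
    """Retrieval token: should passages be fetched before continuing?"""
    YES = "yes"
    NO = "no"
    CONTINUE = "continue"


class IsRelevant(Enum):
    """Critique token: is the retrieved passage relevant to the query?"""
    RELEVANT = "relevant"
    IRRELEVANT = "irrelevant"


class IsSupported(Enum):
    """Critique token: is the generated text supported by the passage?"""
    FULLY = "fully supported"
    PARTIALLY = "partially supported"
    NONE = "no support"


class Utility(Enum):
    """Critique token: overall usefulness of the response, 1 (low) to 5 (high)."""
    ONE = 1
    TWO = 2
    THREE = 3
    FOUR = 4
    FIVE = 5


ReflectionToken = Union[Retrieve, IsRelevant, IsSupported, Utility]


def category(token: ReflectionToken) -> str:
    """Return which of the two reflection-token categories a token belongs to."""
    return "retrieval" if isinstance(token, Retrieve) else "critique"


print(category(Retrieve.YES))
print(category(IsSupported.PARTIALLY))
```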

Steps in SELF-RAG

  1. The LLM generates text informed by retrieved passages.
  2. It then critiques its own output by generating special tokens, which it learns to produce during training.
  3. These reflection tokens signal the need for retrieval or confirm the output’s relevance, support, or completeness.
  4. In contrast, common RAG approaches retrieve passages indiscriminately, without ensuring complete support from cited sources.
Diagram showing critic LLM workflows, including retrieval steps, augmented outputs, relevance scoring, and utility labels for generated text.
SELF-RAG retrieves, critiques, and generates text passages
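
The generate-then-critique steps above can be sketched as follows. This is a simplified illustration, not the study's implementation: in SELF-RAG a single model emits the reflection tokens as part of its own output, whereas here `generate` and `critique` are separate hypothetical callables, and the `min_utility` threshold is an assumed parameter.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Critique:
    relevant: bool    # is the passage relevant to the query?
    supported: bool   # is the draft supported by the passage?
    utility: int      # 1 (low) to 5 (high)


def generate_and_critique(
    query: str,
    passage: str,
    generate: Callable[[str, str], str],
    critique: Callable[[str, str, str], Critique],
    min_utility: int = 3,
) -> Tuple[str, Critique, bool]:
    """Generate a draft from a passage, critique it, and report whether it passes."""
    draft = generate(query, passage)
    verdict = critique(query, passage, draft)
    accepted = verdict.relevant and verdict.supported and verdict.utility >= min_utility
    return draft, verdict, accepted


# Toy stand-ins (assumptions): a real system would call a language model here.
def toy_generate(query: str, passage: str) -> str:
    return passage  # echo the passage as the "answer"


def toy_critique(query: str, passage: str, draft: str) -> Critique:
    relevant = any(word in passage.lower() for word in query.lower().split())
    supported = draft in passage
    return Critique(relevant, supported, 5 if relevant else 1)


good = generate_and_critique(
    "capital of France", "Paris is the capital of France.", toy_generate, toy_critique
)
bad = generate_and_critique(
    "capital of France", "Bananas are yellow.", toy_generate, toy_critique
)
print(good[2], bad[2])
```

A rejected draft is the signal that a standard RAG pipeline lacks: the caller can retry with another passage or fall back to a different route instead of returning unsupported text.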

As the image below shows, SELF-RAG learns to retrieve, critique, and generate text passages to enhance overall generation quality, factuality, and verifiability.

Side-by-side comparison of Retrieval-Augmented Generation and Self-RAG, illustrating retrieval steps, parallel segment generation, critique scoring, and selection of the best answer.
Parallel inference steps in self-reflective RAG architecture

Some considerations

Additional inference and cost

SELF-RAG introduces additional inference overhead. As the image above shows, the self-reflective approach adds more points of inference than standard RAG.

A first inference step is performed, followed by three inference steps that run in parallel. The three results are then compared, and a winner is selected for the final RAG response.
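
The extra cost can be made concrete with a small sketch that runs one generation per passage in parallel, picks the highest-scoring candidate, and counts the inference calls. The names (`select_best`, `generate_and_score`) and the word-overlap score are hypothetical stand-ins; in SELF-RAG the score would be derived from the critique tokens.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    passage: str
    text: str
    score: float


def select_best(
    query: str,
    passages: List[str],
    generate_and_score: Callable[[str, str], Tuple[str, float]],
) -> Tuple[Candidate, int]:
    """Generate one candidate per passage in parallel and return the best one."""
    if not passages:
        raise ValueError("at least one passage is required")
    with ThreadPoolExecutor(max_workers=len(passages)) as pool:
        # pool.map preserves input order, so results line up with passages.
        results = list(pool.map(lambda p: generate_and_score(query, p), passages))
    candidates = [Candidate(p, text, score) for p, (text, score) in zip(passages, results)]
    best = max(candidates, key=lambda c: c.score)
    # One initial inference step (the retrieval decision) plus one per passage.
    inference_calls = 1 + len(passages)
    return best, inference_calls


# Toy scorer (assumption): counts words shared by the query and the passage.
def toy_generate_and_score(query: str, passage: str) -> Tuple[str, float]:
    overlap = set(query.lower().split()) & set(passage.lower().split())
    return f"Based on: {passage}", float(len(overlap))


best, calls = select_best(
    "when was the bridge built",
    ["the bridge was built in 1932", "bridges carry traffic", "the weather is mild"],
    toy_generate_and_score,
)
print(best.passage, "| inference calls:", calls)
```

With three passages the sketch makes four inference calls where a direct answer would need one. Running the candidates in parallel limits the added latency, but the compute cost still scales with the number of passages.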

Out-of-domain

Also, as the image above shows, out-of-domain queries are recognised as such; the request is not serviced via retrieval but is sent directly to LLM inference.

Agentic RAG

Considering the image below, the question needs to be asked:

With the complexity being introduced to the RAG process, are we not reaching a point where an agent-based RAG approach will work best? This is an approach LlamaIndex refers to as Agentic RAG.

Intents

There have been studies in which intent-based routing was used to triage user input for the correct treatment within a generative AI framework. Intents are merely pre-defined use-case classes.
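
Intent-based routing of this kind can be sketched with a lookup from intent to treatment. Everything below is an illustrative assumption: the intent names, keywords, and route labels are invented for the example, and a production system would typically use a trained intent classifier rather than keyword matching.

```python
import re
from typing import Dict, List, Optional

# Hypothetical intents (pre-defined use-case classes) and their trigger keywords.
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "billing": ["invoice", "refund", "payment"],
    "tech_support": ["error", "crash", "install"],
}

# Hypothetical mapping from intent to the treatment the query should receive.
INTENT_ROUTES: Dict[str, str] = {
    "billing": "rag",
    "tech_support": "human_in_the_loop",
}

DEFAULT_ROUTE = "direct_llm"


def classify_intent(text: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the text, if any."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & set(keywords):
            return intent
    return None


def route(text: str) -> str:
    """Pick a treatment for the input based on its intent."""
    intent = classify_intent(text)
    if intent is None:
        return DEFAULT_ROUTE
    return INTENT_ROUTES[intent]


print(route("I need a refund for my last invoice."))
print(route("The app shows an error on startup"))
print(route("Tell me a joke"))
```

Inputs that match no intent fall through to direct LLM inference, which mirrors how out-of-domain queries bypass retrieval.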

Diagram showing a user question routed to a top-level agent, followed by a Cohere reranker and multiple agents performing summarization, embeddings, and document processing.
Evolution from static prompts to Agentic RAG systems
Find the original study here
Author: Cobus Greyling, Chief Evangelist