LangGraph Reflection

author: BZ
date: 2025-10-19

Overview

Reflection refers to feedback loops in which an agent critiques its own output and uses that critique to improve its next attempt. This note covers three related approaches:

Reflection: A prompting pattern (or module) in which an LLM agent evaluates its own output, identifies mistakes, and revises the result over one or more iterations.
Reflexion: A framework in which an LLM agent learns from verbal feedback (its own or from a reward model) across multiple episodes. It is akin to reinforcement learning (RL), but the "update" is a text reflection kept in memory rather than a change to model weights.
Language Agent Tree Search (LATS): A framework that combines LLMs with tree-search planning, inspired by algorithms such as Monte Carlo Tree Search (MCTS).

Reflection

The agent goes through a Generate → Reflect → Revise cycle:

Generation: The agent produces an initial answer or plan.
Reflection: The agent reviews that output, often with a separate LLM call, and identifies mistakes or possible improvements.
Revision: The agent updates its reasoning or final output based on the reflection.
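
The three steps above can be sketched as a small loop. This is a minimal illustration rather than LangGraph's own API: the names reflect_and_revise, toy_generate, toy_reflect, and toy_revise are hypothetical, and the toy functions stand in for real LLM calls so the example runs offline. The loop assumes that an empty critique means the reviewer is satisfied.

```python
from typing import Callable


def reflect_and_revise(
    task: str,
    generate: Callable[[str], str],
    reflect: Callable[[str, str], str],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> str:
    """Run a Generate -> Reflect -> Revise loop (hypothetical helper)."""
    draft = generate(task)                      # Generation
    for _ in range(max_rounds):
        critique = reflect(task, draft)         # Reflection
        if not critique:                        # assumption: empty critique = done
            break
        draft = revise(task, draft, critique)   # Revision
    return draft


# Toy stand-ins for LLM calls (assumptions for illustration only).
def toy_generate(task: str) -> str:
    return "2 + 2 = 5"


def toy_reflect(task: str, draft: str) -> str:
    return "The sum is wrong; 2 + 2 is 4." if draft.endswith("5") else ""


def toy_revise(task: str, draft: str, critique: str) -> str:
    return "2 + 2 = 4"


final_answer = reflect_and_revise(
    "What is 2 + 2?", toy_generate, toy_reflect, toy_revise
)
print(final_answer)
```

The max_rounds cap is a design choice in this sketch: without it, a reviewer that always finds something to criticize would loop forever.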

Reflexion paper

Each episode consists of three steps:

Trajectory Generation → The agent takes actions and generates an outcome.
Reflection → The agent verbalizes what worked or failed.
Improvement → The agent uses the reflection to guide the next episode’s reasoning or action sequence.

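Because each reflection is carried into the next episode, learning accumulates across episodes. The result is a form of meta-learning for long-term reasoning and self-correction that works through stored text rather than weight updates.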
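
The episode loop described above might look roughly like the following. It is a simplified sketch, not the paper's implementation: run_reflexion and the toy_* functions are hypothetical names, the evaluator is a plain boolean check, and the "memory" is just a list of reflection strings passed to the next episode.

```python
from typing import Callable, List, Tuple


def run_reflexion(
    task: str,
    act: Callable[[str, List[str]], str],
    evaluate: Callable[[str], bool],
    reflect: Callable[[str, str], str],
    max_episodes: int = 3,
) -> Tuple[str, List[str]]:
    """Repeat episodes, carrying verbal reflections forward (hypothetical helper)."""
    memory: List[str] = []                       # reflections from earlier episodes
    outcome = ""
    for _ in range(max_episodes):
        outcome = act(task, memory)              # Trajectory Generation
        if evaluate(outcome):                    # success: stop early
            break
        memory.append(reflect(task, outcome))    # Reflection, stored for next episode
    return outcome, memory


# Toy stand-ins for the agent, evaluator, and reflector (assumptions only).
def toy_act(task: str, memory: List[str]) -> str:
    numbers = [3, 1, 2]
    # Improvement: the toy agent only sorts once a stored lesson tells it to.
    if any("sort" in lesson for lesson in memory):
        numbers = sorted(numbers)
    return str(numbers)


def toy_evaluate(outcome: str) -> bool:
    return outcome == "[1, 2, 3]"


def toy_reflect(task: str, outcome: str) -> str:
    return "The list was returned unsorted; sort it before answering."


outcome, lessons = run_reflexion(
    "Return 3, 1, 2 in ascending order.", toy_act, toy_evaluate, toy_reflect
)
print(outcome, lessons)
```

In this toy run the first episode fails, one reflection is stored, and the second episode succeeds because the agent reads that reflection. No model parameters change at any point.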

LATS paper

LATS lets an LLM explore multiple reasoning paths, evaluate them, and select the best one, instead of following a single linear reasoning trace as in ReAct or standard chain-of-thought.

             Root (Question)
            /       |       \
      Step A1    Step B1    Step C1
         |          |          |
        ...        ...        ...
   (each branch grows via reasoning steps)

Algorithmic Structure

A simplified LATS loop looks like:

Expansion: Generate multiple next-step reasoning candidates from the current node using an LLM.
Evaluation: Use another LLM call (or a value function) to score each candidate by quality, correctness, or expected utility.
Selection / Backpropagation: Propagate scores up the tree and select the best reasoning trajectory.
Termination: Stop when a branch reaches a high-confidence or goal state (e.g., complete answer or solved task).
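
A toy version of this loop is sketched below. It is an illustration under stated assumptions, not the LATS reference implementation: Node, backpropagate, select_leaf, and tree_search are hypothetical names, selection is purely greedy (MCTS-style implementations typically add an exploration term such as UCT), and the toy expand and score functions replace LLM calls with a letter-guessing task whose goal is the string "cat".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    state: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0   # running mean of scores backed up through this node
    visits: int = 0


def backpropagate(node: Optional[Node], score: float) -> None:
    """Propagate a score from a node up to the root, updating running means."""
    while node is not None:
        node.visits += 1
        node.value += (score - node.value) / node.visits
        node = node.parent


def select_leaf(root: Node) -> Node:
    """Greedy selection: follow the highest-valued child down to a leaf."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.value)
    return node


def tree_search(
    root_state: str,
    expand: Callable[[str], List[str]],
    score: Callable[[str], float],
    is_goal: Callable[[str], bool],
    max_iterations: int = 10,
) -> Optional[str]:
    root = Node(state=root_state)
    for _ in range(max_iterations):
        leaf = select_leaf(root)                      # Selection
        for candidate in expand(leaf.state):          # Expansion
            child = Node(state=candidate, parent=leaf)
            leaf.children.append(child)
            backpropagate(child, score(candidate))    # Evaluation + Backpropagation
            if is_goal(candidate):                    # Termination
                return candidate
    return None


# Toy stand-ins for LLM calls (assumptions for illustration only).
TARGET = "cat"


def toy_expand(state: str) -> List[str]:
    return [state + ch for ch in "act"] if len(state) < len(TARGET) else []


def toy_score(state: str) -> float:
    return sum(a == b for a, b in zip(state, TARGET)) / len(TARGET)


result = tree_search("", toy_expand, toy_score, lambda s: s == TARGET)
print(result)
```

Greedy selection keeps the sketch short but can get stuck on a branch that scored well early. Adding an exploration bonus to the key used in select_leaf is the usual remedy.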