@reyzowter
Hello Agents
🤖 Build AI Agents from Scratch
From theory to practice: master the design and implementation of AI agent systems.
Covers Agent Principles · Classic Paradigms · Framework Development · Real-World Cases

✨ What You'll Learn

📖

Completely Free & Open Source

All content is free, so you can grow together with the community.

🔍

Core Principles

Deep dive into Agent concepts, history, and classic paradigms to build a solid foundation.

🏗️

Hands-On Building

Master leading no-code platforms and Agent frameworks in practice.

🛠️

Build Your Own Framework

Build your own Agent framework from scratch using the OpenAI native API.

⚙️

Advanced Skills

Step-by-step: context engineering, memory, protocols, evaluation, and more.

🤝

Model Training

Master Agentic RL: the full pipeline from SFT to GRPO for LLM training.

🚀

Real-World Projects

Build a Smart Travel Assistant, Cyber Town, and other comprehensive projects.

💼

Interview Preparation

Study Agent interview questions to boost your career in AI.

📖 Course Navigation

Chapter | Key Content | Status
Part 1: Agent & LLM Foundations
Ch.1 Introduction to Agents | Agent definition, types, paradigms & applications | ✅ Done
Ch.2 History of Agents | From symbolic AI to LLM-powered Agents | ✅ Done
Ch.3 LLM Fundamentals | Transformer, prompting, major LLMs, and limitations | ✅ Done
Part 2: Build Your LLM Agent
Ch.4 Classic Agent Paradigms | Hands-on ReAct, Plan-and-Solve, Reflection | ✅ Done
Ch.5 No-Code Platforms | Coze, Dify, n8n and other low-code Agent platforms | ✅ Done
Ch.6 Framework Development | AutoGen, AgentScope, LangGraph and more | ✅ Done
Ch.7 Build Your Own Framework | Build an Agent framework from scratch | ✅ Done
Part 3: Advanced Topics
Ch.8 Memory & Retrieval | Memory systems, RAG, vector storage | ✅ Done
Ch.9 Context Engineering | "Context understanding" for continuous interaction | ✅ Done
Ch.10 Agent Communication Protocols | MCP, A2A, ANP protocol analysis | ✅ Done
Ch.11 Agentic-RL | Full pipeline from SFT to GRPO for LLM training | ✅ Done
Ch.12 Agent Evaluation | Core metrics, benchmarks & evaluation frameworks | ✅ Done
Part 4: Advanced Case Studies
Ch.13 Smart Travel Assistant | MCP and multi-agent collaboration in practice | ✅ Done
Ch.14 Automated Deep Research Agent | DeepResearch Agent reproduction & analysis | ✅ Done
Ch.15 Cyber Town Simulation | Combining Agents with games to simulate social dynamics | ✅ Done
Part 5: Capstone & Future
Ch.16 Capstone Project | Build your own complete multi-Agent application | ✅ Done

💡 How to Learn

Welcome, future AI systems builder! This course balances theory and practice, helping you master the design and development of single-agent to multi-agent systems end-to-end. It is especially suited for AI developers, software engineers, and students with a basic programming background, as well as self-learners with a strong interest in cutting-edge AI.

Before starting, you should have basic Python programming skills and a general understanding of large language models (e.g., knowing how to call an LLM via API). This course focuses on application and building, so no deep algorithmic or model-training background is required.

💡 Study tip: Agents are a fast-moving, highly practice-driven field. For the best results, we strongly recommend running, debugging, and even modifying every code snippet provided. All companion code is in the project's code folder.
Preface
A Note Before We Begin
Project origins, background, and reader guidance

📌 Project Origins

If 2024 was the year of the "model wars," then 2025 is undoubtedly the "Year of Agents." The technical focus is shifting from training larger foundation models to building smarter Agent applications. Yet systematic, practice-oriented tutorials remain scarce.

That's why we launched Hello-Agents: to provide a complete guide for building Agent systems from scratch, balancing theory and hands-on practice.

🎯 What This Course Is

Hello-Agents is a systematic Agent learning curriculum. Current Agent development falls into two main camps: software-engineering Agents (like Dify, Coze, and n8n, which are essentially LLM-backed workflow automation) and AI-native Agents driven by the model's own reasoning.

This course guides you to understand and build the latter: AI-native Agents. We'll cut through the surface of frameworks, start from the core principles of Agents, explore their architecture, understand classic paradigms, and ultimately help you build your own multi-Agent application.

🌟 We believe the best way to learn is by doing. We hope this course becomes your starting point for exploring the world of Agents, transforming you from an LLM "user" into an Agent "builder."

👥 Who This Is For

  • AI developers, software engineers, and students with basic programming skills
  • Self-learners with a strong interest in cutting-edge AI
  • Developers who want to level up from "using LLMs" to "building Agents"
  • Candidates preparing for Agent-related roles

📋 Prerequisites

  • Basic Python programming ability
  • General understanding of LLMs (knowing how to call an LLM via API is enough)
  • No deep algorithmic or model-training background required
⚠️ Note: This course focuses on application and building, not heavy math or theory. If you need in-depth model training knowledge, we recommend supplementing with other resources.
Part 1 · Agent & LLM Foundations
Chapter 1
Introduction to Agents
Agent definition, types, paradigms, and application scenarios

🤖 What Is an Agent?

An Agent is a computational entity that perceives its environment, makes decisions, and takes actions to achieve specific goals. Unlike traditional software, an Agent has autonomy, reactivity, proactivity, and social ability.

In AI, a modern LLM Agent uses a large language model as its "brain" and can use tools, plan tasks, interact with its environment, and complete complex tasks.

💡 A simple analogy: If the LLM is a highly knowledgeable expert, the Agent is that expert plus a pair of hands, a pair of eyes, and an action plan. It doesn't just answer questions; it actively solves problems.

🏷️ Core Components of an Agent

1. Perception

Agents perceive the state of their environment through various inputs (text, images, sensor data, etc.). LLM Agents typically receive information via user messages, tool outputs, and memory systems.

2. Planning

Agents decompose complex tasks into executable sub-steps and formulate action plans. This is the core capability of LLM Agents and what distinguishes them from plain LLMs.

3. Action

Agents execute concrete operations by calling tools (APIs, code execution, search, etc.) or interacting with other Agents, changing the state of the environment.

4. Memory

Agents maintain short-term memory (current conversation context) and long-term memory (persistent knowledge and experience) to support continuous interaction and learning.
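
The four components above can be wired into a single loop. The sketch below is a toy illustration rather than a real LLM Agent: `ToyAgent`, its methods, and the `calculator` tool are hypothetical names, and a keyword rule stands in for the decision an LLM would make.

```python
from typing import Callable


def calculator(expression: str) -> str:
    """Hypothetical tool: add the integers in an expression such as '2 + 3'."""
    return str(sum(int(part) for part in expression.split("+")))


class ToyAgent:
    def __init__(self, tools: dict[str, Callable[[str], str]]):
        self.tools = tools
        self.memory: list[str] = []  # short-term memory: a plain event log

    def perceive(self, user_input: str) -> str:
        # Perception: record the raw input so later steps can refer to it.
        self.memory.append(f"user: {user_input}")
        return user_input.strip()

    def plan(self, observation: str) -> tuple[str, str]:
        # Planning: a keyword rule stands in for the LLM's decision here.
        if "+" in observation:
            return "calculator", observation
        return "respond", observation

    def act(self, action: str, argument: str) -> str:
        # Action: call a registered tool, or fall back to a plain reply.
        if action in self.tools:
            result = self.tools[action](argument)
        else:
            result = f"I cannot act on: {argument}"
        self.memory.append(f"{action}: {result}")
        return result

    def run(self, user_input: str) -> str:
        observation = self.perceive(user_input)
        action, argument = self.plan(observation)
        return self.act(action, argument)


agent = ToyAgent(tools={"calculator": calculator})
answer = agent.run("2 + 3")
print(answer)        # 5
print(agent.memory)  # ['user: 2 + 3', 'calculator: 5']
```

In a real LLM Agent, `plan` would be a model call and `memory` would feed back into the prompt; the control flow stays the same.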

📊 Types of Agents

  • Simple Reflex Agents: Respond directly to current perception with no internal state
  • Model-Based Agents: Maintain an internal world model and consider consequences of actions
  • Goal-Based Agents: Plan and act with a goal in mind
  • Utility-Based Agents: Weigh different action options via a utility function
  • Learning Agents: Can learn from experience and improve behavior
  • Multi-Agent Systems: Multiple Agents work together to solve complex problems
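
To make the distinction between agent types concrete, here is a minimal sketch contrasting a simple reflex agent with a utility-based one. The function names, the driving scenario, and the utility weights are all made up for illustration.

```python
def reflex_agent(percept: str) -> str:
    """Simple reflex agent: a fixed condition-action rule, no internal state."""
    return "brake" if percept == "obstacle" else "drive"


def utility_agent(options: dict[str, dict[str, float]]) -> str:
    """Utility-based agent: score every option and pick the best one."""

    def utility(outcome: dict[str, float]) -> float:
        # Hypothetical utility: value time saved, penalize risk twice as much.
        return outcome["time_saved"] - 2.0 * outcome["risk"]

    return max(options, key=lambda name: utility(options[name]))


routes = {
    "highway": {"time_saved": 10.0, "risk": 4.0},   # utility 2.0
    "back_road": {"time_saved": 6.0, "risk": 1.0},  # utility 4.0
}
print(reflex_agent("obstacle"))  # brake
print(utility_agent(routes))     # back_road
```

The reflex agent can only react to the current percept, while the utility-based agent compares the expected outcomes of several options before acting.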

🌟 LLM Agent Application Scenarios

  • 🔍 Information Retrieval & Research: Automatically search, summarize, and analyze large volumes of information
  • 💻 Code Development Assistance: Code generation, debugging, and test automation
  • 📊 Data Analysis: Automated data processing and insight extraction
  • 🗣️ Customer Service: Intelligent support and problem resolution
  • 🎯 Task Automation: Workflow automation and task delegation
  • 🎮 Game NPCs: Game characters with realistic behavior
Python
from openai import OpenAI

client = OpenAI()

def simple_agent(user_query: str) -> str:
    """A minimal LLM Agent example"""
    messages = [
        {"role": "system", "content": "You are an intelligent assistant that can answer questions and perform simple tasks."},
        {"role": "user", "content": user_query}
    ]
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    
    return response.choices[0].message.content

# Test
result = simple_agent("Analyze the development trends in AI agents")
print(result)
Part 1 · Agent & LLM Foundations
Chapter 2
History of Agents
From symbolic AI to the LLM-powered Agent era

📅 Agent Development Timeline

The history of Agents is an important thread in AI research, running from early symbolic AI to today's LLM-centric Agents over several decades.

🏛️ Phase 1: Symbolic AI Era (1950s–1980s)

Early AI research was dominated by rule systems and expert systems, which used hand-crafted logical rules to simulate intelligent behavior.

  • STRIPS (1971): One of the earliest and most influential automated planning systems
  • MYCIN (1974): Medical diagnosis expert system with ~600 rules
  • Shakey the Robot (1972): An autonomous robot capable of perception and planning
⚠️ Limitations: Symbolic Agents relied heavily on hand-designed rules, were difficult to generalize to new scenarios, and could not handle uncertainty or ambiguous information.

🧠 Phase 2: Machine Learning & Reinforcement Learning (1990s–2010s)

With the rise of machine learning, Agents began learning behavior policies from data instead of relying on pre-written rules.

  • TD-Gammon (1992): Learned to play backgammon through self-play
  • Deep Blue (1997): Defeated chess world champion Garry Kasparov, relying mainly on large-scale search rather than learned policies
  • AlphaGo (2016): Defeated top professional Lee Sedol at Go, a landmark for deep reinforcement learning

🌐 Phase 3: The LLM Agent Era (2020s–Present)

The GPT series of large language models changed the Agent design paradigm: used as general-purpose "reasoning engines," LLMs gave Agents far broader capabilities than earlier approaches.

  • GPT-3 (2020): Demonstrated few-shot learning capabilities of LLMs
  • ChatGPT (2022): Brought conversational AI to a mass audience and popularized the RLHF training paradigm
  • ReAct (2022): An influential Agent paradigm that interleaves reasoning and action
  • AutoGPT (2023): Popularized the autonomous Agent concept at large scale
  • OpenAI Assistants API (2023): Official Agent-building platform
  • Model Context Protocol, MCP (2024): Anthropic's open protocol for standardizing how LLM applications connect to tools and data

🔮 Future Trends

  • From single Agents to multi-Agent collaborative systems
  • From text input to multimodal perception (vision, audio)
  • From passive response to proactive planning and long-term goal pursuit
  • From limited tools to an open tool ecosystem
  • Agent self-evolution and continuous learning
Part 1 · Agent & LLM Foundations
Chapter 3
LLM Fundamentals
Transformer architecture, prompt engineering, major LLMs, and their limitations

🏗️ Transformer Architecture

The Transformer is the core architecture of modern large language models, introduced by Google researchers in the 2017 paper "Attention Is All You Need." Its core mechanism is self-attention, which lets the model weigh every other token in the sequence when processing each token.

Core Components

  • Self-Attention: Allows every token to attend to every other token, capturing long-range dependencies
  • Feed-Forward Network: Processes each position independently for feature transformation
  • Layer Normalization: Stabilizes training and speeds up convergence
  • Positional Encoding: Injects position information, since self-attention by itself is insensitive to token order
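
As a rough illustration of the self-attention component, the sketch below computes scaled dot-product attention in plain Python. It is deliberately simplified: queries, keys, and values are all set to the input itself, whereas a real Transformer first applies learned projection matrices, multiple heads, and masking.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product self-attention with Q = K = V = x (a simplification)."""
    d = len(x[0])
    output = []
    for query in x:
        # Score this token against every token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        # Each output row is the attention-weighted average of all value rows.
        output.append([sum(w * value[j] for w, value in zip(weights, x)) for j in range(d)])
    return output


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
for row in attended:
    print([round(v, 3) for v in row])
```

Every output row mixes information from all input tokens, which is how attention captures long-range dependencies in a single step.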

✍️ Prompt Engineering

Prompt engineering is the art of designing inputs to make LLMs produce the desired outputs. For Agent development, mastering prompting is fundamental.

Key Techniques

  • Zero-Shot: Directly describe the task without examples
  • Few-Shot: Provide a few examples to guide the model
  • Chain-of-Thought (CoT): Guide the model to reason step by step, improving complex task performance
  • System Prompt: Define the Agent's role and behavior guidelines
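
The four techniques above differ mainly in how the message list is assembled. The helpers below are a minimal sketch using the chat message format already seen in this course; the function names and the sentiment task are illustrative assumptions, and no model is called.

```python
def zero_shot(task: str) -> list[dict]:
    """Zero-shot: state the task with no examples."""
    return [{"role": "user", "content": task}]


def few_shot(task: str, examples: list[tuple[str, str]]) -> list[dict]:
    """Few-shot: show worked input/output pairs before the real task."""
    messages: list[dict] = []
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": task})
    return messages


def chain_of_thought(task: str) -> list[dict]:
    """Chain-of-thought: ask the model to reason before answering."""
    return [{"role": "user", "content": f"{task}\nLet's think step by step."}]


def with_system_prompt(system: str, messages: list[dict]) -> list[dict]:
    """System prompt: prepend the Agent's role and behavior guidelines."""
    return [{"role": "system", "content": system}] + messages


examples = [
    ("Sentiment of 'great movie'?", "positive"),
    ("Sentiment of 'awful food'?", "negative"),
]
prompt = with_system_prompt(
    "You are a sentiment classifier.",
    few_shot("Sentiment of 'slow service'?", examples),
)
for message in prompt:
    print(message["role"], "|", message["content"])
```

The resulting list can be passed as `messages` to a chat completion call; the techniques can be combined, as the last example does with a system prompt plus few-shot pairs.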

🤖 Major LLMs

  • GPT-4o (OpenAI): Currently one of the most capable multimodal models
  • Claude 3.5 (Anthropic): Excellent reasoning, very long context window
  • Gemini 1.5 (Google): Multimodal capabilities, supports 1M+ context
  • Qwen 2.5 (Alibaba): Strong Chinese and code performance, open source
  • DeepSeek-V3 (DeepSeek): Top open-source model, cost-effective

⚠️ LLM Limitations

  • Knowledge Cutoff: Cannot access information after the training cutoff date
  • Hallucination: May generate plausible-sounding but factually incorrect information
  • Context Window Limits: Can only process a limited amount of text at once
  • No Execution Capability: Cannot directly perform actions such as searching the web or running code
💡 Why Agents? Agents overcome LLM limitations by providing tools (solving execution), RAG (solving knowledge cutoffs), and memory systems (solving context limits).
Part 2 · Build Your LLM Agent
Chapter 4
Classic Agent Paradigms
Hands-on implementation of ReAct, Plan-and-Solve, and Reflection

⚡ ReAct Paradigm

ReAct (Reasoning + Acting) is one of the most important Agent paradigms. It interleaves reasoning (Thought) and action (Act) in a loop, allowing the Agent to continuously adjust its strategy based on environmental feedback.

ReAct Loop

  • Thought: LLM analyzes the current situation and decides the next step
  • Action: Execute a specific tool call or operation
  • Observation: Receive the result of the action
  • Repeat the loop until the task is complete
Python - ReAct Agent
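from openai import OpenAI
import json

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for real-time information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }
]

def execute_tool(tool_call) -> str:
    """Placeholder dispatcher (not in the original snippet): swap in a real search call."""
    args = json.loads(tool_call.function.arguments)
    if tool_call.function.name == "web_search":
        return f"(stub) no search backend configured for query: {args['query']}"
    return f"Unknown tool: {tool_call.function.name}"

def react_agent(user_query: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are an intelligent assistant. You can use tools to help users solve problems."},
        {"role": "user", "content": user_query}
    ]
    
    # Bound the loop so a model that keeps calling tools cannot run forever
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        message = response.choices[0].message
        
        if message.tool_calls:
            # Action + Observation: run each requested tool and feed the result back
            messages.append(message)
            for tool_call in message.tool_calls:
                result = execute_tool(tool_call)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })
        else:
            # No tool call means the model has produced its final answer
            return message.content
    
    return "Stopped: reached max_steps without a final answer."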

📋 Plan-and-Solve Paradigm

Plan-and-Solve divides tasks into two phases: first create a complete plan, then execute step by step. This approach suits complex tasks requiring long-range planning.

  • Planning Phase: LLM decomposes the task into an ordered list of sub-tasks
  • Execution Phase: Execute each sub-task in sequence and collect results
  • Integration Phase: Aggregate all sub-task results into a final answer
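
The three phases can be sketched as a single function. This is a minimal illustration under stated assumptions: the `llm` argument is any callable from prompt to text, the `PLAN:`/`EXECUTE:`/`INTEGRATE:` prefixes are an invented convention, and `fake_llm` is a stand-in so the example runs offline.

```python
from typing import Callable

LLM = Callable[[str], str]


def plan_and_solve(task: str, llm: LLM) -> str:
    # Planning phase: ask for an ordered list of sub-tasks, one per line.
    plan_text = llm(f"PLAN: {task}")
    steps = [line.strip() for line in plan_text.splitlines() if line.strip()]
    # Execution phase: run each sub-task in sequence and collect results.
    results = [llm(f"EXECUTE: {step}") for step in steps]
    # Integration phase: aggregate the sub-task results into a final answer.
    return llm("INTEGRATE: " + " | ".join(results))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    if prompt.startswith("PLAN:"):
        return "find flights\nbook hotel"
    if prompt.startswith("EXECUTE:"):
        return "done(" + prompt.removeprefix("EXECUTE: ") + ")"
    return "Summary: " + prompt.removeprefix("INTEGRATE: ")


print(plan_and_solve("Plan a trip to Tokyo", fake_llm))
# Summary: done(find flights) | done(book hotel)
```

Replacing `fake_llm` with a function that calls a chat model turns this into a working, if basic, Plan-and-Solve Agent.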

🔍 Reflection Paradigm

Reflection lets an Agent review its own output, identify errors, and improve. This greatly increases the quality of complex task outputs.

  • Generate: Produce an initial output
  • Reflect: Evaluate output quality and find issues
  • Refine: Improve the output based on reflection
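
A generate-reflect-refine loop might look like the sketch below. The three callables would normally be LLM calls; here they are toy stand-ins (all names are hypothetical) so that the loop itself is easy to follow and runs offline.

```python
from typing import Callable


def reflect_and_refine(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str], str],
    refine: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    draft = generate(task)  # Generate: produce an initial output
    for _ in range(max_rounds):
        feedback = critique(draft)  # Reflect: look for one issue
        if not feedback:            # Empty feedback means nothing left to fix
            break
        draft = refine(draft, feedback)  # Refine: apply the feedback
    return draft


# Toy stand-ins for the three model calls.
def generate(task: str) -> str:
    return "the agent plans"


def critique(draft: str) -> str:
    if not draft[0].isupper():
        return "capitalize"
    if not draft.endswith("."):
        return "add period"
    return ""


def refine(draft: str, feedback: str) -> str:
    if feedback == "capitalize":
        return draft[0].upper() + draft[1:]
    return draft + "."


print(reflect_and_refine("Describe planning", generate, critique, refine))
# The agent plans.
```

The `max_rounds` cap matters in practice: a critic that always finds something to change would otherwise loop indefinitely.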
💡 Practical tip: Use ReAct for simple tasks, Plan-and-Solve for tasks requiring long-range planning, and Reflection for tasks needing high-quality output. All three paradigms can also be combined.
Part 2 · Build Your LLM Agent
Chapter 5
No-Code Agent Platforms
Explore and use Coze, Dify, n8n, and other leading no-code Agent platforms

🎛️ Why No-Code Platforms?

No-code Agent platforms allow users without deep programming backgrounds to quickly build powerful Agent applications. These platforms use visual interfaces, pre-built components, and templates to dramatically lower the barrier to Agent development.

🔧 Coze

ByteDance's Agent development platform with a rich plugin ecosystem and a simple, intuitive conversation flow design interface.

  • Plugin Marketplace: Hundreds of pre-built plugins covering search, computation, content generation, and more
  • Workflow: Visually orchestrate complex multi-step task flows
  • Knowledge Base: Upload documents to build a private knowledge base
  • Publish Channels: One-click publish to WeChat, Feishu, Discord, and other platforms

🔧 Dify

An open-source LLM application development platform with private deployment support, and a popular choice for enterprise Agent applications.

  • Application Types: Supports conversational assistants, text generation, Agents, and more
  • RAG Capabilities: Powerful document processing and retrieval-augmented generation
  • Workflow Orchestration: Visual node-based workflows with conditional branching and loops
  • API Integration: Standard API for easy integration with existing systems

🔧 n8n

A powerful workflow automation platform with 400+ service integrations, ideal for building complex automated Agent workflows.

  • Node Connection: Drag and drop to connect nodes from different services
  • Triggers: Supports webhooks, scheduled tasks, event-driven, and more
  • Code Nodes: Insert custom JavaScript code when needed
  • Self-Hosted: Source-available under a fair-code license; can run on your own server

⚖️ Platform Comparison

📊 Selection guide: Rapid prototyping → Coze; Enterprise private deployment → Dify; Complex automated workflows → n8n; Need full customization → Write code and build your own framework (see Ch.7)
Part 2 · Build Your LLM Agent
Chapter 6
Framework Development
AutoGen, AgentScope, LangGraph, and other leading Agent frameworks

🔷 LangChain / LangGraph

LangChain is one of the most widely used LLM application development frameworks. LangGraph builds on it with a state graph abstraction (StateGraph) that is particularly suited to Agent systems with complex control flow.

Python - LangGraph
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict, List

class AgentState(TypedDict):
    messages: List[dict]
    next_step: str

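def create_agent_graph():
    # The LLM client would be used inside reason_node; it is created here for illustration
    llm = ChatOpenAI(model="gpt-4o")
    graph = StateGraph(AgentState)
    
    # Add nodes. reason_node, action_node, and observe_node are assumed to be
    # defined elsewhere: each takes the current AgentState and returns a dict
    # of state updates
    graph.add_node("reason", reason_node)
    graph.add_node("act", action_node)
    graph.add_node("observe", observe_node)
    
    # Add edges: reason -> act -> observe, then either loop back or stop
    graph.add_edge("reason", "act")
    graph.add_edge("act", "observe")
    # should_continue is also assumed to be defined elsewhere; it inspects the
    # state and returns "continue" or "end"
    graph.add_conditional_edges(
        "observe",
        should_continue,
        {"continue": "reason", "end": END}
    )
    
    graph.set_entry_point("reason")
    return graph.compile()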

🔷 AutoGen

Microsoft's open-source multi-Agent conversation framework, focused on multi-Agent collaboration. Its core idea is letting multiple Agents with different roles converse with each other to solve complex problems.

  • ConversableAgent: Base Agent type that can converse with other Agents
  • AssistantAgent: LLM-backed assistant Agent that writes replies and code for the task
  • UserProxyAgent: Proxy Agent that represents the user in conversations and can execute the code it receives
  • GroupChat: Group conversation supporting multiple Agents

🔷 AgentScope

Alibaba's open-source multi-Agent framework, designed to make multi-Agent application development simpler and more reliable.

  • Message System: Unified message format based on Msg
  • Pipeline: Flexible Agent collaboration pipelines
  • Distributed: Native support for distributed multi-Agent systems

💡 How to Choose a Framework?

  • Want a mature ecosystem with lots of examples → LangChain / LangGraph
  • Focus on multi-Agent collaboration → AutoGen
  • Want to understand Agents from the ground up → Build your own (Ch.7)
Part 2 · Build Your LLM Agent
Chapter 7
Build Your Own Agent Framework
Build a complete Agent framework from scratch: HelloAgents

🎯 Why Build Your Own Framework?

The best way to understand an Agent framework is to implement one yourself. By building from scratch, you'll deeply understand every component's role, rather than just "using the wheel." This chapter walks you through building HelloAgents, a clean Agent framework built on the OpenAI native API.

🏗️ HelloAgents Architecture

Core Components

  • Agent Core: LLM calls, tool management, message history
  • Tool System: Standardized tool registration and invocation interface
  • Memory System: Short-term and long-term memory management
  • Planner: Task decomposition and execution plan generation
Python - HelloAgents Core
from openai import OpenAI
from typing import Callable, Any
import json

class Tool:
    def __init__(self, func: Callable, name: str, description: str, parameters: dict):
        self.func = func
        self.name = name
        self.description = description
        self.parameters = parameters
    
    def to_openai_schema(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters
            }
        }
    
    def __call__(self, **kwargs) -> Any:
        return self.func(**kwargs)

class HelloAgent:
    def __init__(self, model: str = "gpt-4o", system_prompt: str = ""):
        self.client = OpenAI()
        self.model = model
        self.system_prompt = system_prompt
        self.tools: dict[str, Tool] = {}
        self.messages: list[dict] = []
        
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})
    
    def register_tool(self, tool: Tool):
        """Register a tool"""
        self.tools[tool.name] = tool
    
    def run(self, user_input: str) -> str:
        """Run the Agent"""
        self.messages.append({"role": "user", "content": user_input})
        
        while True:
            # Only pass tool parameters when tools are registered;
            # sending tools=None may be rejected by the API
            request = {"model": self.model, "messages": self.messages}
            if self.tools:
                request["tools"] = [t.to_openai_schema() for t in self.tools.values()]
                request["tool_choice"] = "auto"
            response = self.client.chat.completions.create(**request)
            
            assistant_msg = response.choices[0].message
            self.messages.append(assistant_msg)
            
            if not assistant_msg.tool_calls:
                return assistant_msg.content
            
            # Execute tool calls; every tool call must get a tool message in reply
            for tool_call in assistant_msg.tool_calls:
                tool_name = tool_call.function.name
                tool_args = json.loads(tool_call.function.arguments)
                
                if tool_name in self.tools:
                    result = self.tools[tool_name](**tool_args)
                else:
                    result = f"Error: unknown tool '{tool_name}'"
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result)
                })

🔧 Using HelloAgents

Python - Example Usage
agent = HelloAgent(
    model="gpt-4o",
    system_prompt="You are a professional data analysis assistant"
)

# Register a tool
def get_weather(city: str) -> str:
    return f"It's sunny in {city} today, 25°C"

weather_tool = Tool(
    func=get_weather,
    name="get_weather",
    description="Get the current weather for a city",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
    }
)

agent.register_tool(weather_tool)
result = agent.run("What's the weather like in New York today?")
print(result)
💡 Full code and more examples: github.com/jjyaoao/helloagents
Part 3 · Advanced Topics
Chapter 8
Memory & Retrieval
Memory system design, RAG principles and practice, vector storage

🧠 Agent Memory Systems

Memory is the key to enabling Agents to learn continuously and accumulate knowledge across sessions. Drawing on human memory, Agent memory can be categorized as follows:

Memory Types

  • Sensory Memory: Brief retention of raw input, e.g., the current token stream
  • Working Memory (Short-Term): Current conversation context, i.e., the messages list
  • Long-Term Memory: Persistent cross-session knowledge stored in databases or vector stores
  • Procedural Memory: Skills and behavior patterns, implemented via few-shot examples or fine-tuning
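
As a rough sketch of how working and long-term memory can sit side by side, the class below keeps a bounded message window and a separate fact store. `AgentMemory` and its methods are hypothetical names, and word overlap is used for recall purely as a stand-in for the vector search a real system would use.

```python
from collections import deque


class AgentMemory:
    def __init__(self, working_size: int = 4):
        # Working memory: only the most recent messages are kept.
        self.working: deque = deque(maxlen=working_size)
        # Long-term memory: a list stands in for a database or vector store.
        self.long_term: list[str] = []

    def add_message(self, role: str, content: str) -> None:
        self.working.append({"role": role, "content": content})

    def remember(self, fact: str) -> None:
        self.long_term.append(fact)

    def recall(self, query: str, top_k: int = 2) -> list[str]:
        """Rank stored facts by word overlap with the query."""
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(fact.lower().split())), fact)
            for fact in self.long_term
        ]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [fact for _, fact in scored[:top_k]]


memory = AgentMemory(working_size=2)
memory.add_message("user", "hi")
memory.add_message("assistant", "hello")
memory.add_message("user", "what do I like?")  # evicts the oldest message
memory.remember("The user likes green tea")
memory.remember("The user lives in Paris")

print(len(memory.working))                          # 2
print(memory.recall("what tea does the user like"))
```

Recalled facts would typically be injected into the prompt alongside the working-memory messages before the next model call.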

🔍 Retrieval-Augmented Generation (RAG)

RAG is a core technique for overcoming LLM knowledge limitations: it retrieves relevant external knowledge at generation time to improve answer quality.

Basic RAG Pipeline

  • Document Processing: Split documents into appropriately sized chunks
  • Vectorization: Use an embedding model to convert chunks into vectors
  • Storage: Store vectors in a vector database or index (Chroma, Pinecone, FAISS, etc.)
  • Retrieval: Vectorize the user question and retrieve the most relevant chunks
  • Generation: Inject retrieved results into the prompt; LLM generates an answer accordingly
Python - Simple RAG Implementation
from openai import OpenAI
import chromadb

client = OpenAI()
chroma = chromadb.Client()
collection = chroma.get_or_create_collection("knowledge_base")  # safe to re-run in the same process

def add_documents(docs: list[str]):
    """Add documents to the knowledge base"""
    embeddings = client.embeddings.create(
        model="text-embedding-3-small",
        input=docs
    ).data
    
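    # Offset IDs by the current collection size so repeated calls do not reuse IDs
    start = collection.count()
    collection.add(
        documents=docs,
        embeddings=[e.embedding for e in embeddings],
        ids=[f"doc_{start + i}" for i in range(len(docs))]
    )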

def rag_query(question: str, top_k: int = 3) -> str:
    """RAG question answering"""
    # Retrieve relevant documents
    q_embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=[question]
    ).data[0].embedding
    
    results = collection.query(
        query_embeddings=[q_embedding],
        n_results=top_k
    )
    
    context = "\n".join(results['documents'][0])
    
    # Generate answer
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer the question based on the following context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

📈 Advanced RAG Techniques

  • Hybrid Retrieval: Combine vector retrieval with BM25 keyword retrieval
  • Reranking: Apply a second-pass ranking to retrieved results
  • Multi-Hop Retrieval: Solve complex reasoning problems through multiple retrieval rounds
  • Adaptive Retrieval: Dynamically adjust retrieval strategy based on question type
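
For hybrid retrieval, one common way to combine a vector ranking with a keyword (BM25) ranking is reciprocal rank fusion (RRF). The sketch below shows only the fusion step and assumes the two ranked lists of document IDs already exist; the IDs are made up, and RRF is one option among several, not necessarily the method this course uses.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k = 60 is a commonly used default.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]   # e.g., from embedding similarity
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # e.g., from BM25
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear near the top of both lists rise to the top of the fused list, and the fused list can then be passed to a reranker for a second pass.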
Part 3 · Advanced Topics
Chapter 9
Context Engineering
How to efficiently manage an LLM's context window for continuous interaction

📝 What Is Context Engineering?

Context Engineering is the systematic design, management, and optimization of the context fed to an LLM to maximize output quality within a limited context window.

💡 Andrej Karpathy's description: "Context engineering is the delicate art and science of filling the context window with just the right information for the next step."

🗂️ What Makes Up the Context Window?

  • System Prompt: Defines the Agent's role, capabilities, and behavior guidelines
  • Tool Definitions: Tells the LLM which tools are available and their parameters
  • Conversation History: Records the user–Agent interaction history
  • External Knowledge: Relevant documents retrieved from the RAG system
  • Tool Outputs: Results from previous tool calls
  • User Input: The current user message
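
The components listed above can be assembled into a single request roughly as follows. This is a sketch under assumptions: `build_context` is a hypothetical helper, retrieved knowledge is injected as an extra system message (one of several reasonable placements), and earlier tool outputs are assumed to already be part of `history`.

```python
def build_context(
    system_prompt: str,
    tools: list[dict],
    history: list[dict],
    retrieved_docs: list[str],
    user_input: str,
) -> dict:
    """Assemble one request from the context-window components."""
    messages = [{"role": "system", "content": system_prompt}]
    if retrieved_docs:
        # External knowledge goes in as an extra system message.
        knowledge = "\n".join(f"- {doc}" for doc in retrieved_docs)
        messages.append({"role": "system", "content": f"Relevant knowledge:\n{knowledge}"})
    # Conversation history, including any earlier tool outputs, in order.
    messages.extend(history)
    # The current user message always comes last.
    messages.append({"role": "user", "content": user_input})
    return {"messages": messages, "tools": tools}


request = build_context(
    system_prompt="You are a travel assistant.",
    tools=[{"type": "function", "function": {"name": "search_flights"}}],
    history=[
        {"role": "user", "content": "I want to visit Japan."},
        {"role": "assistant", "content": "Great, when would you like to go?"},
    ],
    retrieved_docs=["Cherry blossom season in Tokyo is usually late March to early April."],
    user_input="Early April. Can you find flights?",
)
for message in request["messages"]:
    print(message["role"], "|", message["content"][:50])
```

Keeping assembly in one function makes it easy to measure, log, and trim each component when the window fills up.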

⚙️ Core Techniques

1. Conversation Compression

When conversation history grows too long and exceeds the context window, compression is needed:

  • Sliding Window: Keep only the most recent N turns of dialogue
  • Summary Compression: Use an LLM to generate a summary of the conversation history
  • Importance Filtering: Retain key information based on importance scores
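
Of the three compression strategies, the sliding window is the simplest to implement. The sketch below assumes a plain alternating user/assistant history; the function name is hypothetical, and a production version would also need to keep tool-call messages together with their results.

```python
def sliding_window(messages: list[dict], max_turns: int) -> list[dict]:
    """Keep all system messages plus the last max_turns user/assistant turns."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # One turn = one user message plus its reply, so keep 2 * max_turns messages.
    kept = dialogue[-2 * max_turns:] if max_turns > 0 else []
    return system + kept


history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
    {"role": "assistant", "content": "a3"},
]
trimmed = sliding_window(history, max_turns=1)
print([m["content"] for m in trimmed])
# ['You are helpful.', 'q3', 'a3']
```

Summary compression can be layered on top by replacing the dropped turns with a single summary message instead of discarding them.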

2. Dynamic Context Injection

Dynamically decide which context to inject based on the current question:

  • Retrieve relevant memories based on user intent
  • Select relevant tools based on task type
  • Adjust the system prompt based on the conversation stage
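
As one illustration of dynamic injection, the sketch below selects which tool definitions to include based on the current query. Word overlap between the query and each tool description is a deliberately crude relevance signal chosen to keep the example self-contained; the tool names and descriptions are hypothetical, and an embedding-based match would be more robust.

```python
def select_tools(query: str, tools: list[dict], max_tools: int = 2) -> list[dict]:
    """Return up to max_tools tools whose descriptions overlap with the query."""
    query_words = set(query.lower().split())

    def overlap(tool: dict) -> int:
        return len(query_words & set(tool["description"].lower().split()))

    ranked = sorted(tools, key=overlap, reverse=True)
    return [tool for tool in ranked if overlap(tool) > 0][:max_tools]


all_tools = [
    {"name": "get_weather", "description": "current weather forecast for a city"},
    {"name": "search_flights", "description": "search flights between two cities"},
    {"name": "run_sql", "description": "run a sql query on sales data"},
]
selected = select_tools("what is the weather forecast for berlin", all_tools)
print([tool["name"] for tool in selected])
# ['get_weather']
```

Sending only the relevant tool definitions saves context space and reduces the chance of the model picking an unrelated tool.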

3. KV Cache Optimization

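Many LLM serving stacks and APIs cache the attention key/value (KV) state of a prompt prefix. Keeping unchanged content, such as the system prompt and tool definitions, at the very beginning of the context and identical across calls lets that prefix be reused, which can noticeably reduce latency and API cost where prompt caching is supported.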

💡 Best practice: When designing context, place the most important information at the very beginning (system prompt) and the very end (latest user input). LLMs tend to attend most reliably to content at these positions and can overlook details buried in the middle.
Part 3 · Advanced Topics
Chapter 10
Agent Communication Protocols
Deep analysis and practice of MCP, A2A, ANP, and other protocols

🌐 Why Do We Need Agent Communication Protocols?

As Agent systems grow more complex, communication between different Agents and between Agents and tools becomes a key challenge. Communication protocols address interoperability, allowing Agents and tools built by different developers to collaborate seamlessly.

🔌 MCP (Model Context Protocol)

An open standard protocol proposed by Anthropic that defines how LLM applications communicate with external tools and data sources.

  • Core Concepts: Server (tool provider) and Client (AI application)
  • Transport Layer: Supports stdio and HTTP-based transport (HTTP with SSE; later revisions of the specification replace it with Streamable HTTP)
  • Capability Types: Tools, Resources, Prompts
  • Ecosystem: Hundreds of official and community MCP Servers already available
Python - MCP Server Example
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
import mcp.types as types

app = Server("my-mcp-server")

@app.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="get_stock_price",
            description="Get the real-time price of a stock",
            inputSchema={
                "type": "object",
                "properties": {
                    "symbol": {"type": "string", "description": "Stock ticker symbol"}
                },
                "required": ["symbol"]
            }
        )
    ]

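def fetch_stock_price(symbol: str) -> float:
    """Placeholder (not in the original snippet): replace with a real market-data API call."""
    raise NotImplementedError("Connect a stock price data source here")

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "get_stock_price":
        symbol = arguments["symbol"]
        # Call the stock API
        price = fetch_stock_price(symbol)
        return [TextContent(type="text", text=f"{symbol} current price: {price}")]
    # Fail loudly on unknown tools instead of silently returning None
    raise ValueError(f"Unknown tool: {name}")

async def main():
    async with stdio_server() as (read_stream, write_stream):
        # Server.run also needs the initialization options
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())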

🤝 A2A (Agent-to-Agent) Protocol

A protocol proposed by Google for inter-Agent communication, focused on interoperability between different AI Agent systems. A2A defines standard ways for Agents to discover each other, declare capabilities, and delegate tasks.

🔗 ANP (Agent Network Protocol)

An Agent communication protocol for open networks, enabling decentralized discovery and communication between Agents on the open internet.

📌 Protocol selection guide: Tool integration → MCP; Multi-Agent enterprise systems → A2A; Open-network Agent ecosystems → ANP
Part 3 · Advanced Topics
Chapter 11
Agentic-RL
Full pipeline from SFT to GRPO for hands-on LLM training

🎓 Why Train Agents?

General-purpose LLMs often perform poorly on specific Agent tasks. Targeted training can teach models to use tools more effectively, plan, and execute complex tasks.

📚 Training Paradigm Overview

1. Supervised Fine-Tuning (SFT)

Train a model with supervised learning on high-quality Agent trajectory data so that it learns correct behavior patterns.

  • Data Format: (task, tool-call sequence, final result) triples
  • Advantages: Stable training, high data efficiency
  • Disadvantages: Requires large amounts of high-quality annotated data; hard to exceed expert demonstrations
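
To make the data format concrete, the sketch below flattens one (task, tool-call sequence, final result) triple into a prompt/completion pair. The field names (`tool`, `arguments`, `observation`) and the prompt/completion layout are assumptions for illustration; real SFT pipelines each define their own schema and chat template.

```python
import json


def trajectory_to_sft_example(task: str, tool_calls: list[dict], final_result: str) -> dict:
    """Flatten one Agent trajectory into a single supervised training example."""
    # Serialize each tool call on its own line so the model learns the sequence.
    steps = [json.dumps(call, sort_keys=True) for call in tool_calls]
    completion = "\n".join(steps + [f"Final answer: {final_result}"])
    return {"prompt": task, "completion": completion}


example = trajectory_to_sft_example(
    task="What is 17 * 3?",
    tool_calls=[
        {"tool": "calculator", "arguments": {"expression": "17 * 3"}, "observation": "51"}
    ],
    final_result="51",
)
print(example["prompt"])
print(example["completion"])
```

During training, the loss is typically computed only on the completion, so the model learns to produce the tool calls and the answer given the task.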

2. RLHF (Reinforcement Learning from Human Feedback)

Optimize model behavior using human feedback: human preference judgments on model outputs are used to train a reward model, which then guides reinforcement learning. This is the core training technique behind ChatGPT.

3. GRPO (Group Relative Policy Optimization)

An efficient RL algorithm proposed by DeepSeek and a key technique in training DeepSeek-R1.

  • Sample multiple outputs for the same question; score each one relative to the others in its group
  • No separate Critic model needed, which significantly reduces compute cost
  • Especially suited for tasks with clear correct answers, like math and code
Python - GRPO Reward Function Example
def compute_reward(completions: list[str], ground_truth: str) -> list[float]:
    """
    GRPO reward function example.
    Compute rewards for multiple model outputs on the same question.
    extract_answer is a helper assumed to be defined elsewhere.
    """
    rewards = []
    for completion in completions:
        # Format reward: does the output contain a chain of thought?
        format_reward = 1.0 if "<think>" in completion else 0.0
        
        # Correctness reward: is the final answer correct?
        if extract_answer(completion) == ground_truth:
            correctness_reward = 1.0
        else:
            correctness_reward = 0.0
        
        rewards.append(0.3 * format_reward + 0.7 * correctness_reward)
    
    return rewards
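
The "score them relative to each other within the group" step can be sketched as follows: each reward is normalized against the mean and standard deviation of its own group, which is what removes the need for a Critic. The `<answer>` tag convention used by `extract_answer` is an assumption for illustration, not something the text above specifies.

```python
import re
from statistics import mean, pstdev
from typing import Optional


def extract_answer(completion: str) -> Optional[str]:
    """Hypothetical helper: read the answer from an <answer>...</answer> tag."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return match.group(1).strip() if match else None


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every output earns the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


rewards = [1.0, 0.3, 0.3, 0.0]  # rewards for four sampled outputs of one question
advantages = group_relative_advantages(rewards)
```

Outputs that beat the group average get a positive advantage and are reinforced; those below it are discouraged. When all outputs score the same, every advantage is zero and the group contributes no learning signal.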

🛠️ Agent Training Best Practices

  • Start with small models and small datasets to verify your pipeline is correct
  • Reward function design is critical: a loosely specified reward can be exploited by the model, so check it against edge cases before training
  • Use tools like Weights & Biases to monitor training
  • Regularly evaluate on a held-out test set to detect overfitting
Part 3 · Advanced Topics
Chapter 12
Agent Evaluation
Core metrics, benchmarks & evaluation frameworks explained

Why Does Evaluation Matter?

No improvement without evaluation. In Agent systems, evaluation is especially difficult: Agent output is often a multi-step trajectory, not a single answer, and many tasks have no unique correct answer.

📊 Core Evaluation Dimensions

  • Task Success Rate: Proportion of tasks the Agent completes successfully
  • Step Efficiency: Average number of steps required to complete a task
  • Tool Accuracy: Proportion of correct tool selections and calls
  • Hallucination Rate: Frequency of fabricated information from the model
  • Cost Efficiency: Token consumption and API cost per task
  • Latency: Average time to complete a task
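
Two of the dimensions above, Tool Accuracy and Cost Efficiency, can be computed from per-task logs. The sketch below is one possible definition, not a standard one: `TaskRecord`, the position-wise matching rule, and the per-1k-token prices are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    expected_tools: list[str]  # tool names a reference solution would call, in order
    called_tools: list[str]    # tool names the agent actually called, in order
    prompt_tokens: int
    completion_tokens: int


def tool_accuracy(records: list[TaskRecord]) -> float:
    """Position-wise match rate, measured over the longer of the two call sequences."""
    correct = total = 0
    for r in records:
        total += max(len(r.expected_tools), len(r.called_tools))
        correct += sum(e == c for e, c in zip(r.expected_tools, r.called_tools))
    return correct / total if total else 0.0


def cost_per_task(records: list[TaskRecord], price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Average API cost per task; the prices are placeholders supplied by the caller."""
    if not records:
        return 0.0
    total = sum(
        r.prompt_tokens / 1000 * price_in_per_1k + r.completion_tokens / 1000 * price_out_per_1k
        for r in records
    )
    return total / len(records)


records = [
    TaskRecord(["search", "book"], ["search", "book"], 1000, 500),
    TaskRecord(["search"], ["lookup", "search"], 2000, 1000),
]
accuracy = tool_accuracy(records)
avg_cost = cost_per_task(records, price_in_per_1k=0.5, price_out_per_1k=1.0)
```

Counting against the longer sequence penalizes both missing and superfluous calls, so an Agent cannot raise its score by calling extra tools.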

🏆 Leading Agent Benchmarks

  • GAIA: Tests Agent general capability on real-world tasks
  • WebArena: Evaluates Agents' ability to operate web interfaces
  • SWE-bench: Evaluates Agents solving real software engineering problems
  • HotPotQA: Multi-hop question-answering and reasoning evaluation
  • AgentBench: Comprehensive multi-task Agent evaluation

🔧 Evaluation Framework in Practice

Python - Simple Evaluation Framework
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalResult:
    task_id: str
    success: bool
    steps: int
    tokens_used: int
    latency_ms: float
    error: Optional[str] = None

class AgentEvaluator:
    def __init__(self, agent, test_cases: list[dict]):
        # The agent is assumed to expose run(), step_count, and token_count
        self.agent = agent
        # Each test case is a dict with "id", "input", and a "judge" callable
        self.test_cases = test_cases
    
    def run(self) -> list[EvalResult]:
        results = []
        for case in self.test_cases:
            # perf_counter is monotonic, so it is safer for measuring durations
            start = time.perf_counter()
            try:
                output = self.agent.run(case["input"])
                success = bool(case["judge"](output))
                results.append(EvalResult(
                    task_id=case["id"],
                    success=success,
                    steps=self.agent.step_count,
                    tokens_used=self.agent.token_count,
                    latency_ms=(time.perf_counter() - start) * 1000
                ))
            except Exception as e:
                # A crash counts as a failed task; keep the message for debugging
                results.append(EvalResult(
                    task_id=case["id"],
                    success=False,
                    steps=0, tokens_used=0,
                    latency_ms=(time.perf_counter() - start) * 1000,
                    error=str(e)
                ))
        return results
    
    def summarize(self, results: list[EvalResult]) -> dict:
        if not results:
            # Avoid dividing by zero when no test case was run
            return {"success_rate": "n/a", "avg_steps": "n/a", "total_tokens": 0}
        success_rate = sum(r.success for r in results) / len(results)
        avg_steps = sum(r.steps for r in results) / len(results)
        return {
            "success_rate": f"{success_rate:.1%}",
            "avg_steps": f"{avg_steps:.1f}",
            "total_tokens": sum(r.tokens_used for r in results)
        }
💡 Evaluation tip: Build a fixed evaluation set and run it with every iteration; that's the only way to track real progress. Don't just look at success rate; step efficiency and cost matter equally.
Part 4 · Case Studies
Chapter 13
Smart Travel Assistant
Hands-on intelligent travel planning system using MCP and multi-Agent collaboration

Project Overview

The Smart Travel Assistant is a hands-on project that applies MCP tool protocols and multi-Agent collaboration. The system automatically plans itineraries, queries flights and hotels, and generates travel guides based on user needs.

🏗️ System Architecture

  • Master Agent: Understands user intent and coordinates sub-Agents
  • Flight Search Agent: Connects to flight search APIs via MCP
  • Hotel Recommendation Agent: Recommends accommodations based on preferences and budget
  • Attraction Planning Agent: Retrieves destination attraction information and recommended routes
  • Budget Calculation Agent: Aggregates and optimizes total travel costs

🔧 Core MCP Tools

  • search_flights(origin, dest, date): Search available flights
  • get_hotels(city, checkin, checkout, budget): Query hotels
  • get_attractions(city, interests): Get attraction recommendations
  • get_weather_forecast(city, dates): Query weather forecast
  • calculate_budget(items): Calculate travel budget
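
As an illustration of what one of these tools might do internally, here is a minimal sketch of `calculate_budget`. The `BudgetItem` structure, the extra `budget` parameter, and the shape of the returned dictionary are assumptions; the list above only names the tool and its `items` argument.

```python
from dataclasses import dataclass


@dataclass
class BudgetItem:
    category: str      # e.g. "flight", "hotel", "attraction"
    unit_price: float
    quantity: int = 1


def calculate_budget(items: list[BudgetItem], budget: float) -> dict:
    """Aggregate costs per category and compare the total against the budget."""
    by_category: dict[str, float] = {}
    for item in items:
        cost = item.unit_price * item.quantity
        by_category[item.category] = by_category.get(item.category, 0.0) + cost
    total = sum(by_category.values())
    return {
        "by_category": by_category,
        "total": total,
        "remaining": budget - total,
        "within_budget": total <= budget,
    }


items = [
    BudgetItem("flight", 800.0),
    BudgetItem("hotel", 120.0, quantity=7),
    BudgetItem("attraction", 25.0, quantity=4),
]
summary = calculate_budget(items, budget=2000.0)
```

Returning the per-category breakdown along with the total gives a coordinating Agent enough detail to decide which category to trim when the plan exceeds the budget.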
Python - Travel Assistant Master Workflow
import asyncio
from datetime import date

async def travel_agent_workflow(user_request: str) -> str:
    """
    Travel assistant master workflow.
    The agent objects used below (intent_parser, flight_agent, and so on)
    are assumed to be defined elsewhere.
    """
    # Step 1: Parse user intent
    parsed = await intent_parser.parse(user_request)
    # Example: {destination: "Tokyo", dates: ["2025-03-15", "2025-03-22"], budget: 2000}
    # The parser must also return an "origin" key, which the flight search uses
    
    # Step 2: Parallel queries (improve efficiency)
    flights, hotels, weather = await asyncio.gather(
        flight_agent.search(parsed["origin"], parsed["destination"], parsed["dates"]),
        hotel_agent.search(parsed["destination"], parsed["dates"], parsed["budget"]),
        weather_agent.forecast(parsed["destination"], parsed["dates"])
    )
    
    # Step 3: Attraction planning
    # Derive the trip length from the parsed dates instead of hard-coding it
    start, end = (date.fromisoformat(d) for d in parsed["dates"])
    attractions = await attraction_agent.plan(
        city=parsed["destination"],
        duration=(end - start).days,
        interests=parsed.get("interests", [])
    )
    
    # Step 4: Generate comprehensive travel plan
    plan = await planner_agent.generate(
        flights=flights,
        hotels=hotels,
        attractions=attractions,
        weather=weather,
        budget=parsed["budget"]
    )
    
    return plan.to_markdown()
🗺️ Result: The user inputs "Plan a 7-day trip to Tokyo next week with a $2000 budget" and the system generates a complete travel plan (flight recommendations, hotel choices, a daily itinerary, and a cost breakdown) in under 30 seconds.
Part 4 · Case Studies
Chapter 14
Automated Deep Research Agent
DeepResearch Agent reproduction & analysis โ€” making AI work like a researcher

🔬 What Is a DeepResearch Agent?

DeepResearch is a deep research feature from OpenAI that can autonomously perform multi-round web searches, read literature, integrate information, and finally produce professional-grade research reports. This chapter reproduces its core logic.

Core Workflow

  • Problem Analysis: Decompose complex research questions into searchable sub-questions
  • Iterative Search: Multiple rounds of search, each refining the strategy based on the previous round
  • Content Extraction: Extract key information from search results
  • Knowledge Integration: Integrate information from multiple sources into a coherent knowledge base
  • Report Generation: Generate a structured research report from the integrated knowledge
Python - DeepResearch Core Loop
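import asyncio

class DeepResearchAgent:
    # Helper methods such as generate_search_plan, web_search, and
    # extract_information are assumed to be implemented elsewhere.
    def __init__(self, max_iterations: int = 5):
        self.max_iterations = max_iterations
        self.knowledge_base = []
    
    async def research(self, topic: str) -> str:
        # Start each run with an empty knowledge base so topics do not mix
        self.knowledge_base = []
        # 1. Generate initial search plan
        search_plan = await self.generate_search_plan(topic)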
        
        for iteration in range(self.max_iterations):
            # 2. Execute searches
            search_results = await asyncio.gather(*[
                self.web_search(query) 
                for query in search_plan.queries
            ])
            
            # 3. Extract and validate information
            extracted = await self.extract_information(search_results)
            self.knowledge_base.extend(extracted)
            
            # 4. Assess knowledge completeness
            assessment = await self.assess_knowledge_gaps(
                topic=topic,
                knowledge=self.knowledge_base
            )
            
            if assessment.is_sufficient:
                break
            
            # 5. Generate next round search plan
            search_plan = await self.generate_next_plan(assessment.gaps)
        
        # 6. Generate final report
        return await self.generate_report(topic, self.knowledge_base)

💡 Key Technical Points

  • Problem Decomposition: Use tree structures to break complex questions into sub-problems
  • Query Optimization: Dynamically adjust search queries based on existing knowledge
  • Deduplication: Identify and merge duplicate information from different sources
  • Citation Management: Maintain source citations to ensure reports are traceable
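
Deduplication and citation management from the list above can be combined in one pass: merge findings whose normalized text is identical, keep every source that reported them, and number the sources when rendering. This is a minimal sketch; `Finding`, the fingerprinting rule, and the `example.com` URLs are illustrative assumptions.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Finding:
    text: str
    sources: list[str] = field(default_factory=list)


def _fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing punctuation and whitespace."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_findings(findings: list[Finding]) -> list[Finding]:
    """Merge findings with the same fingerprint, keeping every distinct source."""
    merged: dict[str, Finding] = {}
    for f in findings:
        key = _fingerprint(f.text)
        if key not in merged:
            merged[key] = Finding(f.text, list(f.sources))
            continue
        for src in f.sources:
            if src not in merged[key].sources:
                merged[key].sources.append(src)
    return list(merged.values())


def render_with_citations(findings: list[Finding]) -> str:
    """Append numbered source markers to each finding and list the sources at the end."""
    refs: list[str] = []
    lines = []
    for f in findings:
        marks = []
        for src in f.sources:
            if src not in refs:
                refs.append(src)
            marks.append(f"[{refs.index(src) + 1}]")
        lines.append(f"{f.text} {''.join(marks)}")
    lines.append("")
    lines.extend(f"[{i}] {src}" for i, src in enumerate(refs, start=1))
    return "\n".join(lines)


raw = [
    Finding("GRPO removes the critic model.", ["https://example.com/a"]),
    Finding("grpo removes the critic model", ["https://example.com/b"]),
    Finding("SFT needs labeled trajectories.", ["https://example.com/a"]),
]
merged = merge_findings(raw)
report = render_with_citations(merged)
```

Exact fingerprints only catch near-verbatim repeats. Paraphrased duplicates would need a similarity measure such as embedding distance, which this sketch leaves out.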
Part 4 · Case Studies
Chapter 15
Cyber Town Simulation
Combining Agents with games to simulate real social dynamics and group behavior

🏙️ What Is Cyber Town?

Cyber Town is a multi-Agent social simulation project inspired by Stanford's "Generative Agents" paper and its Smallville sandbox town. In a virtual town, multiple Agents with independent personalities, memories, and goals live together and produce emergent social behavior.

🎮 System Architecture

Game World

  • Map System: Contains locations like homes, cafes, libraries, and parks
  • Time System: Simulates a 24-hour day
  • Interaction System: Agents can interact with objects and other Agents

Agent Design

  • Persona: Each Agent has a unique personality, profession, and interests
  • Memory System: Agents remember past experiences and conversations
  • Planning System: Agents create daily plans based on their goals
  • Social System: Agents can build relationships and hold conversations
Python - Town Resident Agent
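# MemoryStream, llm, TOWN_LOCATIONS, and the is_relevant method are assumed
# to be provided elsewhere in the project.
class TownResident: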
    def __init__(self, name: str, persona: str):
        self.name = name
        self.persona = persona
        self.memories = MemoryStream()
        self.location = "home"
        self.schedule = []
    
    async def perceive(self, environment: dict) -> list[str]:
        """Perceive the current environment"""
        observations = []
        for entity in environment.get("entities", []):
            if self.is_relevant(entity):
                observations.append(f"I see {entity['description']}")
                await self.memories.add(entity)
        return observations
    
    async def plan_day(self) -> list[dict]:
        """Plan the day's schedule"""
        context = f"You are {self.name}. {self.persona}\n"
        context += f"Today's memories: {await self.memories.get_recent(5)}\n"
        
        schedule = await llm.generate_schedule(
            context=context,
            current_time="08:00",
            available_locations=TOWN_LOCATIONS
        )
        self.schedule = schedule
        return schedule
    
    async def react(self, observation: str) -> str:
        """React to an observed event"""
        relevant_memories = await self.memories.retrieve(observation)
        
        response = await llm.chat(
            system=f"You are {self.name}. {self.persona}",
            context=relevant_memories,
            user=observation
        )
        
        await self.memories.add({
            "content": f"I said: {response}",
            "importance": 7
        })
        return response
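
The resident above depends on a memory object with `add`, `get_recent`, and `retrieve`. One way such a `MemoryStream` could work is sketched below: retrieval ranks memories by the sum of recency, importance, and relevance, loosely following the Generative Agents idea. The decay factor, the default importance, and the word-overlap relevance are simplifying assumptions.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class MemoryStream:
    decay: float = 0.9  # assumed recency decay per stored memory
    records: list[dict] = field(default_factory=list)

    async def add(self, memory: dict) -> None:
        # Store a normalized copy stamped with its insertion index
        self.records.append({
            "content": str(memory.get("content", memory.get("description", ""))),
            "importance": memory.get("importance", 5),  # assumed default on a 1-10 scale
            "step": len(self.records),
        })

    async def get_recent(self, n: int) -> list[str]:
        return [r["content"] for r in self.records[-n:]] if n > 0 else []

    async def retrieve(self, query: str, k: int = 3) -> list[str]:
        query_words = set(query.lower().split())
        newest = len(self.records) - 1

        def score(record: dict) -> float:
            recency = self.decay ** (newest - record["step"])
            importance = record["importance"] / 10
            words = set(record["content"].lower().split())
            relevance = len(query_words & words) / len(query_words) if query_words else 0.0
            return recency + importance + relevance

        ranked = sorted(self.records, key=score, reverse=True)
        return [r["content"] for r in ranked[:k]]


async def demo() -> list[str]:
    stream = MemoryStream()
    await stream.add({"content": "Ate breakfast at home", "importance": 2})
    await stream.add({"content": "Maria is hosting a party at the cafe", "importance": 8})
    await stream.add({"content": "Read a book in the library", "importance": 3})
    return await stream.retrieve("party at the cafe", k=1)


top = asyncio.run(demo())
```

Because importance and relevance are added to recency, an older but significant memory (the party) can outrank a more recent trivial one, which is what lets information like an invitation persist and spread.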
🌟 Emergent Behavior Example: After running for 48 hours, town residents spontaneously organized a "party": one Agent decided to host it, the invitation spread through the social network, and other Agents decided whether to attend based on their own plans. This behavior was not pre-programmed; it was a natural result of multi-Agent interaction.
Part 5 · Capstone & Future
Chapter 16
Capstone Project
Build your own complete multi-Agent application and demonstrate your learning

🎓 Capstone Goals

Congratulations on completing all of Hello-Agents! The capstone project is your chance to apply everything you've learned to build a complete multi-Agent application. Through this project, you will:

  • Apply Agent architecture, tool calling, memory systems, and multi-Agent collaboration
  • Experience the full Agent application development lifecycle
  • Build real project experience to enrich your portfolio

📋 Project Requirements

Core Requirements (Must Complete)

  • ✅ Implement at least one core Agent capability (tool calling, planning, etc.)
  • ✅ Integrate at least 3 different types of tools
  • ✅ Implement basic memory / context management
  • ✅ Provide clear code documentation and a README

Advanced Requirements (Optional)

  • ⭐ Implement multi-Agent collaboration
  • ⭐ Integrate a RAG knowledge base
  • ⭐ Deploy to the cloud with a public URL
  • ⭐ Build a web interface

💡 Topic Ideas

📚

Personal Knowledge Assistant

Manage personal notes and documents with intelligent Q&A and a knowledge graph

💼

Workplace Productivity Agent

Automate email handling, meeting notes, and task management

🎯

Learning Assistant

Personalized learning plans, question generation, and concept explanation

💻

Code Assistant

Code review, bug fixing, and documentation generation

🚀 Showcase & Share

After completing your capstone, share your work:

  • Share your project link in Issues or Discussions
  • Post on social media to let more people discover your work
  • Follow @reyzowter on X to share updates
🌟 Final words: The Agent field is evolving at an unprecedented pace. What you've learned here is a solid foundation for entering this space. Stay curious, keep up with the latest developments, and be bold in building and sharing your work.

You've grown from an LLM "user" into an Agent "builder"! 🎉
Community
Extra Chapters
Community-contributed supplementary content covering interviews, tools, and real-world experience
00

Community Capstone Projects

Showcases of outstanding capstone projects from the community

01

Agent Interview Questions

Curated high-frequency interview questions for Agent-related roles

01

Agent Interview Reference Answers

Reference answers and explanations for the interview questions

02

Context Engineering Supplement

In-depth extensions and case studies for Chapter 9

03

Dify Agent Step-by-Step Tutorial

A hand-holding guide to creating a Dify Agent from scratch

04

Hello-Agents Common Questions

Answers to the most common questions from course participants

05

Agent Skills vs. MCP Comparison

Technical deep-dive comparing two tool integration approaches

06

GUI Agent: Introduction & Practice

GUI Agent concepts, principles, and hands-on tutorials

07

Environment Setup Guide

Detailed OpenAI API and Python environment configuration

08

How to Write a Great Skill

Best practices and examples for Agent Skill design

09

Agent Development Lessons & Experience

Real-world lessons and insights from building Code Agents

10

Agent Self-Evolution

Four feedback loops and representative self-evolving Agent projects

11

Web Agent: Introduction & Practice

Web Agent principles, anti-scraping practice, and HelloAgents integration

12

Travel Assistant Post-Training

Training the travel planning demo into a real, usable planner

Community
Contributors
Thank you to everyone who contributed to Hello-Agents

🌟 Core Contributors

  • Chen Sizhou: Project Lead · Datawhale Member · Full text author & editor
  • Sun Tao: Co-Founder · Datawhale Member · CAMEL-AI
  • Jiang Shufan: Co-Founder · Datawhale Member · Exercise design & review
  • Huang Peilin: Datawhale Associate · Agent Dev Engineer · Chapter 5 contributor
  • Zeng Xinmin: Agent Engineer · Niuke Technology · Chapter 14 case study
  • Zhu Xinzhong: Advisor · Datawhale Chief Scientist · ZJU Professor

👥 Extra Chapter Contributors

  • WH: Content Contributor
  • Zhou Aojie: DW Contributor Team · Xi'an Jiaotong Univ. · Extra02 content
  • Zhang Chenxu: Independent Developer · Imperial College London · Extra03 content
  • Huang Honghan: DW Contributor Team · Shenzhen University · Extra04 content
  • Wang Dapeng: Datawhale Member · Senior Developer · Extra08 content
  • You Yihui: Independent Developer · NUIST · Extra09 content
  • Yin Xin: Independent Developer · Zhejiang University · Extra10 content
  • Pranav J.: Independent Developer · TinyFish · Extra11 content
  • Wang Yufei: Independent Developer · BUPT · Extra12 content

🤝 How to Contribute

We welcome all forms of contribution!

  • 🐛 Report Bugs: Found a content or code issue? Please open an Issue
  • 💡 Suggest Ideas: Have a great idea for the project? Start a discussion
  • 📝 Improve Content: Help improve the tutorial; submit your PR
  • ✍️ Share Your Work: Share your notes and projects in the community section