Hello Agents · Build AI Agents from Scratch


🤖 Build AI Agents from Scratch

From theory to practice — master the design and implementation of AI agent systems.
Covers Agent Principles · Classic Paradigms · Framework Development · Real-World Cases



✨ What You'll Learn

📖

Completely Free & Open Source

All content is free — grow together with the community.

🔍

Core Principles

Deep dive into Agent concepts, history, and classic paradigms to build a solid foundation.

🏗️

Hands-On Building

Master leading no-code platforms and Agent frameworks in practice.

🛠️

Build Your Own Framework

Build your own Agent framework from scratch using the OpenAI native API.

⚙️

Advanced Skills

Step-by-step: context engineering, memory, protocols, evaluation, and more.

🤝

Model Training

Master Agentic RL — full pipeline from SFT to GRPO for LLM training.

🚀

Real-World Projects

Build a Smart Travel Assistant, Cyber Town, and other comprehensive projects.

💼

Interview Preparation

Study Agent interview questions to boost your career in AI.

📖 Course Navigation

Chapter	Key Content	Status
Part 1: Agent & LLM Foundations
Ch.1 Introduction to Agents	Agent definition, types, paradigms & applications	✅ Done
Ch.2 History of Agents	From symbolic AI to LLM-powered Agents	✅ Done
Ch.3 LLM Fundamentals	Transformer, prompting, major LLMs, and limitations	✅ Done
Part 2: Build Your LLM Agent
Ch.4 Classic Agent Paradigms	Hands-on ReAct, Plan-and-Solve, Reflection	✅ Done
Ch.5 No-Code Platforms	Coze, Dify, n8n and other low-code Agent platforms	✅ Done
Ch.6 Framework Development	AutoGen, AgentScope, LangGraph and more	✅ Done
Ch.7 Build Your Own Framework	Build an Agent framework from scratch	✅ Done
Part 3: Advanced Topics
Ch.8 Memory & Retrieval	Memory systems, RAG, vector storage	✅ Done
Ch.9 Context Engineering	"Context understanding" for continuous interaction	✅ Done
Ch.10 Agent Communication Protocols	MCP, A2A, ANP protocol analysis	✅ Done
Ch.11 Agentic-RL	Full pipeline from SFT to GRPO for LLM training	✅ Done
Ch.12 Agent Evaluation	Core metrics, benchmarks & evaluation frameworks	✅ Done
Part 4: Advanced Case Studies
Ch.13 Smart Travel Assistant	MCP and multi-agent collaboration in practice	✅ Done
Ch.14 Automated Deep Research Agent	DeepResearch Agent reproduction & analysis	✅ Done
Ch.15 Cyber Town Simulation	Combining Agents with games to simulate social dynamics	✅ Done
Part 5: Capstone & Future
Ch.16 Capstone Project	Build your own complete multi-Agent application	✅ Done

💡 How to Learn

Welcome, future AI systems builder! This course balances theory and practice, helping you master the end-to-end design and development of both single-agent and multi-agent systems. It is especially suited for AI developers, software engineers, and students with a basic programming background, as well as self-learners with a strong interest in cutting-edge AI.

Before starting, you should have basic Python programming skills and a general understanding of large language models (e.g., knowing how to call an LLM via API). This course focuses on application and building — no deep algorithmic or model-training background is required.

💡 Study tip: Agents are a fast-moving, highly practice-driven field. For the best results, we strongly recommend running, debugging, and even modifying every code snippet provided. All companion code is in the project's code folder.


Preface

A Note Before We Begin

Project origins, background, and reader guidance

📌 Project Origins

If 2024 was the year of the "model wars," then 2025 is undoubtedly the "Year of Agents." The technical focus is shifting from training larger foundation models to building smarter Agent applications. Yet systematic, practice-oriented tutorials remain scarce.

That's why we launched Hello-Agents — to provide a complete guide for building Agent systems from scratch, balancing theory and hands-on practice.

🎯 What This Course Is

Hello-Agents is a systematic Agent learning curriculum. Current Agent development falls into two main camps. The first is software-engineering Agents built on platforms such as Dify, Coze, and n8n, which are essentially workflow automation with LLM calls at fixed steps. The second is AI-native Agents, in which the LLM itself reasons about the task and decides which actions to take.

This course guides you to understand and build the latter: truly AI-native Agents. Rather than stopping at the surface of frameworks, we'll start from the core principles of Agents, explore their architecture, understand classic paradigms, and ultimately build your own multi-Agent application.

🌟 We believe the best way to learn is by doing. We hope this course becomes your starting point for exploring the world of Agents — transforming you from an LLM "user" into an Agent "builder."

👥 Who This Is For

AI developers, software engineers, and students with basic programming skills
Self-learners with a strong interest in cutting-edge AI
Developers who want to level up from "using LLMs" to "building Agents"
Candidates preparing for Agent-related roles

📋 Prerequisites

Basic Python programming ability
General understanding of LLMs (knowing how to call an LLM via API is enough)
No deep algorithmic or model-training background required

⚠️ Note: This course focuses on application and building, not heavy math or theory. If you need in-depth model training knowledge, we recommend supplementing with other resources.

Part 1 · Agent & LLM Foundations

Chapter 1

Introduction to Agents

Agent definition, types, paradigms, and application scenarios

🤖 What Is an Agent?

An Agent is a computational entity that perceives its environment, makes decisions, and takes actions to achieve specific goals. Unlike traditional software, an Agent has autonomy, reactivity, proactivity, and social ability.

In modern AI, an LLM Agent uses a large language model as its "brain." It can use tools, plan tasks, and interact with its environment to complete complex tasks.

💡 A simple analogy: If the LLM is a highly knowledgeable expert, the Agent is that expert plus a pair of hands, a pair of eyes, and an action plan — it doesn't just answer questions, it actively solves problems.

🏷️ Core Components of an Agent

1. Perception

Agents perceive the state of their environment through various inputs (text, images, sensor data, etc.). LLM Agents typically receive information via user messages, tool outputs, and memory systems.

2. Planning

Agents decompose complex tasks into executable sub-steps and formulate action plans. This is the core capability of LLM Agents and what distinguishes them from plain LLMs.

3. Action

Agents execute concrete operations by calling tools (APIs, code execution, search, etc.) or interacting with other Agents, changing the state of the environment.

4. Memory

Agents maintain short-term memory (current conversation context) and long-term memory (persistent knowledge and experience) to support continuous interaction and learning.

📊 Types of Agents

Simple Reflex Agents: Respond directly to current perception with no internal state
Model-Based Agents: Maintain an internal world model and consider consequences of actions
Goal-Based Agents: Plan and act with a goal in mind
Utility-Based Agents: Weigh different action options via a utility function
Learning Agents: Can learn from experience and improve behavior
Multi-Agent Systems: Multiple Agents work together to solve complex problems
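The first two types can be made concrete with a toy sketch in the style of the classic two-square vacuum world. This example is an illustrative assumption, not part of the course material. The simple reflex agent maps the current percept straight to an action. The model-based agent also remembers which squares it has seen clean, so it can stop once everything is done.

Python - Simple Reflex vs Model-Based Agent

```python
def simple_reflex_agent(percept: tuple[str, str]) -> str:
    """Choose an action from the current percept only (no internal state)."""
    location, status = percept
    if status == "dirty":
        return "suck"
    return "right" if location == "A" else "left"


class ModelBasedAgent:
    """Keeps an internal model of which squares are known to be clean."""

    def __init__(self) -> None:
        self.known_clean: set[str] = set()

    def act(self, percept: tuple[str, str]) -> str:
        location, status = percept
        # Either the square is already clean, or it will be after "suck"
        self.known_clean.add(location)
        if status == "dirty":
            return "suck"
        if self.known_clean >= {"A", "B"}:
            return "no_op"  # the internal model says all work is done
        return "right" if location == "A" else "left"


reflex = simple_reflex_agent(("A", "clean"))
agent = ModelBasedAgent()
first = agent.act(("A", "clean"))
second = agent.act(("B", "clean"))
print(reflex, first, second)  # right right no_op
```

The reflex agent would keep shuttling between the two clean squares forever. The internal state is what lets the model-based agent recognize that the goal is already reached.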

🌟 LLM Agent Application Scenarios

🔍 Information Retrieval & Research: Automatically search, summarize, and analyze large volumes of information
💻 Code Development Assistance: Code generation, debugging, and test automation
📊 Data Analysis: Automated data processing and insight extraction
🗣️ Customer Service: Intelligent support and problem resolution
🎯 Task Automation: Workflow automation and task delegation
🎮 Game NPCs: Game characters with realistic behavior

Python

from openai import OpenAI

client = OpenAI()

def simple_agent(user_query: str) -> str:
    """A minimal LLM Agent example"""
    messages = [
        {"role": "system", "content": "You are an intelligent assistant that can answer questions and perform simple tasks."},
        {"role": "user", "content": user_query}
    ]
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    
    return response.choices[0].message.content

# Test
result = simple_agent("Analyze the development trends in AI agents")
print(result)


Chapter 2

History of Agents

From symbolic AI to the LLM-powered Agent era

📅 Agent Development Timeline

The history of Agents is an important thread in AI research — from early symbolic AI to today's LLM-centric modern Agents, spanning decades of evolution.

🏛️ Phase 1: Symbolic AI Era (1950s–1980s)

Early AI research was dominated by rule systems and expert systems — hand-crafted logical rules to simulate intelligent behavior.

Shakey the Robot (1966–1972): An autonomous mobile robot capable of perception and planning
STRIPS (1971): An influential early automated planning system, developed for Shakey
MYCIN (1974): Medical diagnosis expert system with ~600 rules

⚠️ Limitations: Symbolic Agents relied heavily on hand-designed rules, were difficult to generalize to new scenarios, and could not handle uncertainty or ambiguous information.

🧠 Phase 2: Machine Learning & Reinforcement Learning (1990s–2010s)

With the rise of machine learning, Agents began learning behavior policies from data instead of relying on pre-written rules.

TD-Gammon (1992): Learned to play backgammon through self-play
Deep Blue (1997): Defeated chess world champion Kasparov
AlphaGo (2016): Defeated top Go professional Lee Sedol, a landmark in deep reinforcement learning

🌐 Phase 3: The LLM Agent Era (2020s–Present)

The GPT series of large language models completely changed the Agent design paradigm. LLMs as powerful "reasoning engines" gave Agents unprecedented general capabilities.

GPT-3 (2020): Demonstrated few-shot learning capabilities of LLMs
ChatGPT (2022): Brought conversational AI to the mainstream and popularized the RLHF training paradigm
ReAct (2022): Influential paradigm that interleaves reasoning and acting in LLM Agents
AutoGPT (2023): Large-scale popularization of the autonomous Agent concept
OpenAI Assistants API (2023): OpenAI's official API for building assistants with tools and persistent threads
MCP (2024): Anthropic's Model Context Protocol, an open standard for connecting LLM applications to tools and data

🔮 Future Trends

From single Agents to multi-Agent collaborative systems
From text input to multimodal perception (vision, audio)
From passive response to proactive planning and long-term goal pursuit
From limited tools to an open tool ecosystem
Agent self-evolution and continuous learning


Chapter 3

LLM Fundamentals

Transformer architecture, prompt engineering, major LLMs, and their limitations

🏗️ Transformer Architecture

Transformer is the core architecture of modern large language models, introduced by Google in the 2017 paper "Attention is All You Need." Its core mechanism is Self-Attention, which allows the model to consider all other tokens in the sequence when processing each token.

Core Components

Self-Attention: Allows every token to attend to every other token, capturing long-range dependencies
Feed-Forward Network: Processes each position independently for feature transformation
Layer Normalization: Stabilizes training and speeds up convergence
Positional Encoding: Injects position information since Transformers have no inherent order
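The self-attention step can be sketched in plain Python (no NumPy, for readability). This is single-head scaled dot-product attention, softmax(Q·Kᵀ / √d_k)·V. For simplicity, the query, key, and value vectors are passed in directly. A real Transformer computes them with learned projection matrices and runs many heads in parallel; both are left out here.

Python - Scaled Dot-Product Self-Attention

```python
import math

Matrix = list[list[float]]  # one row (vector) per token


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention.

    Returns one output vector per token: a weighted mix of all value vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this token's query against every token's key (including its own)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # attention weights, summing to 1
        # Weighted sum of the value vectors, dimension by dimension
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Three tokens with 2-dimensional embeddings, reused as Q, K, and V for simplicity
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(x, x, x)
print([[round(v, 3) for v in row] for row in result])
```

Each output row mixes information from every token in the sequence. This is how self-attention captures long-range dependencies in a single layer.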

✍️ Prompt Engineering

Prompt engineering is the art of designing inputs to make LLMs produce the desired outputs. For Agent development, mastering prompting is fundamental.

Key Techniques

Zero-Shot: Directly describe the task without examples
Few-Shot: Provide a few examples to guide the model
Chain-of-Thought (CoT): Guide the model to reason step by step, improving complex task performance
System Prompt: Define the Agent's role and behavior guidelines
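The four techniques differ mainly in how the message list sent to a chat model is assembled. The sketch below only builds OpenAI-style message lists and makes no API call. The example tasks, the few-shot pairs, and the exact chain-of-thought wording are illustrative assumptions.

Python - Assembling Prompts

```python
def build_messages(task: str, system_prompt: str = "", examples=None,
                   chain_of_thought: bool = False) -> list[dict]:
    """Assemble a chat message list for zero-shot, few-shot, and/or CoT prompting."""
    messages = []
    if system_prompt:
        # System prompt: defines the role and behavior rules
        messages.append({"role": "system", "content": system_prompt})
    for question, answer in examples or []:
        # Few-shot: each example is a user/assistant pair the model can imitate
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    if chain_of_thought:
        # Chain-of-Thought: ask for step-by-step reasoning before the answer
        task = f"{task}\nThink step by step, then give the final answer on the last line."
    messages.append({"role": "user", "content": task})
    return messages


zero_shot = build_messages("Classify the sentiment: 'The battery dies in an hour.'")
few_shot = build_messages(
    "Classify the sentiment: 'The battery dies in an hour.'",
    system_prompt="You are a sentiment classifier. Answer with positive or negative.",
    examples=[("Classify the sentiment: 'Great screen!'", "positive"),
              ("Classify the sentiment: 'It broke on day one.'", "negative")],
)
cot = build_messages("A shop sells pens at 3 for $2. How much do 12 pens cost?", chain_of_thought=True)
print(len(zero_shot), len(few_shot), len(cot))  # 1 6 1
```

Any of these lists can be passed as `messages` to `client.chat.completions.create`.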

🤖 Major LLMs

GPT-4o (OpenAI): Currently one of the most capable multimodal models
Claude 3.5 (Anthropic): Excellent reasoning, very long context window
Gemini 1.5 (Google): Multimodal capabilities, supports 1M+ context
Qwen 2.5 (Alibaba): Strong Chinese and code performance, open source
DeepSeek-V3 (DeepSeek): Top open-source model, cost-effective

⚠️ LLM Limitations

Knowledge Cutoff: Cannot access information after the training cutoff date
Hallucination: May generate plausible-sounding but factually incorrect information
Context Window Limits: Can only process a limited amount of text at once
No Execution Capability: Cannot directly perform actions such as searching the web or running code

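💡 Why Agents? Agents mitigate these limitations rather than removing them. Tools supply the missing execution capability. RAG works around knowledge cutoffs and can reduce hallucination by grounding answers in retrieved sources. Memory systems work around context-window limits.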

Part 2 · Build Your LLM Agent

Chapter 4

Classic Agent Paradigms

Hands-on implementation of ReAct, Plan-and-Solve, and Reflection

⚡ ReAct Paradigm

ReAct (Reasoning + Acting) is one of the most important Agent paradigms. It interleaves reasoning (Thought) and acting (Action) in a loop, so the Agent can keep adjusting its strategy based on feedback (Observation) from the environment.

ReAct Loop

Thought: LLM analyzes the current situation and decides the next step
Action: Execute a specific tool call or operation
Observation: Receive the result of the action
Repeat the loop until the task is complete

Python - ReAct Agent
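from openai import OpenAI
import json

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for real-time information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }
]

def web_search(query: str) -> str:
    """Placeholder tool: replace with a call to a real search API."""
    return f"No search backend configured (query: {query})"

# Map tool names the model may call to the Python functions that implement them
TOOL_FUNCTIONS = {"web_search": web_search}

def execute_tool(tool_call) -> str:
    """Run the function the model asked for and return its result as text."""
    func = TOOL_FUNCTIONS.get(tool_call.function.name)
    if func is None:
        return f"Error: unknown tool {tool_call.function.name}"
    args = json.loads(tool_call.function.arguments)
    return str(func(**args))

def react_agent(user_query: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are an intelligent assistant. You can use tools to help users solve problems."},
        {"role": "user", "content": user_query}
    ]
    
    # Cap the number of Thought/Action/Observation rounds so the loop always ends
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        
        message = response.choices[0].message
        
        if message.tool_calls:
            # Action: the model requested one or more tool calls
            messages.append(message)
            for tool_call in message.tool_calls:
                # Observation: return each tool result, linked by tool_call_id
                result = execute_tool(tool_call)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })
        else:
            # No tool call: the model has produced its final answer
            return message.content
    
    return "Stopped: reached max_steps without a final answer."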

📋 Plan-and-Solve Paradigm

Plan-and-Solve divides tasks into two phases: first create a complete plan, then execute step by step. This approach suits complex tasks requiring long-range planning.

Planning Phase: LLM decomposes the task into an ordered list of sub-tasks
Execution Phase: Execute each sub-task in sequence and collect results
Integration Phase: Aggregate all sub-task results into a final answer
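Below is a minimal Plan-and-Solve sketch. To keep it runnable offline, the LLM is any function that takes a prompt string and returns text. This is a hypothetical `llm` callable you would back with a real chat API. The prompts and the numbered-list plan format are assumptions, not a fixed protocol.

Python - Plan-and-Solve Sketch

```python
import re
from typing import Callable

LLM = Callable[[str], str]  # prompt in, text out


def plan(task: str, llm: LLM) -> list[str]:
    """Planning phase: ask for a numbered list of sub-tasks and parse it."""
    text = llm(f"Break this task into numbered steps, one per line:\n{task}")
    steps = []
    for line in text.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            steps.append(match.group(1).strip())
    return steps


def plan_and_solve(task: str, llm: LLM) -> str:
    steps = plan(task, llm)
    results = []
    for i, step in enumerate(steps, start=1):
        # Execution phase: each step sees the task and the earlier results
        done = "\n".join(results)
        answer = llm(f"Task: {task}\nDone so far:\n{done}\nNow do: {step}")
        results.append(f"Step {i} ({step}): {answer}")
    # Integration phase: combine the step results into one answer
    return llm(f"Task: {task}\nStep results:\n" + "\n".join(results) + "\nWrite the final answer.")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the example runs without an API key."""
    if prompt.startswith("Break this task"):
        return "1. Find the population of each city\n2. Compare the numbers"
    if "Write the final answer" in prompt:
        return "FINAL: comparison complete"
    return "ok"


print(plan_and_solve("Which is larger, city X or city Y?", fake_llm))  # FINAL: comparison complete
```

Because the whole plan exists before execution starts, it can be shown to a user or checked before any tool is called. The trade-off is that a fixed plan adapts to surprises less readily than a ReAct loop.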

🔁 Reflection Paradigm

Reflection lets an Agent review its own output, identify errors, and revise it. This often improves output quality on complex tasks, at the cost of extra LLM calls.

Generate: Produce an initial output
Reflect: Evaluate output quality and find issues
Refine: Improve the output based on reflection
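The Generate, Reflect, Refine loop can be sketched the same way, with three hypothetical callables standing in for LLM calls. The stopping rule is an assumption: stop when the critique is exactly "OK", or after `max_rounds`. In practice the reflector is usually the same LLM prompted to act as a reviewer.

Python - Reflection Loop Sketch

```python
from typing import Callable


def reflection_loop(task: str,
                    generate: Callable[[str], str],
                    reflect: Callable[[str, str], str],
                    refine: Callable[[str, str, str], str],
                    max_rounds: int = 3) -> str:
    draft = generate(task)                      # Generate: initial output
    for _ in range(max_rounds):
        critique = reflect(task, draft)         # Reflect: look for problems
        if critique.strip() == "OK":
            break                               # the reviewer is satisfied
        draft = refine(task, draft, critique)   # Refine: address the critique
    return draft


# Stand-ins for LLM calls: the reviewer insists on a docstring
def generate(task: str) -> str:
    return "def add(a, b):\n    return a + b"


def reflect(task: str, draft: str) -> str:
    return "OK" if '"""' in draft else "Add a docstring."


def refine(task: str, draft: str, critique: str) -> str:
    header, body = draft.split("\n", 1)
    return f'{header}\n    """Return the sum of a and b."""\n{body}'


print(reflection_loop("Write an add function", generate, reflect, refine))
```

The `max_rounds` cap matters in practice. A reviewer that is never satisfied would otherwise keep the loop, and the bill, running indefinitely.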

💡 Practical tip: Use ReAct for simple tasks, Plan-and-Solve for tasks requiring long-range planning, and Reflection for tasks needing high-quality output. All three paradigms can also be combined.


Chapter 5

No-Code Agent Platforms

Explore and use Coze, Dify, n8n, and other leading no-code Agent platforms

🎛️ Why No-Code Platforms?

No-code Agent platforms allow users without deep programming backgrounds to quickly build powerful Agent applications. These platforms use visual interfaces, pre-built components, and templates to dramatically lower the barrier to Agent development.

🔧 Coze

ByteDance's Agent development platform with a rich plugin ecosystem and a simple, intuitive conversation flow design interface.

Plugin Marketplace: Hundreds of pre-built plugins covering search, computation, content generation, and more
Workflow: Visually orchestrate complex multi-step task flows
Knowledge Base: Upload documents to build a private knowledge base
Publish Channels: One-click publish to WeChat, Feishu, Discord, and other platforms

🔧 Dify

An open-source LLM application development platform with private deployment support — a popular choice for enterprise Agent applications.

Application Types: Supports conversational assistants, text generation, Agents, and more
RAG Capabilities: Powerful document processing and retrieval-augmented generation
Workflow Orchestration: Visual node-based workflows with conditional branching and loops
API Integration: Standard API for easy integration with existing systems
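As an example of API integration, the sketch below calls a Dify chat application over HTTP using only the standard library. It assumes Dify's chat endpoint as described in Dify's API docs: `POST /v1/chat-messages` with `query`, `inputs`, `response_mode`, and `user` fields, authenticated with a Bearer API key. Check the API page of your own app, since the base URL differs between Dify Cloud and self-hosted deployments. `DIFY_BASE_URL` and `DIFY_API_KEY` are placeholder names.

Python - Calling a Dify App

```python
import json
import os
import urllib.request

DIFY_BASE_URL = os.environ.get("DIFY_BASE_URL", "https://api.dify.ai/v1")  # placeholder
DIFY_API_KEY = os.environ.get("DIFY_API_KEY", "app-your-key")               # placeholder


def build_chat_request(query: str, user: str, conversation_id: str = "") -> urllib.request.Request:
    """Build (but do not send) a blocking chat request for a Dify chat app."""
    payload = {
        "inputs": {},                 # values for variables defined in the app, if any
        "query": query,
        "response_mode": "blocking",  # wait for the full answer instead of streaming
        "user": user,                 # your own identifier for the end user
    }
    if conversation_id:
        payload["conversation_id"] = conversation_id  # continue an existing conversation
    return urllib.request.Request(
        f"{DIFY_BASE_URL}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {DIFY_API_KEY}", "Content-Type": "application/json"},
        method="POST",
    )


def ask_dify(query: str, user: str = "demo-user") -> str:
    """Send the request and return the answer text (needs a reachable Dify app)."""
    with urllib.request.urlopen(build_chat_request(query, user), timeout=60) as resp:
        return json.loads(resp.read())["answer"]
```

With a real key, `print(ask_dify("What can you do?"))` returns the app's reply. Everything configured in the Dify UI, such as prompts, knowledge bases, and workflow nodes, runs behind that single endpoint.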

🔧 n8n

A powerful workflow automation platform with 400+ service integrations, ideal for building complex automated Agent workflows.

Node Connection: Drag and drop to connect nodes from different services
Triggers: Supports webhooks, scheduled tasks, event-driven, and more
Code Nodes: Insert custom JavaScript (or Python) code when needed
Self-Hosted: Fully open source — run on your own server

⚖️ Platform Comparison

📊 Selection guide: Rapid prototyping → Coze; Enterprise private deployment → Dify; Complex automated workflows → n8n; Need full customization → Write code and build your own framework (see Ch.7)


Chapter 6

Framework Development

AutoGen, AgentScope, LangGraph, and other leading Agent frameworks

🔷 LangChain / LangGraph

LangChain is one of the most widely used LLM application development frameworks. LangGraph builds on it by modeling an Agent as a state graph: nodes update a shared state and are connected by edges. This makes it particularly suited to Agent systems with complex control flow.

Python - LangGraph

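import operator
from typing import Annotated, TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    # operator.add makes LangGraph append node outputs to this list instead of overwriting it
    messages: Annotated[list, operator.add]
    steps: int

llm = ChatOpenAI(model="gpt-4o")

def reason_node(state: AgentState) -> dict:
    """Ask the LLM what to do next, given the conversation so far."""
    reply = llm.invoke(state["messages"])
    return {"messages": [reply], "steps": state["steps"] + 1}

def action_node(state: AgentState) -> dict:
    """Placeholder: run the tool chosen in the reasoning step (no tools wired up here)."""
    return {"messages": []}

def observe_node(state: AgentState) -> dict:
    """Placeholder: append tool results to the state as observations."""
    return {"messages": []}

def should_continue(state: AgentState) -> str:
    # Illustrative stopping rule: stop after three reasoning steps
    return "end" if state["steps"] >= 3 else "continue"

def create_agent_graph():
    graph = StateGraph(AgentState)
    
    # Add nodes
    graph.add_node("reason", reason_node)
    graph.add_node("act", action_node)
    graph.add_node("observe", observe_node)
    
    # Add edges
    graph.add_edge("reason", "act")
    graph.add_edge("act", "observe")
    graph.add_conditional_edges(
        "observe",
        should_continue,
        {"continue": "reason", "end": END}
    )
    
    graph.set_entry_point("reason")
    return graph.compile()

# Example: app = create_agent_graph()
# app.invoke({"messages": [{"role": "user", "content": "Summarize today's AI news"}], "steps": 0})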

🔷 AutoGen

Microsoft's open-source multi-Agent conversation framework, focused on multi-Agent collaboration. Its core idea is letting multiple Agents with different roles converse with each other to solve complex problems.

ConversableAgent: Base Agent type that can converse with other Agents
AssistantAgent: LLM-backed assistant that proposes solutions and writes code
UserProxyAgent: Proxy Agent representing the user in conversations; it can execute the code the assistant writes and request human input
GroupChat: Group conversation supporting multiple Agents
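The core AutoGen idea is role-specialized agents taking turns in a shared conversation. It can be sketched without the library; the classes below are a plain-Python illustration, not AutoGen's API. Each agent is a name plus a reply function, which would wrap an LLM call in practice. A round-robin group chat passes the shared history to each speaker in turn until a message contains "TERMINATE". That stop word is a convention used in AutoGen examples, adopted here as an assumption.

Python - Round-Robin Group Chat Sketch

```python
from dataclasses import dataclass
from typing import Callable

History = list[tuple[str, str]]  # (speaker name, message)


@dataclass
class ChatAgent:
    name: str
    reply: Callable[[History], str]  # in practice: an LLM call with a role-specific system prompt


def group_chat(agents: list[ChatAgent], task: str, max_turns: int = 6) -> History:
    """Round-robin group chat: agents speak in order over a shared history."""
    history: History = [("user", task)]
    for turn in range(max_turns):
        speaker = agents[turn % len(agents)]
        message = speaker.reply(history)
        history.append((speaker.name, message))
        if "TERMINATE" in message:
            break
    return history


# Stand-in replies: the coder reacts to review feedback, the reviewer approves on the second look
coder = ChatAgent("coder", lambda h: "# greet\nprint('hello')"
                  if any("comment" in m for _, m in h) else "print('hello')")
reviewer = ChatAgent("reviewer", lambda h: "Looks good. TERMINATE" if len(h) > 3
                     else "Please add a comment.")

for speaker, msg in group_chat([coder, reviewer], "Write a hello-world script"):
    print(f"{speaker}: {msg}")
```

The shared history is what makes this collaboration rather than a pipeline. Every agent sees everything said so far, including critiques of its own earlier messages.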

🔷 AgentScope

Alibaba's open-source multi-Agent framework, designed to make multi-Agent application development simpler and more reliable.

Message System: Unified message format based on Msg
Pipeline: Flexible Agent collaboration pipelines
Distributed: Native support for distributed multi-Agent systems
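AgentScope's message-plus-pipeline design can be approximated in a few lines. This is a hypothetical plain-Python imitation, not AgentScope code. A `Msg` record carries the sender name, content, and role. Each agent is a callable that takes a `Msg` and returns a new one. A sequential pipeline feeds each agent's output into the next.

Python - Message and Pipeline Sketch

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Msg:
    name: str     # who sent the message
    content: str
    role: str     # e.g. "user" or "assistant"


Agent = Callable[[Msg], Msg]


def sequential_pipeline(agents: list[Agent], msg: Msg) -> Msg:
    """Pass the message through each agent in order, like an assembly line."""
    for agent in agents:
        msg = agent(msg)
    return msg


def translator(msg: Msg) -> Msg:
    # Stand-in for an LLM-backed translation agent
    return Msg("translator", f"[EN] {msg.content}", "assistant")


def summarizer(msg: Msg) -> Msg:
    # Crude stand-in for an LLM-backed summarizer: keep the first 20 characters
    return Msg("summarizer", msg.content[:20], "assistant")


final = sequential_pipeline([translator, summarizer],
                            Msg("user", "Bonjour tout le monde, comment allez-vous ?", "user"))
print(final)
```

Because every agent speaks the same `Msg` format, agents can be reordered or swapped without changing the pipeline code. That uniformity is the point of a unified message type.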

💡 How to Choose a Framework?

Want a mature ecosystem with lots of examples → LangChain / LangGraph
Focus on multi-Agent collaboration → AutoGen
Want to understand Agents from the ground up → Build your own (Ch.7)


Chapter 7

Build Your Own Agent Framework

Build a complete Agent framework from scratch — HelloAgents

🎯 Why Build Your Own Framework?

The best way to understand an Agent framework is to implement one yourself. Building from scratch shows you what every component does, instead of treating the framework as a black box. This chapter walks you through building HelloAgents, a clean Agent framework built on the OpenAI native API.

🏗️ HelloAgents Architecture

Core Components

Agent Core: LLM calls, tool management, message history
Tool System: Standardized tool registration and invocation interface
Memory System: Short-term and long-term memory management
Planner: Task decomposition and execution plan generation

Python - HelloAgents Core

from openai import OpenAI
from typing import Callable, Any
import json

class Tool:
    def __init__(self, func: Callable, name: str, description: str, parameters: dict):
        self.func = func
        self.name = name
        self.description = description
        self.parameters = parameters  # JSON Schema describing the tool's arguments
    
    def to_openai_schema(self) -> dict:
        """Describe the tool in the format expected by the Chat Completions `tools` parameter"""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters
            }
        }
    
    def __call__(self, **kwargs) -> Any:
        return self.func(**kwargs)

class HelloAgent:
    def __init__(self, model: str = "gpt-4o", system_prompt: str = "", max_steps: int = 10):
        self.client = OpenAI()
        self.model = model
        self.system_prompt = system_prompt
        self.max_steps = max_steps  # upper bound on LLM calls per run()
        self.tools: dict[str, Tool] = {}
        self.messages: list = []  # message history, kept across run() calls
        
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})
    
    def register_tool(self, tool: Tool):
        """Register a tool"""
        self.tools[tool.name] = tool
    
    def run(self, user_input: str) -> str:
        """Run the Agent until it answers without requesting a tool"""
        self.messages.append({"role": "user", "content": user_input})
        
        for _ in range(self.max_steps):
            # Only send tool parameters when at least one tool is registered
            request = {"model": self.model, "messages": self.messages}
            if self.tools:
                request["tools"] = [t.to_openai_schema() for t in self.tools.values()]
                request["tool_choice"] = "auto"
            response = self.client.chat.completions.create(**request)
            
            assistant_msg = response.choices[0].message
            self.messages.append(assistant_msg)
            
            if not assistant_msg.tool_calls:
                return assistant_msg.content
            
            # Execute tool calls; every tool_call_id needs a "tool" reply, even on failure
            for tool_call in assistant_msg.tool_calls:
                tool_name = tool_call.function.name
                if tool_name in self.tools:
                    try:
                        tool_args = json.loads(tool_call.function.arguments)
                        result = self.tools[tool_name](**tool_args)
                    except Exception as exc:  # malformed arguments or a failing tool
                        result = f"Error running {tool_name}: {exc}"
                else:
                    result = f"Error: unknown tool {tool_name}"
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result)
                })
        
        return "Stopped: reached max_steps without a final answer."

🔧 Using HelloAgents

Python - Example Usage

agent = HelloAgent(
    model="gpt-4o",
    system_prompt="You are a professional data analysis assistant"
)

# Register a tool
def get_weather(city: str) -> str:
    return f"It's sunny in {city} today, 25°C"

weather_tool = Tool(
    func=get_weather,
    name="get_weather",
    description="Get the current weather for a city",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
    }
)

agent.register_tool(weather_tool)
result = agent.run("What's the weather like in New York today?")
print(result)

💡 Full code and more examples: github.com/jjyaoao/helloagents
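Writing the `parameters` JSON Schema by hand for every tool gets tedious. One common convenience is to derive the schema from the function's type hints and docstring with the standard `inspect` module. This is an addition, not necessarily how the HelloAgents repository does it. The helper below is hypothetical and handles only simple parameter types. It returns a dict whose keys match the `name`, `description`, and `parameters` arguments of the `Tool` class.

Python - Tool Schema from a Function Signature

```python
import inspect
from typing import Callable, get_type_hints

# Map Python annotations to JSON Schema types (simple types only)
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", list: "array", dict: "object"}


def schema_from_function(func: Callable) -> dict:
    """Build name/description/parameters for a tool from its signature and docstring."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        json_type = _JSON_TYPES.get(hints.get(name, str), "string")
        properties[name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value, so the model must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str, days: int = 1) -> str:
    """Get the weather forecast for a city."""
    return f"Sunny in {city} for {days} day(s)"


print(schema_from_function(get_weather))
```

With the `Tool` class above, registration becomes `agent.register_tool(Tool(func=get_weather, **schema_from_function(get_weather)))`.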

Part 3 · Advanced Topics

Chapter 8

Memory & Retrieval

Memory system design, RAG principles and practice, vector storage

🧠 Agent Memory Systems

Memory is the key to enabling Agents to learn continuously and accumulate knowledge across sessions. Drawing on human memory, Agent memory can be categorized as follows:

Memory Types

Sensory Memory: Brief retention of raw input — e.g., the current token stream
Working Memory (Short-Term): Current conversation context — the messages list
Long-Term Memory: Persistent cross-session knowledge stored in databases or vector stores
Procedural Memory: Skills and behavior patterns, implemented via few-shot examples or fine-tuning
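The split between working memory and long-term memory can be sketched as a small class. Working memory is a bounded `deque` of recent messages. Long-term memory is a list of facts searched by simple word overlap. The word-overlap scoring is a deliberately crude stand-in, labeled here as an assumption, for the embedding-based search described in the RAG section below.

Python - Working and Long-Term Memory Sketch

```python
from collections import deque


class AgentMemory:
    def __init__(self, working_size: int = 6):
        # Working memory: only the most recent messages are kept
        self.working = deque(maxlen=working_size)
        # Long-term memory: persists across sessions (a list here, a database in practice)
        self.long_term: list[str] = []

    def add_message(self, role: str, content: str) -> None:
        self.working.append({"role": role, "content": content})

    def remember(self, fact: str) -> None:
        self.long_term.append(fact)

    def recall(self, query: str, top_k: int = 2) -> list[str]:
        """Return the stored facts sharing the most words with the query."""
        q_words = set(query.lower().split())
        scored = [(len(q_words & set(f.lower().split())), f) for f in self.long_term]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [fact for _, fact in scored[:top_k]]


memory = AgentMemory(working_size=2)
memory.remember("the user prefers vegetarian food")
memory.remember("the user lives in berlin")
for i in range(3):
    memory.add_message("user", f"message {i}")
print(list(memory.working))                      # only the last 2 messages survive
print(memory.recall("suggest vegetarian food"))  # ['the user prefers vegetarian food']
```

Recalled facts would typically be injected into the prompt alongside the working-memory messages before each LLM call.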

🔍 Retrieval-Augmented Generation (RAG)

RAG is a core technique for working around LLM knowledge limitations. It retrieves relevant external knowledge at generation time and adds it to the prompt to improve answer quality.

Basic RAG Pipeline

Document Processing: Split documents into appropriately sized chunks
Vectorization: Use an embedding model to convert chunks into vectors
Storage: Store vectors in a vector database (Chroma, Pinecone, Faiss, etc.)
Retrieval: Vectorize the user question and retrieve the most relevant chunks
Generation: Inject retrieved results into the prompt; LLM generates an answer accordingly
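The implementation below skips step 1 (document processing), so here is a minimal chunker. It splits on words with a fixed window and an overlap, so text near a boundary appears in two neighboring chunks and is less likely to lose its context. The sizes are arbitrary defaults, labeled here as assumptions. Production pipelines often split on tokens, sentences, or headings instead.

Python - Overlapping Chunker

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `chunk_size` words, overlapping by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # how far the window moves each time
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
print(chunk_text(doc, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The resulting chunks are what you would pass to `add_documents` below.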

Python - Simple RAG Implementation

from openai import OpenAI
import chromadb
import uuid

client = OpenAI()
chroma = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path=...) to keep data on disk
collection = chroma.get_or_create_collection("knowledge_base")

EMBED_MODEL = "text-embedding-3-small"

def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of texts with the OpenAI embeddings API"""
    data = client.embeddings.create(model=EMBED_MODEL, input=texts).data
    return [item.embedding for item in data]

def add_documents(docs: list[str]):
    """Add documents (ideally already split into chunks) to the knowledge base"""
    collection.add(
        documents=docs,
        embeddings=embed(docs),
        # Random unique IDs, so repeated calls never produce duplicate IDs
        ids=[str(uuid.uuid4()) for _ in docs]
    )

def rag_query(question: str, top_k: int = 3) -> str:
    """RAG question answering"""
    # Retrieve the chunks whose embeddings are closest to the question's embedding
    results = collection.query(
        query_embeddings=embed([question]),
        n_results=top_k
    )
    
    context = "\n".join(results["documents"][0])
    
    # Generate an answer grounded in the retrieved context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer the question based on the following context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

📈 Advanced RAG Techniques

Hybrid Retrieval: Combine vector retrieval with BM25 keyword retrieval
Reranking: Apply a second-pass ranking to retrieved results
Multi-Hop Retrieval: Solve complex reasoning problems through multiple retrieval rounds
Adaptive Retrieval: Dynamically adjust retrieval strategy based on question type
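Hybrid retrieval needs a way to merge two ranked lists. The sketch below implements a small BM25 keyword scorer and then fuses its ranking with a vector-search ranking using Reciprocal Rank Fusion (RRF). RRF is one common fusion method. The choice of RRF, the BM25 parameters k1=1.5 and b=0.75, and the RRF constant 60 are conventional defaults, not anything prescribed here. The vector ranking is passed in as a list of document IDs, as returned by a vector store such as Chroma.

Python - BM25 + Reciprocal Rank Fusion

```python
import math
from collections import Counter


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Rank document IDs by BM25 keyword relevance to the query."""
    if not docs:
        return []
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several rankings: each list adds 1 / (k + rank) to a document's score."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = {
    "d1": "GRPO is a reinforcement learning algorithm",
    "d2": "BM25 is a keyword ranking function",
    "d3": "vector search finds semantically similar text",
}
keyword_ranking = bm25_rank("keyword ranking", docs)
vector_ranking = ["d3", "d2", "d1"]  # pretend output of an embedding search
print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))
```

RRF uses only rank positions, not raw scores. This sidesteps the problem that BM25 scores and cosine similarities live on different, incomparable scales.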


Chapter 9

Context Engineering

How to efficiently manage an LLM's context window for continuous interaction

📐 What Is Context Engineering?

Context Engineering is the systematic design, management, and optimization of the context fed to an LLM to maximize output quality within a limited context window.

💡 Andrej Karpathy's description: context engineering is "the delicate art and science of filling the context window with just the right information for the next step."

🗂️ What Makes Up the Context Window?

System Prompt: Defines the Agent's role, capabilities, and behavior guidelines
Tool Definitions: Tells the LLM which tools are available and their parameters
Conversation History: Records the user–Agent interaction history
External Knowledge: Relevant documents retrieved from the RAG system
Tool Outputs: Results from previous tool calls
User Input: The current user message
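One way to put these pieces together is a small assembly function. The ordering is an assumption that follows the best-practice tip at the end of this chapter. The stable system prompt comes first, then the conversation history (including earlier tool outputs), then the current request last. Per-turn retrieved knowledge is folded into that final user message so it does not disturb the earlier, reusable prefix. Tool definitions are omitted because OpenAI-style APIs take them through the separate `tools` parameter. The "Reference material" wording is hypothetical.

Python - Context Assembly Sketch

```python
def assemble_context(system_prompt: str,
                     history: list[dict],
                     retrieved_docs: list[str],
                     user_input: str) -> list[dict]:
    """Build the message list for one LLM call: stable parts first, current request last."""
    messages = [{"role": "system", "content": system_prompt}]  # identical on every call
    messages += history                                         # earlier turns and tool outputs
    if retrieved_docs:
        # Number the documents so the model can refer to them
        knowledge = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
        content = f"Reference material:\n{knowledge}\n\nQuestion: {user_input}"
    else:
        content = user_input
    messages.append({"role": "user", "content": content})     # current request goes last
    return messages


msgs = assemble_context(
    "You are helpful.",
    [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}],
    ["Doc A", "Doc B"],
    "What is in doc A?",
)
for m in msgs:
    print(m["role"], "|", m["content"][:40])
```

If history grows too long, it can be compressed before it is passed in; that is the topic of the next technique.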

⚙️ Core Techniques

1. Conversation Compression

When conversation history grows too long and exceeds the context window, compression is needed:

Sliding Window: Keep only the most recent N turns of dialogue
Summary Compression: Use an LLM to generate a summary of the conversation history
Importance Filtering: Retain key information based on importance scores
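The first two strategies can be combined in one function. It keeps the system prompt and the last N messages verbatim. If a summarizer is supplied, it replaces the dropped older messages with a single summary message. The `summarize` callable is a hypothetical stand-in for an LLM call. One caveat, handled here only partially: with tool calling, the cut must not separate an assistant tool call from its tool results. So the window is widened backward until it no longer starts with a tool message.

Python - Sliding Window with Optional Summary

```python
from typing import Callable, Optional


def compress_history(messages: list[dict],
                     keep_last: int = 6,
                     summarize: Optional[Callable[[list[dict]], str]] = None) -> list[dict]:
    """Sliding window over the conversation, with an optional summary of the dropped part."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    convo = messages[len(system):]
    start = max(0, len(convo) - keep_last)
    # Do not begin the window with a tool result whose tool call was cut off
    while 0 < start < len(convo) and convo[start]["role"] == "tool":
        start -= 1
    dropped, kept = convo[:start], convo[start:]
    if dropped and summarize is not None:
        summary = {"role": "system",
                   "content": "Summary of earlier conversation: " + summarize(dropped)}
        return system + [summary] + kept
    return system + kept


history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"} for i in range(10)
]
short = compress_history(history, keep_last=4, summarize=lambda msgs: f"{len(msgs)} earlier messages")
print([m["content"] for m in short])
# ['You are helpful.', 'Summary of earlier conversation: 6 earlier messages', 'turn 6', 'turn 7', 'turn 8', 'turn 9']
```

The sliding window is free but forgets everything outside it. Summary compression costs one extra LLM call but preserves the gist of older turns.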

2. Dynamic Context Injection

Dynamically decide which context to inject based on the current question:

Retrieve relevant memories based on user intent
Select relevant tools based on task type
Adjust the system prompt based on the conversation stage
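For the second point, selecting tools by task type, here is a sketch. It scores each tool's name and description by word overlap with the user request and passes only the top matches to the model, which shortens the tool-definition part of the context. Word overlap with a tiny stopword list is a simplistic assumption. Embedding similarity or an LLM-based router are common alternatives.

Python - Selecting Relevant Tools

```python
STOPWORDS = {"the", "a", "an", "is", "in", "for", "of", "and", "what", "to"}


def select_tools(user_input: str, tools: list[dict], max_tools: int = 2) -> list[dict]:
    """Return the OpenAI-style tool schemas most related to the request."""
    words = {w for w in user_input.lower().replace("?", "").split() if w not in STOPWORDS}

    def overlap(tool: dict) -> int:
        fn = tool["function"]
        text = f"{fn['name'].replace('_', ' ')} {fn['description']}".lower()
        return len(words & set(text.split()))

    ranked = sorted(tools, key=overlap, reverse=True)
    return [t for t in ranked[:max_tools] if overlap(t) > 0]


def make_tool(name: str, description: str) -> dict:
    return {"type": "function", "function": {
        "name": name, "description": description,
        "parameters": {"type": "object", "properties": {}}}}


all_tools = [
    make_tool("get_weather", "Get the current weather for a city"),
    make_tool("web_search", "Search the web for real-time information"),
    make_tool("run_python", "Execute Python code and return the output"),
]
chosen = select_tools("What is the weather in Paris?", all_tools)
print([t["function"]["name"] for t in chosen])  # ['get_weather']
```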

3. KV Cache Optimization

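KV caching, which many providers expose as prompt caching, can significantly reduce API cost and latency. Keep unchanged content such as the system prompt and tool definitions at the very beginning of the context and identical across calls. The provider can then reuse the computation for that shared prefix instead of reprocessing it.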
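A toy check shows why the stable prefix matters. Provider-side prompt caching generally reuses computation only for an identical leading portion of the request. The hypothetical helper below counts how many leading messages two consecutive requests share. Appending new turns keeps the whole previous request as a reusable prefix. Putting something volatile such as a timestamp into the system prompt invalidates the prefix from the very first message. Real caches work on tokens, not whole messages, and have provider-specific minimum lengths; this is only an illustration.

Python - Measuring a Reusable Prefix

```python
def shared_prefix_len(previous: list[dict], current: list[dict]) -> int:
    """Count leading messages that are identical in two consecutive requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


system = {"role": "system", "content": "You are a travel assistant."}
turn1 = [system, {"role": "user", "content": "Plan a day in Rome."}]
turn2 = turn1 + [{"role": "assistant", "content": "Morning: Colosseum."},
                 {"role": "user", "content": "Add a lunch spot."}]
print(shared_prefix_len(turn1, turn2))  # 2: the whole first request is a reusable prefix

volatile = [{"role": "system", "content": "You are a travel assistant. Time: 12:01"}] + turn2[1:]
print(shared_prefix_len(turn2, volatile))  # 0: changing the first message breaks the prefix
```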

💡 Best practice: When designing context, place the most important information at the very beginning (system prompt) and the very end (latest user input). LLMs tend to use information at these positions more reliably than content buried in the middle of a long context.


Chapter 10

Agent Communication Protocols

Deep analysis and practice of MCP, A2A, ANP, and other protocols

🌐 Why Do We Need Agent Communication Protocols?

As Agent systems grow more complex, communication between different Agents and between Agents and tools becomes a key challenge. Communication protocols solve interoperability — allowing Agents and tools built by different developers to collaborate seamlessly.

🔌 MCP (Model Context Protocol)

An open standard protocol proposed by Anthropic that defines how LLM applications communicate with external tools and data sources.

Core Concepts: Server (tool provider) and Client (AI application)
Transport Layer: Supports stdio and HTTP-based transports (HTTP with SSE in early spec versions, since superseded by Streamable HTTP)
Capability Types: Tools, Resources, Prompt Templates
Ecosystem: Hundreds of official and community MCP Servers already available

Python - MCP Server Example

import asyncio

from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

app = Server("my-mcp-server")

def fetch_stock_price(symbol: str) -> float:
    """Hypothetical placeholder: replace with a call to a real market-data API."""
    return 123.45

@app.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="get_stock_price",
            description="Get the real-time price of a stock",
            inputSchema={
                "type": "object",
                "properties": {
                    "symbol": {"type": "string", "description": "Stock ticker symbol"}
                },
                "required": ["symbol"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "get_stock_price":
        symbol = arguments["symbol"]
        # Call the stock API
        price = fetch_stock_price(symbol)
        return [TextContent(type="text", text=f"{symbol} current price: {price}")]
    raise ValueError(f"Unknown tool: {name}")

async def main():
    # stdio_server() provides the read/write streams the MCP client communicates over
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())

🤝 A2A (Agent-to-Agent) Protocol

A protocol proposed by Google for inter-Agent communication, focused on interoperability between different AI Agent systems. A2A defines standard ways for Agents to discover each other, declare capabilities, and delegate tasks.
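Here is a simplified illustration of how Agents "discover each other and declare capabilities." In A2A, an agent publishes a JSON "Agent Card" describing itself and its skills. The dictionaries below only loosely follow that idea: the field names are a reduced subset chosen for this sketch, and the URLs are placeholders. The matching function is a hypothetical client-side step for picking a remote agent to delegate to. Consult the A2A specification for the actual card schema and the task-delegation messages.

Python - Capability Discovery Sketch

```python
agent_cards = [
    {
        "name": "Currency Agent",
        "description": "Converts amounts between currencies",
        "url": "https://currency.example.com/a2a",   # placeholder endpoint
        "skills": [{"id": "convert", "name": "Currency conversion",
                    "tags": ["currency", "exchange", "finance"]}],
    },
    {
        "name": "Itinerary Agent",
        "description": "Builds day-by-day travel plans",
        "url": "https://itinerary.example.com/a2a",  # placeholder endpoint
        "skills": [{"id": "plan_trip", "name": "Trip planning",
                    "tags": ["travel", "itinerary", "planning"]}],
    },
]


def find_agent(cards: list[dict], needed_tag: str):
    """Return (agent name, url, skill id) for the first agent advertising the tag, else None."""
    for card in cards:
        for skill in card["skills"]:
            if needed_tag in skill["tags"]:
                return card["name"], card["url"], skill["id"]
    return None


print(find_agent(agent_cards, "itinerary"))
# ('Itinerary Agent', 'https://itinerary.example.com/a2a', 'plan_trip')
```

The caller needs no knowledge of how the remote agent is implemented. It only reads the published description and then sends the task to the advertised URL.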

🔗 ANP (Agent Network Protocol)

An Agent communication protocol for open networks, enabling decentralized discovery and communication between Agents on the open internet.

📌 Protocol selection guide: Tool integration → MCP; Multi-Agent enterprise systems → A2A; Open-network Agent ecosystems → ANP


Chapter 11

Agentic-RL

Full pipeline from SFT to GRPO for hands-on LLM training

🎓 Why Train Agents?

General-purpose LLMs often perform poorly on specific Agent tasks. Targeted training can teach models to use tools more effectively, plan, and execute complex tasks.

📚 Training Paradigm Overview

1. Supervised Fine-Tuning (SFT)

Fine-tune the model with supervised learning on high-quality Agent trajectory data so it learns correct behavior patterns.

Data Format: (task, tool-call sequence, final result) triples
Advantages: Stable training, high data efficiency
Disadvantages: Requires large amounts of high-quality annotated data; hard to exceed expert demonstrations
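A (task, tool-call sequence, final result) triple can be turned into training data. The sketch below converts one trajectory into a chat-format record and serializes it as one JSONL line. The OpenAI-style `messages`/`tool_calls` layout is an assumption, since each fine-tuning framework documents its own expected format. The weather trajectory is made up.

Python - Trajectory to SFT Record

```python
import json


def trajectory_to_example(task: str, steps: list[dict], final_answer: str) -> dict:
    """Turn a (task, tool-call sequence, final result) triple into chat messages.

    Each step is {"tool": name, "args": {...}, "result": str}.
    """
    messages = [{"role": "user", "content": task}]
    for i, step in enumerate(steps):
        call_id = f"call_{i}"
        # The assistant turn the model should learn to produce: a tool call
        messages.append({"role": "assistant", "content": None, "tool_calls": [{
            "id": call_id, "type": "function",
            "function": {"name": step["tool"], "arguments": json.dumps(step["args"])},
        }]})
        # The environment's reply, which the model conditions on
        messages.append({"role": "tool", "tool_call_id": call_id, "content": step["result"]})
    messages.append({"role": "assistant", "content": final_answer})
    return {"messages": messages}


example = trajectory_to_example(
    "Do I need an umbrella in London today?",
    [{"tool": "get_weather", "args": {"city": "London"}, "result": "Rain, 14°C"}],
    "Yes, rain is expected in London today, so take an umbrella.",
)
print(json.dumps(example, ensure_ascii=False))  # one line of a .jsonl training file
```

During training, the loss is usually computed only on the assistant turns. The model learns to produce tool calls and answers, not to predict tool outputs.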

2. RLHF (Reinforcement Learning from Human Feedback)

Optimize model behavior through human ratings of model outputs — the core training technique behind ChatGPT.

3. GRPO (Group Relative Policy Optimization)

An efficient RL algorithm proposed by DeepSeek — the key technology behind DeepSeek-R1's success.

Sample multiple outputs for the same question; score them relative to each other within the group
No separate Critic model needed — significantly reduces compute cost
Especially suited for tasks with clear correct answers, like math and code
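The group-relative scoring can be made concrete. Following the GRPO formulation in the DeepSeekMath paper, each sampled output's advantage is its reward normalized by the group's mean and standard deviation, so the group itself provides the baseline that a learned critic would otherwise supply. A minimal sketch; the epsilon guard and the use of the population standard deviation are implementation assumptions.

Python - Group-Relative Advantage (sketch)

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Outputs better than the group average get positive advantages,
    worse ones get negative advantages.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample receives the same reward
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to the same question: two correct (1.0), two wrong (0.0)
print([round(a, 3) for a in group_relative_advantages([1.0, 0.0, 1.0, 0.0])])
# [1.0, -1.0, 1.0, -1.0]
```

When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal, which is one reason reward design matters so much in GRPO.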

Python - GRPO Reward Function Example

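def extract_answer(completion: str) -> str:
    """Hypothetical helper: treat the text after the closing </think> tag as the final answer."""
    return completion.split("</think>")[-1].strip()

def compute_reward(completions: list[str], ground_truth: str) -> list[float]:
    """
    GRPO reward function example
    Compute rewards for multiple model outputs on the same question
    """
    rewards = []
    for completion in completions:
        # Format reward: is the chain of thought wrapped in <think>...</think>?
        has_reasoning = "<think>" in completion and "</think>" in completion
        format_reward = 1.0 if has_reasoning else 0.0

        # Correctness reward: does the extracted final answer match the ground truth?
        if extract_answer(completion) == ground_truth.strip():
            correctness_reward = 1.0
        else:
            correctness_reward = 0.0

        # Weighted sum: correctness dominates; format acts as a smaller shaping signal
        rewards.append(0.3 * format_reward + 0.7 * correctness_reward)

    return rewards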

🛠️ Agent Training Best Practices

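Start with small models and small datasets to verify your pipeline end to end before scaling up
Design the reward function carefully: a loosely specified reward invites reward hacking, where the model raises its score without actually solving the task
Use tools like Weights & Biases to monitor training metrics such as reward and loss
Regularly evaluate on a held-out test set to catch overfitting early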

Part 3 · Advanced Topics

Chapter 12

Agent Evaluation

Core metrics, benchmarks & evaluation frameworks explained

📏 Why Does Evaluation Matter?

No improvement without evaluation. In Agent systems, evaluation is especially difficult — Agent output is often a multi-step trajectory, not a single answer, and many tasks have no unique correct answer.

📊 Core Evaluation Dimensions

Task Success Rate: Proportion of tasks the Agent completes successfully
Step Efficiency: Average number of steps required to complete a task
Tool Accuracy: Proportion of correct tool selections and calls
Hallucination Rate: Frequency of fabricated information from the model
Cost Efficiency: Token consumption and API cost per task
Latency: Average time to complete a task
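Most of these dimensions reduce to simple aggregates over per-task records. A minimal sketch, assuming each record already carries the needed fields (how you detect a hallucination or a wrong tool call is task-specific and not shown); the `TaskRecord` fields and the per-1K-token price are hypothetical.

Python - Core Metrics Aggregation (sketch)

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaskRecord:
    success: bool
    steps: int
    tool_calls: int
    correct_tool_calls: int
    hallucinated: bool      # flagged by a judge model or human reviewer
    tokens: int
    latency_s: float

def aggregate(records: list[TaskRecord], usd_per_1k_tokens: float) -> dict:
    """Compute the core evaluation dimensions over a batch of tasks."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to aggregate")
    total_calls = sum(r.tool_calls for r in records)
    total_tokens = sum(r.tokens for r in records)
    tool_accuracy: Optional[float] = (
        sum(r.correct_tool_calls for r in records) / total_calls if total_calls else None
    )
    return {
        "task_success_rate": sum(r.success for r in records) / n,
        "avg_steps": sum(r.steps for r in records) / n,
        # Tool accuracy is measured over individual calls, not tasks
        "tool_accuracy": tool_accuracy,
        # Here: fraction of tasks with at least one flagged hallucination
        "hallucination_rate": sum(r.hallucinated for r in records) / n,
        "cost_per_task_usd": total_tokens / 1000 * usd_per_1k_tokens / n,
        "avg_latency_s": sum(r.latency_s for r in records) / n,
    }

records = [
    TaskRecord(True, 4, 3, 3, False, 1200, 2.5),
    TaskRecord(False, 7, 5, 3, True, 2800, 6.5),
]
print(aggregate(records, usd_per_1k_tokens=0.002))
```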

🏆 Leading Agent Benchmarks

GAIA: Tests general AI-assistant capability on real-world questions that require reasoning, web browsing, and tool use
WebArena: Evaluates Agents' ability to complete tasks by operating realistic web interfaces
SWE-bench: Evaluates Agents on resolving real GitHub issues from open-source Python repositories
HotPotQA: Evaluates multi-hop question answering and reasoning
AgentBench: Comprehensive multi-task Agent evaluation across diverse environments

🔧 Evaluation Framework in Practice

Python - Simple Evaluation Framework

import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalResult:
    task_id: str
    success: bool
    steps: int
    tokens_used: int
    latency_ms: float
    error: Optional[str] = None

class AgentEvaluator:
    """Runs an agent over a fixed set of test cases.

    Assumes `agent` exposes run(input) plus step_count and token_count
    attributes for its most recent run, and that each test case is a dict
    with "id", "input", and a "judge" callable returning True on success.
    """

    def __init__(self, agent, test_cases: list[dict]):
        self.agent = agent
        self.test_cases = test_cases

    def run(self) -> list[EvalResult]:
        results = []
        for case in self.test_cases:
            start = time.perf_counter()
            try:
                output = self.agent.run(case["input"])
                success = bool(case["judge"](output))
                results.append(EvalResult(
                    task_id=case["id"],
                    success=success,
                    steps=self.agent.step_count,
                    tokens_used=self.agent.token_count,
                    latency_ms=(time.perf_counter() - start) * 1000,
                ))
            except Exception as e:
                # A crash counts as a failed task; keep the error for debugging
                results.append(EvalResult(
                    task_id=case["id"],
                    success=False,
                    steps=0,
                    tokens_used=0,
                    latency_ms=(time.perf_counter() - start) * 1000,
                    error=str(e),
                ))
        return results

    def summarize(self, results: list[EvalResult]) -> dict:
        # Guard against division by zero on an empty run
        if not results:
            return {"success_rate": "n/a", "avg_steps": "n/a", "total_tokens": 0}
        n = len(results)
        success_rate = sum(r.success for r in results) / n
        avg_steps = sum(r.steps for r in results) / n
        return {
            "success_rate": f"{success_rate:.1%}",
            "avg_steps": f"{avg_steps:.1f}",
            "total_tokens": sum(r.tokens_used for r in results),
        }

💡 Evaluation tip: Build a fixed evaluation set and run it on every iteration; comparing results on the same cases is what lets you track real progress instead of relying on anecdotes. Don't just look at success rate: step efficiency and cost matter just as much.

Part 4 · Case Studies

Chapter 13

Smart Travel Assistant

Hands-on intelligent travel planning system using MCP and multi-Agent collaboration

🌍 Project Overview

The Smart Travel Assistant is a hands-on project that applies the MCP tool protocol and multi-Agent collaboration. Based on the user's needs, the system automatically plans itineraries, queries flights and hotels, and generates travel guides.

🏗️ System Architecture

Master Agent: Understands user intent and coordinates sub-Agents
Flight Search Agent: Connects to flight search APIs via MCP
Hotel Recommendation Agent: Recommends accommodations based on preferences and budget
Attraction Planning Agent: Retrieves destination attraction information and recommended routes
Budget Calculation Agent: Aggregates and optimizes total travel costs

🔧 Core MCP Tools

search_flights(origin, dest, date) — Search available flights
get_hotels(city, checkin, checkout, budget) — Query hotels
get_attractions(city, interests) — Get attraction recommendations
get_weather_forecast(city, dates) — Query weather forecast
calculate_budget(items) — Calculate travel budget
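In MCP, each tool is advertised with a name, a description, and a JSON Schema `inputSchema` describing its arguments. The sketch below declares two of the tools above as plain data and adds a tiny pre-dispatch check for required arguments. The argument types and descriptions are assumptions, and a real server would register these through an MCP SDK rather than plain dicts; `missing_arguments` is a hypothetical helper, not full JSON Schema validation.

Python - Travel Tool Declarations (sketch)

```python
TOOLS = [
    {
        "name": "search_flights",
        "description": "Search available flights",
        "inputSchema": {
            "type": "object",
            "properties": {
                "origin": {"type": "string", "description": "Departure city or airport code"},
                "dest": {"type": "string", "description": "Arrival city or airport code"},
                "date": {"type": "string", "description": "Departure date, YYYY-MM-DD"},
            },
            "required": ["origin", "dest", "date"],
        },
    },
    {
        "name": "get_hotels",
        "description": "Query hotels",
        "inputSchema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "checkin": {"type": "string", "description": "YYYY-MM-DD"},
                "checkout": {"type": "string", "description": "YYYY-MM-DD"},
                "budget": {"type": "number", "description": "Budget limit"},
            },
            "required": ["city", "checkin", "checkout", "budget"],
        },
    },
]

def missing_arguments(tool_name: str, arguments: dict) -> list[str]:
    """Return the required arguments that a tool call is missing."""
    tool = next((t for t in TOOLS if t["name"] == tool_name), None)
    if tool is None:
        raise KeyError(f"Unknown tool: {tool_name}")
    return [k for k in tool["inputSchema"]["required"] if k not in arguments]

print(missing_arguments("search_flights", {"origin": "SFO", "dest": "HND"}))  # ['date']
```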

Python - Travel Assistant Master Workflow

import asyncio
from datetime import date

# intent_parser, flight_agent, hotel_agent, weather_agent, attraction_agent and
# planner_agent are sub-Agent objects assumed to be defined elsewhere in the project.

async def travel_agent_workflow(user_request: str) -> str:
    """
    Travel assistant master workflow
    """
    # Step 1: Parse user intent
    parsed = await intent_parser.parse(user_request)
    # e.g. {"origin": ..., "destination": "Tokyo",
    #       "dates": ["2025-03-15", "2025-03-22"], "budget": 2000}

    # Step 2: Parallel queries (the three lookups are independent, so run them concurrently)
    flights, hotels, weather = await asyncio.gather(
        flight_agent.search(parsed["origin"], parsed["destination"], parsed["dates"]),
        hotel_agent.search(parsed["destination"], parsed["dates"], parsed["budget"]),
        weather_agent.forecast(parsed["destination"], parsed["dates"])
    )

    # Step 3: Attraction planning; trip length comes from the parsed dates, not a constant
    start, end = (date.fromisoformat(d) for d in parsed["dates"])
    attractions = await attraction_agent.plan(
        city=parsed["destination"],
        duration=(end - start).days,
        interests=parsed.get("interests", [])
    )

    # Step 4: Generate comprehensive travel plan
    plan = await planner_agent.generate(
        flights=flights,
        hotels=hotels,
        attractions=attractions,
        weather=weather,
        budget=parsed["budget"]
    )

    return plan.to_markdown()
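The `asyncio.gather` step in the workflow is where the speedup comes from: independent lookups wait on I/O at the same time instead of one after another. A self-contained sketch with hypothetical stub agents that simulate API latency with `asyncio.sleep`:

Python - Parallel Sub-Agent Queries (sketch)

```python
import asyncio
import time

# Hypothetical stubs standing in for the flight, hotel, and weather agents
async def search_flights(dest: str) -> str:
    await asyncio.sleep(0.2)  # simulated API latency
    return f"flights to {dest}"

async def search_hotels(dest: str) -> str:
    await asyncio.sleep(0.2)
    return f"hotels in {dest}"

async def forecast(dest: str) -> str:
    await asyncio.sleep(0.2)
    return f"weather for {dest}"

async def gather_trip_info(dest: str) -> tuple[list[str], float]:
    start = time.perf_counter()
    # The three coroutines run concurrently; results come back in argument order
    results = await asyncio.gather(search_flights(dest), search_hotels(dest), forecast(dest))
    return list(results), time.perf_counter() - start

results, elapsed = asyncio.run(gather_trip_info("Tokyo"))
print(results)
print(f"elapsed: {elapsed:.1f}s (awaiting the three calls one by one would take about 0.6s)")
```

Note that `gather` only helps when the calls are genuinely independent; attraction planning stays a separate step because it builds on the parsed request rather than running alongside the lookups.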

🗺️ Result: The user inputs "Plan a 7-day trip to Tokyo next week with a $2000 budget" and the system generates a complete travel plan — including flight recommendations, hotel choices, a daily itinerary, and a cost breakdown — in under 30 seconds.

Part 4 · Case Studies

Chapter 14

Automated Deep Research Agent

DeepResearch Agent reproduction & analysis — making AI work like a researcher

🔬 What Is a DeepResearch Agent?

DeepResearch is a deep research feature from OpenAI that can autonomously perform multi-round web searches, read literature, integrate information, and finally produce professional-grade research reports. This chapter reproduces its core logic.

🔁 Core Workflow

Problem Analysis: Decompose complex research questions into searchable sub-questions
Iterative Search: Multiple rounds of search, each refining the strategy based on the previous round
Content Extraction: Extract key information from search results
Knowledge Integration: Integrate information from multiple sources into a coherent knowledge base
Report Generation: Generate a structured research report from the integrated knowledge

Python - DeepResearch Core Loop

import asyncio

class DeepResearchAgent:
    """Iterative research loop.

    The helper methods it calls (generate_search_plan, web_search,
    extract_information, assess_knowledge_gaps, generate_next_plan,
    generate_report) are LLM- or search-backed steps assumed to be
    implemented elsewhere.
    """

    def __init__(self, max_iterations: int = 5):
        self.max_iterations = max_iterations
        self.knowledge_base = []

    async def research(self, topic: str) -> str:
        # 1. Generate initial search plan
        search_plan = await self.generate_search_plan(topic)

        for _ in range(self.max_iterations):
            # 2. Execute all planned searches concurrently
            search_results = await asyncio.gather(*[
                self.web_search(query)
                for query in search_plan.queries
            ])

            # 3. Extract and validate information
            extracted = await self.extract_information(search_results)
            self.knowledge_base.extend(extracted)

            # 4. Assess knowledge completeness
            assessment = await self.assess_knowledge_gaps(
                topic=topic,
                knowledge=self.knowledge_base
            )

            if assessment.is_sufficient:
                break

            # 5. Generate next round search plan targeting the remaining gaps
            search_plan = await self.generate_next_plan(assessment.gaps)

        # 6. Generate final report
        return await self.generate_report(topic, self.knowledge_base)

💡 Key Technical Points

Problem Decomposition: Use tree structures to break complex questions into sub-problems
Query Optimization: Dynamically adjust search queries based on existing knowledge
Deduplication: Identify and merge duplicate information from different sources
Citation Management: Maintain source citations to ensure reports are traceable
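Deduplication and citation management can share one data structure: store each extracted fact once, keyed by a normalized form of its text, and accumulate every source URL that supports it. A minimal sketch; exact matching after normalization is a simplifying assumption (production systems often use embedding similarity to catch paraphrases), and the `Fact`/`KnowledgeBase` classes and example URLs are placeholders.

Python - Deduplicated Knowledge Base with Citations (sketch)

```python
import re
from dataclasses import dataclass, field

@dataclass
class Fact:
    text: str
    sources: list[str] = field(default_factory=list)

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so trivial variants match."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())

class KnowledgeBase:
    def __init__(self) -> None:
        self._facts: dict[str, Fact] = {}

    def add(self, text: str, source: str) -> None:
        # Merge duplicates: keep the first wording, record every distinct source
        fact = self._facts.setdefault(normalize(text), Fact(text=text))
        if source not in fact.sources:
            fact.sources.append(source)

    def render(self) -> str:
        """Render facts with numbered citations, one number per distinct source."""
        ref_ids: dict[str, int] = {}
        lines = []
        for fact in self._facts.values():
            cites = [ref_ids.setdefault(s, len(ref_ids) + 1) for s in fact.sources]
            lines.append(fact.text + " " + "".join(f"[{i}]" for i in cites))
        lines.append("")
        lines += [f"[{i}] {s}" for s, i in ref_ids.items()]
        return "\n".join(lines)

kb = KnowledgeBase()
kb.add("The Eiffel Tower is in Paris.", "https://example.com/a")
kb.add("the Eiffel Tower is in Paris", "https://example.com/b")
kb.add("Paris is the capital of France.", "https://example.com/a")
print(kb.render())
```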

Part 4 · Case Studies

Chapter 15

Cyber Town Simulation

Combining Agents with games to simulate real social dynamics and group behavior

🏙️ What Is Cyber Town?

Cyber Town is a multi-Agent social simulation project inspired by Stanford's Generative Agents paper (Park et al., 2023) and its "Smallville" sandbox town. In a virtual town, multiple Agents with independent personalities, memories, and goals live together, producing emergent social behavior.

🎮 System Architecture

Game World

Map System: Contains locations like homes, cafes, libraries, and parks
Time System: Simulates the passage of a 24-hour day
Interaction System: Agents can interact with objects and other Agents

Agent Design

Persona: Each Agent has a unique personality, profession, and interests
Memory System: Agents remember past experiences and conversations
Planning System: Agents create daily plans based on their goals
Social System: Agents can build relationships and hold conversations
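The memory system is what lets residents behave consistently over time. The Generative Agents paper retrieves memories by combining recency, importance, and relevance; the sketch below follows that idea with equal weights, an hourly exponential decay, and toy 3-dimensional vectors in place of real embeddings. The `Memory` class, decay rate, and weights are illustrative assumptions, not the project's actual code.

Python - Memory Retrieval Scoring (sketch)

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    importance: float          # 1-10, e.g. rated by an LLM when the memory is stored
    embedding: list[float]
    hours_since_access: float

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieval_score(m: Memory, query_embedding: list[float], decay: float = 0.995) -> float:
    recency = decay ** m.hours_since_access   # exponential decay, in (0, 1]
    importance = m.importance / 10            # scale to [0, 1]
    relevance = cosine(m.embedding, query_embedding)
    return recency + importance + relevance   # equal weights

def retrieve(memories: list[Memory], query_embedding: list[float], k: int = 2) -> list[Memory]:
    """Return the k highest-scoring memories for the query."""
    return sorted(memories, key=lambda m: retrieval_score(m, query_embedding), reverse=True)[:k]

memories = [
    Memory("Talked with a neighbor about the party", 7, [1.0, 0.0, 0.0], 2),
    Memory("Ate breakfast", 1, [0.0, 1.0, 0.0], 1),
    Memory("Library closes at 6pm", 4, [0.0, 0.0, 1.0], 30),
]
query = [0.9, 0.1, 0.0]  # stand-in for the embedding of "Should I go to the party?"
print([m.content for m in retrieve(memories, query)])
```

In this toy data, the library note outranks breakfast despite being older and unrelated, because its higher importance outweighs the small recency and relevance differences; tuning the weights shifts that balance.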

Python - Town Resident Agent

# Assumed to exist elsewhere in the project (not shown): MemoryStream, an async
# memory store with add / get_recent / retrieve; llm, an async LLM client; and
# TOWN_LOCATIONS, the list of map locations.

class TownResident:
    def __init__(self, name: str, persona: str):
        self.name = name
        self.persona = persona
        self.memories = MemoryStream()
        self.location = "home"
        self.schedule = []

    def is_relevant(self, entity: dict) -> bool:
        """Assumed rule: a resident only notices entities at its current location"""
        return entity.get("location") == self.location

    async def perceive(self, environment: dict) -> list[str]:
        """Perceive the current environment"""
        observations = []
        for entity in environment.get("entities", []):
            if self.is_relevant(entity):
                observations.append(f"I see {entity['description']}")
                await self.memories.add(entity)
        return observations

    async def plan_day(self) -> list[dict]:
        """Plan the day's schedule"""
        recent_memories = await self.memories.get_recent(5)
        context = f"You are {self.name}. {self.persona}\n"
        context += f"Today's memories: {recent_memories}\n"

        schedule = await llm.generate_schedule(
            context=context,
            current_time="08:00",
            available_locations=TOWN_LOCATIONS
        )
        self.schedule = schedule
        return schedule
    
    async def react(self, observation: str) -> str:
        """React to an observed event"""
        relevant_memories = await self.memories.retrieve(observation)
        
        response = await llm.chat(
            system=f"You are {self.name}. {self.persona}",
            context=relevant_memories,
            user=observation
        )
        
        await self.memories.add({
            "content": f"I said: {response}",
            "importance": 7
        })
        return response

🌟 Emergent Behavior Example: After running for 48 hours, town residents spontaneously organized a "party" — one Agent decided to host it, the invitation spread through the social network, and other Agents decided whether to attend based on their own plans. This behavior was not pre-programmed; it was a natural result of multi-Agent interaction.

Part 5 · Capstone & Future

Chapter 16

Capstone Project

Build your own complete multi-Agent application and demonstrate your learning

🎓 Capstone Goals

Congratulations on completing all of Hello-Agents! The capstone project is your chance to apply everything you've learned to build a complete multi-Agent application. Through this project, you will:

Apply Agent architecture, tool calling, memory systems, and multi-Agent collaboration
Experience the full Agent application development lifecycle
Build real project experience to enrich your portfolio

📋 Project Requirements

Core Requirements (Must Complete)

✅ Implement at least one core Agent capability (tool calling, planning, etc.)
✅ Integrate at least 3 different types of tools
✅ Implement basic memory / context management
✅ Provide clear code documentation and a README

Advanced Requirements (Optional)

⭐ Implement multi-Agent collaboration
⭐ Integrate a RAG knowledge base
⭐ Deploy to the cloud with a public URL
⭐ Build a web interface

💡 Topic Ideas

📚

Personal Knowledge Assistant

Manage personal notes and documents with intelligent Q&A and a knowledge graph

💼

Workplace Productivity Agent

Automate email handling, meeting notes, and task management

🎯

Learning Assistant

Personalized learning plans, question generation, and concept explanation

💻

Code Assistant

Code review, bug fixing, and documentation generation

🚀 Showcase & Share

After completing your capstone, share your work:

Share your project link in Issues or Discussions
Post on social media to let more people discover your work
Follow @reyzowter on X to share updates

🌟 Final words: The Agent field is evolving at an unprecedented pace. What you've learned here is a solid foundation for entering this space. Stay curious, keep up with the latest developments, and be bold in building and sharing your work.

You've grown from an LLM "user" into an Agent "builder"! 🎉

Community

Extra Chapters

Community-contributed supplementary content covering interviews, tools, and real-world experience

00

Community Capstone Projects

Showcases of outstanding capstone projects from the community

01

Agent Interview Questions

Curated high-frequency interview questions for Agent-related roles

01

Agent Interview Reference Answers

Reference answers and explanations for the interview questions

02

Context Engineering Supplement

In-depth extensions and case studies for Chapter 9

03

Dify Agent Step-by-Step Tutorial

A hand-holding guide to creating a Dify Agent from scratch

04

Hello-Agents Common Questions

Answers to the most common questions from course participants

05

Agent Skills vs. MCP Comparison

Technical deep-dive comparing two tool integration approaches

06

GUI Agent: Introduction & Practice

GUI Agent concepts, principles, and hands-on tutorials

07

Environment Setup Guide

Detailed OpenAI API and Python environment configuration

08

How to Write a Great Skill

Best practices and examples for Agent Skill design

09

Agent Development Lessons & Experience

Real-world lessons and insights from building Code Agents

10

Agent Self-Evolution

Four feedback loops and representative self-evolving Agent projects

11

Web Agent: Introduction & Practice

Web Agent principles, anti-scraping practice, and HelloAgents integration

12

Travel Assistant Post-Training

Training the travel planning demo into a real, usable planner

Community

Contributors

Thank you to everyone who contributed to Hello-Agents

🌟 Core Contributors

Chen Sizhou

Project Lead
Datawhale Member
Full text author & editor

Sun Tao

Co-Founder
Datawhale Member
CAMEL-AI

Jiang Shufan

Co-Founder
Datawhale Member
Exercise design & review

Huang Peilin

Datawhale Associate
Agent Dev Engineer
Chapter 5 contributor

Zeng Xinmin

Agent Engineer
Niuke Technology
Chapter 14 case study

Zhu Xinzhong

Advisor
Datawhale Chief Scientist
ZJU Professor

👥 Extra Chapter Contributors

WH

Content Contributor

Zhou Aojie

DW Contributor Team
Xi'an Jiaotong Univ.
Extra02 content

Zhang Chenxu

Independent Developer
Imperial College London
Extra03 content

Huang Honghan

DW Contributor Team
Shenzhen University
Extra04 content

Wang Dapeng

Datawhale Member
Senior Developer
Extra08 content

You Yihui

Independent Developer
NUIST
Extra09 content

Yin Xin

Independent Developer
Zhejiang University
Extra10 content

Pranav J.

Independent Developer
TinyFish
Extra11 content

Wang Yufei

Independent Developer
BUPT
Extra12 content

🤝 How to Contribute

We welcome all forms of contribution!

🐛 Report Bugs — Found a content or code issue? Please open an Issue
💡 Suggest Ideas — Have a great idea for the project? Start a discussion
📝 Improve Content — Help improve the tutorial; submit your PR
✍️ Share Your Work — Share your notes and projects in the community section

Follow @reyzowter on X