Covers Agent Principles · Classic Paradigms · Framework Development · Real-World Cases
What You'll Learn
Completely Free & Open Source
All content is free; grow together with the community.
Core Principles
Deep dive into Agent concepts, history, and classic paradigms to build a solid foundation.
Hands-On Building
Master leading no-code platforms and Agent frameworks in practice.
Build Your Own Framework
Build your own Agent framework from scratch using the OpenAI native API.
Advanced Skills
Step-by-step: context engineering, memory, protocols, evaluation, and more.
Model Training
Master Agentic RL: the full pipeline from SFT to GRPO for LLM training.
Real-World Projects
Build a Smart Travel Assistant, Cyber Town, and other comprehensive projects.
Interview Preparation
Study Agent interview questions to boost your career in AI.
Course Navigation
How to Learn
Welcome, future AI systems builder! This course balances theory and practice, helping you master the design and development of single-agent to multi-agent systems end-to-end. It is especially suited for AI developers, software engineers, and students with a basic programming background, as well as self-learners with a strong interest in cutting-edge AI.
Before starting, you should have basic Python programming skills and a general understanding of large language models (e.g., knowing how to call an LLM via API). This course focuses on application and building; no deep algorithmic or model-training background is required.
code folder.
Project Origins
If 2024 was the year of the "model wars," then 2025 is undoubtedly the "Year of Agents." The technical focus is shifting from training larger foundation models to building smarter Agent applications. Yet systematic, practice-oriented tutorials remain scarce.
That's why we launched Hello-Agents: to provide a complete guide for building Agent systems from scratch, balancing theory and hands-on practice.
What This Course Is
Hello-Agents is a systematic Agent learning curriculum. Current Agent development falls into two main camps: software-engineering Agents (such as Dify, Coze, and n8n, which are essentially LLM-backed workflow automation) and AI-native Agents driven by the model's own reasoning.
This course guides you to understand and build the latter: AI-native Agents. We'll look past the surface of frameworks, start from the core principles of Agents, explore their architecture, study the classic paradigms, and ultimately help you build your own multi-Agent application.
Who This Is For
- AI developers, software engineers, and students with basic programming skills
- Self-learners with a strong interest in cutting-edge AI
- Developers who want to level up from "using LLMs" to "building Agents"
- Candidates preparing for Agent-related roles
Prerequisites
- Basic Python programming ability
- General understanding of LLMs (knowing how to call an LLM via API is enough)
- No deep algorithmic or model-training background required
What Is an Agent?
An Agent is a computational entity that perceives its environment, makes decisions, and takes actions to achieve specific goals. Unlike traditional software, an Agent has autonomy, reactivity, proactivity, and social ability.
In AI, a modern LLM Agent uses a large language model as its "brain" and can use tools, plan tasks, interact with its environment, and complete complex tasks.
Core Components of an Agent
1. Perception
Agents perceive the state of their environment through various inputs (text, images, sensor data, etc.). LLM Agents typically receive information via user messages, tool outputs, and memory systems.
2. Planning
Agents decompose complex tasks into executable sub-steps and formulate action plans. This is the core capability of LLM Agents and what distinguishes them from plain LLMs.
3. Action
Agents execute concrete operations by calling tools (APIs, code execution, search, etc.) or interacting with other Agents, changing the state of the environment.
4. Memory
Agents maintain short-term memory (current conversation context) and long-term memory (persistent knowledge and experience) to support continuous interaction and learning.
Types of Agents
- Simple Reflex Agents: Respond directly to current perception with no internal state
- Model-Based Agents: Maintain an internal world model and consider consequences of actions
- Goal-Based Agents: Plan and act with a goal in mind
- Utility-Based Agents: Weigh different action options via a utility function
- Learning Agents: Can learn from experience and improve behavior
- Multi-Agent Systems: Multiple Agents work together to solve complex problems
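To make the difference between two of these types concrete, here is a minimal sketch contrasting a simple reflex Agent with a utility-based Agent. The thermostat world, the thresholds, and the one-step action effects are all assumptions made up for illustration.

```python
from typing import Callable

# Hypothetical thermostat world: the percept is the room temperature in °C.
def simple_reflex_agent(temperature: float) -> str:
    """Condition-action rules on the current percept only; no internal state."""
    if temperature < 18:
        return "heat"
    if temperature > 26:
        return "cool"
    return "idle"

def utility_based_agent(temperature: float,
                        utility: Callable[[float], float]) -> str:
    """Predict the outcome of each action and pick the one with the highest utility."""
    # Assumed one-step effect of each action on the temperature.
    effects = {"heat": 2.0, "cool": -2.0, "idle": 0.0}
    return max(effects, key=lambda action: utility(temperature + effects[action]))

def comfort(temperature: float) -> float:
    """Utility peaks at an assumed 22 °C target and falls off with distance."""
    return -abs(temperature - 22.0)

print(simple_reflex_agent(25.0))           # idle
print(utility_based_agent(25.0, comfort))  # cool
```

At 25 °C the reflex rules see nothing to do, while the utility-based Agent compares predicted outcomes and decides that cooling brings it closer to its target.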
LLM Agent Application Scenarios
- Information Retrieval & Research: Automatically search, summarize, and analyze large volumes of information
- Code Development Assistance: Code generation, debugging, and test automation
- Data Analysis: Automated data processing and insight extraction
- Customer Service: Intelligent support and problem resolution
- Task Automation: Workflow automation and task delegation
- Game NPCs: Game characters with realistic behavior
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def simple_agent(user_query: str) -> str:
    """A minimal LLM Agent example"""
    messages = [
        {"role": "system", "content": "You are an intelligent assistant that can answer questions and perform simple tasks."},
        {"role": "user", "content": user_query},
    ]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    return response.choices[0].message.content

# Test
result = simple_agent("Analyze the development trends in AI agents")
print(result)
Agent Development Timeline
The history of Agents is an important thread in AI research: from early symbolic AI to today's LLM-centric Agents, it spans decades of evolution.
Phase 1: Symbolic AI Era (1950s–1980s)
Early AI research was dominated by rule systems and expert systems, which used hand-crafted logical rules to simulate intelligent behavior.
- STRIPS (1971): One of the earliest and most influential automated planning systems
- MYCIN (1974): Medical diagnosis expert system with ~600 rules
- Shakey the Robot (1972): An autonomous robot capable of perception and planning
Phase 2: Machine Learning & Reinforcement Learning (1990s–2010s)
With the rise of machine learning, Agents began learning behavior policies from data instead of relying on pre-written rules.
- TD-Gammon (1992): Learned to play backgammon through self-play
- Deep Blue (1997): Defeated chess world champion Kasparov
- AlphaGo (2016): Defeated Go world champion Lee Sedol, a landmark in deep reinforcement learning
Phase 3: The LLM Agent Era (2020s–Present)
The GPT series of large language models completely changed the Agent design paradigm. LLMs as powerful "reasoning engines" gave Agents unprecedented general capabilities.
- GPT-3 (2020): Demonstrated few-shot learning capabilities of LLMs
- ChatGPT (2022): Brought conversational AI to a mass audience and popularized the RLHF training paradigm
- ReAct (2022): An influential Agent paradigm that interleaves reasoning and action
- AutoGPT (2023): Large-scale popularization of the autonomous Agent concept
- OpenAI Assistants API (2023): Official Agent-building platform
- MCP (2024): Anthropic's Model Context Protocol, an open standard for connecting Agents to tools and data sources
Future Trends
- From single Agents to multi-Agent collaborative systems
- From text input to multimodal perception (vision, audio)
- From passive response to proactive planning and long-term goal pursuit
- From limited tools to an open tool ecosystem
- Agent self-evolution and continuous learning
Transformer Architecture
The Transformer is the core architecture of modern large language models, introduced by Google researchers in the 2017 paper "Attention Is All You Need." Its core mechanism is Self-Attention, which lets the model take the other tokens in the sequence into account when processing each token.
Core Components
- Self-Attention: Allows every token to attend to every other token, capturing long-range dependencies
- Feed-Forward Network: Processes each position independently for feature transformation
- Layer Normalization: Stabilizes training and speeds up convergence
- Positional Encoding: Injects position information since Transformers have no inherent order
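The Self-Attention component listed above can be sketched in a few lines of plain Python. This is a simplified illustration of scaled dot-product attention only: a real Transformer layer also applies learned query, key, and value projections, uses multiple heads, and runs on tensors. The toy embeddings are made up.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this token's query to every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Three tokens with 2-dimensional embeddings; Q = K = V = x for simplicity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(x, x, x)
for row in attended:
    print([round(value, 3) for value in row])
```

Each output row is a weighted mix of all three input vectors, which is how a token's representation comes to depend on the rest of the sequence.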
Prompt Engineering
Prompt engineering is the art of designing inputs to make LLMs produce the desired outputs. For Agent development, mastering prompting is fundamental.
Key Techniques
- Zero-Shot: Directly describe the task without examples
- Few-Shot: Provide a few examples to guide the model
- Chain-of-Thought (CoT): Guide the model to reason step by step, improving complex task performance
- System Prompt: Define the Agent's role and behavior guidelines
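As a rough illustration, the four techniques above differ mainly in how the message list is assembled. The helper names below are hypothetical, and the message format assumes the common role/content chat convention.

```python
def zero_shot(task: str) -> list[dict]:
    """Describe the task directly, with no examples."""
    return [{"role": "user", "content": task}]

def few_shot(task: str, examples: list[tuple[str, str]]) -> list[dict]:
    """Prepend worked (question, answer) pairs as earlier turns."""
    messages = []
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": task})
    return messages

def chain_of_thought(task: str) -> list[dict]:
    """Ask the model to reason step by step before answering."""
    return [{"role": "user", "content": f"{task}\nLet's think step by step."}]

def with_system_prompt(system: str, messages: list[dict]) -> list[dict]:
    """Put the Agent's role and behavior guidelines first."""
    return [{"role": "system", "content": system}] + messages

prompt = with_system_prompt(
    "You are a careful math tutor.",
    few_shot("What is 7 * 8?", [("What is 2 * 3?", "6")]),
)
for message in prompt:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the messages argument of a chat completion call.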
Major LLMs
- GPT-4o (OpenAI): Currently one of the most capable multimodal models
- Claude 3.5 (Anthropic): Excellent reasoning, very long context window
- Gemini 1.5 (Google): Multimodal capabilities, supports 1M+ context
- Qwen 2.5 (Alibaba): Strong Chinese and code performance, open source
- DeepSeek-V3 (DeepSeek): Top open-source model, cost-effective
LLM Limitations
- Knowledge Cutoff: Cannot access information after the training cutoff date
- Hallucination: May generate plausible-sounding but factually incorrect information
- Context Window Limits: Can only process a limited amount of text at once
- No Execution Capability: Cannot directly perform actions such as searching the web or running code
ReAct Paradigm
ReAct (Reasoning + Acting) is one of the most important Agent paradigms. It interleaves reasoning (Thought) and action (Act) in a loop, allowing the Agent to continuously adjust its strategy based on environmental feedback.
ReAct Loop
- Thought: LLM analyzes the current situation and decides the next step
- Action: Execute a specific tool call or operation
- Observation: Receive the result of the action
- Repeat the loop until the task is complete
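import json

from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for real-time information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }
]

def web_search(query: str) -> str:
    """Placeholder search tool; replace with a real search API call."""
    return f"(no search backend configured) query: {query}"

def execute_tool(tool_call) -> str:
    """Dispatch a tool call requested by the model and return its result as text."""
    args = json.loads(tool_call.function.arguments)
    if tool_call.function.name == "web_search":
        return web_search(**args)
    return f"Unknown tool: {tool_call.function.name}"

def react_agent(user_query: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are an intelligent assistant. You can use tools to help users solve problems."},
        {"role": "user", "content": user_query}
    ]
    # Bound the loop so a model that keeps calling tools cannot run forever
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        message = response.choices[0].message
        if not message.tool_calls:
            # No tool requested: the model has produced its final answer
            return message.content
        # Record the assistant turn, then answer every tool call it requested
        messages.append(message)
        for tool_call in message.tool_calls:
            result = execute_tool(tool_call)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
    return "Stopped: reached the maximum number of steps."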
Plan-and-Solve Paradigm
Plan-and-Solve divides tasks into two phases: first create a complete plan, then execute step by step. This approach suits complex tasks requiring long-range planning.
- Planning Phase: LLM decomposes the task into an ordered list of sub-tasks
- Execution Phase: Execute each sub-task in sequence and collect results
- Integration Phase: Aggregate all sub-task results into a final answer
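The three phases above can be sketched as a single function. This is a minimal illustration, assuming a hypothetical llm callable that takes a prompt and returns text; the prompts and the fake_llm stand-in are invented so the sketch runs offline.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out

def plan_and_solve(task: str, llm: LLM) -> str:
    # Planning phase: ask for one sub-task per line.
    plan_text = llm(f"Break this task into numbered sub-tasks, one per line:\n{task}")
    steps = [line.strip() for line in plan_text.splitlines() if line.strip()]

    # Execution phase: run the sub-tasks in order, passing earlier results along.
    results: list[str] = []
    for step in steps:
        context = "\n".join(results)
        results.append(llm(f"Task: {task}\nResults so far:\n{context}\nNow do: {step}"))

    # Integration phase: merge the sub-task results into one answer.
    return llm(f"Task: {task}\nCombine these results into a final answer:\n" + "\n".join(results))

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Break this task"):
        return "1. Gather data\n2. Summarize findings"
    if "Combine these results" in prompt:
        return "final: " + prompt.splitlines()[-1]
    return "done: " + prompt.splitlines()[-1]

answer = plan_and_solve("Write a report", fake_llm)
print(answer)
```

Unlike ReAct, the plan here is fixed before execution starts; a more robust version might re-plan when a step fails.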
Reflection Paradigm
Reflection lets an Agent review its own output, identify errors, and improve. This greatly increases the quality of complex task outputs.
- Generate: Produce an initial output
- Reflect: Evaluate output quality and find issues
- Refine: Improve the output based on reflection
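A minimal sketch of the Generate, Reflect, Refine loop follows. The three callables are assumptions: in practice each would be an LLM call, whereas here they are rule-based stand-ins so the loop can run offline.

```python
from typing import Callable

def reflect_and_refine(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    refine: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> str:
    draft = generate(task)                     # Generate
    for _ in range(max_rounds):
        feedback = critique(task, draft)       # Reflect
        if not feedback:                       # empty feedback: nothing left to fix
            break
        draft = refine(task, draft, feedback)  # Refine
    return draft

# Rule-based stand-ins; real versions would call an LLM.
def generate(task: str) -> str:
    return "agents plan and act"

def critique(task: str, draft: str) -> str:
    if not draft[0].isupper():
        return "capitalize the first letter"
    if not draft.endswith("."):
        return "end with a period"
    return ""

def refine(task: str, draft: str, feedback: str) -> str:
    if feedback.startswith("capitalize"):
        return draft[0].upper() + draft[1:]
    return draft + "."

final = reflect_and_refine("Describe what agents do in one sentence.", generate, critique, refine)
print(final)  # Agents plan and act.
```

The max_rounds bound matters in practice: each round costs extra model calls, and a critic that never returns empty feedback would otherwise loop forever.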
Why No-Code Platforms?
No-code Agent platforms allow users without deep programming backgrounds to quickly build powerful Agent applications. These platforms use visual interfaces, pre-built components, and templates to dramatically lower the barrier to Agent development.
Coze
ByteDance's Agent development platform with a rich plugin ecosystem and a simple, intuitive conversation flow design interface.
- Plugin Marketplace: Hundreds of pre-built plugins covering search, computation, content generation, and more
- Workflow: Visually orchestrate complex multi-step task flows
- Knowledge Base: Upload documents to build a private knowledge base
- Publish Channels: One-click publish to WeChat, Feishu, Discord, and other platforms
Dify
An open-source LLM application development platform with private deployment support, and a popular choice for enterprise Agent applications.
- Application Types: Supports conversational assistants, text generation, Agents, and more
- RAG Capabilities: Powerful document processing and retrieval-augmented generation
- Workflow Orchestration: Visual node-based workflows with conditional branching and loops
- API Integration: Standard API for easy integration with existing systems
n8n
A powerful workflow automation platform with 400+ service integrations, ideal for building complex automated Agent workflows.
- Node Connection: Drag and drop to connect nodes from different services
- Triggers: Supports webhooks, scheduled tasks, event-driven, and more
- Code Nodes: Insert custom JavaScript code when needed
- Self-Hosted: Source code is available, so you can run it on your own server
Platform Comparison
LangChain / LangGraph
LangChain is one of the most popular LLM application development frameworks. LangGraph extends it with a state graph concept that is particularly suited to Agent systems with complex control flow.
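from typing import List, TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    messages: List[dict]
    next_step: str

llm = ChatOpenAI(model="gpt-4o")

# Placeholder nodes: each receives the current state and returns the keys to update.
def reason_node(state: AgentState) -> dict:
    reply = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [{"role": "assistant", "content": reply.content}]}

def action_node(state: AgentState) -> dict:
    # Run the tool chosen during reasoning here
    return {"next_step": "observe"}

def observe_node(state: AgentState) -> dict:
    # Set next_step to "continue" to loop back to "reason"
    return {"next_step": "end"}

def should_continue(state: AgentState) -> str:
    return state["next_step"]

def create_agent_graph():
    graph = StateGraph(AgentState)
    # Add nodes
    graph.add_node("reason", reason_node)
    graph.add_node("act", action_node)
    graph.add_node("observe", observe_node)
    # Add edges
    graph.add_edge("reason", "act")
    graph.add_edge("act", "observe")
    graph.add_conditional_edges(
        "observe",
        should_continue,
        {"continue": "reason", "end": END}
    )
    graph.set_entry_point("reason")
    return graph.compile()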
AutoGen
Microsoft's open-source multi-Agent conversation framework, focused on multi-Agent collaboration. Its core idea is letting multiple Agents with different roles converse with each other to solve complex problems.
- ConversableAgent: Base Agent type that can converse with other Agents
- AssistantAgent: Assistant Agent with code generation and execution capabilities
- UserProxyAgent: Proxy Agent representing the user in conversations
- GroupChat: Group conversation supporting multiple Agents
AgentScope
Alibaba's open-source multi-Agent framework, designed to make multi-Agent application development simpler and more reliable.
- Message System: Unified message format based on Msg
- Pipeline: Flexible Agent collaboration pipelines
- Distributed: Native support for distributed multi-Agent systems
How to Choose a Framework?
- Want a mature ecosystem with lots of examples → LangChain / LangGraph
- Focus on multi-Agent collaboration → AutoGen
- Want to understand Agents from the ground up → Build your own (Ch.7)
Why Build Your Own Framework?
The best way to understand an Agent framework is to implement one yourself. By building from scratch, you'll understand every component's role rather than just reusing someone else's wheel. This chapter walks you through building HelloAgents, a clean Agent framework built on the OpenAI native API.
HelloAgents Architecture
Core Components
- Agent Core: LLM calls, tool management, message history
- Tool System: Standardized tool registration and invocation interface
- Memory System: Short-term and long-term memory management
- Planner: Task decomposition and execution plan generation
import json
from typing import Any, Callable

from openai import OpenAI

class Tool:
    def __init__(self, func: Callable, name: str, description: str, parameters: dict):
        self.func = func
        self.name = name
        self.description = description
        self.parameters = parameters

    def to_openai_schema(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters
            }
        }

    def __call__(self, **kwargs) -> Any:
        return self.func(**kwargs)

class HelloAgent:
    def __init__(self, model: str = "gpt-4o", system_prompt: str = ""):
        self.client = OpenAI()
        self.model = model
        self.system_prompt = system_prompt
        self.tools: dict[str, Tool] = {}
        self.messages: list[dict] = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def register_tool(self, tool: Tool) -> None:
        """Register a tool"""
        self.tools[tool.name] = tool

    def run(self, user_input: str) -> str:
        """Run the Agent"""
        self.messages.append({"role": "user", "content": user_input})
        while True:
            # Only send the tool parameters when tools are registered;
            # passing None for them can be rejected by the API
            kwargs: dict[str, Any] = {}
            if self.tools:
                kwargs["tools"] = [t.to_openai_schema() for t in self.tools.values()]
                kwargs["tool_choice"] = "auto"
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                **kwargs
            )
            assistant_msg = response.choices[0].message
            self.messages.append(assistant_msg)
            if not assistant_msg.tool_calls:
                return assistant_msg.content
            # Execute tool calls; every call needs a matching tool message
            for tool_call in assistant_msg.tool_calls:
                tool_name = tool_call.function.name
                if tool_name in self.tools:
                    tool_args = json.loads(tool_call.function.arguments)
                    result = self.tools[tool_name](**tool_args)
                else:
                    result = f"Unknown tool: {tool_name}"
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result)
                })
Using HelloAgents
agent = HelloAgent(
    model="gpt-4o",
    system_prompt="You are a professional data analysis assistant"
)

# Register a tool
def get_weather(city: str) -> str:
    return f"It's sunny in {city} today, 25°C"

weather_tool = Tool(
    func=get_weather,
    name="get_weather",
    description="Get the current weather for a city",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
    }
)
agent.register_tool(weather_tool)

result = agent.run("What's the weather like in New York today?")
print(result)
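Writing the parameters schema by hand for every tool gets repetitive. One optional convenience, which is not part of HelloAgents as described here, is to derive the schema from the function signature. The helper name schema_from_function and the type mapping are assumptions; the sketch only handles simple scalar annotations.

```python
import inspect
from typing import Callable

# Assumed mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_from_function(func: Callable) -> dict:
    """Build an OpenAI-style function schema from a function's signature."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unsupported types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is required
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }

def get_forecast(city: str, days: int = 3) -> str:
    """Get a weather forecast for a city"""
    return f"{days}-day forecast for {city}"

schema = schema_from_function(get_forecast)
print(schema["function"]["parameters"])
```

The value under schema["function"]["parameters"] has the same shape as the parameters argument that Tool expects, so it could be passed there instead of a hand-written dict.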
Agent Memory Systems
Memory is the key to enabling Agents to learn continuously and accumulate knowledge across sessions. Drawing on human memory, Agent memory can be categorized as follows:
Memory Types
- Sensory Memory: Brief retention of raw input, e.g., the current token stream
- Working Memory (Short-Term): Current conversation context, i.e., the messages list
- Long-Term Memory: Persistent cross-session knowledge stored in databases or vector stores
- Procedural Memory: Skills and behavior patterns, implemented via few-shot examples or fine-tuning
Retrieval-Augmented Generation (RAG)
RAG is a core technique for overcoming LLM knowledge limitations: it retrieves relevant external knowledge at generation time to improve answer quality.
Basic RAG Pipeline
- Document Processing: Split documents into appropriately sized chunks
- Vectorization: Use an embedding model to convert chunks into vectors
- Storage: Store vectors in a vector database (Chroma, Pinecone, Faiss, etc.)
- Retrieval: Vectorize the user question and retrieve the most relevant chunks
- Generation: Inject retrieved results into the prompt; LLM generates an answer accordingly
import chromadb
from openai import OpenAI

client = OpenAI()
chroma = chromadb.Client()
# get_or_create avoids an error when the collection already exists
collection = chroma.get_or_create_collection("knowledge_base")

def add_documents(docs: list[str]):
    """Add documents to the knowledge base"""
    embeddings = client.embeddings.create(
        model="text-embedding-3-small",
        input=docs
    ).data
    # Offset the ids by the current size so repeated calls do not reuse ids
    offset = collection.count()
    collection.add(
        documents=docs,
        embeddings=[e.embedding for e in embeddings],
        ids=[f"doc_{offset + i}" for i in range(len(docs))]
    )

def rag_query(question: str, top_k: int = 3) -> str:
    """RAG question answering"""
    # Retrieve relevant documents
    q_embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=[question]
    ).data[0].embedding
    results = collection.query(
        query_embeddings=[q_embedding],
        n_results=top_k
    )
    context = "\n".join(results["documents"][0])
    # Generate answer
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer the question based on the following context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
Advanced RAG Techniques
- Hybrid Retrieval: Combine vector retrieval with BM25 keyword retrieval
- Reranking: Apply a second-pass ranking to retrieved results
- Multi-Hop Retrieval: Solve complex reasoning problems through multiple retrieval rounds
- Adaptive Retrieval: Dynamically adjust retrieval strategy based on question type
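Hybrid retrieval needs some way to merge the vector ranking and the BM25 ranking into one list. The text does not say how; one common choice, used here as an assumption, is reciprocal rank fusion (RRF), which only needs each retriever's ranked ids. The document ids and the constant k = 60 are illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists: each appearance adds 1 / (k + rank) to the id's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from the two retrievers for the same question.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents found by both retrievers rise to the top, and the fused list can then be handed to a reranker for the second-pass ranking mentioned above.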
What Is Context Engineering?
Context Engineering is the systematic design, management, and optimization of the context fed to an LLM to maximize output quality within a limited context window.
What Makes Up the Context Window?
- System Prompt: Defines the Agent's role, capabilities, and behavior guidelines
- Tool Definitions: Tells the LLM which tools are available and their parameters
- Conversation History: Records the user-Agent interaction history
- External Knowledge: Relevant documents retrieved from the RAG system
- Tool Outputs: Results from previous tool calls
- User Input: The current user message
Core Techniques
1. Conversation Compression
When conversation history grows too long and exceeds the context window, compression is needed:
- Sliding Window: Keep only the most recent N turns of dialogue
- Summary Compression: Use an LLM to generate a summary of the conversation history
- Importance Filtering: Retain key information based on importance scores
2. Dynamic Context Injection
Dynamically decide which context to inject based on the current question:
- Retrieve relevant memories based on user intent
- Select relevant tools based on task type
- Adjust the system prompt based on the conversation stage
3. KV Cache Optimization
Place unchanging content, such as the system prompt and tool definitions, at the very beginning of the context so that the provider's KV cache can reuse the shared prefix across calls. This can significantly reduce latency and API cost.
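The sliding-window and summary strategies described under Conversation Compression can be sketched as follows. The function names are hypothetical, the summarize argument stands in for an LLM call, and one turn is assumed to be one user message plus one assistant message.

```python
from typing import Callable

def _split(messages: list[dict]) -> tuple[list[dict], list[dict]]:
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    return system, dialogue

def sliding_window(messages: list[dict], max_turns: int) -> list[dict]:
    """Keep the system prompt plus the most recent max_turns user/assistant turns."""
    system, dialogue = _split(messages)
    cut = max(0, len(dialogue) - 2 * max_turns)
    return system + dialogue[cut:]

def summarize_older(messages: list[dict], keep_recent: int,
                    summarize: Callable[[list[dict]], str]) -> list[dict]:
    """Replace everything but the last keep_recent messages with one summary message."""
    system, dialogue = _split(messages)
    cut = max(0, len(dialogue) - keep_recent)
    older, recent = dialogue[:cut], dialogue[cut:]
    if not older:
        return messages
    summary = {"role": "system",
               "content": "Summary of earlier conversation: " + summarize(older)}
    return system + [summary] + recent

history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(3):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

windowed = sliding_window(history, max_turns=1)
# A real summarizer would call an LLM; this stand-in just counts messages.
compressed = summarize_older(history, keep_recent=2,
                             summarize=lambda msgs: f"{len(msgs)} earlier messages")
print(len(history), len(windowed), len(compressed))  # 7 3 4
```

Both helpers keep the system prompt at the front, so the stable prefix that benefits from KV caching is preserved.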
Why Do We Need Agent Communication Protocols?
As Agent systems grow more complex, communication between different Agents and between Agents and tools becomes a key challenge. Communication protocols address interoperability, allowing Agents and tools built by different developers to work together.
MCP (Model Context Protocol)
An open standard protocol proposed by Anthropic that defines how LLM applications communicate with external tools and data sources.
- Core Concepts: Server (tool provider) and Client (AI application)
- Transport Layer: Supports stdio and HTTP/SSE transport
- Capability Types: Tools, Resources, Prompt Templates
- Ecosystem: Hundreds of official and community MCP Servers already available
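import asyncio

from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

app = Server("my-mcp-server")

def fetch_stock_price(symbol: str) -> float:
    """Placeholder; replace with a call to a real market-data API."""
    raise NotImplementedError("connect a market-data API here")

@app.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="get_stock_price",
            description="Get the real-time price of a stock",
            inputSchema={
                "type": "object",
                "properties": {
                    "symbol": {"type": "string", "description": "Stock ticker symbol"}
                },
                "required": ["symbol"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "get_stock_price":
        symbol = arguments["symbol"]
        # Call the stock API
        price = fetch_stock_price(symbol)
        return [TextContent(type="text", text=f"{symbol} current price: {price}")]
    # Report unknown tools instead of silently returning None
    raise ValueError(f"Unknown tool: {name}")

async def main():
    # stdio_server yields the read and write streams for the stdio transport
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())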
A2A (Agent-to-Agent) Protocol
A protocol proposed by Google for inter-Agent communication, focused on interoperability between different AI Agent systems. A2A defines standard ways for Agents to discover each other, declare capabilities, and delegate tasks.
ANP (Agent Network Protocol)
An Agent communication protocol for open networks, enabling decentralized discovery and communication between Agents on the open internet.
Why Train Agents?
General-purpose LLMs often perform poorly on specific Agent tasks. Targeted training can teach models to use tools more effectively, plan, and execute complex tasks.
Training Paradigm Overview
1. Supervised Fine-Tuning (SFT)
Train a model with supervised learning on high-quality Agent trajectory data so that it learns correct behavior patterns.
- Data Format: (task, tool-call sequence, final result) triples
- Advantages: Stable training, high data efficiency
- Disadvantages: Requires large amounts of high-quality annotated data; hard to exceed expert demonstrations
2. RLHF (Reinforcement Learning from Human Feedback)
Optimize model behavior using a reward signal learned from human preference ratings of model outputs; this is the core training technique behind ChatGPT.
3. GRPO (Group Relative Policy Optimization)
An efficient RL algorithm proposed by DeepSeek and a key technique behind DeepSeek-R1.
- Sample multiple outputs for the same question; score them relative to each other within the group
- No separate Critic model needed, which significantly reduces compute cost
- Especially suited for tasks with clear correct answers, like math and code
def extract_answer(completion: str) -> str:
    """Placeholder parser: treat the text after the closing </think> tag as the answer."""
    return completion.split("</think>")[-1].strip()

def compute_reward(completions: list[str], ground_truth: str) -> list[float]:
    """
    GRPO reward function example
    Compute rewards for multiple model outputs on the same question
    """
    rewards = []
    for completion in completions:
        # Format reward: does the output contain a chain of thought?
        format_reward = 1.0 if "<think>" in completion else 0.0
        # Correctness reward: is the final answer correct?
        if extract_answer(completion) == ground_truth:
            correctness_reward = 1.0
        else:
            correctness_reward = 0.0
        # Weighted sum: correctness counts more than format
        rewards.append(0.3 * format_reward + 0.7 * correctness_reward)
    return rewards
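The "score them relative to each other within the group" step can be sketched as a normalization of each reward against its group's mean and standard deviation. This is a minimal illustration of that one step, not a full GRPO trainer; the use of the population standard deviation and the eps constant are assumptions.

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group: (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population standard deviation
    # eps avoids division by zero when every output received the same reward
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four sampled outputs of the same question.
rewards = [1.0, 0.3, 0.0, 0.3]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Outputs that beat the group average get a positive advantage and the rest a negative one, which is why no separate Critic model is needed to provide a baseline.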
Agent Training Best Practices
- Start with small models and small datasets to verify your pipeline is correct
- Reward function design is critical, so think carefully about it
- Use tools like Weights & Biases to monitor training
- Regularly evaluate on a test set to prevent overfitting
Why Does Evaluation Matter?
Without evaluation, there is no way to tell whether a change is an improvement. Evaluating Agent systems is especially difficult: the output is often a multi-step trajectory rather than a single answer, and many tasks have no unique correct answer.
Core Evaluation Dimensions
- Task Success Rate: Proportion of tasks the Agent completes successfully
- Step Efficiency: Average number of steps required to complete a task
- Tool Accuracy: Proportion of correct tool selections and calls
- Hallucination Rate: Frequency of fabricated information from the model
- Cost Efficiency: Token consumption and API cost per task
- Latency: Average time to complete a task
Leading Agent Benchmarks
- GAIA: Tests Agent general capability on real-world tasks
- WebArena: Evaluates Agents' ability to operate web interfaces
- SWE-bench: Evaluates Agents solving real software engineering problems
- HotPotQA: Multi-hop question-answering and reasoning evaluation
- AgentBench: Comprehensive multi-task Agent evaluation
Evaluation Framework in Practice
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalResult:
    task_id: str
    success: bool
    steps: int
    tokens_used: int
    latency_ms: float
    error: Optional[str] = None

class AgentEvaluator:
    def __init__(self, agent, test_cases: list[dict]):
        self.agent = agent
        self.test_cases = test_cases

    def run(self) -> list[EvalResult]:
        results = []
        for case in self.test_cases:
            start = time.perf_counter()
            try:
                output = self.agent.run(case["input"])
                # case["judge"] is a callable that returns True for an acceptable output
                success = bool(case["judge"](output))
                results.append(EvalResult(
                    task_id=case["id"],
                    success=success,
                    # The agent is expected to expose these counters; default to 0 if it does not
                    steps=getattr(self.agent, "step_count", 0),
                    tokens_used=getattr(self.agent, "token_count", 0),
                    latency_ms=(time.perf_counter() - start) * 1000
                ))
            except Exception as e:
                # A crash counts as a failed task and keeps the error message
                results.append(EvalResult(
                    task_id=case["id"],
                    success=False,
                    steps=0, tokens_used=0,
                    latency_ms=(time.perf_counter() - start) * 1000,
                    error=str(e)
                ))
        return results

    def summarize(self, results: list[EvalResult]) -> dict:
        if not results:
            return {"success_rate": "n/a", "avg_steps": "n/a", "total_tokens": 0}
        success_rate = sum(r.success for r in results) / len(results)
        avg_steps = sum(r.steps for r in results) / len(results)
        return {
            "success_rate": f"{success_rate:.1%}",
            "avg_steps": f"{avg_steps:.1f}",
            "total_tokens": sum(r.tokens_used for r in results)
        }
Project Overview
The Smart Travel Assistant is a hands-on project that applies MCP tool protocols and multi-Agent collaboration. The system automatically plans itineraries, queries flights and hotels, and generates travel guides based on user needs.
System Architecture
- Master Agent: Understands user intent and coordinates sub-Agents
- Flight Search Agent: Connects to flight search APIs via MCP
- Hotel Recommendation Agent: Recommends accommodations based on preferences and budget
- Attraction Planning Agent: Retrieves destination attraction information and recommended routes
- Budget Calculation Agent: Aggregates and optimizes total travel costs
Core MCP Tools
- search_flights(origin, dest, date): Search available flights
- get_hotels(city, checkin, checkout, budget): Query hotels
- get_attractions(city, interests): Get attraction recommendations
- get_weather_forecast(city, dates): Query weather forecast
- calculate_budget(items): Calculate travel budget
import asyncio
from datetime import date

# intent_parser, flight_agent, hotel_agent, weather_agent, attraction_agent, and
# planner_agent are the project's sub-Agents, assumed to be defined elsewhere.

async def travel_agent_workflow(user_request: str) -> str:
    """
    Travel assistant master workflow
    """
    # Step 1: Parse user intent
    parsed = await intent_parser.parse(user_request)
    # e.g. {"origin": ..., "destination": "Tokyo", "dates": ["2025-03-15", "2025-03-22"], "budget": 2000}

    # Step 2: Parallel queries (improve efficiency)
    flights, hotels, weather = await asyncio.gather(
        flight_agent.search(parsed["origin"], parsed["destination"], parsed["dates"]),
        hotel_agent.search(parsed["destination"], parsed["dates"], parsed["budget"]),
        weather_agent.forecast(parsed["destination"], parsed["dates"])
    )

    # Step 3: Attraction planning
    # Derive the trip length from the start and end dates instead of hard-coding it
    start, end = (date.fromisoformat(d) for d in parsed["dates"])
    attractions = await attraction_agent.plan(
        city=parsed["destination"],
        duration=(end - start).days,
        interests=parsed.get("interests", [])
    )

    # Step 4: Generate comprehensive travel plan
    plan = await planner_agent.generate(
        flights=flights,
        hotels=hotels,
        attractions=attractions,
        weather=weather,
        budget=parsed["budget"]
    )
    return plan.to_markdown()
What Is a DeepResearch Agent?
DeepResearch is a deep research feature from OpenAI that can autonomously perform multi-round web searches, read literature, integrate information, and finally produce professional-grade research reports. This chapter reproduces its core logic.
Core Workflow
- Problem Analysis: Decompose complex research questions into searchable sub-questions
- Iterative Search: Multiple rounds of search, each refining the strategy based on the previous round
- Content Extraction: Extract key information from search results
- Knowledge Integration: Integrate information from multiple sources into a coherent knowledge base
- Report Generation: Generate a structured research report from the integrated knowledge
import asyncio

class DeepResearchAgent:
    # generate_search_plan, web_search, extract_information, assess_knowledge_gaps,
    # generate_next_plan, and generate_report are LLM- or search-backed helper
    # methods that are not shown here.

    def __init__(self, max_iterations: int = 5):
        self.max_iterations = max_iterations
        self.knowledge_base = []

    async def research(self, topic: str) -> str:
        # 1. Generate initial search plan
        search_plan = await self.generate_search_plan(topic)
        for iteration in range(self.max_iterations):
            # 2. Execute searches (all queries of this round run concurrently)
            search_results = await asyncio.gather(*[
                self.web_search(query)
                for query in search_plan.queries
            ])
            # 3. Extract and validate information
            extracted = await self.extract_information(search_results)
            self.knowledge_base.extend(extracted)
            # 4. Assess knowledge completeness
            assessment = await self.assess_knowledge_gaps(
                topic=topic,
                knowledge=self.knowledge_base
            )
            if assessment.is_sufficient:
                break
            # 5. Generate next round search plan
            search_plan = await self.generate_next_plan(assessment.gaps)
        # 6. Generate final report
        return await self.generate_report(topic, self.knowledge_base)
Key Technical Points
- Problem Decomposition: Use tree structures to break complex questions into sub-problems
- Query Optimization: Dynamically adjust search queries based on existing knowledge
- Deduplication: Identify and merge duplicate information from different sources
- Citation Management: Maintain source citations to ensure reports are traceable
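Deduplication and citation management can be combined in one small data structure. The sketch below is an assumption about how this might be done: it merges only exact duplicates after whitespace and case normalization (near-duplicates would need something like embedding similarity), and the URLs are placeholders.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Finding:
    text: str
    sources: list[str] = field(default_factory=list)

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())

def merge_findings(raw: list[tuple[str, str]]) -> list[Finding]:
    """raw holds (text, source_url) pairs; duplicates are merged and keep all sources."""
    merged: dict[str, Finding] = {}
    for text, source in raw:
        key = hashlib.sha256(normalize(text).encode()).hexdigest()
        finding = merged.setdefault(key, Finding(text=text))
        if source not in finding.sources:
            finding.sources.append(source)
    return list(merged.values())

def render_with_citations(findings: list[Finding]) -> str:
    """Number each source once and attach the numbers to every finding."""
    references: list[str] = []
    lines = []
    for finding in findings:
        marks = []
        for source in finding.sources:
            if source not in references:
                references.append(source)
            marks.append(f"[{references.index(source) + 1}]")
        lines.append(f"{finding.text} {''.join(marks)}")
    lines.append("")
    lines.extend(f"[{i}] {source}" for i, source in enumerate(references, start=1))
    return "\n".join(lines)

findings = merge_findings([
    ("GRPO needs no critic model.", "https://example.com/a"),
    ("grpo  needs no critic model.", "https://example.com/b"),
    ("MCP is an open protocol.", "https://example.com/a"),
])
report = render_with_citations(findings)
print(report)
```

Because every finding keeps its source list, each sentence in the final report can be traced back to where it came from.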
What Is Cyber Town?
Cyber Town is a multi-Agent social simulation project inspired by Stanford's Generative Agents ("Smallville") paper. In a virtual town, multiple Agents with independent personalities, memories, and goals live together and produce emergent social behavior.
System Architecture
Game World
- Map System: Contains locations like homes, cafes, libraries, and parks
- Time System: Simulates 24 hours of a day
- Interaction System: Agents can interact with objects and other Agents
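One way the three game-world systems could fit together is sketched below. Everything here is an assumption made for illustration: the `TownClock` and `GameWorld` classes, the 08:00 start time, the 10-minute tick, and the shape of the snapshot dict (a `time`, a `location`, and an `entities` list) are not taken from the project's code.

```python
from dataclasses import dataclass, field

# Location names follow the map description; the exact list is an assumption.
TOWN_LOCATIONS = ["home", "cafe", "library", "park"]


@dataclass
class TownClock:
    """Simulated clock that wraps around after 24 hours."""
    minutes: int = 8 * 60  # assumed start of the day: 08:00

    def now(self) -> str:
        return f"{self.minutes // 60:02d}:{self.minutes % 60:02d}"

    def tick(self, step_minutes: int = 10) -> str:
        """Advance the clock and return the new time as HH:MM."""
        self.minutes = (self.minutes + step_minutes) % (24 * 60)
        return self.now()


@dataclass
class GameWorld:
    clock: TownClock = field(default_factory=TownClock)
    # Maps each location to the names of the Agents currently there.
    occupants: dict[str, set[str]] = field(
        default_factory=lambda: {loc: set() for loc in TOWN_LOCATIONS}
    )

    def move(self, agent: str, destination: str) -> None:
        """Place an Agent at a location, removing it from any other one."""
        if destination not in self.occupants:
            raise ValueError(f"unknown location: {destination}")
        for names in self.occupants.values():
            names.discard(agent)
        self.occupants[destination].add(agent)

    def snapshot(self, location: str) -> dict:
        """Describe what an Agent standing at `location` could observe."""
        return {
            "time": self.clock.now(),
            "location": location,
            "entities": [
                {"type": "agent", "description": f"{name} at the {location}"}
                for name in sorted(self.occupants[location])
            ],
        }
```

A simulation loop would then call `clock.tick()` once per step and hand each Agent the snapshot for its current location.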
Agent Design
- Persona: Each Agent has a unique personality, profession, and interests
- Memory System: Agents remember past experiences and conversations
- Planning System: Agents create daily plans based on their goals
- Social System: Agents can build relationships and hold conversations
# MemoryStream, llm, and TOWN_LOCATIONS are not defined in this snippet;
# they are assumed to be provided by the surrounding project.

class TownResident:
    def __init__(self, name: str, persona: str):
        self.name = name
        self.persona = persona
        self.memories = MemoryStream()
        self.location = "home"
        self.schedule = []

    async def perceive(self, environment: dict) -> list[str]:
        """Perceive the current environment"""
        observations = []
        for entity in environment.get("entities", []):
            # is_relevant is assumed to be implemented on this class
            if self.is_relevant(entity):
                observations.append(f"I see {entity['description']}")
                await self.memories.add(entity)
        return observations

    async def plan_day(self) -> list[dict]:
        """Plan the day's schedule"""
        context = f"You are {self.name}. {self.persona}\n"
        context += f"Today's memories: {await self.memories.get_recent(5)}\n"
        schedule = await llm.generate_schedule(
            context=context,
            current_time="08:00",
            available_locations=TOWN_LOCATIONS
        )
        self.schedule = schedule
        return schedule

    async def react(self, observation: str) -> str:
        """React to an observed event"""
        relevant_memories = await self.memories.retrieve(observation)
        response = await llm.chat(
            system=f"You are {self.name}. {self.persona}",
            context=relevant_memories,
            user=observation
        )
        # Store the reply so later retrievals can recall this conversation
        await self.memories.add({
            "content": f"I said: {response}",
            "importance": 7
        })
        return response
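The resident above relies on a `MemoryStream` with `add`, `get_recent`, and `retrieve` coroutines, which the snippet does not define. The sketch below is one hypothetical way to satisfy that interface. It ranks memories by plain word overlap and then by importance, which is a deliberate simplification: the Generative Agents paper combines recency, importance, and relevance. The default importance of 5 and the `k` parameter are assumptions.

```python
import asyncio
import re
import time


class MemoryStream:
    """In-memory store whose async methods match how the resident awaits them."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    async def add(self, record: dict) -> None:
        # Accept "content" or "description", since the resident stores both
        # perceived entities and its own utterances.
        text = record.get("content") or record.get("description", "")
        self._records.append({
            "content": text,
            "importance": record.get("importance", 5),  # assumed default
            "created_at": time.time(),
        })

    async def get_recent(self, n: int) -> list[str]:
        """Return the last n memories, oldest first."""
        return [r["content"] for r in self._records[-n:]] if n > 0 else []

    async def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Return up to k memories sharing words with the query."""
        query_words = set(re.findall(r"\w+", query.lower()))

        def score(record: dict) -> tuple[int, int]:
            words = set(re.findall(r"\w+", record["content"].lower()))
            return (len(query_words & words), record["importance"])

        ranked = sorted(self._records, key=score, reverse=True)
        return [r["content"] for r in ranked[:k] if score(r)[0] > 0]


async def demo() -> list[str]:
    stream = MemoryStream()
    await stream.add({"description": "Bob reading at the library"})
    await stream.add({"content": "I said: hello Bob", "importance": 7})
    await stream.add({"content": "The cafe opens at nine", "importance": 3})
    return await stream.retrieve("Bob waves")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because both matching memories share exactly one word with the query, the higher importance score decides their order, so the resident's own utterance comes back first.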
Capstone Goals
Congratulations on completing all of Hello-Agents! The capstone project is your chance to apply everything you've learned to build a complete multi-Agent application. Through this project, you will:
- Apply Agent architecture, tool calling, memory systems, and multi-Agent collaboration
- Experience the full Agent application development lifecycle
- Build real project experience to enrich your portfolio
Project Requirements
Core Requirements (Must Complete)
- Implement at least one core Agent capability (tool calling, planning, etc.)
- Integrate at least 3 different types of tools
- Implement basic memory / context management
- Provide clear code documentation and a README
Advanced Requirements (Optional)
- Implement multi-Agent collaboration
- Integrate a RAG knowledge base
- Deploy to the cloud with a public URL
- Build a web interface
Topic Ideas
Personal Knowledge Assistant
Manage personal notes and documents with intelligent Q&A and a knowledge graph
Workplace Productivity Agent
Automate email handling, meeting notes, and task management
Learning Assistant
Personalized learning plans, question generation, and concept explanation
Code Assistant
Code review, bug fixing, and documentation generation
Showcase & Share
After completing your capstone, share your work:
- Share your project link in Issues or Discussions
- Post on social media to let more people discover your work
- Follow @reyzowter on X to share updates
You've grown from an LLM "user" into an Agent "builder"!
Community Capstone Projects
Showcases of outstanding capstone projects from the community
Agent Interview Questions
Curated high-frequency interview questions for Agent-related roles
Agent Interview Reference Answers
Reference answers and explanations for the interview questions
Context Engineering Supplement
In-depth extensions and case studies for Chapter 9
Dify Agent Step-by-Step Tutorial
A hand-holding guide to creating a Dify Agent from scratch
Hello-Agents Common Questions
Answers to the most common questions from course participants
Agent Skills vs. MCP Comparison
Technical deep-dive comparing two tool integration approaches
GUI Agent: Introduction & Practice
GUI Agent concepts, principles, and hands-on tutorials
Environment Setup Guide
Detailed OpenAI API and Python environment configuration
How to Write a Great Skill
Best practices and examples for Agent Skill design
Agent Development Lessons & Experience
Real-world lessons and insights from building Code Agents
Agent Self-Evolution
Four feedback loops and representative self-evolving Agent projects
Web Agent: Introduction & Practice
Web Agent principles, anti-scraping practice, and HelloAgents integration
Travel Assistant Post-Training
Training the travel planning demo into a real, usable planner
Core Contributors
Chen Sizhou
Project Lead
Datawhale Member
Full text author & editor
Sun Tao
Co-Founder
Datawhale Member
CAMEL-AI
Jiang Shufan
Co-Founder
Datawhale Member
Exercise design & review
Huang Peilin
Datawhale Associate
Agent Dev Engineer
Chapter 5 contributor
Zeng Xinmin
Agent Engineer
Niuke Technology
Chapter 14 case study
Zhu Xinzhong
Advisor
Datawhale Chief Scientist
ZJU Professor
Extra Chapter Contributors
WH
Content Contributor
Zhou Aojie
DW Contributor Team
Xi'an Jiaotong Univ.
Extra02 content
Zhang Chenxu
Independent Developer
Imperial College London
Extra03 content
Huang Honghan
DW Contributor Team
Shenzhen University
Extra04 content
Wang Dapeng
Datawhale Member
Senior Developer
Extra08 content
You Yihui
Independent Developer
NUIST
Extra09 content
Yin Xin
Independent Developer
Zhejiang University
Extra10 content
Pranav J.
Independent Developer
TinyFish
Extra11 content
Wang Yufei
Independent Developer
BUPT
Extra12 content
We welcome all forms of contribution!
- Report Bugs: Found a content or code issue? Please open an Issue
- Suggest Ideas: Have a great idea for the project? Start a discussion
- Improve Content: Help improve the tutorial; submit your PR
- Share Your Work: Share your notes and projects in the community section