The First Open-Source AI Phone Automation Assistant That Needs No PC
Vision-Language Model (VLM) · Native Android Kotlin · Multi-Agent Architecture
In December 2025, ByteDance partnered with ZTE to release "Doubao Phone Assistant", an AI assistant that can operate your phone automatically to complete complex tasks. It can compare prices and place orders, submit job applications in bulk, scroll through videos, and even play games for you.
The first batch of 30,000 engineering units priced at 3,499 CNY (~$480) sold out on launch day, with resale prices reaching 5,000+ CNY.
Can't buy one? Let's build our own.
And so Roubao was born: a fully open-source AI phone automation assistant.
Why "Roubao" (肉包, meaning "meat bun")? Because the author doesn't like vegetables. 🥟
| Feature | Roubao | Doubao Phone | Other Open Source |
|---|---|---|---|
| Requires PC | ❌ No | ❌ No | ✅ Most do |
| Requires Hardware | ❌ No | ✅ $480+ | ❌ No |
| Native Android | ✅ Kotlin | ✅ Native | ❌ Python |
| Open Source | ✅ MIT | ❌ Closed | ✅ Yes |
| Skills/Tools Architecture | ✅ Full | ❓ Unknown | ❌ No |
| UI Design | ⭐⭐⭐½ | ⭐⭐⭐⭐ | ⭐⭐ |
| Custom Models | ✅ Yes | ❌ Doubao only | ✅ Partial |
Pain points of traditional phone automation:
Roubao's Solution:
One app, install and use. No computer, no cables, no technical background required.
Open App → Configure API Key → Tell it what you want → Done.
Almost all open-source phone automation projects (including Alibaba's MobileAgent) are written in Python and require:
Roubao is completely different.
We rewrote the entire MobileAgent framework in Kotlin so that it runs natively on Android:
For security reasons, regular Android apps cannot run shell commands such as `input tap` or `screencap`. Traditional solutions require connecting the phone to a computer and sending ADB commands. Shizuku is an elegant solution:
This allows Roubao to execute screenshots, taps, and input directly on the phone, truly achieving "one app does it all."
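To make the Shizuku step concrete, here is a minimal sketch of the shell commands such a controller could issue. `screencap -p`, `input tap`, `input swipe`, `input text` and `am start` are standard Android shell tools; the `ShellCommands` object and its method names are hypothetical and are not Roubao's actual `DeviceController` API. Handing the strings to Shizuku for execution is deliberately left out.

```kotlin
/**
 * Hypothetical helper that builds the Android shell commands a
 * Shizuku-backed controller could run. Only the command strings are
 * standard Android tools; executing them via Shizuku is out of scope.
 */
object ShellCommands {
    /** Capture the screen as a PNG file at [path]. */
    fun screenshot(path: String): String = "screencap -p $path"

    /** Tap at absolute screen coordinates (pixels). */
    fun tap(x: Int, y: Int): String = "input tap $x $y"

    /** Swipe from (x1, y1) to (x2, y2) over [durationMs] milliseconds (default is arbitrary). */
    fun swipe(x1: Int, y1: Int, x2: Int, y2: Int, durationMs: Int = 300): String =
        "input swipe $x1 $y1 $x2 $y2 $durationMs"

    /**
     * Type text. Spaces are encoded as %s, which `input text` turns back into spaces,
     * and the argument is single-quoted so the device shell passes it through intact.
     * Note: `input text` generally cannot type non-ASCII text such as Chinese.
     */
    fun text(value: String): String {
        val encoded = value.replace("'", "'\\''").replace(" ", "%s")
        return "input text '$encoded'"
    }

    /** Open a URI (for example a DeepLink) with an ACTION_VIEW intent. */
    fun openUri(uri: String): String =
        "am start -a android.intent.action.VIEW -d '${uri.replace("'", "'\\''")}'"
}
```

For example, `ShellCommands.tap(540, 1200)` yields `input tap 540 1200`. Keeping command construction in pure functions makes it testable without a device.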
Inspired by Claude Code, Roubao implements a Tools + Skills dual-layer Agent framework:
```
          User: "Order me some food"
                       │
                       ▼
               ┌──────────────┐
               │ SkillManager │  ← Intent Recognition
               └──────────────┘
                       │
            ┌──────────┴──────────┐
            │                     │
            ▼                     ▼
       🚀 Fast Path       🤖 Standard Path
       (Delegation)       (GUI Automation)
            │                     │
            ▼                     ▼
     Direct DeepLink         Agent Loop
     Open Xiaomei AI     Operate Meituan App
```
Tools Layer (Atomic Capabilities)
Low-level toolkit where each Tool performs an independent operation:
| Tool | Function |
|---|---|
| `search_apps` | Smart app search (pinyin, semantic support) |
| `open_app` | Open application |
| `deep_link` | Jump to specific app page via DeepLink |
| `clipboard` | Read/write clipboard |
| `shell` | Execute shell commands |
| `http` | HTTP requests (call external APIs) |
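The Tools layer can be pictured as one small interface plus a name-keyed registry. The sketch below assumes a hypothetical `Tool` interface, `ToolResult` type and `ToolManager.call` method, and stands in for `search_apps` with plain substring matching over a precomputed pinyin field; Roubao's real `AppScanner` (pinyin and semantic search) is not reproduced here.

```kotlin
/** Result of running a tool: success flag plus a text payload for the agent. */
data class ToolResult(val success: Boolean, val output: String)

/** Hypothetical atomic capability: a name, a description for the model, and an execute method. */
interface Tool {
    val name: String
    val description: String
    fun execute(args: Map<String, String>): ToolResult
}

/** Installed app as a (hypothetical) scanner might report it; pinyin is assumed precomputed. */
data class AppInfo(val label: String, val packageName: String, val pinyin: String = "")

/**
 * Simplified stand-in for search_apps: case-insensitive substring match on the label,
 * package name, or pinyin. Real pinyin conversion or semantic matching is omitted.
 */
class SearchAppsTool(private val apps: List<AppInfo>) : Tool {
    override val name = "search_apps"
    override val description = "Find installed apps by name, package, or pinyin"

    override fun execute(args: Map<String, String>): ToolResult {
        val query = args["query"]?.trim()?.lowercase()
        if (query.isNullOrEmpty()) return ToolResult(false, "missing 'query'")
        val hits = apps.filter { app ->
            app.label.lowercase().contains(query) ||
                app.packageName.lowercase().contains(query) ||
                app.pinyin.lowercase().contains(query)
        }
        return ToolResult(hits.isNotEmpty(), hits.joinToString(",") { it.packageName })
    }
}

/** Registry that dispatches calls to tools by name. */
class ToolManager {
    private val tools = mutableMapOf<String, Tool>()

    fun register(tool: Tool) {
        tools[tool.name] = tool
    }

    fun call(name: String, args: Map<String, String>): ToolResult =
        tools[name]?.execute(args) ?: ToolResult(false, "unknown tool: $name")
}
```

Because every tool shares one `execute(args)` signature, the agent or a Skill can invoke any capability by name, and adding a tool only means registering another implementation.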
Skills Layer (User Intent)
User-facing task layer that maps natural language to specific operations:
| Skill | Type | Description |
|---|---|---|
| Order Food (Xiaomei) | Delegation | Directly open Xiaomei AI to help order |
| Order Food (Meituan) | GUI Automation | Step-by-step operation on Meituan App |
| Navigate (Amap) | Delegation | DeepLink directly to Amap search |
| Generate Image (Jimeng) | Delegation | Open Jimeng AI to generate images |
| Send WeChat | GUI Automation | Auto-operate WeChat to send messages |
Two Execution Modes:
Delegation: For high-confidence matches, directly open AI-capable apps (like Xiaomei, Doubao, Jimeng) via DeepLink to complete tasks. Fast, one-step.
GUI Automation: For apps without AI capability (like Meituan, WeChat), complete tasks through traditional screenshot-analyze-operate loops. Skills provide step guidance for better success rates.
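The choice between the two modes can be sketched as a confidence-gated router. This is a toy built on assumptions: confidence is simply the fraction of a skill's keywords found in the request, the 0.6 threshold is arbitrary, and the `Skill` fields are illustrative rather than the actual `skills.json` schema. Only a confident Delegation match that has a DeepLink takes the fast path; everything else goes to the GUI agent loop, carrying the best-matching skill as guidance.

```kotlin
/** The two execution modes. */
enum class ExecutionMode { DELEGATION, GUI_AUTOMATION }

/** Hypothetical skill definition; field names are illustrative only. */
data class Skill(
    val name: String,
    val mode: ExecutionMode,
    val keywords: List<String>,
    val deepLink: String? = null, // only meaningful for DELEGATION skills
)

sealed class Route {
    data class FastPath(val skill: Skill, val deepLink: String) : Route()
    data class StandardPath(val hint: Skill?) : Route()
}

/** Toy intent matcher; a real SkillManager would more likely use an LLM or embeddings. */
class SkillRouter(private val skills: List<Skill>, private val threshold: Double = 0.6) {
    private fun confidence(input: String, skill: Skill): Double {
        if (skill.keywords.isEmpty()) return 0.0
        val text = input.lowercase()
        val hits = skill.keywords.count { text.contains(it.lowercase()) }
        return hits.toDouble() / skill.keywords.size
    }

    fun route(input: String): Route {
        val best = skills.maxByOrNull { confidence(input, it) }
        val score = best?.let { confidence(input, it) } ?: 0.0
        val link = best?.deepLink
        // Fast path: confident Delegation match with a DeepLink to jump to.
        if (best != null && score >= threshold && best.mode == ExecutionMode.DELEGATION && link != null) {
            return Route.FastPath(best, link)
        }
        // Otherwise fall back to the agent loop, passing any partial match as step guidance.
        return Route.StandardPath(if (score > 0.0) best else null)
    }
}
```

With a Navigate skill keyed on `navigate` and `to`, "Navigate to the airport" scores 1.0 and takes the fast path, while "Send a WeChat message to Mom" falls through to GUI automation with the WeChat skill as its hint.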
This is probably the best-looking UI among all open-source phone automation projects.
When Shizuku runs with Root privileges, Roubao can enable Root mode, which allows `su -c` commands (use with caution).

Shizuku is an open-source tool that allows regular apps to gain ADB-level permissions without Root.
Startup Methods (choose one):
Wireless Debugging (Recommended, requires Android 11+)
Settings > Developer Options > Wireless Debugging

Computer ADB

```bash
adb shell sh /storage/emulated/0/Android/data/moe.shizuku.privileged.api/start.sh
```

Download the latest APK from the Releases page.
Alibaba Qwen-VL (Recommended for China users)
OpenAI (Requires proxy in some regions)
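Both providers can be reached through an OpenAI-compatible Chat Completions endpoint (Alibaba's DashScope offers a compatible mode), so a VLM call boils down to sending the instruction plus a base64-encoded screenshot. The sketch below only builds the request body as plain maps; `VlmProvider`, `buildVisionRequest` and the preset model names are illustrative assumptions, not Roubao's defaults, and the HTTP call itself (with an `Authorization: Bearer <API key>` header) is left out.

```kotlin
import java.util.Base64

/** Illustrative provider preset; Roubao's actual settings may differ. */
data class VlmProvider(val baseUrl: String, val model: String) {
    val chatCompletionsUrl: String get() = baseUrl.trimEnd('/') + "/chat/completions"
}

val QWEN_VL = VlmProvider("https://dashscope.aliyuncs.com/compatible-mode/v1", "qwen-vl-max")
val OPENAI = VlmProvider("https://api.openai.com/v1", "gpt-4o")

/**
 * Builds an OpenAI-style chat completion body (plain maps, ready for any JSON
 * serializer) carrying one text instruction plus a PNG screenshot as a data URL.
 */
fun buildVisionRequest(provider: VlmProvider, prompt: String, screenshotPng: ByteArray): Map<String, Any> {
    val dataUrl = "data:image/png;base64," + Base64.getEncoder().encodeToString(screenshotPng)
    return mapOf(
        "model" to provider.model,
        "messages" to listOf(
            mapOf(
                "role" to "user",
                "content" to listOf(
                    mapOf("type" to "text", "text" to prompt),
                    mapOf("type" to "image_url", "image_url" to mapOf("url" to dataUrl)),
                ),
            ),
        ),
    )
}
```

Swapping providers then only changes the base URL, model name and API key, which is what makes custom models cheap to support.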
- Order a tasty burger nearby
- Open NetEase Music and play daily recommendations
- Post my latest photo to Weibo
- Order pork trotter rice on Meituan
- Watch trending videos on Bilibili
```
┌──────────────────────────────────────────────────────────────┐
│                          Roubao App                          │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌───────────────────────────────────────────────────────┐   │
│  │  UI Layer (Compose)                                   │   │
│  │  HomeScreen / Settings / History                      │   │
│  └───────────────────────────────────────────────────────┘   │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐   │
│  │  Skills Layer                                         │   │
│  │  SkillManager → Intent Recognition → Fast/Standard    │   │
│  │  ┌─────────────────────────────────────────────────┐  │   │
│  │  │ Order Food │ Navigate │ Taxi │ WeChat │ AI Art  │  │   │
│  │  └─────────────────────────────────────────────────┘  │   │
│  └───────────────────────────────────────────────────────┘   │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐   │
│  │  Tools Layer                                          │   │
│  │  ToolManager → Atomic Capability Wrapper              │   │
│  │  ┌─────────────────────────────────────────────────┐  │   │
│  │  │ search_apps │ open_app │ deep_link │ clipboard  │  │   │
│  │  │ shell │ http │ screenshot │ tap │ swipe │ type  │  │   │
│  │  └─────────────────────────────────────────────────┘  │   │
│  └───────────────────────────────────────────────────────┘   │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐   │
│  │  Agent Layer                                          │   │
│  │  MobileAgent (ported from MobileAgent-v3)             │   │
│  │  ┌───────────┬────────────┬────────────┬───────────┐  │   │
│  │  │  Manager  │  Executor  │ Reflector  │ Notetaker │  │   │
│  │  │(Planning) │(Execution) │(Reflection)│  (Notes)  │  │   │
│  │  └───────────┴────────────┴────────────┴───────────┘  │   │
│  └───────────────────────────────────────────────────────┘   │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐   │
│  │  VLM Client                                           │   │
│  │  Qwen-VL / GPT-4V / Claude                            │   │
│  └───────────────────────────────────────────────────────┘   │
│                              │                               │
├──────────────────────────────┼───────────────────────────────┤
│                              ▼                               │
│  ┌───────────────────────────────────────────────────────┐   │
│  │  Shizuku                                              │   │
│  │  System-level Control                                 │   │
│  │  screencap │ input tap │ input swipe │ am start       │   │
│  └───────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘
```
```
User Input
         │
         ▼
┌─────────────────┐
│  Skills Match   │  ← Check for matching Skill
└─────────────────┘
         │
         ├── High-confidence Delegation Skill ──▶ Direct DeepLink ──▶ Done
         │
         ▼
┌─────────────────┐
│ Standard Agent  │
│      Loop       │
└─────────────────┘
         │
         ▼
┌──────────────────────────────────────────────┐
│ 1. Screenshot - Shizuku screencap            │
│ 2. Manager Planning - VLM analyzes state     │
│ 3. Executor Decision - Determine next step   │
│ 4. Execute Action - tap/swipe/type/open_app  │
│ 5. Reflector - Evaluate action outcome       │
│ 6. Loop until done or safety limit           │
└──────────────────────────────────────────────┘
```
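The standard loop can be sketched in plain Kotlin. The role interfaces (`Screen`, `Manager`, `Executor`, `Reflector`) and the `Action` hierarchy are hypothetical simplifications of the agents in `agent/`; in Roubao each role is backed by VLM calls, the Notetaker is reduced here to a list of notes, and the `maxSteps` default of 20 is an arbitrary stand-in for the safety limit.

```kotlin
/** Actions the executor can emit (a subset of tap/swipe/type/open_app). */
sealed class Action {
    data class Tap(val x: Int, val y: Int) : Action()
    data class Type(val text: String) : Action()
    data class OpenApp(val packageName: String) : Action()
    object Done : Action()
}

// Hypothetical role interfaces; the real agents call a VLM behind each of these.
fun interface Screen { fun capture(): ByteArray }
fun interface Manager { fun plan(task: String, screenshot: ByteArray, notes: List<String>): String }
fun interface Executor { fun decide(plan: String, screenshot: ByteArray): Action }
fun interface Reflector { fun evaluate(action: Action, before: ByteArray, after: ByteArray): Boolean }

/**
 * One iteration per step: screenshot → plan → decide → act → reflect.
 * Returns true if the executor reports Done within maxSteps, false if the limit is hit.
 */
fun runAgent(
    task: String,
    screen: Screen,
    manager: Manager,
    executor: Executor,
    reflector: Reflector,
    perform: (Action) -> Unit,
    maxSteps: Int = 20,
): Boolean {
    val notes = mutableListOf<String>()
    repeat(maxSteps) { step ->
        val before = screen.capture()
        val plan = manager.plan(task, before, notes)
        val action = executor.decide(plan, before)
        if (action == Action.Done) return true
        perform(action)
        val ok = reflector.evaluate(action, before, screen.capture())
        notes += "step $step: $action -> ${if (ok) "ok" else "failed"}"
    }
    return false
}
```

Injecting the roles as functions keeps the loop testable with fakes: a scripted executor that returns `Done` on its third decision ends the run early, while one that never does hits `maxSteps` and returns `false`.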
```
app/src/main/java/com/roubao/autopilot/
├── agent/                    # AI Agent Core (ported from MobileAgent-v3)
│   ├── MobileAgent.kt        # Agent main loop
│   ├── Manager.kt            # Planning Agent
│   ├── Executor.kt           # Execution Agent
│   ├── ActionReflector.kt    # Reflection Agent
│   ├── Notetaker.kt          # Notes Agent
│   └── InfoPool.kt           # State pool
│
├── tools/                    # Tools Layer - Atomic Capabilities
│   ├── Tool.kt               # Tool interface definition
│   ├── ToolManager.kt        # Tool manager
│   ├── SearchAppsTool.kt     # App search
│   ├── OpenAppTool.kt        # Open app
│   ├── DeepLinkTool.kt       # DeepLink jump
│   ├── ClipboardTool.kt      # Clipboard operations
│   ├── ShellTool.kt          # Shell commands
│   └── HttpTool.kt           # HTTP requests
│
├── skills/                   # Skills Layer - User Intent
│   ├── Skill.kt              # Skill interface definition
│   ├── SkillRegistry.kt      # Skill registry
│   └── SkillManager.kt       # Skill manager
│
├── controller/               # Device Control
│   ├── DeviceController.kt   # Shizuku controller
│   └── AppScanner.kt         # App scanner (pinyin/semantic search)
│
├── vlm/                      # VLM Client
│   └── VLMClient.kt          # API wrapper
│
├── ui/                       # User Interface
│   ├── screens/              # Screen composables
│   ├── theme/                # Theme definitions
│   └── OverlayService.kt     # Overlay service
│
├── data/                     # Data Layer
│   └── SettingsManager.kt    # Settings management
│
└── App.kt                    # Application entry

app/src/main/assets/
└── skills.json               # Skills configuration file
```
Major update in progress on the `roubao2.0+AccessibilityService` branch:
Accessibility Service Hybrid Mode - Integrate AccessibilityService for more precise UI control
UI Tree Awareness - Agent can access complete UI structure
Macro Script System - Record, store, and replay action sequences
Settings Enhancement
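The Macro Script System is still on the roadmap, so the following is only a guess at what "record, store, and replay" could look like: a hypothetical `MacroStep` type, a recorder that saves steps as one line of text each, and a loader that parses them back for replay. The line-based format is an assumption and cannot hold text containing newlines.

```kotlin
/** Hypothetical recorded step; a real macro would likely also store waits and UI-tree anchors. */
sealed class MacroStep {
    data class Tap(val x: Int, val y: Int) : MacroStep()
    data class Swipe(val x1: Int, val y1: Int, val x2: Int, val y2: Int) : MacroStep()
    data class Text(val value: String) : MacroStep()
}

/** Collects steps as they happen and serializes them one per line. */
class MacroRecorder {
    private val steps = mutableListOf<MacroStep>()

    fun record(step: MacroStep) {
        steps += step
    }

    fun save(): String = steps.joinToString("\n") { step ->
        when (step) {
            is MacroStep.Tap -> "tap ${step.x} ${step.y}"
            is MacroStep.Swipe -> "swipe ${step.x1} ${step.y1} ${step.x2} ${step.y2}"
            is MacroStep.Text -> "text ${step.value}"
        }
    }
}

/** Parses the saved format back into steps; text keeps everything after the first space. */
fun loadMacro(script: String): List<MacroStep> =
    script.lines().filter { it.isNotBlank() }.map { line ->
        val parts = line.split(" ", limit = 2)
        val rest = parts.getOrElse(1) { "" }
        when (parts[0]) {
            "tap" -> rest.split(" ").map(String::toInt).let { MacroStep.Tap(it[0], it[1]) }
            "swipe" -> rest.split(" ").map(String::toInt).let { MacroStep.Swipe(it[0], it[1], it[2], it[3]) }
            "text" -> MacroStep.Text(rest)
            else -> throw IllegalArgumentException("unknown macro step: $line")
        }
    }

/** Replays steps in order through a caller-supplied executor (e.g. a Shizuku-backed one). */
fun replay(steps: List<MacroStep>, perform: (MacroStep) -> Unit) = steps.forEach(perform)
```

Replaying a recorded macro skips the VLM entirely, which is the main appeal for repetitive tasks.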
```bash
# Clone repository
git clone https://github.com/yourusername/roubao.git
cd roubao

# Build Debug version
./gradlew assembleDebug

# Install to device
./gradlew installDebug
```
Encountered a crash or bug? Here's how to report it:
💡 Log files do NOT contain your API Key or personal information
Please submit an issue on GitHub Issues with:
Issues and Pull Requests are welcome!
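1. Create a feature branch (`git checkout -b feature/amazing-feature`)
2. Commit your changes (`git commit -m 'Add some amazing feature'`)
3. Push the branch (`git push origin feature/amazing-feature`)
4. Open a Pull Request

This project is open-sourced under the MIT License. See the LICENSE file for details.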
Made with ❤️ by Roubao Team