FunOCR

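An out-of-the-box, self-hosted OCR API service. It is built on a microservice architecture and lets you combine and deploy multiple OCR engines as needed.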

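Features

Microservice architecture: each OCR engine runs independently and can be combined freely at deployment time
A unified /v1/ocr API that works with every model
Supports RapidOCR, DeepSeek-OCR, PaddleOCR-VL, and PP-StructureV3
Supports multiple inference backends: vLLM, Transformers, and PaddlePaddle
Compose services on demand with docker-compose, including multi-GPU assignment
Complete result data: text, Markdown, coordinates, confidence scores, and element types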

Architecture


            Gateway (8000)
     API routing + load balancing
               |
    ┌──────────┼──────────┐
    |          |          |
RapidOCR  DeepSeek-OCR  PaddleOCR-VL  PP-StructureV3
 (8001)     (8002)        (8003)        (8004)
  CPU     GPU/vLLM      GPU/vLLM      GPU/Paddle

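Supported OCR engines

Engine	Type	Device	Inference backend	Highlights
RapidOCR	Traditional OCR	CPU	ONNX Runtime	Lightweight and fast, no GPU required
DeepSeek-OCR	Multimodal VLM	GPU	vLLM / Transformers	High accuracy, complex document understanding
PaddleOCR-VL	0.9B VLM	GPU	vLLM / PaddlePaddle	Lightweight VLM, 109 languages
PP-StructureV3	Document parsing	GPU	PaddlePaddle	Layout analysis, tables, formulas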

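Quick start

Docker deployment

Choose the deployment that fits your environment:


# CPU version (gateway + RapidOCR)
docker-compose -f docker-compose.cpu.yml up -d

# GPU vLLM version (gateway + RapidOCR + DeepSeek-OCR + PaddleOCR-VL)
docker-compose -f docker-compose.gpu-vllm.yml up -d

# Full version (all services)
docker-compose up -d

Networks in mainland China:

If you build the images from within mainland China, use the China mirror sources to speed up the build:


# Build with the Makefile (recommended)
make build-all-cn

# Or call docker build directly
docker build --build-arg USE_CN_MIRROR=1 -f gateway/Dockerfile.base -t funocr/gateway:base .

See the China mirror support documentation for details.

Once the services are running, open http://localhost:8000/docs to view the API documentation.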

Local development

Requires Python 3.10+ and the uv package manager.


# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Show service status
python dev.py status

# Initialize the CPU services
python dev.py init gateway rapidocr

# Start the services (each in its own terminal)
python dev.py run rapidocr    # terminal 1: port 8001
python dev.py run gateway     # terminal 2: port 8000

For the full development guide, see docs/DEVELOPMENT.md.

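API usage

Gateway and Service endpoints

FunOCR uses a microservice architecture. The Gateway is the single entry point and forwards requests to the individual Services:

Layer	Port	Endpoint	Description
Gateway	8000	`POST /v1/ocr`	Asynchronous mode: returns a `task_id` immediately; poll for the result
Gateway	8000	`POST /v1/ocr/sync`	Synchronous mode: waits for the result (identical to the Service endpoint)
Gateway	8000	`GET /v1/tasks/{task_id}`	Query the status and result of an asynchronous task
Service	8001-8004	`POST /v1/ocr`	Each service's native endpoint; returns the result synchronously

Important: The Gateway's POST /v1/ocr/sync endpoint takes the same parameters and returns the same format as each Service's POST /v1/ocr endpoint; the only addition is model routing. Calling a Service directly (for example http://localhost:8002/v1/ocr) therefore behaves the same as calling the Gateway's synchronous endpoint.

Recommended usage:

Production: go through the Gateway (port 8000), which supports asynchronous tasks and load balancing
Development and debugging: call a Service port directly for testing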
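
The routing rules above can be summarized in a small helper. This is only a sketch: the port numbers come from the architecture diagram, while `SERVICE_PORTS` and `ocr_url` are hypothetical names that are not part of FunOCR.

```python
# Ports are taken from the architecture diagram in this README.
# SERVICE_PORTS and ocr_url are illustrative names, not FunOCR APIs.
GATEWAY_PORT = 8000
SERVICE_PORTS = {
    "rapidocr": 8001,
    "deepseek-ocr": 8002,
    "paddleocr-vl": 8003,
    "paddle-structure": 8004,
}


def ocr_url(model: str, *, direct: bool = False, sync: bool = True,
            host: str = "localhost") -> str:
    """Return the URL to POST a file to for the given model."""
    if model not in SERVICE_PORTS:
        raise ValueError(f"unknown model: {model}")
    if direct:
        # Services expose only the synchronous POST /v1/ocr endpoint.
        return f"http://{host}:{SERVICE_PORTS[model]}/v1/ocr"
    path = "/v1/ocr/sync" if sync else "/v1/ocr"
    return f"http://{host}:{GATEWAY_PORT}{path}"
```

When going through the Gateway, the model is still selected with the `model` form field; the URL only decides between the synchronous and asynchronous endpoints.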

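The FunOCR Gateway offers two API modes:

Asynchronous mode (recommended): POST /v1/ocr returns a task ID immediately and the result is fetched by polling; suited to long-running jobs
Synchronous mode: POST /v1/ocr/sync waits for the result before responding; kept for backward compatibility with earlier versions

POST /v1/ocr - Create an OCR task (asynchronous)

Creates an asynchronous OCR task and returns a task_id immediately. The client polls GET /v1/tasks/{task_id} for the result.

Parameters:

Parameter	Type	Required	Default	Description
file	File	Yes	-	Image or PDF file
model	String	No	rapidocr	Model: `rapidocr`, `deepseek-ocr`, `paddleocr-vl`, `paddle-structure`
output_format	String	No	json	Output format: `json`, `markdown`, `text`
pagination	Boolean	No	true	PDF pagination mode (see the pagination parameter notes under POST /v1/ocr/sync)
include_images	Boolean	No	false	Whether to extract images (returns a ZIP file; supported only by `deepseek-ocr`/`paddleocr-vl`/`paddle-structure`)

Response: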

{
  "code": 200,
  "msg": "success",
  "data": {
    "task_id": "2e9a064c-4ae8-475d-92fe-9c86c25d428a",
    "status": "pending",
    "model": "paddle-structure",
    "filename": "document.pdf",
    "message": "任务已创建,请使用 GET /v1/tasks/{task_id} 查询结果"
  }
}

Examples:


# Create an asynchronous task
curl -X POST http://localhost:8000/v1/ocr \
  -F "file=@document.pdf" \
  -F "model=paddle-structure"

# Extract images (the result is a ZIP)
curl -X POST http://localhost:8000/v1/ocr \
  -F "file=@document.pdf" \
  -F "model=deepseek-ocr" \
  -F "include_images=true"
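
For clients that cannot shell out to curl, the same multipart upload can be assembled with the Python standard library. The sketch below is one possible approach, not code from FunOCR: `build_multipart` and `create_task_request` are hypothetical helpers, and the file part is sent as `application/octet-stream` on the assumption that the server does not require a more specific content type.

```python
import uuid
import urllib.request
from typing import Dict, Optional, Tuple


def build_multipart(fields: Dict[str, str], filename: str, file_bytes: bytes,
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode form fields plus one 'file' part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def create_task_request(filename: str, file_bytes: bytes, model: str = "rapidocr",
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) the POST /v1/ocr request."""
    body, content_type = build_multipart({"model": model}, filename, file_bytes)
    return urllib.request.Request(
        f"{base_url}/v1/ocr", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it; the response body is the JSON envelope containing `task_id`.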

GET /v1/tasks/{task_id} - Query task status

Returns the status of an asynchronous task and, once it has finished, its result. Example response for a completed task:


{
  "code": 200,
  "msg": "success",
  "data": {
    "task_id": "2e9a064c-4ae8-475d-92fe-9c86c25d428a",
    "model": "paddle-structure",
    "status": "completed",
    "created_at": "2024-01-23T10:00:00",
    "updated_at": "2024-01-23T10:00:15",
    "filename": "document.pdf",
    "result_type": "json",
    "result": {
      "text": "recognized text...",
      "markdown": "# Title\n\nContent...",
      "blocks": [...],
      "metadata": {...}
    }
  }
}

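Status values:

Status	Description
`pending`	The task is waiting to be processed
`processing`	The task is being processed
`completed`	The task has finished; when `result_type` is `json` the result is returned inline, and when it is `zip` a `download_url` is provided
`failed`	The task failed; the response includes an `error` field

Example:


# Poll the task status
curl http://localhost:8000/v1/tasks/2e9a064c-4ae8-475d-92fe-9c86c25d428a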
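
A polling loop over the four statuses might look like the sketch below. It assumes the envelope and status values shown above; `wait_for_task` and `http_fetch` are hypothetical helper names, and the one-second interval and 300-second timeout are arbitrary illustration values rather than FunOCR defaults.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def http_fetch(task_id: str, base_url: str = "http://localhost:8000") -> dict:
    """GET /v1/tasks/{task_id} and decode the JSON envelope."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/tasks/{task_id}") as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Non-2xx responses are expected to carry the same JSON envelope.
        return json.load(err)


def wait_for_task(fetch: Callable[[str], dict], task_id: str,
                  interval: float = 1.0, timeout: float = 300.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task is completed or failed, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        envelope = fetch(task_id)
        data = envelope.get("data") or {}
        status = data.get("status")
        if status == "completed":
            return data
        if status == "failed":
            raise RuntimeError(envelope.get("msg") or data.get("error") or "task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Calling `wait_for_task(http_fetch, task_id)` returns the `data` object of the completed task; the `fetch` and `sleep` parameters exist so the loop can be exercised without a running Gateway.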

GET /v1/tasks/{task_id}/download - Download the task result file

Downloads the task's result file (JSON or ZIP). Use this endpoint when result_type is zip.

Example:


# Download the ZIP result
curl -O http://localhost:8000/v1/tasks/2e9a064c-4ae8-475d-92fe-9c86c25d428a/download

GET /v1/tasks - List tasks

Lists all tasks, optionally filtered by model and status.

Parameters:

Parameter	Type	Required	Default	Description
model	String	No	-	Filter by model
status	String	No	-	Filter by status (`pending`/`processing`/`completed`/`failed`)
limit	Integer	No	100	Maximum number of tasks to return

Examples:


# List all tasks
curl http://localhost:8000/v1/tasks

# Only completed tasks
curl "http://localhost:8000/v1/tasks?status=completed&limit=50"

# Only tasks for a specific model
curl "http://localhost:8000/v1/tasks?model=paddle-structure"
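
The query string for this endpoint can be built programmatically. A minimal sketch, where `tasks_url` is a hypothetical helper and the status check only mirrors the four values listed in the table above:

```python
from typing import Optional
from urllib.parse import urlencode

VALID_STATUSES = {"pending", "processing", "completed", "failed"}


def tasks_url(base_url: str = "http://localhost:8000", model: Optional[str] = None,
              status: Optional[str] = None, limit: Optional[int] = None) -> str:
    """Build the GET /v1/tasks URL, omitting filters that are not set."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"invalid status: {status}")
    pairs = (("model", model), ("status", status), ("limit", limit))
    query = urlencode({key: value for key, value in pairs if value is not None})
    return f"{base_url}/v1/tasks" + (f"?{query}" if query else "")
```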

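POST /v1/ocr/sync - OCR recognition (synchronous)

Synchronous OCR endpoint: it responds only after the result is ready. Suited to quick jobs or to clients that rely on the earlier synchronous API.

Parameters: same as POST /v1/ocr

Examples:


# Use the default model (RapidOCR)
curl -X POST http://localhost:8000/v1/ocr/sync \
  -F "file=@document.png"

# Specify the model and output format
curl -X POST http://localhost:8000/v1/ocr/sync \
  -F "file=@document.png" \
  -F "model=deepseek-ocr" \
  -F "output_format=markdown"

# PDF merged mode + image extraction (returns a ZIP)
curl -X POST http://localhost:8000/v1/ocr/sync \
  -F "file=@document.pdf" \
  -F "model=paddle-structure" \
  -F "pagination=false" \
  -F "include_images=true" \
  -o document_result.zip

The pagination parameter:

Value	Description
`true` (default)	Paginated mode: returns `PaginatedOCRResult`; each page is stored separately, which makes per-page processing easy
`false`	Merged mode: returns `OCRResult`; text/markdown are merged seamlessly, and each block carries a `page` field identifying its page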

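Response structure

All endpoints return the same JSON envelope:


{
  "code": 200,
  "msg": "success",
  "data": { ... }
}

Field	Type	Description
`code`	Integer	HTTP status code
`msg`	String	Message ("success" on success)
`data`	Object/null	Payload (may be null on error)

Note: ZIP responses (include_images=true) are returned as a raw binary stream and are not wrapped in ApiResponse.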

Image response (OCRResult)


{
  "code": 200,
  "msg": "success",
  "data": {
    "text": "full recognized text",
    "markdown": "# Title\n\nParagraph text...",
    "blocks": [
      {
        "text": "Title text",
        "type": "title",
        "box": [[x1,y1], [x2,y2], [x3,y3], [x4,y4]],
        "score": 0.95,
        "block_id": 0,
        "order": 0,
        "group_id": 0
      }
    ],
    "metadata": {
      "model": "deepseek-ocr",
      "engine": "vllm"
    }
  }
}

PDF paginated-mode response (PaginatedOCRResult, pagination=true)


{
  "code": 200,
  "msg": "success",
  "data": {
    "pages": [
      {
        "text": "Page one content...",
        "markdown": "# Title\n\nPage one content...",
        "blocks": [
          {"text": "Title", "type": "title", "box": [...], "score": 0.99}
        ],
        "metadata": {...}
      },
      {
        "text": "Page two content...",
        "markdown": "Page two content...",
        "blocks": [...],
        "metadata": {...}
      }
    ],
    "total_pages": 2,
    "metadata": {
      "model": "paddle-structure",
      "filename": "document.pdf"
    }
  }
}

PDF merged-mode response (OCRResult, pagination=false)


{
  "code": 200,
  "msg": "success",
  "data": {
    "text": "Page one content...page two content...",
    "markdown": "# Title\n\nPage one content...page two content...",
    "blocks": [
      {"text": "Title", "type": "title", "box": [...], "score": 0.99, "page": 1},
      {"text": "Content", "type": "text", "box": [...], "score": 0.98, "page": 1},
      {"text": "More", "type": "text", "box": [...], "score": 0.97, "page": 2}
    ],
    "metadata": {
      "model": "paddle-structure",
      "filename": "document.pdf",
      "total_pages": 2
    }
  }
}

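Notes on merged mode:

text and markdown merge content across pages intelligently and detect paragraphs that continue over a page break (Chinese text is joined directly; English text is joined with a space)

PP-StructureV3 uses the official concatenate_markdown_pages method, which detects paragraph continuation more precisely

Each block carries a page field (1-based) identifying the page it belongs to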
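
Because the two PDF modes return differently shaped `data` objects, client code often wants one way to walk the blocks. The sketch below shows one option. `iter_blocks` is a hypothetical helper; it assumes that `pages` is ordered by page number and that blocks without a `page` field (image results) belong to page 1.

```python
from typing import Iterator, Tuple


def iter_blocks(data: dict) -> Iterator[Tuple[int, dict]]:
    """Yield (page_number, block) for either result shape."""
    if "pages" in data:
        # PaginatedOCRResult: the page number is the 1-based list position.
        for page_number, page in enumerate(data["pages"], start=1):
            for block in page.get("blocks", []):
                yield page_number, block
    else:
        # OCRResult: merged PDFs carry "page" on each block; images do not.
        for block in data.get("blocks", []):
            yield block.get("page", 1), block
```

The argument is the `data` object of the response envelope, so the same loop works regardless of the `pagination` value used in the request.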

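Response fields:

Field	Type	Description	Supported models
`text`	String	Full recognized text	All models
`markdown`	String	Content in Markdown format	All models
`blocks`	Array	List of text blocks (see below)	All models
`metadata`	Object	Metadata (model name, engine, etc.)	All models

Fields of each blocks entry:

Field	Type	Description	Supported models
`text`	String	Text content	All models
`type`	String	Element type (text/title/table/formula/figure, etc.)	All models
`box`	Array	Four-point coordinates `[[x1,y1], [x2,y2], [x3,y3], [x4,y4]]`	All models
`score`	Float	Confidence (0-1)	All models
`block_id`	Integer	Unique block identifier (starts at 0 on each page)	DeepSeek-OCR, PaddleOCR-VL, PP-StructureV3
`order`	Integer	Reading order (sort by this field to get the correct reading order)	PaddleOCR-VL, PP-StructureV3
`group_id`	Integer	Group ID (in multi-column layouts, blocks in the same group belong to the same column)	PaddleOCR-VL only
`page`	Integer	Page number (1-based; PDF merged mode only)	All models (pagination=false)

Notes:

When processing PDFs, block_id is numbered independently per page (starting at 0), so blocks on different pages may share the same block_id.

RapidOCR does not populate block_id, order, or group_id (the values are null).

PaddleOCR-VL and PP-StructureV3 provide the richest document structure information (block_id and order from both, group_id from PaddleOCR-VL only), which helps with complex layouts.

PP-StructureV3 special cases:

Elements that span columns (such as a figure or chart covering two columns) may have a null order. When sorting, these elements end up at the end of the document, and their position has to be adjusted manually from the box coordinates.

Some elements may have a null block_id (depending on what the official PaddleOCR library returns).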
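
Sorting by `order` while tolerating null values can be sketched as follows. This is an illustration rather than FunOCR's own logic: `reading_order` is a hypothetical helper that places blocks with a null `order` after the ordered ones on the same page and falls back to the top-left corner of `box` among them, which is only a rough heuristic.

```python
from typing import List


def reading_order(blocks: List[dict]) -> List[dict]:
    """Sort blocks by page, then by `order`, with null `order` values last."""
    def key(block: dict):
        order = block.get("order")
        box = block.get("box") or [[0, 0]]
        top = min(point[1] for point in box)
        left = min(point[0] for point in box)
        page = block.get("page") or 0
        # (order is None) sorts False before True, so ordered blocks come first.
        return (page, order is None, order if order is not None else 0, top, left)
    return sorted(blocks, key=key)
```

For RapidOCR results, where every `order` is null, this reduces to a plain top-to-bottom, left-to-right sort.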

Error responses

When an error occurs, every endpoint returns the same JSON format:


{
  "code": 400,
  "msg": "error message",
  "data": null
}
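
On the client side, the envelope can be unwrapped in one place so that callers deal either with `data` or with an exception. A minimal sketch, assuming that success is always signalled by `code` 200 as in the examples in this README; `OCRApiError` and `unwrap` are hypothetical names.

```python
from typing import Any, Optional


class OCRApiError(Exception):
    """Raised when the envelope reports a non-200 code."""

    def __init__(self, code: Optional[int], msg: str, data: Any = None):
        super().__init__(f"{code}: {msg}")
        self.code = code
        self.msg = msg
        self.data = data  # may hold task details for failed async tasks


def unwrap(envelope: dict) -> Any:
    """Return envelope['data'] on success, otherwise raise OCRApiError."""
    code = envelope.get("code")
    if code != 200:
        raise OCRApiError(code, envelope.get("msg", ""), envelope.get("data"))
    return envelope.get("data")
```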

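HTTP status codes

Status code	Scenario	Examples
400	Client error (validation failure, bad parameters)	File too large, unsupported format, invalid parameter
404	Resource not found	Task not found, result file not found
500	Internal server error	OCR processing exception, model inference error
503	Service unavailable	Cannot connect to the backend Service
504	Request timeout	Service request timed out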

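Common error examples

The msg strings in these examples are reproduced as the service returns them.

Model unavailable (400):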


{
  "code": 400,
  "msg": "模型 'invalid-model' 不可用。可用模型: rapidocr, deepseek-ocr, paddleocr-vl, paddle-structure",
  "data": null
}

File validation failed (400):


{
  "code": 400,
  "msg": "图片文件过大: 25000000 bytes (最大 20971520 bytes)",
  "data": null
}


{
  "code": 400,
  "msg": "不支持的图片格式: bmp，支持: jpg, jpeg, png, webp, tiff, tif",
  "data": null
}


{
  "code": 400,
  "msg": "PDF 页数过多: 150 (最大 100)",
  "data": null
}

Feature not supported (400):


{
  "code": 400,
  "msg": "RapidOCR 服务不支持图片提取功能（include_images=True），请使用 paddle-structure、deepseek-ocr 或 paddleocr-vl 服务",
  "data": null
}

Task not found (404):


{
  "code": 404,
  "msg": "任务不存在",
  "data": null
}

Service connection failed (503):


{
  "code": 503,
  "msg": "无法连接到服务 deepseek-ocr",
  "data": null
}

Request timed out (504):


{
  "code": 504,
  "msg": "服务 paddleocr-vl 请求超时",
  "data": null
}

OCR processing failed (500):


{
  "code": 500,
  "msg": "服务器内部错误: CUDA out of memory",
  "data": null
}

Asynchronous task failures

When an asynchronous task (POST /v1/ocr) fails, querying GET /v1/tasks/{task_id} returns an error response:


{
  "code": 500,
  "msg": "服务 deepseek-ocr 请求超时",
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "model": "deepseek-ocr",
    "status": "failed",
    "created_at": "2024-01-23T10:30:00",
    "updated_at": "2024-01-23T10:35:00",
    "filename": "document.pdf"
  }
}

File size limits

File type	Maximum size	Other limits
Image	20 MB	Supported formats: jpg, jpeg, png, webp, tiff, tif
PDF	100 MB	At most 100 pages
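
Checking these limits before uploading avoids a round trip that would end in a 400. The sketch below mirrors the table. `check_upload` is a hypothetical helper; the 20 MB figure is taken as 20971520 bytes (as in the error example above), and the PDF limit is assumed to use the same binary unit.

```python
import os
from typing import List, Optional

IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "webp", "tiff", "tif"}
MAX_IMAGE_BYTES = 20 * 1024 * 1024   # 20971520, matches the error example
MAX_PDF_BYTES = 100 * 1024 * 1024    # assumption: "100 MB" uses the same unit
MAX_PDF_PAGES = 100


def check_upload(filename: str, size_bytes: int,
                 pdf_pages: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    problems = []
    if ext == "pdf":
        if size_bytes > MAX_PDF_BYTES:
            problems.append(f"PDF too large: {size_bytes} bytes")
        if pdf_pages is not None and pdf_pages > MAX_PDF_PAGES:
            problems.append(f"too many pages: {pdf_pages}")
    elif ext in IMAGE_EXTENSIONS:
        if size_bytes > MAX_IMAGE_BYTES:
            problems.append(f"image too large: {size_bytes} bytes")
    else:
        problems.append(f"unsupported format: {ext or '(none)'}")
    return problems
```

The server remains the authority on validation; this check only catches the obvious cases early.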

GET /health - Health check

Returns the health status of the gateway and all services.

GET /models - List models

Returns all available models and their details.

Project structure


FunOCR/
├── gateway/              # Gateway service
├── services/             # OCR microservices
│   ├── rapidocr/         # RapidOCR (CPU)
│   ├── deepseek-ocr/     # DeepSeek-OCR (vLLM/Transformers)
│   ├── paddleocr-vl/     # PaddleOCR-VL (vLLM/Paddle)
│   └── paddle-structure/ # PP-StructureV3 (Paddle)
├── shared/               # Shared code (models.py, utils.py)
├── docker-compose.yml            # Full version
├── docker-compose.cpu.yml        # CPU version
├── docker-compose.gpu-vllm.yml   # GPU vLLM version
├── dev.py                        # Local development tool
└── docs/DEVELOPMENT.md           # Development guide

Configuration

Docker Compose files

File	Services included	Use case
`docker-compose.cpu.yml`	Gateway + RapidOCR	Environments without a GPU
`docker-compose.gpu-vllm.yml`	Gateway + RapidOCR + DeepSeek + PaddleOCR-VL	Single-GPU vLLM inference
`docker-compose.yml`	All services	Full multi-GPU deployment

Multi-GPU assignment

In docker-compose.yml you can assign a different GPU to each service:


# Every service can see all GPUs by default; the DEVICE environment variable selects which one it uses
deepseek-ocr:
  environment:
    - DEVICE=0  # use GPU 0
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all  # all GPUs are visible to the container
            capabilities: [gpu]

paddle-structure:
  environment:
    - DEVICE=1  # use GPU 1
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all  # all GPUs are visible to the container
            capabilities: [gpu]

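Main environment variables

Gateway:

AVAILABLE_SERVICES: list of available services (comma-separated)
DEFAULT_MODEL: default model (default: rapidocr)
REQUEST_TIMEOUT: request timeout in seconds (default: 300)

DeepSeek-OCR / PaddleOCR-VL:

USE_VLLM: whether to use vLLM for inference (default: false)
MODEL_PATH: model path or HuggingFace repository
GPU_MEMORY_UTILIZATION: fraction of GPU memory to use (default: 0.75)

For detailed configuration, see docs/DEVELOPMENT.md.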

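FAQ

How do I switch the inference backend?

DeepSeek-OCR and PaddleOCR-VL support multiple inference backends, each built from its own Dockerfile:


# vLLM version
docker build -f services/deepseek-ocr/Dockerfile.vllm -t deepseek-ocr:vllm .

# Transformers version
docker build -f services/deepseek-ocr/Dockerfile.transformers -t deepseek-ocr:transformers .

Running out of GPU memory?

Lower the GPU_MEMORY_UTILIZATION value
Lower MAX_MODEL_LEN to shorten the maximum sequence length
Run fewer GPU services at the same time

How do I add a new service?

Create a new service directory under services/
Implement the /health, /v1/ocr, and /info endpoints
Return the unified OCRResult data structure
Add the service to the docker-compose configuration
Update the gateway's AVAILABLE_SERVICES environment variable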
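
To make the "unified OCRResult" step concrete, the sketch below assembles a payload with the fields documented in the response section of this README. It is an illustration only: the authoritative definition presumably lives in the project's shared code, `make_block` and `make_ocr_result` are hypothetical helpers, and joining block texts with newlines is an assumption.

```python
from typing import Any, Dict, List, Optional


def make_block(text: str, box: List[List[float]], score: float, type: str = "text",
               block_id: Optional[int] = None, order: Optional[int] = None,
               group_id: Optional[int] = None) -> Dict[str, Any]:
    """One entry of `blocks`; structure fields stay None when the engine has none."""
    return {"text": text, "type": type, "box": box, "score": score,
            "block_id": block_id, "order": order, "group_id": group_id}


def make_ocr_result(blocks: List[Dict[str, Any]], model: str, engine: str,
                    markdown: Optional[str] = None) -> Dict[str, Any]:
    """Assemble the `data` payload a new service would return from /v1/ocr."""
    text = "\n".join(block["text"] for block in blocks)  # assumed joining rule
    return {
        "text": text,
        "markdown": markdown if markdown is not None else text,
        "blocks": blocks,
        "metadata": {"model": model, "engine": engine},
    }
```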

See docs/DEVELOPMENT.md for details.

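Performance evaluation

Concurrency architecture

FunOCR uses a concurrency strategy tuned to each service in order to make full use of GPU resources:

Architecture optimizations

Gateway asynchronous task management
- POST /v1/ocr returns a task_id immediately
- GET /v1/tasks/{task_id} returns the result
- Task queue persisted in SQLite
- Backward compatible: POST /v1/ocr/sync waits synchronously for the result
Service-layer concurrency
- asyncio.to_thread keeps blocking work off the event loop
- Smart batching (DeepSeek-OCR vLLM)
- Multi-GPU load balancing (Gunicorn multi-worker)
Worker/GPU logging
- Every log line is prefixed with [Worker {pid} GPU {id}]
- OCR responses include worker_id, gpu_id, and inference_time
- Makes it easy to verify load balancing and tune performance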
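
The `asyncio.to_thread` point can be illustrated in isolation. In the sketch below, `blocking_inference` is a stand-in that merely sleeps; it is not FunOCR code. Three requests handled this way overlap instead of running back to back, because the event loop stays free while each call runs in a worker thread.

```python
import asyncio
import time
from typing import List, Tuple


def blocking_inference(image_id: int) -> str:
    """Stand-in for a blocking OCR call (here it only sleeps)."""
    time.sleep(0.2)
    return f"text-{image_id}"


async def handle_request(image_id: int) -> str:
    # Run the blocking call in a thread so the event loop can serve other requests.
    return await asyncio.to_thread(blocking_inference, image_id)


async def main() -> Tuple[List[str], float]:
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i) for i in range(3)))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_inference` directly inside the coroutine would instead serialize the three requests.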

Concurrency capabilities by service

Service	Inference backend	Concurrency strategy	Batching	Performance gain	Multi-GPU support
RapidOCR	ONNX Runtime	Async I/O	❌	-	❌ (CPU only)
DeepSeek-OCR (vLLM)	vLLM	Single process + internal batching	✅ Automatic batching	2-4x	✅ Tensor Parallelism
DeepSeek-OCR (Transformers)	PyTorch	Multi-worker + batching queue	✅ Application-level batching	1.1-1.2x	✅ 1 GPU per worker
PaddleOCR-VL (vLLM)	vLLM Server	Multi-worker + asyncio	⚠️ Automatic server-side batching	1.1-1.2x	✅ 1 GPU per worker
PaddleOCR-VL (Paddle)	PaddlePaddle	Multi-worker + asyncio	❌	1.05-1.1x	✅ 1 GPU per worker
PP-StructureV3	PaddlePaddle	Multi-worker + asyncio	❌	1.05-1.1x	✅ 1 GPU per worker

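Performance notes

DeepSeek-OCR (vLLM mode) - best performance 🚀

Architecture: single-process uvicorn + vLLM internal tensor parallelism
Batching: vLLM dynamic batching; 4 requests drop from 8 s to 2.5 s (about 3.2x)
Throughput gain: 200-400% (2-4x)
Use case: high concurrency, multi-GPU inference

Configuration:


# Use all GPUs automatically
DEVICES=auto docker-compose -f docker-compose.gpu-vllm.yml up -d

# Use specific GPUs (e.g. 3 GPUs)
DEVICES=0,1,2 docker-compose -f docker-compose.gpu-vllm.yml up -d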

DeepSeek-OCR (Transformers mode)

Architecture: Gunicorn multi-worker + BatchProcessor
Batching: application-level batching queue (max_batch_size=4, max_wait_ms=50)
Throughput gain: 10-20%
Use case: moderate concurrency, relatively stable inference time

Configuration:


# 3 workers, each bound to 1 GPU
DEVICES=0,1,2 WORKERS=3 docker-compose up deepseek-ocr
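
The README does not show how BatchProcessor works internally. As a rough illustration of application-level batching with the two parameters named above, a generic micro-batching queue could look like this; `MicroBatcher` and `run_batch` are hypothetical names, and the uppercase "inference" in the demo is a placeholder.

```python
import asyncio
from typing import Callable, List, Tuple


class MicroBatcher:
    """Collect requests until max_batch_size is reached or max_wait_ms elapses."""

    def __init__(self, run_batch: Callable[[List[str]], List[str]],
                 max_batch_size: int = 4, max_wait_ms: int = 50):
        self.run_batch = run_batch
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait_ms / 1000
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item: str) -> str:
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def worker(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # Run the whole batch in a thread, then resolve each caller's future.
            outputs = await asyncio.to_thread(self.run_batch, [item for item, _ in batch])
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)


async def demo() -> Tuple[List[str], List[int]]:
    sizes: List[int] = []

    def run_batch(items: List[str]) -> List[str]:
        sizes.append(len(items))
        return [item.upper() for item in items]  # placeholder for batched inference

    batcher = MicroBatcher(run_batch)  # created inside the running event loop
    task = asyncio.create_task(batcher.worker())
    results = await asyncio.gather(*(batcher.submit(f"req{i}") for i in range(6)))
    task.cancel()
    return list(results), sizes
```

With six simultaneous requests and the default settings, the first four form one batch and the remaining two are flushed once the wait window expires. A larger `max_batch_size` trades latency for throughput, which is the trade-off the tuning parameters control.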

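PaddleOCR-VL (vLLM Server mode)

Architecture: Gunicorn multi-worker + background batching in the vLLM server
Batching: automatic on the vLLM server side (cannot be controlled directly)
Throughput gain: 10-20%
Limitation: multi-stage pipeline (layout → unwarp → vl_rec); some stages cannot be batched
Use case: multilingual support (109 languages), lightweight VLM

Other services

RapidOCR: CPU inference, optimized with async I/O
PaddleOCR-VL (Paddle): async execution avoids blocking; 5-10% gain
PP-StructureV3: async execution avoids blocking; 5-10% gain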

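Performance tuning tips

1. Batching parameters (DeepSeek-OCR Transformers)

In services/deepseek-ocr/config.py:


# High-concurrency workloads
MAX_BATCH_SIZE=8    # larger batches
MAX_WAIT_MS=30      # shorter wait

# Low-latency workloads
MAX_BATCH_SIZE=2    # smaller batches
MAX_WAIT_MS=20      # shorter wait

# Balanced (default)
MAX_BATCH_SIZE=4
MAX_WAIT_MS=50

2. Multi-GPU configuration


# Option 1: detect all GPUs automatically
DEVICES=auto

# Option 2: list the GPUs explicitly (recommended)
DEVICES=0,1,2

# Option 3: set the number of workers (overrides the automatic calculation)
DEVICES=0,1,2 WORKERS=3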
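
How these two variables might translate into a worker-to-GPU plan is sketched below. The README does not show the actual logic, so everything here is an assumption: `plan_workers` is a hypothetical function, the default worker count is taken to be one per GPU, and extra workers are assigned round-robin.

```python
from typing import Dict, Optional


def plan_workers(devices: str, workers: Optional[int] = None,
                 visible_gpus: int = 0) -> Dict[int, int]:
    """Map worker index -> GPU id from DEVICES / WORKERS style settings."""
    if devices.strip().lower() == "auto":
        # Assumption: "auto" means every GPU visible to the process.
        gpu_ids = list(range(visible_gpus))
    else:
        gpu_ids = [int(part) for part in devices.split(",") if part.strip()]
    if not gpu_ids:
        raise ValueError("no GPUs selected")
    # Assumption: one worker per GPU unless WORKERS overrides the count.
    count = workers if workers is not None else len(gpu_ids)
    return {worker: gpu_ids[worker % len(gpu_ids)] for worker in range(count)}
```

Under these assumptions, `DEVICES=0,1,2` yields three workers, one per GPU, which matches the "1 GPU per worker" entries in the concurrency table.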

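3. Verify load balancing

Check the Worker/GPU assignment in the logs:


docker-compose logs -f deepseek-ocr | grep "Worker"

# Expected output (the message means "initializing inference engine"):
# [Worker 100 GPU 0] 初始化推理引擎
# [Worker 101 GPU 1] 初始化推理引擎
# [Worker 102 GPU 2] 初始化推理引擎

OCR responses also include the Worker/GPU information: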


{
  "text": "...",
  "metadata": {
    "model": "deepseek-ocr",
    "worker_id": 100,
    "gpu_id": "0",
    "inference_time": 2.345
  }
}
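
Given a batch of such responses, the distribution across workers and GPUs can be tallied in a few lines. A minimal sketch, assuming each response carries the `metadata` fields shown above; `gpu_distribution` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def gpu_distribution(responses: Iterable[dict]) -> Counter:
    """Count responses per (worker_id, gpu_id) pair."""
    counts: Counter = Counter()
    for response in responses:
        meta = response.get("metadata", {})
        counts[(meta.get("worker_id"), meta.get("gpu_id"))] += 1
    return counts
```

Roughly equal counts per pair suggest that requests are being spread evenly; a single dominant pair points to a load-balancing problem.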

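Performance testing

For the detailed implementation of the performance optimizations and the testing method, see OPTIMIZATION_SUMMARY.md.

License

This project is released under the MIT License. See the LICENSE file for details.

Acknowledgements

RapidOCR - lightweight OCR engine
DeepSeek-OCR - high-accuracy multimodal OCR
PaddleOCR - PaddleOCR-VL and PP-StructureV3
vLLM - high-performance LLM inference framework
FastAPI - web framework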
