Use run_paddleocr_with_docker.py to run every step with a single command:
python run_paddleocr_with_docker.py
The script automatically handles all PDF files in the test_pdf/ directory.
To manage the container manually:
# Pull the newly built image
docker pull docker.cnb.cool/ai-models/paddlepaddle/paddleocr-vl-vllm:latest
# Start a new container
docker run -d \
  --name paddleocr-vl \
  -p 8080:8080 \
  --gpus all \
  --shm-size 16g \
  --restart unless-stopped \
  docker.cnb.cool/ai-models/paddlepaddle/paddleocr-vl-vllm:latest
# Follow the logs in real time
docker logs -f paddleocr-vl
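Instead of watching the logs by hand, a client can poll the server until it answers. The following is a minimal sketch, not part of the original scripts. It assumes the container exposes a vLLM-style OpenAI-compatible API on port 8080, so that GET /v1/models returns HTTP 200 once the model is loaded. The helper names probe_models and wait_until_ready are hypothetical.

```python
import time
import urllib.error
import urllib.request


def probe_models(base_url, timeout=5.0):
    """Return True if GET <base_url>/models answers with HTTP 200.

    Assumption: the server behaves like a vLLM OpenAI-compatible endpoint.
    """
    url = f"{base_url.rstrip('/')}/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_until_ready(base_url, probe=probe_models, attempts=60,
                     interval=5.0, sleep=time.sleep):
    """Call probe(base_url) up to `attempts` times, sleeping between tries.

    Returns True as soon as the probe succeeds, False if every attempt fails.
    """
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False


# Example usage (requires the container to be running):
# ready = wait_until_ready("http://127.0.0.1:8080/v1")
# print("server ready" if ready else "server did not become ready in time")
```

The probe and sleep functions are passed in as parameters so the retry loop can be exercised without a running container.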
Try the minimal dependencies first:
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -m pip install paddleocr
python -m pip install -U "paddleocr[doc-parser]"
Fallback dependencies:
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -m pip install -U "paddleocr[doc-parser]"
python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
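After either install path, it can help to confirm that the packages are importable before calling the pipeline. This is a small sketch under the assumption that the relevant top-level import names are paddle, paddleocr, and safetensors. The helper missing_modules is hypothetical and only checks that each module can be located, not that its version is correct.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example usage:
# missing = missing_modules(["paddle", "paddleocr", "safetensors"])
# if missing:
#     print("Not installed:", ", ".join(missing))
```

If the minimal set leaves safetensors missing, that is one signal to try the fallback set.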
paddleocr doc_parser \
  -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png \
  --vl_rec_backend vllm-server \
  --vl_rec_server_url http://127.0.0.1:8080/v1
from paddleocr import PaddleOCRVL

# Point the pipeline at the vLLM server running in the container
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server", vl_rec_server_url="http://127.0.0.1:8080/v1")
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")

# Print each result and save it as JSON and Markdown under output/
for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")
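The same calls can be looped over a directory, along the lines of what the one-step script does with test_pdf/. The sketch below is an illustration only and is not the contents of run_paddleocr_with_docker.py. It assumes that pipeline.predict accepts a local PDF path, and the helpers find_pdfs and process_pdfs as well as the per-file output layout are hypothetical.

```python
from pathlib import Path


def find_pdfs(directory):
    """Return the PDF files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def process_pdfs(pipeline, directory="test_pdf", output_root="output"):
    """Run pipeline.predict on each PDF and save results per file.

    Results for foo.pdf are written under <output_root>/foo/.
    Assumption: `pipeline` offers predict(), and each result offers
    save_to_json() and save_to_markdown(), as in the snippet above.
    """
    processed = []
    for pdf in find_pdfs(directory):
        save_dir = Path(output_root) / pdf.stem
        save_dir.mkdir(parents=True, exist_ok=True)
        for res in pipeline.predict(str(pdf)):
            res.save_to_json(save_path=str(save_dir))
            res.save_to_markdown(save_path=str(save_dir))
        processed.append(pdf.name)
    return processed


# Example usage (requires paddleocr and a running server):
# from paddleocr import PaddleOCRVL
# pipeline = PaddleOCRVL(vl_rec_backend="vllm-server",
#                        vl_rec_server_url="http://127.0.0.1:8080/v1")
# print(process_pdfs(pipeline))
```

Writing each document into its own subdirectory keeps results from different PDFs from overwriting one another.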