
DeepSeek-OCR vLLM Docker Image

This Docker image packages the DeepSeek-OCR model with vLLM for serving OCR requests via an OpenAI-compatible API.

Prerequisites

Before building the image, clone the model repository locally:

git clone https://cnb.cool/ai-models/deepseek-ai/DeepSeek-OCR model

Build the Image

docker build -t deepseek-ocr/deepseek-ocr:latest .

Run the Container

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus all \
  --ipc=host \
  deepseek-ocr/deepseek-ocr:latest

Run the prebuilt image directly:

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus all \
  --ipc=host \
  docker.cnb.cool/ai-models/deepseek-ai/deepseek-ocr-vllm:latest

Limit GPU memory:

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus '"device=0,memory=10G"' \
  --ipc=host \
  docker.cnb.cool/ai-models/deepseek-ai/deepseek-ocr-vllm:latest
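The server needs some time to load the model after the container starts. A minimal readiness probe, sketched here with Python's standard library, can poll the /v1/models endpoint before any requests are sent; the URL, timeout, and function name are illustrative assumptions, not part of the image.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str, timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Poll the /v1/models endpoint until it answers 200 or the deadline passes."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet; retry shortly
        time.sleep(interval_s)
    return False


# Example: block until the container is serving, or give up after 5 minutes.
# ready = wait_for_server("http://localhost:8080")
```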

Usage Example

List the available models:

curl http://localhost:8080/v1/models
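The endpoint returns the standard OpenAI list shape. A quick sketch of extracting the served model ids from the response body; the sample payload below is illustrative, and the actual id depends on how the model was loaded.

```python
import json


def model_ids(models_json: str) -> list:
    """Extract the id of each served model from a /v1/models response body."""
    payload = json.loads(models_json)
    return [entry["id"] for entry in payload.get("data", [])]


# Illustrative response body in the standard OpenAI list shape.
sample = '{"object": "list", "data": [{"id": "deepseek-ocr", "object": "model"}]}'
print(model_ids(sample))  # ['deepseek-ocr']
```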

Run OCR:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "deepseek-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
            }
          },
          { "type": "text", "text": "Free OCR." }
        ]
      }
    ],
    "max_tokens": 2048,
    "temperature": 0.0,
    "skip_special_tokens": false,
    "extra_body": {
      "vllm_xargs": {
        "ngram_size": 30,
        "window_size": 90,
        "whitelist_token_ids": [128821, 128822]
      }
    }
  }'

Interpret the image:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "deepseek-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
            }
          },
          { "type": "text", "text": "\n 这是一张" }
        ]
      }
    ],
    "max_tokens": 2048,
    "temperature": 0.0,
    "skip_special_tokens": false,
    "extra_body": {
      "vllm_xargs": {
        "ngram_size": 30,
        "window_size": 90,
        "whitelist_token_ids": [128821, 128822]
      }
    }
  }'
The same OCR request via the OpenAI Python client:

import time

from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8080/v1",
    timeout=3600,
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                },
            },
            {"type": "text", "text": "Free OCR."},
        ],
    }
]

start = time.time()
response = client.chat.completions.create(
    model="/workspace/model",
    messages=messages,
    max_tokens=2048,
    temperature=0.0,
    extra_body={
        "skip_special_tokens": False,
        # args used to control custom logits processor
        "vllm_xargs": {
            "ngram_size": 30,
            "window_size": 90,
            # whitelist: <td>, </td>
            "whitelist_token_ids": [128821, 128822],
        },
    },
)
print(f"Response costs: {time.time() - start:.2f}s")
print(f"Generated text: {response.choices[0].message.content}")

Alternative Usage with Grounding

For document-to-markdown conversion:

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "your_image_url"}},
            {"type": "text", "text": "<|grounding|>Convert the document to markdown."},
        ],
    }
]
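The message shape is the same for every prompt, so it can be factored into a small builder; a sketch, where the helper name is hypothetical and only the message structure comes from the examples above.

```python
def build_vision_messages(image_url: str, prompt: str) -> list:
    """Assemble the single-turn multimodal message list the chat endpoint expects."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Same grounding request as above, built via the helper.
messages = build_vision_messages(
    "your_image_url",
    "<|grounding|>Convert the document to markdown.",
)
```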

Custom Parameters

You can pass additional vLLM parameters when running the container:

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus all \
  --ipc=host \
  deepseek-ocr/deepseek-ocr:latest \
  --max-model-len 8192 \
  --max-num-batched-tokens 4096

Model Sizes

DeepSeek-OCR supports different processing modes:

  • Tiny: base_size = 512, image_size = 512
  • Small: base_size = 640, image_size = 640
  • Base: base_size = 1024, image_size = 1024
  • Large: base_size = 1280, image_size = 1280
  • Gundam: base_size = 1024, image_size = 640, crop_mode = True
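The modes above can be captured in a small lookup table. A sketch: the names and sizes mirror the list, but crop_mode is only stated for Gundam, so crop_mode = False for the other modes is an assumption, and the helper name is hypothetical.

```python
# Processing modes from the list above: base_size, image_size, crop_mode.
MODES = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}


def mode_config(name: str) -> dict:
    """Look up a processing mode by name, case-insensitively."""
    try:
        return MODES[name.lower()]
    except KeyError:
        raise ValueError(f"unknown mode {name!r}; choose from {sorted(MODES)}") from None


print(mode_config("Gundam"))  # {'base_size': 1024, 'image_size': 640, 'crop_mode': True}
```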

References