This Docker image packages the DeepSeek-OCR model with vLLM, serving OCR requests through an OpenAI-compatible API.
Before building the image, clone the model repository locally:
git clone https://cnb.cool/ai-models/deepseek-ai/DeepSeek-OCR model
docker build -t deepseek-ocr/deepseek-ocr:latest .
docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus all \
  --ipc=host \
  deepseek-ocr/deepseek-ocr:latest
Alternatively, run the prebuilt image from the registry directly:

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus all \
  --ipc=host \
  docker.cnb.cool/ai-models/deepseek-ai/deepseek-ocr-vllm:latest
Limit GPU memory (Docker's --gpus flag can pin devices but cannot cap GPU memory, so select a device and pass vLLM's --gpu-memory-utilization fraction to the container instead):

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus '"device=0"' \
  --ipc=host \
  docker.cnb.cool/ai-models/deepseek-ai/deepseek-ocr-vllm:latest \
  --gpu-memory-utilization 0.5
List the available models:
curl http://localhost:8080/v1/models
Run OCR:
Note that in raw JSON requests, vLLM's extra sampling parameters (skip_special_tokens, vllm_xargs) go at the top level of the request body; extra_body is an OpenAI SDK concept, not part of the wire format.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "deepseek-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
            }
          },
          {
            "type": "text",
            "text": "Free OCR."
          }
        ]
      }
    ],
    "max_tokens": 2048,
    "temperature": 0.0,
    "skip_special_tokens": false,
    "vllm_xargs": {
      "ngram_size": 30,
      "window_size": 90,
      "whitelist_token_ids": [128821, 128822]
    }
  }'
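To OCR a local file rather than a remote URL, the image can be inlined as a base64 data URL in the same image_url field. A minimal sketch, assuming the endpoint accepts data: URLs; image_to_data_url is a helper name introduced here, not part of any API:

```python
import base64

def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    # Encode raw image bytes as a data: URL usable in the image_url field.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# In practice: data_url = image_to_data_url(open("receipt.png", "rb").read())
data_url = image_to_data_url(b"\x89PNG\r\n")  # placeholder bytes for illustration
print(data_url)  # prints data:image/png;base64,iVBORw0K
```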
Run image interpretation:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "deepseek-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
            }
          },
          {
            "type": "text",
            "text": "\n 这是一张"
          }
        ]
      }
    ],
    "max_tokens": 2048,
    "temperature": 0.0,
    "skip_special_tokens": false,
    "vllm_xargs": {
      "ngram_size": 30,
      "window_size": 90,
      "whitelist_token_ids": [128821, 128822]
    }
  }'
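Either call returns standard OpenAI chat-completion JSON, with the generated text at choices[0].message.content. A sketch of extracting it, using an illustrative response body trimmed to the fields read below:

```python
import json

# Illustrative response body (trimmed); real responses also carry id, usage, etc.
raw = '{"choices": [{"message": {"role": "assistant", "content": "TOTAL 42.00"}}]}'
resp = json.loads(raw)
text = resp["choices"][0]["message"]["content"]
print(text)  # prints TOTAL 42.00
```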
import time

from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8080/v1",
    timeout=3600,
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                },
            },
            {
                "type": "text",
                "text": "Free OCR.",
            },
        ],
    }
]

start = time.time()
response = client.chat.completions.create(
    model="/workspace/model",  # use the model id returned by /v1/models
    messages=messages,
    max_tokens=2048,
    temperature=0.0,
    extra_body={
        "skip_special_tokens": False,
        # args used to control the custom logits processor
        "vllm_xargs": {
            "ngram_size": 30,
            "window_size": 90,
            # whitelist: <td>, </td>
            "whitelist_token_ids": [128821, 128822],
        },
    },
)
print(f"Response took {time.time() - start:.2f}s")
print(f"Generated text: {response.choices[0].message.content}")
For document-to-markdown conversion:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "your_image_url"
                },
            },
            {
                "type": "text",
                "text": "<|grounding|>Convert the document to markdown.",
            },
        ],
    }
]
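This single-turn message shape recurs for every prompt, so it can be produced by a small helper. A sketch; build_messages is a name introduced here, not part of any API:

```python
def build_messages(image_url: str, prompt: str) -> list:
    # Single-turn multimodal message: one image followed by one text prompt.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]

messages = build_messages("your_image_url", "<|grounding|>Convert the document to markdown.")
```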
You can pass additional vLLM parameters when running the container:
docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus all \
  --ipc=host \
  deepseek-ocr/deepseek-ocr:latest \
  --max-model-len 8192 \
  --max-num-batched-tokens 4096
DeepSeek-OCR supports different processing modes: