Modern OCR web application powered by DeepSeek-OCR with a stunning React frontend and FastAPI backend.

Recent Updates (v2.1.1)
- ✅ Fixed image removal button - now properly clears and allows re-upload
- ✅ Fixed multiple bounding boxes parsing - handles
[[x1,y1,x2,y2], [x1,y1,x2,y2]]format- ✅ Simplified to 4 core working modes for better stability
- ✅ Fixed bounding box coordinate scaling (normalized 0-999 → actual pixels)
- ✅ Fixed HTML rendering (model outputs HTML, not Markdown)
- ✅ Increased file upload limit to 100MB (configurable)
- ✅ Added .env configuration support
Clone and configure:
git clone <repository-url>
cd deepseek_ocr_app
# Copy and customize environment variables
cp .env.example .env
# Edit .env to configure ports, upload limits, etc.
Start the application:
docker compose up --build
The first run will download the model (~5-10GB), which may take some time.
Access the application:
The application can be configured via the .env file:
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
# Frontend Configuration
FRONTEND_PORT=3000
# Model Configuration
MODEL_NAME=deepseek-ai/DeepSeek-OCR
HF_HOME=/models
# Upload Configuration
MAX_UPLOAD_SIZE_MB=100 # Maximum file upload size
# Processing Configuration
BASE_SIZE=1024 # Base processing resolution
IMAGE_SIZE=640 # Tile processing resolution
CROP_MODE=true # Enable dynamic cropping for large images
API_HOST: Backend API host (default: 0.0.0.0)API_PORT: Backend API port (default: 8000)FRONTEND_PORT: Frontend port (default: 3000)MODEL_NAME: HuggingFace model identifierHF_HOME: Model cache directoryMAX_UPLOAD_SIZE_MB: Maximum file upload size in megabytesBASE_SIZE: Base image processing size (affects memory usage)IMAGE_SIZE: Tile size for dynamic croppingCROP_MODE: Enable/disable dynamic image croppingdeepseek-ocr/ ├── backend/ # FastAPI backend │ ├── main.py │ ├── requirements.txt │ └── Dockerfile ├── frontend/ # React frontend │ ├── src/ │ │ ├── components/ │ │ ├── App.jsx │ │ └── main.jsx │ ├── package.json │ ├── nginx.conf │ └── Dockerfile ├── models/ # Model cache └── docker-compose.yml
Docker compose cycle to test:
docker compose down docker compose up --build
Docker & Docker Compose (latest version recommended)
NVIDIA Driver - Installing NVIDIA Drivers on Ubuntu (Blackwell/RTX 5090)
Note: Getting NVIDIA drivers working on Blackwell GPUs can be a pain! Here's what worked:
The key requirements for RTX 5090 on Ubuntu 24.04:
Step-by-Step Instructions:
Install NVIDIA Open Driver (580 or newer)
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt remove --purge nvidia*
sudo nvidia-installer --uninstall # If you have it
sudo apt autoremove
sudo apt install nvidia-driver-580-open
Upgrade Linux Kernel to 6.11+ (for Ubuntu 24.04 LTS)
sudo apt install --install-recommends linux-generic-hwe-24.04 linux-headers-generic-hwe-24.04 sudo update-initramfs -u sudo apt autoremove
Reboot
sudo reboot
Enable Resize Bar in UEFI/BIOS
Verify Installation
nvidia-smi
You should see your RTX 5090 listed!
💡 Why open drivers? I dunno, but the open drivers have better support for Blackwell GPUs. Without Resize Bar enabled, you'll get a black screen even with correct drivers!
Credit: Solution adapted from this Reddit thread.
NVIDIA Container Toolkit (required for GPU access in Docker)
[[x1,y1,x2,y2], [x1,y1,x2,y2]] failedast.literal_evalactual_coord = (normalized_coord / 999) * image_dimensiondangerouslySetInnerHTMLclient_max_body_size to 100MB (configurable via .env)The DeepSeek-OCR model uses a normalized coordinate system (0-999) for bounding boxes:
pixel_coord = (model_coord / 999) * actual_dimensionFor large images, the model uses dynamic cropping:
process/image_process.py for implementation details<|ref|>label<|/ref|><|det|>[[coords]]<|/det|> tagsParameters:
image (file, required) - Image file to process (up to 100MB)mode (string) - OCR mode: plain_ocr | describe | find_ref | freeformprompt (string) - Custom prompt for freeform modegrounding (bool) - Enable bounding boxes (auto-enabled for find_ref)find_term (string) - Term to locate in find_ref mode (supports multiple matches)base_size (int) - Base processing size (default: 1024)image_size (int) - Tile size for cropping (default: 640)crop_mode (bool) - Enable dynamic cropping (default: true)include_caption (bool) - Add image description (default: false)Response:
{
"success": true,
"text": "Extracted text or HTML output...",
"boxes": [{"label": "field", "box": [x1, y1, x2, y2]}],
"image_dims": {"w": 1920, "h": 1080},
"metadata": {
"mode": "layout_map",
"grounding": true,
"base_size": 1024,
"image_size": 640,
"crop_mode": true
}
}
Note on Bounding Boxes:
[[x1,y1,x2,y2], [x1,y1,x2,y2], ...]Here are some example images showcasing different OCR capabilities:



nvidia-smi
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
sudo lsof -i :3000 sudo lsof -i :8000
cd frontend
rm -rf node_modules package-lock.json
docker-compose build frontend
This project uses the DeepSeek-OCR model. Refer to the model's license terms.