LinguaSpark - Translation Service


A lightweight multilingual translation service built with Rust and the Bergamot translation engine, compatible with multiple translation frontend APIs.

简体中文

Project Background

This project began when I discovered the MTranServer repository, which uses Firefox Translations Models for machine translation and is compatible with APIs like Immersive Translate and Kiss Translator, but found that it had not been open-sourced yet.

While searching for similar projects, I found Mozilla's translation-service, which works but hasn't been updated for a year and isn't compatible with the Immersive Translate or Kiss Translator APIs. Since that project is written in C++, which I'm not very familiar with, I rewrote it in Rust.

Features

Tech Stack

Deployment

Docker is the only recommended deployment method for this service.

Option 1: Using pre-built image (with your own translation models)

# Create models directory
mkdir -p models
# Download your models here
# Pull and start container
docker run -d --name translation-service \
  -p 3000:3000 \
  -v "$(pwd)/models:/app/models" \
  ghcr.io/linguaspark/server:main

Option 2: Using pre-built image with English-Chinese model (China mirror)

docker run -d --name translation-service \
  -p 3000:3000 \
  docker.cnb.cool/aalivexy/translation-service:latest

Note: The image with the bundled English-Chinese model is about 70 MiB. Each worker uses roughly 300 MiB or more of memory, and translation latency is low.

Docker Compose Deployment

Create a compose.yaml file:

services:
  translation-service:
    image: ghcr.io/linguaspark/server:main
    ports:
      - "3000:3000"
    volumes:
      - ./models:/app/models
    environment:
      API_KEY: "your_api_key" # Optional, leave empty to disable API key protection
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "/bin/sh", "-c", "echo -e 'GET /health HTTP/1.1\r\nHost: localhost:3000\r\n\r\n' | timeout 5 bash -c 'cat > /dev/tcp/localhost/3000' && echo 'Health check passed'"]
      interval: 30s
      timeout: 10s
      retries: 3

Start the service:

docker compose up -d

Custom Image for Specific Language Pairs

If you need to create a custom image with specific language pairs, use this Dockerfile template:

FROM ghcr.io/linguaspark/server:main
COPY ./your-models-directory /app/models
ENV MODELS_DIR=/app/models
ENV NUM_WORKERS=1
ENV IP=0.0.0.0
ENV PORT=3000
ENV RUST_LOG=info
EXPOSE 3000
ENTRYPOINT ["/app/server"]

Translation Models

Getting Models

  1. Download pre-trained models from Firefox Translations Models
  2. Place them in the models directory with the following structure:
models/
├── enzh/                   # Language pair directory name format: "[source language code][target language code]"
│   ├── model.intgemm8.bin  # Translation model
│   ├── model.s2t.bin       # Shortlist file
│   ├── srcvocab.spm        # Source language vocabulary
│   └── trgvocab.spm        # Target language vocabulary
└── zhen/                   # Another language pair
    └── ...

Language Pair Support

The translation service will automatically scan all language pair directories under the models directory and load them. Directory names should follow the [source language][target language] format using ISO 639-1 language codes.
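
The directory convention above can be checked before starting the container. The following is a minimal sketch of such a pre-flight check, not the service's own scanning code: the function name `find_language_pairs` is hypothetical, the required file names are copied from the layout shown above, and it assumes every language code is exactly two letters.

```python
from pathlib import Path

# File names copied from the directory layout in this README.
REQUIRED_FILES = (
    "model.intgemm8.bin",
    "model.s2t.bin",
    "srcvocab.spm",
    "trgvocab.spm",
)


def find_language_pairs(models_dir):
    """Return {(source, target): [missing file names]} per pair directory.

    Assumption: two-letter ISO 639-1 codes, so a pair directory name has
    exactly four letters (e.g. "enzh" -> ("en", "zh")). Anything else is
    skipped rather than reported.
    """
    pairs = {}
    for entry in sorted(Path(models_dir).iterdir()):
        name = entry.name
        if not entry.is_dir() or len(name) != 4 or not name.isalpha():
            continue
        missing = [f for f in REQUIRED_FILES if not (entry / f).is_file()]
        pairs[(name[:2].lower(), name[2:].lower())] = missing
    return pairs
```

Calling `find_language_pairs("models")` and looking for non-empty lists in the result is one way to spot an incomplete download before the service tries to load it.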

Environment Variables

| Variable Name | Description | Default Value |
| --- | --- | --- |
| MODELS_DIR | Path to models directory | /app/models |
| NUM_WORKERS | Number of translation worker threads | 1 |
| IP | IP address for the service to listen on | 127.0.0.1 |
| PORT | Port for the service to listen on | 3000 |
| API_KEY | API key (leave empty to disable) | "" |
| RUST_LOG | Log level | info |
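
For deployment scripts that need to mirror these settings, the defaults can be captured in a small loader. This is an illustrative sketch only: `ServiceConfig` and `load_config` are hypothetical names, and the Rust server's own parsing may differ (for example in how it treats invalid numbers).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    models_dir: str
    num_workers: int
    ip: str
    port: int
    api_key: str
    rust_log: str


def load_config(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServiceConfig(
        models_dir=env.get("MODELS_DIR", "/app/models"),
        num_workers=int(env.get("NUM_WORKERS", "1")),
        ip=env.get("IP", "127.0.0.1"),
        port=int(env.get("PORT", "3000")),
        api_key=env.get("API_KEY", ""),  # empty string means no API key protection
        rust_log=env.get("RUST_LOG", "info"),
    )
```

Passing a plain dictionary, as in `load_config({"IP": "0.0.0.0"})`, makes it easy to preview the effective settings without touching the real environment.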

API Endpoints

Native API

Translate

POST /translate

Request body:

{
  "text": "Hello world",
  "from": "en", // Optional, omit to auto-detect
  "to": "zh"
}

Response:

{ "text": "你好世界", "from": "en", "to": "zh" }
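
A client call to this endpoint might look like the sketch below, which uses only the Python standard library. The base URL `http://localhost:3000`, the `Content-Type: application/json` header, and the helper names are assumptions for illustration; the request and response fields come from the example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: service published on local port 3000


def build_translate_request(text, to, source=None, base_url=BASE_URL):
    """Build (but do not send) a POST /translate request."""
    payload = {"text": text, "to": to}
    if source is not None:
        payload["from"] = source  # omitted entirely to request auto-detection
    return urllib.request.Request(
        f"{base_url}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be expected
        method="POST",
    )


def translate(text, to, source=None, base_url=BASE_URL):
    """Send the request and return the translated text from the response."""
    request = build_translate_request(text, to, source, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["text"]
```

Separating request construction from sending keeps the payload logic easy to inspect without a running server; `translate("Hello world", "zh")` performs the actual call.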

Language Detection

POST /detect

Request body:

{ "text": "Hello world" }

Response:

{ "language": "en" }

Compatible APIs

Immersive Translate API

POST /imme

Request body:

{
  "source_lang": "auto", // Optional, omit to auto-detect
  "target_lang": "zh",
  "text_list": ["Hello world", "How are you?"]
}

Response:

{
  "translations": [
    { "detected_source_lang": "en", "text": "你好世界" },
    { "detected_source_lang": "en", "text": "你好吗?" }
  ]
}
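
Because this endpoint is batched, a client usually needs to line each translation up with its input. The sketch below builds the request body and pairs the results; the helper names are hypothetical, and it assumes the service returns translations in the same order as `text_list`, which the example suggests but the README does not state.

```python
def build_imme_payload(texts, target_lang, source_lang=None):
    """Build the JSON body for POST /imme."""
    payload = {"target_lang": target_lang, "text_list": list(texts)}
    if source_lang is not None:
        payload["source_lang"] = source_lang  # omit to let the service detect it
    return payload


def pair_translations(texts, response):
    """Return (original, detected_language, translation) tuples.

    Assumption: response["translations"] is in the same order as the
    submitted text_list and has the same length.
    """
    translations = response["translations"]
    if len(translations) != len(texts):
        raise ValueError(
            f"expected {len(texts)} translations, got {len(translations)}"
        )
    return [
        (original, item["detected_source_lang"], item["text"])
        for original, item in zip(texts, translations)
    ]
```

The length check guards against silently misaligned results if the ordering assumption turns out not to hold for a partial response.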

Kiss Translator API

POST /kiss

Request body:

{
  "text": "Hello world",
  "from": "en", // Optional, omit to auto-detect
  "to": "zh"
}

Response:

{ "text": "你好世界", "from": "en", "to": "zh" }

HCFY API

POST /hcfy

Request body:

{
  "text": "Hello world",
  "source": "英语", // Optional, omit to auto-detect
  "destination": ["中文(简体)"]
}

Response:

{
  "text": "Hello world",
  "from": "英语",
  "to": "中文(简体)",
  "result": ["你好世界"]
}

DeepLX API

POST /deeplx

Request body:

{ "text": "Hello world", "source_lang": "EN", "target_lang": "ZH" }

Response:

{
  "code": 200,
  "id": 1744646400,
  "data": "你好世界",
  "alternatives": [],
  "source_lang": "EN",
  "target_lang": "ZH",
  "method": "Free"
}
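
The DeepLX-style response wraps the translation in an envelope with a `code` field. A minimal sketch of handling it follows; the helper names and `DeepLXError` are hypothetical, uppercasing the language codes simply follows the example above, and treating any `code` other than 200 as a failure is an assumption rather than documented behaviour.

```python
class DeepLXError(Exception):
    """Raised when the response envelope does not report success."""


def build_deeplx_payload(text, source_lang, target_lang):
    """Build the JSON body for POST /deeplx, with codes uppercased as in the example."""
    return {
        "text": text,
        "source_lang": source_lang.upper(),
        "target_lang": target_lang.upper(),
    }


def extract_deeplx_text(response):
    """Return the translated text, or raise if the envelope code is not 200."""
    code = response.get("code")
    if code != 200:  # assumption: only 200 means success
        raise DeepLXError(f"unexpected response code: {code!r}")
    return response["data"]
```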

Health Check

GET /health

Response:

{ "status": "ok" }
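
A startup script can poll this endpoint before sending traffic. The sketch below is one way to do that with the standard library; the base URL, the retry counts, and the function names are assumptions, and the probe is injectable so the retry logic can be exercised without a running server.

```python
import json
import time
import urllib.request


def is_healthy(base_url="http://localhost:3000", timeout=5.0):
    """Return True if GET /health answers with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP errors from urllib;
        # ValueError covers a body that is not valid JSON.
        return False
    return isinstance(body, dict) and body.get("status") == "ok"


def wait_until_healthy(probe=is_healthy, attempts=3, delay=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_healthy(attempts=10, delay=2.0)` waits up to roughly 20 seconds plus request time for the container to come up.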

Authentication

If the API_KEY environment variable is set, all API requests must provide authentication credentials using one of the following methods:

  1. Authorization header: Authorization: Bearer your_api_key
  2. Query parameter: ?token=your_api_key
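
Both methods can be applied on the client side with a couple of small helpers. This is a sketch with hypothetical function names; the header format and the `token` parameter name are taken from the list above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def auth_header(api_key):
    """Method 1: headers to merge into a request."""
    return {"Authorization": f"Bearer {api_key}"}


def with_token(url, api_key):
    """Method 2: return `url` with a token query parameter appended.

    Existing query parameters are preserved, and the key is URL-encoded.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The header form is generally preferable where the client allows it, since query strings tend to end up in access logs and browser history.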

License

This project is open-sourced under the AGPL-3.0 license.

Acknowledgements
