GitLab AI Gateway is a standalone service that gives access to AI features to all users of GitLab, no matter which instance they are using: self-managed, GitLab Dedicated, or GitLab.com.
See API.
You'll need:
- docker compose >= 1.28
- gcloud CLI
- libsqlite3-dev or sqlite-devel (depending on your platform); install this before installing Python so it can compile against these libraries.
- mise (recommended) or asdf; mise is recommended over asdf, see instructions here.
- A Google Cloud project with access to the Vertex AI API; set it up and authenticate to it locally by following these instructions.
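For example, once the project exists, one way to enable the Vertex AI API from the command line (the project ID below is a placeholder):

gcloud services enable aiplatform.googleapis.com --project=<your-gcp-project-id>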
See test doc.
This project uses the following linting tools:
To lint the entire project, you can use the following command:
make lint
We are incrementally rolling out the mypy static type checker to the project
(issue).
To show outstanding mypy warnings, you can use the following command:
make check-mypy TODO=true
To fix linting errors, you can use the following command:
make format
The format command only addresses black and isort issues.
There is an internal recording for GitLab members that provides an overview of this project.
We use Lefthook to lint code and docs
before committing. This repository comes with a Lefthook configuration
(lefthook.yml), but it must be installed.
Install Lefthook managed Git hooks:
lefthook install
Test Lefthook is working by running the Lefthook pre-commit Git hook:
lefthook run pre-commit
This should return the Lefthook version and the list of executable commands with output.
To disable Lefthook temporarily, you can set the LEFTHOOK environment variable
to 0. For instance:
LEFTHOOK=0 git commit ...
To run the pre-commit Git hook, run:
lefthook run pre-commit
This project is built with the following frameworks:
This repository follows The Clean Architecture paradigm, which defines the layers present in the system and their relations to each other; refer to the linked article for more details.
For the Code Suggestions feature, most of the code is hosted at /ai_gateway.
In that directory, the following artifacts can be of interest:
- app.py - the main entry point for the web application.
- code_suggestions/processing/base.py - contains the base classes for ModelEngine.
- code_suggestions/processing/completions.py and code_suggestions/processing/generations.py - contain the ModelEngineCompletions and ModelEngineGenerations classes, respectively.
- api/v2/endpoints/code.py - houses the implementation of the main production Code Suggestions API.
- api/v2/experimental/code.py - implements experimental endpoints that route requests to fixed external models for experimentation and testing.

Middlewares are hosted at ai_gateway/api/middleware.py and interact with the context global variable that represents the API request.
Clone the project and change to the project directory.
Depending on the version manager you are using, run mise install or asdf install.
Init shell: poetry shell.
Install dependencies: poetry install.
Copy the example.env file to .env: cp example.env .env
Update the .env file in the root folder with the following variables:
ANTHROPIC_API_KEY=<API_KEY>
You can enable hot reload by setting the AIGW_FASTAPI__RELOAD environment variable to true in the .env file.
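For example, add the following line to .env:

AIGW_FASTAPI__RELOAD=true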
Ensure you're authenticated with the gcloud CLI by running gcloud auth application-default login.
Start the model-gateway server locally: poetry run ai_gateway.
Open http://localhost:5052/docs in your browser and run any requests to the model.
You might encounter a known symlink failure when installing poetry during mise install.
The error may look something like:
Error output:
dyld[87914]: Library not loaded: @executable_path/../lib/libpython3.10.dylib
  Referenced from: <4C4C4415-5555-3144-A171-523C428CAE71> /Users/yourusername/Code/ai-assist/.venv/bin/python
  Reason: tried: '/Users/yourusername/Code/ai-assist/.venv/lib/libpython3.10.dylib' (no such file)
To fix the issue, locate libpython3.10.dylib on your system. Once you have located the file, use the ln -s command to create a symbolic link at the location where poetry expects the library, pointing to where it is actually located.
Example command:
ln -s /Users/yourusername/.local/share/mise/installs/python/3.10.14/lib/libpython3.10.dylib /Users/yourusername/Code/ai-assist/.venv/lib/libpython3.10.dylib
Next, try installing poetry again.
If you do not require real models to run and evaluate the input data, you can mock the model responses
by setting the environment variable AIGW_MOCK_MODEL_RESPONSES=true.
The models will start echoing the given prompts, while allowing you to run a fully functional AI gateway.
This can be useful for testing middleware, request/response interface contracts, logging, and other use cases that do not require an AI model to execute.
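To enable mocking, set this in .env:

AIGW_MOCK_MODEL_RESPONSES=true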
Agentic Chat can be mocked by setting the environment variables AIGW_USE_AGENTIC_MOCK=true and AIGW_MOCK_MODEL_RESPONSES=true. You can specify a sequence of responses to simulate a multi-step flow.
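For example, to mock Agentic Chat, set both variables in .env:

AIGW_USE_AGENTIC_MOCK=true
AIGW_MOCK_MODEL_RESPONSES=true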
See the documentation for details.
The AI Gateway workflow includes additional pre- and post-processing steps. By default, the log level is INFO and
the application writes logs to stdout. If you want to log data between the different steps to a file for development
purposes, update the .env file by setting the following variables:
AIGW_LOGGING__LEVEL=debug
AIGW_LOGGING__TO_FILE=../modelgateway_debug.log
- poetry shell or poetry install should create the virtualenv environment.
- To activate it manually, run . ./.venv/bin/activate.
- To deactivate it, run deactivate.
- To list the virtualenvs Poetry knows about, run poetry env list.
- To remove a virtualenv, run poetry env remove [name of virtualenv].
- To sync your environment with the lock file, run poetry install --sync.
If you're experiencing unexpected package conflicts, import errors, or your development environment has accumulated extra packages
over time, the --sync flag ensures your environment exactly matches the project's lock file. This command installs missing
dependencies and removes any extraneous packages that aren't defined in poetry.lock, effectively resetting your environment to a clean
state.
This is particularly useful when switching between branches with different dependencies, after removing packages from
pyproject.toml, or when your local environment has diverged from the project's intended state.
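For example, to reset an out-of-sync environment (the virtualenv name comes from poetry env list and will vary):

poetry env list
poetry env remove [name of virtualenv]
poetry install --sync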
Make sure you have credentials for a Google Cloud project (with the Vertex API enabled) located at ~/.config/gcloud/application_default_credentials.json.
This should happen automatically when you run gcloud auth application-default login. If for any reason this JSON file is at a
different path, you will need to override the volumes configuration by creating or updating a docker-compose.override.yaml file.
You can either run make develop-local or docker-compose -f docker-compose.dev.yaml up --build --remove-orphans.
If you need to change configuration for a Docker Compose service, you can add it to docker-compose.override.yaml.
Any changes made to services in this file will be merged into the default settings.
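As a minimal sketch, a docker-compose.override.yaml that mounts credentials from a non-default path could look like the following; the service name ai-gateway and both paths are assumptions here, so check docker-compose.dev.yaml for the actual values:

services:
  ai-gateway:
    volumes:
      # host path (left) is wherever your credentials JSON actually lives;
      # container path (right) must match what the service expects
      - /custom/path/application_default_credentials.json:/root/.config/gcloud/application_default_credentials.json:ro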
Next, open the VS Code extension project and run the development version of the GitLab Workflow extension locally. See Configuring Development Environment for more information.
In the VS Code extension code, set the MODEL_GATEWAY_AI_ASSISTED_CODE_SUGGESTIONS_API_URL constant to http://localhost:5000/completions.
Since the feature is only for SaaS, you need to run GDK in SaaS mode:
export GITLAB_SIMULATE_SAAS=1
gdk restart
Then go to /admin/application_settings/general, expand Account and limit, and enable Allow use of licensed EE features.
You also need to make sure that the group you are allowing is on the Ultimate tier, as this is an Ultimate-only feature:
go to /admin/groups, select Edit on the group you are using, and set Plan to Ultimate.
See authentication and authorization doc.
See internal events doc for more information on how to add internal events and test internal event collection with Snowplow locally.
The diagram above shows the main components.
The Client has the following functions:
- inlineCompletions

We support the following clients:
AI Gateway is continuously deployed to Runway.
This deployment is currently available at https://ai-gateway.runway.gitlab.net.
Note, however, that clients should not connect to this host directly; they should use cloud.gitlab.com/ai instead,
which is managed by Cloudflare and is the entry point GitLab instances use.
When an MR gets merged, CI will build a new Docker image, and trigger a Runway downstream pipeline that will deploy this image to staging, and then production. Downstream pipelines run against the deployment project.
The service overview dashboard is available at https://dashboards.gitlab.net/d/ai-gateway-main/ai-gateway-overview.
Note that while the Runway pods are running in the gitlab-runway-production GCP project, all Vertex API calls target the gitlab-ai-framework-prod (and -stage, -dev) GCP project for isolation purposes. This project is managed through Terraform. Monitoring for those calls is provided through stackdriver-exporter.
Duo Workflow Service is continuously deployed to Runway.
This deployment is currently available at https://duo-workflow-svc.runway.gitlab.net.
When an MR gets merged, CI will build a new Docker image, and trigger a Runway downstream pipeline that will deploy this image to staging, and then production. Downstream pipelines run against the deployment project.
The service overview dashboard is available here.
Currently, the service doesn't have a dependency on the gitlab-runway-production GCP project. In the future, it could use Vertex AI resources through it.
For the staging-ref environment, the deployment is powered by Runway and is named ai-gateway-custom.
The deployment for staging-ref differs from other production environments in both its nature and configuration. This deployment specifically powers Code Suggestions and Duo Chat when using Custom Models, and may use a different set of secret variables compared to other production deployments. The Group Custom Models team (#g_custom_models on Slack) is responsible for managing changes to deployments in this environment and maintains ownership of it.
Important MRs:
- ai-gateway-custom
- ai-gateway-custom's URL as the AI Gateway endpoint for staging-ref
- ai-gateway-custom to Runway Provisioner

For more information and assistance, please check out:

- #f_runway in Slack.

See release doc.
Access to the AI Gateway is subject to rate limiting, as defined in https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/cloud_connector/README.md#rate-limiting.
By default, the AI Gateway runs a single process to handle HTTP requests. To increase throughput, you may want to spawn multiple workers. To do this, there are a number of environment variables that need to be set:
- WEB_CONCURRENCY: The number of worker processes to run (1 is the default).
- PROMETHEUS_MULTIPROC_DIR: Needed to support scraping of Prometheus metrics from a single endpoint. This directory holds the metrics from the processes and should be cleared before the application starts.
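For example (values are illustrative; clear the metrics directory before starting the application):

export WEB_CONCURRENCY=4
export PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus-metrics
rm -rf "$PROMETHEUS_MULTIPROC_DIR" && mkdir -p "$PROMETHEUS_MULTIPROC_DIR"
poetry run ai_gateway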
On every merge to the main branch, a GitLab Pages job automatically deploys the following components:
The prompt directory structure is deployed to /prompt_directory_structure.
This endpoint exposes the available prompt versions for various AI features and model families supported by the AI Gateway. Introduced in !2139.
See Maintainership.