
GitLab AI Gateway

GitLab AI Gateway is a standalone service that gives all GitLab users access to AI features, no matter which instance they are using: self-managed, Dedicated, or GitLab.com.

API

See API.

Prerequisites

You'll need:

  • Docker
  • docker compose >= 1.28
  • gcloud CLI
  • sqlite development libraries
    • This package is usually called libsqlite3-dev or sqlite-devel (depending on your platform); install this before installing Python so it can compile against these libraries.
  • mise (recommended) or asdf

Google Cloud SDK

Set up a Google Cloud project with access to the Vertex AI API and authenticate to it locally by following these instructions.
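
The linked instructions cover the details, but the flow typically looks like this (<your-project-id> is a placeholder for your own Google Cloud project):

gcloud auth login
gcloud config set project <your-project-id>
gcloud auth application-default login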

Testing

See test doc.

Linting

This project uses the following linting tools:

  • Black: Enforces a consistent code style.
  • isort: Organizes and sorts imports.
  • pylint: Analyzes code for potential errors and style issues.
  • mypy: Performs static type checking.

To lint the entire project, you can use the following command:

make lint

We are incrementally rolling out the mypy static type checker to the project (issue). To show outstanding mypy warnings, you can use the following command:

make check-mypy TODO=true

To fix linting errors, you can use the following command:

make format

The format command only addresses Black and isort issues.

There is an internal recording for GitLab members that provides an overview of this project.

Running lint on Git commit

We use Lefthook to lint code and docs before committing. This repository comes with a Lefthook configuration (lefthook.yml), but it must be installed.

  1. Install Lefthook managed Git hooks:

    lefthook install
  2. Test Lefthook is working by running the Lefthook pre-commit Git hook:

    lefthook run pre-commit

    This should return the Lefthook version and the list of executable commands with output.

Disable Lefthook temporarily

To disable Lefthook temporarily, you can set the LEFTHOOK environment variable to 0. For instance:

LEFTHOOK=0 git commit ...

Run Lefthook hooks manually

To run the pre-commit Git hook, run:

lefthook run pre-commit

Frameworks

This project is built with the following frameworks:

  1. FastAPI
  2. Dependency Injector

Project architecture

This repository follows The Clean Architecture paradigm, which defines the layers present in the system as well as their relations to each other. Please refer to the linked article for more details.

Project structure

For the Code Suggestions feature, most of the code is hosted at /ai_gateway. In that directory, the following artifacts can be of interest:

  1. app.py - the main entry point for the web application.
  2. code_suggestions/processing/base.py - contains base classes for ModelEngine.
  3. code_suggestions/processing/completions.py and suggestions/processing/generations.py - contain the ModelEngineCompletions and ModelEngineGenerations classes, respectively.
  4. api/v2/endpoints/code.py - houses the implementation of the main production Code Suggestions API.
  5. api/v2/experimental/code.py - implements experimental endpoints that route requests to fixed external models for experimentation and testing.

Middlewares are hosted at ai_gateway/api/middleware.py and interact with the context global variable that represents the API request.

Application settings

See Application settings doc.

How to run the server locally

  1. Clone project and change to project directory.

  2. Depending on the version manager you are using, run mise install or asdf install.

  3. Install the poetry-plugin-shell.

  4. Init shell: poetry shell.

  5. Activate virtualenv.

  6. Install dependencies: poetry install.

  7. Copy the example.env file to .env: cp example.env .env

  8. Update the .env file in the root folder with the following variables:

    ANTHROPIC_API_KEY=<API_KEY>
  9. You can enable hot reload by setting the AIGW_FASTAPI__RELOAD environment variable to true in the .env file.

  10. Ensure you're authenticated with the gcloud CLI by running gcloud auth application-default login.

  11. Start the model-gateway server locally: poetry run ai_gateway.

  12. Open http://localhost:5052/docs in your browser and run any requests to the model.
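
Putting the steps together, a typical first run looks roughly like this (a sketch assuming mise and the poetry shell plugin are already installed):

mise install
poetry shell
poetry install
cp example.env .env
# edit .env and set ANTHROPIC_API_KEY=<API_KEY>
gcloud auth application-default login
poetry run ai_gateway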

Troubleshooting

Installation of Poetry 1.8.3 fails

You might encounter a known symlink failure when installing poetry during mise install.

The error may look something like:

Error output:

dyld[87914]: Library not loaded: @executable_path/../lib/libpython3.10.dylib
Referenced from: <4C4C4415-5555-3144-A171-523C428CAE71> /Users/yourusername/Code/ai-assist/.venv/bin/python
Reason: tried: '/Users/yourusername/Code/ai-assist/.venv/lib/libpython3.10.dylib' (no such file)

To fix the issue, locate libpython3.10.dylib on your system. Once you have found the file, use the ln -s command to create a symbolic link from the location where Poetry expects it to where it is actually located.
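
One way to locate the library under a mise-managed Python (the path below follows the example in this section):

find ~/.local/share/mise/installs/python -name 'libpython3.10.dylib'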

Example command:

ln -s /Users/yourusername/.local/share/mise/installs/python/3.10.14/lib/libpython3.10.dylib /Users/yourusername/Code/ai-assist/.venv/lib/libpython3.10.dylib

Next, try installing poetry again.

Mocking AI model responses

If you do not require real models to run and evaluate the input data, you can mock the model responses by setting the environment variable AIGW_MOCK_MODEL_RESPONSES=true. The models will start echoing the given prompts, while allowing you to run a fully functional AI gateway.

This can be useful for testing middleware, request/response interface contracts, logging, and other use cases that do not require an AI model to execute.

Agentic Chat can be mocked by setting the environment variables AIGW_USE_AGENTIC_MOCK=true and AIGW_MOCK_MODEL_RESPONSES=true. You can specify a sequence of responses to simulate a multi-step flow. See the documentation for details.
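
For example, both mock modes can be enabled inline when starting the server (the variables can equally go in the .env file):

# mock model responses only
AIGW_MOCK_MODEL_RESPONSES=true poetry run ai_gateway
# additionally mock Agentic Chat
AIGW_USE_AGENTIC_MOCK=true AIGW_MOCK_MODEL_RESPONSES=true poetry run ai_gateway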

Logging requests and responses during development

The AI Gateway workflow includes additional pre- and post-processing steps. By default, the log level is INFO and the application writes logs to stdout. If you want to log data between the different steps during development, and write it to a file, update the .env file by setting the following variables:

AIGW_LOGGING__LEVEL=debug
AIGW_LOGGING__TO_FILE=../modelgateway_debug.log

How to manually activate the virtualenv

  • poetry shell or poetry install should create the virtualenv environment.
  • To activate virtualenv, use command: . ./.venv/bin/activate.
  • To deactivate your virtualenv, use command: deactivate.
  • To list virtualenvs, use poetry env list.
  • To remove virtualenv, use poetry env remove [name of virtualenv].

Resolving Dependency Conflicts with Poetry

poetry install --sync

If you're experiencing unexpected package conflicts, import errors, or your development environment has accumulated extra packages over time, the --sync flag ensures your environment exactly matches the project's lock file. This command installs missing dependencies and removes any extraneous packages that aren't defined in poetry.lock, effectively resetting your environment to a clean state.

This is particularly useful when switching between branches with different dependencies, after removing packages from pyproject.toml, or when your local environment has diverged from the project's intended state.

Local development using GDK

Prerequisites

Make sure you have credentials for a Google Cloud project (with the Vertex API enabled) located at ~/.config/gcloud/application_default_credentials.json. This should happen automatically when you run gcloud auth application-default login. If for any reason this JSON file is at a different path, you will need to override the volumes configuration by creating or updating a docker-compose.override.yaml file.
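
As a sketch, such an override could look like the following; the service name ai-gateway and the mount target are assumptions, so check docker-compose.dev.yaml for the real values:

cat > docker-compose.override.yaml <<'EOF'
# Hypothetical override: service name and container path are assumptions
services:
  ai-gateway:
    volumes:
      - /custom/path/credentials.json:/root/.config/gcloud/application_default_credentials.json
EOF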

Running the API

You can either run make develop-local or docker-compose -f docker-compose.dev.yaml up --build --remove-orphans. If you need to change configuration for a Docker Compose service, you can add it to docker-compose.override.yaml. Any changes made to services in this file will be merged into the default settings.

Next open the VS Code extension project, and run the development version of the GitLab Workflow extension locally. See Configuring Development Environment for more information.

In VS Code, we need to set the MODEL_GATEWAY_AI_ASSISTED_CODE_SUGGESTIONS_API_URL constant to http://localhost:5000/completions.

Since the feature is only for SaaS, you need to run GDK in SaaS mode:

export GITLAB_SIMULATE_SAAS=1
gdk restart

Then go to /admin/application_settings/general, expand Account and limit, and enable Allow use of licensed EE features.

You also need to make sure that the group you are allowing is actually on an Ultimate plan, as this is an Ultimate-only feature: go to /admin/groups, select Edit on the group you are using, and set Plan to Ultimate.

Authentication

See authentication and authorization doc.

Internal Events

See internal events doc for more information on how to add internal events and test internal event collection with Snowplow locally.

Component overview

The diagram above shows the main components.

Client

The Client has the following functions:

  1. Determine input parameters.
    1. Stop sequences.
    2. Gather code for the prompt.
  2. Send the input parameters to the AI Gateway API.
  3. Parse results from AI Gateway and present them as inlineCompletions.

We support the following clients:

Deployment

For production AI Gateway environments

AI Gateway is continuously deployed to Runway.

This deployment is currently available at https://ai-gateway.runway.gitlab.net. Note, however, that clients should not connect to this host directly; they should use cloud.gitlab.com/ai, which is managed by Cloudflare and is the entry point GitLab instances use.

When an MR gets merged, CI will build a new Docker image, and trigger a Runway downstream pipeline that will deploy this image to staging, and then production. Downstream pipelines run against the deployment project.

The service overview dashboard is available at https://dashboards.gitlab.net/d/ai-gateway-main/ai-gateway-overview.

Note that while the Runway pods run in the gitlab-runway-production GCP project, all Vertex API calls target the gitlab-ai-framework-prod (and -stage, -dev) GCP projects for isolation purposes. These projects are managed through Terraform. Monitoring for those calls is provided through stackdriver-exporter.

For production Duo Workflow Service environments

Duo Workflow Service is continuously deployed to Runway.

This deployment is currently available at https://duo-workflow-svc.runway.gitlab.net.

When an MR gets merged, CI will build a new Docker image, and trigger a Runway downstream pipeline that will deploy this image to staging, and then production. Downstream pipelines run against the deployment project.

The service overview dashboard is available here.

Currently, the service doesn't have a dependency on the gitlab-runway-production GCP project. In the future, it could use Vertex AI resources through it.

For staging-ref

For the staging-ref environment, the deployment is powered by Runway and is named ai-gateway-custom.

The deployment for staging-ref differs from other production environments in both its nature and configuration. This deployment specifically powers Code Suggestions and Duo Chat when using Custom Models, and may use a different set of secret variables compared to other production deployments. The Group Custom Models team (#g_custom_models on Slack) is responsible for managing changes to deployments in this environment and maintains ownership of it.

Important MRs:

For more information and assistance, please check out:

Release

See release doc.

Rate limiting

Access to the AI Gateway is subject to rate limiting, defined as part of https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/cloud_connector/README.md#rate-limiting.

Multiple worker processes

By default, the AI Gateway runs a single process to handle HTTP requests. To increase throughput, you may want to spawn multiple workers. To do this, there are a number of environment variables that need to be set:

One of these settings points at a directory that holds the metrics from the worker processes; this directory should be cleared before the application starts.
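
A hypothetical invocation could look like this; WEB_CONCURRENCY and PROMETHEUS_MULTIPROC_DIR are assumptions based on common uvicorn and prometheus_client multiprocess conventions, not variable names confirmed by this project:

# Assumed names: check the project's settings for the actual variables
export PROMETHEUS_MULTIPROC_DIR=$(mktemp -d)  # fresh, empty metrics directory per start
export WEB_CONCURRENCY=4                      # number of worker processes
poetry run ai_gateway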

GitLab Pages Deployment

On every merge to the main branch, a GitLab Pages job automatically deploys the following components:

Prompt directory structure

The prompt directory structure is deployed to /prompt_directory_structure.

This endpoint exposes the available prompt versions for various AI features and model families supported by the AI Gateway. Introduced in !2139.
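
For example (the Pages host below is a placeholder, not a confirmed URL):

curl https://<pages-host>/prompt_directory_structure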

How to become a project maintainer

See Maintainership.
