Let LLMs empower operator optimization.
The project name may be changed later.
Explore the docs »
View Demo
·
Report Bug
·
Request Feature
This is a research project by C4Y, a student in the Optima Research Group. It aims to use LLMs to automatically optimize mathematical operators for hardware design.
To get a local copy up and running, follow these simple example steps.
The author strongly recommends running the project in a Linux environment and using an online LLM API service.
This project is mainly divided into two parts: the operator library implemented in C language and the automatic optimizer implemented in Python. These two parts need to be configured separately.
Additionally, a remote or local operator performance evaluation program is required, which also needs to be configured separately.
This project also uses a large language model for optimization, which requires access to an LLM service compatible with the OpenAI API SDK. The current implementation uses the DeepSeek API.
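As a minimal sketch of what "compatible with the OpenAI API SDK" means in practice, the request below targets an OpenAI-style /chat/completions endpoint (such as the DeepSeek API). The helper function and model name here are illustrative assumptions, not the project's actual code; the DS_API_KEY and DS_API_URL environment variables are the ones configured during installation.

```python
import json
import os


def build_chat_request(prompt, model="deepseek-chat"):
    """Build the URL, headers, and JSON body for an OpenAI-compatible
    /chat/completions endpoint. Illustrative helper, not project code."""
    api_key = os.environ["DS_API_KEY"]   # set via `export DS_API_KEY=...`
    base_url = os.environ["DS_API_URL"]  # e.g. https://api.deepseek.com
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{base_url}/chat/completions", headers, json.dumps(body)
```

Because the endpoint follows the OpenAI wire format, switching providers only requires changing the base URL, API key, and model name.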
sudo apt update
sudo apt install build-essential bison flex libgmp3-dev libmpc-dev libmpfr-dev texinfo
./configure --target=riscv64-unknown-elf --prefix=/opt/riscv --enable-languages=c,c++ --with-arch=rv64gc --with-abi=lp64d --enable-multilib --with-vector=$riscv_rvv_version
sudo apt install cmake
curl -sSL https://pdm-project.org/install-pdm.py | python3 -
pdm --version
git clone https://github.com/c4yg70/LLM4Opt.git
cd python_process
pdm install
export DS_API_KEY='ENTER YOUR API KEY'
export DS_API_URL='ENTER YOUR URL'
export SC_API_KEY='ENTER YOUR API KEY'
export SC_API_URL='ENTER YOUR URL'
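A quick way to confirm the variables above are actually visible to the Python process is a small check like the following. This helper is an assumption for illustration, not part of the project:

```python
import os

# The four variables the optimizer expects, per the export commands above.
REQUIRED = ["DS_API_KEY", "DS_API_URL", "SC_API_KEY", "SC_API_URL"]


def missing_env_vars(names=REQUIRED):
    """Return the names of required environment variables that are unset
    or empty, so misconfiguration is caught before a run starts."""
    return [n for n in names if not os.environ.get(n)]
```

Running it before `pdm start-opti` avoids failing mid-run with an authentication error.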
pdm check llm
pdm check riscv-cloud
pdm check riscv-local
pdm start-opti --cycle 1 --llm 1 --cloud_eval 1
LLM4Opt can now optimize 20+ operators (BasicFunctions and ActivationFunctions) in both CMSIS-NN, the ARM-optimized NN library developed officially by ARM, and muRISCV-NN, the RISC-V-optimized NN library developed by the Scale4Edge project, and evaluate the performance of the optimized operators.
Performance is tested at both the library level and the application level. Library-level tests, which show the efficiency of a single operator's optimization, are provided in eval_test. Application-level tests, which show how an operator's optimization affects a real-world application, use MLPerfTiny, proposed by C. Banbury in 2021. All results are evaluated on the Spacemit K1 processor provided by Bianbu Cloud, and are provided in results.
Normally, a successful optimization costs about 8-20 million tokens, depending on CoT length.
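To translate that token count into a budget, a rough estimate can be computed as below. The per-million-token price is a placeholder assumption for illustration, not DeepSeek's actual pricing:

```python
def estimate_cost_usd(tokens, price_per_m_tokens):
    """Cost in USD for `tokens` total tokens at `price_per_m_tokens`
    USD per million tokens (price is an assumed placeholder)."""
    return tokens / 1_000_000 * price_per_m_tokens


# A run consuming 8-20 million tokens at an assumed $0.50 per million tokens:
low = estimate_cost_usd(8_000_000, 0.50)    # 4.0 USD
high = estimate_cost_usd(20_000_000, 0.50)  # 10.0 USD
```

Check your provider's current rates (input vs. output tokens are usually priced differently) before budgeting a run.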
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
git checkout -b feature/AmazingFeature
git commit -m 'Add some AmazingFeature'
git push origin feature/AmazingFeature

Distributed under the project_license. See LICENSE.txt for more information.
CHEN Siyuan - siyuan.chen@cityu.edu.hk
Project Link: https://github.com/c4yg70/LLM4Opt