「简体中文」|「English」
Fun-ASR is an end-to-end large speech recognition model from Tongyi Lab. Trained on tens of millions of hours of real speech data, it offers strong contextual understanding and industry adaptability, supports low-latency real-time transcription, and covers 31 languages. It excels in vertical domains such as education and finance, accurately recognizing professional terminology and industry expressions while mitigating hallucinated output and language confusion, achieving the goal of "hear clearly, understand the meaning, write it accurately."
Model Repository: ModelScope, Hugging Face
Online Experience: ModelScope Community Space, Hugging Face Space
Fun-ASR focuses on high-precision speech recognition, multi-language support, and industry customization.
pip install -r requirements.txt
from funasr import AutoModel


def main():
    model_dir = "FunAudioLLM/fun-asr-nano"

    # Load the model together with its custom implementation from ./model.py.
    model = AutoModel(
        model=model_dir,
        trust_remote_code=True,
        remote_code="./model.py",
        device="cuda:0",
    )
    wav_path = f"{model.model_path}/example/zh.mp3"
    res = model.generate(input=[wav_path], cache={}, batch_size=1)
    text = res[0]["text"]
    print(text)

    # For long audio, add a VAD front-end that splits the input into
    # speech segments of at most 30 s before recognition.
    model = AutoModel(
        model=model_dir,
        trust_remote_code=True,
        vad_model="fsmn-vad",
        vad_kwargs={"max_single_segment_time": 30000},
        remote_code="./model.py",
        device="cuda:0",
    )
    res = model.generate(input=[wav_path], cache={}, batch_size=1)
    text = res[0]["text"]
    print(text)


if __name__ == "__main__":
    main()
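The fsmn-vad front-end caps each detected speech segment at max_single_segment_time (30,000 ms above). As a rough sketch of the capping step alone, not the neural VAD itself, the helper below (its name and interface are illustrative, not part of the FunASR API) splits any over-long segment into chunks that respect the cap:

```python
def cap_segments(segments, max_ms=30000):
    """Split any (start_ms, end_ms) segment longer than max_ms into chunks."""
    capped = []
    for start, end in segments:
        # Emit full-length chunks until the remainder fits under the cap.
        while end - start > max_ms:
            capped.append((start, start + max_ms))
            start += max_ms
        capped.append((start, end))
    return capped


# A 70 s segment becomes two 30 s chunks plus a 10 s tail.
print(cap_segments([(0, 70000)]))
```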
from model import FunASRNano


def main():
    model_dir = "FunAudioLLM/fun-asr-nano"

    # Load the model class directly, bypassing AutoModel.
    m, kwargs = FunASRNano.from_pretrained(model=model_dir, device="cuda:0")
    m.eval()

    wav_path = f"{kwargs['model_path']}/example/zh.mp3"
    res = m.inference(data_in=[wav_path], **kwargs)
    text = res[0][0]["text"]
    print(text)


if __name__ == "__main__":
    main()
- model_dir: Model name or local path on disk.
- trust_remote_code: Whether to trust remote code when loading custom model implementations.
- remote_code: Location of the model code (e.g., model.py in the current directory); both absolute and relative paths are supported.
- device: Device to run on, such as "cuda:0" or "cpu".

We compared the multilingual speech recognition performance of Fun-ASR with other models on open-source benchmark datasets, including AISHELL-1, AISHELL-2, WenetSpeech, LibriSpeech, and Common Voice.
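Benchmarks like these are usually scored with character error rate (CER) for Chinese test sets and word error rate (WER) for English ones. A minimal edit-distance-based CER, shown here as an illustration rather than the official scoring script:

```python
def cer(ref, hyp):
    """Character error rate: Levenshtein distance / reference length."""
    r, h = list(ref), list(hyp)
    # One-row dynamic-programming table over the hypothesis characters.
    d = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hc in enumerate(h, 1):
            # deletion, insertion, substitution/match
            cur = min(d[j] + 1, d[j - 1] + 1, prev + (rc != hc))
            prev, d[j] = d[j], cur
    return d[len(h)] / max(len(r), 1)


print(cer("abc", "axc"))  # one substitution out of three characters
```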