---
language:
- en
tags:
- audio-text-to-audio-text
- speech-understanding
- audio
- chat
license: apache-2.0
datasets:
- custom
metrics:
- wer
- bleu
- AIR-Bench
---

<div align="center">
<h1>
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
</h1>
</div>

<p align="center">
<font size="3">
<a href="https://github.com/FreedomIntelligence/EchoX">🐈‍⬛ Github</a> |
<a href="https://arxiv.org/abs/2509.09174">📃 Paper</a> |
<a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">🚀 Space</a> |
<a href="https://huggingface.co/datasets/FreedomIntelligence/EchoX-Dialougues">📊 EchoX-Dialougues</a> |
<a href="https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus">📊 EchoX-Dialogues-Plus</a>
</font>
</p>

## Model Description

EchoX is a speech-to-speech large language model that addresses the acoustic-semantic gap. By introducing **Echo Training**, EchoX integrates semantic and acoustic learning, mitigating the degradation of reasoning ability observed in existing speech-based LLMs. It is trained on only 10k hours of data while delivering state-of-the-art results on knowledge-based question answering and speech interaction tasks.

### Key Features

<div>
<ul>
<font size="3"><li>Mitigates the Acoustic-Semantic Gap in Speech-to-Speech LLMs</li></font>
<font size="3"><li>Introduces Echo Training with a Novel Three-Stage Pipeline (S2T, T2C, Echo), sketched below</li></font>
<font size="3"><li>Trained on Only 10k Hours of Curated Data for Efficiency</li></font>
<font size="3"><li>Achieves State-of-the-Art Performance on Knowledge-Based QA Benchmarks</li></font>
<font size="3"><li>Preserves Reasoning and Knowledge Abilities for Interactive Speech Tasks</li></font>
</ul>
</div>
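The three stages can be pictured roughly as below. This is a conceptual sketch inferred only from the stage names (S2T, T2C, Echo); every function and module name is hypothetical, and the authoritative training recipe is in the paper and GitHub repository.

```python
# Conceptual sketch of the three-stage Echo Training pipeline, inferred from the
# stage names only (S2T, T2C, Echo). All names below are hypothetical placeholders.

def train_step(module, inputs, targets):
    """Placeholder for one supervised update of `module` on (inputs, targets)."""
    ...

def stage1_s2t(llm, speech_text_pairs):
    # Stage 1 (S2T): teach the LLM to answer input speech with a text response.
    for speech, answer_text in speech_text_pairs:
        train_step(llm, speech, answer_text)

def stage2_t2c(t2c_module, text_codec_pairs):
    # Stage 2 (T2C): learn to map text (semantic content) to acoustic codec tokens.
    for text, codec_tokens in text_codec_pairs:
        train_step(t2c_module, text, codec_tokens)

def stage3_echo(llm, t2c_module, speech_dialogues):
    # Stage 3 (Echo): the LLM's own semantic outputs are "echoed" into codec
    # tokens, coupling semantic and acoustic learning in a single loop.
    for user_speech, reply_codec_tokens in speech_dialogues:
        semantic_output = llm(user_speech)  # generated text / hidden states
        train_step(t2c_module, semantic_output, reply_codec_tokens)
```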
## Usage

Load the EchoX model and run inference on your audio files as shown in the <a href="https://github.com/FreedomIntelligence/EchoX">GitHub repository</a>.
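As a minimal sketch only: the snippet below assumes the checkpoint ships custom modelling code loadable through `transformers` with `trust_remote_code`; the repo id and the `inference()` helper are placeholders, not the official API.

```python
# Hedged sketch, not the official API. Assumes the checkpoint provides custom
# modelling code via `trust_remote_code`; the repo id and the `inference()` helper
# below are placeholders -- see the GitHub repository for the real entry point.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "FreedomIntelligence/EchoX",  # placeholder: use this card's actual repo id
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
model = model.eval().to("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical call: pass a 16 kHz mono WAV and receive the text reply plus the
# synthesized speech (e.g. a waveform or codec tokens).
text_reply, speech_reply = model.inference("question.wav")
```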
# <span>📖 Citation</span>

```
@misc{zhang2025echoxmitigatingacousticsemanticgap,
      title={EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs},
      author={Yuhao Zhang and Yuhao Du and Zhanchen Dai and Xiangnan Ma and Kaiqi Kou and Benyou Wang and Haizhou Li},
      year={2025},
      eprint={2509.09174},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.09174},
}
```