I am a second-year master’s student at Zhejiang University. I previously worked as a research intern in the Natural Language Computing Group at MSRA in Beijing, doing LLM and speech research.

I graduated from the Department of Software Engineering at Jilin University (吉林大学软件学院) with a bachelor’s degree and am now pursuing a master’s degree at Zhejiang University (浙江大学软件学院), advised by Zhou Zhao (赵洲). I collaborate closely with Zhou Long (周龙) and ShuJie Liu (刘树杰) from Microsoft Research Asia. I also collaborate with Qian Chen (陈谦) and Wen Wang (王雯) from Alibaba DAMO Academy Speech Lab. I have learned a lot from them.

My research interests include speech synthesis, discrete codecs, generative models, and LLMs. I have published papers (as first author or co-first author) at top international AI conferences such as ICLR 2025, ACL 2024, AAAI 2025, and ICASSP 2024, with total Google Scholar citations .

🔥 News

  • 2025.02: I was selected as a reviewer for ICCV 2025 and NeurIPS 2025.
  • 2025.01: 🎉🎉 WavTokenizer was accepted to ICLR 2025! Three other papers (as co-author) were also accepted to ICLR 2025. I was selected as a reviewer for ARR (ACL 2025) and as an outstanding reviewer for ICASSP 2025.
  • 2024.12: DiscreteWM was accepted to AAAI 2025, PFlow-VC was accepted to ICASSP 2025, and I was selected as a reviewer for IJCAI 2025 and ICML 2025.
  • 2024.11: We released WavChat (a survey of spoken dialogue models, about 60 pages) on arXiv.
  • 2024.10: 🎉🎉 I won the National Scholarship in the first year of my master’s program and was selected as a Top Reviewer for NeurIPS 2024.
  • 2024.10: I was selected as a reviewer for CVPR 2025 and AISTATS 2025 (Statistics and Machine Learning).
  • 2024.09: One paper (as co-author) was accepted to the EMNLP 2024 main conference.
  • 2024.07: Alibaba Tongyi open-sourced a large speech system and released the technical report FunAudioLLM (CosyVoice), on which I am a co-author. We expect it to have a significant influence on the speech field!
  • 2024.07: One paper (as co-author) was accepted to ACM MM 2024.
  • 2024.05: MobileSpeech was accepted to the ACL 2024 main conference (top conference in NLP)!
  • 2024.04: I joined Tongyi Lab at Alibaba DAMO Academy as a research intern.
  • 2024.01: MobileSpeech has been successfully deployed in the Magic6 series of Honor mobile phones!
  • 2024.01: MegaTTS 2 (as co-author) was accepted to ICLR 2024 (top conference in machine learning)!
  • 2023.12: TextrolSpeech was accepted to ICASSP 2024 (top conference in speech)!
  • 2023.11: One paper (as co-author) was accepted to IEEE Transactions on Computers (CCF-A).
  • 2023.11: MegaTTS has been successfully deployed in products at ByteDance!
  • 2023.03: 🎉🎉 I joined the Natural Language Computing Group at Microsoft Research Asia (MSRA) as a research intern!
  • 2021.11: I joined Tsinghua Shenzhen International Graduate School as a remote intern.

📝 Publications (First Author / Co-First Author / High Impact)

🎙 Controllable and Zero-shot Text-to-Speech, Codec Representation

ICLR 2025

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Authors: Shengpeng Ji, Ziyue Jiang, Wen Wang, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Xize Cheng, Zehan Wang, Ruiqi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Zhou Zhao

  • Ranked 3rd in Hugging Face Daily Papers. Our work has been promoted by various media and forums, such as Speech Home and Twitter, and was a trending project on both GitHub and Papers with Code.
  • Audio samples are available on this website
  • Code is available in this

ICASSP 2024

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Ziyue Jiang, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

  • Audio samples are available on this website
  • Code is available in this

ACL 2024 Main

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Authors: Shengpeng Ji, Ziyue Jiang, Hanting Wang, Jialong Zuo, Zhou Zhao

  • Audio samples are available on this website
  • This work was deployed in the Honor Magic6 series of phones.

AAAI 2025

DiscreteWM: Speech Watermarking with Discrete Representations
Authors: Shengpeng Ji, Ziyue Jiang, Jialong Zuo, Minghui Fang, Yifu Chen, Tao Jin, Zhou Zhao

  • Audio samples are available on this website

ICASSP 2025 PFlow-VC: Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model, Jialong Zuo*, Shengpeng Ji*, Minghui Fang, Ziyue Jiang, Xize Cheng, Qian Yang, Wenrui Liu, Guangyan Zhang, Zehai Tu, Yiwen Guo, Zhou Zhao

Alibaba Technical report FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs, Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, Zhangyu Xiao, Zhijie Yan, Yexin Yang, Bin Zhang, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Siqi Zheng

ICLR 2024 (Zero-shot TTS) MegaTTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis, Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

🎖 Honors and Awards

  • 2024.10 National Scholarship (master) (Top 1%, 2/327)
  • 2023.06 Outstanding Graduate of Jilin University (Top 5%)
  • 2023.06 First-class scholarship of Jilin University (Top 1%, 1/392)
  • 2022.10 Second-class scholarship of Jilin University
  • 2021.10 National Scholarship (Undergraduate) (Top 1%, 5/392)
  • 2020.10 Third-class scholarship of Jilin University

📖 Education

  • 2023.09 - 2026.03, Master, Software Engineering, Zhejiang University.
  • 2019.09 - 2023.06, Undergraduate, Software Engineering, Jilin University.

🧑‍🎨 Professional Services

Conference Reviewer/Program Committee: EMNLP 2023, ACM-MM 2024, ECCV 2024, NeurIPS 2024 (outstanding reviewer), ICASSP 2025 (outstanding reviewer), AISTATS 2025, ICLR 2025, CVPR 2025, IJCAI 2025, ICML 2025, ACL 2025 (ARR 2), ICCV 2025, NeurIPS 2025

💻 Internships