I am a first-year master’s student at Zhejiang University. I previously worked as a research intern in the Natural Language Computing Group at MSRA in Beijing, where I did LLM and speech research.

I graduated from the Department of Software Engineering at Jilin University (吉林大学软件学院) with a bachelor’s degree and am now pursuing a master’s degree at Zhejiang University (浙江大学软件学院), advised by Zhou Zhao (赵洲). I also collaborate closely with Zhou Long (周龙) and ShuJie Liu (刘树杰) from Microsoft Research Asia.

My research interests include speech synthesis, generative models, and LLMs. I have published papers at top international AI conferences such as ACL 2024, ICLR 2024, and ICASSP 2024.

🔥 News: 2024.08.30 We released WavTokenizer, a SOTA speech/music/audio tokenizer, on arXiv 🎉🎉 (Hugging Face Daily Papers Rank 3, Papers with Code Rank 3).

  • 2024.10: 🎉🎉 I won the National Scholarship in the first year of my master’s program and was selected as a Top Reviewer for NeurIPS 2024.
  • 2024.10: I was selected as a reviewer for CVPR 2025 and AISTATS 2025 (Statistics and Machine Learning).
  • 2024.09: One paper (as co-author) was accepted to the EMNLP 2024 main conference.
  • 2024.08: I was selected as a reviewer for ICLR 2025 and ICASSP 2025.
  • 2024.07: Alibaba Tongyi (as co-author) open-sourced a large speech system and released the FunAudioLLM technical report, which we expect to have a large influence on the speech field!
  • 2024.07: One paper (as co-author) was accepted to ACM MM 2024.
  • 2024.06: We released ControlSpeech on arXiv.
  • 2024.05: I was selected as a reviewer for NeurIPS 2024.
  • 2024.05: MobileSpeech was accepted to the ACL 2024 main conference (top conference in NLP)!
  • 2024.04: I joined Tongyi Lab, DAMO Academy, Alibaba as a research intern.
  • 2024.03: I was selected as a reviewer for ECCV 2024.
  • 2024.02: We released Language-Codec, a SOTA codec model, on arXiv.
  • 2024.01: I was selected as a reviewer for ACM MM 2024.
  • 2024.01: MobileSpeech has been successfully deployed in the Honor Magic6 series of mobile phones!
  • 2024.01: MegaTTS 2 (as co-author) was accepted to ICLR 2024 (top conference in machine learning)!
  • 2023.12: TextrolSpeech was accepted to ICASSP 2024 (top conference in speech)!
  • 2023.11: One paper (as co-author) was accepted by IEEE Transactions on Computers (CCF-A).
  • 2023.11: MegaTTS has been successfully deployed in products at ByteDance!
  • 2023.08: I was selected as a reviewer for EMNLP 2023.
  • 2023.03: 🎉🎉 I joined the Natural Language Computing Group at Microsoft Research Asia (MSRA) as a research intern!
  • 2022.11: I joined Ping An Technology in Shanghai as a junior speech algorithm engineer!
  • 2022.10: I received an offer for postgraduate study at the School of Software, Zhejiang University.
  • 2021.11: I joined Tsinghua Shenzhen International Graduate School as a remote intern.
  • 2021.10: 🎉🎉 I won the National Scholarship (Top 1%) in the second year of my undergraduate studies!

📝 Publications (first author / co-first author / high impact)

🎙 Controllable and Zero-shot Text-to-Speech, Codec Representation

ICASSP 2024

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Ziyue Jiang, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

  • Audio samples are available on this website
  • Code is available in this

ACL 2024 Main

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Authors: Shengpeng Ji*, Ziyue Jiang*, Hanting Wang, Jialong Zuo, Zhou Zhao

  • Audio samples are available on this website

Under anonymous review

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

  • Code is available on this website
  • Audio samples are available on this website

Under anonymous review

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
Authors: Shengpeng Ji, Minghui Fang, Ziyue Jiang, Rongjie Huang, Jialong Zuo, Shulei Wang, Zhou Zhao

  • Code is available on this website
  • Audio samples are available on this website

Under anonymous review

VS-TTS: Controllable Voice Stylization for Text-to-Speech with Natural Language Prompts
Authors: Jialong Zuo*, Xize Cheng*, Shengpeng Ji*, Ziyue Jiang, Minghui Fang, Zhiqing Hong, Rongjie Huang, Zehan Wang, Tao Jin, Zhou Zhao

  • Audio samples are available on this website

Under anonymous review

DiscreteWM: Speech Watermarking with Discrete Representations
Authors: Ziyue Jiang*, Shengpeng Ji*, Yi Ren, Zhenhui Ye, Rongjie Huang, Jinglin Liu, Chen Zhang, Tianyu Pang, Chao Du, Hongcheng Zhu, Zhou Zhao

  • Audio samples are available on this website

Alibaba Technical report FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs, Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, Zhangyu Xiao, Zhijie Yan, Yexin Yang, Bin Zhang, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Siqi Zheng

ICLR 2024 (Zero-shot TTS) MegaTTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis, Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

🎖 Honors and Awards

  • 2024.10 National Scholarship (master) (Top 1%, 2/327)
  • 2023.06 Outstanding Graduate of Jilin University (Top 5%)
  • 2023.06 First-class scholarship of Jilin University (Top 1%, 1/392)
  • 2022.10 Second-class scholarship of Jilin University
  • 2021.10 National Scholarship (Undergraduate) (Top 1%, 5/392)
  • 2020.10 Third-class scholarship of Jilin University

📖 Education

  • 2023.09 - 2026.03, Master, Software Engineering, Zhejiang University.
  • 2019.09 - 2023.06, Undergraduate, Software Engineering, Jilin University.

💻 Internships