I am currently an expert researcher on the Tencent Hunyuan LLM team, working on pretraining data reformulation. I received my Ph.D. from the University of Chinese Academy of Sciences (UCAS), advised by Prof. Songlin Hu, where my research focused on Natural Language Processing. Previously, I worked at XiaoHongShu, Kwai, and Baidu.
Research Interests
I am committed to long-term, in-depth research on pretraining data reformulation, including:
- Long-context Data Reformulation — Synthesizing long-context training data to enhance long-range modeling of LLMs.
- Knowledge Data Reformulation — Creating diverse entry points (narratives, contexts, co-occurring entities) for the same knowledge, so that long-tail facts are no longer long-tail in pretraining corpora.
- Long Agentic Data Reformulation — My latest focus. Agentic post-training struggles to generalize, so the key lies in agentic pretraining. Because environments (e.g., GPU clusters, distributed systems, commercial platforms) are the hardest component to scale, pretraining should lower the activation cost of post-training by making correct paths easier to sample.
I also explore scalable oversight — what do we do when humans can no longer effectively supervise AI? Feel free to reach out to discuss or collaborate (wuxing@iie.ac.cn).
Work Experience
- Baidu · PaddlePaddle — Text Pretraining (2019.08 — 2020.07)
- Kwai · MMU — Text & Multimodal Pretraining (2020.08 — 2023.07)
- XiaoHongShu · HiLab — Long-context Pretraining (2023.08 — 2025.11)
- Tencent · HunYuan — Data Reformulation (2025.12 — Now)
News
- 🎉🎉🎉 NextLong and EntropyLong validated on Tencent HY 3.0, Zhipu GLM-5, and XiaoHongShu Dots, demonstrating consistent improvements across production LLMs.
- 2026.01 EntropyLong accepted as an ICLR 2026 conference paper!
- 2025.11 LiteLong accepted as an AAAI 2026 conference paper!
- 2025.09 LongMagpie accepted as a NeurIPS 2025 conference paper!
- 2025.05 NextLong accepted as an ICML 2025 conference paper!
- 2025.01 Quest accepted as an ICLR 2025 conference paper!
Selected Publications
ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding
Conditional BERT Contextual Augmentation
For a full list of publications, please visit my Google Scholar profile.