I am a Research Scientist at ByteDance Seed and recently graduated from Tsinghua University. I have been fortunate to be advised by Hao Zhang (UCSD), Minjia Zhang (UIUC), and Jie Tang (THU).
I previously worked on efficient model architectures and video training infrastructure, including KV cache compression for long-context LLMs, as well as sparse attention kernels and distillation pipelines for video diffusion models. I am now focused on self-evolving agents for long-horizon, real-world tasks.
I'm always excited to exchange ideas and collaborate on research. Feel free to reach out at pianoqwz@gmail.com.
Projects
Accelerate video diffusion model generation
- First open distillation recipes for video DiT, with support for distilling and finetuning state-of-the-art open video DiTs (including Sliding Tile Attention).
- Scalable training with FSDP, sequence parallelism, and selective activation checkpointing — near-linear scaling to 64 GPUs.
- Memory-efficient finetuning with LoRA, precomputed latents, and precomputed text embeddings.
Sliding Tile Attention
Fast tile-level sparse attention for video diffusion · ICML 2025 · arXiv
Efficient-vDiT
Block-sparse video diffusion with Attention Tile · ICML 2025 @ Es-FoMo · arXiv
- Discovered the Attention Tile pattern in 3D-DiT and built an efficient video diffusion pipeline.
- Achieves 7.8× speedup on a single GPU through block-sparse kernels and consistency distillation.
Evaluating LLMs as agents · ICLR 2024
- Classifies real-world browsing options and designs a framework that automatically collects browsing traces as training data, which is used to build a more efficient LLM-driven web navigation agent.
Miscellaneous
- Guitar. Lead guitarist of Susu (素数), a math-rock band active in Beijing. Recent live performance at Susu Lab.
- Ping-pong. Member and referee of our department's ping-pong team.