About Me
I am a Scientist at Nex-AGI. Visit Nex-AGI to explore the Nexus of Agentic Intelligence, which focuses on end-to-end agent solutions ranging from Models and Data to Frameworks.
I obtained my Ph.D. from The University of Hong Kong (HKU) in 2020, advised by Prof. Francis C.M. Lau. Before that, I received my B.Eng. in Electronic and Information Engineering from the University of Electronic Science and Technology of China (UESTC) in 2014.
My current research focuses on LLM agents (e.g., Agent RL, Agentic Context Engineering, and Software 3.0). We are looking for ways to unleash the power of LLMs for real-world productivity, especially by teaching LLMs to understand how domain experts solve problems.
Selected Publications
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention (Best Paper)
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
Optimizing Dynamic Neural Networks with Brainstorm
Dynamic Resource Allocation for Deep Learning Clusters with Separated Compute and Storage
ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning
SiloD: A Co-design of Caching and Scheduling for Deep Learning Clusters
PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training
HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees
Retiarii: A Deep Learning Exploratory-Training Framework
Automating Cloud Deployment for Deep Learning Inference of Real-time Online Services
Gandiva: Introspective Cluster Scheduling for Deep Learning
Online File Caching in Latency-Sensitive Systems with Delayed Hits and Bypassing
Regularization-Based Coflow Scheduling in Optical Circuit Switches
Online Dispatching and Scheduling of Jobs with Heterogeneous Utilities in Edge Computing
Scheduling Placement-Sensitive BSP Jobs with Inaccurate Execution Time Estimation
OnDisc: Online Latency-Sensitive Job Dispatching and Scheduling in Heterogeneous Edge-Clouds
Joint Online Coflow Routing and Scheduling in Data Center Networks
Camul: Online Caching on Multiple Caches with Relaying and Bypassing
Energy Efficient Dynamic Virtual Machine Management in Data Centers
Efficient Online Learning Based Cross-Tier Uplink Scheduling in HetNets
Congestion Game with Agent and Resource Failures
Professional Services
Program Committee
- IEEE INFOCOM 2021, 2022
- MSN 2020
Journal Reviewer
- IEEE/ACM Transactions on Networking