Staff Researcher
Samsung Advanced Institute of Technology (SAIT)
Feb 2022 - Present
- Led research on optimizing distributed LLM inference, developing automated pipelining frameworks to mitigate inter-GPU communication bottlenecks.
- Developed a fully automated performance-tuning framework for mobile NPU compilers (MLIR-based and in-house), achieving significant inference throughput gains on commercial devices.
- Accelerated on-device LLM inference through compiler and runtime optimizations, enabling its first deployment on Samsung Galaxy smartphones.