
Hyunjun Kim

ML System Researcher

Email · LinkedIn · Google Scholar

About Me

Experienced in writing high-performance GPU kernels, tuning NPU compiler optimizations, and deploying on-device AI models, backed by a deep understanding of hardware architectures. Skilled at bridging software and hardware through advanced profiling and performance analysis to deliver optimized solutions across heterogeneous platforms.

Experience

Staff Researcher, Samsung Advanced Institute of Technology (SAIT)
  • Led research on optimizing distributed LLM inference, developing automated pipelining frameworks to mitigate inter-GPU communication bottlenecks.
  • Developed a fully automated performance-tuning framework for mobile NPU compilers (MLIR-based and in-house compilers), achieving significant inference throughput gains on commercial devices.
  • Accelerated on-device LLM inference through compiler and runtime optimizations, enabling its first deployment on Samsung Galaxy smartphones.
Staff Engineer, Samsung Electronics
  • Improved on-device AI model inference by training with self-annotated datasets, applying quantization, and leveraging heterogeneous compute accelerators (e.g., GPU, NPU, DSP).
Postdoctoral Researcher, IT Convergence Center @ SKKU
  • Conducted postdoctoral research on Unified Virtual Memory (UVM) for GPUs, investigating page eviction, prefetching, and throttling strategies to reduce overhead.
Research Assistant, ARCS Lab. @ SKKU
  • Investigated GPU microarchitecture to optimize kernel performance, implementing a source-to-source transpiler and using low-level profilers for bottleneck analysis.

Education

Ph.D. in Computer Engineering, Sungkyunkwan University (SKKU)
B.S. in Computer Engineering, Sungkyunkwan University (SKKU)

Recent Publications

View full publication list on Google Scholar

Skills & Keywords

  • AI accelerators: GPU, Mobile NPU
  • Kernel programming: CUDA, Triton
  • Inference engines: vLLM
  • Compilers: MLIR, LLVM
  • Profilers: Nsight, Nvprof, Perfetto
  • Languages: Python, C/C++
  • Optimization methods: DP, ILP, GA, Beam search