NPU Runtime Software Engineer (LLM Serving)

Job group

Software

Locations

Rebellions | 8F, Building 102, 239 Jeongjail-ro, Bundang-gu, Seongnam-si, Gyeonggi-do, Republic of Korea

We are seeking a highly skilled NPU Runtime Engineer specializing in LLM Serving to design, optimize, and deploy large language models (LLMs) for efficient inference in production environments. This role involves working with cutting-edge AI serving frameworks, optimizing NPU-based inference performance, and integrating LLMs with scalable distributed systems.

Responsibilities and Opportunities

Design, implement, and optimize LLM inference pipelines for low-latency, high-throughput serving
Develop and extend vLLM to enhance inference performance on NPUs, including support for continuous batching and PagedAttention
Implement custom vLLM extensions to improve memory management, parallelism, and dynamic batching strategies
Work with torch.compile and RBLN compiler toolchains to accelerate model execution on NPUs
Optimize graph transformations, operator fusion, and execution efficiency for LLM inference workloads
Collaborate with ML engineers and infrastructure teams to deploy and scale LLM services

Key Qualifications

Strong proficiency in Python and deep learning frameworks (PyTorch, TensorFlow)
Deep understanding of LLM architectures, including Transformer-based models and inference optimization techniques
Hands-on experience with LLM serving frameworks (e.g., vLLM, TensorRT-LLM)
Solid understanding of model optimization techniques (tensor parallelism, KV cache optimizations, and memory-efficient execution)
Familiarity with hardware acceleration (GPUs, NPUs, TPUs) and efficient memory management techniques
Strong debugging and performance profiling skills for high-throughput inference environments

Ideal Qualifications

Experience with compilers and runtime optimizations
C++ experience, especially for performance-critical runtime optimizations
Understanding of torch.compile and graph optimizations
Experience with low-level kernel optimizations (CUDA, Triton, ROCm), and the ability to read existing kernel code and adapt it to new use cases
Experience deploying LLMs in distributed environments
Contributions to open-source LLM serving projects

Hiring Process

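Document screening - Online interview - On-site interview - Culture-fit interview - Compensation negotiation - Final offer
The hiring process may differ by position and is subject to change depending on schedules and circumstances.
Interview schedules and results will be communicated individually to the email address you provide in your application.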

Additional Information

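This posting may close early once the position is filled.
If any information in your application is found to be false, your offer may be withdrawn.
Hiring may be restricted if you do not hold the legally required qualifications for employment and for performing the role.
Veteran (national merit) status or disability will not put you at any disadvantage during the hiring process.
The scope of responsibilities may be adjusted based on the candidate's overall career, experience, and other relevant circumstances. If such a change is needed, it will be discussed with the candidate at an appropriate time before the final offer is made.
For hiring-related inquiries, please contact the email address below.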
[email protected]