Design scalable multi-node serving architectures, including prefill/decode disaggregation, distributed KV-cache, cache eviction strategies, and scheduler design for fairness and efficiency
Analyze and optimize memory usage, device utilization, communication overhead (PCIe, RDMA), and runtime behavior in real-world serving environments
Build and maintain benchmarks and simulators for AI serving workloads, and drive data-informed architectural decisions
Work closely with infrastructure, compiler, and hardware teams to co-design end-to-end AI serving systems
Actively contribute to and collaborate with open-source communities (e.g., vLLM, PyTorch, Triton, SGLang), including upstream contributions, bug fixes, and design discussions
Key Qualifications
Master's or higher degree (or equivalent experience) in Computer Science, Electrical Engineering, or a related field
Strong experience with Python, C++, and PyTorch, including model execution and runtime internals
Hands-on experience with inference serving or high-performance ML systems
Familiarity with Linux systems, profiling tools, and debugging performance bottlenecks
Strong problem-solving skills and the ability to reason about system-level trade-offs
Clear communication skills and the ability to collaborate in a fast-paced engineering environment
Ideal Qualifications
Experience with vLLM, SGLang, TensorRT-LLM, or similar LLM serving frameworks
Deep understanding of KV-cache management, attention mechanisms, and memory-efficient inference
Experience with multi-node inference, including tensor/pipeline parallelism