Framework Software Engineer
Job Family
Framework Software
Work Location
Rebellions | R-TOWER 3F–8F, 6 Jeongjail-ro 156beon-gil, Bundang-gu, Seongnam-si, Gyeonggi-do

Responsibilities and Opportunities

  • Design, develop, and optimize high-performance inference frameworks for large-scale distributed serving, including LLM workloads, using vLLM, SGLang, llm-d, and PyTorch to support a wide range of serving patterns
  • Improve end-to-end serving performance (TTFT, ITL, throughput, tail latency) by developing techniques such as continuous batching, KV-cache management, prefix caching, speculative decoding, and pipeline/tensor parallelism
  • Design scalable multi-node serving architectures including prefill/decode disaggregation, distributed KV-cache, cache eviction strategies, and scheduler design for fairness and efficiency
  • Analyze and optimize memory usage, device utilization, communication overhead (PCIe, RDMA), and runtime behavior in real-world serving environments
  • Build and maintain benchmarks and simulators for AI serving workloads, and drive data-informed architectural decisions
  • Work closely with infrastructure, compiler, and hardware teams to co-design end-to-end AI serving systems
  • Actively contribute to and collaborate with open-source communities (e.g., vLLM, PyTorch, Triton, SGLang), including upstream contributions, bug fixes, and design discussions


Key Qualifications

  • Master's or higher degree (or equivalent experience) in Computer Science, Electrical Engineering, or a related field
  • Strong experience with Python, C++, and PyTorch, including model execution and runtime internals
  • Hands-on experience with inference serving or high-performance ML systems
  • Familiarity with Linux systems, profiling tools, and debugging performance bottlenecks
  • Strong problem-solving skills and the ability to reason about system-level trade-offs
  • Clear communication skills and ability to collaborate in a fast-paced engineering environment


Ideal Qualifications

  • Experience with vLLM, SGLang, TensorRT-LLM, or similar LLM serving frameworks
  • Deep understanding of KV-cache management, attention mechanisms, and memory-efficient inference
  • Experience with multi-node inference, including tensor/pipeline parallelism
  • Experience supporting GPU/NPU-based inference platforms
  • Proven track record of open-source contributions to ML, systems, or infrastructure projects
  • Background in building or operating Agentic AI services





Hiring Process

  • Document screening > Online interview > On-site interview (including assignment presentation) > Culture-fit interview > Compensation negotiation > Final offer
  • The hiring process may vary by role and is subject to change depending on scheduling and circumstances.
  • Interview schedules and results will be communicated individually via the email address provided in your application.


Notes

  • This posting may close early once the position is filled.
  • An offer may be rescinded if any false information is found in the application.
  • Employment may be restricted if the candidate does not hold the legal qualifications required for hiring or for performing the role.
  • Veteran or disability status has no adverse effect whatsoever on the hiring process.
  • The scope of responsibilities may be adjusted in light of the candidate's overall career and experience; any such change will be communicated to the candidate at an appropriate time before the final offer notification.
  • For hiring-related inquiries, please contact the email address below.
  • [email protected]