Collective Communication Library Engineer (Software)
Responsibilities and Opportunities
Designing and implementing key components of a new collective communication library, specifically engineered for our NPU’s unique architecture and topology
Contributing to the technical design, API definition, and performance optimization of the communication library
Collaborating with hardware and software teams to analyze performance bottlenecks and influence future NPU and interconnect architecture
Key Qualifications
Master’s degree in Computer Science, Computer Engineering, or a related field
Minimum of 5 years of professional experience in high-performance systems software development
Strong collaboration and problem-solving skills for complex technical issues
Solid understanding of collective communication algorithms (e.g., All-Reduce, All-Gather, Reduce-Scatter) and their performance characteristics
Proficiency in low-level systems programming (C/C++) and an understanding of OS internals and networking technologies such as RDMA and RoCE
Understanding of interconnect topologies and Network-on-Chip (NoC) architectures
Proven experience developing and delivering complex, high-performance, and reliable software in a collaborative environment
Ideal Qualifications
A Ph.D. in a related field (HPC, Parallel Computing, Computer Architecture)
Prior experience contributing to high-performance communication libraries (e.g., NCCL, MPI) or parallel runtimes
Experience with performance analysis and optimization for AI accelerators (GPUs, TPUs, or other NPUs) and their specific interconnects (e.g., NVLink, CXL, RoCE)