Designing and implementing key components of a communication software stack, including a collective communication library and driver, specifically engineered for our NPU and related communication topologies
Contributing to the technical design, API definition, and performance optimization of the communication stacks across software and hardware layers
Collaborating with hardware and software teams to analyze performance bottlenecks and influence future NPU and interconnect topology designs
Key Qualifications
Minimum of 5 years of professional experience in systems software development, or a Ph.D. in a related field (HPC, Parallel Computing, Computer Architecture, Collective Communication, or equivalent)
Strong collaboration and problem-solving skills for complex technical issues
Proficiency in low-level systems programming (C/C++) and understanding of OS internals and networking
Proven experience developing and delivering complex, high-performance, and reliable software in a collaborative environment
Understanding of hardware accelerators (GPUs, TPUs, or other NPUs) and their performance characteristics
Ideal Qualifications
Solid understanding of collective communication algorithms (e.g., All-Reduce, All-Gather, Reduce-Scatter) and their performance characteristics
Prior experience contributing to high-performance communication libraries (e.g., NCCL, MPI), parallel runtimes, and/or high-performance networks (e.g., RDMA/RoCE, NVLink, CXL)
Understanding of interconnect topologies and Network-on-Chip (NoC) architectures