Driving the hardware-software co-design process to influence future NPU and interconnect architecture
Key Qualifications
Master’s degree in Computer Science, Computer Engineering, or a related field
Minimum of 10 years of professional experience in high-performance systems software development
Strong collaboration and problem-solving skills for complex technical issues
Expert-level understanding of collective communication algorithms (e.g., All-Reduce, All-Gather, Reduce-Scatter) and their performance characteristics
Full-stack knowledge, from CPU/accelerator architecture and OS internals to the packet level of networking fabrics like RDMA/RoCE
Deep understanding of high-radix interconnect topologies and Network-on-Chip (NoC) architectures
Proven experience leading significant software projects with a track record of delivering complex, high-performance, and reliable software
Ideal Qualifications
A Ph.D. in a related field (HPC, Parallel Computing, Computer Architecture)
Prior experience building a high-performance communication library (e.g., NCCL, MPI) or parallel runtime from the ground up
Experience with performance analysis and optimization for AI accelerators (GPUs, TPUs, or other NPUs) and their specific interconnects (e.g., NVLink, CXL, RoCE)