Information Technology 🏢 Full Time ⭐️ Verified

Senior AI Infrastructure Engineer

Apex Horizon
San Francisco
Estimated Salary
USD 160,000 – USD 240,000
Live Update
29 June 2026
Deadline
29 Jun 2027

Job Description

Are you ready to architect the digital backbone of 2026?

At Apex Horizon, we aren't just building software; we are defining the future. As we scale our generative AI platforms, we need a visionary Senior AI Infrastructure Engineer to design resilient, scalable, and high-performance systems. This is not just a job; it is an opportunity to lead the technological evolution that will define the next decade of computing.

In this role, you will bridge the gap between cutting-edge machine learning research and production-grade infrastructure. You will optimize our GPU clusters, streamline model deployment, and ensure our systems are ready for the exponential growth of AI in 2026 and beyond.

Why Join Apex Horizon?

  • Work with state-of-the-art hardware and software stacks.
  • Competitive compensation package with equity options.
  • Flexible remote-first culture with a vibrant SF hub.
  • Focus on sustainable, high-efficiency computing practices.

Responsibilities

  • Design and implement scalable distributed computing systems optimized for AI workloads, ensuring high availability and low latency.
  • Architect and maintain CI/CD pipelines for deploying large language models and neural networks to production environments.
  • Collaborate with data scientists to optimize model inference speeds and reduce computational costs on GPU clusters.
  • Implement robust observability and monitoring solutions to track system health and performance metrics in real time.
  • Drive architectural decisions related to cloud migration (AWS/Azure/GCP) and hybrid on-premises infrastructure setups.
  • Ensure infrastructure security and compliance with industry standards and data privacy regulations.

Qualifications

  • 7+ years of experience in software engineering, with a focus on backend infrastructure and distributed systems.
  • Strong proficiency in Python, Go, or Rust, with deep experience in Docker and Kubernetes.
  • Demonstrated expertise in managing large-scale GPU clusters and optimizing deep learning inference pipelines.
  • Experience with cloud platforms (AWS, GCP, or Azure) and serverless architectures.
  • Excellent problem-solving skills and the ability to thrive in a fast-paced environment with a high degree of ambiguity.
  • B.S., M.S., or PhD in Computer Science, Engineering, or a related technical field.

Required Skills

Python, Docker, Kubernetes, AWS, GCP, Machine Learning, Deep Learning, Distributed Systems, CI/CD, Linux, GPU Clusters, Go, Rust

Ready to Take This Challenge?

Make sure your resume is ready. Submit your application now before the deadline.