Engineering Manager (AI Inference) at Perplexity AI | AI Jobs | Neura Market

    Engineering Manager (AI Inference)

    Perplexity AI

    San Francisco

    Senior-level / Expert
    Full-time
    On-site
    4/13/2026

    About the Role

    We are looking for an Inference Engineering Manager to lead our AI Inference team. This is a unique opportunity to build and scale the infrastructure that powers Perplexity's products and APIs, serving millions of users with state-of-the-art AI capabilities.

    You will own the technical direction and execution of our inference systems while building and leading a world-class team of inference engineers. Our current stack includes Python, PyTorch, Rust, C++, and Kubernetes. You will help architect and scale the deployment of machine learning models behind Perplexity's Comet, Sonar, Search, and Deep Research products.

    Why Perplexity?

    • Build SOTA systems that are the fastest in the industry with cutting-edge technology

    • High-impact work on a smaller team with significant ownership and autonomy

    • Opportunity to build 0-to-1 infrastructure from scratch rather than maintaining legacy systems

    • Work on the full spectrum: reducing cost, scaling traffic, and pushing the boundaries of inference

    • Direct influence on technical roadmap and team culture at a rapidly growing company

    Responsibilities

    • Lead and grow a high-performing team of AI inference engineers

    • Develop APIs for AI inference used by both internal and external customers

    • Architect and scale our inference infrastructure for reliability and efficiency

    • Benchmark and eliminate bottlenecks throughout our inference stack

    • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models

    • Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, and similar techniques

    • Improve the reliability and observability of our systems and lead incident response

    • Own technical decisions around batching, throughput, latency, and GPU utilization

    • Partner with ML research teams on model optimization and deployment

    • Recruit, mentor, and develop engineering talent

    • Establish team processes, engineering standards, and operational excellence

    Qualifications

    • 5+ years of engineering experience with 2+ years in a technical leadership or management role

    • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)

    • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers

    • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention

    • Familiarity with GPU characteristics, roofline models, and performance analysis

    • Experience deploying reliable, distributed, real-time systems at scale

    • Track record of building and leading high-performing engineering teams

    • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism

    • Strong technical communication and cross-functional collaboration skills

    Nice to Have

    • Experience with CUDA, Triton, or custom kernel development

    • Background in training infrastructure and RL workloads

    • Experience with Kubernetes and container orchestration at scale

    • Published work or contributions to inference optimization research

    Skills & Tech Stack

    Python, Rust, PyTorch, TensorFlow, Kubernetes, LLMs, CUDA, Triton

    Roles

    Engineering Manager, Manager

    Location

    Region: North America

    Country: United States

    State / Province: California

    City: San Francisco

    Topics

    AI
