
Incident Response Engineer - Facility Operations Center

Work from home Full-time role Hiring

We are looking for a highly motivated and skilled Incident Response Engineer to join our Facility Operations Center (FOC) team. In this critical role, you will be responsible for coordination and presentation within NVIDIA's datacenters, with a specific focus on incident response, vendor support, and maintenance performance. You will be instrumental in ensuring the reliability and availability of our datacenter environments and in minimizing the blast radius of incidents.

What you'll be doing:

Your primary role is to coordinate and communicate across NVIDIA's datacenter portfolio from an operations perspective on incidents, maintenance, and reporting/monitoring.

Develop standards and programs that support reliability and operations initiatives, including Problem and Change Control. Define and maintain a health score for sites and environments, including testing methods to predict and isolate points of failure; assess and advise on maintenance strategies; and provide related reporting and metrics.

Study failure data and work with machine learning and AI teams and tools to predict future failures. Facilitate reliability studies such as criticality assessments, RAM (reliability, availability, and maintainability) models, and RCM (reliability-centered maintenance) studies.

Identify and drive automation and process improvement opportunities across catalog quality workflows and reporting.

Coordinate disaster recovery tests, act as liaison during audits, collaborate with internal partners, and drive progress to ensure business continuity and compliance.

Perform risk assessments to ensure compliance with policies, procedures, rules and regulations, and data center standards.

Own and present key end-to-end business metrics related to incident response, including ownership and representation of internal and external tooling.

Lead root cause analysis for outages and update documentation, workflows, and operating procedures to prevent future incidents.

Assess process improvement and transformation opportunities, and partner with process owners and collaborators to scope opportunities, define problem statements and objectives, and structure projects and teams.

Work cross-functionally with other team members and groups within the organization, and build strong, productive relationships with peer organizations that advance the organization's business objectives. This includes training, coaching, and mentoring Operations teams as needed so they can use operations tools and systems to meet daily business needs.

Other projects and duties as assigned.

What we need to see:

Bachelor's degree in a related field (e.g., Electrical, Mechanical, Industrial, Computer, or Telecommunication Engineering, Computer Science, or a business-related field) or equivalent experience.

5+ years of operations or environmental, health, and safety (EHS) experience within data centers.

Proficiency in developing and driving reliability activities (modeling predictions, life cycle testing, stress testing, etc.).

Commercial and financial awareness, with a full understanding of how failures translate into business costs, production targets, and fulfillment of customer orders.

Highly developed numeracy, statistical, and reporting skills; ability to analyze, interpret, and apply information, data, and trends.

Enthusiastic about achieving goals, with the ability to plan strategically and meet set objectives.

Demonstrated ability to be meticulous and organized, and to consolidate data analyses for presentation to large groups.

Proficiency with asset databases and DCIM (data center infrastructure management) solutions to extract data and develop meaningful insights.

Experience designing, deploying, or maintaining large-scale datacenter infrastructure (whether ACSMEP or networking), or the ability to create strategic infrastructure roadmaps spanning on-premises, hybrid, and cloud technologies.

Demonstrated knowledge of and advanced proficiency with Microsoft Office Suite and G-Suite software.

Ways to stand out from the crowd:

Proven experience in reliability engineering for electrical or mechanical cooling systems.

Maintenance and reliability certifications such as CDCMP, CMRP, CRL, or CRE.

Knowledge of relevant ISO standards and their implementation.

Demonstrated expertise in statistics, forecasting, and management information methods and techniques.

Strong IT systems knowledge and skills, including sophisticated Office/G-Suite skills and the ability to learn new software packages.
