Lead System Engineer (AI Automation Engineer SRE Focus)
Bothell, Washington
Join AT&T and help shape the future of communications and technology that connect the world. We value innovators who seek to explore the unknown and challenge the status quo. Bring your bold ideas and fearless spirit to redefine connectivity and transform how people share stories and experiences. At AT&T, you won’t just imagine the future—you’ll build it.
Role Overview - AI-Driven Reliability, Automation & Platform Engineering
We are seeking a Lead AI Automation Engineer with a strong Site Reliability Engineering (SRE) mindset to design, implement, and operate AI-driven automation and intelligent reliability capabilities across mission‑critical Front Office (CRM) and Back Office (Supply Chain, Logistics, and ERP) platforms.
This role sits at the intersection of AI automation, AIOps, platform reliability, and enterprise application engineering. You will leverage Generative AI, Large Language Models (LLMs), Agentic AI, and autonomous automation frameworks to dramatically improve system resilience, incident response, observability, and operational efficiency across complex Oracle-based and SaaS ecosystems.
You will be accountable not just for keeping systems running, but for engineering self-healing, predictive, and continuously improving platforms that reduce human toil, prevent incidents before they occur, and scale reliably as the business grows.
What You’ll Do
AI-Driven Reliability & Automation Engineering
- Architect and deliver AI-powered automation solutions for production operations, including intelligent incident triage, root cause analysis, remediation, and prevention.
- Design Agentic AI workflows that autonomously monitor systems, analyze anomalies, trigger corrective actions, and orchestrate recovery across ERP, supply chain, and integration layers.
- Apply AIOps techniques to correlate metrics, logs, events, and traces for predictive alerting, noise reduction, and proactive reliability improvements.
- Develop LLM-enabled runbooks and intelligent assistants to guide operational decision-making, accelerate incident response, and upskill operations teams.
Site Reliability Engineering (SRE) & Production Operations
- Own platform stability, uptime, and performance across Oracle EBS/ERP, Oracle Fusion Cloud, and supply chain execution systems.
- Lead incident management, coordinating rapid response, containing impact, and ensuring SLA adherence.
- Conduct blameless postmortems, using AI-assisted RCA to identify systemic issues and drive automation-first corrective actions.
- Partner with development teams to embed reliability, scalability, and observability requirements into system design and delivery.
Enterprise Application & Supply Chain Support
- Provide advanced production support for Oracle EBS/ERP modules including Procurement, Order Management, Inventory, AR, AP, FA, Project Accounting, and Supply Chain Planning.
- Support end-to-end supply chain flows including Procure-to-Pay, Order-to-Cash, inventory transactions, fulfillment, shipping, and reconciliation processes.
- Troubleshoot complex issues across configuration, master data, transactions, batch jobs, interfaces, and integrations, leveraging deep SQL and system-level analysis.
- Monitor and support third-party platforms (o9, Blue Yonder/JDA, RELEX) and integrations with WMS, 3PL, and logistics providers.
Observability, Monitoring & Intelligence
- Build and evolve AI-augmented observability solutions using tools such as Dynatrace, AppDynamics, Splunk, ELK, Grafana, and custom ML models.
- Implement predictive health monitoring, capacity forecasting, and intelligent service-level indicators (SLIs/SLOs).
- Replace static alerts with context-aware, AI-ranked alerts that reduce noise and accelerate resolution.
- Create autonomous dashboards that surface actionable insights rather than raw metrics.
Integration & Automation Excellence
- Diagnose and remediate integration failures across Oracle SOA/OIC, MuleSoft, Kafka/JMS, EDI, and event-driven architectures.
- Automate error handling, replay, deduplication, and reconciliation for high-volume interfaces using AI-assisted logic.
- Collaborate with middleware, cloud, and vendor teams to resolve cross-system defects, data mismatches, latency issues, and sequencing problems.
- Continuously identify and eliminate manual operational toil through intelligent automation and self-service tooling.
Release, Cloud & Platform Engineering
- Support release management, ensuring changes meet reliability, security, and performance standards.
- Apply DevOps and SRE practices including automation-first deployments, rollback strategies, and resilience testing.
- Leverage cloud-native and containerized platforms (Docker, Kubernetes, Azure) to support scalable, resilient workloads.
- Participate in on-call rotations, with a strong emphasis on automation and AI-driven reduction of recurring incidents.
What You’ll Bring
Core Experience & Mindset Requirements
- 10+ years of experience across enterprise application engineering, SRE, and production operations, with an automation-first mindset.
- Proven experience driving AI-based automation, AIOps, or intelligent operational tooling in complex enterprise environments.
- Strong ownership mentality for system reliability, performance, and customer impact.
AI, Automation & Engineering Skills
- Hands-on experience with Generative AI, LLMs, or Agentic AI frameworks applied to automation, monitoring, or operations.
- Proficiency in Python, shell scripting, SQL and PL/SQL, and automation frameworks.
- Experience building AI-enhanced runbooks, chatbots, or autonomous operational workflows is highly desirable.
- Ability to translate operational patterns into repeatable, intelligent automation.
Technology Stack
- Deep experience with Oracle EBS and/or Oracle Fusion Cloud (AR, AP, FA, PO, INV, OM, PA, Planning).
- Strong knowledge of observability platforms: Dynatrace, AppDynamics, Splunk, ELK, Grafana.
- Experience with integration technologies: Oracle SOA/OIC, MuleSoft, Kafka/JMS, EDI.
- Familiarity with containers and cloud platforms (Docker, Kubernetes, Azure).
Professional Skills
- Exceptional problem-solving, analytical, and systems-thinking abilities.
- Strong communication skills, capable of explaining complex AI-driven and technical concepts to both technical and non-technical stakeholders.
- Experience leading incidents, facilitating postmortems, and driving cultural adoption of blameless SRE principles.
Education
- Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related field.
Supervisor: No
This position requires a minimum of 5 days per week in the office and is available only in the location(s) posted. No relocation is offered.
Our Lead System Engineer earns between $158,200 and $237,400 USD annually, not to mention all the other amazing rewards that working at AT&T offers. Individual starting salary within this range may depend on geography, experience, expertise, and education/training.
Joining our team comes with amazing perks and benefits:
- Medical/Dental/Vision coverage
- 401(k) plan
- Tuition reimbursement program
- Paid Time Off and Holidays (based on date of hire, at least 23 days of vacation each year and 9 company-designated holidays)
- Paid Parental Leave
- Paid Caregiver Leave
- Additional sick leave beyond what state and local law require may be available but is unprotected
- Adoption Reimbursement
- Disability Benefits (short term and long term)
- Life and Accidental Death Insurance
- Supplemental benefit programs: critical illness/accident hospital indemnity/group legal
- Employee Assistance Programs (EAP)
- Extensive employee wellness programs
- Employee discounts up to 50% off on eligible AT&T mobility plans and accessories, AT&T internet (and fiber where available), and AT&T phone.
#LI-Onsite – Full-time office role
Ready to join our team? Apply today.
Weekly Hours: 40
Time Type: Regular
Location: USA:GA:Alpharetta / 500 North Point Pkwy - Adm (owned):500 North Point Pkwy, USA:TX:Plano / W Plano Pkwy - Adm:3400 W Plano Pkwy, USA:WA:Bothell / 20205 North Creek Pkwy - Adm (bothell 8):20205 North Creek Pkwy
Salary Range: $141,300.00 - $237,400.00
It is the policy of AT&T to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, AT&T will provide reasonable accommodations for qualified individuals with disabilities. AT&T is a fair chance employer and does not initiate a background check until an offer is made.
Job ID R-98652-2
Date posted 04/08/2026
Benefits
Your needs? Met. Your wants? Considered. Take a look at our comprehensive benefits.
- Paid Time Off
- Tuition Assistance
- Medical and dental plans
- Discounts
- Training & Development
Our hiring process
Apply Now
Confirm your qualifications align with the job requirements and submit your application.
Assessments
You may be required to complete one or more assessments, depending on the role.
Interview
Get ready to put your best foot forward! More than one interview may be necessary.
Conditional Job Offer
We’ll reach out to discuss a conditional job offer and the next steps to joining the team.
Background Check
Timing is important – complete the necessary actions to proceed with onboarding.
Welcome to the Team!
Congratulations! It’s time to experience #LifeAtATT.
Check your email (and SPAM) throughout the process for important messages and next steps.