As a Lead Software Engineer, you’ll play a critical role in ensuring the stability, availability, and compliance of critical API platforms (cloud and on-prem) while resolving high-level escalations. You’ll oversee upgrades, troubleshoot complex issues, and maintain observability, working closely with cross-functional teams to meet business and technical needs. This role requires a holistic understanding of the platform, deep technical expertise, and the ability to manage projects efficiently. You will also collaborate with security experts, public cloud teams, and network engineers to create software that implements APIs for network, consumer, and business solutions.
Experience Level: 12+ Years
Roles and Responsibilities:
Design, deploy, and manage cloud infrastructure solutions on Microsoft Azure, ensuring the security, availability, and scalability of systems
Maintain a comprehensive understanding of the platform, identifying upgrade needs, resolving compliance issues, and ensuring client availability
Lead Tier 4 outage resolution, managing critical escalations to minimize downtime and restore services swiftly.
Conduct deep-dive network investigations, troubleshoot complex issues (e.g., HTTP error codes, Azure/Mulesoft networking), and configure CORA, Azure Traffic Manager, Azure Front Door, Kubernetes Ingress, DNS, and private endpoints
Manage certificates, including gateway/cluster ingestion, and sync certificates from Keyfactor to Key Vaults to clusters
Write detailed technical specifications for upgrades (e.g., AKS, RTF, NGINX, AKV2K8s, Fluentbit, Noname) and ensure alignment with technical and business requirements
Oversee observability using OpenSearch and Fluentbit, including querying, cluster management, index configuration, caching optimization, and log ingestion troubleshooting
Maintain and upgrade Azure Kubernetes Service (AKS) clusters, troubleshoot node pools, manage pod disruption budgets and secrets syncing, and understand DaemonSets, ReplicaSets, and Deployments
Support Mulesoft integrations, perform RTF installations/upgrades, validate release notes, and troubleshoot ingress/pod templates
Remediate Astra violations across systems (Linkerd, Istio, Calico, NGINX, Noname, AKV2K8s) and train teams to address violations effectively.
Manage projects across multiple tools, ensuring timely progress and completion of tasks.
Work with Infrastructure as Code (IaC) tools and automate tasks.
Perform advanced troubleshooting and performance optimization.
Candidate will provide tier 3 support on a rotating basis, working closely with other teams and subject matter experts.
This candidate needs to be proactive and demonstrate the ability to analyze issues, generate ideas, and initiate action while achieving results.
Actively participate in Scrum, providing status of tasks and coordinating with project team to meet requirements.
Extended hours and weekend release work may be required.
Experience working in an environment where coordination with multiple teams is essential to success.
Primary / Mandatory skills:
Overall – 12+ years of experience in platform engineering, API platforms, Azure Kubernetes (AKS), Mulesoft, and observability tools (OpenSearch, Fluentbit)
Experience managing Azure resources and automating deployments.
Very strong written and verbal skills.
Prior experience with a large API platform framework, or API development using M2E or AJSC, would be a plus.
Ability to prioritize individual/group work in a deadline driven environment.
Experience designing and deploying AKS environments.
Expertise in networking, troubleshooting, and configuring Azure Traffic Manager, Azure Front Door, Kubernetes Ingress, DNS, and private endpoints
Strong skills in writing technical specifications for upgrades (AKS, RTF, NGINX, AKV2K8s, Fluentbit, Noname)
Proficiency in managing certificates and syncing across systems (Keyfactor, Key Vaults, clusters)
Advanced knowledge of OpenSearch (cluster management, indexing, caching) and Fluentbit (configuration, log ingestion)
Experience with Mulesoft RTF installations, upgrades, and troubleshooting
Ability to remediate Astra violations and train teams on compliance
Strong project management skills, with experience tracking tasks across multiple platforms.
Technical Skills: Extensive expertise in Azure, Java/Python, Kubernetes, MuleSoft, OpenSearch, Fluentbit, networking, and compliance remediation.
Additional information (if any): Must be willing to work shift duties. Willingness to learn is very important, as AT&T offers an excellent environment to learn digital transformation skills.
It is the policy of AT&T to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, AT&T will provide reasonable accommodations for qualified individuals with disabilities. AT&T is a fair chance employer and does not initiate a background check until an offer is made.