Sr Software Site Reliability Engineer (SRE) - Problem Management & RCA at AT&T in Hyderabad

Sr Software Site Reliability Engineer (SRE) - Problem Management & RCA

Hyderabad, India


About the Company:

At AT&T, we’re connecting the world through the latest tech, top-of-the-line communications and the best in entertainment. Our groundbreaking digital solutions provide intuitive and integrated experiences for millions of customers across online, retail and care channels. Join our mission to deliver compelling communication and entertainment experiences to customers around the world as we continue to evolve as a technology-powered, human-centered organization. As part of our team, you’ll transform the way we deliver a seamless customer experience with digital at the center of all you do. In our world, digital is much larger than just an eCommerce channel: we are transforming all channels to perform digitally as one team and create a better customer experience. As we move through 2021, the digital transformation will revolutionize the digital space, and you can build a career that will propel your future.

About the Team:

The mission of our Digital Operations team is to operate a fault-resilient, customer-centered, proactive DevOps team. The team is responsible for supporting systems that deliver AT&T’s customer experience across multiple internet-facing eCommerce applications, databases, platforms and technology stacks. Our customer-journey-centric Ops team is made up of Ops Engineers and Site Reliability Engineers (SREs), all focused on ensuring a highly available, resilient, performant and secure customer experience.

Job Summary:

This is an exciting, hands-on Sr Software Site Reliability Engineer (SRE) position responsible for providing 24/7 problem and incident management, Root Cause Analysis (RCA) assessments, functional triage, and customer impact assessments for Consumer Online Sales, Account Management, and Support websites and mobile apps. This position requires the use of technical skills and tools with a focus on our customer-journey-centric mindset and model.

This position requires an understanding of Site Reliability Engineering (SRE) within SPT Operations as it moves from a traditional Ops focus to a DevOps focus. It also requires active engagement with other teams and partners to help resolve production defects and issues reported by stakeholders, including business, product owners, and leadership. In addition, it requires support for events such as Iconic Launches, with heightened support for SPT defect and incident triage and status updates during any major launch event.

Roles and Responsibilities:

  • RCA support for incidents
  • Analyze past incidents, then develop and execute steps to prevent their recurrence
  • Analyze and correlate various logs to proactively identify potential issues and devise mitigations
  • Product Defect Triage and Analysis
  • Defect Management
  • Functional Triage
  • Provide customer impact assessment
  • Support for major launch events
  • Partner with downstream, business, and product partners to resolve issues.

Shift timing (if any):

  • Shifts typically fall between 6 AM and 10 PM India Standard Time. Longer hours may occasionally be required when the situation demands it.

Location: Hyderabad

Note - Immediate / early joiners will be preferred.

Primary / Mandatory skills:

  • Overall experience: 7+ years of IT-related experience, including 5+ years in problem management in telecom, cloud, and eCommerce.
  • Development & Scripting knowledge in a mix of EFK/ELK/Splunk, Quantum Metric (QM)/Dynatrace/CatchPoint: 3 - Intermediate (practical application)
  • Java/Unix/SQL: 3 - Intermediate (practical application)
  • Proven ability to communicate via email, phone, chat room etc.: 2 - Novice (limited experience)
  • Excellent written and verbal English communication skills to work in a Global team
  • Proven ability to drive issues/defects to resolution in a challenging environment

Secondary / Desired skills:

  • Telecom domain: Business Support Systems (BSS): 2 - Novice (limited experience)
  • Understanding of customer journey and products: 2 - Novice (limited experience)
  • Working experience with Amdocs products (Telegence, bill formatters, CRM, OMS) will be a plus

Additional information (if any): Candidates must be willing to work shift duties. Willingness to learn is very important, as AT&T offers an excellent environment for learning digital transformation skills such as cloud, big data, AI, and full stack.

Education Qualification: Bachelor’s/Master’s degree in Computer Science or a related field

Certifications (if any specific): Any certification related to the Primary / Mandatory skills


  • 5+ years of problem management experience in telecom, cloud, and eCommerce
  • 7+ years of IT-related experience
  • Experience providing data/information to business leaders
  • Experience working in a large-scale, technically diverse organization

AT&T is leading the way to the future – for customers, businesses and the industry. We're developing new technologies to make it easier for our customers to stay connected to their world. Together, we’ve built a premier integrated communications and entertainment company and an amazing place to work and grow. Team up with industry innovators every time you walk into work, creating the world you always imagined. Ready to #transformdigital with us? Apply now!

Job ID: 2119925I
Date posted: 04/07/2021

It is interesting to work with AT&T, which always expects its employees to develop themselves.


You will always get credit for your work and will be appreciated. Even in such a big team, you will always be recognized.


The deadlines given are very short, so we always need to extend our hours in the office and sometimes need to work on holidays.

Current Employee - QA Tester


This is the life – the #LifeAtATT, that is. We’re creating what’s next and having a blast doing it. You’re looking for proof? Well, see for yourself.
