As a Big Data Systems Operations Engineer, you will operate and support our diverse Data Pipeline Platform, which consists of large Hadoop, HBase, and Kafka clusters in an all-Linux environment. The platform currently ingests 300 TB of new data and runs 20,000 ETL jobs every day across 8 Hadoop, 5 HBase, and 6 Kafka clusters.
About the Team:
This growing team consists of curious, passionate, talented technologists who enjoy working on complex, large-scale distributed file and messaging systems.
Our motto is to move fast and sustain optimal uptime. Our team members thrive in a learn-and-teach environment: each team member is encouraged to explore solutions and efficiencies that support, optimize, and maintain our systems. We are enthusiastic about automation and optimization.
The team is managing over 2,000 Linux servers via extensive automation tools.
Roles and Responsibilities:
• Support a complex Data Pipeline Platform by monitoring, maintaining, provisioning and upgrading Hadoop, HBase, Kafka, Graph and ETL systems using proprietary automation tools.
• Develop new tools to automate routine day-to-day tasks, such as security patching, software upgrades and hardware allocation. Utilize automated system monitoring tools to verify the integrity and availability of all hardware, server resources, and critical processes.
• Troubleshoot and analyze hardware and software failures and provide recovery solutions. Identify and resolve faults, inconsistencies, and systemic issues.
• Collaborate with engineering team partners to resolve complex system performance issues.
• Participate in an on-call rotation, responding to alerts and system issues.
Job Specification (Qualifications/Description):
• 3+ years of relevant experience in implementing, troubleshooting and supporting the Unix/Linux operating system with concrete knowledge of system administration/internals
• 3+ years of relevant experience in scripting/writing/modifying code for monitoring/deployment/automation in one of the following (or comparable): Python, Shell
• 3+ years of relevant experience with any of the following technologies: Hadoop-HDFS, Yarn-MapReduce, HBase, Kafka
• 3+ years of relevant experience with Puppet or an equivalent configuration management tool
• Familiarity with TCP/IP networking (DNS, DHCP, HTTP, etc.)
• Good written and oral communication skills
• Some experience with Nagios or similar monitoring tools
• Some experience with data collection/graphing tools like Graphite and Grafana
• Must be flexible to work in 24x7 shifts
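To give a concrete flavor of the monitoring/automation scripting described above, here is a minimal, hypothetical Python sketch of a host health check of the kind such a role might write; the function name and threshold are illustrative only, not part of any actual tooling mentioned in this posting:

```python
import shutil


def check_disk_usage(path="/", threshold_pct=90.0):
    """Return (ok, used_pct) for the filesystem containing `path`.

    `ok` is False when usage meets or exceeds the alert threshold.
    Threshold and path are illustrative defaults, not real policy.
    """
    usage = shutil.disk_usage(path)
    used_pct = usage.used / usage.total * 100
    return used_pct < threshold_pct, used_pct


if __name__ == "__main__":
    ok, used = check_disk_usage("/")
    status = "OK" if ok else "ALERT"
    # A real monitoring tool would page or emit a metric instead of printing.
    print(f"{status}: root filesystem {used:.1f}% used")
```

In practice a script like this would feed a system such as Nagios (via its exit-code convention) or push a gauge to Graphite/Grafana rather than print to stdout.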
More About You:
• You are passionate about a culture of learning and teaching. You love challenging yourself to constantly improve, and sharing your knowledge to empower others
• You like to take risks when looking for novel solutions to complex problems. If faced with roadblocks, you continue to reach higher to make greatness happen
• You care about solving big, systemic problems. You look beyond the surface to understand root causes so that you can build long-term solutions for the whole ecosystem
• You believe in not only serving customers, but also empowering them by providing knowledge and tools
This is the life – the #LifeAtATT, that is. We’re creating what’s next and having a blast doing it. You’re looking for proof? Well, see for yourself.