The System Administrator II at Xandr is responsible for effective provisioning, installation, configuration, operation, and maintenance of systems hardware and software and related infrastructure. This individual participates in technical research and development to enable innovation within the infrastructure. This individual ensures that system hardware, operating systems, software systems, and related procedures adhere to organizational values.
About the team:
The System Operations team at Xandr maintains operation of over 8,000 Linux servers, distributed across 6 data centers globally. The systems run various distributed applications such as Kubernetes, NGINX, and Artifactory, as well as more traditional applications such as DHCP, DNS, and Kickstart. Responsibilities on these systems include system administration engineering and provisioning, operations and support, maintenance, and research and development to ensure the platform meets or exceeds business needs. System Administrators on the team will assist project teams with technical issues in the initiation and planning phases. These activities include the definition of needs, benefits, and technical strategy; research and development within the project life-cycle; technical analysis and design; and support of operations staff in executing, testing, and rolling out the solutions. Participation on projects is focused on smoothing the transition of projects from development staff to production staff by performing operations activities within the project life-cycle.
About the job:
System Administration Engineering and Provisioning
- Engineer and provide proof-of-concept technical solutions for various project and operational needs.
- Manage servers and configure hardware, services, settings, storage, etc. in accordance with standards and project or operational requirements.
- Research and recommend innovative and, where possible, automated approaches for system administration tasks. Identify approaches that leverage our resources and provide economies of scale.
- Identify areas of operation where automation can increase efficiency and decrease human error, and implement solutions that do so.
- Evaluate new versions of software and technologies, and implement any changes and tasks necessary to leverage them for operations or project needs.
- Identify potential security risks and propose practical mitigation measures.
- Assess several, often conflicting, constraints and make rapid decisions in a dynamic environment.
- Create, verify, and review patches to the software that runs the infrastructure in the form of pull requests.
- Provide technical leadership in planning, development, and execution of software efforts.
Operations and Support
- Ensure the integrity and availability of all hardware and key services by utilizing monitoring tools, log aggregation tools, and customer reports.
- Ensure business data integrity by supporting our storage systems and performing any maintenance tasks necessary to prevent data loss (hardware repairs, fire drills, integrity checks).
- Review security reports on a regular cadence to identify any possible violations.
- Provide support per requests from various constituencies. Investigate and troubleshoot any issues reported.
- Repair and recover from hardware or software failures. Coordinate and communicate with impacted constituencies.
- Provide on-call support (escalations from Level 1 Support Team)
- Maintain operations runbooks, configuration, or other procedures.
- Perform periodic performance reporting to support capacity planning.
- Perform ongoing performance tuning, hardware upgrades, and resource optimization as required. This requires using various performance tuning tools to identify bottlenecks internal and external to the system.
- Provide support for datacenter maintenance and operations as needed.
About your skills:
- 5+ years of Linux experience in supporting Debian-based distributions such as Ubuntu
- 5+ years writing scalable tools using scripting languages such as Perl, Python, and shell
- 5+ years in configuration management tools such as Puppet, Ansible, and Terraform
- 5+ years of managing storage systems running ZFS or CephFS
- 3+ years of deploying and administering repository managers, especially with JFrog Artifactory
- 3+ years of using monitoring tools such as Nagios and Sensu
- 1+ years of deploying and administering systems using container technologies, especially with Kubernetes and Docker, as well as Helm, Spinnaker, Prometheus, Calico, Flannel, Fluentd, and InfluxDB
- 2+ years of building and managing Debian software packages from source, including creation of Makefiles.
- Familiarity with Git and other source control tools is required
- Familiarity with using AWS or Azure is preferred but not required
- Familiarity with configuring NGINX and Kerberos is preferred but not required
- Familiarity with log management tools such as Splunk or Sumo Logic is preferred but not required
More about you:
- You are passionate about a culture of learning and teaching. You love challenging yourself to constantly improve, and sharing your knowledge to empower others
- You like to take risks when looking for novel solutions to complex problems. If faced with roadblocks, you continue to reach higher to make greatness happen
- You care about solving big, systemic problems. You look beyond the surface to understand root causes so that you can build long-term solutions for the whole ecosystem
- You believe in not only serving customers, but also empowering them by providing knowledge and tools
- You believe in solving problems, not fixing them