- Work directly with the client user community and business analysts to define and document data requirements for data integration and business intelligence applications.
- Determine and document data-mapping rules for the movement of medium- to high-complexity data between applications.
- Adhere to and promote the use of data administration standards.
- Support data selection, extraction, and cleansing for corporate applications, including the data warehouse and data marts.
- Create and sustain processes, tools, and ongoing support structures.
- Extract and analyze data from specific applications, systems, and/or databases to create reports, and provide recommendations based on the analysis of the data.
- Investigate and resolve data issues across platforms and applications, including discrepancies in definition, format, and function.
- Create metadata and populate it into repositories.
- Create data models with robust data definitions, including entity-relationship-attribute, star, and dimensional models (see the star-schema sketch following this list).
- Maintain quality control and auditing of databases, resolve data problems, and analyze system changes for quality assurance.
- Utilize prior experience in designing and developing Hadoop jobs for analyzing data using MapReduce, Spark, Hive, and Pig.
- Apply prior experience with massively parallel processing databases such as Teradata and Vertica.
- Maintain encrypted data elements in compliance with company standards.
- Utilize past experience to build Linux shell scripts that handle a diverse variety of requests.
- Implement advanced procedures, such as text analytics and processing, using in-memory computing capabilities such as Apache Spark.
- Understand big data and Hadoop architecture and its components, including HDFS, YARN, the ResourceManager (JobTracker), NodeManager (TaskTracker), NameNode, DataNode, and the MapReduce programming paradigm.
- Install, configure, and use Hadoop ecosystem components such as Hadoop MapReduce, HDFS, Oozie, Hive, Sqoop, Pig, Spark, ZooKeeper, and Flume.
- Import and export data using Sqoop from HDFS to relational database systems and vice versa (see the Sqoop sketch following this list).
- Tune and optimize Hadoop, Hive, and Spark jobs and SQL queries to use cluster resources effectively, reduce operational cost, and improve system performance (see the tuning sketch following this list).
- Extract, cleanse, transform, and load data from heterogeneous sources (online transactional and analytical data, flat files, Amazon S3 buckets, server logs from different servers, vendor-specific APIs, and custom scripts) into the Hadoop distributed file system / data lake (see the ingestion sketch following this list).
- Support batch and real-time processing.
- Convert business processes into RDD transformations using Apache Spark (see the RDD sketch following this list).
- Write producers/consumers and create messaging-centric applications using Apache Kafka (see the Kafka sketch following this list).
- Work with a variety of databases, including Teradata, Vertica, Oracle, and MySQL.
- Develop web applications using Java.
- Communicate data gaps and issues to the upstream source systems, downstream systems, and reporting and business units.
- Develop technical documents and conduct user training on procedures and technologies.
- Learn and adapt quickly, and correctly apply new tools and technology.
- Demonstrate strong communication and analytical skills, with relevant past experience in programming and problem solving.
- Collaborate on designing scheduled jobs using the IBM Workload Scheduler and AutoSys scheduling tools.
- Design and develop Hadoop jobs for analyzing data using MapReduce, Spark, Hive, and Pig.
- Participate in all testing phases, including QA, System Integration Testing (SIT), and user acceptance testing.
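To illustrate the dimensional-modeling duty, here is a minimal star-schema sketch expressed as Spark SQL DDL. The table and column names (date_dim, sales_fact, date_key, and so on) are hypothetical placeholders, not a prescribed model.

```scala
import org.apache.spark.sql.SparkSession

object StarSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("star-schema-sketch")
      .enableHiveSupport() // needed for STORED AS PARQUET table definitions
      .getOrCreate()

    // Dimension table: one row per calendar date (hypothetical names throughout).
    spark.sql("""
      CREATE TABLE IF NOT EXISTS date_dim (
        date_key INT,
        calendar_date DATE,
        fiscal_quarter STRING
      ) STORED AS PARQUET
    """)

    // Fact table: one row per sale, keyed to the surrounding dimensions.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales_fact (
        date_key INT,
        product_key INT,
        store_key INT,
        sale_amount DECIMAL(12,2)
      ) STORED AS PARQUET
    """)

    spark.stop()
  }
}
```

In a star schema, each fact row joins to small, denormalized dimension tables through surrogate keys such as date_key, which keeps analytical queries to a single level of joins.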
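For the Sqoop import/export duty, the sketch below invokes the sqoop CLI from Scala via sys.process. The JDBC URL, credentials file, table name, and HDFS target directory are all hypothetical, and it assumes sqoop is on the PATH of the machine running the job.

```scala
import scala.sys.process._

object SqoopImportSketch {
  def main(args: Array[String]): Unit = {
    // Pull a (hypothetical) ORDERS table from MySQL into HDFS using four parallel map tasks.
    val importCmd = Seq(
      "sqoop", "import",
      "--connect", "jdbc:mysql://db.example.com/sales",
      "--username", "etl_user",
      "--password-file", "/user/etl/.db_password",
      "--table", "ORDERS",
      "--target-dir", "/data/raw/orders",
      "--num-mappers", "4"
    )
    val exitCode = importCmd.! // runs the command, streaming output to stdout/stderr
    require(exitCode == 0, s"sqoop import failed with exit code $exitCode")
  }
}
```

The reverse direction is symmetrical: `sqoop export` with an `--export-dir` pointing at the HDFS data to push back into the relational system.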
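For the tuning duty, here is a minimal Spark sketch showing two common levers: sizing shuffle partitions and broadcasting the small side of a join. The paths, table shapes, and partition count are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object TuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tuning-sketch")
      // Fewer shuffle partitions for a modest cluster; the default of 200 often overshoots.
      .config("spark.sql.shuffle.partitions", "64")
      .getOrCreate()

    // Hypothetical tables: a large fact table joined to a small dimension table.
    val facts = spark.read.parquet("hdfs:///datalake/curated/sales_fact/")
    val dates = spark.read.parquet("hdfs:///datalake/curated/date_dim/")

    // Broadcasting the small side avoids shuffling the large fact table across the cluster.
    val joined = facts.join(broadcast(dates), "date_key")
    joined.write.mode("overwrite").parquet("hdfs:///datalake/marts/sales_by_date/")

    spark.stop()
  }
}
```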
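For the heterogeneous-source ingestion duty, this sketch lands a flat-file feed and an S3 event stream as Parquet in an HDFS data lake. All paths and the event_date partition column are hypothetical, and reading s3a:// URIs assumes the hadoop-aws connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object IngestToDataLakeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ingest-sketch").getOrCreate()

    // Flat-file drop zone (hypothetical path): CSV with a header row.
    val flatFiles = spark.read.option("header", "true").csv("hdfs:///landing/vendor_feed/")

    // S3 bucket of JSON events (hypothetical bucket and layout).
    val s3Events = spark.read.json("s3a://example-bucket/events/")

    // Land both as Parquet in the lake; partitioning (assuming an event_date field
    // exists in the JSON) lets downstream queries prune by date.
    flatFiles.write.mode("append").parquet("hdfs:///datalake/raw/vendor_feed/")
    s3Events.write.mode("append").partitionBy("event_date").parquet("hdfs:///datalake/raw/events/")

    spark.stop()
  }
}
```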
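For the RDD duty, here is a minimal sketch of a business rule expressed as Spark RDD transformations: counting failed requests per URL from tab-delimited server logs. The log path and record layout are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object RddTransformSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical server-log layout: "timestamp<TAB>status<TAB>url"
    val lines = sc.textFile("hdfs:///datalake/raw/server_logs/")

    // Business rule as a chain of transformations: keep 5xx errors, count per URL.
    val failuresPerUrl = lines
      .map(_.split("\t"))
      .filter(fields => fields.length == 3 && fields(1).startsWith("5"))
      .map(fields => (fields(2), 1))
      .reduceByKey(_ + _)

    failuresPerUrl.saveAsTextFile("hdfs:///datalake/curated/failures_per_url")
    spark.stop()
  }
}
```

Nothing executes until the saveAsTextFile action; the map/filter/reduceByKey chain is lazy, which is what lets Spark plan the whole pipeline before touching data.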
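For the Kafka duty, this sketch publishes a single event with the standard Kafka producer client. The broker address, topic name, and message payload are hypothetical placeholders; a matching consumer would subscribe to the same topic with KafkaConsumer.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducerSketch {
  def main(args: Array[String]): Unit = {
    // Broker and serializer configuration; the broker host is a placeholder.
    val props = new Properties()
    props.put("bootstrap.servers", "broker1.example.com:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one (hypothetical) order event; consumers read it from the "orders" topic.
      producer.send(new ProducerRecord[String, String]("orders", "order-42", """{"amount": 19.99}"""))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```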
AT&T is an Affirmative Action/Equal Opportunity Employer, and we are committed to hiring a diverse and talented workforce. EOE/AA/M/F/D/V