Requisition Number: 46943
Corning is one of the world’s leading innovators in materials science. For more than 160 years, Corning has applied its unparalleled expertise in specialty glass, ceramics, and optical physics to develop products that have created new industries and transformed people’s lives.
Corning succeeds through sustained investment in R&D, a unique combination of material and process innovation, and close collaboration with customers to solve tough technology challenges.
As a Data Engineer for our advanced analytics platforms, your main responsibilities will be to:
- Design and implement patterns of practice for productized, portable, modular, instrumented, CI/CD-automated, and highly performant data ingestion pipelines that leverage structured streaming techniques, processing both batch and streamed data in unstructured, semi-structured, and structured form, using Apache Spark, Delta Lake, Delta Engine, Hive, and other relevant tech stacks
- Ensure that data ingestion pipelines built with these patterns validate and profile inbound data reliably, identify anomalous or otherwise unexpected data conditions, and are able to trigger appropriate remediation actions by operations staff when needed
- Work with data source domain experts, both within and outside the company, who understand the value delivery potential of their data, and collaborate to harvest, land, and prepare that data at scale
- Ensure pipelines built with these patterns are architecturally and operationally integrated with data contextualization, feature engineering, outbound data engineering and production inferencing pipelines designed by your core platform development peers
- Deliver and present proof-of-concept implementations that explain the key technologies you have selected for your design and the recommended patterns of practice for ongoing development and lifecycle management. The target audience for these efforts spans the company and includes project stakeholders, data scientists, process experts, other core software engineering team members, and relevant technical communities of practice interested in leveraging your code for their own projects
- Work with your fellow developers using agile development practices, and continually improve development methods with the goal of automating the build, integration, deployment, and monitoring of ingestion, enrichment, and ML pipelines
- Using your expertise and influence, help establish patterns of practice for the above, and encourage their adoption by software and data engineering teams across the company
- Work with the relevant communities of practice on component roadmaps, and serve as a trusted committer for your code in inner-sourcing efforts with other development teams in the company
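The validate-profile-remediate pattern described in the responsibilities above can be sketched in simplified form. This is an illustrative assumption, not Corning's actual implementation: a production pipeline would apply the same idea with Spark structured streaming and Delta Lake, and every name, field, and threshold below is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class IngestReport:
    """Profile of one inbound batch: row counts plus flagged anomalies."""
    total: int = 0
    valid: int = 0
    anomalies: list = field(default_factory=list)  # (row index, reason) pairs

    @property
    def needs_remediation(self) -> bool:
        # Illustrative policy: trigger operator action when more than
        # 5% of rows in the batch are anomalous.
        return self.total > 0 and len(self.anomalies) / self.total > 0.05

def validate_batch(rows, required=("sensor_id", "value"), value_range=(0.0, 1000.0)):
    """Validate and profile an inbound batch of dict records.

    Returns (clean_rows, report). Rows with missing required fields or
    out-of-range values are flagged in the report rather than silently
    dropped, so operations staff can act on the anomalies.
    """
    report = IngestReport()
    clean = []
    lo, hi = value_range
    for i, row in enumerate(rows):
        report.total += 1
        missing = [k for k in required if row.get(k) is None]
        if missing:
            report.anomalies.append((i, f"missing fields: {missing}"))
            continue
        if not (lo <= row["value"] <= hi):
            report.anomalies.append((i, f"value out of range: {row['value']}"))
            continue
        report.valid += 1
        clean.append(row)
    return clean, report
```

The key design point the bullet calls for is that validation produces a structured report alongside the clean data, so remediation can be triggered automatically rather than discovered downstream.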
Education & Experience
- Advanced degree in computer science strongly preferred, but at a minimum a bachelor’s degree in computer science, engineering, mathematics, or a related technical discipline.
- 5+ years of programming proficiency in at least one modern JVM language (e.g., Java, Kotlin, Scala) and at least one other high-level programming language such as Python
- 5+ years of full-stack experience developing large scale distributed systems and multi-tier applications
- Expert-level proficiency with agile software development and continuous integration/continuous deployment (CI/CD) methodologies, along with supporting tools such as Git (GitLab), Jira, Terraform, and New Relic
- Expert-level proficiency with both traditional relational and polyglot persistence technologies
- 5+ years of experience in big data engineering roles, developing and maintaining ETL and ELT pipelines for data warehousing and for on-premises and cloud data lake environments
- 5+ years of production experience using SQL and DDL
- 3+ years of experience with high-level Apache Spark APIs (Scala, PySpark, Spark SQL), and demonstrated strong hands-on familiarity with Apache Spark architecture
- 3+ years of experience developing batch, micro-batch, and streaming ingestion pipelines on the Apache Spark platform, leveraging both the low-level RDD APIs and the higher-level APIs (SparkContext, DataFrames, Datasets, GraphFrames, Spark SQL)
- Demonstrated deep technical proficiency with Spark core architecture, including physical plans, UDFs, job management, resource management, S3, Parquet and Delta Lake architecture, and structured streaming practices
- 3+ years of DevOps experience with AWS platform services, including S3, EC2, Database Migration Service (DMS), RDS, EMR, Redshift, Lambda, DynamoDB, CloudWatch, and CloudTrail
- Demonstrated experience working with inner-sourcing initiatives, serving as both a trusted committer and a contributor
- Strong technical collaboration and communication skills
- Unwavering commitment to coding best practices and strong advocacy for code review
- Cultural bias towards continual learning, sharing best practices, and encouraging and elevating less experienced colleagues as they learn
- Proven success in communicating with users, other technical teams, and senior management to collect requirements and describe data modeling decisions and data engineering strategy
Additional Technical Qualifications
- Proficiency with functional programming methods and their appropriate use in distributed systems
- Expert proficiency with data management fundamentals and data storage principles
- Expert proficiency with AWS foundational compute services, including S3 and EC2, ECS and EKS, IAM and CloudWatch
- Prior full-stack app development experience (front-end, back-end, microservices)
- Proficiency working with Ceph, Kubernetes and Docker
- Familiarity with the following tools and technology practices:
  - Oracle, Microsoft SQL Server, SSIS, SSRS
  - Established enterprise ETL and integration tools, including Informatica and MuleSoft
  - Established open-source data integration and DAG tools, including NiFi, StreamSets, and Airflow
  - Data sources and integration solutions commonly used in manufacturing enterprises, including PI Integrator and Maximo
  - Reporting and analysis tools, including Power BI, Tableau, and SAS JMP
- Strong relationship building skills
- Proven success working in a highly matrixed environment.
- Strong bias for action and an ability to deliver results in a complex and fluid environment.
- Excellent analytical and decision-making abilities.
- Must have a passion for success.
- Must demonstrate a proven willingness to go the extra mile, take on the things that need to be done, and maintain a positive attitude that adapts to change.
- Strong leadership and excellent verbal and written communications skills, with the ability to develop and sell ideas.