A publicly traded technology consulting and engineering firm that helps large enterprises solve complex business challenges through AI, data, cloud, and digital engineering. The company focuses on building scalable, high-impact solutions that improve efficiency, accelerate revenue growth, reduce costs, and manage risk. Its core capabilities include AI-driven automation, data and analytics, cloud-native platforms, DevOps, and personalized digital customer experiences. It works across multiple industries such as retail and e-commerce, financial services, manufacturing, technology and media, pharma and life sciences, and insurance, partnering with Fortune 1000 companies to deliver measurable business outcomes at enterprise scale.
Mandatory:
- Readiness for evening calls with the US team (after 10 AM PST) more than twice per week
- Strong experience in code validation and functional testing.
- Prepare Unit and Integration tests
- Ability to verify Spark jobs, validate code, and ensure migration results are accurate
- Occasional participation in calls with US & India teams is expected; the frequency can be negotiated.
- Scala (primary), Python (secondary)
- Apache Spark (batch & streaming) – must!
- Deep knowledge of HDFS internals and migration strategies.
- Experience with Apache Iceberg (or similar table formats like Delta Lake / Apache Hudi) for schema evolution, ACID transactions, and time travel (a minimal sketch follows this list).
- Running Spark and/or Flink jobs on Kubernetes (e.g., Spark-on-K8s operator, Flink-on-K8s).
- Experience with distributed object storage such as Ceph, AWS S3, or similar
- Building ingestion, transformation, and enrichment pipelines for large-scale datasets.
- Infrastructure-as-Code (Terraform, Helm) for provisioning data infrastructure.
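For context on the core stack above, here is a minimal sketch, assuming Spark 3.x with the Iceberg runtime on the classpath and a Hadoop catalog over S3-compatible storage, of moving a Parquet dataset from HDFS into an Iceberg table and reading back an earlier snapshot. All paths, table names, catalog settings, and the snapshot id are illustrative, not part of the actual project setup.

```scala
import org.apache.spark.sql.SparkSession

object HdfsToIcebergSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical catalog and warehouse settings; in practice these would
    // come from the cluster or spark-submit configuration.
    val spark = SparkSession.builder()
      .appName("hdfs-to-iceberg-migration-sketch")
      .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.lake.type", "hadoop")
      .config("spark.sql.catalog.lake.warehouse", "s3a://example-bucket/warehouse")
      .getOrCreate()

    // Read an existing Parquet dataset from HDFS (path is illustrative).
    val source = spark.read.parquet("hdfs:///data/events/2024/")

    // Write it into an Iceberg table; Iceberg provides ACID commits and
    // schema evolution on top of the underlying object storage.
    source.writeTo("lake.analytics.events").createOrReplace()

    // Time-travel read against an earlier snapshot (snapshot id is illustrative).
    val asOfSnapshot = spark.read
      .option("snapshot-id", 1234567890L)
      .table("lake.analytics.events")

    println(s"rows in snapshot: ${asOfSnapshot.count()}")
    spark.stop()
  }
}
```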
Nice to have:
- Quality Assurance engineering experience
- Experience with Apache Flink
- Prior experience in migration projects or large-scale data platform modernization.
- Apple experience preferred (to get up to speed on our tooling quickly and more independently)
Responsibilities:
- Perform functional testing, code validation, and automated data quality checks for all pipeline changes.
- Validate Spark/Flink job outputs to ensure correctness, completeness, and reliability of the migrated data (a rough validation sketch follows this list).
- Apply SQL and analytics tools (e.g., Jupyter, Superset) to verify data consistency and accuracy.
- Develop and maintain data ingestion and transformation jobs; document the testing outcomes.
- Implement Spark-based ETL/ELT jobs for moving data from HDFS/Hive to cloud storage.
- Write clean, testable, and maintainable Scala/Python code for data pipelines.
- Apply data quality checks and validations during migration.
- Document technical work, including pipeline design and operational procedures.
- Participate in code reviews and follow best practices defined by the team.
- Support troubleshooting and bug fixing during migration activities.
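To make the validation-oriented responsibilities more concrete, below is a rough sketch, assuming Spark with Hive support enabled and purely illustrative table names, of the kind of count and completeness checks a migrated table might be run through before sign-off.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object MigrationValidationSketch {
  // Compare row counts and a per-column null profile between source and target.
  // Table names are illustrative; in practice they would come from job config.
  def validate(spark: SparkSession, sourceTable: String, targetTable: String): Boolean = {
    val source = spark.table(sourceTable)
    val target = spark.table(targetTable)

    val countsMatch = source.count() == target.count()

    // Simple completeness check: null counts per shared column should match.
    val sharedColumns = source.columns.intersect(target.columns)
    def nullCounts(df: DataFrame): Map[String, Long] =
      sharedColumns.map { c =>
        c -> df.filter(col(c).isNull).count()
      }.toMap

    val nullsMatch = nullCounts(source) == nullCounts(target)

    countsMatch && nullsMatch
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("migration-validation-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val ok = validate(spark, "legacy_db.events", "lake.analytics.events")
    println(s"migration validation passed: $ok")
    spark.stop()
  }
}
```

In a real pipeline these checks would typically be wrapped in unit/integration tests and extended with checksums or sampled record comparisons; the sketch only shows the basic shape.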
What they offer:
- Remote start: you will begin working fully remotely for the first 4–5 months, with a transition to a hybrid format afterward.
- Opportunity to influence architecture and product direction directly.
- Collaboration with a highly skilled, cross-functional global team.
- Flexible working hours (core working hours: 8:00 p.m. – 12:00 a.m. GMT+5; the remaining 4 working hours can be completed flexibly before 8:00 p.m. GMT+5).
- Competitive compensation package and long-term engagement potential.
- A culture of trust, respect, and camaraderie, focused on excellence and innovation.