A publicly traded technology consulting and engineering firm that helps large enterprises solve complex business challenges through AI, data, cloud, and digital engineering. The company focuses on building scalable, high-impact solutions that improve efficiency, accelerate revenue growth, reduce costs, and manage risk. Its core capabilities include AI-driven automation, data and analytics, cloud-native platforms, DevOps, and personalized digital customer experiences. It works across multiple industries such as retail and e-commerce, financial services, manufacturing, technology and media, pharma and life sciences, and insurance, partnering with Fortune 1000 companies to deliver measurable business outcomes at enterprise scale.
Mandatory:
- Readiness for evening calls with the US team (after 10 AM PST) more than twice per week
- Strong experience in code validation and functional testing.
- Prepare Unit and Integration tests
- Ability to verify Spark jobs, validate code, and ensure migration results are accurate
- Occasional participation in calls with US & India teams is expected; the frequency can be negotiated.
- Scala (primary), Python (secondary)
- Apache Spark (batch & streaming) – must!
- Deep knowledge of HDFS internals and migration strategies.
- Experience with Apache Iceberg (or similar table formats like Delta Lake / Apache Hudi) for schema evolution, ACID transactions, and time travel (a minimal sketch follows this list).
- Running Spark and/or Flink jobs on Kubernetes (e.g., Spark-on-K8s operator, Flink-on-K8s).
- Experience with distributed object storage such as Ceph, AWS S3, or similar
- Building ingestion, transformation, and enrichment pipelines for large-scale datasets.
- Infrastructure-as-Code (Terraform, Helm) for provisioning data infrastructure.
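For context on the core stack above, here is a minimal sketch, assuming Spark 3.x with the Iceberg runtime on the classpath and a Hadoop catalog over S3-compatible storage, of moving a Parquet dataset from HDFS into an Iceberg table and reading back an earlier snapshot. All paths, table names, catalog settings, and the snapshot id are illustrative, not part of the actual project setup.

```scala
import org.apache.spark.sql.SparkSession

object HdfsToIcebergSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical catalog and warehouse settings; in practice these would
    // come from the cluster or spark-submit configuration.
    val spark = SparkSession.builder()
      .appName("hdfs-to-iceberg-migration-sketch")
      .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.lake.type", "hadoop")
      .config("spark.sql.catalog.lake.warehouse", "s3a://example-bucket/warehouse")
      .getOrCreate()

    // Read an existing Parquet dataset from HDFS (path is illustrative).
    val source = spark.read.parquet("hdfs:///data/events/2024/")

    // Write it into an Iceberg table; Iceberg provides ACID commits and
    // schema evolution on top of the underlying object storage.
    source.writeTo("lake.analytics.events").createOrReplace()

    // Time-travel read against an earlier snapshot (snapshot id is illustrative).
    val asOfSnapshot = spark.read
      .option("snapshot-id", 1234567890L)
      .table("lake.analytics.events")

    println(s"rows in snapshot: ${asOfSnapshot.count()}")
    spark.stop()
  }
}
```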
Nice to have:
- Quality Assurance engineering experience
- Experience with Apache Flink
- Prior experience in migration projects or large-scale data platform modernization.
- Apple experience preferred (to get up to speed on our tooling quickly and more independently)
Responsibilities:
- Perform functional testing, code validation, and automated data quality checks for all pipeline changes.
- Validate Spark/Flink job outputs to ensure correctness, completeness, and reliability of the migrated data (a rough validation sketch follows this list).
- Apply SQL and analytics tools (e.g., Jupyter, Superset) to verify data consistency and accuracy.
- Develop and maintain data ingestion and transformation jobs; document the testing outcomes.
- Implement Spark-based ETL/ELT jobs for moving data from HDFS/Hive to cloud storage.
- Write clean, testable, and maintainable Scala/Python code for data pipelines.
- Apply data quality checks and validations during migration.
- Document technical work, including pipeline design and operational procedures.
- Participate in code reviews and follow best practices defined by the team.
- Support troubleshooting and bug fixing during migration activities.
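To make the validation-oriented responsibilities more concrete, below is a rough sketch, assuming Spark with Hive support enabled and purely illustrative table names, of the kind of count and completeness checks a migrated table might be run through before sign-off.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object MigrationValidationSketch {
  // Compare row counts and a per-column null profile between source and target.
  // Table names are illustrative; in practice they would come from job config.
  def validate(spark: SparkSession, sourceTable: String, targetTable: String): Boolean = {
    val source = spark.table(sourceTable)
    val target = spark.table(targetTable)

    val countsMatch = source.count() == target.count()

    // Simple completeness check: null counts per shared column should match.
    val sharedColumns = source.columns.intersect(target.columns)
    def nullCounts(df: DataFrame): Map[String, Long] =
      sharedColumns.map { c =>
        c -> df.filter(col(c).isNull).count()
      }.toMap

    val nullsMatch = nullCounts(source) == nullCounts(target)

    countsMatch && nullsMatch
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("migration-validation-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val ok = validate(spark, "legacy_db.events", "lake.analytics.events")
    println(s"migration validation passed: $ok")
    spark.stop()
  }
}
```

In a real pipeline these checks would typically be wrapped in unit/integration tests and extended with checksums or sampled record comparisons; the sketch only shows the basic shape.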
What they offer:
- Remote start: you will begin working fully remotely for the first 4–5 months, with a transition to a hybrid format afterward.
- Opportunity to influence architecture and product direction directly.
- Collaboration with a highly skilled, cross-functional global team.
- Flexible working hours (core working hours: 8:00 p.m. – 12:00 a.m. GMT+5; the remaining 4 working hours can be completed flexibly before 8:00 p.m. GMT+5).
- Competitive compensation package and long-term engagement potential.
- A culture of trust, respect, and camaraderie, focused on excellence and innovation.