A stealth-mode AI-powered Cloud-Native Health-Tech company is looking for a Senior ML/Data Engineer.
The business domain revolves around providing Better Care for US patients; in other words, it’s about treating patients well.
Compensation and Benefits:
Paid Time Off
The company has an Unlimited PTO Policy, plus compensated New Year holidays on top of that. Misuse of the policy isn’t welcomed, though it’s definitely possible to take a fully compensated vacation of at least two weeks, or more.
Corporate Hardware
The company provides Corporate Hardware for employees who have completed their Probationary Period and proven their value.
Means of Communication
We use Google Workspace, Slack, Zoom, and similar collaboration tools for messaging, meetings, and document sharing across the organization.
Cloud-Native
There’s a real ability to build most solutions in a Cloud-Native, Third-Party-Integrations manner, rarely spinning up something self-hosted (e.g., over GKE). A fully Serverless Approach, based on Cloud Functions and Cloud Run, is definitely welcomed.
Responsibilities:
Data Engineering
Working in a Pipeline-Driven Workflow where data transformations and integrations are validated through automated tests. All components (Airflow DAGs, dbt models, and supporting services) run in fully containerized environments with full debugging support; a minimal DAG sketch follows the responsibilities list below.
Day-to-day responsibilities:
Write style-guide-compliant (and complex) SQL queries over BigQuery.
Design, build, and maintain scalable, reliable Data Pipelines (batch and streaming).
Optimize data models and queries for performance and cost.
Own data models from raw to analytics-ready layers.
Implement automated Data Quality checks, monitoring, and alerting.
Translate business requirements into robust data and ML models.
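To make the workflow above concrete, here is a minimal sketch of an Airflow DAG that runs dbt transformations and then validates them, assuming Airflow 2.4+ and a containerized dbt project; the DAG id, schedule, and project path are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_claims_refresh",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",              # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    # Run dbt transformations (raw -> analytics-ready layers).
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt",  # hypothetical project path
    )

    # Validate the models; a failing test fails the task, which can feed
    # Airflow's usual alerting (e.g., an on-failure Slack callback).
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt",
    )

    dbt_run >> dbt_test
```

Keeping transformation and validation as separate tasks makes each run’s data quality status visible directly in the DAG view.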
Classical ML Engineering
In addition to core data engineering tasks, the role involves building and maintaining Data Pipelines that support ML systems and developing internal ML-based solutions for data quality and monitoring (a sketch of two of these techniques follows this list), including:
Data Clustering for exploratory Data Analysis (e.g., k-means, DBSCAN)
Data Interpolation and Imputation for handling missing or irregular data (e.g., forward/backward fill, linear or time-based interpolation, basic ML-based imputers such as kNN)
Anomaly Detection for Pipeline health, Data Volume, and Data Quality monitoring (e.g., statistical thresholds, z-score/IQR, Isolation Forest)
Data Drift Detection to monitor changes in Data Distributions and ensure Pipeline stability (e.g., PSI, KS-test, ML-based drift detection approaches)
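As a rough illustration of two of the techniques named above, here is a hedged sketch of a z-score check on daily row counts and a PSI computation between two snapshots of a numeric column; the threshold and bin count are illustrative assumptions, not prescribed values.

```python
import numpy as np

def volume_is_anomalous(daily_counts: np.ndarray, threshold: float = 3.0) -> bool:
    """Flag the latest daily row count if it sits more than `threshold`
    standard deviations from the historical mean (a simple z-score check)."""
    history, latest = daily_counts[:-1], daily_counts[-1]
    z = (latest - history.mean()) / max(history.std(), 1e-9)  # guard against zero std
    return abs(z) > threshold

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a new one,
    using quantile bins from the baseline (assumes enough distinct values)."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # cover values outside the baseline range
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

# A common rule of thumb: PSI < 0.1 is stable, 0.1-0.25 moderate drift,
# > 0.25 significant drift -- worth an alert on the affected pipeline.
```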
Required Experience:
Data Engineering & ML Engineering
6+ years in Data Engineering and Data Analytics
3+ years in Classical ML Engineering
Ability to work on ML initiatives without supervision, from R&D and Prototyping to Shipment, Deployment, and Productization
Upper-Intermediate English or higher; ability to present work and lead discussions with US-based teammates, customers, and stakeholders
Cloud-Native & GCP
At least 3 years of Cloud-Native Experience, preferably with GCP
Working in a Unix-like Development Environment (e.g., macOS, Linux)
We don’t consider Legacy-only Big Data Experts; knowing outdated technologies such as Teradata, Hadoop, or Spark over Hadoop isn’t enough.
Engineering and Fundamentals
Deep knowledge of Fundamentals: Mathematics, Statistics, Machine Learning, Algorithms, Data Structures, Data Architecture, Data Governance, etc.
Familiarity with Managed AI (e.g., Vertex AI) is a strong advantage
Experience with Generative AI is required, including LLMs (e.g., OpenAI’s GPT models, Gemini, Claude) and media-generation models (e.g., Seedream, GPT Image, Veo 3, Sora)
Strong knowledge of Python, with a focus on writing Pythonic Solutions; SonarSource tooling is a ready-to-use helper. It’s definitely possible to write some bits in Go or Scala where those languages are genuinely applicable, though the default language is Python
Expert-level SQL. Confident use of complex joins, CTEs, subqueries, and window functions. Ability to write readable, maintainable, and testable SQL (naming, structure, comments); see the sketch after this list
Strong knowledge of Data Architecture and Database Internals, including DDL, Clustering, Partitioning, Query Optimization, Data Modeling, SCD types, etc.
Lakehouse-first Data Engineering (BigQuery, Cloud Composer, dbt) and Decoupled Distributed Data Processing are always prioritized higher than running Imperative Solutions over GKE or Coupled Massively Parallel Processing Compute
Imperative Code Solutions – including Classical Algorithms and Data Structures – implemented over Dataflow or Spark are expected to come up only when the Lakehouse-first Approach isn’t applicable or is too costly
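To ground the SQL and BigQuery expectations above, here is a hedged sketch of a readable CTE-plus-window-function query executed through the official google-cloud-bigquery client; the project, dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

QUERY = """
-- Latest vitals reading per patient over the last 7 days,
-- from a (hypothetical) date-partitioned table.
WITH ranked AS (
    SELECT
        patient_id,
        measured_at,
        heart_rate,
        ROW_NUMBER() OVER (
            PARTITION BY patient_id
            ORDER BY measured_at DESC
        ) AS rn
    FROM `my-project.analytics.vitals`
    WHERE DATE(measured_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
)
SELECT patient_id, measured_at, heart_rate
FROM ranked
WHERE rn = 1
"""

for row in client.query(QUERY).result():
    print(row.patient_id, row.heart_rate)
```

Filtering on the partition column keeps the scan (and the bill) bounded, which is exactly the performance-and-cost optimization the responsibilities above call for.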