Senior Site Reliability Engineer

Software Country (ТОО Балхаш Системс)

Job description

We have 30 years of expertise in designing and building custom software systems. We provide software development services focusing on complex high-load applications, AI and BI solutions, and mobile apps.

Our client is a Luxembourg-based company specializing in knowledge assessment systems, with expertise in several areas, including academia (universities and schools).

As a DevOps Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, scalability, and performance of our systems. You will bridge the gap between development and operations by applying software engineering principles to infrastructure and operations problems. Your role will focus on automation, incident response, monitoring, capacity planning, and improving system resilience while supporting production workloads on Google Cloud Platform (GCP).

Responsibilities:

  • Design, implement, and maintain highly available, scalable, and resilient cloud-based infrastructure using Google Cloud Platform (GCP).
  • Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
  • Conduct capacity planning, performance tuning, and load testing to optimize system performance.
  • Develop chaos engineering practices to identify and mitigate failure scenarios.
  • Develop and maintain Infrastructure as Code (IaC) using Terraform, Ansible, or equivalent tools.
  • Automate system provisioning, configuration management, and deployments using CI/CD pipelines and GitOps workflows (ArgoCD, GitHub Actions).
  • Improve auto-healing and self-recovery capabilities in production environments.
  • Monitor system health and performance using Google Cloud Operations Suite (formerly Stackdriver), Prometheus, Dynatrace, Grafana, and Datadog.
  • Participate in the on-call rotation; troubleshoot and resolve production incidents using root cause analysis (RCA).
  • Implement postmortem processes and drive corrective actions to prevent recurrence.
  • Implement and enforce security best practices, ensuring compliance with ISO 27001, SOC 2, and GDPR.
  • Apply IAM (Identity & Access Management) best practices for secure cloud operations.
  • Manage network security, including firewalls, VPNs, and service mesh (e.g., Istio).
  • Work closely with development, security, and operations teams to improve deployment strategies.
  • Advocate for blameless postmortems, knowledge sharing, and documentation improvements.
  • Lead the adoption of SRE best practices, including error budgets and toil reduction.

Required experience and skills:

  • 3+ years of experience in a DevOps, SRE, or Cloud Engineering role.
  • Strong expertise in Google Cloud Platform (GCP) services, including GKE, Cloud Run, Cloud Functions, Cloud SQL, BigQuery, and Pub/Sub.
  • Experience with Kubernetes (GKE) and container orchestration.
  • Proficiency in Terraform, Helm, and Kubernetes operators for infrastructure automation.
  • Strong scripting and automation skills in Python, Bash, or Go.
  • Experience with monitoring, logging, and tracing tools (e.g., Google Cloud Operations Suite, Prometheus, OpenTelemetry).
  • Strong understanding of CI/CD pipelines using tools like ArgoCD, Jenkins, or GitHub Actions.
  • Knowledge of GitOps methodologies and IaC best practices.
  • Strong experience with PostgreSQL, Redis, and NoSQL databases.
  • Strong problem-solving and critical-thinking skills.
  • Ability to work collaboratively in a fast-paced environment.
  • Strong communication and documentation skills.
  • Ability to manage incidents under pressure and work on call as needed.
  • Experience with multi-cloud (AWS/GCP) and hybrid environments.
  • Knowledge of site reliability engineering principles (Google SRE).
  • Understanding of security best practices for cloud-native applications.
  • Google Cloud Certification (Professional Cloud DevOps Engineer, Professional Cloud Architect) is a plus.

Our offer as your future employer:

  • Full-time job with a flexible work schedule.
  • Option to work remotely.
  • Opportunities for professional growth.

Skills
  • Kubernetes
  • Terraform
  • ArgoCD
  • DevOps
  • Google Cloud
  • English