Social Links is a global provider of OSINT and Open Data technologies. We are developing a modular Open Intelligence Platform that aggregates hundreds of data sources and delivers intelligence through AI agents, pipelines, and customizable workflows.
Our infrastructure spans both legacy on-premises deployments and a new AWS-based cloud-native platform. To support this transformation, we’re looking for a Lead Site Reliability Engineer who will own the full lifecycle of system reliability, from process design to hands-on implementation.
As Lead SRE, you will:
- Take ownership of our current on-premises systems and stabilize them.
- Build modern SRE practices from the ground up.
- Drive our transition to the AWS cloud (architecture, tooling, observability).
- Manage and mentor a team of DevOps and SysOps engineers.
This role is ideal for someone who wants to own reliability architecture and be a key strategic contributor to how a high-impact, AI-powered platform evolves.
Key Responsibilities:
- Define and implement SRE practices: SLO/SLA management, incident response, postmortems, alerting policies.
- Lead the team responsible for:
  - On-prem infrastructure (Linux, VPNs, networking, firewalls, Zabbix).
  - DevOps and CI/CD workflows.
  - Platform observability (Prometheus, Grafana, Loki, Tempo).
- Architect and scale cloud-native infrastructure using AWS services: EC2, VPC, EKS, S3, IAM, CloudWatch, Route 53, etc.
- Oversee migration of services and systems from on-prem to cloud.
- Own logging, metrics, recovery processes, the disaster recovery plan (DRP), and secure runtime environments.
- Implement infrastructure automation and self-healing mechanisms.
- Build internal documentation, runbooks, and operational guidelines.
- Act as a mentor and lead the reliability culture across engineering.
What We’re Looking For:
- 5+ years in infrastructure/SRE/DevOps roles, 2+ years in technical leadership.
- Expert knowledge of Linux, Bash, system automation.
- Deep understanding of core networking: VPN, TCP/IP, DNS, routing, NAT, firewalls.
- Hands-on experience with on-prem operations and modernization.
- Experience with monitoring: Zabbix, Prometheus, Grafana.
- Proven experience with AWS (high priority): EC2, IAM, VPC, EKS, S3, CloudWatch.
- Strong skills in CI/CD tooling: GitHub Actions, GitLab CI, ArgoCD, Helm, Kustomize.
- Experience implementing SRE disciplines: SLOs, error budgets, incident management.
- Proficiency in writing clear documentation and infrastructure standards.
Nice to Have:
- Experience with OpenFaaS, Kubernetes, Terraform, Ansible.
- Familiarity with SOC 2, ISO 27001, and GDPR compliance practices.
- Python scripting for automation.
- Experience with Vault, OPA, RBAC, and Zero Trust architectures.
Why Join Us:
- A strategic role where you define infrastructure and reliability culture from the ground up.
- Full ownership of reliability, observability, and platform resilience.
- A growing, global, product-driven company with engineering at the center.
- Flexible remote environment with stock options and leadership visibility.
- A foundational role with a clear growth path toward Head of Infrastructure/SRE.
If you turn chaos into structure and systems into strategy, this role is for you.