About the project
We work within an ITIL-based model across a wide range of client industries, providing multiple services to different business lines. Through this project, you will gain experience working in a flexible, multicultural, and agile environment, and develop both technical and interpersonal skills.
Your team
We are a team of 10 colleagues working 24/7 in shifts. We are currently growing and looking forward to taking on more responsibilities, hence this new opening. The team members are vibrant, proactive, and always there to help you.
Job Description
- Optimize System Control: You will keep the central systems (z/OS) and their subsystems, as well as the server infrastructure (Linux, Unix, Windows), running smoothly by monitoring and controlling them in a 24/7 shift environment to maintain high availability.
- Propose Tailored Solutions: You will propose industry-specific observability solutions based on client requirements, aligning technical capabilities with business needs.
- Collaborate Across Teams: You will work closely with cross-functional teams to gather requirements and implement observability solutions. You will also contribute to team knowledge sharing and support the development of best practices across the organization.
- Streamline Event Management: You will independently manage event processing in alignment with support processes, making sure that all incidents are tracked and resolved in an organized manner to minimize downtime.
- Automate Operational Tasks: You will automate routine monitoring, integration, and reporting processes, driving efficiency and reducing manual workload.
- Lead Incident Reviews: You will lead incident review sessions and support root cause investigations, ensuring thorough analysis and continuous improvement in system reliability.
Qualifications
- Willingness to work a 24/7 rotating shift schedule (day and night shifts, on both weekdays and weekends).
- Strong proficiency with observability tools (Prometheus, Grafana, ELK, OpenTelemetry) for system monitoring and performance optimization, along with experience in event correlation, managed services, and performance reporting.
- Familiarity with automation scripting, CI/CD processes, and cloud platforms (e.g., AWS, Azure, GCP), combined with analytical, troubleshooting, and root cause analysis skills to support efficient incident resolution and system reliability.
- Solid understanding of ITIL/ITSM frameworks and technical documentation standards.
- Strong communication and consulting skills for effective collaboration with cross-functional teams.
- Good spoken and written English skills.