
DevOps Engineer
9 October 2025
We are hiring!
Job Title: Senior Site Reliability Engineer (SRE) / DevOps Engineer – AWS (80%) & Azure (20%)
Position Overview
We are seeking an experienced Senior Site Reliability Engineer (SRE) / DevOps Engineer to design, build, and operate resilient, scalable, and secure systems across multi-cloud environments. The role emphasizes AWS expertise (80%) with a strong Azure foundation (20%). You will lead initiatives in automation, observability, incident management, and release reliability, ensuring mission-critical applications run smoothly at enterprise scale.
Key Responsibilities
Cloud Infrastructure (AWS & Azure)
- Architect, implement, and manage highly available, fault-tolerant infrastructure.
- AWS (primary): EKS, ECS, Lambda, API Gateway, S3, RDS, DynamoDB, IAM, CloudWatch, CloudTrail, CloudFormation/Terraform.
- Azure (secondary): AKS, App Services, Azure Functions, Azure Monitor, Azure DevOps Pipelines.
- Implement best practices for multi-cloud security, networking, and DR/BCP.
SRE & Reliability Engineering
- Define and maintain SLIs, SLOs, and SLAs across distributed systems.
- Conduct capacity planning, fault-tolerance reviews, chaos engineering, and DR drills.
- Lead incident response, on-call rotations, and blameless postmortems.
- Continuously optimize performance, cost, and reliability.
Automation & Infrastructure as Code (IaC)
- Automate infrastructure provisioning with Terraform, Helm, Ansible, and GitOps workflows.
- Design and maintain CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
- Enforce policy-as-code and integrate security & compliance automation.
Observability, Monitoring & Telemetry
- Build comprehensive monitoring and observability solutions: CloudWatch, Prometheus, ELK/EFK, Datadog, Grafana, Splunk, New Relic.
- Implement centralized logging, distributed tracing, and OpenTelemetry standards.
- Enable proactive alerting, anomaly detection, and automated remediation.
Release & Incident Management
- Collaborate with DevOps and engineering teams to ensure reliable, safe, and repeatable releases.
- Implement blue/green, rolling, and canary deployment strategies.
- Drive root cause analysis (RCA), knowledge sharing, and preventive engineering.
- Establish incident playbooks and integrate with ITSM tools (ServiceNow, PagerDuty, Opsgenie).
Qualifications & Skills
- 7+ years in SRE / DevOps / Cloud engineering roles.
- Deep AWS expertise (80%) with working knowledge of Azure (20%).
- Strong proficiency with Kubernetes (EKS/AKS), containers, and microservices.
- Hands-on experience with Terraform, Helm, CI/CD platforms, and observability stacks.
- Solid foundation in networking, IAM, cloud security, and compliance (SOC2, HIPAA, NIST).
- Proven track record of handling high-severity incidents and driving RCA.
- Preferred certifications:
  - AWS Solutions Architect – Professional
  - Azure Solutions Architect Expert
  - Certified Kubernetes Administrator (CKA)
What We Offer
- Opportunity to work on cutting-edge multi-cloud platforms (AWS + Azure).
- Lead enterprise-scale reliability engineering initiatives.
- A culture of ownership, innovation, and continuous improvement.
- Exposure to automation-first, observability-driven operations.
E-mail:
recruitmentmd@eu.rsystems.com