DevOps Engineer

Computaris

October 9, 2025

Chișinău

Over 5 years

Full-time

Any level of education

On-site at the employer's location

We are hiring!

Job Title: Senior Site Reliability Engineer (SRE) / DevOps Engineer – AWS (80%) & Azure (20%)

Position Overview

We are seeking an experienced Senior Site Reliability Engineer (SRE) / DevOps Engineer to design, build, and operate resilient, scalable, and secure systems across multi-cloud environments. The role emphasizes AWS expertise (80%) with a strong Azure foundation (20%). You will lead initiatives in automation, observability, incident management, and release reliability, ensuring that mission-critical applications run smoothly at enterprise scale.

Key Responsibilities

Cloud Infrastructure (AWS & Azure)

Architect, implement, and manage highly available, fault-tolerant infrastructure.
AWS (primary): EKS, ECS, Lambda, API Gateway, S3, RDS, DynamoDB, IAM, CloudWatch, CloudTrail, CloudFormation/Terraform.
Azure (secondary): AKS, App Services, Azure Functions, Azure Monitor, Azure DevOps Pipelines.
Implement best practices for multi-cloud security, networking, and DR/BCP.

SRE & Reliability Engineering

Define and maintain SLIs, SLOs, and SLAs across distributed systems.
Conduct capacity planning, fault-tolerance reviews, chaos engineering, and DR drills.
Lead incident response, on-call rotations, and blameless postmortems.
Continuously optimize performance, cost, and reliability.

Automation & Infrastructure as Code (IaC)

Automate infrastructure provisioning with Terraform, Helm, Ansible, and GitOps workflows.
Design and maintain CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
Enforce policy-as-code and integrate security & compliance automation.

Observability, Monitoring & Telemetry

Build comprehensive monitoring and observability solutions: CloudWatch, Prometheus, ELK/EFK, Datadog, Grafana, Splunk, New Relic.
Implement centralized logging, distributed tracing, and OpenTelemetry standards.
Enable proactive alerting, anomaly detection, and automated remediation.

Release & Incident Management

Collaborate with DevOps and engineering teams to ensure reliable, safe, and repeatable releases.
Implement blue/green, rolling, and canary deployment strategies.
Drive root cause analysis (RCA), knowledge sharing, and preventive engineering.
Establish incident playbooks and integrate with ITSM tools (ServiceNow, PagerDuty, Opsgenie).

Qualifications & Skills

7+ years in SRE / DevOps / Cloud engineering roles.
Deep AWS expertise (80%) with working knowledge of Azure (20%).
Strong proficiency with Kubernetes (EKS/AKS), containers, and microservices.
Hands-on experience with Terraform, Helm, CI/CD platforms, and observability stacks.
Solid foundation in networking, IAM, cloud security, and compliance (SOC 2, HIPAA, NIST).
Proven track record of handling high-severity incidents and driving RCA.
Preferred Certifications: