Regional Site Reliability Engineer (SRE)

AI-generated summary

This role is a Regional Site Reliability Engineer (SRE). You might like it because you will keep online services running smoothly, troubleshoot production issues, and work with technologies such as AWS and containers to keep systems reliable and fast.

Undisclosed

Glenmarie, Selangor

Job Description

 Job Responsibilities: 

  • Ensure high availability and performance of production services across multiple regions
  • Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets
  • Lead incident response and root cause analysis (RCA) for production issues
  • Improve system resilience through fault tolerance, redundancy, and graceful degradation
  • Operate and optimize containerized services running on AWS ECS
  • Manage cloud infrastructure including:
      ◦ AWS ECS
      ◦ Application Load Balancers
      ◦ Auto Scaling
      ◦ CloudWatch
      ◦ VPC networking
  • Ensure reliable deployment pipelines and infrastructure consistency
  • Support and optimize Go and Node.js microservices
  • Improve service performance, scalability, and fault tolerance
  • Implement health checks, circuit breakers, and retry strategies
  • Collaborate with development teams to improve service architecture
  • Implement and maintain observability systems including:
      ◦ Metrics
      ◦ Logging
      ◦ Distributed tracing
  • Build dashboards and alerts to detect system issues early
  • Improve monitoring using tools such as:
      ◦ Prometheus / Grafana
      ◦ AWS CloudWatch
      ◦ OpenTelemetry
  • Build and maintain CI/CD pipelines for microservices
  • Automate infrastructure and operational tasks using:
      ◦ Infrastructure as Code (Terraform / CloudFormation)
      ◦ Scripts or internal tooling
  • Improve deployment reliability and reduce manual intervention
  • Participate in on-call rotations
  • Drive blameless postmortems
  • Implement preventive actions to eliminate recurring incidents
  • Continuously improve operational runbooks and response processes

Job Requirements

 Qualifications & Experience

  • 4+ years of experience in Site Reliability Engineering, DevOps, or Production Engineering
  • Experience supporting distributed microservices architecture
  • Experience operating high-traffic production systems

Technical Skills:

Cloud & Infrastructure Management

  • Primary Platform: Extensive experience with AWS ecosystem management.
  • Compute & Orchestration: Hands-on expertise in AWS ECS (Fargate & EC2 launch types) and Docker containerization.
  • Networking: Proficient in VPC configuration, Application Load Balancers (ALB), and Network Load Balancers (NLB).
  • Scaling: Experienced in managing Auto Scaling to maintain high-traffic production environments.

Programming & Automation

  • Application Support: Proven ability to support and optimize distributed microservices written in Go (Golang) and Node.js.
  • Automation & Scripting: Skilled in developing internal tools and automation scripts using Go, Python, and Bash.
  • Infrastructure as Code (IaC): Experienced in automating infrastructure using Terraform or CloudFormation.

Additional:

  • Experience managing multi-region infrastructure
  • Experience operating high-scale microservices systems
  • Knowledge of service mesh architectures
  • Experience with AWS ECR, Lambda, or event-driven architecture
  • Experience with cost optimization in AWS

Skills

AWS CloudFormation
Cloud Infrastructure
AWS DevOps
Site Reliability Engineering

Company Benefits

Employee Discount

Enjoy employee discounts on beverages, merchandise, etc. at all outlets across Malaysia.

Employee Perk Programmes

Corporate benefit programmes offer exclusive discounts and perks to every employee.

Health and Wellness

Outpatient and inpatient care are covered for all employees, along with ongoing wellness programmes and activities.

Career Development

Job training and continuing education help to fuel employee career growth.

Extended Leave Benefits

We are generous with leave days and offer more than 6 other types of leave!


Additional Info

Career Level

Senior Executive

Company Profile

ZUS COFFEE

For many, coffee is a daily need. Specialty coffee, however, is often seen as a luxury, something you treat yourself to only on special occasions. We started ZUS Coffee to change this perception. With the best quality ingredients, high-level coffee brewing technology and an innovative business model, we’re evolving the concept of coffee consumption to make specialty coffee affordable for everyone, every day. a...