Site Reliability Engineer (SRE)

Department

Location

Job type

Job Description

We are looking for a motivated junior Site Reliability Engineer (SRE) to join our infrastructure and reliability team. In this role, you will help ensure the reliability, availability, and performance of our systems while learning best practices in automation, monitoring, and incident management. You will work closely with senior SREs and software engineers to support production systems and improve operational efficiency.

This role is ideal for candidates with a strong interest in cloud infrastructure, automation, and reliability engineering who are looking to grow into a full‑fledged SRE.

Responsibilities

1. System Reliability & Operations

Assist in maintaining the availability, performance, and reliability of production systems
Monitor system health using dashboards, alerts, and logs
Perform routine operational tasks such as system checks, backups, and deployments
Participate in on-call rotation with guidance from senior team members

2. Incident Management

Respond to system alerts and incidents following established runbooks
Assist in troubleshooting and resolving production issues
Support post-incident reviews (postmortems) and help document lessons learned

3. Automation & Tooling

Leverage automation and AI-driven tools to design, implement, and maintain automated solutions for deployment, monitoring, scaling, and maintenance, enabling faster issue resolution and higher productivity
Help develop and maintain scripts and tools to automate repetitive operational tasks
Assist in improving CI/CD pipelines and deployment processes
Contribute to infrastructure automation using Infrastructure as Code (IaC) tools

4. Collaboration & Documentation

Work closely with software engineers to improve system design and operability
Maintain clear documentation for systems, procedures, and runbooks

Requirements

Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related field (or equivalent practical experience)
Basic understanding of Linux/Unix systems and networking fundamentals
Familiarity with at least one programming or scripting language (e.g. Python, Bash, Go)
Basic knowledge of cloud platforms (AWS, GCP, or Azure)
Understanding of version control systems (e.g. Git)
Strong problem-solving skills and eagerness to learn
Basic understanding of AI products and concepts (e.g. capabilities and limitations of LLMs, common business use cases)
Communication and stakeholder management skills, with the ability to collaborate across diverse teams
Exposure to containers and orchestration tools (Docker, Kubernetes)
Experience with monitoring and observability tools (e.g. Prometheus, Grafana)

©2026 AXS Pte Ltd. Company Registration No. 199405882Z