Site Reliability Engineer (SRE) at Renmoney
At Renmoney, we believe finance should be simple, useful and accessible to everyone. That's what makes us really passionate about leveraging data-driven insights to help us understand you better and build useful financial products for your personal and business needs – like convenient loans to help you do more today, savings to keep you on track for your goals and investments that'll generate more money for you.
The Role
The Site Reliability Engineer (SRE) is responsible for ensuring the availability, reliability, scalability, and performance of business-critical applications and infrastructure. The role combines software engineering and operations expertise to automate processes, improve platform stability, and enhance system observability.
What You Will Do
Design, implement, and maintain highly available and scalable infrastructure.
Monitor production systems and proactively identify performance bottlenecks.
Manage incident response, root cause analysis (RCA), and problem management activities.
Develop automation scripts and tools to improve operational efficiency.
Implement and maintain CI/CD pipelines.
Manage cloud infrastructure across AWS and hybrid environments.
Configure and maintain observability platforms including monitoring, logging, and alerting solutions.
Define and track SLIs, SLOs, and error budgets.
Support application deployments and release management processes.
Collaborate with Engineering, Security, Data, and Product teams to improve system reliability.
Perform capacity planning and disaster recovery testing.
Ensure infrastructure and systems comply with security and regulatory requirements.
Requirements
What You Bring
Education
Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.
Experience
4–7 years of experience in Site Reliability Engineering, DevOps, Cloud Engineering, or Infrastructure Operations.
Experience supporting mission-critical financial services or fintech platforms is an advantage.
Technical Skills
Strong knowledge of AWS services (EC2, ECS/EKS, RDS, Lambda, VPC, IAM, CloudWatch).
Experience with Infrastructure as Code (Terraform, CloudFormation).
Knowledge of containerization technologies (Docker, Kubernetes).
Experience with CI/CD tools (GitHub Actions, GitLab CI/CD, Jenkins, Azure DevOps).
Experience with monitoring tools such as Datadog, Prometheus, Grafana, New Relic, or ELK Stack.
Strong Linux administration skills.
Experience with scripting languages (Python, Bash, PowerShell).
Understanding of networking, DNS, load balancing, VPNs, and security controls.
Preferred Certifications
AWS Certified Solutions Architect.
AWS SysOps Administrator.
Kubernetes Certifications (CKA/CKAD).
HashiCorp Terraform Associate.
Key Competencies
Problem-solving and analytical thinking.
Incident management and troubleshooting.
Automation mindset.
Strong communication and collaboration.
Attention to detail.
This Role Is Ideal For You If
You enjoy solving complex infrastructure and reliability challenges.
You are passionate about automation and reducing operational overhead.
You thrive in highly available, customer-facing environments where uptime matters.
You enjoy working across Engineering, Security, Data, and Product teams to improve system performance.
You are proactive and constantly seek opportunities to improve reliability, scalability, and efficiency.
You May Not Enjoy This Role If
You prefer manual processes over automation.
You are uncomfortable responding to production incidents and troubleshooting critical issues.
You prefer working in isolated environments with limited collaboration.
You are not interested in continuous learning or in keeping up with evolving cloud technologies.
