Site Reliability Engineer at Cowrywise
Cowrywise is a fintech company democratizing access to premium financial services by making them affordable for the mass market. We’re looking for a Site Reliability Engineer (SRE) to help build, maintain, and scale the infrastructure powering Cowrywise. You’ll work closely with our engineering team to improve reliability, observability, security, and deployment processes across our systems.
Our infrastructure team specializes in four areas: Cloud, Databases, Platform, and Observability. We run primarily on AWS with some workloads on GCP. For this role, we're particularly interested in someone who can raise the bar on observability, helping us detect issues faster and resolve them with confidence.
What you’ll do
Generally, members of the infrastructure team are able to do the following:
Design, maintain, and improve cloud infrastructure and internal platforms
Improve system reliability, scalability, and performance across services
Build and maintain CI/CD pipelines and deployment workflows
Implement monitoring, logging, alerting, and observability systems
Respond to incidents, troubleshoot production issues, and lead root cause analysis
Automate operational tasks and infrastructure provisioning
Work with engineering teams to improve service architecture and operational readiness
Improve security posture, access controls, and infrastructure best practices
Manage containerized workloads and orchestration platforms
Maintain disaster recovery, backup, and high availability strategies
What we’re looking for
Required
4+ years of experience in an SRE, DevOps, or Platform Engineering role running production systems
Strong hands-on experience with AWS (compute, networking, IAM, storage, managed services)
Deep expertise in observability: designing meaningful metrics, dashboards, alerts, and SLOs that catch problems before users do
Hands-on experience with New Relic, Grafana, and Prometheus (or equivalent tooling)
A track record of reducing MTTD and MTTR through better instrumentation, alerting, and incident response practices
Proficiency with Docker and containerized workflows
Solid scripting and automation skills (Python, Bash, Go, or similar)
Experience with infrastructure-as-code (Terraform, Pulumi, or CloudFormation)
Strong Linux fundamentals and networking knowledge
Experience building and maintaining CI/CD pipelines
Comfort leading incident response and writing clear post-mortems
Nice to have
Experience operating Kubernetes in production
Exposure to GCP or multi-cloud environments
Background in one of our specialization areas: Databases (Postgres, MySQL, Redis), Platform engineering, or Cloud architecture
Security-focused experience (IAM hardening, secrets management, compliance frameworks)
Experience in fintech or other regulated, high-availability environments
The people who succeed on this team
People who are proactive and take ownership
Engineers who automate before repeating manual work
People who stay calm and methodical during incidents
Engineers who care about clean systems and operational excellence
Strong collaborators who work well across teams
Curious builders who enjoy learning and improving systems continuously
