Senior SRE

Position Overview

The SRE Lead manages the day-to-day operations of the SRE team and oversees the reliability, scalability, and performance of the infrastructure and services. The role also involves defining strategies for improving system reliability and ensuring the team adopts best practices in automation, incident response, and infrastructure management.

Key Responsibilities

1. System Reliability and Performance 

Maintain and enhance the reliability, uptime, and performance of production systems.

Monitor system health and proactively identify performance bottlenecks and areas for improvement.

Conduct root cause analysis (RCA) and contribute to post-incident reviews to prevent recurrence.

2. Incident Response and Operations 

Participate in on-call rotations and respond to incidents to minimize downtime.

Collaborate in incident management processes including triage, mitigation, documentation, and recovery.

Develop runbooks and automation scripts to streamline troubleshooting and recovery procedures.

3. Automation and Infrastructure Optimization 

Implement and maintain Infrastructure as Code (IaC) using tools such as Terraform, Ansible, or CloudFormation.

Improve CI/CD pipelines to ensure seamless, repeatable, and reliable deployments.

Automate operational tasks to reduce manual effort and increase efficiency.

Optimize cloud resource usage for performance and cost efficiency.

4. Monitoring and Observability 

Build and maintain comprehensive monitoring, alerting, and observability solutions (e.g., Prometheus, Grafana, ELK, Datadog).

Ensure meaningful alerts and actionable metrics are in place to detect and respond to system anomalies.

Collaborate with development teams to embed observability into new services from design to deployment.

5. Cross-Functional Collaboration 

Work closely with developers, QA, and DevOps to ensure system reliability is integrated into every phase of the software lifecycle.

Partner with stakeholders to support reliable deployments and continuous delivery.

Contribute to documentation, playbooks, and process improvements.

6. Continuous Improvement and Innovation 

Identify areas of improvement in existing systems, processes, and automation frameworks.

Research and implement emerging technologies that enhance system scalability, security, and resilience.

Participate in post-mortem reviews and reliability improvement initiatives.

7. Security and Compliance 

Apply security best practices to system configuration, monitoring, and access control.

Collaborate with security teams to maintain compliance with organizational and industry standards.

Assist in vulnerability management and ensure patches or mitigations are deployed in a timely manner.

8. Reporting and Metrics 

Track, analyze, and report on system performance, reliability, and incident trends.

Use metrics-driven insights to support reliability improvements and operational excellence initiatives.

Education & Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent experience.

Preferred certifications (optional):

AWS Cloud Engineer

AWS Machine Learning Ops Engineer

Collaboration

Passion for building scalable, reliable, and secure systems in a fast-paced environment.

Ability to translate complex technical concepts into clear, actionable insights for technical teams.

Strong interpersonal skills with the ability to work effectively across cross-functional teams.

Excellent problem-solving and analytical skills.

Our recruitment philosophy

We value self-awareness and strong communication skills in our recruitment process. We seek fiercely passionate people who understand themselves and their career goals, and who have the right skills and have made a conscious choice to join our field. The perfect fit? A trading and crypto enthusiast who’s driven and collaborative, acts with ownership, and delivers solid, scalable outcomes.
