Site Reliability Engineer


Location:

PL

Salary:


Type:

Full Time

Closing Date:


Ledgebrook is a tech-enabled E&S MGA on a mission to modernize specialty insurance. The industry is burdened with legacy technology and inefficient processes, preventing innovation at scale. We are changing that. Our goal is to become the best-in-class full-stack insurer and reinsurer, leveraging AI and data-driven insights to revolutionize underwriting, pricing, and risk selection.


We believe in talent density - fewer, better people working together as one. We win as a team, and our success is shared through generous equity packages for all employees.


About the Role

We’re seeking a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of Ledgebrook’s cloud-native infrastructure and applications. You’ll be at the forefront of operational excellence, driving automation, observability, and stability across our platforms.

This role combines infrastructure engineering, systems automation, and software development practices to proactively build resilient systems and reduce downtime, directly influencing our ability to scale rapidly and reliably.


What You’ll Work On

  • Reliability & Scalability – Architect and maintain highly available, scalable cloud infrastructure to ensure the seamless operation of Ledgebrook's production and internal environments.
  • Observability & Monitoring – Implement and optimize monitoring, logging, tracing, and alerting systems to proactively detect and resolve issues before they impact business operations.
  • Incident Response & Management – Lead and participate in incident response efforts, performing root cause analysis and implementing corrective actions and improvements to prevent recurrence.
  • Infrastructure as Code (IaC) – Automate infrastructure deployments and management using IaC tools (Terraform), ensuring consistency, repeatability, and security.
  • Continuous Integration & Delivery (CI/CD) – Develop and maintain robust CI/CD pipelines, accelerating software delivery while maintaining system stability and security.
  • Performance Optimization – Identify performance bottlenecks and work cross-functionally to optimize application and infrastructure performance.
  • Security & Compliance – Ensure infrastructure adheres to best practices for security, compliance, and disaster recovery planning, maintaining up-to-date documentation and procedures.


About You

Here at Ledgebrook we’re passionate about creating a team that thrives on continuous learning and shares our excitement about building a company from the ground up. We’re looking for people to join us with:


  • A passion for delivering world-class customer service to our internal and external customers
  • Intellectual curiosity and a strong desire for innovation, rather than following the status quo
  • A hunger for continuous learning and opportunities to grow
  • Agile prioritization skills coupled with a keen sense of urgency - we balance getting it right with getting it done right now
  • A strong drive and desire to win together as a high-performing team
  • A moral compass to "do the right thing, period." We have zero tolerance for toxic behaviors
  • An eagerness to actively participate and connect with the whole team, across remote locations
  • An honest, transparent communication style
  • A proactive, solution-oriented mindset. We don’t look for blame, we look for the solution


Tech Stack

  • Infrastructure & Cloud: AWS (ECS, EKS, Lambda, S3, RDS, CloudWatch, IAM)
  • IaC & Automation: Terraform, CloudFormation
  • CI/CD & Containerization: GitHub Actions, Docker, Elastic Container Service (ECS)
  • Observability & Monitoring: Datadog, Sentry
  • Languages & Scripting: Python, Bash, JavaScript/TypeScript


Must Haves

  • 5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure-focused engineering roles
  • Extensive hands-on experience with AWS infrastructure and services
  • Strong knowledge of container orchestration (Docker, Kubernetes, ECS)
  • Experience building and maintaining CI/CD pipelines
  • Proficiency in infrastructure automation (Terraform or CloudFormation)
  • Solid scripting/coding experience (Python, Bash, JavaScript/TypeScript)


Nice to Haves

  • Experience in insurance, fintech, or regulated industries
  • Familiarity with Socotra or other insurance-specific platforms
  • Background in managing production environments
  • Experience working in high-growth startups or fast-paced technology environments



Apply for Job