Contact: surminus.com
GitHub: surminus
Mail: work@surminus.com
I’m an experienced operations engineer with a focus on automation and tooling. Most of my career has been spent working with Linux systems, and I cut my teeth working in customer-facing support.
I enjoy writing Go and Ruby. I have a lot of experience with continuous integration (CI) and deployment systems, and have a strong interest in using automation to solve problems. I particularly enjoy making the lives of my colleagues better with intuitive tools.
Much of my work in the last few years has been in an AWS environment. I feel comfortable working within the huge Amazon ecosystem and using the AWS SDK.
October 2021–present: Site Reliability Engineer for Ably
I work in the Infrastructure team to help scale the global platform as Ably continues to grow rapidly.
Ably is a distributed system that runs thousands of servers across multiple regions in AWS. The Infrastructure team is responsible for maintaining and ensuring uptime of the system, as well as providing tooling and support for the engineering teams, and leading on improvements to legacy parts of the system.
July 2021–October 2021: Systems Development Engineer for Amazon Web Services
I briefly worked at Amazon, but chose to leave because it didn't feel like the right fit for me.
March 2018–June 2021: Site Reliability Engineer for FutureLearn
As the sole SRE at FutureLearn, I was responsible for improving the development workflow, the CI and deployment methods, and the core infrastructure of the site.
Achievements in this role:
- Gave regular technical lightning talks and interactive workshops so that software engineers, technical leads, and technical architects were also able to support and improve the platform.
- Sped up the overall build for the main application by cutting test times from ~15 minutes to ~5 minutes, greatly shortening the development feedback loop.
- Built, configured, and migrated to a new CI system to replace the out-of-date system previously in use.
- Unified all infrastructure code into Terraform and replaced manual provisioning tasks with automated instance bootstrapping. This saved operational support cost and reduced the complexity and number of tools involved in making changes to the stack.
- Wrote a command-line tool in Go for developers to ease deployments and allow working with SSH, AWS, CI, Datadog, Docker, and more.
- Upgraded the servers from Ubuntu 14.04 to Ubuntu 18.04.
- Modernised the platform by migrating from EC2-based deployments to Amazon ECS, focusing on scalability and simplicity while lowering the cost of ownership by moving to higher-value managed services.
- Scaled the platform to handle an unexpected spike in traffic during COVID-19.
January 2017–March 2018: Senior Web Operations Engineer for the Government Digital Service
As technical lead for GOV.UK Infrastructure, I led a team of web operations engineers and software engineers, and worked alongside the delivery and product managers in moving GOV.UK toward a modern infrastructure. GOV.UK is made up of over 50 microservices, mostly using Ruby on Rails, and the platform includes MySQL, PostgreSQL, MongoDB, Elasticsearch, Jenkins (deployment and CI), Varnish, NGINX, and Redis. It is a critical national resource, and so it is essential that the general public are able to reliably access content published on the site.
Achievements in this role:
- Enabled software engineers to deploy code as quickly and safely as possible on the current platform, while maintaining and improving the infrastructure and increasing automation and self-healing. This allowed my team to concentrate on improvements rather than maintenance.
- Collaborated with other technical leads, delivery managers, and product managers to ensure we were targeting the right goals, and provided guidance and advice based on my experience.
- Worked with technical architects to provide the vision for the future of the platform, which included highly available, ephemeral, and dynamic infrastructure and a transition from self-hosted to cloud native services.
May 2014–January 2017: Web Operations Engineer for the Government Digital Service
This position involved maintaining and supporting the ongoing needs of the evolving GOV.UK infrastructure. As part of this, I provided out-of-hours and in-office second-line support.
Achievements in this role:
- Worked in a multi-disciplinary team, responding to both technical and business needs.
- Made substantial contributions and improvements to our Puppet code.
- Planned and contributed to a number of improvements to functions such as monitoring, logging, CI, deployment, backups, and disaster recovery.
June 2008–May 2014: Linux Systems Administrator for Pulsant (formerly DediPower Managed Hosting)
I joined DediPower as an intern on work experience, and was then hired full time as part of the systems administration team. I worked with customers, other members of the support team, and other parts of the business including the networking, sales, datacentre, and provisioning teams. Our services were primarily Linux-based, but we also supported a wide range of systems including Windows, VMware, Cisco firewalls, and server hardware.
Achievements in this role:
- Worked my way up from the lowest to the highest level of support.
- Supported all customers with extensive on-call coverage, often responding to and fixing issues for customers we’d had very little experience with previously, and acted as a point of contact for larger and more complex systems.
- Provided technical guidance and ensured efficient knowledge sharing with new hires during a period of rapid growth from a dozen to over 60 people.
- As an intern, helped to build one of our datacentres, including the initial power cabling, air conditioning implementation, networking, and full rack placement.