Afero provides a complete, end-to-end platform for the rapid development and deployment of secure, performant, cost-effective, and easy-to-use consumer IoT devices—from major appliances to high-volume commodity hardware to hobby projects and prototypes.
The beating heart of it all is the Afero Cloud, and we’re looking for a stellar Site Reliability Engineer to apply their passion for automation, measurement, and traceability as a key member of our Cloud Engineering team. The ideal candidate will play a central role in expanding our cloud services across multiple providers and regions, and will help to shape processes and procedures for the entire cloud operations lifecycle—from deployment to monitoring to triage to postmortem—continuously improving reliability and reducing toil.
- Work closely with our service engineers to maintain, improve, and scale our existing infrastructure.
- Architect and deploy new infrastructure in different regions as part of our global scaling efforts.
- Work with engineering and product teams to formulate SLIs and associated SLOs according to company and customer needs, and continuously improve monitoring in pursuit of these.
- Develop and improve automated procedures to increase platform resilience.
- Participate in an on-call rotation for our production services.
Our Ideal Candidates
- Enjoy deep dives across different technology stacks to understand end-to-end system workflows.
- Can adapt past experiences to new requirements and develop solutions accordingly.
- Are at home in a collaborative environment and take pride in their ability to communicate ideas clearly among diverse audiences.
- Appreciate the value of clean, testable code, demonstrating both high standards and flexibility.
- Have extensive working knowledge of cloud-based services, including caching, database architecture, API design, and queueing services, along with extensive experience with real-time data processing.
- Extensive experience (5+ years) in either systems engineering and administration, or software engineering, with a passion for and exposure to both.
- Fluency in at least one scripting language (Python, Ruby) and one systems programming language (Java, C/C++, Rust, etc.).
- Experience building and operating production cloud environments with Amazon Web Services.
- Working knowledge of and experience with Elastic Beanstalk and CloudFormation.
- Working knowledge of AWS DynamoDB, Google BigQuery, AWS Simple Queue Service (SQS), and the AWS S3 and Google Cloud Storage APIs.
- Thorough understanding of networking protocols and technologies such as TCP/IP, HTTP, DNS, IPSec, and VPN.
- Working knowledge of and experience with AWS VPC Network Design, routing, and regional deployment.
- Experience managing AWS IAM roles and permissions.
- Experience setting up, configuring, and maintaining monitoring tools and applications such as New Relic.
- Working knowledge of and experience with additional cloud infrastructures such as GCP and Azure.
- Working knowledge of and experience with Google Compute Engine or Kubernetes Container Management/Deployment.