Job opening

Site Reliability Engineer - Nashua, NH USA

Episerver Engineering Operations is a rapidly growing part of the organization. We are building our teams, tools and systems as part of our mission to build the leading digital experience platform.

We enable Episerver to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance. We live the company values (Dependable, Collaborative and Simple) with a strong customer focus and a healthy sense of urgency. We are a heavily data-driven team, using a variety of data collection, enrichment, analytics and visualisation techniques to learn about our complex systems.

We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams. So, the options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on and bash out code to support the team, we have a spot for you too.

As an SRE in one of our teams, you will work to enhance the availability, performance and stability of Episerver services and automate away repetitive work.

You'll also respond to pings, pages and alerts to investigate issues in our products that you can really sink your teeth into. You'll be working on non-production and production environments, monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation.

The Role

  • Serve as a level 3 support resource for the systems you are responsible for
  • Troubleshoot and resolve end-user issues independently and efficiently
  • Build knowledge base around common production support issues.
  • Troubleshoot and fix the system when it breaks
  • Reduce the impact of errors.
  • Automate repetitive tasks.
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  • Automate incident handling to build “self-healing” systems.
  • Author and maintain documentation for related processes, procedures and system events
  • Identify areas of improvement within our systems and perform enhancements.
  • Share the responsibility of being on-call

Essential Requirements

  • Expert level troubleshooting skills across different levels of the stack.
  • Scripting and software development in one or more programming languages (PowerShell).
  • Deep understanding of Windows and .NET based systems.
  • Minimum of 2 years of hands-on experience with cloud infrastructure such as Azure or AWS.
  • Deep expertise in monitoring distributed systems and application architectures.
  • Exposure to and maintenance of configuration management and orchestration tools at scale (Azure Automation, Salt, Puppet, Chef etc.)
  • Diagnosing and troubleshooting user facing service outages.
  • Exposure to system and application level telemetry for large distributed cloud architectures.
  • Diagnosing and resolving problems in high-throughput web applications and network services.

We'd be super excited if you have experience with:

  • SQL Server in large environments.
  • Building, automating, and maintaining infrastructure in Azure
  • Experience monitoring cloud services with Application Insights, Log Analytics and/or New Relic.
  • Understanding of ITIL terminology for incident and problem management
  • Experience leading teams of engineers in service outage situations
  • Experience with container management and micro-services architectures such as Service Fabric or Docker.

Desirable Characteristics

The best person for this role is someone who has a collaborative spirit. It’s not about being a hero and having all the answers; it’s about sometimes saying "I don't know" and working on finding solutions rather than starting with an assumption. The team needs someone who can ask questions, learn from others and turn chaos into order.

This role would be a great fit for someone with creative and innovative problem-solving skills and a willingness to take responsibility all the way to production. You will help architect, configure and implement solutions that operate at scale, seeing your own technology efforts directly improve the reliability of our products. Our teams are empowered and expected to improve our products to truly deliver a reliable experience to customers. You will own tasks from planning to delivery to realise this goal and collaborate with different teams.

Apply Here

  1. Please send your resume to: USCareers@Episerver.com
  2. Please include Req#: ENGOPS-18-102

About Episerver

Episerver connects digital commerce and digital marketing to help organizations create unique digital experiences for their customers, with measurable business results. The Episerver Digital Experience Cloud™ combines content, commerce, multi-channel marketing, and predictive analytics in a single platform to work full-circle for businesses online – from intelligent real-time personalization and lead-generation through to conversion and repeat business – with unprecedented ease-of-use.

Sitting at the center of the digital experience ecosystem, Episerver empowers digital leaders to embrace disruptive, transformational strategies to deliver standout experiences for their customers – everywhere they engage.

Founded in 1994, Episerver has offices in Australia, Denmark, Finland, The Netherlands, Norway, Singapore, South Africa, Spain, Sweden, UAE, UK and the USA.

If you are interested in a pivotal role within a company that is charting new territory in a market undergoing phenomenal growth, then Episerver is the place for you!

You will have the opportunity to work with some fantastic brands, industry thought leaders and those shaping the digital experience landscape, while enjoying a flexible, collaborative and stimulating work environment that will keep you engaged.