About The Position
Elastifile helps organizations accelerate business in the cloud era by augmenting cloud capabilities with enterprise-grade, scale-out file storage. Enterprises can leverage cloud resources without refactoring applications, at a fraction of typical in-cloud file storage costs. A Gartner Cool Vendor, Elastifile is available on AWS, GCP, on-premises, and soon Azure. Based out of Silicon Valley and Israel, we are backed by world-class VC firms LightSpeed Venture Partners and Battery Ventures, and strategic investors such as Cisco, Dell/EMC and Western Digital.
This role is a Customer Reliability Engineer position within our growing CSOPS team. In this role, you will place customer satisfaction at the center of everything you do. You will act as a liaison between our Development Operations teams and our support teams to assist in escalations. While this role is similar to a Site Reliability Engineer (SRE), you will have more opportunities to tackle critical end-user experience concerns while improving the reliability and serviceability of the product.
You will be a key member of this team, whose mission is to ensure that Elastifile's customers are supported and that key services have reliability and uptime appropriate to users' needs and a fast rate of improvement, all while keeping an ever-watchful eye on capacity and performance.
We are approaching customer service in an entirely new fashion, and we are looking for talented, motivated, dedicated individuals to help us define what the future of customer support should be.
Duties & Responsibilities:
- Work directly with customers to provide knowledge, insight, and support for day-to-day operations, POCs, and new customer onboarding in GCP, AWS, and Azure
- Act as an escalation point for customers
- Leverage the team’s deep knowledge and experience to triage and escalate issues discovered through monitoring or reported directly by customers
- Track and communicate the status of issues, both externally to customers and internally to peers and stakeholders
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health
- Practice sustainable incident response and blameless postmortems
- Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation and refinement
- Assist in continuous improvement in automation, monitoring and testing
- Evolve and manage the alerting and monitoring infrastructure, including responding to alerts
In general, you’re likely the right candidate if:
- You are customer focused
- You know Linux
- You are familiar with REST APIs, JSON, Python, and standard DevOps tools, including automation (Terraform, CloudFormation, Jenkins)
- You are driven to eliminate work through automation
- You have experience in a service or SaaS environment
- You know how to diagnose and troubleshoot complex problems in cloud environments
- BS degree in Computer Science or related technical field or equivalent experience.
- Experience in one or more of the following: Python, Go, or Ruby.
- Must be able to travel domestically and internationally.
- Interest in analyzing and troubleshooting large-scale distributed systems.
- Experience with containers & orchestration including Docker, Kubernetes/GKE
- Experience supporting cloud platforms including GCP, AWS, and Azure.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Ability to debug and optimize code and automate routine tasks.
- Understanding of enterprise IT applications, especially as they relate to primary storage infrastructure and, specifically, file services
- Experience with storage and file systems
- Experience working in a globally distributed team
- Experience with source repos including Git
- Experience with monitoring platforms including Datadog, Stackdriver, Prometheus, and Grafana.