Senior Performance and Capacity Engineer

Remote
Full Time
Experienced

About StackPath

StackPath is a cloud platform built at the internet’s edge, providing infrastructure and services physically closer to the source or destination of data than hyperscale cloud service providers. StackPath edge compute (including Virtual Machines and Containers) and edge applications (including CDN and WAF) are strategically located in the world’s most densely populated areas and united by a secure private network backbone and a single management system. Customers ranging from Fortune 50 enterprises to one-person startups trust StackPath to give their latency-sensitive workloads and applications the speed, security, and efficiency they require. For more information, visit stackpath.com and follow StackPath at www.fb.com/stackpathllc and www.twitter.com/stackpath.

About the Role

As a Senior Performance and Capacity Engineer, you will work closely with the Site Reliability Engineering team to provide accurate and insightful capacity projections for the senior management team at StackPath. This role is critical for maintaining the server and network resources needed to serve customers across the Edge Delivery and Edge Compute platforms. You will lead the effort to deliver accurate and timely capacity checks for new and growing customer deployments. You will also engage in performance troubleshooting to identify and remove live bottlenecks in the delivery environment.

This role will report to: VP Site Reliability Engineering

Essential Duties and Responsibilities

  • Handle complex enterprise issues, which often cross system, network, and software boundaries.
  • Design, develop and maintain internal service metrics (SLA, SLO, SLI) in cross-team collaborations.
  • Design, develop and maintain dashboards, tooling, alarms, and playbooks in collaboration with operations teams to support service-level objectives.
  • Design, develop and maintain reusable monitoring and canary infrastructure.
  • Design, execute and evaluate performance experiments.
  • Collaborate with operations and engineering teams in determining root cause of major incidents, performance anomalies, or other customer-impacting issues.
  • Discover and analyze system performance-related bottlenecks.
  • Discover and analyze anomalies and system issues, with the goal of identifying and mitigating their root causes.
  • Write ETLs to extract performance-related KPIs and present those KPIs in a systematic manner.
  • Perform capacity planning using regression-based machine learning models and other statistical methods when applicable.
  • Automate everyday repeatable tasks.
  • Model traffic growth and make server purchasing recommendations.
  • Develop enterprise client traffic flow modeling, distribution, and capacity checks.
  • Direct and participate in automating performance and capacity checks and identifying the need for capacity augmentation.

Desired Skills and Experience

    • High-level knowledge of Linux and operating systems.
    • High-level knowledge of WAN networking.
    • Scripting languages (Bash, Python, PHP, Perl).
    • Experience with Prometheus.
    • Experience with Grafana, Docker, GCP, Telegraf, and Tableau.
    • Database knowledge (MySQL, PostgreSQL, TimescaleDB, and others).
    • High-level understanding of statistics.
    • High-level understanding of machine learning.
    • Experience with traffic analysis tools (Catchpoint, Kentik, Cedexis...).
    • Experience with CI/CD/CM tools (Jenkins, Ansible, Puppet, Chef...).
    • Experience with virtualization (KVM, QEMU...).

This job description is not intended to be all-inclusive.

StackPath is an Equal Opportunity Employer. EOE/AA M/F/D/V

 

If your experience and qualifications match our current needs, a member of our human resources team will contact you. We look forward to hearing from you.

StackPath collects and processes personal data submitted by job applicants in accordance with our Privacy Policy.

