Principal Site Reliability Engineer

Employer: S&P Global
Location: Washington D.C., USA
Salary: Competitive
Closing date: Feb 6, 2023

Job Function: Other
Industry Sector: Finance - General
Employment Type: Full Time
Education: Bachelors

The Role: Principal SRE
Location: US Remote or On-site, EST/CST preferred
GL: 13 (for internal use only)

The Team: As part of the Cloud Engineering and Support team, this position supports the definition, development, and implementation of SRE principles within SPGI Market Intelligence. This involves heavy collaboration and a consulting mindset when engaging with operations, development, product management, and business stakeholders. The role focuses on building and implementing a framework that improves the security, availability, scalability, cost-effectiveness, and deployment speed of both new and existing solutions. It will be heavily involved in migrating on-prem solutions to the cloud and in establishing best practices for multi-cloud deployment and operations. The ideal candidate has a solid background in software engineering and operations, with significant experience in SRE technologies (observability, CI/CD, automation) and cloud solutions (AWS and GCP preferred). The candidate should have extensive experience developing forward-looking strategies that support long-term business initiatives.

Responsibilities:

You will be part of an early-stage team that is developing, implementing, and refining SRE principles and educating teams on them
Collaborate, discover, and analyze the current estate of systems, applications, processes, tools, teams, and solutions. Identify strengths, gaps, and areas for improvement in collaboration with TechOps, developer teams, and business stakeholders
Contribute to a set of patterns and blueprints for deploying cloud solutions in a secure, reliable, scalable, cost-effective and fast manner
Define and support the evolution of CI/CD capabilities as we progress to rapid, safe continuous deployment
Embed within TechOps to improve overall security, reliability, performance, scalability, and speed of deployment of platforms and solutions
Support TechOps in Incident Management, primarily through post-mortems, RCA, and post-incident improvements
Support the design, development and deployment of services through activities such as collaborating with developers and architects on system design, reuse of blueprints and frameworks, capacity planning, and readiness reviews
Work across divisional and corporate teams to align strategies, blueprints, and solutions
Collaborate with teams to define, monitor and measure SLIs, SLOs, and SLAs for services, infrastructure, and processes running in production
Support the definition, implementation and refinement of an observability strategy and framework
Work with teams to eliminate toil through automation of infrastructure provisioning, configuration management, deployment, testing, and operation
Work with security teams and developers to shift security left, embedding security design principles and capabilities early in the development process
Maintain an excellent understanding of the business's long-term goals and strategy, ensuring that design, architecture, scale, and availability are aligned with these goals
Research and experiment with emerging technologies and tools related to performance, availability, observability, CI/CD, service design and consumption, micro-services, and other SRE-related technologies
Collaborate to promote and reinforce disciplined production software engineering processes and best practices

What We're Looking For:
Qualifications

Strong background in at least one of the following programming languages: Scala (or Java), Go, or Python
Ability to conceptualize and articulate ideas clearly and concisely
Entrepreneurial experience where you helped lead the creation of a new product & organization
BA/BS or Master's in Computer Science, Math, Physics, or another technical field
Experience with datasets of at least 10 TB, ideally up to multiple petabytes

Experience with:

Working within large organizations driving adoption of SRE principles across global, federated teams
Establishing and/or working for new teams within large organizations
Working with developers and operations to drive improvements within availability, scalability, security, performance, and deployment speed
Working in Agile/Kanban
Standard software engineering methodologies (unit testing, code reviews, design documents, continuous delivery)
Traffic management and networking concepts
Finding problems and writing code to fix them

Mastery of technologies and concepts, including:

Observability solutions like Datadog, New Relic, Prometheus, Grafana, ELK, OpenTelemetry
Containerization concepts and systems like Kubernetes
CI/CD solutions like GitLab, ArgoCD
IaC solutions like Terraform, AWS CDK, Pulumi
Stream-based data systems (Kafka, AWS Kinesis, GCP Pub/Sub, etc.), particularly under varying load
Cloud platforms (AWS, GCP)

S&P Global states that the anticipated base salary range for this position is $107,100 to $212,918. Base salary ranges may vary by geographic location.
In addition to base compensation, this role is eligible for an annual incentive plan.

This role is eligible to receive additional S&P Global benefits. For more information on the benefits we provide to our employees, visit S&P Benefits.

At S&P Global Market Intelligence, we know that not all information is important; some of it is vital: accurate, deep, and insightful. We integrate financial and industry data, research, and news into tools that help track performance, generate alpha, identify investment ideas, understand competitive and industry dynamics, perform valuations, and assess credit risk. Investment professionals, government agencies, corporations, and universities around the world can gain the intelligence essential to making business and financial decisions with conviction.

S&P Global Market Intelligence is a division of S&P Global (NYSE: SPGI), which provides essential intelligence for individuals, companies and governments to make decisions with confidence. For more information, visit www.spglobal.com/marketintelligence.

-----------------------------------------------------------

Equal Opportunity Employer
S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment.

If you need an accommodation during the application process due to a disability, please send an email to: EEO.Compliance@spglobal.com and your request will be forwarded to the appropriate person.

US Candidates Only: The EEO is the Law Poster http://www.dol.gov/ofccp/regs/compliance/posters/pdf/eeopost.pdf describes discrimination protections under federal law.

-----------------------------------------------------------

IFTECH202.2 - Middle Professional Tier II (EEO Job Group)

Job ID: 279249
Posted On: 2023-01-30
Location: Virtual, Washington, United States
