AI/ML Platform Operations
JPMorgan Chase (JPMC) is a leading global financial services firm with assets of $2 trillion and operations in more than 60 countries. Over the last few years, it has been on a transformation journey to become a client-centric, technology-driven company. With an annual tech budget of $10B+, it has begun investing significantly in next-generation core infrastructure, data, and AI technology.
As the next step in this investment, JPMC Silicon Valley is hiring top talent to join the newly formed AI engineering team. We execute like a startup and are building next-generation technology that combines JPMC's unique data and full-service advantage to develop high-impact AI applications and platforms for the financial services industry. We are looking for people who are excited about this opportunity.
The AI and ML Controls and Operations team is seeking operational support engineers who combine leadership and technical knowledge to operate and manage the AI and ML platform in private and public cloud. You'll work collaboratively with other functional teams to deliver reliable service. Operational support engineers are accountable for availability, release management, system maintenance and management, system hygiene, and customer support. The role requires creative and critical thinking skills to maintain application systems that are crucial to the reliable daily operations of the AI and ML platform. The candidate must be able to work in a global team setting and adapt to dynamic requirements.
· Proactively manage the AI and ML platform to provide a great customer experience by ensuring that the service is reliable and compliant
· Provide level 2 support to platform users, covering issue triage, product usage, and general inquiries
· Work with development teams throughout the software life cycle ensuring sustainable software testing and releases
· Develop and implement a robust release process
· Build tools and capabilities using modern technologies to detect issues and patterns, create transparency, and enable self-healing
· Drive application troubleshooting and maintenance, along with the identification, escalation, resolution, root cause analysis, and postmortem of issues.
· Liaise with application development and infrastructure teams on ongoing development, production activity, and release management
· Champion a DevOps model so that services are automated and elastic across all platforms
· Bachelor’s degree or equivalent experience in a software engineering discipline
· Hands-on working knowledge of managing applications and services in private and public cloud, preferably AWS
· Working knowledge of building and/or supporting AI and ML workloads
· Mastery of developing automated tools, systems, and services in multiple technology domains
· Advanced knowledge of one or more infrastructure components (e.g. networking, cloud services, orchestration tools, containerization, compute and storage systems)
· Deep understanding of SRE philosophy, technologies, platforms and tools, SLA management, incident resolution, and automation.
· Working knowledge of one or more general-purpose programming languages, plus an interest in learning other coding languages and skills as needed
· Working knowledge of the development toolset used to design, develop, test, deploy, maintain, and improve software
· Experience in a production support environment
· Experience with Splunk or other monitoring tools
· Good understanding of technology risk and controls
· Experience in engineering solutions for metrics gathering/publishing and event collection/correlation across distributed architectures, automation, monitoring, intelligent alerting, random fault injection (Chaos Engineering), and self-healing.
· Ability to work collaboratively in teams and develop meaningful relationships to achieve common goals