AI/ML Engineer

Posted: Sunday, 31 August 2025
Valid Thru: Tuesday, 30 September 2025

Location: Minneapolis, MN, 55401, US

Industry: Advertising and Public Relations
Occupational Category: 13-0000.00 - Business and Financial Operations
Type of Employment: FULL_TIME

Be The Match is hiring!

Description:

Job Description

The AI/ML Engineer will play a key role within the AI Center of Excellence (CoE), focusing on building, scaling, and maintaining robust ML and GenAI operational infrastructure. This position is responsible for developing and automating end-to-end machine learning pipelines, deploying models into production, and ensuring their performance and stability over time. The ideal candidate is a hands-on engineer with strong experience in MLOps, LLMOps, and cloud-native tools, and a passion for reliable, scalable, and efficient AI systems.

ACCOUNTABILITIES: (The primary functions, scope and responsibilities of the role)

Engineering and Operations:
  • Develop, deploy, and maintain production-grade ML/GenAI pipelines using AWS cloud-native and open-source MLOps tools.
  • Automate model training, evaluation, testing, deployment, and monitoring workflows.
  • Implement LLMOps practices for prompt versioning, model tracking, and continuous evaluation of GenAI systems.
  • Integrate ML systems with CI/CD pipelines and infrastructure-as-code tools.
  • Support model inference at scale via APIs, containers, and microservices.
  • Work closely with data engineering to ensure high-quality, real-time, and batch data availability for ML workflows.
  • Ensure high availability, reliability, and performance of AI services in production environments.
  • Maintain robust monitoring and observability across the AWS, Snowflake, Salesforce, and Oracle ecosystems.
  • Implement feature stores and data versioning systems to ensure reproducible ML experiments and deployments.
  • Deploy and optimize vector databases and embedding models for semantic search and RAG applications.
  • Configure GPU-enabled cloud infrastructure and implement monitoring solutions to optimize resource utilization, costs, and performance for ML training and inference workloads.
  • Establish automated model validation, testing, and rollback procedures for safe production deployments.
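As an illustration of the automated validation, testing, and rollback procedures listed above, here is a minimal Python sketch of a deployment gate. The `ModelMetrics` shape, threshold values, and callable-based promote/rollback hooks are illustrative assumptions, not part of this posting or any specific tool:

```python
from dataclasses import dataclass


@dataclass
class ModelMetrics:
    """Offline evaluation results for one model version (illustrative fields)."""
    accuracy: float
    p95_latency_ms: float


def validate_candidate(candidate: ModelMetrics,
                       production: ModelMetrics,
                       max_accuracy_drop: float = 0.01,
                       max_latency_increase: float = 1.2) -> bool:
    """Gate a deployment: the candidate may not regress accuracy by more than
    max_accuracy_drop, nor exceed production p95 latency by more than 20%.
    Thresholds are assumed defaults and would be tuned per service."""
    if candidate.accuracy < production.accuracy - max_accuracy_drop:
        return False
    if candidate.p95_latency_ms > production.p95_latency_ms * max_latency_increase:
        return False
    return True


def deploy_or_rollback(candidate: ModelMetrics,
                       production: ModelMetrics,
                       promote,
                       rollback) -> str:
    """promote/rollback are callables supplied by the deployment tooling
    (e.g., updating an endpoint alias or reverting to the prior version)."""
    if validate_candidate(candidate, production):
        promote()
        return "promoted"
    rollback()
    return "rolled_back"
```

In practice the same gate pattern slots into a CI/CD stage: metrics come from an evaluation job, and the promote/rollback callables wrap the serving platform's versioning API.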


Tooling and Infrastructure:
  • Build and manage model registries, feature stores, and metadata tracking systems.
  • Leverage containerization (e.g., Docker) and orchestration (e.g., Kubernetes, Airflow, Kubeflow) for scalable deployment.
  • Implement role-based access control, auditing, and governance for ML infrastructure.
  • Manage cost-effective cloud infrastructure using AWS.
  • Build and maintain data quality monitoring systems with automated alerting for data drift and anomalies.
  • Implement cost-optimization strategies, including auto-scaling, spot instances, and resource right-sizing, for ML workloads.
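To make the data drift monitoring and automated alerting mentioned above concrete, here is a small self-contained Python sketch that scores drift between a reference sample and current production data using the Population Stability Index. The bin count, smoothing constant, and alert thresholds are common heuristics, assumed here rather than taken from this posting:

```python
import math


def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples.
    Common rule of thumb (an assumption, tune per use case):
    PSI < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant drift."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp the top edge into the last bin.
            counts[min(int((x - lo) / width), bins - 1)] += 1
        n = len(sample)
        # Small additive smoothing avoids log(0) on empty bins.
        return [(c + 1e-4) / (n + bins * 1e-4) for c in counts]

    return sum((cur - ref) * math.log(cur / ref)
               for cur, ref in zip(bin_fractions(current),
                                   bin_fractions(reference)))


def check_drift(reference, current, threshold=0.25):
    """Return (score, alert): alert is True when drift exceeds the threshold."""
    score = psi(reference, current)
    return score, score > threshold
```

In a production pipeline, `check_drift` would run on a schedule against fresh feature data, with the alert flag wired into the monitoring stack and, where appropriate, an automated retraining trigger.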


Collaboration and Support:
  • Partner with data engineers, data scientists, ML engineers, architects, software engineers, infrastructure and security teams to support scalable and efficient AI/ML workflows.
  • Contribute to incident response, performance tuning, and continuous improvement of ML pipelines.
  • Provide guidance and documentation to promote reproducibility and best practices across teams.
  • Work as part of an agile development team and participate in planning and code reviews.

REQUIRED QUALIFICATIONS: (Minimum qualifications needed for this position including education, experience, certification, knowledge and/or physical requirements)

Knowledge of:
  • Cloud-native AI/ML development with AWS.
  • MLOps/LLMOps frameworks and lifecycle tools on AWS.
  • Monitoring and observability platforms on AWS.
  • ML model deployment strategies (e.g., batch, real-time, streaming).
  • Feature stores and data versioning tools on AWS and Snowflake.
  • Model serving frameworks such as Amazon SageMaker and Amazon Bedrock for scalable inference deployment.
  • Vector databases and embedding deployment (e.g., Pinecone, Weaviate, FAISS, pgvector) for LLM and RAG applications.
  • LLMOps-specific tools including prompt management platforms and LLM serving optimization on AWS.
  • Docker registries and artifact management.


Required Skills and Abilities:
  • Strong Python programming and scripting skills.
  • Hands-on experience deploying and managing ML/GenAI models in production.
  • Experience with Docker, Kubernetes, and workflow orchestration tools like Airflow or Kubeflow.
  • Proficiency in infrastructure-as-code tools (e.g., Terraform, CloudFormation).
  • Ability to debug, troubleshoot, and optimize AI/ML pipelines and systems.
  • Comfortable working in agile teams and collaborating cross-functionally.
  • Proven ability to automate processes and build reusable ML operational frameworks.
  • Experience with A/B testing frameworks and canary deployments for ML models in production environments.
  • Knowledge of GPU resource management and optimization for training and inference workloads.
  • Understanding of data pipeline quality monitoring, drift detection, and automated retraining triggers.
  • Experience with secrets management, role-based access control, and secure credential handling for ML systems.


Education and/or Experience:
  • Bachelor's degree in Computer Science, Engineering, or a related field (master's degree preferred).
  • 2-3 years of experience in ML engineering, DevOps, or MLOps roles.
  • Demonstrated experience managing production AI/ML workloads and systems.

PREFERRED QUALIFICATIONS: (Additional qualifications that may make a person even more effective in the role, but are not required for consideration)
  • Experience with LLMOps and GenAI pipeline monitoring.
  • Cloud certifications in AWS, Azure, or GCP.
  • Experience supporting AI applications in regulated industries (e.g., healthcare, finance).
  • Contributions to open-source MLOps tools or infrastructure projects.
  • Experience with edge deployment and model optimization techniques (quantization, pruning, distillation).
  • Knowledge of compliance frameworks (SOC2, GDPR, HIPAA) and security best practices for AI/ML systems.
  • Experience with real-time streaming data pipelines (Kafka, Kinesis) and event-driven ML architectures.

Apply Now