April 29, 2024

Machine Learning Engineer

A Professional Machine Learning Engineer designs, builds, and productionizes ML models to solve business challenges using Google Cloud technologies and knowledge of proven ML models and techniques. The ML Engineer considers responsible AI throughout the ML development process, and collaborates closely with other job roles to ensure long-term success of models. The ML Engineer should be proficient in all aspects of model architecture, data pipeline interaction, and metrics interpretation. The ML Engineer needs familiarity with foundational concepts of application development, infrastructure management, data engineering, and data governance. Through an understanding of training, retraining, deploying, scheduling, monitoring, and improving models, the ML Engineer designs and creates scalable solutions for optimal performance.

— Google Cloud

The exam is 2 hours long and costs $200.

Exam Content & Outline – What Will You Be Tested On?

There are SIX main capabilities that the exam will test you on:

  1. Framing ML problems
  2. Architecting ML solutions
  3. Designing data preparation and processing systems
  4. Developing ML models
  5. Automating and orchestrating ML pipelines
  6. Monitoring, optimizing, and maintaining ML solutions

Let’s look at each of these in more detail and find out what exactly to study in order to be certified as a Professional Machine Learning Engineer.

Framing ML Problems

  1. Translating business challenges into ML use cases
    • Choosing the best solution (ML vs. non-ML, custom vs. pre-packaged [e.g. AutoML, Vision API]) based on business requirements
    • Defining how the model output should be used to solve the business problem
    • Deciding how incorrect results should be handled
    • Identifying data sources (available vs. ideal)
  2. Defining ML problems
    • Problem type (e.g. classification, regression, clustering)
    • Outcome of model predictions
    • Input (features) and predicted output format
  3. Defining business success criteria
    • Alignment of ML success metrics to the business problem
    • Key results
    • Determining when a model is deemed unsuccessful
  4. Identifying risks to feasibility of ML solutions
    • Assessing and communicating business impact
    • Assessing ML solution readiness
    • Assessing data readiness and potential limitations
    • Aligning with Google’s Responsible AI practices (e.g., different biases)
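To make "defining business success criteria" concrete, here is a minimal sketch in plain Python: a model is only promoted if every ML metric clears its business-defined threshold. The metric names and threshold values are invented for illustration, not taken from the exam guide.

```python
# Hedged sketch: gating a model on business success criteria.
# Thresholds are hypothetical examples.

def meets_business_criteria(metrics: dict, criteria: dict) -> bool:
    """Return True only if every required metric clears its threshold."""
    return all(metrics.get(name, 0.0) >= minimum
               for name, minimum in criteria.items())

# Example: a fraud-detection use case where recall matters most.
criteria = {"recall": 0.90, "precision": 0.60}
model_metrics = {"recall": 0.93, "precision": 0.65, "accuracy": 0.98}

print(meets_business_criteria(model_metrics, criteria))  # True
```

Note how the check also encodes "determining when a model is deemed unsuccessful": any missing metric defaults to 0.0 and fails the gate.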

Architecting ML Solutions

  1. Designing reliable, scalable, and highly available ML solutions
    • Choosing appropriate ML services for the use case (e.g., Cloud Build, Kubeflow)
    • Component types (e.g., data collection, data management)
    • Exploration/analysis
    • Feature engineering
    • Logging/management
    • Automation
    • Orchestration
    • Monitoring
    • Serving
  2. Choosing appropriate Google Cloud hardware components
    • Evaluation of compute and accelerator options (e.g. CPU, GPU, TPU, edge devices)
  3. Designing architecture that complies with security concerns across sectors/industries
    • Building secure ML systems (e.g., protecting against unintentional exploitation of data/model, hacking)
    • Privacy implications of data usage and/or collection (e.g. handling sensitive data like PII and PHI)
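For the PII-handling bullet, the managed option on Google Cloud is the Cloud Data Loss Prevention (DLP) API; the regex-based stand-in below is purely illustrative of the idea of redacting sensitive fields before they reach logs or training data.

```python
# Illustrative sketch only: in production, prefer Cloud DLP for PII
# detection/redaction. These two patterns are simplistic examples.

import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Replace common PII patterns with typed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

print(redact("Contact jane@example.com, SSN 123-45-6789"))
# Contact [EMAIL], SSN [SSN]
```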

Designing Data Preparation and Processing Systems

  1. Exploring data (EDA)
    • Visualization
    • Statistical fundamentals at scale
    • Evaluation of data quality and feasibility
    • Establishing data constraints (e.g. TFDV)
  2. Building data pipelines
    • Organizing and optimizing training datasets
    • Data validation
    • Handling missing data
    • Handling outliers
    • Data leakage
  3. Creating input features (feature engineering)
    • Ensuring consistent data pre-processing between training and serving
    • Encoding structured data types
    • Feature selection
    • Class imbalance
    • Feature crosses
    • Transformations (TensorFlow Transform)
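To make "consistent data pre-processing between training and serving" concrete, here is a minimal plain-Python sketch: a single `preprocess` function does the one-hot encoding, normalization, and a hashed feature cross, and is called identically on both paths. All names, vocabularies, and statistics are invented for illustration; on Google Cloud this role is typically filled by TensorFlow Transform.

```python
# Sketch: one shared preprocessing function avoids training/serving skew.
# VOCAB and the z-score stats stand in for values learned from training data.

import zlib

VOCAB = {"US": 0, "DE": 1, "JP": 2}  # assumed training-time vocabulary

def preprocess(example: dict) -> list:
    """Encode one raw example into model-ready features."""
    country = [0.0] * len(VOCAB)
    country[VOCAB.get(example["country"], 0)] = 1.0        # one-hot encode
    age = (example["age"] - 40.0) / 15.0                   # assumed z-score stats
    cross = f'{example["country"]}_x_{example["device"]}'  # feature cross
    bucket = zlib.crc32(cross.encode()) % 1000             # hashed cross bucket
    return country + [age, float(bucket)]

row = {"country": "DE", "age": 55, "device": "mobile"}
features = preprocess(row)  # identical call path at training and serving time
```

Because serving calls the exact same function (rather than a reimplementation), any change to the encoding automatically applies to both paths.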

Developing ML Models

  1. Building models
    • Choice of framework and model
    • Modeling techniques given interpretability requirements
    • Transfer learning
    • Data augmentation
    • Semi-supervised learning
    • Model generalization and strategies to handle overfitting and underfitting
  2. Training models
    • Ingestion of various file types into training (e.g. CSV, JSON, image files, Parquet, databases, Hadoop/Spark)
    • Training a model as a job in different environments
    • Hyperparameter tuning
    • Tracking metrics during training
    • Retraining/redeployment evaluation
  3. Testing models
    • Unit tests for model training and serving
    • Model performance against baselines, simpler models, and across the time dimension
    • Model explainability on Vertex AI
  4. Scaling model training and serving
    • Distributed training
    • Scaling prediction service (e.g. Vertex AI Prediction, containerized serving)
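The "model performance against baselines" bullet is worth internalizing: a model should beat a naive baseline before it is promoted. A minimal stdlib-only sketch, with made-up numbers:

```python
# Sketch: compare a model against a trivial "predict the training mean"
# baseline. All data and predictions here are invented for illustration.

from statistics import mean

def mse(y_true, y_pred):
    """Mean squared error of predictions against ground truth."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

y_train = [10.0, 12.0, 11.0, 13.0]
y_test = [11.0, 12.5]

baseline_pred = [mean(y_train)] * len(y_test)  # always predict the mean
model_pred = [11.2, 12.1]                      # assumed model output

assert mse(y_test, model_pred) < mse(y_test, baseline_pred), \
    "Model should beat the naive baseline before promotion"
```

The same comparison can be repeated across time slices to cover "across the time dimension".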

Automating and Orchestrating ML Pipelines

  1. Designing and implementing training pipelines
    • Identification of components, parameters, triggers, and compute needs (e.g. Cloud Build, Cloud Run)
    • Orchestration framework (e.g. Kubeflow Pipelines/Vertex AI Pipelines, Cloud Composer/Apache Airflow)
    • Hybrid or multi-cloud strategies
    • System design with TFX components/Kubeflow DSL
  2. Implementing serving pipelines
    • Serving (online, batch, caching)
    • Google Cloud serving options
    • Testing for target performance
    • Configuring trigger and pipeline schedules
  3. Tracking and auditing metadata
    • Organizing and tracking experiments and pipeline runs
    • Hooking into model and dataset versioning
    • Model/dataset lineage
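For "tracking and auditing metadata", Vertex ML Metadata is the managed service; the stdlib-only sketch below just illustrates the core idea of lineage: content-hash the dataset reference so that every run on the same dataset version shares an ID. All names and the in-memory store are invented.

```python
# Sketch: minimal experiment/lineage tracking. A real system would use
# Vertex ML Metadata; RUNS stands in for the metadata store.

import hashlib
import json
import time

RUNS = []  # hypothetical in-memory metadata store

def track_run(dataset: dict, params: dict, metrics: dict) -> dict:
    """Record a pipeline run, keyed by a content hash of its dataset."""
    dataset_id = hashlib.sha256(
        json.dumps(dataset, sort_keys=True).encode()).hexdigest()[:12]
    run = {"dataset_id": dataset_id, "params": params,
           "metrics": metrics, "ts": time.time()}
    RUNS.append(run)
    return run

run = track_run({"uri": "gs://bucket/train.csv", "version": 3},
                {"lr": 0.01}, {"auc": 0.91})
# Any later run on the same dataset version hashes to the same dataset_id,
# making model -> dataset lineage queryable.
```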

Monitoring, Optimizing, and Maintaining ML Solutions

  1. Monitoring and troubleshooting ML solutions
    • Performance and business quality of ML model predictions
    • Logging strategies
    • Establishing continuous evaluation metrics (e.g. evaluation of drift or bias)
    • Understanding Google Cloud permissions model
    • Identification of appropriate retraining policy
    • Common training and serving errors (TensorFlow)
    • ML model failure and resulting biases
  2. Tuning performance of ML solutions for training and serving in production
    • Optimization and simplification of input pipeline for training
    • Simplification techniques
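"Establishing continuous evaluation metrics (e.g. evaluation of drift)" can be sketched with a very simple check: flag drift when the serving-time mean of a feature moves too many training-time standard deviations from the training mean. The threshold and data are invented; Vertex AI Model Monitoring provides this as a managed capability with more robust statistics.

```python
# Sketch: naive feature-drift check between training data and recent
# serving traffic. Threshold and sample values are illustrative only.

from statistics import mean, stdev

def drifted(train_values, serving_values, z_threshold=3.0):
    """Flag drift when the serving mean shifts > z_threshold training stdevs."""
    mu, sigma = mean(train_values), stdev(train_values)
    shift = abs(mean(serving_values) - mu) / sigma
    return shift > z_threshold

train = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
stable = [10.1, 10.4, 9.9]
shifted = [25.0, 26.0, 24.5]

print(drifted(train, stable))   # False
print(drifted(train, shifted))  # True
```

A drift flag like this would typically feed the "identification of appropriate retraining policy" bullet: drift detected, retraining triggered.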

Recommended Study Materials

  1. Books