AWS Certified AI Practitioner · Domain 1 · ~20%

Fundamentals of AI & ML

Concepts, data, learning types, training vs inference, lifecycle, MLOps, and metrics—aligned to the official AIF-C01 exam guide. Each topic slide includes a Definitions section; use the master glossary slide for cram review.

Use ← → or click Next · Open this file in any browser (online for diagrams)

1.1 · Concepts

AI ⊃ ML ⊃ Deep Learning

AI = goal: intelligent behavior. ML = learn patterns from data. DL = ML using deep neural networks (often best for text, images, audio at scale).

flowchart TB
  subgraph AI["Artificial intelligence (AI)"]
    subgraph ML["Machine learning (ML)"]
      subgraph DL["Deep learning (DL)"]
        N[Deep neural networks]
      end
      C["Classical ML: e.g. logistic regression, random forests, gradient boosting"]
    end
    R["Other AI: rules, search, expert systems — less common in modern product AI"]
  end
        
Exam tip: “Improves from examples / historical data” → ML. “Many layers, unstructured data” → often DL.

Definitions

Artificial intelligence (AI)
The broad field of making systems perform tasks that need human-like perception, reasoning, language, planning, or decision-making—using rules, search, optimization, machine learning, or other methods.
Machine learning (ML)
A branch of AI where behavior improves from data: algorithms adjust internal parameters so predictions or actions get better on examples rather than relying only on hand-written rules.
Deep learning (DL)
Machine learning that uses deep neural networks (many layers) to learn representations—especially strong for unstructured inputs like text, images, and audio.
Neural network
A model made of interconnected units (neurons) organized in layers; each connection has a weight the training process updates.
Classical machine learning
Traditional ML models that are often shallower or non-neural—e.g. logistic regression, decision trees, random forests, gradient boosting—for tabular and structured problems.
1.1 · Data

Structured vs unstructured · Labeled vs unlabeled

Data shapes

  • Structured: tables, time series
  • Unstructured: text, images, audio

Labels

  • Labeled: input + known target (spam/not spam)
  • Unlabeled: find structure (clusters) or use self-supervised learning in DL

flowchart LR
  subgraph structured["Structured"]
    T["Tabular rows and columns"]
    TS[Time series]
  end
  subgraph unstructured["Unstructured"]
    TXT[Text]
    IMG[Images]
    AUD[Audio]
  end
  L["Labeled: input + known answer"]
  U["Unlabeled: input only"]
  structured --> L
  structured --> U
  unstructured --> L
  unstructured --> U
        

Definitions

Structured data
Information organized in clear fields—rows/columns in tables, or time-ordered measurements—easy to query and aggregate (e.g. CRM records, sales per day).
Unstructured data
Raw content without a fixed schema: free text, images, audio, video—requires encoding or models to interpret.
Tabular data
Examples as rows and features as columns (spreadsheet-style); common input for classical ML.
Time series
Values indexed by time (metrics, sensors, stock prices); order in time usually matters for modeling.
Labeled data
Each training example includes the correct answer (target/label) the model should learn to predict.
Unlabeled data
Inputs without supplied targets—used for finding structure (clustering) or for methods that invent supervision from the data (e.g. self-supervised pre-training in deep learning).
1.1 · Learning types

Supervised learning

Inputs + correct outputs (labels). Learn mapping X → ŷ. Regression predicts a number. Classification predicts a category.

flowchart LR
  X["Features / input X"] --> M[Model]
  M --> Yhat[Prediction ŷ]
  Ytrue[True label y] -.->|"compare error"| M
        
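
A minimal sketch of this loop in Python (scikit-learn assumed; any library with fit/predict follows the same shape):

# Supervised learning sketch: learn X -> y from labeled pairs (illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=4, random_state=0)  # toy labeled data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)  # training: fit parameters to examples
y_hat = model.predict(X_test)                       # ŷ, compared against true labels y
print("test accuracy:", model.score(X_test, y_test))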

Definitions

Supervised learning
Training with input–output pairs; the model learns to map inputs to known labels.
Features
The measurable inputs (columns, signals) the model uses to make a prediction—also called predictors or independent variables.
Label (target)
The value the model is trained to predict (class, number, or ranking).
Model
A learned function (with parameters/weights) that turns inputs into outputs.
Regression
Supervised task: predict a continuous number (price, demand, risk score).
Classification
Supervised task: predict a category (spam/ham, defect type, churn yes/no).
Prediction (ŷ)
The model’s output for a given input; compared to the true label during training to compute error.
1.1 · Learning types

Unsupervised · Reinforcement learning

Unsupervised

Mostly no labels. Discover patterns: clustering, segments, anomalies.

flowchart LR
  D[Unlabeled data] --> U[Algorithm]
  U --> P["Patterns: clusters / structure"]
            
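
The same idea as a sketch (scikit-learn's KMeans assumed; note that no labels go in):

# Unsupervised sketch: k-means groups unlabeled points into clusters (illustrative).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # discard labels: unlabeled data
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(clusters[:10])  # a cluster id per point, found without any labels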

Reinforcement learning

Agent acts → environment → reward. Learn a policy (what action to take).

flowchart LR
  A[Agent chooses action] --> E[Environment]
  E --> R[Reward signal]
  R --> A
            
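
A toy version of the loop in plain Python (a hypothetical two-armed bandit, not an exam requirement):

# RL sketch: agent acts, environment rewards, value estimates improve (illustrative).
import random

values = [0.0, 0.0]        # agent's estimated value of each action
true_payout = [0.3, 0.7]   # environment's hidden reward probabilities

for step in range(1000):
    # epsilon-greedy policy: usually exploit the best estimate, sometimes explore
    action = random.randrange(2) if random.random() < 0.1 else values.index(max(values))
    reward = 1.0 if random.random() < true_payout[action] else 0.0  # environment responds
    values[action] += 0.1 * (reward - values[action])               # learn from the reward

print(values)  # estimates approach the true payouts, so the agent prefers action 1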

Definitions

Unsupervised learning
Learning without provided labels; goal is often grouping, compression, density estimation, or anomaly detection.
Clustering
Partitioning examples into groups so similar items are together (e.g. customer segments) without predefined segment names.
Reinforcement learning (RL)
An agent takes actions in an environment, receives rewards, and learns a policy to maximize long-term reward.
Agent
The decision-maker that selects actions based on observations (state).
Environment
Everything outside the agent that responds to actions and produces the next state and reward.
Reward
A scalar feedback signal telling the agent how good the last action was.
Policy
The strategy mapping states (or observations) to actions—what the RL algorithm optimizes.
1.1 · Inference

Training vs inference · Batch vs real-time

Training adjusts the model from data. Inference runs the trained model on new inputs (what users and apps usually hit).

flowchart TB
  subgraph train["Training phase"]
    D1[Training data] --> T["Train / fit model"]
    T --> W["Learned parameters (weights)"]
  end
  subgraph infer["Inference phase"]
    D2[New data] --> M["Model uses weights"]
    M --> O["Predictions / outputs"]
  end
  train --> infer
        
Batch inference: many records on a schedule; throughput matters.
Real-time inference: API-style, low latency per request.
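
The split in code, as a sketch (scikit-learn and joblib assumed): training updates weights once; inference reuses them, in bulk or per request.

# Training vs inference sketch (illustrative; real serving sits behind an API).
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)  # training phase: weights are updated here

joblib.dump(model, "model.joblib")      # persist the learned parameters
served = joblib.load("model.joblib")    # inference phase: weights are fixed from here on

batch_scores = served.predict(X)        # batch: many records in one scheduled run
single_score = served.predict(X[:1])    # real-time style: one request, one prediction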

Definitions

Training
The phase where the model’s parameters (weights) are updated using data and an optimization process so error decreases on training examples.
Inference
Using a trained model on new data to produce outputs (predictions, scores, classes)—usually without updating weights.
Parameters / weights
Internal numbers learned during training that define the model’s behavior.
Batch inference
Scoring many records in bulk (e.g. nightly job); optimizing for throughput and total runtime rather than single-request latency.
Real-time (online) inference
Serving predictions per request (API); optimizing for low latency per interaction.
1.2 · Use cases

When ML fits · When it doesn’t

ML helps when the data has signal, the outcome is measurable, and scale beats hand-written rules. Avoid ML when a simple rule suffices, data or labels are insufficient, or the cost outweighs the benefit.

flowchart TB
  Q{"Does a simple rule solve it cheaply?"}
  Q -->|Yes| R1["Start with rules / deterministic logic"]
  Q -->|No| Q2{"Do you have enough quality data and a measurable target?"}
  Q2 -->|No| R2["Fix data / process first"]
  Q2 -->|Yes| R3["Candidate for ML"]
        

Definitions

Signal (in data)
Real, repeatable patterns that relate inputs to outcomes—what makes learning possible; noise is randomness that does not generalize.
Use case
A specific business problem + success criteria where AI/ML might help (e.g. “rank support tickets by urgency”).
Deterministic rule
Fixed logic that always gives the same output for the same input (if/then, thresholds with no learned parameters)—often the first baseline before ML.
Cost–benefit analysis
Comparing data, build, maintenance, risk, and governance costs against expected business value before committing to ML.
Ground truth
The best available “correct” labels or outcomes used for training or evaluation—sometimes imperfect or delayed in real systems.
1.2 · AWS · Conceptual

Managed AI / ML services (examples)

The exam expects recognition of SageMaker as the broad ML platform, plus task-specific APIs—choose by modality and problem rather than memorizing every edge case.

Definitions

Managed AI / ML service
A cloud API or console workflow where AWS runs and scales the model infrastructure; you integrate via API instead of operating raw servers for that piece.
Modality
The type of input/output medium: text, speech, image, document, etc.—service choice depends on modality and task.
Amazon SageMaker
AWS’s broad platform for building, training, tuning, deploying, and monitoring custom ML—notebooks, pipelines, endpoints, and governance hooks.
Amazon Comprehend
Managed NLP for analyzing text (entities, sentiment, topics, classification, etc.).
Amazon Lex
Build conversational interfaces (chatbots/voice) using intents and slots.
Amazon Transcribe
Speech-to-text transcription.
Amazon Translate
Machine translation between languages.
Amazon Polly
Text-to-speech (synthetic voices).
Amazon Rekognition
Image and video analysis (labels, faces, moderation, etc.).
Amazon Textract
Extract text, forms, and tables from documents and scans.
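
To make "integrate via API" concrete, a minimal boto3 sketch calling Comprehend (assumes configured AWS credentials and region; not required for the exam):

# Managed AI service sketch: sentiment analysis via the Comprehend API (illustrative).
import boto3

comprehend = boto3.client("comprehend")
resp = comprehend.detect_sentiment(Text="The new console is fantastic.", LanguageCode="en")
print(resp["Sentiment"])       # e.g. POSITIVE
print(resp["SentimentScore"])  # per-class confidence scores
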
1.3 · Lifecycle

ML pipeline & model sources

flowchart LR
  C[Collect data] --> E[EDA]
  E --> P[Preprocess]
  P --> FE[Feature engineering]
  FE --> TR[Train]
  TR --> TU[Tune]
  TU --> EV[Evaluate]
  EV --> DP[Deploy]
  DP --> MO[Monitor]
  MO --> TR
        
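
A compact sketch of the preprocess → train → evaluate stretch (scikit-learn Pipeline assumed; SageMaker Pipelines play this role at production scale):

# Lifecycle sketch: chain preprocessing and training, then evaluate on held-out data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=6, random_state=0)   # collect
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("preprocess", StandardScaler()),   # preprocessing step
    ("model", LogisticRegression()),    # training step
]).fit(X_train, y_train)                # train

print("held-out accuracy:", pipe.score(X_test, y_test))  # evaluate before deploy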

Definitions

ML pipeline (lifecycle)
The end-to-end sequence from data through deployment: collect → explore → clean → featurize → train → tune → evaluate → deploy → monitor (often looping).
Data collection
Gathering raw data from databases, APIs, logs, files, or streams for ML use.
EDA (exploratory data analysis)
Inspecting distributions, missing values, outliers, and relationships to guide cleaning and modeling choices.
Preprocessing
Cleaning, scaling, encoding, and transforming raw inputs into a usable form for training.
Feature engineering
Creating or selecting inputs (features) that improve model performance—domain-informed signals beyond raw fields.
Hyperparameter tuning
Searching settings not learned from data (e.g. learning rate, tree depth) to improve validation performance.
Model evaluation
Measuring quality on held-out data with appropriate metrics before production release.
Deployment
Putting a trained model into production (endpoint, batch job, or embedded) so consumers can get predictions.
Monitoring (ML)
Watching data drift, concept drift, latency, errors, and business KPIs to know when to retrain or roll back.
Pre-trained model
A model already trained (by you or a provider) that you fine-tune or reuse instead of training from scratch.
Endpoint (model serving)
A network address (HTTP API) that hosts a model for real-time inference requests.
Amazon SageMaker Data Wrangler
Visual data preparation in SageMaker to clean and transform data for ML.
Amazon SageMaker Feature Store
A centralized store for ML features (training and serving) to keep definitions consistent.
Amazon SageMaker Model Monitor
Monitors deployed models for quality and drift compared to a baseline.
1.3 · MLOps

MLOps: repeatable, production-ready, monitored

Experiments → reproducible pipelines → registry/releases → monitoring → retrain or rollback. Goal: models don’t silently degrade.

flowchart TB
  subgraph dev["Experimentation"]
    NB["Notebooks / trials"] --> EXP[Track experiments]
  end
  subgraph prod["Production readiness"]
    PI["Reproducible pipelines"] --> REG[Model registry]
    REG --> REL["Controlled releases"]
  end
  subgraph run["Operate"]
    MON["Monitor drift / quality"] --> RET["Retrain / rollback"]
  end
  dev --> prod --> run
        

Definitions

MLOps
Practices to deliver and run ML reliably in production: versioned data and models, automated pipelines, testing, releases, monitoring, and governance—similar spirit to DevOps for software.
Experiment tracking
Recording hyperparameters, datasets, code versions, and metrics so runs are comparable and reproducible.
Model registry
A catalog of model artifacts with versions, metadata, and promotion stages (staging/production).
Continuous integration / delivery (for ML)
Automated build/test/deploy of training and inference components so changes ship safely.
Data drift
Change in input data distribution over time; can hurt performance even if code is unchanged.
Concept drift
The relationship between inputs and labels changes (the “world” changes); the old model becomes misaligned.
Retraining
Training again on newer data or labels to refresh the model after drift or new patterns.
1.3 · Metrics

Technical vs business metrics

Model: accuracy, precision/recall, AUC-ROC, F1; MAE/RMSE for regression.

Business: cost per user, ROI, complaints, operational load from false positives, revenue.

flowchart LR
  TM["Technical metrics: AUC, F1, RMSE"] --> Q{"Good enough for the business problem?"}
  BM["Business metrics: ROI, cost, complaints, throughput"] --> Q
  Q -->|No| I["Iterate: data, model, thresholds"]
  Q -->|Yes| S["Ship / scale with governance"]
        
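
Computing the technical metrics above with scikit-learn (toy labels, purely illustrative):

# Metrics sketch: classification scores plus regression errors (illustrative).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, mean_absolute_error,
                             mean_squared_error)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]  # scores needed for AUC

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("f1       :", f1_score(y_true, y_pred))
print("auc-roc  :", roc_auc_score(y_true, y_prob))

y_num_true = [3.0, 5.0, 2.5]
y_num_pred = [2.5, 5.0, 4.0]
print("mae :", mean_absolute_error(y_num_true, y_num_pred))
print("rmse:", mean_squared_error(y_num_true, y_num_pred) ** 0.5)  # RMSE = sqrt(MSE)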

Definitions

Technical (model) metrics
Quantitative scores on predictions vs labels—used to compare models and thresholds mathematically.
Accuracy
Fraction of correct predictions among all examples (can mislead with imbalanced classes).
Precision
Among positive predictions, how many were truly positive—important when false alarms are costly.
Recall (sensitivity)
Among actual positives, how many the model caught—important when missing positives is costly.
F1 score
Harmonic mean of precision and recall—balances both when you need a single number.
AUC-ROC
Area under the receiver operating characteristic curve—summarizes the tradeoff between true-positive and false-positive rates across thresholds.
MAE / RMSE
Mean Absolute Error / Root Mean Squared Error—common regression error measures; RMSE penalizes large errors more.
Business metrics
Outcomes stakeholders care about: revenue, cost, CSAT, handle time, fraud dollars, ROI—not always aligned 1:1 with pure accuracy.
ROI (return on investment)
Value gained versus cost of building and operating the solution.
Reference

Master glossary (Domain 1)

One-page recap of terms used in this deck—use for quick review before practice tests.

AI · ML · DL
AI = smart behavior broadly; ML = learn from data; DL = deep neural nets for rich unstructured patterns.
Structured / unstructured · Tabular · Time series
Structured = rows/columns or time-ordered metrics; unstructured = text, image, audio, video.
Labeled / unlabeled
Labeled = training pairs with known targets; unlabeled = discover structure or use self-supervision.
Supervised · Unsupervised · Reinforcement
Supervised = learn X→y with labels; unsupervised = patterns without labels; RL = agent, environment, reward, policy.
Regression · Classification
Number vs category prediction.
Training · Inference · Batch · Real-time
Fit weights vs run model; bulk scoring vs low-latency API.
Pipeline stages
Collect → EDA → preprocess → features → train → tune → evaluate → deploy → monitor.
MLOps · Registry · Drift
Production ML discipline; versioned models; input or relationship changes over time.
Precision · Recall · F1 · AUC · MAE · RMSE
Classification tradeoffs and regression errors; always tie to business impact.

Scroll on this slide if needed · Continue for self-check recap

Recap · Self-check

Before Domain 2

Official guide: AIF-C01 exam guide on AWS Skill Builder / docs.aws.amazon.com · Next lesson: Domain 2 Generative AI
