AWS Certified AI Practitioner · Domain 2 · ~24%

Fundamentals of Generative AI

GenAI building blocks, foundation-model lifecycle, business fit, limits, and AWS services for generative apps—aligned to AIF-C01. Each topic slide ends with Definitions; cram with the glossary slide.

Use ← → or Space · Pearson-style deck · Internet connection required for the Mermaid CDN

2.1 · Core ideas

What is generative AI?

Generative AI creates new content (text, code, images, audio, structured answers) from learned patterns—usually driven by large models and prompts, rather than only classifying or scoring existing data.

flowchart LR
  U["Prompt / instruction"] --> T[Tokenization]
  T --> FM["Foundation model"]
  FM --> O["Generated output"]
        
Exam angle: Know that GenAI centers on tokens, foundation models, prompting, and often transformers—not hand-tuned scoring rules alone.

Definitions

Generative AI (GenAI)
Systems that produce new samples or completions (sequences, pixels, waveforms) conditioned on input context—versus only predicting a fixed label.
Prompt
User or system text (or multimodal input) that steers the model toward a task, tone, format, or constraints.
Completion
The model’s continuation of the prompt—output tokens generated step by step.
Token
The atomic unit a model reads/writes (often subword pieces); prompts and outputs are split into tokens for processing and billing.
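Since prompts and outputs are billed by token count, it helps to see tokenization concretely. The sketch below is a toy whitespace-and-punctuation splitter, not a real FM tokenizer—production models use learned subword vocabularies (e.g. BPE), so real token counts will differ.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    """Toy tokenizer: words and punctuation marks as separate tokens.
    Real FMs use learned subword vocabularies, not this rule."""
    return re.findall(r"\w+|[^\w\s]", text)

prompt = "Summarize this report, please."
tokens = toy_tokenize(prompt)
print(tokens)       # ['Summarize', 'this', 'report', ',', 'please', '.']
print(len(tokens))  # token count is what managed FM APIs bill on
```

The key takeaway for the exam: both the prompt (input tokens) and the completion (output tokens) are split this way and metered separately.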
2.1 · Representations

Chunking · Embeddings · Vectors

Chunking splits long documents for retrieval and context limits. Embeddings map text/images into numeric vectors so “similar meaning” is nearby in vector space—used heavily in search and RAG (Domain 3 goes deeper).

flowchart TB
  DOC["Long document / corpus"] --> CH[Chunking into segments]
  CH --> E["Embedding model"]
  E --> V["Vectors stored for similarity search"]
        

Definitions

Chunking
Breaking content into smaller pieces (by size, sentences, or structure) so each piece fits model context windows and can be retrieved or cited independently.
Embedding
A learned numeric representation of data in a high-dimensional space; similar inputs get similar vectors—used for semantic search, clustering, and retrieval.
Vector
An ordered list of numbers representing a chunk, query, or item; similarity measures (e.g. cosine similarity) or distances compare vectors.
Context window
Maximum tokens a model can attend to in one forward pass; larger docs must be chunked, summarized, or retrieved selectively.
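The chunk → embed → compare pipeline above can be sketched in a few lines. This is a minimal illustration only: the chunker is naive fixed-size splitting (real systems split on sentences or structure, often with overlap), and the 3-dimensional vectors stand in for real embeddings with hundreds or thousands of dimensions.

```python
import math

def chunk(text: str, max_chars: int = 40) -> list[str]:
    """Naive fixed-size chunking; production systems usually split on
    sentences or document structure, with overlap between chunks."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend 3-D embeddings (real models emit hundreds/thousands of dimensions)
query_vec = [0.9, 0.1, 0.0]
doc_vec   = [0.8, 0.2, 0.1]   # semantically close to the query
other_vec = [0.0, 0.1, 0.9]   # semantically distant
print(cosine_similarity(query_vec, doc_vec) > cosine_similarity(query_vec, other_vec))  # True
```

Semantic search ranks stored chunk vectors by similarity to the query vector—this is the retrieval half of RAG that Domain 3 expands on.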
2.1 · Models

Transformers · LLMs · Foundation models

Most modern text GenAI uses transformer architectures. An LLM is a large language model; a foundation model (FM) is a broad base model (often pre-trained) you can adapt with prompts, tools, RAG, or fine-tuning.

flowchart TB
  TR["Transformer blocks (attention + FFN)"] --> LLM["Large language model (LLM)"]
  LLM --> FM["Foundation model (general capabilities)"]
  FM --> USE["Prompt, RAG, fine-tune, agents"]
        

Definitions

Transformer
Neural architecture using self-attention to relate all tokens in context—scalable basis for modern LLMs.
Large language model (LLM)
A very large transformer (or mix of architectures) trained heavily on text to model language, knowledge, and task patterns.
Foundation model (FM)
A broad, often pre-trained model that can underpin many tasks (Q&A, summarization, extraction) via prompting and customization paths.
Prompt engineering
Designing instructions, examples, and structure in prompts to improve quality, reliability, and safety—without changing model weights (in-context).
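The "self-attention" at the heart of transformers can be sketched at exam level: each query token scores every key token, the scores are softmax-normalized, and the output is a weighted mix of value vectors. The toy below uses plain 2-D lists and a single attention head—real models use matrices with many heads and learned projections.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention on toy vectors:
    softmax(Q·Kᵀ / sqrt(d)) · V, one output row per query."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))  # weighted mix, biased toward the first value row
```

The exam-relevant point: attention lets every token weigh every other token in the context window in one pass, which is why transformers scale well.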
2.1 · Modalities

Multimodal models · Diffusion models

Multimodal

One model jointly handles more than one input/output type (e.g. image + text).

flowchart LR
  IMG[Image] --> MM["Multimodal FM"]
  TX[Text] --> MM
  MM --> OUT["Text: caption, answer, or generation"]
            

Diffusion

Often used for image/audio generation: iterative denoising from random noise toward a sample.

flowchart LR
  Z[Random noise] --> S["Many denoising steps"]
  S --> G[Generated image or audio]
            

Definitions

Multimodal model
A model trained to align or fuse signals across modalities (text, vision, audio) for understanding or generation.
Diffusion model
A generative approach that learns to reverse a noise-corruption process—common for high-quality image generation and some audio models.
Modality
The type of data: text, image, speech, video, code, etc.—affects model choice and evaluation.
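The diffusion loop above ("many denoising steps") can be caricatured in code. This is a cartoon only: a real diffusion model runs a trained neural network that predicts and removes noise at each step, while here the "denoiser" is just a small linear nudge toward a fixed target, to show the iterative step structure.

```python
import random

random.seed(0)
target = [0.2, 0.8, 0.5]                   # stands in for the "clean" sample
x = [random.gauss(0, 1) for _ in target]   # start from pure random noise

steps = 50
for t in range(steps):
    # A real model uses a learned noise predictor here; this toy just
    # moves a fraction of the way toward the target each step.
    x = [xi + 0.1 * (ti - xi) for xi, ti in zip(x, target)]

print([round(v, 2) for v in x])  # close to target after many small steps
```

The structure to remember for the exam: generation starts from noise and applies many small denoising steps, which is why diffusion inference is comparatively slow per sample.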
2.1 · Lifecycle

Foundation model lifecycle

From raw corpora to production: select data and base model, pre-train, optionally fine-tune, evaluate, deploy, collect feedback. Most practitioners consume FMs from providers; fewer build from scratch.

flowchart LR
  DS[Data selection] --> MS[Model selection]
  MS --> PT[Pre-training]
  PT --> FT["Fine-tuning (optional)"]
  FT --> EV[Evaluation]
  EV --> DP[Deployment]
  DP --> FB["User feedback and monitoring"]
  FB -.-> EV
        

Definitions

Pre-training
Large-scale training (often self-supervised on broad data) to learn general representations before task-specific adaptation.
Fine-tuning
Additional training on a narrower dataset or objectives to specialize a model (domain, tone, format)—contrasts with in-context learning via prompts alone.
Continuous pre-training
Further pre-training on new corpora to refresh knowledge while retaining base capabilities—requires care and governance.
RLHF (high level)
Reinforcement learning from human feedback: align model outputs with human preferences using reward modeling—exam may name it as a fine-tuning/alignment technique.
2.2 · Use cases

Where GenAI creates value

Match modality and risk to the task: drafting, summarization, translation, code assist, search assistance, contact-center augmentation, content generation, personalization—always with human review where stakes are high.

flowchart TB
  subgraph val["High-value patterns"]
    D1["Draft and refine text"]
    D2["Summarize long content"]
    D3["Classify / extract structure"]
    D4["Code suggestion and tests"]
    D5["Conversational assistants"]
  end
        

Definitions

Summarization
Condensing source material into shorter, faithful text—may be extractive (selecting spans) or abstractive (new phrasing).
Translation / localization
Converting content across languages or regional style; quality and domain vocabulary matter.
Code generation
Producing or editing source from natural-language intent; must be validated with tests and review.
Conversational agent
A dialogue system using an FM (often plus tools) to hold multi-turn tasks—exam links this to agents in later domains.
2.2 · Reality check

Strengths vs limits

GenAI is fast and flexible but not an oracle. Expect hallucinations, nondeterminism (from temperature sampling), fairness gaps, and limited verifiability without retrieval or tools.

flowchart LR
  subgraph pros["Advantages"]
    P1["Broad generalization"]
    P2["Rapid prototyping"]
    P3["Natural language interface"]
  end
  subgraph cons["Risks and limits"]
    C1["Hallucinations"]
    C2["Nondeterminism"]
    C3["Opaque reasoning"]
    C4["Data and license sensitivity"]
  end
        

Definitions

Hallucination
Plausible-sounding but false or unsupported statements—critical risk for factual, legal, or medical use cases.
Nondeterminism
With sampling (temperature > 0), the same prompt can yield different outputs across runs—matters for testing and compliance.
Interpretability gap
Difficulty explaining why a specific token was generated; mitigated with citations (RAG), structured outputs, and audits.
Inaccuracy / staleness
Model knowledge may be outdated or wrong for niche domains—mitigate with retrieval, fine-tuning, or tool use.
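Nondeterminism comes from how the next token is chosen. A common scheme (sketched below with made-up logits) divides the model's scores by the temperature before a softmax, then samples: higher temperature flattens the distribution and increases variety, while temperature near zero approaches greedy argmax decoding.

```python
import math
import random

def sample_with_temperature(logits: list[float], temperature: float,
                            rng: random.Random) -> int:
    """Softmax over logits/temperature, then sample one index.
    Low temperature approaches greedy (deterministic) decoding."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]   # hypothetical next-token scores
rng = random.Random(42)
picks = [sample_with_temperature(logits, 1.0, rng) for _ in range(10)]
print(picks)  # repeated sampling from the same logits usually yields a mix of indices
```

This is why the same prompt can produce different completions across runs—and why tests and compliance reviews often pin temperature low or use deterministic settings.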
2.2 · Selection & value

Choosing models · Business metrics

Balance capability, latency, cost, context length, license, safety, and compliance. Tie pilots to business KPIs, not only novelty scores.

flowchart TB
  Q{"Model choice drivers"} --> A[Task modality]
  Q --> B[Latency and throughput]
  Q --> C["Cost (tokens and hosting)"]
  Q --> D["Compliance and data residency"]
  Q --> E["Customization needs (RAG, fine-tune)"]
        

Definitions

Latency
Time to first token or full response—drives UX for interactive apps.
Throughput
Tokens or requests processed per second—matters at high volume.
Token-based pricing
Billing by input/output tokens consumed; major driver of cloud FM cost on fully managed APIs.
Provisioned throughput
Reserved model capacity for predictable performance and spend—versus on-demand bursting.
Business value metrics
Examples: handle time, conversion, cost per resolution, employee productivity, error reduction, revenue per user—paired with quality checks.
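Token-based pricing composes simply: input and output tokens are metered at separate rates. The rates below are assumptions for illustration only—check current AWS pricing pages for real per-model numbers.

```python
# Hypothetical rates -- NOT real AWS prices; check current pricing pages.
PRICE_PER_1K_INPUT = 0.003    # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015   # USD per 1,000 output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request under simple per-token billing."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A chat turn: 1,200 prompt tokens (incl. context) and 300 generated tokens
cost = request_cost(1200, 300)
print(f"${cost:.4f} per request")            # $0.0081 per request
print(f"${cost * 100_000:.2f} per 100k requests")
```

Note the asymmetry many providers use—output tokens often cost several times more than input tokens—so long completions dominate cost at scale.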
2.3 · AWS GenAI stack

Services you should recognize

Amazon Bedrock is the FM hub (multiple providers, customization paths). SageMaker JumpStart brings models and notebooks. PartyRock is a playground for quick experimentation. Amazon Q is the assistant experience across productivity and AWS.

flowchart TB
  subgraph apps["Applications and UX"]
    Q["Amazon Q"]
    PR["PartyRock playground"]
  end
  subgraph fm["Foundation models"]
    BR["Amazon Bedrock (FMs, tools, guardrails)"]
  end
  subgraph build["Build and ops"]
    JS["SageMaker JumpStart"]
    SM["SageMaker training and hosting"]
  end
  apps --> BR
  build --> BR
        

Definitions

Amazon Bedrock
Fully managed service to invoke and customize foundation models from multiple providers, often with features for knowledge bases, agents, and guardrails.
SageMaker JumpStart
Curated models, solutions, and notebooks inside SageMaker to start quickly with pre-trained models and examples.
PartyRock
Amazon’s Bedrock-based playground for low-friction GenAI app experiments and learning (naming and features evolve—verify in AWS docs).
Amazon Q
AWS’s generative assistant product family for business and builder workflows—integrates with enterprise data and AWS consoles per AWS positioning.
Guardrails (Bedrock)
Configurable policies to filter undesirable topics, PII, or toxicity—part of responsible deployment on AWS.
2.3 · Cloud tradeoffs

Why AWS for GenAI · Cost dimensions

AWS stresses security, compliance footprint, global regions, and integration with identity, storage, and data services. Costs hinge on tokens, infrastructure, customization, and reliability choices.

flowchart LR
  subgraph savings["Cost and ops levers"]
    L1["On-demand vs provisioned throughput"]
    L2["Model size vs quality"]
    L3["Caching and prompt design"]
    L4["Region and redundancy choices"]
  end
        
Exam pattern: Pick the lever—performance vs cost, availability, data residency, token pricing—for a scenario.
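One of the levers above—caching—is easy to see in miniature. The sketch memoizes responses for repeated identical prompts; `call_model` is a hypothetical stand-in for a billable FM invocation, and real systems often go further with semantic caching of near-duplicate prompts.

```python
import functools

calls = 0  # counts actual (billable) model invocations

def call_model(prompt: str) -> str:
    """Placeholder for a paid FM API call (hypothetical stand-in)."""
    global calls
    calls += 1
    return f"answer to: {prompt}"

@functools.lru_cache(maxsize=1024)
def cached_answer(prompt: str) -> str:
    # Identical prompts hit the cache instead of re-billing tokens.
    return call_model(prompt)

for _ in range(5):
    cached_answer("What is our refund policy?")
print(calls)  # 1 billable call served 5 identical requests
```

The tradeoff to name in a scenario question: caching cuts token spend and latency for repeated queries, at the cost of potentially stale answers and cache-invalidation complexity.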

Definitions

Shared responsibility model
AWS secures the cloud infrastructure; customers secure what they put in the cloud (data, IAM, app config)—applies to GenAI workloads.
Regional coverage
Choosing an AWS Region for data residency, latency, and which Bedrock models/features are available there.
Redundancy / availability
Design for fault tolerance (Multi-AZ, retries); higher resilience generally increases cost.
Custom model on Bedrock
Fine-tuned or continued-pretrained variants with their own hosting and pricing profile—exam may contrast with base on-demand inference.
Reference

Master glossary (Domain 2)

Quick-review sheet aligned to Domain 2 task statements.

GenAI · Prompt · Token · Completion
Produce new content; steering text; billing/read unit; model output stream.
Chunking · Embedding · Vector
Split long content; semantic numeric representation; similarity search unit.
Transformer · LLM · Foundation model
Attention-based architecture; large text model; broad base for many tasks.
Prompt engineering · Context window
In-context control; max tokens per call.
Multimodal · Diffusion
Multiple input/output types; denoising generative process.
Pre-training · Fine-tuning · RLHF
General training; specialization; human-preference alignment.
Hallucination · Nondeterminism
False fluency; varying outputs at same prompt.
Latency · Throughput · Token pricing · Provisioned throughput
Speed; scale; pay per token; reserved capacity.
Bedrock · JumpStart · PartyRock · Amazon Q
FM service; curated SageMaker starters; playground; assistant UX.

Scroll if needed · Next: recap and Domain 3 preview

Recap · Self-check

Before Domain 3 (Foundation model apps)

Official: AIF-C01 exam guide · Domain 3 covers RAG, agents, prompts in depth

1 / 13