AWS Certified AI Practitioner · Domain 3 · ~28%

Applications of Foundation Models

FM selection, inference controls, RAG, vectors on AWS, agents, prompt design, customization, and evaluation—per AIF-C01. Definitions on each topic slide plus a glossary.

Navigation: ← → · Space · Home / End

3.1 · Design

Choosing a foundation model

Trade off task modality, quality vs size, latency, context length, languages, customization needs, compliance, and cost.

flowchart TB
  START[Business task] --> M[Modality text image code]
  M --> L[Latency and throughput]
  L --> C[Context length and cost]
  C --> G[Governance compliance]
  G --> CH[Choose FM and path RAG FT etc]
        
Exam tip: Scenario questions often bundle “pick the cheapest acceptable latency model” with “need long documents” or “must stay in Region X.”

Definitions

Modality
Input/output type the model supports: text, image, embeddings, etc.—must match your use case.
Model size / complexity
Larger models can be more capable but cost more and may run slower—balance against SLAs.
Context length
Max prompt + completion tokens in one call; drives how much document text fits without retrieval.
Customization need
Whether prompts alone suffice or you need RAG, fine-tuning, or agents for accuracy and tools.
3.1 · Inference

Inference parameters

Temperature and top-p / top-k control randomness. Max output length caps tokens generated. Together they shape creativity vs determinism and cost.

flowchart LR
  subgraph ctrl["Controls"]
    TEM["Temperature low focused high creative"]
    LEN["Max output length"]
  end
  ctrl --> FM[Foundation model]
  FM --> OUT[Completion]
        

Definitions

Temperature
Sampling sharpness: lower values make outputs more deterministic; higher values increase diversity and hallucination risk.
Top-p / top-k
Truncate the sampling distribution to likely tokens—another way to tune randomness and quality.
Max output tokens
Hard cap on generated length; affects completeness of answers and token spend.
Stop sequences
Delimiters that halt generation early for structured formats.
3.1 · RAG

Retrieval-Augmented Generation

RAG retrieves relevant chunks from a knowledge base, injects them into the prompt, then the FM answers—improving factual grounding versus “parametric memory” alone.

flowchart LR
  Q[User query] --> E[Embed query]
  E --> VS[Vector search]
  VS --> CH[Top chunks]
  CH --> FM[FM with context]
  FM --> A[Answer]
        

On AWS, Amazon Bedrock Knowledge Bases orchestrates ingestion, embeddings, and retrieval for Bedrock models.

Definitions

RAG
Pattern: retrieve evidence, augment the prompt, generate—reduces reliance on stale baked-in weights for facts.
Knowledge base
Curated document store plus retrieval pipeline (often vector search) feeding the model.
Grounding
Answering from retrieved sources; still requires citation discipline and evaluation.
3.1 · Storage

Embeddings and vector options on AWS

The exam names services that can back vector search for embeddings. Choice depends on ops model, scale, and existing data platforms.

flowchart TB
  EMB[Embedding vectors] --> VS[Vector store layer]
  VS --> O[OpenSearch Aurora RDS PG Neptune DocumentDB]
        

Definitions

Vector database / index
Storage and indexing optimized for similarity search over embeddings at scale.
Hybrid search
Combine keyword filters with vector similarity for better precision in enterprise corpora.
3.1 · Customization

Cost / complexity tradeoffs

From cheapest/fastest to change to most invasive: prompting (in-context) → RAGfine-tuning / continued pre-training → train-from-scratch (rare for practitioners).

flowchart LR
  ICL["In-context prompts"] --> RAG[RAG + tools]
  RAG --> FT[Fine-tuning custom model]
  FT --> CPT["Continuous pre-training"]
        

Definitions

In-context learning
Steer behavior via instructions and examples in the prompt—no weight updates.
Fine-tuning
Train further on curated task data to specialize the model; higher cost and MLOps burden.
Continuous pre-training
Broadly refresh model knowledge on new corpora—more capacity than fine-tuning, more risk.
3.1 · Agents

Agents for multi-step tasks

Agents (e.g. Agents for Amazon Bedrock) plan and execute sequences: call APIs, query knowledge bases, then let the FM reason over results.

flowchart TB
  G[User goal] --> P[Planner FM]
  P --> KB[Knowledge base retrieval]
  P --> API[Actions and APIs]
  KB --> P
  API --> P
  P --> R[Final response]
        

Definitions

Agent
Orchestration pattern: model decides which tool or retrieval step comes next until the task completes.
Tool / action
An allowed capability (API, database lookup, calculator) the agent can invoke with structured parameters.
ReAct-style loop (conceptual)
Alternate reasoning traces with actions; common pattern in exam discussions of agents.
3.2 · Prompts

Prompt engineering techniques

Use clear instructions, context, formats, and optional examples. Advanced patterns include chain-of-thought for multi-step reasoning (used judiciously).

flowchart LR
  Z[Zero-shot instruction only] --> F[Few-shot with examples]
  F --> COT[Chain-of-thought]
  COT --> TMP[Prompt templates and roles]
        

Definitions

Zero-shot
Ask the task with no labeled examples in the prompt.
Few-shot
Provide small exemplar input/output pairs to show the desired pattern.
Negative prompt
Tell the model what not to do—helps reduce unwanted behaviors or formats.
Prompt template
Reusable scaffold with slots for user input, policies, and structured output instructions.
3.2 · Risks

Prompt attacks and data risks

Production systems must mitigate prompt injection, jailbreaks, training-data poisoning, and unintended data exposure in logs or downstream tools.

flowchart TB
  ATK[Untrusted user input] --> R1[Injection overrides system policy]
  ATK --> R2[Jailbreak evades guardrails]
  ATK --> R3[Poisoned docs hurt RAG]
        

Mitigations include Guardrails for Amazon Bedrock, allow-listed tools, least-privilege IAM, sanitization, and human review for sensitive flows.

Definitions

Prompt injection
Malicious text that hijacks model behavior or leaks secrets by overriding instructions.
Jailbreak
Attempts to bypass safety policies or elicit disallowed outputs.
Data poisoning
Corrupting training or retrieval corpora to bias or sabotage outputs.
3.3 · Training

Fine-tuning and data preparation

Instruction tuning aligns models to follow directions. Domain adaptation narrows vocabulary and style. Data must be curated, representative, labeled, and governed; RLHF can align outputs with human preference.

flowchart LR
  CUR[Curate and label data] --> GOV[Governance and PII review]
  GOV --> FT[Fine-tune or instruction tune]
  FT --> EV[Evaluate on holdout]
        

Definitions

Instruction tuning
Supervised fine-tune on instruction–response pairs to improve helpfulness and format adherence.
Domain adaptation
Specialize vocabulary and reasoning patterns for a vertical (legal, finance, internal jargon).
Representative data
Training or RAG corpora should mirror production demographics and edge cases to limit bias surprises.
3.4 · Evaluation

How you know the FM fits the business

Combine automatic metrics, benchmarks, and human evaluation. Map scores to outcomes: productivity, error rate, CSAT—not leaderboard chasing alone.

flowchart LR
  AUTO[ROUGE BLEU BERTScore] --> H[Human rubric eval]
  H --> BIZ[Business KPIs ROI latency cost]
        

Definitions

Human evaluation
Annotators score helpfulness, correctness, safety—gold standard for subjective tasks.
Benchmark dataset
Standardized tasks for comparing models; watch for mismatch with your domain.
Business fit
Whether measured gains justify operational risk, cost, and governance overhead.
Reference

Domain 3 master glossary

FM selection
Modality, latency, context, cost, compliance, customization.
Inference knobs
Temperature, top-p/k, max tokens, stop sequences.
RAG · Knowledge base
Retrieve, augment prompt, generate; Bedrock Knowledge Bases.
Vectors on AWS
OpenSearch, Aurora/RDS Postgres vectors, Neptune, DocumentDB—match to architecture questions.
Customization ladder
Prompt → RAG → fine-tune → continued pre-training.
Agents
Multi-step planning with tools and retrieval (Agents for Amazon Bedrock).
Prompting
Zero/few-shot, CoT, templates, negatives; watch injection/jailbreak risks.
Evaluation
ROUGE, BLEU, BERTScore, humans, business KPIs.

Longest weighted domain on the exam—review scenarios that combine RAG + guardrails + cost.

Recap

Self-check before Domain 4

Next domain: Responsible AI (Domain 4) per AIF-C01 exam guide

1 / 13