AWS Certified AI Practitioner · Domain 3 · ~28%

Applications of Foundation Models

FM selection, inference controls, RAG, vectors on AWS, agents, prompt design, customization, and evaluation—per AIF-C01. Definitions on each topic slide plus a glossary.

Navigation: ← → · Space · Home / End

3.1 · Design

Choosing a foundation model

Trade off task modality, quality vs size, latency, context length, languages, customization needs, compliance, and cost.

flowchart TB
  START[Business task] --> M[Modality text image code]
  M --> L[Latency and throughput]
  L --> C[Context length and cost]
  C --> G[Governance compliance]
  G --> CH[Choose FM and path RAG FT etc]

Exam tip: Scenario questions often bundle “pick the cheapest acceptable latency model” with “need long documents” or “must stay in Region X.”

Definitions

Modality: Input/output type the model supports: text, image, embeddings, etc.—must match your use case.
Model size / complexity: Larger models can be more capable but cost more and may run slower—balance against SLAs.
Context length: Max prompt + completion tokens in one call; drives how much document text fits without retrieval.
Customization need: Whether prompts alone suffice or you need RAG, fine-tuning, or agents for accuracy and tools.

3.1 · Inference

Inference parameters

Temperature and top-p / top-k control randomness. Max output length caps tokens generated. Together they shape creativity vs determinism and cost.

flowchart LR
  subgraph ctrl["Controls"]
    TEM["Temperature low focused high creative"]
    LEN["Max output length"]
  end
  ctrl --> FM[Foundation model]
  FM --> OUT[Completion]

Definitions

Temperature: Sampling sharpness: lower values make outputs more deterministic; higher values increase diversity and hallucination risk.
Top-p / top-k: Truncate the sampling distribution to likely tokens—another way to tune randomness and quality.
Max output tokens: Hard cap on generated length; affects completeness of answers and token spend.
Stop sequences: Delimiters that halt generation early for structured formats.

3.1 · RAG

Retrieval-Augmented Generation

RAG retrieves relevant chunks from a knowledge base, injects them into the prompt, then the FM answers—improving factual grounding versus “parametric memory” alone.

flowchart LR
  Q[User query] --> E[Embed query]
  E --> VS[Vector search]
  VS --> CH[Top chunks]
  CH --> FM[FM with context]
  FM --> A[Answer]

On AWS, Amazon Bedrock Knowledge Bases orchestrates ingestion, embeddings, and retrieval for Bedrock models.

Definitions

RAG: Pattern: retrieve evidence, augment the prompt, generate—reduces reliance on stale baked-in weights for facts.
Knowledge base: Curated document store plus retrieval pipeline (often vector search) feeding the model.
Grounding: Answering from retrieved sources; still requires citation discipline and evaluation.

3.1 · Storage

Embeddings and vector options on AWS

The exam names services that can back vector search for embeddings. Choice depends on ops model, scale, and existing data platforms.

Amazon OpenSearch Service — k-NN / vector engine patterns
Amazon Aurora PostgreSQL / Amazon RDS for PostgreSQL — pgvector-style extensions (per product docs)
Amazon Neptune — graph + vector workloads where relationships matter
Amazon DocumentDB (with MongoDB compatibility) — document + vector use cases where applicable

flowchart TB
  EMB[Embedding vectors] --> VS[Vector store layer]
  VS --> O[OpenSearch Aurora RDS PG Neptune DocumentDB]

Definitions

Vector database / index: Storage and indexing optimized for similarity search over embeddings at scale.
Hybrid search: Combine keyword filters with vector similarity for better precision in enterprise corpora.

3.1 · Customization

Cost / complexity tradeoffs

From cheapest/fastest to change to most invasive: prompting (in-context) → RAG → fine-tuning / continued pre-training → train-from-scratch (rare for practitioners).

flowchart LR
  ICL["In-context prompts"] --> RAG[RAG + tools]
  RAG --> FT[Fine-tuning custom model]
  FT --> CPT["Continuous pre-training"]

Definitions

In-context learning: Steer behavior via instructions and examples in the prompt—no weight updates.
Fine-tuning: Train further on curated task data to specialize the model; higher cost and MLOps burden.
Continuous pre-training: Broadly refresh model knowledge on new corpora—more capacity than fine-tuning, more risk.

3.1 · Agents

Agents for multi-step tasks

Agents (e.g. Agents for Amazon Bedrock) plan and execute sequences: call APIs, query knowledge bases, then let the FM reason over results.

flowchart TB
  G[User goal] --> P[Planner FM]
  P --> KB[Knowledge base retrieval]
  P --> API[Actions and APIs]
  KB --> P
  API --> P
  P --> R[Final response]

Definitions

Agent: Orchestration pattern: model decides which tool or retrieval step comes next until the task completes.
Tool / action: An allowed capability (API, database lookup, calculator) the agent can invoke with structured parameters.
ReAct-style loop (conceptual): Alternate reasoning traces with actions; common pattern in exam discussions of agents.

3.2 · Prompts

Prompt engineering techniques

Use clear instructions, context, formats, and optional examples. Advanced patterns include chain-of-thought for multi-step reasoning (used judiciously).

flowchart LR
  Z[Zero-shot instruction only] --> F[Few-shot with examples]
  F --> COT[Chain-of-thought]
  COT --> TMP[Prompt templates and roles]

Definitions

Zero-shot: Ask the task with no labeled examples in the prompt.
Few-shot: Provide small exemplar input/output pairs to show the desired pattern.
Negative prompt: Tell the model what not to do—helps reduce unwanted behaviors or formats.
Prompt template: Reusable scaffold with slots for user input, policies, and structured output instructions.

3.2 · Risks

Prompt attacks and data risks

Production systems must mitigate prompt injection, jailbreaks, training-data poisoning, and unintended data exposure in logs or downstream tools.

flowchart TB
  ATK[Untrusted user input] --> R1[Injection overrides system policy]
  ATK --> R2[Jailbreak evades guardrails]
  ATK --> R3[Poisoned docs hurt RAG]

Mitigations include Guardrails for Amazon Bedrock, allow-listed tools, least-privilege IAM, sanitization, and human review for sensitive flows.

Definitions

Prompt injection: Malicious text that hijacks model behavior or leaks secrets by overriding instructions.
Jailbreak: Attempts to bypass safety policies or elicit disallowed outputs.
Data poisoning: Corrupting training or retrieval corpora to bias or sabotage outputs.

3.3 · Training

Fine-tuning and data preparation

Instruction tuning aligns models to follow directions. Domain adaptation narrows vocabulary and style. Data must be curated, representative, labeled, and governed; RLHF can align outputs with human preference.

flowchart LR
  CUR[Curate and label data] --> GOV[Governance and PII review]
  GOV --> FT[Fine-tune or instruction tune]
  FT --> EV[Evaluate on holdout]

Definitions

Instruction tuning: Supervised fine-tune on instruction–response pairs to improve helpfulness and format adherence.
Domain adaptation: Specialize vocabulary and reasoning patterns for a vertical (legal, finance, internal jargon).
Representative data: Training or RAG corpora should mirror production demographics and edge cases to limit bias surprises.

3.4 · Evaluation

How you know the FM fits the business

Combine automatic metrics, benchmarks, and human evaluation. Map scores to outcomes: productivity, error rate, CSAT—not leaderboard chasing alone.

ROUGE — overlap-oriented summarization quality
BLEU — n-gram overlap; common in machine translation history
BERTScore — semantic similarity using contextual embeddings

flowchart LR
  AUTO[ROUGE BLEU BERTScore] --> H[Human rubric eval]
  H --> BIZ[Business KPIs ROI latency cost]

Definitions

Human evaluation: Annotators score helpfulness, correctness, safety—gold standard for subjective tasks.
Benchmark dataset: Standardized tasks for comparing models; watch for mismatch with your domain.
Business fit: Whether measured gains justify operational risk, cost, and governance overhead.

Reference

Domain 3 master glossary

FM selection: Modality, latency, context, cost, compliance, customization.
Inference knobs: Temperature, top-p/k, max tokens, stop sequences.
RAG · Knowledge base: Retrieve, augment prompt, generate; Bedrock Knowledge Bases.
Vectors on AWS: OpenSearch, Aurora/RDS Postgres vectors, Neptune, DocumentDB—match to architecture questions.
Customization ladder: Prompt → RAG → fine-tune → continued pre-training.
Agents: Multi-step planning with tools and retrieval (Agents for Amazon Bedrock).
Prompting: Zero/few-shot, CoT, templates, negatives; watch injection/jailbreak risks.
Evaluation: ROUGE, BLEU, BERTScore, humans, business KPIs.

Longest weighted domain on the exam—review scenarios that combine RAG + guardrails + cost.

Recap

Self-check before Domain 4

List five criteria for picking an FM and tie each to a business constraint
Explain temperature and max output length tradeoffs
Draw RAG mentally: query → embed → retrieve → prompt → answer
Name two AWS vector-capable services and when you might pick each (high level)
Contrast prompt-only vs RAG vs fine-tuning for a factual Q&A scenario
Define jailbreak vs prompt injection; name one AWS mitigation feature
Name three evaluation metrics and one business metric for the same use case

Next domain: Responsible AI (Domain 4) per AIF-C01 exam guide