AWS Certified AI Practitioner · Domain 3 · ~28%
Applications of Foundation Models
FM selection, inference controls, RAG, vectors on AWS, agents, prompt design, customization, and evaluation—per AIF-C01. Definitions on each topic slide plus a glossary.
Navigation: ← → · Space · Home / End
3.1 · Design
Choosing a foundation model
Trade off task modality, quality vs size, latency, context length, languages, customization needs, compliance, and cost.
flowchart TB
START[Business task] --> M[Modality text image code]
M --> L[Latency and throughput]
L --> C[Context length and cost]
C --> G[Governance compliance]
G --> CH[Choose FM and path RAG FT etc]
Exam tip: Scenario questions often bundle “pick the cheapest acceptable latency model” with “need long documents” or “must stay in Region X.”
Definitions
- Modality
- Input/output type the model supports: text, image, embeddings, etc.—must match your use case.
- Model size / complexity
- Larger models can be more capable but cost more and may run slower—balance against SLAs.
- Context length
- Max prompt + completion tokens in one call; drives how much document text fits without retrieval.
- Customization need
- Whether prompts alone suffice or you need RAG, fine-tuning, or agents for accuracy and tools.
3.1 · Inference
Inference parameters
Temperature and top-p / top-k control randomness. Max output length caps tokens generated. Together they shape creativity vs determinism and cost.
flowchart LR
subgraph ctrl["Controls"]
TEM["Temperature low focused high creative"]
LEN["Max output length"]
end
ctrl --> FM[Foundation model]
FM --> OUT[Completion]
Definitions
- Temperature
- Sampling sharpness: lower values make outputs more deterministic; higher values increase diversity and hallucination risk.
- Top-p / top-k
- Truncate the sampling distribution to likely tokens—another way to tune randomness and quality.
- Max output tokens
- Hard cap on generated length; affects completeness of answers and token spend.
- Stop sequences
- Delimiters that halt generation early for structured formats.
3.1 · RAG
Retrieval-Augmented Generation
RAG retrieves relevant chunks from a knowledge base, injects them into the prompt, then the FM answers—improving factual grounding versus “parametric memory” alone.
flowchart LR
Q[User query] --> E[Embed query]
E --> VS[Vector search]
VS --> CH[Top chunks]
CH --> FM[FM with context]
FM --> A[Answer]
On AWS, Amazon Bedrock Knowledge Bases orchestrates ingestion, embeddings, and retrieval for Bedrock models.
Definitions
- RAG
- Pattern: retrieve evidence, augment the prompt, generate—reduces reliance on stale baked-in weights for facts.
- Knowledge base
- Curated document store plus retrieval pipeline (often vector search) feeding the model.
- Grounding
- Answering from retrieved sources; still requires citation discipline and evaluation.
3.1 · Storage
Embeddings and vector options on AWS
The exam names services that can back vector search for embeddings. Choice depends on ops model, scale, and existing data platforms.
- Amazon OpenSearch Service — k-NN / vector engine patterns
- Amazon Aurora PostgreSQL / Amazon RDS for PostgreSQL — pgvector-style extensions (per product docs)
- Amazon Neptune — graph + vector workloads where relationships matter
- Amazon DocumentDB (with MongoDB compatibility) — document + vector use cases where applicable
flowchart TB
EMB[Embedding vectors] --> VS[Vector store layer]
VS --> O[OpenSearch Aurora RDS PG Neptune DocumentDB]
Definitions
- Vector database / index
- Storage and indexing optimized for similarity search over embeddings at scale.
- Hybrid search
- Combine keyword filters with vector similarity for better precision in enterprise corpora.
3.1 · Customization
Cost / complexity tradeoffs
From cheapest/fastest to change to most invasive: prompting (in-context) → RAG → fine-tuning / continued pre-training → train-from-scratch (rare for practitioners).
flowchart LR
ICL["In-context prompts"] --> RAG[RAG + tools]
RAG --> FT[Fine-tuning custom model]
FT --> CPT["Continuous pre-training"]
Definitions
- In-context learning
- Steer behavior via instructions and examples in the prompt—no weight updates.
- Fine-tuning
- Train further on curated task data to specialize the model; higher cost and MLOps burden.
- Continuous pre-training
- Broadly refresh model knowledge on new corpora—more capacity than fine-tuning, more risk.
3.1 · Agents
Agents for multi-step tasks
Agents (e.g. Agents for Amazon Bedrock) plan and execute sequences: call APIs, query knowledge bases, then let the FM reason over results.
flowchart TB
G[User goal] --> P[Planner FM]
P --> KB[Knowledge base retrieval]
P --> API[Actions and APIs]
KB --> P
API --> P
P --> R[Final response]
Definitions
- Agent
- Orchestration pattern: model decides which tool or retrieval step comes next until the task completes.
- Tool / action
- An allowed capability (API, database lookup, calculator) the agent can invoke with structured parameters.
- ReAct-style loop (conceptual)
- Alternate reasoning traces with actions; common pattern in exam discussions of agents.
3.2 · Prompts
Prompt engineering techniques
Use clear instructions, context, formats, and optional examples. Advanced patterns include chain-of-thought for multi-step reasoning (used judiciously).
flowchart LR
Z[Zero-shot instruction only] --> F[Few-shot with examples]
F --> COT[Chain-of-thought]
COT --> TMP[Prompt templates and roles]
Definitions
- Zero-shot
- Ask the task with no labeled examples in the prompt.
- Few-shot
- Provide small exemplar input/output pairs to show the desired pattern.
- Negative prompt
- Tell the model what not to do—helps reduce unwanted behaviors or formats.
- Prompt template
- Reusable scaffold with slots for user input, policies, and structured output instructions.
3.2 · Risks
Prompt attacks and data risks
Production systems must mitigate prompt injection, jailbreaks, training-data poisoning, and unintended data exposure in logs or downstream tools.
flowchart TB
ATK[Untrusted user input] --> R1[Injection overrides system policy]
ATK --> R2[Jailbreak evades guardrails]
ATK --> R3[Poisoned docs hurt RAG]
Mitigations include Guardrails for Amazon Bedrock, allow-listed tools, least-privilege IAM, sanitization, and human review for sensitive flows.
Definitions
- Prompt injection
- Malicious text that hijacks model behavior or leaks secrets by overriding instructions.
- Jailbreak
- Attempts to bypass safety policies or elicit disallowed outputs.
- Data poisoning
- Corrupting training or retrieval corpora to bias or sabotage outputs.
3.3 · Training
Fine-tuning and data preparation
Instruction tuning aligns models to follow directions. Domain adaptation narrows vocabulary and style. Data must be curated, representative, labeled, and governed; RLHF can align outputs with human preference.
flowchart LR
CUR[Curate and label data] --> GOV[Governance and PII review]
GOV --> FT[Fine-tune or instruction tune]
FT --> EV[Evaluate on holdout]
Definitions
- Instruction tuning
- Supervised fine-tune on instruction–response pairs to improve helpfulness and format adherence.
- Domain adaptation
- Specialize vocabulary and reasoning patterns for a vertical (legal, finance, internal jargon).
- Representative data
- Training or RAG corpora should mirror production demographics and edge cases to limit bias surprises.
3.4 · Evaluation
How you know the FM fits the business
Combine automatic metrics, benchmarks, and human evaluation. Map scores to outcomes: productivity, error rate, CSAT—not leaderboard chasing alone.
- ROUGE — overlap-oriented summarization quality
- BLEU — n-gram overlap; common in machine translation history
- BERTScore — semantic similarity using contextual embeddings
flowchart LR
AUTO[ROUGE BLEU BERTScore] --> H[Human rubric eval]
H --> BIZ[Business KPIs ROI latency cost]
Definitions
- Human evaluation
- Annotators score helpfulness, correctness, safety—gold standard for subjective tasks.
- Benchmark dataset
- Standardized tasks for comparing models; watch for mismatch with your domain.
- Business fit
- Whether measured gains justify operational risk, cost, and governance overhead.
Reference
Domain 3 master glossary
- FM selection
- Modality, latency, context, cost, compliance, customization.
- Inference knobs
- Temperature, top-p/k, max tokens, stop sequences.
- RAG · Knowledge base
- Retrieve, augment prompt, generate; Bedrock Knowledge Bases.
- Vectors on AWS
- OpenSearch, Aurora/RDS Postgres vectors, Neptune, DocumentDB—match to architecture questions.
- Customization ladder
- Prompt → RAG → fine-tune → continued pre-training.
- Agents
- Multi-step planning with tools and retrieval (Agents for Amazon Bedrock).
- Prompting
- Zero/few-shot, CoT, templates, negatives; watch injection/jailbreak risks.
- Evaluation
- ROUGE, BLEU, BERTScore, humans, business KPIs.
Longest weighted domain on the exam—review scenarios that combine RAG + guardrails + cost.
Recap
Self-check before Domain 4
- List five criteria for picking an FM and tie each to a business constraint
- Explain temperature and max output length tradeoffs
- Draw RAG mentally: query → embed → retrieve → prompt → answer
- Name two AWS vector-capable services and when you might pick each (high level)
- Contrast prompt-only vs RAG vs fine-tuning for a factual Q&A scenario
- Define jailbreak vs prompt injection; name one AWS mitigation feature
- Name three evaluation metrics and one business metric for the same use case
Next domain: Responsible AI (Domain 4) per AIF-C01 exam guide