Embeddings and vector databases are the pair of concepts that unlock almost every practical generative AI pattern on AWS, from semantic search to retrieval-augmented generation. An embedding is a high-dimensional numeric vector that captures the semantic meaning of a piece of text, an image, or any other input — two pieces of content that mean similar things end up close together in that vector space. A vector database is a data store built specifically to index, filter, and perform approximate nearest neighbor search across millions or billions of these vectors with low latency. For AIF-C01 task statement 2.1, you must recognise what embeddings and vector databases are, which similarity metric applies to which scenario, and which AWS service provides which vector database capability. This topic walks through the theory, the core AWS options — Amazon OpenSearch Service, Aurora PostgreSQL with pgvector, Amazon MemoryDB for Redis, Amazon DocumentDB, Amazon Kendra, and Amazon Bedrock Knowledge Bases — and closes with traps, memorisation cues, and FAQs calibrated to the AIF-C01 blueprint.
What Are Embeddings and Vector Databases?
Embeddings and vector databases form a two-part system. The embedding model is the encoder that turns raw input — a sentence, a paragraph, a product description, a document chunk, even an image — into a fixed-length array of floating-point numbers. That array, the embedding, lives in a high-dimensional space (typically 256 to 4096 dimensions on modern models) where proximity encodes semantic similarity. The vector database is the storage and retrieval engine that holds these embeddings alongside their source metadata and answers one basic question fast: "given a query vector, which stored vectors are closest to it?"
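To make the geometry concrete, here is a minimal sketch in plain Python. The four-dimensional vectors and the document texts are invented for illustration; real models such as Amazon Titan Text Embeddings V2 produce hundreds to thousands of dimensions, and a real vector database replaces this linear scan with an ANN index.

```python
import math

# Toy 4-dimensional "embeddings" -- real models produce 256-1024+ dimensions,
# but the lookup logic is the same: find the stored vectors nearest the query.
store = {
    "reset my password":      [0.9, 0.1, 0.0, 0.1],
    "account recovery steps": [0.8, 0.2, 0.1, 0.1],
    "office parking policy":  [0.0, 0.1, 0.9, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query_vec, k=2):
    # The one question a vector database answers fast:
    # "given a query vector, which stored vectors are closest to it?"
    ranked = sorted(store, key=lambda doc: cosine(query_vec, store[doc]), reverse=True)
    return ranked[:k]

query = [0.85, 0.15, 0.05, 0.1]  # pretend this encodes "forgot my login"
print(nearest(query))  # the two password-related documents rank first
```

Even though "forgot my login" shares no keywords with "reset my password", their vectors point in nearly the same direction, so semantic retrieval succeeds where keyword matching would fail.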
Embeddings and vector databases matter for the AIF-C01 exam because generative AI applications on AWS almost universally depend on them. When you build a chatbot that answers questions from your company's internal wiki, Amazon Bedrock Knowledge Bases invokes an embedding model (Amazon Titan Text Embeddings or Cohere Embed) on every document chunk, stores the embeddings in a vector database (OpenSearch Serverless, Aurora PostgreSQL pgvector, Amazon MemoryDB for Redis, or Amazon Neptune Analytics), and at query time retrieves the top relevant chunks via approximate nearest neighbor search before handing them to a foundation model. Without embeddings and vector databases, the foundation model cannot ground its answer in your private data.
Why Embeddings and Vector Databases Exist
Before embeddings, semantic search was approximated with keyword matching, synonym expansion, and hand-tuned relevance functions such as BM25. Those approaches break whenever the query and the document describe the same concept using different words — "how do I reset my password?" versus "account credentials recovery procedure". Embeddings solve this mismatch because the encoder learns, from billions of text examples, that both sentences should be mapped to similar vectors. Vector databases exist because naive brute-force similarity search does not scale: comparing a query against a billion stored vectors one-by-one would take minutes. Approximate nearest neighbor algorithms, packaged inside dedicated vector database engines, trade a sliver of recall for one-thousand-fold speed-ups, making embeddings and vector databases practical at production scale.
Where This Topic Sits in the AIF-C01 Blueprint
Task statement 2.1 of the AWS AIF-C01 exam guide asks you to explain the basic concepts of generative AI. Embeddings and vector databases are explicitly listed as foundational concepts inside Domain 2 (Fundamentals of Generative AI, 24% of the exam). The same concepts also appear in task statement 3.1 (design considerations for applications that use foundation models) because choosing the right AWS vector database is a design decision. Expect scenario questions that ask which embedding model to use, which similarity metric fits a use case, and which AWS service is the right vector database for a given constraint.
Embeddings and Vector Databases Explained in Plain Language

If the math of high-dimensional vector spaces feels abstract, these three analogies reframe embeddings and vector databases in everyday terms. Pick whichever one sticks in your memory when you see a scenario question on exam day.
Analogy 1: The Library GPS Coordinate
Imagine every book in a giant library has been assigned GPS coordinates — not based on shelf position, but based on what the book is about. A romance novel sits near Paris, a thriller near New York, a cookbook near Tuscany, and a science textbook near Geneva. The "location" is the embedding: a set of numbers that places meaning in space. If you ask the librarian "find me something similar to this thriller", the librarian does not re-read every book — she looks at the query's coordinates and returns the nearest neighbours on her map. The vector database is that map plus the index that makes neighbour lookup fast. Amazon Titan Text Embeddings and Cohere Embed are the mapmakers who assign the coordinates. Amazon OpenSearch Service, Aurora PostgreSQL pgvector, and Amazon MemoryDB for Redis are competing map-storage services, each with different sizes, indexes, and price tags.
Analogy 2: The Kitchen Spice Rack
Think of embeddings as the flavour profile of each ingredient. A chef does not compare ingredients by name — "tomato" and "sundried tomato" look similar as words but can behave very differently — she compares them by flavour coordinates: acidity, sweetness, umami, pungency, aroma family. Those coordinates are the embedding. The vector database is the organised spice rack where ingredients are pre-sorted so the chef can grab "anything tangy and herbaceous" in seconds instead of tasting every jar. When the chef (your foundation model) needs to cook (generate) a specific dish (answer), she grabs the three closest ingredients from the rack and combines them. Amazon Bedrock Knowledge Bases is the automated kitchen that runs the entire workflow: taste every ingredient (embed), organise the rack (store vectors), fetch the relevant ones on demand (retrieve), and plate the final dish (generate). Amazon Kendra is the experienced sous-chef who already knows the pantry by heart and uses both the flavour map and the written labels (keyword + semantic hybrid) to find ingredients.
Analogy 3: The Open-Book Exam Cheat Sheet
Picture your application as a student taking an open-book exam. The textbook is your private knowledge base — every product spec, policy document, and support article your company owns. A classical keyword search is like flipping through the index at the back of the textbook looking for exact word matches; miss the word and you miss the page. Embeddings and vector databases are like having a smart study buddy who has already rewritten every page onto index cards sorted by meaning rather than alphabet. When the exam question arrives, the buddy reads the question's meaning, sprints to the nearest cards, and hands the top five to you. You (the foundation model) read those cards and write a grounded answer. Approximate nearest neighbor algorithms such as HNSW are the buddy's shortcut — instead of checking every card, she follows a hierarchical set of bookmarks that jump her to the right neighbourhood instantly. Amazon Bedrock Knowledge Bases hires the study buddy, the cards, and the bookmarks for you.
Core Operating Principles: How Embeddings Are Built
The embedding model is a neural network — typically a transformer — trained to map inputs into a vector space where semantic similarity corresponds to geometric proximity. The training objective, called contrastive learning in most modern embedding models, pulls semantically related pairs closer together and pushes unrelated pairs further apart across millions or billions of examples.
How Embedding Models Are Trained
Training data for embedding models typically comes from two kinds of sources: natural pairs that occur in public text (question-and-answer pairs, titles and summaries, paraphrased sentences, multilingual translations), and curated negative examples that teach the model what should be dissimilar. The model outputs a vector, and a loss function — often a variant of InfoNCE or triplet loss — penalises the network when similar pairs land far apart or dissimilar pairs land close together. Over billions of training steps, the vector space reorganises itself so that semantic meaning becomes geometry. Commercial embedding models such as Amazon Titan Text Embeddings V2 and Cohere Embed are pre-trained at scale by AWS and partners; you consume them via Amazon Bedrock and never train them yourself.
Embedding Dimensionality and Storage Cost
Each embedding is a fixed-length float array. Titan Text Embeddings V2 supports 256, 512, and 1024 dimensions; Cohere Embed English v3 produces 1024-dimension vectors; older Titan Embeddings G1 produced 1536 dimensions. Higher dimensionality can capture finer distinctions but costs more to store, transfer, and search. For one million documents, a 1024-dimension float32 embedding occupies roughly 4 GB of raw storage before index overhead. Picking the right embedding dimension is a cost versus recall trade-off that every embeddings and vector databases design must resolve.
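The storage arithmetic above is easy to reproduce. This sketch computes raw float32 footprint for the three Titan V2 dimension options; the helper name is illustrative and index overhead is deliberately excluded:

```python
# Back-of-envelope raw storage for embeddings (ANN index overhead excluded).
# A float32 component costs 4 bytes.
def raw_storage_gb(num_vectors: int, dimensions: int, bytes_per_dim: int = 4) -> float:
    return num_vectors * dimensions * bytes_per_dim / 1e9

# One million documents at Titan Text Embeddings V2's three dimension options:
for dims in (256, 512, 1024):
    print(dims, "dims ->", raw_storage_gb(1_000_000, dims), "GB")
# 1024 dims -> ~4.1 GB raw; dropping to 512 dims halves the footprint
```

Halving the dimension halves storage, transfer, and per-comparison compute, which is why the 256- and 512-dimension options matter for cost-sensitive designs.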
Word, Sentence, and Document Embeddings
Embedding models come in different granularities. Word embeddings (Word2Vec, GloVe, fastText) represent single tokens and are mostly of historical interest. Sentence embeddings summarise a whole sentence into a single vector — the standard for semantic search. Document embeddings average or pool sentence embeddings to represent a paragraph, page, or entire document. Modern embedding models on Amazon Bedrock produce sentence-to-paragraph-level embeddings and handle inputs up to several thousand tokens, which is why chunking strategy matters when you build embeddings and vector databases pipelines.
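Since chunking strategy matters, here is a minimal fixed-size chunker with overlap, sketched over words for readability (production pipelines such as Amazon Bedrock Knowledge Bases chunk by tokens; the sizes and function name are illustrative):

```python
# Minimal fixed-size chunker with overlap. Overlap keeps context that
# straddles a chunk boundary retrievable from both neighbouring chunks.
def chunk(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"word{i}" for i in range(700))
chunks = chunk(doc)
print(len(chunks))  # 700 words at size=300/overlap=50 -> 3 chunks
```

Each chunk, not each document, gets its own embedding, so chunk size directly controls both retrieval granularity and the number of vectors you store.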
An embedding is a dense, fixed-length numeric vector produced by a neural network encoder that captures the semantic meaning of an input — text, image, or audio — such that similar inputs map to nearby vectors in a high-dimensional space. A vector database is a data store optimised to index, filter, and search these vectors via approximate nearest neighbor (ANN) algorithms. On AWS, embeddings and vector databases are provided through Amazon Bedrock (Titan Embeddings, Cohere Embed) plus vector-capable stores including Amazon OpenSearch Service, Amazon Aurora PostgreSQL with pgvector, Amazon MemoryDB for Redis, Amazon DocumentDB, Amazon Neptune Analytics, and the managed Amazon Bedrock Knowledge Bases integration. Reference: https://aws.amazon.com/bedrock/knowledge-bases/
Similarity Metrics: How Vector Databases Compare Vectors
Once you have embeddings, a vector database answers queries by computing a similarity score between the query vector and stored vectors. Three metrics dominate and every AIF-C01 embeddings and vector databases scenario assumes you can distinguish them.
Cosine Similarity
Cosine similarity measures the angle between two vectors, ignoring their magnitude. The score ranges from -1 (opposite direction) to +1 (identical direction), with 0 meaning orthogonal (unrelated). Cosine similarity is the default metric for almost all modern text embedding models, including Amazon Titan Text Embeddings and Cohere Embed, because the models are trained such that direction encodes semantics regardless of how "long" a vector happens to be. If the scenario question says "semantic search over text" without further hints, cosine similarity is the safe assumption.
Dot Product
Dot product (inner product) multiplies corresponding components and sums the results. Unlike cosine similarity, dot product is sensitive to vector magnitude. When embeddings are L2-normalised to unit length — which Titan and Cohere models do by default — cosine similarity and dot product produce identical rankings, but dot product is slightly faster to compute. Some vector databases prefer dot product for performance; if embeddings are not normalised, dot product can over-weight "popular" or longer documents, which is occasionally desirable for recommendation systems but usually not for pure semantic search.
Euclidean Distance (L2)
Euclidean distance measures the straight-line distance between two points in the vector space. Smaller distance means more similar. Euclidean is common for image embeddings and clustering-style workloads but less common for text embeddings, where direction usually matters more than magnitude. Most AWS vector database engines — OpenSearch k-NN, pgvector, MemoryDB — support all three metrics and let you pick at index-creation time.
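The three metrics, and the equivalence of cosine and dot product on normalised vectors, can be verified in a few lines of plain Python (the 2-D example vectors are invented for illustration):

```python
import math

def dot(a, b):       return sum(x * y for x, y in zip(a, b))
def norm(a):         return math.sqrt(dot(a, a))
def cosine(a, b):    return dot(a, b) / (norm(a) * norm(b))
def euclidean(a, b): return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
def l2_normalise(a):
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]   # magnitude 5
b = [0.6, 0.8]   # same direction as a, magnitude 1
c = [4.0, -3.0]  # orthogonal to a

print(round(cosine(a, b), 3))  # 1.0 -- identical direction, magnitude ignored
print(round(cosine(a, c), 3))  # 0.0 -- orthogonal means unrelated
print(round(dot(a, b), 3))     # 5.0 -- dot product is magnitude-sensitive
# After L2-normalisation, cosine and dot product agree exactly:
print(round(dot(l2_normalise(a), l2_normalise(b)), 3))        # 1.0
print(round(euclidean(l2_normalise(a), l2_normalise(b)), 3))  # 0.0
```

Note how cosine scores a and b as identical despite their different lengths, while dot product only agrees once both vectors are normalised to unit length.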
The single most common mistake in embeddings and vector databases design is choosing a similarity metric that does not match what the embedding model was optimised for. Amazon Titan Text Embeddings and Cohere Embed are trained for cosine similarity; using Euclidean distance against them will degrade recall. Always check the model documentation and align the vector index metric to that default. In Amazon OpenSearch Service k-NN, set space_type to cosinesimil; in Aurora pgvector, use the vector_cosine_ops operator class; in Amazon MemoryDB for Redis use the COSINE distance metric. Consistency between embedding model and similarity metric is non-negotiable.
Reference: https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html
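As a concrete example of aligning the index metric, here is a sketch of an OpenSearch k-NN index body expressed as a Python dict. The field names (`embedding`, `source_text`) and dimension are illustrative assumptions; the structural keys (`knn_vector`, `method`, `space_type`) follow the documented k-NN plugin mapping format:

```python
import json

# OpenSearch k-NN index body aligning the index metric (space_type:
# cosinesimil) with a cosine-trained model such as Titan Text Embeddings V2.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,  # must match the embedding model output size
                "method": {
                    "name": "hnsw",               # ANN algorithm
                    "space_type": "cosinesimil",  # cosine similarity
                    "engine": "faiss",
                },
            },
            # Keeping the raw chunk text enables hybrid BM25 + vector queries.
            "source_text": {"type": "text"},
        }
    },
}
print(json.dumps(index_body, indent=2)[:80])
```

If the embedding model's documented default metric changed, only `space_type` would need to change; the rest of the mapping stays the same.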
Why Vector Databases? Approximate Nearest Neighbor at Scale
A naive similarity search computes the distance between the query and every stored vector. For a thousand documents, that is fine. For a billion documents, it is an infrastructure disaster. Vector databases use approximate nearest neighbor (ANN) algorithms that sacrifice a small amount of recall for dramatic speed and cost improvements. AIF-C01 does not require deep mathematical understanding of these algorithms, but it does expect you to recognise their names and know that vector databases use them.
HNSW — Hierarchical Navigable Small World
HNSW is the most widely used ANN algorithm in modern vector databases. It builds a multi-layer graph where higher layers contain fewer, longer-range links and lower layers contain more, shorter-range links. A search traverses the top layer to find a coarse neighbourhood, then descends to refine the match. HNSW delivers excellent recall and latency at moderate memory cost and is the default index type in Amazon OpenSearch Service (faiss engine), Amazon Aurora PostgreSQL pgvector (HNSW indexes since pgvector 0.5.0), and Amazon MemoryDB for Redis (Redis 7.2 vector search).
IVF — Inverted File Index
IVF partitions the vector space into clusters using k-means. At query time, only the nearest clusters are scanned, skipping the majority of vectors. IVF typically uses less memory than HNSW but has slightly lower recall at equivalent latency. Amazon OpenSearch Service supports IVF via the faiss engine.
Lucene Engine
Amazon OpenSearch Service also supports a native Lucene-based k-NN engine, which is the only option that runs inside segments-based Lucene indices alongside full-text fields. Lucene is the right choice when you want hybrid queries combining BM25 keyword matching and vector similarity in one shard, which is a common pattern for enterprise search.
Exact Search as a Baseline
Most vector databases also offer exact (brute-force) search for evaluation and small datasets. Exact search gives 100% recall but scales linearly with corpus size. Use it as the ground truth when tuning ANN parameters, not as a production choice past a few thousand vectors.
Every approximate nearest neighbor algorithm — HNSW, IVF, Lucene k-NN — exposes parameters that trade recall for latency and cost. In HNSW, m controls graph connectivity and ef_construction/ef_search control how thoroughly the graph is traversed. In IVF, nlist sets the number of clusters and nprobe sets how many clusters to scan at query time. Start with the defaults documented by Amazon OpenSearch Service, Aurora pgvector, or Amazon MemoryDB for Redis, benchmark recall@10 against exact search on a held-out query set, then adjust. Embeddings and vector databases performance is a continuum, not a fixed spec.
Reference: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html
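The benchmarking loop described above — exact search as ground truth, recall@k as the score — is simple to sketch. The corpus here is random vectors and the "ANN result" is simulated, purely to show how the metric is computed:

```python
import math
import random

random.seed(7)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_top_k(query, corpus, k):
    # Brute-force baseline: 100% recall, cost linear in corpus size.
    return sorted(range(len(corpus)),
                  key=lambda i: cosine(query, corpus[i]), reverse=True)[:k]

def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true top-k neighbours the ANN index actually returned.
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

corpus = [[random.random() for _ in range(8)] for _ in range(200)]
query = [random.random() for _ in range(8)]
truth = exact_top_k(query, corpus, k=10)

# Simulated ANN result that found 8 of the 10 true neighbours:
misses = [i for i in range(200) if i not in truth][:2]
approx = truth[:8] + misses
print(recall_at_k(approx, truth))  # 0.8
```

In practice you would replace the simulated `approx` with real ANN query results, then raise `ef_search` or `nprobe` until recall@10 meets your target.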
AWS Vector Database Options — The Full Portfolio
AWS offers more vector database options than any single exam question can cover, but the AIF-C01 blueprint focuses on a core set you must recognise by name and primary use case. The following sections walk through each option in the order you are most likely to see them in scenario questions.
Amazon OpenSearch Service — k-NN Plugin with faiss, nmslib, and Lucene
Amazon OpenSearch Service is the flagship AWS vector database option for teams that also need full-text search, log analytics, or hybrid keyword-plus-semantic queries. The OpenSearch k-NN plugin is built into every cluster and supports three underlying engines: faiss (Facebook AI Similarity Search), nmslib (Non-Metric Space Library), and Lucene. OpenSearch supports HNSW and IVF index types, all three similarity metrics (cosine, dot product, Euclidean), and offers both provisioned domains (Amazon OpenSearch Service) and serverless (Amazon OpenSearch Serverless) deployment modes. OpenSearch Serverless is specifically the default vector store that Amazon Bedrock Knowledge Bases creates when you do not bring your own.
Pick Amazon OpenSearch Service when you need:
- Large-scale vector search across millions to billions of embeddings.
- Hybrid search combining BM25 keyword relevance with vector similarity in one query.
- Rich filtering on structured metadata alongside vector search.
- Integration with Amazon Bedrock Knowledge Bases as the managed vector store.
Amazon Aurora PostgreSQL with pgvector
Amazon Aurora PostgreSQL supports the pgvector extension, which adds a vector column type and index types (ivfflat and HNSW since pgvector 0.5.0) to the relational engine. pgvector turns Aurora into a first-class vector database while preserving full SQL — meaning you can JOIN embeddings against your existing relational tables, enforce foreign keys between vectors and business entities, and run transactional writes.
Pick Amazon Aurora PostgreSQL with pgvector when you need:
- Tight coupling between vectors and existing relational data (customer table, product catalog, orders).
- A single database for both OLTP and vector search to minimise operational surface.
- Familiar PostgreSQL tooling, drivers, and backup/restore workflows.
- An embeddings and vector databases footprint that does not justify a dedicated cluster.
Aurora PostgreSQL with pgvector is an excellent embeddings and vector databases choice for workloads in the single-digit to low-tens-of-millions of vectors, especially when vectors sit alongside relational data. Past that scale, OpenSearch Service — purpose-built for distributed ANN indexes — typically outperforms pgvector on both latency and cost because Aurora's compute scales vertically on one primary instance while OpenSearch scales horizontally across shards. Candidates who default to "Aurora pgvector is always cheapest" fail scenario questions that specify billions of embeddings or sub-100 ms latency at high query concurrency. Reference: https://aws.amazon.com/rds/aurora/ai-ml/
Amazon MemoryDB for Redis — Vector Search on Redis 7.2
Amazon MemoryDB for Redis added vector search capabilities on Redis 7.2, bringing in-memory ANN search with single-digit millisecond latency to the Redis API surface. MemoryDB uses HNSW for indexing and supports cosine, dot product, and Euclidean metrics. Because MemoryDB is a durable primary database (not just a cache), its vector index is persisted across Availability Zones via the same distributed transaction log that backs standard MemoryDB key-value data.
Pick Amazon MemoryDB for Redis when you need:
- The lowest possible query latency for embeddings and vector databases — sub-10 ms even at high throughput.
- A durable in-memory vector store that doubles as the primary application data store.
- Redis-compatible tooling and clients already in your stack.
- Real-time recommendation, semantic caching of LLM responses, or live personalisation.
Amazon ElastiCache for Redis (Redis OSS 7.2 and Valkey 7.2) also supports vector search and is the right choice when you want the same low-latency vector search but your use case is pure caching rather than a durable primary store.
Amazon DocumentDB 5.0 — Vector Search for JSON Documents
Amazon DocumentDB (with MongoDB compatibility) added native vector search in version 5.0, using an HNSW-style index similar in spirit to pgvector. You can store embeddings directly inside MongoDB-style documents alongside their source JSON, and query them with a vectorSearch stage in the familiar aggregation pipeline. For teams whose application already speaks MongoDB and stores content as JSON documents, DocumentDB vector search eliminates the need for a separate vector database tier.
Pick Amazon DocumentDB when you need:
- MongoDB-compatible APIs with embedded vector search in the same cluster.
- Content that is natively document-shaped (product catalogs, articles, user profiles).
- A managed alternative to running MongoDB Atlas Vector Search on self-managed infrastructure.
Amazon Neptune Analytics — Graph Plus Vector Search
Amazon Neptune Analytics is a graph analytics engine that combines graph traversal with vector similarity search. It is the right AWS vector database when your data model is fundamentally relational-as-graph (knowledge graphs, fraud rings, recommendation networks) and you want to retrieve "similar entities connected by a relationship" rather than "similar documents". Neptune Analytics is also a supported vector store option for Amazon Bedrock Knowledge Bases.
Amazon RDS for PostgreSQL with pgvector
If you do not need Amazon Aurora's self-healing storage, Amazon RDS for PostgreSQL also supports the pgvector extension starting from PostgreSQL 15. Feature parity with Aurora pgvector is close, and the choice between RDS for PostgreSQL and Aurora PostgreSQL usually comes down to the broader database sizing and availability requirements, not the embeddings and vector databases features themselves.
Amazon Kendra — Managed Semantic Search (Not a Pure Vector Database)
Amazon Kendra is an intelligent enterprise search service that uses embeddings under the hood but presents a much higher-level API than a vector database. You do not manage embeddings, vector indexes, chunking, or retrieval parameters — you connect data sources (Amazon S3, SharePoint, Confluence, Salesforce, Jira, databases), define access controls, and Kendra returns ranked, semantically relevant passages with highlighted answers. Kendra also supports hybrid semantic plus keyword relevance natively.
Pick Amazon Kendra when you need:
- Enterprise search with connectors to many content sources out of the box.
- Zero infrastructure management for embeddings, vector indexes, or retrieval.
- User-level access control integrated with identity providers.
- Question-answering over documents without building your own RAG stack.
This is one of the highest-frequency AIF-C01 embeddings and vector databases traps. Amazon Kendra uses embeddings internally but does not expose a vector API. You cannot bring your own embeddings, pick an ANN algorithm, or swap similarity metrics. If the scenario says "store and search 50 million embeddings generated from Amazon Titan", Kendra is the wrong answer — pick Amazon OpenSearch Service, Aurora pgvector, or Amazon MemoryDB for Redis instead. Conversely, if the scenario says "managed enterprise search with SharePoint connectors and no embeddings pipeline to build", Kendra is the right answer over a raw vector database. The keyword discriminator is whether the question focuses on vectors (OpenSearch/pgvector/MemoryDB) or on managed search (Kendra). Reference: https://aws.amazon.com/kendra/
Amazon Bedrock Knowledge Bases — Fully Managed RAG with a Vector Store
Amazon Bedrock Knowledge Bases is the end-to-end managed RAG service on AWS. You point it at an S3 bucket, Confluence space, SharePoint site, Salesforce org, or web crawler source, and Bedrock Knowledge Bases automatically chunks the content, invokes an embedding model (Amazon Titan Text Embeddings V2 by default, Cohere Embed as an option), stores the vectors in your chosen vector store, and provides a single API (RetrieveAndGenerate) that handles retrieval, prompt assembly, and foundation model invocation. Supported vector stores include Amazon OpenSearch Serverless (default), Amazon Aurora PostgreSQL with pgvector, Amazon MemoryDB for Redis, Amazon Neptune Analytics, Pinecone, Redis Enterprise Cloud, and MongoDB Atlas.
Pick Amazon Bedrock Knowledge Bases when you need:
- A production RAG pipeline without writing chunking, embedding, indexing, or retrieval code.
- A managed vector store lifecycle (creation, updates, deletions synchronised with source data).
- Native integration with Amazon Bedrock foundation models and Amazon Bedrock Agents.
- The fastest path from "I have documents in S3" to "my chatbot answers from them".
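The single-API workflow can be sketched with the bedrock-agent-runtime client. The knowledge base ID and model ARN below are placeholders, and the network call is shown commented out because it requires AWS credentials and a provisioned knowledge base:

```python
# Sketch of a RetrieveAndGenerate request against a Bedrock knowledge base.
def build_rag_request(kb_id: str, model_arn: str, question: str) -> dict:
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,   # placeholder ID
                "modelArn": model_arn,      # generation model, not the embedder
            },
        },
    }

request = build_rag_request(
    kb_id="EXAMPLEKBID",  # hypothetical knowledge base ID
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
    question="What is our refund policy?",
)
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve_and_generate(**request)
# print(response["output"]["text"])
```

Note that chunking, embedding, vector storage, and retrieval never appear in the calling code; the knowledge base handles all of them behind this one request.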
Embedding Models on Amazon Bedrock
AWS provides two primary families of embedding models through Amazon Bedrock, and AIF-C01 expects you to recognise both.
Amazon Titan Text Embeddings
Amazon Titan Text Embeddings is the AWS first-party embedding family. Titan Text Embeddings V2 (amazon.titan-embed-text-v2:0) supports configurable output dimensions of 256, 512, or 1024, accepts up to 8192 input tokens, and is optimised for cosine similarity. It supports over 100 languages and is the default embedding model in Amazon Bedrock Knowledge Bases. Titan Multimodal Embeddings (amazon.titan-embed-image-v1) additionally encodes images into the same 1024-dimension space as text, enabling cross-modal retrieval.
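The configurable dimensions and normalisation show up directly in the Titan V2 request body. A sketch, with the boto3 invocation commented out because it requires AWS credentials:

```python
import json

# Titan Text Embeddings V2 request body: pick the output dimension and
# request unit-length (normalised) vectors so cosine and dot product agree.
body = json.dumps({
    "inputText": "How do I reset my password?",
    "dimensions": 512,   # 256, 512, or 1024
    "normalize": True,   # L2-normalised output
})
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(modelId="amazon.titan-embed-text-v2:0", body=body)
# embedding = json.loads(response["body"].read())["embedding"]  # 512 floats
print(json.loads(body)["dimensions"])
```

Choosing 512 here rather than 1024 halves downstream storage and search cost, at a typically small recall penalty for semantic search.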
Cohere Embed
Cohere Embed English v3 (cohere.embed-english-v3) and Multilingual v3 (cohere.embed-multilingual-v3) are third-party embedding models offered via the Amazon Bedrock model catalog. Cohere Embed supports input_type parameters (search_document, search_query, classification, clustering) that tune the embedding for specific downstream tasks — a feature that often improves retrieval recall compared to a single-purpose encoder.
Choosing Between Titan and Cohere Embed
Titan Text Embeddings is the conservative default when you have no strong preference: AWS-native, flexible dimensions, broad language coverage, and first-class integration with Bedrock Knowledge Bases. Cohere Embed wins when you need task-specific embeddings (the input_type tuning), when you have already benchmarked Cohere higher on your domain data, or when you want the multilingual variant's accuracy on lower-resource languages. On the AIF-C01 exam, either is an acceptable answer unless the question explicitly mentions a feature only one supports.
Pain Point Mapping: Which AWS Vector Database for Which Scenario
The AIF-C01 community reports embeddings and vector databases selection as one of the trickier scenario patterns. The following mapping converts research-driven pain points into exam-ready decision cues.
Pain Point: "Which Vector Store for a Bedrock Knowledge Base?"
Default answer: Amazon OpenSearch Serverless, because it is the one Bedrock Knowledge Bases creates automatically when you do not bring your own. Override to Aurora PostgreSQL pgvector if the scenario emphasises co-location with relational data. Override to Amazon MemoryDB for Redis if the scenario demands the lowest query latency. Override to Amazon Neptune Analytics if the scenario involves graph-shaped data.
Pain Point: "Vector Database or Kendra?"
If the question focuses on building a custom RAG pipeline with control over embeddings, chunking, or retrieval parameters, pick a vector database (OpenSearch, pgvector, MemoryDB). If the question focuses on managed enterprise search with pre-built connectors and no ML pipeline to operate, pick Amazon Kendra. The pivot keyword is "managed enterprise search" (Kendra) versus "store embeddings" (vector database).
Pain Point: "OpenSearch vs Aurora pgvector vs MemoryDB for Vector Workloads"
- Large corpus (tens of millions to billions), hybrid search, dedicated vector tier → Amazon OpenSearch Service.
- Moderate corpus (thousands to tens of millions), vectors alongside relational data, one database to operate → Amazon Aurora PostgreSQL pgvector.
- Low-latency critical path, real-time personalisation, semantic caching of LLM responses → Amazon MemoryDB for Redis (or ElastiCache for Redis if cache-style).
Pain Point: "Which Similarity Metric?"
Default to cosine similarity for text embeddings. Use dot product when the embedding model documentation says embeddings are normalised and dot product is the recommended metric. Use Euclidean for image embeddings or clustering-style workloads where magnitude matters.
Pain Point: "Titan vs Cohere Embed"
Default to Amazon Titan Text Embeddings V2. Switch to Cohere Embed when the scenario mentions input_type task tuning or when multilingual accuracy on specific lower-resource languages is emphasised.
- Embedding = dense vector capturing meaning. Vector database = store plus ANN index.
- Metric: cosine for text, Euclidean for images.
- Algorithm: HNSW default, IVF for memory savings, Lucene for hybrid full-text plus vector.
- AWS options: Amazon OpenSearch Service (k-NN plugin, default for Bedrock KB), Amazon Aurora PostgreSQL pgvector (SQL-native), Amazon MemoryDB for Redis (lowest latency), Amazon DocumentDB 5.0 (MongoDB-compatible), Amazon Neptune Analytics (graph plus vector), Amazon Kendra (managed enterprise search, NOT a raw vector DB), Amazon Bedrock Knowledge Bases (end-to-end managed RAG).
- Embedding models on Bedrock: Amazon Titan Text Embeddings V2 and Cohere Embed v3.
Reference: https://aws.amazon.com/bedrock/knowledge-bases/
Common Exam Traps for Embeddings and Vector Databases
The AIF-C01 question bank reliably hits the following gotchas. Review each one before exam day.
Trap 1: Amazon Kendra Is Not a Vector Database
Amazon Kendra uses embeddings internally but is a managed search product, not a raw vector store. Questions that say "store 50 million Titan embeddings" are asking for a vector database (OpenSearch, pgvector, MemoryDB), not Kendra.
Trap 2: OpenSearch Is Not Only for Logs
Amazon OpenSearch Service began life as a log analytics engine, which misleads candidates into thinking it is the wrong tool for embeddings and vector databases. In fact, OpenSearch is the default vector store for Amazon Bedrock Knowledge Bases and the most common AWS vector database for large-scale semantic search.
Trap 3: RAG Does Not Retrain the Model
Embeddings and vector databases power retrieval-augmented generation, but RAG does not update model weights. The foundation model still generates from its frozen parameters plus whatever context was retrieved and injected. If the scenario asks how to teach a model new domain knowledge without retraining, RAG (and therefore embeddings and vector databases) is the answer.
Trap 4: Higher Dimensionality Is Not Always Better
Larger embedding dimensions cost more to store and search, and the accuracy gain past 1024 dimensions is usually marginal for semantic search. If the scenario optimises for cost at scale, picking 256-dimension or 512-dimension Titan Text Embeddings V2 is often the right call.
Trap 5: Similarity Metric Must Match the Embedding Model
Using Euclidean distance against an embedding model trained for cosine similarity degrades recall. Always align the vector index metric with the embedding model documentation default.
Trap 6: Amazon Bedrock Knowledge Bases Hides the Vector DB, But You Still Choose One
Candidates sometimes assume Bedrock Knowledge Bases removes the vector database decision entirely. It does not — Knowledge Bases still needs a vector store, and you pick from Amazon OpenSearch Serverless (default), Aurora PostgreSQL pgvector, Amazon MemoryDB for Redis, Amazon Neptune Analytics, and a few third-party options. Embeddings and vector databases selection remains a design question even inside a managed RAG service.
Trap 7: Amazon DocumentDB Needs Version 5.0 for Vector Search
Vector search on Amazon DocumentDB is a 5.0+ feature. Older DocumentDB versions do not support $vectorSearch. Migration or version upgrade is required before DocumentDB can serve as your vector database.
Embeddings and Vector Databases vs Adjacent AWS Capabilities
Candidates sometimes confuse embeddings and vector databases with three neighbouring AWS features. Knowing the boundary is worth several exam points.
Embeddings vs Tokens
Tokens are sub-word units the foundation model reads one at a time during inference. Embeddings are dense semantic vectors produced by a separate embedding model and consumed by the retrieval system, not by the generation model directly. A token has a position in a sequence; an embedding has coordinates in a space. The two concepts sit in different pipeline stages.
Embeddings and Vector Databases vs Fine-Tuning
Embeddings and vector databases deliver relevant context to a frozen foundation model at query time. Fine-tuning updates the foundation model's weights on domain data. The two are complementary strategies for customisation — RAG (embeddings and vector databases) is cheaper and fresher; fine-tuning bakes domain behaviour into the model itself. AIF-C01 scenario questions often ask you to pick between them: freshness and changing data favour RAG; style consistency and specialised vocabulary favour fine-tuning.
Embeddings and Vector Databases vs Prompt Caching
Prompt caching on Amazon Bedrock stores the key-value attention cache for a repeated prompt prefix so subsequent invocations skip recomputation. This is a latency and cost optimisation at the inference layer, not a retrieval mechanism. Embeddings and vector databases answer "what context is relevant?"; prompt caching answers "how do I avoid reprocessing the same context twice?" They compose rather than compete.
Real-World Pattern: Building an Embeddings and Vector Databases Pipeline on AWS
A realistic production pipeline for embeddings and vector databases combines several AWS services:
- Amazon S3 — source bucket holding documents, PDFs, FAQs, product manuals.
- Amazon Bedrock (Titan Text Embeddings V2 or Cohere Embed v3) — embedding model invoked per chunk.
- Amazon OpenSearch Serverless — vector store configured with cosine similarity, HNSW index via the faiss engine.
- Amazon Bedrock Knowledge Bases — orchestrator that chunks S3 content, calls the embedding model, writes vectors into OpenSearch Serverless, and exposes RetrieveAndGenerate.
- Anthropic Claude (on Amazon Bedrock) — generation model invoked with the top-K retrieved chunks as context.
- Amazon CloudWatch and AWS CloudTrail — observability over Knowledge Base sync jobs and invocation patterns.
- AWS Identity and Access Management — IAM roles that scope bedrock:InvokeModel and aoss:APIAccessAll to the Knowledge Base execution role.
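At query time, the application side of this pipeline reduces to a single RetrieveAndGenerate call. The sketch below builds the request payload shape for the bedrock-agent-runtime API; the knowledge base ID and model ARN are placeholders you would replace with your own, and the actual network call is left commented out since it needs AWS credentials:

```python
def build_rag_request(question: str, kb_id: str, model_arn: str) -> dict:
    """Request payload for Amazon Bedrock Knowledge Bases RetrieveAndGenerate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

# Placeholder identifiers -- substitute your own Knowledge Base ID and model ARN.
request = build_rag_request(
    "What is our refund policy?",
    kb_id="EXAMPLEKBID",
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
)

# With credentials configured, the invocation would look like:
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve_and_generate(**request)
# print(response["output"]["text"])
```

Note that nothing in this call mentions the vector store: chunking, embedding, and retrieval all happen behind the Knowledge Base abstraction.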
AIF-C01 does not require you to architect this full pipeline, but it does expect you to recognise each component and know why embeddings and vector databases are the layer that makes the whole system work.
Security, Observability, and Cost for Embeddings and Vector Databases
Every AWS vector database integrates with the platform's security and observability primitives. Encryption at rest via AWS KMS is available on Amazon OpenSearch Service, Aurora PostgreSQL pgvector, Amazon MemoryDB for Redis, Amazon DocumentDB, and Amazon Bedrock Knowledge Bases storage. Encryption in transit via TLS is enforced on all API surfaces. IAM and resource-based policies control who can read or write vectors. CloudWatch metrics expose query latency, index size, and error rates. CloudTrail logs API activity for audit.
Cost levers for embeddings and vector databases:
- Embedding model invocation cost (per 1K input tokens on Amazon Bedrock).
- Vector store compute and storage (OpenSearch capacity units, Aurora instance hours, MemoryDB node hours, DocumentDB instance hours).
- Re-embedding cost when source content changes or the embedding model is upgraded.
- Cross-region data transfer if the vector store and Bedrock region differ.
Amazon Titan Text Embeddings V2 lets you pick 256, 512, or 1024 dimensions. Start with 512 for most semantic search workloads, benchmark recall@10 against your query set, and drop to 256 if recall is acceptable — you will save roughly half the vector storage and cut query CPU proportionally. Moving from 1024 down to 512 on a billion-vector corpus can save tens of thousands of dollars per year across Amazon OpenSearch Service, Aurora pgvector, or Amazon MemoryDB for Redis. Embedding and vector database costs compound fast; dimension sizing is the highest-leverage lever. Reference: https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html
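The storage arithmetic behind that claim is simple enough to sketch. Assuming float32 vectors (4 bytes per component) and ignoring index overhead, raw vector storage scales linearly with dimension count:

```python
def vector_storage_gb(num_vectors: int, dimensions: int, bytes_per_float: int = 4) -> float:
    """Raw float32 storage for the vectors alone; ANN index overhead excluded."""
    return num_vectors * dimensions * bytes_per_float / 1e9

corpus = 1_000_000_000  # a billion-vector corpus

print(vector_storage_gb(corpus, 1024))  # 4096.0 GB
print(vector_storage_gb(corpus, 512))   # 2048.0 GB
print(vector_storage_gb(corpus, 256))   # 1024.0 GB
```

Halving the dimension count halves the raw vector footprint, which is why dimension sizing dominates cost at scale.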
Practice Question Cues for Task 2.1 Embeddings and Vector Databases
When you see these keywords on the AIF-C01 exam, map immediately to the correct embeddings and vector databases service or concept:
- "Semantic search", "find similar documents", "meaning-based retrieval" → embeddings plus a vector database.
- "Default vector store for Amazon Bedrock Knowledge Bases" → Amazon OpenSearch Serverless.
- "Vectors alongside relational data", "SQL-native vector search" → Amazon Aurora PostgreSQL pgvector.
- "Lowest possible query latency", "real-time semantic cache", "in-memory vector search" → Amazon MemoryDB for Redis.
- "MongoDB-compatible document store with vector search" → Amazon DocumentDB 5.0.
- "Managed enterprise search with SharePoint/Confluence connectors" → Amazon Kendra (NOT a raw vector database).
- "End-to-end managed RAG pipeline without writing retrieval code" → Amazon Bedrock Knowledge Bases.
- "First-party AWS embedding model with configurable dimensions" → Amazon Titan Text Embeddings V2.
- "Third-party embedding model with task-specific input_type tuning" → Cohere Embed v3.
- "Angle between vectors ignoring magnitude" → cosine similarity.
- "Straight-line distance in vector space" → Euclidean distance.
- "Inner product sensitive to magnitude" → dot product.
- "Hierarchical graph ANN index" → HNSW.
- "Cluster-based ANN index" → IVF.
- "Hybrid keyword and vector search in one shard" → Amazon OpenSearch Service Lucene k-NN engine.
Key Numbers and Must-Memorise Facts
- Amazon Titan Text Embeddings V2 supports output dimensions 256, 512, or 1024, with up to 8192 input tokens.
- Cohere Embed v3 on Amazon Bedrock produces 1024-dimension vectors and supports input_type task tuning.
- Amazon OpenSearch Service k-NN plugin supports three engines: faiss, nmslib, and Lucene.
- Amazon Bedrock Knowledge Bases default vector store is Amazon OpenSearch Serverless.
- Amazon MemoryDB for Redis vector search requires Redis 7.2 or later.
- Amazon DocumentDB vector search requires version 5.0 or later.
- pgvector on Aurora PostgreSQL supports HNSW from version 0.5.0 and IVFFlat from earlier versions.
- Cosine similarity ranges from -1 to +1; higher means more similar.
- Euclidean distance is smaller for more similar vectors; cosine and dot product are larger for more similar vectors.
- HNSW is the default ANN algorithm across most AWS vector database engines.
- For L2-normalised vectors, cosine similarity equals the dot product — pick whichever is faster in your engine.
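The last fact is easy to verify numerically. The sketch below L2-normalises two arbitrary example vectors and checks that cosine similarity and dot product then agree:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

a = l2_normalise([3.0, 4.0, 0.0])
b = l2_normalise([1.0, 2.0, 2.0])

# After L2 normalisation, both vectors have unit length, so the
# cosine denominator is 1 and cosine similarity equals dot product.
assert abs(cosine(a, b) - dot(a, b)) < 1e-12
print(cosine(a, b))
```

This is why engines can safely substitute the cheaper dot product for cosine when the embedding model (like Titan Text Embeddings) emits normalised vectors.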
FAQ — Embeddings and Vector Databases Top Questions
1. What is the difference between embeddings and vector databases?
Embeddings are the output of an encoder model — a dense numeric vector that represents the meaning of an input. A vector database is the storage engine that indexes many embeddings and answers nearest-neighbor queries quickly. The two work together: you cannot do semantic search with embeddings alone because naive similarity scans do not scale, and a vector database is useless without embeddings to store. On AWS, the embedding side is served by Amazon Bedrock (Titan Text Embeddings, Cohere Embed), and the vector database side is served by Amazon OpenSearch Service, Aurora PostgreSQL pgvector, Amazon MemoryDB for Redis, Amazon DocumentDB, Amazon Neptune Analytics, or the managed Amazon Bedrock Knowledge Bases integration.
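The "naive similarity scans do not scale" point can be shown with a brute-force top-k search over a toy store (the document IDs and three-dimensional vectors are made up for illustration). Every query touches every stored vector, which is exactly the O(N) cost that ANN indexes in a real vector database avoid:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def brute_force_top_k(query, store, k=2):
    # Exact nearest neighbors via a full scan: fine for a demo,
    # unusable at millions of vectors -- the problem ANN indexes solve.
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

store = {
    "returns-policy": [0.9, 0.1, 0.0],
    "shipping-faq":   [0.1, 0.9, 0.1],
    "refund-howto":   [0.8, 0.2, 0.1],
}
print(brute_force_top_k([1.0, 0.0, 0.0], store))
```

A vector database keeps the same interface (query vector in, top-k document IDs out) but answers from an index instead of a scan.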
2. When should I use Amazon Kendra versus a vector database?
Use Amazon Kendra when you need managed enterprise search with pre-built connectors to S3, SharePoint, Confluence, Salesforce, or Jira and you do not want to operate an embeddings pipeline. Use a raw vector database (Amazon OpenSearch Service, Aurora pgvector, Amazon MemoryDB for Redis) when you need control over the embedding model, chunking strategy, similarity metric, or ANN parameters, or when the scenario explicitly says "store embeddings" or "vector search". Kendra uses embeddings internally but does not expose them as a vector API, which is why it is a frequent AIF-C01 trap answer for pure vector-store questions.
3. Which similarity metric should I pick for text embeddings on Amazon Bedrock?
Cosine similarity is the correct default for Amazon Titan Text Embeddings and Cohere Embed v3. Both models are trained such that angular distance between vectors encodes semantic similarity, and their embeddings are L2-normalised so cosine and dot product produce identical rankings. Euclidean distance is not recommended for text embeddings — reserve it for image embeddings or clustering workloads where magnitude carries meaning.
4. Which AWS vector database is the default for Amazon Bedrock Knowledge Bases?
Amazon OpenSearch Serverless is the default vector store that Amazon Bedrock Knowledge Bases provisions when you do not bring your own. You can override to Amazon Aurora PostgreSQL with pgvector (for co-location with relational data), Amazon MemoryDB for Redis (for lowest latency), Amazon Neptune Analytics (for graph-shaped data), Pinecone, Redis Enterprise Cloud, or MongoDB Atlas. The choice affects cost, latency, operational surface, and feature set, but not the Knowledge Base API contract your application uses.
5. What is the difference between HNSW and IVF as ANN algorithms?
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph index and is the default in most modern vector databases including Amazon OpenSearch Service, Aurora pgvector 0.5.0+, and Amazon MemoryDB for Redis. HNSW delivers excellent recall and latency at moderate memory cost. IVF (Inverted File) partitions the vector space into clusters via k-means and scans only the nearest clusters at query time, trading slightly lower recall for lower memory usage. OpenSearch Service supports both. For AIF-C01, remember that both are approximate nearest neighbor algorithms used inside vector databases to scale embeddings search, and HNSW is the typical default.
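The IVF idea can be sketched in a few lines. This toy index hand-picks two centroids instead of learning them with k-means (as a real IVF implementation would), assigns each vector to its nearest centroid's inverted list, and at query time scans only the nprobe nearest lists rather than the whole corpus:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy IVF index: real implementations learn centroids via k-means;
# here two centroids are fixed by hand to keep the sketch short.
centroids = {"c0": [0.0, 0.0], "c1": [10.0, 10.0]}

def build_ivf(vectors):
    """Assign each vector to the inverted list of its nearest centroid."""
    lists = {cid: [] for cid in centroids}
    for doc_id, vec in vectors.items():
        nearest = min(centroids, key=lambda c: euclidean(vec, centroids[c]))
        lists[nearest].append((doc_id, vec))
    return lists

def ivf_search(lists, query, nprobe=1):
    """Scan only the nprobe nearest inverted lists, not the whole corpus."""
    probe = sorted(centroids, key=lambda c: euclidean(query, centroids[c]))[:nprobe]
    candidates = [item for cid in probe for item in lists[cid]]
    return min(candidates, key=lambda item: euclidean(item[1], query))[0]

vectors = {"a": [0.5, 0.2], "b": [9.5, 9.9], "c": [1.0, 1.0]}
lists = build_ivf(vectors)
print(ivf_search(lists, [9.0, 9.0]))  # "b"
```

The recall trade-off is visible in nprobe: probing fewer lists is faster but can miss the true nearest neighbor when it was assigned to an unprobed cluster.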
6. Should I use Amazon Titan Text Embeddings or Cohere Embed on Amazon Bedrock?
Amazon Titan Text Embeddings V2 is the safe default — it is AWS-native, offers configurable output dimensions (256, 512, 1024) for cost control, supports over 100 languages, and integrates first-class with Amazon Bedrock Knowledge Bases. Choose Cohere Embed v3 when you need input_type parameters to tune the embedding for search_query, search_document, classification, or clustering tasks, or when benchmarks on your specific domain or language favour Cohere. Both are solid AIF-C01 answers unless the scenario names a feature unique to one.
7. How does Amazon Bedrock Knowledge Bases relate to embeddings and vector databases?
Amazon Bedrock Knowledge Bases is a managed layer on top of embeddings and vector databases. It automates the full RAG pipeline: chunk documents from Amazon S3 or connector sources, call an embedding model (Titan or Cohere) on each chunk, write the vectors into your configured vector store (OpenSearch Serverless by default), and at query time retrieve relevant chunks, assemble the prompt, and invoke a foundation model. You still choose the vector store and the embedding model, but you no longer write the chunking, embedding loop, index creation, or retrieval logic yourself. It is the fastest path from "documents in S3" to "grounded AI answers" on AWS.
Further Reading
- Amazon Titan Text Embeddings: https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html
- Cohere Embed on Amazon Bedrock: https://aws.amazon.com/bedrock/cohere-embed/
- Amazon OpenSearch Service k-NN plugin: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html
- Amazon Aurora PostgreSQL pgvector and AI/ML: https://aws.amazon.com/rds/aurora/ai-ml/
- Amazon MemoryDB for Redis: https://aws.amazon.com/memorydb/
- Amazon DocumentDB: https://aws.amazon.com/documentdb/
- Amazon Kendra: https://aws.amazon.com/kendra/
- Amazon Bedrock Knowledge Bases: https://aws.amazon.com/bedrock/knowledge-bases/
- Amazon Neptune Analytics: https://aws.amazon.com/neptune/
- AWS AIF-C01 Exam Guide: https://d1.awsstatic.com/training-and-certification/docs-ai-practitioner/AWS-Certified-AI-Practitioner_Exam-Guide.pdf