Vector Embeddings in FireProx¶
This notebook demonstrates how to work with vector embeddings in FireProx using the native google.cloud.firestore_v1.vector.Vector class.
Important Limitations¶
Firestore Emulator Does NOT Support Vector Embeddings
- Vector embeddings are a production-only feature
- The Firestore emulator will reject any operations involving vectors
- All examples in this notebook require a real Firestore instance
- See GitHub Issue #7216
Vector Constraints:
- Maximum 2048 dimensions per vector
- Vectors cannot be nested inside arrays or maps
- Vectors must be at the top level of a document field
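These constraints can be checked client-side before a save ever round-trips to Firestore. The sketch below is illustrative only — `validate_embedding` is a hypothetical helper, not part of FireProx; Firestore itself enforces the 2048-dimension limit server-side:

```python
MAX_DIMENSIONS = 2048  # Firestore's documented per-vector limit

def validate_embedding(values):
    """Return the values if they can legally be stored as a Firestore vector."""
    if len(values) > MAX_DIMENSIONS:
        raise ValueError(
            f"Vector has {len(values)} dimensions; Firestore allows at most {MAX_DIMENSIONS}"
        )
    if not all(isinstance(v, (int, float)) for v in values):
        raise TypeError("Vector values must be numeric")
    return values

print(len(validate_embedding([0.1] * 2048)))  # 2048 - exactly at the limit is fine
try:
    validate_embedding([0.1] * 2049)
except ValueError as e:
    print(f"Rejected: {e}")
```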
What are Vector Embeddings?¶
Vector embeddings are numerical representations of data (text, images, etc.) that capture semantic meaning. They enable:
- Semantic search (find similar documents)
- Clustering and classification
- Recommendation systems
- Question answering
FireProx uses the native Firestore Vector type directly for seamless integration.
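As a concrete illustration of "similar meaning → nearby vectors", here is a plain-Python cosine similarity over toy 3-dimensional "embeddings". The vectors and labels are made up for clarity; real models produce hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: related concepts point in similar directions
cat = [1.0, 0.9, 0.0]
kitten = [0.95, 1.0, 0.05]
car = [0.0, 0.1, 1.0]

print(f"cat vs kitten: {cosine_similarity(cat, kitten):.3f}")  # close to 1.0
print(f"cat vs car:    {cosine_similarity(cat, car):.3f}")     # close to 0.0
```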
Setup¶
Note: These examples will fail with the emulator. You must use a real Firestore project.
import os
from google.cloud import firestore
from google.cloud.firestore_v1.base_vector_query import DistanceMeasure
from google.cloud.firestore_v1.vector import Vector
from fire_prox import AsyncFireProx, FireProx
# Check if running in CI environment
if os.environ.get('NOTEBOOK_CI'):
    print("⚠️ Running in CI - skipping vector examples (requires production Firestore)")
    import sys
    sys.exit(0)
# Initialize clients (PRODUCTION ONLY - will not work with emulator)
project_id = 'your-project-id' # Replace with your actual project ID
# Synchronous client
sync_client = firestore.Client(project=project_id)
db = FireProx(sync_client)
# Asynchronous client
async_client = firestore.AsyncClient(project=project_id)
async_db = AsyncFireProx(async_client)
print("✓ Connected to production Firestore")
print("⚠️ Remember: Vector embeddings DO NOT work with the emulator")
Feature 1: Creating and Storing Vectors (Sync)¶
Create a native Vector from a list of floats and store it in a document.
# Create a collection for documents with embeddings
documents = db.collection('semantic_documents')
# Create a simple 3-dimensional embedding using native Vector
doc1 = documents.new()
doc1.title = "Introduction to Machine Learning"
doc1.content = "Machine learning is a subset of artificial intelligence..."
doc1.embedding = Vector([0.12, 0.45, 0.78]) # Native Vector instance
# Save to Firestore
doc1.save(doc_id='ml_intro')
print(f"✓ Saved document with {len(doc1.embedding.to_map_value()['value'])} dimensions")
print(f" Title: {doc1.title}")
print(f" Embedding type: {type(doc1.embedding).__name__}")
Feature 2: Reading Vectors from Firestore (Sync)¶
FireProx automatically preserves native Firestore Vectors when reading.
# Read the document back
retrieved = db.doc('semantic_documents/ml_intro')
retrieved.fetch()
# Access the vector - stays as native Vector
print(f"Document: {retrieved.title}")
print(f"Embedding type: {type(retrieved.embedding).__name__}")
print(f"Vector: {retrieved.embedding}")
Feature 3: Working with Higher-Dimensional Embeddings¶
Real-world embeddings typically have many more dimensions (e.g., 384, 768, 1536 dimensions).
import random
# Create a document with a realistic 384-dimensional embedding
# (typical for models like sentence-transformers/all-MiniLM-L6-v2)
doc2 = documents.new()
doc2.title = "Deep Learning Fundamentals"
doc2.content = "Deep learning uses neural networks with multiple layers..."
# Generate a random 384-dimensional embedding (in practice, use a real model)
embedding_384d = [random.random() for _ in range(384)]
doc2.embedding = Vector(embedding_384d)
doc2.save(doc_id='dl_fundamentals')
dimension_count = len(doc2.embedding.to_map_value()['value'])
print(f"✓ Saved document with {dimension_count}-dimensional embedding")
values = doc2.embedding.to_map_value()['value']
print(f" First 5 dimensions: {values[:5]}")
print(f" Last 5 dimensions: {values[-5:]}")
Feature 4: Dimension Validation¶
Firestore enforces a maximum dimension limit of 2048.
MAX_DIMENSIONS = 2048
print(f"Firestore maximum dimensions: {MAX_DIMENSIONS}")
# This works - exactly at the limit
max_vector = Vector([0.1] * MAX_DIMENSIONS)
print(f"✓ Created vector with {len(max_vector.to_map_value()['value'])} dimensions (max allowed)")
# This will fail when you try to save - exceeds the limit
try:
    too_large = Vector([0.1] * (MAX_DIMENSIONS + 1))
    doc_test = documents.new()
    doc_test.embedding = too_large
    # doc_test.save()  # This would fail
    print(f"\n⚠️ Created vector with {len(too_large.to_map_value()['value'])} dimensions")
    print("   (This will fail when you try to save to Firestore!)")
except Exception as e:
    print(f"\n✗ Error: {e}")
Feature 5: Async Operations with Vectors¶
Vectors work seamlessly with the async API.
# Async version - store and retrieve vectors
async_documents = async_db.collection('semantic_documents')
# Create and save
async_doc = async_documents.new()
async_doc.title = "Neural Network Architectures"
async_doc.content = "Neural networks consist of interconnected layers..."
async_doc.embedding = Vector([0.23, 0.56, 0.89])
await async_doc.save(doc_id='nn_architectures')
dimension_count = len(async_doc.embedding.to_map_value()['value'])
print(f"✓ Saved async document with {dimension_count}D embedding")
# Read back
async_retrieved = async_db.doc('semantic_documents/nn_architectures')
await async_retrieved.fetch()
print(f"\nRetrieved: {async_retrieved.title}")
print(f"Embedding: {async_retrieved.embedding}")
Feature 6: Real-World Example - Text Embeddings¶
Simulate generating embeddings from text using a hypothetical embedding model.
Note: This example shows the pattern only. In production, you would use a real embedding model such as:
- OpenAI's text-embedding-ada-002 (1536 dimensions)
- Sentence Transformers (384-768 dimensions)
- Google's Vertex AI embeddings (768 dimensions)
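Whichever model you choose, a common preprocessing step is L2-normalizing embeddings to unit length, which makes dot-product comparisons behave like cosine similarity. A minimal sketch (`l2_normalize` is an illustrative helper, not part of FireProx or any of the libraries above):

```python
import math

def l2_normalize(values):
    """Scale a vector to unit length so DOT_PRODUCT behaves like COSINE."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        raise ValueError("Cannot normalize a zero vector")
    return [v / norm for v in values]

unit = l2_normalize([3.0, 4.0])
print(unit)                            # [0.6, 0.8]
print(sum(v * v for v in unit))        # ~1.0 (unit length, up to floating point)
```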
def generate_fake_embedding(text: str, dimensions: int = 384) -> list:
    """
    Simulate an embedding model (in production, use a real model).

    Real examples:
    - openai.embeddings.create(input=text, model="text-embedding-ada-002")
    - sentence_transformers.SentenceTransformer('all-MiniLM-L6-v2').encode(text)
    - vertexai.TextEmbeddingModel.from_pretrained('textembedding-gecko').get_embeddings([text])
    """
    import hashlib
    import random
    # Use text hash as seed for reproducible "embeddings"
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    random.seed(seed)
    return [random.gauss(0, 1) for _ in range(dimensions)]
# Example documents to embed
articles = [
    {
        'title': 'Introduction to Python',
        'content': 'Python is a high-level programming language known for its simplicity and readability.'
    },
    {
        'title': 'JavaScript Basics',
        'content': 'JavaScript is the programming language of the web, enabling interactive websites.'
    },
    {
        'title': 'Database Design Principles',
        'content': 'Good database design ensures data integrity, reduces redundancy, and improves query performance.'
    }
]
# Store articles with embeddings
for i, article in enumerate(articles):
    doc = documents.new()
    doc.title = article['title']
    doc.content = article['content']
    # Generate embedding from content
    embedding = generate_fake_embedding(article['content'])
    doc.embedding = Vector(embedding)
    doc.save(doc_id=f'article_{i}')
    dimension_count = len(doc.embedding.to_map_value()['value'])
    print(f"✓ Saved: {article['title']} ({dimension_count}D)")
print("\n✓ All articles embedded and stored")
Feature 7: Vector Similarity Search with find_nearest¶
Use FireProx's find_nearest() method to perform vector similarity search and find nearest neighbors.
Requirements:
- A vector index must be created on the field (using gcloud CLI or Firebase console)
- Does NOT work with emulator (production only)
# Create a query vector (in practice, this would be an embedding of a search query)
query_text = "programming languages and coding"
query_embedding = generate_fake_embedding(query_text)
query_vector = Vector(query_embedding)
print(f"Searching for documents similar to: '{query_text}'")
print("\nNote: This requires a vector index on the 'embedding' field.")
print("Create index with: gcloud firestore indexes composite create ...\n")
# Find nearest neighbors using EUCLIDEAN distance
try:
    vector_query = documents.find_nearest(
        vector_field="embedding",
        query_vector=query_vector,
        distance_measure=DistanceMeasure.EUCLIDEAN,
        limit=5,
        distance_result_field="distance"  # Optional: store calculated distance
    )
    print("Top 5 nearest neighbors:")
    print("=" * 60)
    for doc in vector_query.get():
        print(f"\nTitle: {doc.title}")
        print(f"Content: {doc.content}")
        # Access distance if distance_result_field was specified
        if hasattr(doc, 'distance'):
            print(f"Distance: {doc.distance:.4f}")
except Exception as e:
    print(f"⚠️ Vector search failed: {e}")
    print("\nThis is expected if:")
    print("  1. No vector index exists on the 'embedding' field")
    print("  2. Running against emulator (vectors not supported)")
    print("  3. Collection has no documents with embeddings")
Feature 8: Vector Search with Pre-filtering¶
Combine where() clauses with find_nearest() to filter documents before searching.
Note: Requires a composite index when combining filters with vector search.
# First, let's add a category field to our documents
doc_python = db.doc('semantic_documents/article_0')
doc_python.fetch()
doc_python.category = 'programming'
doc_python.save()
doc_js = db.doc('semantic_documents/article_1')
doc_js.fetch()
doc_js.category = 'programming'
doc_js.save()
doc_db = db.doc('semantic_documents/article_2')
doc_db.fetch()
doc_db.category = 'database'
doc_db.save()
print("✓ Added categories to documents")
# Now search with pre-filtering
try:
    # Find nearest neighbors only among 'programming' category
    filtered_query = (
        documents
        .where('category', '==', 'programming')
        .find_nearest(
            vector_field="embedding",
            query_vector=query_vector,
            distance_measure=DistanceMeasure.COSINE,
            limit=3
        )
    )
    print("\nFiltered results (category='programming' only):")
    print("=" * 60)
    for doc in filtered_query.get():
        print(f"\nTitle: {doc.title}")
        print(f"Category: {doc.category}")
        print(f"Content: {doc.content[:50]}...")
except Exception as e:
    print(f"\n⚠️ Filtered vector search failed: {e}")
    print("\nThis requires a composite index with:")
    print("  - category field")
    print("  - embedding vector field")
Feature 9: Async Vector Search¶
Vector search works with the async API as well.
# Async vector search
async_documents = async_db.collection('semantic_documents')
try:
    async_vector_query = async_documents.find_nearest(
        vector_field="embedding",
        query_vector=query_vector,
        distance_measure=DistanceMeasure.DOT_PRODUCT,
        limit=3
    )
    print("Async vector search results:")
    print("=" * 60)
    async for doc in async_vector_query.stream():
        print(f"\nTitle: {doc.title}")
        print(f"Content: {doc.content[:60]}...")
except Exception as e:
    print(f"⚠️ Async vector search failed: {e}")
    print("Requires vector index and production Firestore")
Distance Measures¶
Firestore supports three distance measures for vector similarity:
EUCLIDEAN: Measures straight-line distance between vectors
- Good for: Spatial data, when magnitude matters
- Range: 0 to ∞ (lower is more similar)
COSINE: Measures the angle between vectors (direction only, ignoring magnitude)
- Good for: Text embeddings, when direction matters more than magnitude
- Range: 0 to 2 as a distance (1 − cosine similarity; lower is more similar)
DOT_PRODUCT: Measures both angle and magnitude
- Good for: When both direction and magnitude are important
- Range: -∞ to ∞ (higher is more similar)
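To build intuition, the three measures can also be computed locally with plain Python, no Firestore call needed. Note how two vectors pointing the same way but with different magnitudes score differently under each measure:

```python
import math

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude

# Euclidean distance: straight-line gap, sensitive to magnitude
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Dot product: grows with both alignment and magnitude
dot = sum(x * y for x, y in zip(a, b))

# Cosine similarity: direction only, magnitude cancels out
cos_sim = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(f"Euclidean distance: {euclidean:.4f}")  # nonzero: magnitudes differ
print(f"Cosine similarity:  {cos_sim:.4f}")    # ~1.0: identical direction
print(f"Dot product:        {dot:.1f}")        # large: aligned AND long
```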
# Compare different distance measures
print("Comparing distance measures:")
print("=" * 60)
for measure in [DistanceMeasure.EUCLIDEAN, DistanceMeasure.COSINE, DistanceMeasure.DOT_PRODUCT]:
    print(f"\n{measure.name}:")
    try:
        query = documents.find_nearest(
            vector_field="embedding",
            query_vector=query_vector,
            distance_measure=measure,
            limit=2,
            distance_result_field="distance"
        )
        for doc in query.get():
            distance = getattr(doc, 'distance', 'N/A')
            print(f"  - {doc.title}: {distance}")
    except Exception as e:
        print(f"  ⚠️ Failed: {str(e)[:60]}...")
Server-Side Embedding Generation¶
Firebase Extension for Automatic Embeddings¶
Firebase provides extensions that can automatically generate embeddings when documents are created or updated:
How it works:
- Configure which collection and field to monitor
- When a document is created/updated, the extension triggers
- It sends the text field to an embedding model (Vertex AI / Gemini)
- The generated embedding is stored back in the document
Example workflow:
# 1. Save document with text content (no embedding yet)
doc = documents.new()
doc.title = "My Article"
doc.content = "This is the text content to embed..."
doc.save()
# 2. Extension automatically triggers:
# - Reads doc.content
# - Calls Vertex AI embedding API
# - Writes result to doc.embedding
# 3. Read back with embedding (after extension completes)
import time
time.sleep(2) # Wait for extension to process
doc.fetch(force=True)
print(f"Auto-generated embedding: {len(doc.embedding.to_map_value()['value'])}D")
Alternative: Client-Side Embeddings
For more control, generate embeddings in your application:
# Using OpenAI
import openai
response = openai.embeddings.create(
    input="Your text here",
    model="text-embedding-ada-002"
)
embedding = response.data[0].embedding
doc.embedding = Vector(embedding)
doc.save()
# Using Sentence Transformers
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode("Your text here").tolist()
doc.embedding = Vector(embedding)
doc.save()
Cleanup¶
# Delete test documents
test_docs = [
    'ml_intro',
    'dl_fundamentals',
    'nn_architectures',
    'article_0',
    'article_1',
    'article_2'
]
for doc_id in test_docs:
    try:
        doc = db.doc(f'semantic_documents/{doc_id}')
        doc.delete()
        print(f"✓ Deleted {doc_id}")
    except Exception as e:
        print(f"  (Could not delete {doc_id}: {e})")
print("\n✓ Cleanup complete")
Summary¶
Key Takeaways¶
- Native Vector Support: FireProx uses the native google.cloud.firestore_v1.vector.Vector directly
- Automatic Handling: FireProx preserves Vector types seamlessly during read/write operations
- Vector Search: Use find_nearest() for similarity search and nearest neighbor queries
- Distance Measures: Choose from EUCLIDEAN, COSINE, or DOT_PRODUCT based on your use case
- Pre-filtering: Combine where() with find_nearest() for filtered vector search
- Sync & Async: Works with both synchronous and asynchronous APIs
- Production Only: Vectors do NOT work with the Firestore emulator
Limitations to Remember¶
- ⚠️ Emulator does not support vectors
- ⚠️ Maximum 2048 dimensions
- ⚠️ Maximum 1000 results per query
- ⚠️ Vectors cannot be nested in arrays/maps
- ⚠️ Vectors must be top-level document fields
- ⚠️ Requires vector index for search operations
- ⚠️ No real-time snapshot listeners for vector queries
Use Cases¶
- Semantic Search: Find documents similar to a query
- Content Recommendations: Suggest related articles/products
- Question Answering: Match questions to relevant answers
- Image Search: Find similar images by embedding
- Clustering: Group similar documents together
- Duplicate Detection: Find near-duplicate content
Next Steps¶
To build a complete semantic search system:
- Choose an embedding model (OpenAI, Sentence Transformers, Vertex AI)
- Generate embeddings for your documents
- Store using the native Vector type
- Create vector indexes (using gcloud CLI or Firebase console)
- Use find_nearest() for similarity search
- Optionally combine with filters using where()