
Azure Integrated Vectorization streamlines vector search on Azure AI Search by handling data chunking and embedding generation for you, so large collections can be indexed and searched efficiently and accurately.
To get started with Azure Integrated Vectorization, you'll need to understand its core components: vector search, indexing, and query processing.
Vector search is a key feature of Azure Integrated Vectorization, letting you find the items in a dataset whose vectors are most similar to a query vector, using similarity metrics such as cosine similarity and dot product.
A simple first project is image similarity search: create a vector representation of each image, then use vector search to find the images closest to a query image.
What is Azure Integrated Vectorization?
Azure Integrated Vectorization is a powerful technology that leverages machine learning to capture the meaning of data. It converts unstructured data like text, images, and audio into high-dimensional vectors, also known as embeddings, which can be used for similarity searches.
These embeddings can be generated by models such as Azure OpenAI text embedding models or the multimodal embeddings API of Azure AI Vision, which can process both images and text. The key idea behind vector search is to apply nearest-neighbor algorithms to find the stored vectors closest to a query vector.
Vector embeddings allow for more accurate and efficient searches compared to traditional search systems, which rely on exact matches or lexical similarity. This technology has the potential to transform the search process and provide more relevant results.
What Is a Vector Database?
A database is a system designed to store and manage data efficiently.
A traditional database relies on structured data formats, but a vector database is optimized for similarity search tasks.
Vector databases are crucial in applications involving machine learning and AI.
They store data as vectors, which are numerical representations of data capturing the semantic meaning of text, images, and other data types.
By storing data as vectors, these databases facilitate quick comparisons and nearest-neighbor searches.
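As a minimal sketch of the idea (pure Python, no Azure services involved, with a toy in-memory "store" and made-up example vectors), a nearest-neighbor search over stored vectors can be brute-forced with cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, store, k=1):
    """Return the k stored items most similar to the query vector."""
    ranked = sorted(store.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy "vector database": real embeddings would come from a model
# and have hundreds or thousands of dimensions.
store = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(nearest([0.85, 0.15, 0.05], store, k=2))  # → ['cat', 'dog']
```

Production vector databases replace the linear scan with approximate nearest-neighbor indexes (such as HNSW) so queries stay fast at scale.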
The choice of a suitable vector database can significantly impact the performance and scalability of systems using it.
Introduction
Azure Integrated Vectorization changes how search systems are built. It leverages machine learning to capture the meaning of data, letting you find similar items based on their content rather than their exact wording.
Conventional search systems rely on exact or lexical matches; vector similarity search instead converts unstructured data into high-dimensional vectors, also known as embeddings.
For images and text, the multimodal embeddings API of Azure AI Vision is one way to generate these vectors and find similar data based on content.
Vector search then applies nearest-neighbor algorithms to find similar data, which is often more effective than keyword matching for meaning-based queries.
Similarity Metric
Azure Integrated Vectorization uses a similarity metric to measure the similarity between vectors. This metric is crucial for various applications, including recommender systems, fraud detection, and image recognition.
For a vector-only query, the similarity metric is specified in the index's vectorSearch section and must be cosine, euclidean, or dotProduct. Azure OpenAI embedding models are trained with cosine similarity, so cosine is the recommended metric when you use them.
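As a sketch, the relevant fragment of an index definition might look like the following, expressed here as a Python dict mirroring the REST payload (the algorithm and profile names are illustrative assumptions):

```python
# Fragment of an Azure AI Search index definition (illustrative names).
# The similarity metric lives under vectorSearch.algorithms[*].hnswParameters.
vector_search = {
    "vectorSearch": {
        "algorithms": [
            {
                "name": "my-hnsw",
                "kind": "hnsw",
                "hnswParameters": {"metric": "cosine"},  # or "euclidean", "dotProduct"
            }
        ],
        "profiles": [
            {"name": "my-profile", "algorithm": "my-hnsw"}
        ],
    }
}
```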
Cosine similarity measures the angle between two vectors and is not affected by their magnitudes. This makes it a reliable choice for many applications.
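The magnitude-invariance of cosine similarity is easy to verify with a small example (plain Python, made-up vectors): scaling a vector leaves its cosine similarity unchanged while its dot product grows.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0]
d = [2.0, 4.0, 6.0]  # same direction as q, twice the magnitude
print(round(cosine(q, d), 6))               # → 1.0 (angle is zero; magnitude ignored)
print(dot(q, d), dot(q, [1.0, 2.0, 3.0]))   # → 28.0 14.0 (dot product doubles)
```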
Euclidean and dotProduct are also valid values for the similarity metric, but they are not the recommended choice if you're using Azure OpenAI embedding models.
Getting Started with Azure Integrated Vectorization
To get started with Azure integrated vectorization, you'll need to set up a few things. First, you'll need to have an index with searchable vector fields on Azure AI Search.
You'll also need a deployed embedding model, such as text-embedding-ada-002, text-embedding-3-small, or text-embedding-3-large on Azure OpenAI, which is used to vectorize a query. This model must be identical to the embedding model used for the vector field in your index.
Here are the specific embedding models you can use:
- text-embedding-ada-002
- text-embedding-3-small
- text-embedding-3-large
Database Services
Azure offers a range of database services that can be used for vector database applications.
Microsoft Azure is a leading cloud provider, offering numerous data management services. Azure Cosmos DB for NoSQL is a globally distributed, multi-model database service designed for large-scale applications, supporting the storage and retrieval of vector embeddings.
Azure Cosmos DB for MongoDB offers a managed service for users familiar with MongoDB, providing compatibility with existing MongoDB applications and tools. This service ensures low-latency data access and high availability.
Azure Database for PostgreSQL is a fully managed relational database service based on the open-source PostgreSQL, supporting storing vector embeddings and performing efficient similarity searches. Azure SQL Database is a fully managed platform as a service (PaaS) relational database, optimized for modern application development.
Azure Cosmos DB for PostgreSQL, a managed PostgreSQL service, supports storing and querying vector embeddings, making it suitable for applications involving complex similarity searches. It also provides low latency, high availability, and global distribution.
Azure SQL Database offers features like intelligent performance, security, and high availability, ensuring that applications can scale and maintain data integrity and security. The integration with Azure’s ecosystem allows for easy use of additional services like AI and analytics.
Getting Started
To get started with Azure Integrated Vectorization, create an Azure account if you don't already have one.
Integrated vectorization is a feature of Azure AI Search, Microsoft's cloud search service, and it relies on an embedding model hosted in Azure OpenAI or Azure AI Vision to generate vectors.
A working knowledge of machine learning concepts and core Azure services will help you use Integrated Vectorization effectively.
The Azure portal's Import and vectorize data wizard provides a visual way to set up chunking and vectorization, making it easier to integrate vectorization into your workflow before writing any code.
Start by creating an Azure AI Search service and exploring it in the portal, where you can create and manage your indexes, indexers, and skillsets.
Make sure to check the Azure AI Search documentation for the latest information on using Integrated Vectorization in your projects.
Prerequisites
To get started with Azure Integrated Vectorization, you'll need to have a few things in place. Ensure you have an Azure subscription, which you can create for free or as an Azure for Students account.
You'll also need to have Python 3.x, Visual Studio Code, Jupyter Notebook, and the Jupyter Extension for Visual Studio Code installed and configured.
Here are the specific requirements:
- Azure AI Search in any region and on any tier.
- A vector index on Azure AI Search, which you can confirm by checking for a vectorSearch section in your index.
- Optionally, add a vectorizer to your index for built-in text-to-vector or image-to-vector conversion during queries.
- Visual Studio Code with a REST client and sample data, if you want to run examples on your own.
To proceed with Azure Integrated Vectorization, you'll need a deployed embedding model, such as text-embedding-ada-002, text-embedding-3-small, or text-embedding-3-large on Azure OpenAI. You'll also need permission to call it, granted either through the Cognitive Services OpenAI User role or an API key.
How to Use Azure Integrated Vectorization
To use Azure integrated vectorization, start by creating a data source connection to a supported data source for indexer-based indexing. Then define a skillset that calls the Text Split skill for chunking and the AzureOpenAIEmbedding skill (or another embedding skill) to vectorize the chunks. Finally, create an index that specifies a vectorizer for query time and assign it to the vector fields.
For text-to-vector conversion during queries, you'll need to specify one or more vector fields in your query, along with a text string that's converted to a vector at query time. This requires setting up a vectorizer, which is defined in the index schema and assigned to a vector field. The vectorizer must match the embedding model used to encode your content.
Here are the key components you'll need to set up for integrated vectorization:
- Data source connection to a supported data source
- Skillset that calls the Text Split skill for chunking and the AzureOpenAIEmbedding skill or another embedding skill
- Index that specifies a vectorizer for query time and assigns it to vector fields
- Vectorizer defined in the index schema and assigned to a vector field
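The chunking-plus-embedding skillset can be sketched as a REST payload, expressed here as a Python dict (the skillset name, endpoint, and deployment are illustrative assumptions):

```python
# Illustrative skillset: split documents into chunks, then embed each chunk.
# Resource names, endpoint, and deployment are placeholder assumptions.
skillset = {
    "name": "my-skillset",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "textItems", "targetName": "pages"}],
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
            "context": "/document/pages/*",
            "resourceUri": "https://my-openai.openai.azure.com",
            "deploymentId": "text-embedding-3-small",
            "inputs": [{"name": "text", "source": "/document/pages/*"}],
            "outputs": [{"name": "embedding", "targetName": "vector"}],
        },
    ],
}
```

The split skill's output (one entry per chunk) becomes the context for the embedding skill, so each chunk is vectorized independently.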
How to Use
To use Azure integrated vectorization at query time, first add a vectorizer to the index; it must point to the same embedding model used to generate the vectors stored in that index. For indexing, create a data source connection to a supported data source for indexer-based indexing.
Next, create a skillset that calls the Text Split skill for chunking and the AzureOpenAIEmbedding skill (or another embedding skill) to vectorize the chunks, and point its output at the vector index that receives the chunked and vectorized content.
Finally, configure an indexer to drive everything, from data retrieval, to skillset execution, through indexing. We recommend running the indexer on a schedule to pick up changed documents or any documents that were missed due to throttling.
Here are the basic steps to create a vectorizer and vector profile:
1. Use Create or Update Index to add vectorizers to a search index.
2. Add a vector profiles section that specifies one of your vectorizers.
3. Assign a vector profile to a vector field.
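The three steps above can be sketched as an index fragment, again as a Python dict mirroring the REST payload (names, endpoint, and the 1536-dimension assumption are illustrative):

```python
# Illustrative index fragment: a vectorizer, a profile that references it,
# and a vector field that uses the profile. Names and endpoint are placeholders.
index_fragment = {
    "vectorSearch": {
        "vectorizers": [
            {
                "name": "my-vectorizer",
                "kind": "azureOpenAI",
                "azureOpenAIParameters": {
                    "resourceUri": "https://my-openai.openai.azure.com",
                    "deploymentId": "text-embedding-3-small",
                },
            }
        ],
        "profiles": [
            {"name": "my-profile", "algorithm": "my-hnsw", "vectorizer": "my-vectorizer"}
        ],
    },
    "fields": [
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "dimensions": 1536,
            "vectorSearchProfile": "my-profile",
            "searchable": True,
        }
    ],
}
```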
You can also use the Import and vectorize data wizard to explore integrated vectorization before writing any code. This wizard reads files from Azure Blob storage, creates an index with chunked and vectorized fields, and adds a vectorizer.
The vectorizer that's created by the wizard is set to the same embedding model used to index the blob content. You can test your vectorizer for text-to-vector conversion during query execution by sending a query through a vectorizer using a search client.
A vector query request has the following parts:
- vectorQueries provides an array of vector queries.
- vector holds the query vector itself; with a vectorizer assigned, you can pass text instead and let the service vectorize it.
- fields specifies which vector field to target.
- k is the number of nearest-neighbor matches to include in results.
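Assuming a vectorizer is assigned to the target field, a text query can be vectorized at query time. A hedged sketch of such a request body follows (the field and select names are illustrative):

```python
# Illustrative body for POST .../indexes/my-index/docs/search.
# With a vectorizer assigned, "kind": "text" lets the service vectorize the query.
query_body = {
    "select": "title, chunk",
    "vectorQueries": [
        {
            "kind": "text",
            "text": "historic hotels with views of the water",
            "fields": "contentVector",
            "k": 5,
        }
    ],
}
```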
Search results would include a combination of text and images, assuming your search index includes a field for the image file.
Check Logs
To confirm query execution on your vector field, check your logs. If you've enabled diagnostic logging for your search service and routed it to a Log Analytics workspace, you can run a Kusto query against those logs.

Integrated vectorization is available in preview, so it may not be enabled by default. You can verify it's working as expected by looking for the corresponding query execution entries in the logs.
Here are the steps to run a Kusto query:
- Confirm diagnostic logging is enabled for your search service and sent to a Log Analytics workspace
- Run a Kusto query over the search service's query operation logs, filtering for queries against your vector field
Weighting
Weighting is a crucial aspect of Azure Integrated Vectorization, allowing you to assign relative weights to each vector query in search operations.
The default weight is 1.0, and any value you specify must be a number greater than zero. Weights are used when calculating the reciprocal rank fusion scores of each document.
You can assign weights to vector queries in a hybrid query, reducing or increasing their importance in the request. For example, assigning a weight of 0.5 to the first vector query reduces its importance.
Vector weighting applies only to vectors, while text queries have an implicit weight of 1.0. However, you can increase or decrease the importance of text fields by setting maxTextRecallSize in a hybrid query.
The weight value is used as a multiplier against the rank score of the document within its respective result set, affecting the final ranking of documents.
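The effect of weights can be sketched with a small reciprocal rank fusion computation (plain Python; the formula shown is standard RRF with the commonly used constant 60, and the exact scoring internals of the service should be treated as an assumption):

```python
# Reciprocal rank fusion with per-query weights (a sketch, not the service's internals).
# Each result set contributes weight * 1 / (k + rank) to a document's fused score.
def rrf(result_sets, weights, k=60):
    scores = {}
    for ranked_ids, weight in zip(result_sets, weights):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

text_results = ["a", "b", "c"]    # text query: implicit weight 1.0
vector_results = ["c", "a", "d"]  # vector query: weighted 0.5 to reduce its influence
print(rrf([text_results, vector_results], [1.0, 0.5]))  # → ['a', 'c', 'b', 'd']
```

With equal weights, "c" (ranked first by the vector query) would compete more strongly; halving the vector query's weight lets the text ranking dominate the fused order.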
Benefits and Best Practices
Azure integrated vectorization offers several benefits, including a simpler and more maintainable codebase due to the elimination of a separate data chunking and vectorization pipeline.
The integrated vectorization pipeline automates indexing end-to-end, allowing data changes in the source to be propagated through the entire pipeline. This includes retrieval, document cracking, optional AI-enrichment, data chunking, vectorization, and indexing.
Batching and retry logic are also built-in, with Azure AI Search having internal retry policies for throttling errors that surface due to the Azure OpenAI endpoint maxing out on token quotas for the embedding model.
Here are some key best practices to keep in mind when setting up an Azure OpenAI vectorizer:
- Understand your objectives and choose the right use-case for vector searching.
- Combine traditional and vector searches for improved relevance and richness of search results.
- Optimize vector dimensionality to balance accuracy and performance.
- Exploit data locality by strategically placing data close to your primary user base.
- Implement tiered storage solutions to store frequently accessed vectors in faster, more expensive storage.
Choosing a Database
For applications that require scalability and global distribution, consider Azure Cosmos DB for NoSQL. It provides low latency and multi-region writes, ensuring fast and reliable data access globally.
If your application demands both relational data and vector capabilities, Azure Cosmos DB for PostgreSQL is a good choice. It combines PostgreSQL's relational database features with Cosmos DB's scalability and resilience.
Azure AI Search is ideal for applications that rely heavily on search functionalities. It supports complex queries, natural language processing, and integrates AI models to enhance search accuracy and relevance.
Consider evaluating the performance requirements and cost constraints of your application. Services like Azure SQL Database, while traditionally relational, can be extended to handle vector embeddings and may be more cost-effective for smaller projects with less intensive vector search needs.
For applications that already use MongoDB or PostgreSQL, leveraging Azure Cosmos DB for MongoDB or Azure Database for PostgreSQL can be a good option. This can significantly reduce migration time and costs while enabling your applications to benefit from Azure's vector search capabilities.
Here are some key factors to consider when selecting a vector database on Azure:
- Application type: Consider scalability and global distribution requirements.
- Search capabilities: Evaluate the need for complex queries and natural language processing.
- Existing infrastructure: Consider leveraging existing MongoDB or PostgreSQL infrastructure.
- Performance and cost: Evaluate performance requirements and cost constraints.
Best Practices
When setting up an Azure OpenAI vectorizer, consider the same best practices that we recommend for the Azure OpenAI embedding skill. This ensures consistency and ease of maintenance.
Ritam Das, a technology advisor, emphasizes the importance of understanding your objectives. He advises that vector searching has a limited set of optimal use cases, such as recommendation systems, classification tasks, and AI chatbots.
To better leverage vector databases on Azure, combine traditional and vector searches. This can improve the relevance and richness of search results, particularly in applications like recommendation systems. Many traditional databases are adding in new data types to support vector searching, making it simple to add a column to your existing data model.
Experiment with different dimensionalities for your vector embeddings to find the balance between accuracy and performance. Higher dimensions can capture more information, but may also increase computation and storage costs.

Here are some key benefits of integrated vectorization:
- No separate data chunking and vectorization pipeline. Code is simpler to write and maintain.
- Automate indexing end-to-end. When data changes in the source (such as in Azure Storage, Azure SQL, or Cosmos DB), the indexer can move those updates through the entire pipeline.
- Batching and retry logic is built in (non-configurable). Azure AI Search has internal retry policies for throttling errors that surface due to the Azure OpenAI endpoint maxing out on token quotas for the embedding model.
- Projecting chunked content to secondary indexes. Secondary indexes are created as you would any search index (a schema with fields and other constructs), but they're populated in tandem with a primary index by an indexer.
Limitations
Be aware of the Azure OpenAI quotas and limits for embedding models, as retries fail if the quota is exhausted.
Azure OpenAI token-per-minute limits are per model, per subscription, so keep this in mind if you're using an embedding model for both query and indexing workloads.
It's a good idea to follow best practices by having an embedding model for each workload and deploying them in different subscriptions if possible.
Azure AI Search has service limits by tier and workloads, so be mindful of these when designing your application.