
August 22, 2023

Extending Azure Cognitive Services: A Dive into LLM and Vector Search

In the constantly evolving landscape of artificial intelligence and machine learning, Microsoft is playing a prominent role with its innovative solutions. One of the most recent and exciting developments in its arsenal of tools is the introduction of vector search in the Cognitive Search API. This technology goes beyond traditional keyword matching: deep learning models turn queries and documents into embedding vectors, so search can work on meaning rather than exact words. Microsoft’s Cognitive Search API now offers vector search as a service, meaning developers and enterprises can integrate this powerful functionality directly into their applications. The service is not only optimized to work with large language models such as those in Azure OpenAI, but also has the potential to revolutionize the way we handle search and data analysis.

Tools for generative AI

Tools such as Semantic Kernel, TypeChat and LangChain can be used to build applications around generative AI technologies such as Azure OpenAI. These tools let you set constraints around the underlying large language model (LLM) and use it as a tool for building and implementing natural language interfaces.
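As a concrete illustration of that pattern, here is a minimal sketch using the 2023-era LangChain Python API against a hypothetical Azure OpenAI deployment; the endpoint, key and deployment name are placeholders, and class names may differ in newer LangChain releases.

```python
# A minimal sketch, assuming the 2023-era LangChain Python API and a
# hypothetical Azure OpenAI chat deployment. Endpoint, key and deployment
# name are placeholders.
from langchain.chat_models import AzureChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain

llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",                      # your deployment
    openai_api_base="https://example.openai.azure.com",  # your endpoint
    openai_api_version="2023-05-15",
    openai_api_key="...",                                # your key
    temperature=0,  # low temperature keeps answers predictable
)

# The prompt template is the constraint: the model is instructed to answer
# only from the supplied context, turning it into a natural language
# interface over trusted data.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context: {context}\n\nQuestion: {question}"
)

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(context="Contoso's support line is open 9-17 CET.",
                question="When can I call Contoso support?"))
```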

What is an LLM?

An LLM is essentially a tool for navigating a semantic space: a deep neural network predicts the next token in the sequence that follows your initial prompt. When a prompt is open-ended, the LLM can run beyond its inputs, producing content that may seem plausible but is in fact nonsense, a failure mode often called hallucination.
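To make that concrete, here is a small illustration using the tiktoken library, which implements the tokenizer used by OpenAI models; the exact token ids shown depend on the encoding.

```python
# A small illustration: LLMs operate on tokens, which are subword units,
# not syllables or whole words. Uses the tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4
tokens = enc.encode("Vector search in Azure Cognitive Search")
print(tokens)                             # the numeric token ids
print([enc.decode([t]) for t in tokens])  # the text fragment behind each id
# At inference time the model repeatedly predicts the most likely next
# token id given everything that came before it.
```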

Reliability of LLMs

We tend to trust the output of LLMs just as we trust the output of search engines. But training a large language model on reliable sources such as Wikipedia, Stack Overflow and Reddit does not give it any understanding of that content; it only learns to produce statistically plausible text. Sometimes the output will be correct, but sometimes it will be confidently wrong.

How to avoid false output?

How can we avoid false and nonsensical output from our large language models and ensure that our users get accurate and logical answers?

Constraining large language models with semantic memory

What we need to do is constrain the LLM, and that is where Microsoft’s new LLM-based development stack comes in handy. With a tool such as TypeChat you can enforce a specific output format, or you can use an orchestration pipeline such as Semantic Kernel to work with additional sources of trusted information.
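TypeChat itself is a TypeScript library, so as a rough Python analogue of the same idea, here is a hedged sketch that validates an LLM’s reply against a schema with pydantic and rejects anything that does not conform; the SupportTicket schema and the simulated reply are invented for illustration.

```python
# A rough Python analogue of the TypeChat idea (TypeChat itself is a
# TypeScript library): accept the model's reply only if it matches a
# schema. Uses pydantic v2; the schema is invented for illustration.
from pydantic import BaseModel, ValidationError

class SupportTicket(BaseModel):  # the "type" the LLM must produce
    category: str
    urgency: int                 # e.g. 1 (low) to 3 (high)
    summary: str

def parse_reply(raw_json: str) -> SupportTicket | None:
    """Return the parsed ticket, or None if the reply is malformed."""
    try:
        return SupportTicket.model_validate_json(raw_json)
    except ValidationError:
        return None  # in practice: re-prompt the model with the error

# Simulated LLM output; a real call would ask the model to answer as JSON.
reply = '{"category": "billing", "urgency": 2, "summary": "Invoice is wrong"}'
print(parse_reply(reply))
```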

What is “semantic memory”?

Semantic memory uses vector search to provide a prompt that can be used to deliver a factual output from an LLM. A vector database manages the context for the initial prompt, and a vector search finds stored data that matches the initial user query; the retrieved data is then added to the prompt as grounding context.
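Here is a self-contained sketch of that flow with toy three-dimensional vectors; a real system would use embeddings with roughly 1,536 dimensions and a proper vector database, but the mechanism is the same.

```python
# A self-contained sketch of semantic memory: store document vectors, find
# the nearest one to the query vector, and use its text as context in the
# prompt. The 3-dimensional vectors are toys, not real embeddings.
import numpy as np

documents = {
    "Our office is in Oslo.":            np.array([0.9, 0.1, 0.0]),
    "Support is open 9-17 on weekdays.": np.array([0.1, 0.9, 0.1]),
    "We use Azure for all hosting.":     np.array([0.0, 0.2, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: how closely two vectors point the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = "When can I reach support?"
query_vector = np.array([0.2, 0.8, 0.1])  # in reality: embed(query)

# Vector search: pick the stored document closest to the query vector.
best = max(documents, key=lambda d: cosine(documents[d], query_vector))

# The retrieved text becomes grounding context for the LLM prompt.
prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer from the context only."
print(prompt)
```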

Adding vector indexing to Azure Cognitive Search

Azure Cognitive Search builds on Microsoft’s own work on search tools. It is delivered as a platform service: your private data is hosted in Azure, and the Cognitive Search APIs give your applications controlled access to that content.

Embedding vectors generated and stored for your content

It is important to note that Azure Cognitive Search is a “bring your own embedding vector” service: you generate embeddings for your content yourself, for example with an Azure OpenAI embedding model, and store them in the index. Vector queries are then answered with a nearest neighbor search over those stored vectors.
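As an example of “bringing your own” vectors, here is a minimal sketch using the 2023-era openai Python SDK (v0.28 style) against a hypothetical Azure OpenAI deployment of text-embedding-ada-002; the endpoint, key and deployment name are placeholders.

```python
# A minimal sketch of generating an embedding to store in Cognitive Search,
# using the 2023-era openai Python SDK (v0.28 style) and a hypothetical
# Azure OpenAI deployment of the text-embedding-ada-002 model.
import openai

openai.api_type = "azure"
openai.api_base = "https://example.openai.azure.com"  # your endpoint
openai.api_version = "2023-05-15"
openai.api_key = "..."                                # your key

response = openai.Embedding.create(
    engine="text-embedding-ada-002",  # your deployment name
    input="Azure Cognitive Search now supports vector fields.",
)
vector = response["data"][0]["embedding"]
print(len(vector))  # 1536 dimensions for text-embedding-ada-002
```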

Getting started with vector search in Azure Cognitive Search

Using Azure Cognitive Search for vector queries is straightforward. Embeddings are stored in vector fields in a search index, alongside the source content, and at query time the service compares a query vector against them.
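Here is a hedged sketch of such a query with the azure-search-documents Python SDK, using the class names from the 11.4 GA release (the 2023 previews used slightly different names); the index name, field names and credentials are placeholders.

```python
# A hedged sketch of a pure vector query, assuming the azure-search-documents
# Python SDK (11.4+ naming) and a hypothetical index with a vector field
# named "contentVector" and a "title" field.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

client = SearchClient(
    endpoint="https://example.search.windows.net",  # your service endpoint
    index_name="docs-index",                        # hypothetical index name
    credential=AzureKeyCredential("..."),           # your query key
)

query_vector = [0.0] * 1536  # in reality: the embedding of the user's query

results = client.search(
    search_text=None,  # pure vector query, no keyword matching
    vector_queries=[
        VectorizedQuery(vector=query_vector,
                        k_nearest_neighbors=3,
                        fields="contentVector")
    ],
)
for doc in results:
    print(doc["title"], doc["@search.score"])
```

The service returns the documents whose stored vectors are closest to the query vector, ranked by similarity score.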

Going beyond simple text vectors

Much more is possible with Azure Cognitive Search’s vector capabilities than just text matching. Microsoft is rapidly productizing the tools and techniques it used to build its own GPT-4-based Bing search engine and several Copilots.

Want to know more?

Get in touch
