
Microsoft Unveils DiskANN-Integrated Vector Search System in Azure Cosmos DB for Cost-Effective, Low-Latency Performance

Microsoft researchers have integrated DiskANN into Azure Cosmos DB to create a scalable, cost-efficient vector search system with low latency, improving semantic search capabilities within transactional databases.

The Importance of High-Dimensional Vector Search

Modern data systems rely heavily on the ability to search high-dimensional vector representations produced by deep learning models. These vectors capture the semantic and contextual meaning of data, allowing retrieval based on relevance and similarity rather than exact matches. This capability is crucial for applications such as web search, AI assistants, and content recommendation, where information must be retrieved by meaning rather than through structured queries alone.
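
To make "retrieval by similarity" concrete, here is a minimal, brute-force sketch in Python that ranks a corpus of embeddings by cosine similarity to a query vector. The embeddings are randomly generated stand-ins for model outputs, and the exhaustive scan is shown only for clarity; graph indexes such as DiskANN exist precisely to avoid this linear scan at scale.

```python
import numpy as np

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 10):
    """Return indices and scores of the k corpus vectors most similar to the query.

    Exact, brute-force search for illustration only; indexes like DiskANN
    approximate this result without scanning every vector.
    """
    # Normalize so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

# Hypothetical 768-dimensional embeddings standing in for deep-learning model outputs.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 768)).astype(np.float32)
query = rng.normal(size=768).astype(np.float32)
print(cosine_top_k(query, corpus, k=5))
```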

Challenges with Traditional Vector Search Systems

Traditional vector databases focus on semantic search but require duplicating data from primary transactional databases, which introduces latency, storage overhead, and risks of data inconsistency. Managing two separate systems also complicates synchronization and limits scalability, flexibility, and data integrity, especially with frequent updates.

Limitations of Existing Vector Search Tools

Popular vector search offerings such as Zilliz and Pinecone operate as standalone services with segment-based or fully in-memory architectures. These designs require frequent index rebuilds and often suffer from latency spikes and high memory consumption, making them inefficient for large-scale or rapidly changing datasets. They also struggle with filtered queries, updates, and multi-tenant management because they are not integrated with transactional operations.

Microsoft’s Integrated Solution with DiskANN and Azure Cosmos DB

Researchers at Microsoft developed an innovative approach by integrating vector indexing directly into Azure Cosmos DB's NoSQL engine using DiskANN, a graph-based indexing library known for its performance on large-scale semantic search. This eliminates the need for a separate vector database and leverages Cosmos DB's native features such as high availability, elasticity, multi-tenancy, and automatic partitioning. Each collection maintains a single vector index per partition, kept synchronized with the underlying data and built on Cosmos DB's Bw-Tree index structure.
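
For a sense of how this surfaces to developers, the sketch below uses the azure-cosmos Python SDK to create a container with a vector embedding policy and a diskANN vector index, then runs a filtered similarity query with the VectorDistance system function. The account endpoint, key, database, container, tenant, and the /embedding path are placeholders, and the exact SDK keyword names (for example vector_embedding_policy) follow the public vector search preview and may vary between SDK versions.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder connection details.
client = CosmosClient(url="https://<your-account>.documents.azure.com:443/",
                      credential="<your-key>")
database = client.create_database_if_not_exists("semantic_db")

# Declare which document path holds the embedding and how vectors are compared.
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 768,
        }
    ]
}

# Ask Cosmos DB to maintain a DiskANN index over that path.
indexing_policy = {
    "vectorIndexes": [
        {"path": "/embedding", "type": "diskANN"},
    ]
}

container = database.create_container_if_not_exists(
    id="documents",
    partition_key=PartitionKey(path="/tenantId"),
    indexing_policy=indexing_policy,
    vector_embedding_policy=vector_embedding_policy,  # preview-SDK keyword; may differ by version
)

# Top-10 documents for one tenant, ranked by VectorDistance under the container's
# configured distance function. Restricting to a single tenantId keeps the search
# within that tenant's partition.
query_vector = [0.0] * 768  # embedding of the user's query, from the same model
sql = (
    "SELECT TOP 10 c.id, VectorDistance(c.embedding, @q) AS score "
    "FROM c WHERE c.tenantId = @tenant "
    "ORDER BY VectorDistance(c.embedding, @q)"
)
results = container.query_items(
    query=sql,
    parameters=[{"name": "@q", "value": query_vector},
                {"name": "@tenant", "value": "tenant-42"}],
    partition_key="tenant-42",
)
for item in results:
    print(item["id"], item["score"])
```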

Technical Innovations in DiskANN Integration

The DiskANN library was rewritten in Rust with asynchronous operations so it could run inside the database engine. It reads or updates only the vector components it needs (for example, quantized vectors or neighbor lists), reducing memory usage. Vector insertions and queries use a hybrid approach that operates mostly in quantized space. The system supports paginated searches and filter-aware traversal, enabling efficient handling of complex queries and scaling to billions of vectors. A sharded indexing mode allows indexes to be partitioned by keys such as tenant ID or time period.
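
The "mostly in quantized space" idea can be illustrated with a small, self-contained Python sketch: candidate distances are first estimated from compact 8-bit codes, and only a short list of the best candidates is re-scored against the full float32 vectors. The per-dimension scalar quantizer below is a simplification chosen for brevity; it is not DiskANN's actual compression or graph traversal, only an illustration of the estimate-then-rerank pattern.

```python
import numpy as np

def quantize(vectors: np.ndarray):
    """Compress float32 vectors to uint8 codes with a per-dimension scalar quantizer."""
    lo, hi = vectors.min(axis=0), vectors.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    codes = np.round((vectors - lo) / scale).astype(np.uint8)  # 4x smaller than float32
    return codes, lo, scale

def search(query, full_vectors, codes, lo, scale, k=10, shortlist=100):
    """Estimate distances from quantized codes, then rerank a shortlist at full precision."""
    # Cheap pass: approximate L2 distances from the decoded uint8 codes.
    approx = codes.astype(np.float32) * scale + lo
    est = np.linalg.norm(approx - query, axis=1)
    candidates = np.argsort(est)[:shortlist]
    # Expensive pass: exact distances, computed only for the shortlist.
    exact = np.linalg.norm(full_vectors[candidates] - query, axis=1)
    return candidates[np.argsort(exact)[:k]]

rng = np.random.default_rng(1)
base = rng.normal(size=(50_000, 768)).astype(np.float32)  # stand-in corpus
codes, lo, scale = quantize(base)
query = rng.normal(size=768).astype(np.float32)
print(search(query, base, codes, lo, scale))
```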

Performance and Cost Efficiency

In experiments with 10 million 768-dimensional vectors, query latency was under 20 milliseconds at the median (p50) with a recall@10 of 94.64%. Azure Cosmos DB's query costs were 15× lower than Zilliz's and 41× lower than Pinecone's. Cost efficiency remained consistent as the index scaled from 100,000 to 10 million vectors, with latency and Request Unit (RU) charges increasing less than twofold. Ingesting 10 million vectors cost approximately $162.50, lower than Pinecone and DataStax but higher than Zilliz. Recall stayed stable under heavy updates, and in-place deletions improved accuracy as data distributions evolved.

A Unified Approach to Semantic Search and Transactional Workloads

This research presents a practical and scalable way to embed vector search within a transactional database, simplifying operations while delivering significant improvements in cost, latency, and scalability. By integrating DiskANN into Azure Cosmos DB, Microsoft offers a promising template for building semantic capabilities directly into operational data workloads.

For more details, see the original paper.
