Vector Search Engine Comparisons

From my research, the vector search landscape is converging on these five capabilities:

  1. String Lookups (Database)
  2. Vector Embedding (ML)
  3. ANN/kNN (Vector Search)
  4. Filtering (Token Search - TF-IDF/BM25)
  5. Re-Ranking (Feedback Loops, Model Re-Training, Learn-To-Rank, etc.)

This repository discusses the criteria to use when evaluating vector search engines.

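| Engine | Hosting | Performance | MLOps | Availability | Security | Cost | Community | Dense vs Sparse |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pinecone | Cloud Only | | | | | | N/A | |
| Weaviate | Both | | | | | | 31 | |
| Google | Cloud Only | | | | | | N/A | Dense |
| Elastic | Both | | | | | | 161 | Both |
| Algolia | Cloud Only | | | | | | N/A | Both |
| Vespa | Both | | | | | | N/A | |
| Milvus | Both | | | | | | 194 | |
| Redis | Both | | | | | | 617 | Both |
| Qdrant | Both | | | | | | 28 | |
| OpenSearch | Both | | | | | | 135 | Both |
| LucidWorks | Cloud Only | | | | | | | |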

Criteria

Hosting

Where is the search engine deployable?

  • Both: Can be deployed as a managed service in the cloud or self-managed on premises.
  • Cloud Only: Can only be deployed as a managed service in the cloud.

Performance

Comparing performance is challenging because some of these technologies are cloud-only while others can also be self-hosted. We'll have to devise a benchmark that compares read and write latency and throughput on comparable hardware. Ideally, we'll use Locust (locust.io) to simulate concurrent load at a fixed throughput.

Proposed benchmarks below:

Hardware Selection:

  • Must select single-node instances with comparable CPU, RAM, disk, and IOPS

Write:

  • Insert 1,000,000 vectors with a fixed dimensionality (384), the output size of the most popular sentence-similarity transformer
  • Measure the time until all writes are complete and durable
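
As a rough illustration of the write benchmark, the sketch below times a batched bulk insert. It is only a sketch under stated assumptions: `insert_batch` is a hypothetical callable standing in for whichever engine client is being tested, the vectors are random placeholders rather than real transformer embeddings, and the timer only covers the time until the callable returns, so a real run would also need an engine-specific way to confirm durability.

```python
import random
import time
from typing import Callable, List, Sequence

Vector = List[float]


def random_vectors(count: int, dim: int = 384, seed: int = 0) -> List[Vector]:
    """Generate placeholder vectors; a real run would use transformer embeddings."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


def time_bulk_insert(
    insert_batch: Callable[[Sequence[Vector]], None],  # hypothetical engine hook
    vectors: Sequence[Vector],
    batch_size: int = 1000,
) -> float:
    """Return wall-clock seconds taken to hand all vectors to insert_batch."""
    start = time.perf_counter()
    for i in range(0, len(vectors), batch_size):
        insert_batch(vectors[i : i + batch_size])
    return time.perf_counter() - start


if __name__ == "__main__":
    store: List[Vector] = []        # in-memory stand-in for an engine client
    data = random_vectors(10_000)   # scaled down from 1,000,000 for a quick run
    elapsed = time_bulk_insert(store.extend, data)
    print(f"inserted {len(store)} vectors in {elapsed:.4f}s")
```

Passing the insert function in as a parameter keeps the harness identical across engines, so only the client adapter changes between runs.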

Read:

  • Measure max reads/sec and query response latency
  • Fixed K value (10)
  • Use cosine similarity only, to limit scope
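
To make the read metrics concrete, here is a minimal sketch of an exact (brute-force) cosine-similarity top-K search together with a sequential latency measurement. It is a baseline stand-in, not any engine's ANN implementation: the function names, the reported statistics, and the small corpus size are all assumptions made for illustration, and a single sequential client will not reproduce the fixed-throughput concurrency that Locust would generate.

```python
import heapq
import math
import random
import statistics
import time
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], corpus: Sequence[Sequence[float]],
          k: int = 10) -> List[Tuple[int, float]]:
    """Exact search: score every vector and keep the k most similar."""
    scored = ((i, cosine_similarity(query, v)) for i, v in enumerate(corpus))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def benchmark_reads(queries: Sequence[Sequence[float]],
                    corpus: Sequence[Sequence[float]],
                    k: int = 10) -> Dict[str, float]:
    """Run queries one at a time and report throughput and latency."""
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        top_k(q, corpus, k)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return {
        "reads_per_sec": len(queries) / total,
        "median_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    make = lambda: [rng.uniform(-1.0, 1.0) for _ in range(384)]
    corpus = [make() for _ in range(500)]   # tiny corpus, for a quick run only
    queries = [make() for _ in range(20)]
    print(benchmark_reads(queries, corpus, k=10))
```

An exact search like this can also serve as ground truth for checking the recall of each engine's approximate results at K = 10.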

MLOps

Availability

Security

Cost

Community

With the caveat that some of these vector search engines are baked into a parent repository (e.g., Redis vector search within core Redis), we've taken the total number of contributors to each project's repository as of 9/19/2022.

Dense vs Sparse

Can you perform dense vector searches in conjunction with sparse vector searches?
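
One way to picture a combined dense and sparse search is a weighted blend of the two scores. The sketch below is a hypothetical illustration, not how any listed engine implements it: the sparse vectors are plain term-to-weight dictionaries, `alpha` is an assumed tuning parameter, and because sparse scores such as BM25 are not bounded, a real system would need to normalize them before blending.

```python
import math
from typing import Dict, Sequence


def dense_score(query: Sequence[float], doc: Sequence[float]) -> float:
    """Cosine similarity between two dense embeddings."""
    dot = sum(x * y for x, y in zip(query, doc))
    norms = math.sqrt(sum(x * x for x in query)) * math.sqrt(sum(y * y for y in doc))
    return dot / norms if norms else 0.0


def sparse_score(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Dot product over the terms the query and document share."""
    return sum(weight * doc.get(term, 0.0) for term, weight in query.items())


def hybrid_score(dense: float, sparse: float, alpha: float = 0.5) -> float:
    """Blend the scores: alpha=1.0 is dense only, alpha=0.0 is sparse only."""
    return alpha * dense + (1.0 - alpha) * sparse


if __name__ == "__main__":
    d = dense_score([1.0, 0.0], [1.0, 1.0])
    s = sparse_score({"vector": 1.0, "search": 0.5}, {"search": 2.0, "engine": 1.0})
    print(hybrid_score(d, s, alpha=0.5))
```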

Database Details

Pinecone

Documentation

Weaviate

Documentation

Google

Documentation

Elastic

Documentation

Algolia

TBA

Vespa

Documentation

Milvus

Documentation

Redis

Documentation

Qdrant

Documentation

OpenSearch

Documentation

Extended Reading