Introduction
As we discussed previously in Scalar Quantization 101, most embedding models output float32 vector values, which is often more precision than is needed to represent the vector space. Scalar quantization techniques greatly reduce the space needed to represent these vectors. We've also previously talked about Bit Vectors In Elasticsearch and how naive binary quantization can often be unacceptably lossy.
Binary quantization techniques such as those presented in the RaBitQ paper address the problems of naively quantizing into a bit vector while maintaining quality comparable to scalar quantization. They do this by more thoughtfully subdividing the space and retaining residuals of the transformation. These newer techniques allow for better optimizations in distance calculations and generally better results than similar techniques like product quantization (PQ), along with 32x compression, a level that typically is not possible with scalar quantization.
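For intuition on the 32x figure: each float32 dimension takes 32 bits, while the quantized form keeps a single bit per dimension. For example, a 1024-dimension float32 vector takes 1024 × 4 = 4096 bytes, while the same vector quantized to one bit per dimension takes 1024 / 8 = 128 bytes, plus a handful of corrective floats per vector.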
Here we'll walk through some of the core aspects of binary quantization and leave the mathematical details to the RaBitQ paper.
Building the Bit Vectors
Because we can more efficiently pre-compute some aspects of the distance computation, we treat the indexing and query construction separately.
To start with, let's walk through indexing three very simple 2-dimensional vectors, v1, v2, and v3, to see how they are transformed and stored for efficient distance computations at query time.
Our objective is to transform these vectors into much smaller representations that allow:
- A reasonable proxy for estimating distance rapidly
- Some guarantees about how vectors are distributed in the space for better control over the total number of data vectors needed to recall the true nearest neighbors
We can achieve this by:
- Shifting each vector to within a hyper-sphere; in our 2d case, the unit circle
- Snapping each vector to a single representative point within each region of the circle
- Retaining corrective factors to better approximate the distance between each vector and the query vector
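Here's a rough sketch of what these three steps could look like in code. This is illustrative only: the example vectors are hypothetical (the article's actual values are not shown here), and the corrective factors are simplified; the exact factors and their derivation are covered in the RaBitQ paper.

```python
import numpy as np

def index_vector(v, centroid):
    """Sketch of indexing one data vector: a bit pattern plus corrective factors.

    Illustrative only -- the real corrective factors are derived in the RaBitQ paper.
    """
    # Step 1: shift the vector by the centroid and scale it onto the unit circle
    residual = v - centroid
    dist_to_centroid = np.linalg.norm(residual)
    unit = residual / dist_to_centroid

    # Step 2: snap each dimension to a representative point via its sign (1 bit per dimension)
    bits = (unit > 0).astype(np.uint8)

    # Step 3: retain corrective factors for better distance approximation at query time,
    # e.g. the distance to the centroid and how closely the representative point
    # (the quantized unit vector) matches the true unit vector
    representative = np.where(bits == 1, 1.0, -1.0) / np.sqrt(len(v))
    alignment = float(np.dot(unit, representative))

    return bits, dist_to_centroid, alignment

# Hypothetical 2-dimensional example vectors (these values are made up)
vectors = np.array([[0.8, 1.4], [-1.2, 0.9], [0.3, -0.7]])
centroid = vectors.mean(axis=0)

for v in vectors:
    print(index_vector(v, centroid))
```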
Let's unpack that step by step.
Find a Representative Centroid
In order to partition each dimension, we need to pick a pivot point. For simplicity, we'll select a single point to use to transform all of our data vectors.
Let's continue with our vectors, v1, v2, and v3. We pick their centroid as the pivot point.
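Reusing the hypothetical example values from the sketch above, the centroid is simply the per-dimension mean of the data vectors:

```python
import numpy as np

# Hypothetical values for the three 2-dimensional data vectors
vectors = np.array([[0.8, 1.4], [-1.2, 0.9], [0.3, -0.7]])

# The centroid is the per-dimension mean; it is the pivot point we use
# to shift every data vector before quantization
centroid = vectors.mean(axis=0)
print(centroid)  # approximately [-0.033, 0.533]
```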
Here are all of those points graphed together: