RaBitQ binary quantization 101

Understand the most critical components of RaBitQ binary quantization: how it works, what its benefits are, and the math behind it, with worked examples.

Introduction

As we discussed previously in Scalar Quantization 101, most embedding models output `float32` vector values, which is often more precision than is needed to represent the vector space. Scalar quantization techniques greatly reduce the space needed to represent these vectors. We've also previously talked about Bit Vectors In Elasticsearch and how binary quantization can often be unacceptably lossy.

With binary quantization techniques such as those presented in the RaBitQ paper, we can address the problems associated with naively quantizing into a bit vector while maintaining the quality associated with scalar quantization. The key is to subdivide the space more thoughtfully and to retain residuals of the transformation. These newer techniques allow for better optimizations in distance calculations, and generally better results, than similar techniques like product quantization (PQ), along with a 32x level of compression that typically is not possible with scalar quantization.

Here we'll walk through some of the core aspects of binary quantization and leave the mathematical details to the RaBitQ paper.

Building the Bit Vectors

Because we can more efficiently pre-compute some aspects of the distance computation, we treat the indexing and query construction separately.

To start with, let's walk through indexing three very simple 2-dimensional vectors, v1, v2, and v3, to see how they are transformed and stored for efficient distance computations at query time.

v1 = [0.56, 0.82]
v2 = [1.23, 0.71]
v3 = [-3.28, 2.13]

Our objective is to transform these vectors into much smaller representations that allow:

  • A reasonable proxy for estimating distance rapidly
  • Some guarantees about how vectors are distributed in the space for better control over the total number of data vectors needed to recall the true nearest neighbors

We can achieve this by:

  • Shifting each vector to within a hyper-sphere (in our case a 2D circle, the unit circle)
  • Snapping each vector to a single representative point within each region of the circle
  • Retaining corrective factors to better approximate the distance between each vector and the query vector

Let's unpack that step by step.
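Before unpacking each step, here is a minimal sketch of the three steps in Python. This is an illustration, not the full RaBitQ algorithm: the function name and the specific corrective factors kept here (the residual's norm and its dot product with the quantized point) are assumptions chosen to mirror the list above.

```python
import math

def quantize(v, c):
    """Sketch of the three steps above (illustrative, not full RaBitQ)."""
    # 1. Shift the vector by the centroid and project it onto the unit
    #    circle/sphere by dividing out the residual's norm.
    residual = [vi - ci for vi, ci in zip(v, c)]
    norm = math.sqrt(sum(x * x for x in residual))
    unit = [x / norm for x in residual]

    # 2. Snap each dimension to a single representative point per region:
    #    one bit per dimension, taken from the sign.
    bits = [1 if x >= 0 else 0 for x in unit]

    # 3. Retain corrective factors to better approximate distances later,
    #    e.g. the distance to the centroid and the dot product between the
    #    unit residual and its quantized representative.
    quantized = [(1 if b else -1) / math.sqrt(len(v)) for b in bits]
    dot = sum(u * q for u, q in zip(unit, quantized))
    return bits, norm, dot
```

At query time, the stored bits support fast bitwise distance estimates, while the corrective factors tighten those estimates back toward the true distance.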

Find a Representative Centroid

In order to partition each dimension we need to pick a pivot point. For simplicity, we'll select one point to use to transform all of our data vectors.

Let's continue with our vectors, v1, v2, and v3. We pick their centroid as the pivot point.

v1 = [0.56, 0.82]
v2 = [1.23, 0.71]
v3 = [-3.28, 2.13]

c = [(0.56 + 1.23 + -3.28) / 3, (0.82 + 0.71 + 2.13) / 3]

c = [-0.49, 1.22]
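The centroid is just the per-dimension mean of the data vectors. A small sketch (the `centroid` helper is a name of our choosing, not from any library):

```python
def centroid(vectors):
    # Per-dimension mean across all data vectors.
    dims = len(vectors[0])
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(dims)]

v1 = [0.56, 0.82]
v2 = [1.23, 0.71]
v3 = [-3.28, 2.13]

c = centroid([v1, v2, v3])  # roughly [-0.49, 1.22]
```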

Here's all of those points graphed together: