Watch this space for finalized datasets, baselines and metrics. Subscribe to our
for discussions and announcements.
Why this competition?
In the past few years, we’ve seen a lot of new research and creative approaches for large-scale ANNS, including:
- Partition-based and graph-based indexing strategies (as well as hybrid indexing approaches).
- Mixing RAM and SSD storage to efficiently store and process large datasets that exceed the size of RAM.
- Using accelerator hardware such as GPUs, FPGAs, and other custom in-memory silicon.
- Leveraging machine learning for dimensionality reduction of the original vectors.
In addition to an uptick in academic interest, many implementations of these algorithms at scale now appear in production
and high availability datacenter contexts: powering enterprise-grade, mission-critical, and web-scale search applications.
In these deployment scenarios, metrics such as cost, preprocessing time, and power consumption become just as important as
the recall-vs-latency tradeoff. Despite this, most empirical evaluations of algorithms have focused on smaller datasets
of about a million points, e.g. ann-benchmarks.com. However, deploying recent algorithmic advances in ANNS techniques for
search, recommendation and ranking at scale requires support for a billion points or substantially more. Barring a few recent
papers, there is limited consensus on which algorithms are effective at this scale.
We believe that this challenge will be impactful in several ways:
- Provide a comparative understanding of algorithmic ideas and their application at scale.
- Promote the development of new techniques for the problem and demonstration of their value.
- Provide a compilation of datasets, many new, to enable future development of algorithms.
- Introduce a standard benchmarking approach.
By providing a platform for those interested in this problem, we aim to encourage more collaboration and collectively advance the field at a more rapid pace.
Researchers can request Azure compute credit from a pool sponsored by Microsoft Research.
Standard Hardware Tracks (T1 and T2)
There are two standard hardware tracks in which the trade-off between recall and throughput will be evaluated.
- (T1) In-memory indices with FAISS as the baseline.
DRAM is constrained to 64GB for search and 128GB for build. 4TB scratch space is provided for build.
- (T2) Out-of-core indices with DiskANN as the baseline.
In addition to the limited DRAM in T1, the machine will also host a ~1TB SSD for search.
Participants are expected to release their code for index building and search, which the organizers will run on separate machines.
Participants provide a configuration for their index build code that would complete in 4 days on an Azure
with 4TB of SSD to be used for storing the data, index and other intermediate data (details likely to change).
For search, participants are allowed up to 10 hyperparameter configurations.
The protocol for evaluation is as follows:
- [on indexing machine] participants will be given a local path with a 1B vector dataset.
- [on indexing machine] participants build an index from the 1B vectors and store back to local disk.
- [on indexing machine] Stored index is copied out to a temporary cloud storage location by the eval framework.
- [on search machine] organizers load the index from cloud storage to a local path and provide the path to the search code.
- [on search machine] organizers perform searches with held-out query set and measure recall and time to process the queries with several sets of parameters.
Finalized details for build and search hardware timing will be released along with the eval framework.
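The search step of the protocol above measures recall and query processing time for several parameter sets. A minimal sketch of that measurement loop follows. It assumes recall is computed as recall@k (the average fraction of the true k nearest neighbors returned), which may differ from the final ranking metric. `search_fn`, `recall_at_k`, `evaluate` and the configuration dictionaries are hypothetical names for illustration and are not part of the eval framework.

```python
import time

MAX_SEARCH_CONFIGS = 10  # limit on search configurations stated for T1/T2


def recall_at_k(results, ground_truth, k):
    """Average fraction of the true k nearest neighbors that were returned."""
    hits = 0
    for found, truth in zip(results, ground_truth):
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (k * len(ground_truth))


def evaluate(search_fn, queries, ground_truth, k, configs):
    """Run each configuration over the query set; report recall and queries/second.

    search_fn(query, k, **config) is a hypothetical callable returning the ids
    of the k neighbors it found for one query.
    """
    if len(configs) > MAX_SEARCH_CONFIGS:
        raise ValueError(f"at most {MAX_SEARCH_CONFIGS} configurations are allowed")
    rows = []
    for config in configs:
        start = time.perf_counter()
        results = [search_fn(query, k, **config) for query in queries]
        elapsed = time.perf_counter() - start
        qps = len(queries) / elapsed if elapsed > 0 else float("inf")
        rows.append({
            "config": config,
            "recall": recall_at_k(results, ground_truth, k),
            "qps": qps,
        })
    return rows
```

Each returned row is one point on a recall-vs-throughput curve, so running all configurations for an index gives the trade-off the standard tracks evaluate.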
Custom Hardware Track (T3)
Participants can use non-standard hardware such as GPUs, AI accelerators, FPGAs, and custom in-memory silicon.
In this track, participants will either 1) send their hardware, such as PCI boards, to GSI Technology or 2) run the evaluation
themselves using the scripts made available by the organizers. For T3 participants sending hardware,
we will make specific delivery arrangements at the participant’s expense. We will install the hardware on a system under
the organizers’ control (we have a few bare-metal options available) and follow any installation directions provided.
Participants will be allowed to temporarily log into the machine to finalize any installation and configuration,
or for debugging installation as needed. For T3 participants running the evaluation themselves, we request remote ssh
access and sudo accounts on the systems so that the organizers can verify the system and hardware (such as IPMI support,
minimum resource availability such as disk storage for datasets).
The evaluation phase will proceed like T1/T2, with a few modifications.
- For participants that send their hardware, T3 organizers will provide remote access to a separate indexing machine.
  - [on separate indexing machine] participants download 1B vector dataset and store to local disk
  - [on separate indexing machine] participants build an index from the 1B vectors and store back to local disk
  - Stored index is copied to eval machine
  - [on eval machine] T3 organizers load the index from local disk
  - [on eval machine] T3 organizers perform searches with held-out query set and measure recall and time to process the queries with several sets of parameters.
- For participants that give us remote access to systems, participants are responsible for building their index.
  - [on indexing machine] participants download 1B vector dataset and store to local disk
  - [on indexing machine] participants build an index from the 1B vectors and store back to local disk
  - Stored index is copied to eval machine
  - [on eval machine] T3 organizers load the index from local disk
  - [on eval machine] T3 organizers perform searches with held-out query set and measure recall and search time with several sets of parameters.
T3 will maintain different leaderboards for each dataset based on the following benchmarks:
- Recall vs throughput, using the same ranking formula as the T1/T2 tracks.
- Power: recall vs throughput/watt, using a ranking formula similar to that of the T1/T2 tracks.
- Cost, measured as cost/watt (queries/second/watt and MSRP/watt).
- Total cost normalized across all tracks.
We will provide the exact details on how we collect and compute these benchmarks as well as additional machine and operating system specification before the competition begins.
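Until those details are published, the per-watt quantities named above can be illustrated with simple arithmetic. The sketch below is an assumption-laden example, not the organizers' method: it assumes average power is the plain mean of periodic power readings (for instance, collected over IPMI), and `power_metrics` and its parameter names are hypothetical.

```python
def power_metrics(num_queries, elapsed_seconds, power_samples_watts, msrp_usd):
    """Compute throughput and the per-watt ratios named in the T3 leaderboards.

    Assumption: average power is the unweighted mean of the sampled readings.
    """
    avg_watts = sum(power_samples_watts) / len(power_samples_watts)
    qps = num_queries / elapsed_seconds
    return {
        "qps": qps,                              # queries/second
        "avg_watts": avg_watts,                  # mean power draw during search
        "qps_per_watt": qps / avg_watts,         # queries/second/watt
        "msrp_per_watt": msrp_usd / avg_watts,   # MSRP/watt
    }
```

For example, 10,000 queries answered in 2 seconds at an average draw of 250 W corresponds to 5,000 queries/second and 20 queries/second/watt.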
We intend to use the following six billion-point datasets (datasets are subject to change).
- BIGANN consists of SIFT descriptors applied to images extracted from a large image dataset.
- Facebook-simsearchnet is a new dataset released by Facebook for this competition.
It consists of features used for image copy detection for integrity purposes.
The features are generated by the Facebook SimSearchNet++ model.
- Microsoft-Turing-ANNS is a new dataset being released by the Microsoft Turing team for this competition.
It consists of Bing queries encoded by Turing AGI v5 that trains Transformers to capture similarity of intent in
web search queries. An early version of the RNN-based AGI Encoder is described in a
SIGIR'19 paper and a blogpost.
- Microsoft SPACEV1B is a new web search related dataset released by Microsoft Bing for this competition.
It consists of document and query vectors encoded by the Microsoft SpaceV Superior model to capture generic intent representation.
- Yandex DEEP1B is an image descriptor dataset consisting of the projected
and normalized outputs from the last fully-connected layer of the GoogLeNet model, which was pretrained on the Imagenet classification task.
- Yandex Text-to-Image-1B is a new cross-modal dataset (text and visual),
where database and query vectors can potentially have different distributions in a shared representation space. Image embeddings are produced by the
Se-ResNext-101 model, and queries are textual embeddings produced by a variant of the DSSM model.
All datasets, including ground truth data, are in a common binary format that starts with 8 bytes of data consisting of num_points (uint32_t) and
num_dimensions (uint32_t), followed by num_points x num_dimensions x sizeof(type) bytes of data stored one vector after another. Data files
will have suffixes .fbin, .u8bin, and .i8bin to represent float32, uint8 and int8 type data. Note that a different query set
will be used for evaluation. The details of the datasets along with links to the base, query and sample sets, and the ground truth nearest neighbors
of the query set are listed below.
* new datasets
We recommend using Axel for downloading BIGANN, Facebook-SSN++, Yandex DEEP1B and T2I datasets.
We recommend using AzCopy for downloading Microsoft datasets.
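The common binary format described above can be written and read with a few lines of code. The following is a minimal sketch using only the Python standard library. It assumes little-endian byte order, which the description does not state, and `write_bin` and `read_bin` are hypothetical helper names. Reading row by row into Python lists is only practical for small sample files; a billion-point file would call for memory mapping or chunked reads.

```python
import os
import struct

# File suffix -> struct format character for one vector component.
SUFFIX_FORMATS = {".fbin": "f", ".u8bin": "B", ".i8bin": "b"}


def write_bin(path, vectors):
    """Write equal-length vectors: 8-byte header, then one vector after another."""
    fmt = SUFFIX_FORMATS[os.path.splitext(path)[1]]
    num_points, num_dims = len(vectors), len(vectors[0])
    with open(path, "wb") as f:
        # Header: num_points and num_dimensions as uint32 (little-endian assumed).
        f.write(struct.pack("<II", num_points, num_dims))
        for vec in vectors:
            f.write(struct.pack(f"<{num_dims}{fmt}", *vec))


def read_bin(path):
    """Read a file in the layout above and return a list of vectors."""
    fmt = SUFFIX_FORMATS[os.path.splitext(path)[1]]
    with open(path, "rb") as f:
        num_points, num_dims = struct.unpack("<II", f.read(8))
        row = struct.Struct(f"<{num_dims}{fmt}")
        # Sanity check: header plus num_points x num_dimensions x sizeof(type).
        expected_size = 8 + num_points * row.size
        if os.path.getsize(path) != expected_size:
            raise ValueError(f"{path}: expected {expected_size} bytes")
        return [list(row.unpack(f.read(row.size))) for _ in range(num_points)]
```

The size check mirrors the layout directly: a .u8bin file with 2 vectors of 3 dimensions occupies 8 + 2 x 3 x 1 = 14 bytes.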
Call for Participation and Timeline
Participation is open to all teams interested in developing new algorithms or re-implementing
existing algorithms more efficiently either in software or hardware. Participants are
requested to submit a brief document through CMT
for each track they will be competing in. The document should contain the following details:
- Name, email and affiliation of each participant in the team
- A name and/or URL for the submission.
- [Optional] To receive Azure credits for developing new ideas, please submit your request
by June 30th with preliminary data on smaller scale datasets and why you think
your algorithm will work well at billion scale. This will be used by the organizers to select strong
entries. We request teams who already have access to infrastructure (e.g. those from industry or
with access to large university clusters) to skip this.
For Track T3, the document should contain the following additional details to help organizers plan
and assess eligibility for separate leaderboards:
- Type of hardware, e.g., PCIe extension board, rack-mounted system, or other.
- Evidence of the retail MSRP of the hardware, i.e., pricing on website or copy of the customer invoice.
- Whether hardware will be sent to GSI Technology (at the participant's expense) or organizers will be given remote access to the systems.
For remote system access participants, whether their system supports standard IPMI power monitoring.
If not IPMI, then an equivalent power monitoring interface must be available.
- Operating system requirements.
- Whether the participant requires a separate machine for index building. We have limited Azure-based
Fsv2-series machines and some bare-metal machines managed by the T3 organizers.
Please review and complete the consent form for participation in Tracks T1/T2 and Track T3. Note that there are separate consent forms
for the standard and custom hardware tracks. Completing the form is necessary for participation.
Timeline (subject to change)
- May: release of data, guidelines, and a call for participation. Registration open.
- Mid-June: Baseline results, testing infrastructure and final ranking metrics released.
- June 30th: Participants in need of compute resources to submit an expression of interest.
- Mid-July: Allocation of compute resources.
- July 30th: Final deadline for participants to submit an expression of interest through CMT.
- October 22nd: End of competition period. Teams to release code in a containerized form, and complete a pull request to the eval framework with code to run the algorithms.
- October 29th: Participants submit a brief report outlining their algorithm and results.
- Mid-November: Release of preliminary results on standardized machines. Review of code by organizers and participants. Participants can raise concerns about the evaluation.
- Early December: Final results published, and competition results archived (the competition will go on if interest continues).
- During NeurIPS, organizers will provide an overview of the competition and results. Organizers will also request the best entries
(including leaderboard toppers, or promising new approaches) to present an overview for further discussion.
Organizers and Dataset Contributors
- Harsha Vardhan Simhadri, Microsoft Research India
- George Williams, GSI Technology
- Martin Aumüller, IT University of Copenhagen
- Artem Babenko, Yandex
- Dmitry Baranchuk, Yandex
- Qi Chen, Microsoft Research Asia
- Matthijs Douze, Facebook AI Research
- Lucas Hosseini, Facebook AI Research
- Ravishankar Krishnaswamy, Microsoft Research India and IIT Madras
- Gopal Srinivasa, Microsoft Research India
- Suhas Jayaram Subramanya, Carnegie Mellon University
- Jingdong Wang, Microsoft Research Asia
Organizers can be reached at firstname.lastname@example.org.
We thank Microsoft Research for help in organizing this competition, and contributing compute credits.
We thank Microsoft Bing and Turing teams for creating datasets for release.