November 7th marked the official close of the first Redis Vector Similarity Search (VSS) Engineering Lab using the arXiv scholarly papers dataset. Not too long ago, Sam Partee covered vector search basics, and Tyler Hutcherson explored intelligent document search, in a series of posts dedicated to the topic. While vector similarity search has proven itself at companies like Google, Microsoft, Facebook, and Amazon, Redis is focused on bringing it to the masses.
We saw nearly 90 engineers and practitioners, forming over 20 teams from all around the world, compete for cash and GPU hardware prizes. The goal: to expand the envelope of semantic search and intelligent document processing applications. We gave them an example to start with: a live-hosted demo app and code. From that point forward, the possibilities were endless.
Before continuing, we’d like to take this moment to thank each of our core sponsors that made this possible: Redis, MLOps Community, Saturn Cloud, and NVIDIA Inception. Without the community support, platform for compute, and prizes… the whole thing would have sputtered to a halt.
Our Favorite Vector Search
While every submission was fantastic, we picked just a few to highlight, covering 3 main themes:
Browser Extensions
Contextualize scientific researcher workflows with AI-powered recommendations directly in the browser.
Web Apps
Expose scholarly article search in entirely novel ways in a web app.
Command Line Tools
Browse paper recommendations directly in the terminal.
Two teams, “Hackunamadata” and “Simpa”, built intelligent browser extensions to assist researchers while they browse scholarly articles. This work falls into the hot category of “AI-based tooling”. While the two solutions are similar, each team approached the problem from a different angle. Both aim to make the research and writing process more efficient (this was a common theme).
Simpa works as a Chrome extension for the paperswithcode.com site. Browsing papers becomes an enjoyable experience with the help of the paper recommendation view (see below). The extension provides further context by utilizing the “5Ws and 1H” (What, Why, When, Where, Who, and How). This makes the recommendations both more useful and more explainable.
The name alone is an incredible contribution from team Hackunamadata. arXiv Copilot, playing off the famous “GitHub Copilot” brand, is a Chrome extension that provides scholarly article recommendations based on the currently visible text in a Google Doc. The goal of the paper recommendations is to provide relevant sources for scientific claims or theses, in real time, as an author is writing.
The team also put together a short demo. Take a look here!
Several teams took the provided backend API and extended it to show novel ways to visually explore the data. We really LOVE these because they show off the power of vector search in very eye-popping ways.
Darwinian Paper Explorer
Team “AreYouRedis” impressed us with their creative take on a new user-interface design and concept for exploring scholarly articles.
Beyond just vector search for finding similar papers, they include a way to study the trends of user-queried paper topics over time, including future forecasts.
Building on that, they illustrate how a subject of interest evolves throughout history with an arc diagram. It highlights the founding / most influential papers and co-citation relationships between papers.
For the topic “boosting” there are generally two foundational papers: XGBoost and CatBoost. Both receive the highest number of citations going forward, as we would expect!
Another knock-out team name… “Untitled1.ipynb” contributed some significant UI enhancements to the starting demo and improved the performance by fine-tuning the sentence transformer model from HuggingFace (check out their blog for more info on this).
They also allow the user to input multiple papers simultaneously, filter by categories in different modes, and share the results with a “share sheet” button.
Team “RedisPlayerOne” assembled an intelligent application playing off the historical figure “Yves Saint Laurent”. This app provides a paper search UI with a Q&A function. As the team admitted, it will provide an answer, though… “probably not the right one, but you might be surprised”.
Command Line Tools
Team “THM” took a different approach and exposed scholarly paper recommendations without a graphical user interface (GUI). They built a multi-purpose CLI tool that lets researchers search by keyword or vector similarity, generate citations on demand, and perform Q&A. Check out this video demo!
They provided a slick architecture diagram that demonstrates how the application is assembled.
They also made a flow diagram to get a feel for using their CLI.
Lastly, they also put together an impressive blog that logs activity throughout the hackathon. This source of information is so rich and useful for dev teams as they build software. We might just adopt a similar kind of tool here at Redis!
Vector Search Takeaways
Those were some of our favorites, but take a look at all of the submissions and judge for yourself! A few thoughts to wrap up our brief review of the first Redis Vector Search Engineering Lab:
- There’s a Cambrian explosion of novel AI apps occurring, underpinned by the progress made in NLP and generative AI.
- Much of this will be based on embedding-based representations, indexing, and search of unstructured data. Low latency and deployment cost sensitivity will be key for wider adoption.
- Redis, HuggingFace, and many other open-source tools are lowering the barrier for organizations to create new AI-based tooling with Vector Similarity Search. Best part? There’s more to come and new players every day.
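To make the embedding-based search idea in the takeaways above a bit more concrete, here is a minimal Python sketch of the core operation: ranking stored vectors by cosine similarity to a query vector. Everything in it is an illustrative assumption rather than anything from the lab submissions or from Redis's API: the helper names `cosine_similarity` and `top_k`, the paper IDs, and the tiny 3-dimensional vectors are all made up, and real embeddings from a sentence transformer model have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Brute-force search: score every stored vector, return the k best."""
    scored = [(cosine_similarity(query, vec), paper_id)
              for paper_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(paper_id, score) for score, paper_id in scored[:k]]


# Hypothetical toy "embeddings"; a real app would get these from a model.
paper_index = {
    "paper-a": [0.9, 0.1, 0.0],
    "paper-b": [0.8, 0.2, 0.1],
    "paper-c": [0.0, 0.1, 0.9],
}

query_vector = [1.0, 0.0, 0.0]
results = top_k(query_vector, paper_index, k=2)
for paper_id, score in results:
    print(f"{paper_id}: {score:.3f}")
```

This sketch compares the query against every stored vector, which is fine for a handful of papers but gets slow as the collection grows. That is where the latency and cost concerns mentioned above come in, and why a dedicated vector index is worth having at scale.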
Want to learn more?
Though the Engineering Lab is behind us, there are still opportunities to learn more about vector similarity search and how to use it in your business applications! Here are 3 ways below:
Partnership announced with RelevanceAI
Hot off the press! Redis is pleased to share our newest commercial partnership with Relevance AI. The focus is on democratizing access to vector search applications and use cases. Relevance AI enables teams across an organization to process and analyze unstructured data like text and images, and it’s all powered by Redis. This is a great option for bridging the technical divide: end users get the insights they need from vector search without needing engineers to build it out.
Check out the following demos!