Traditional keyword search and its limitations

Imagine you're lost in the vast expanse of the internet, trying to find a nugget of information hidden somewhere among billions of websites. In the past, you would have depended on conventional keyword-based search engines to retrieve it. These engines look through a corpus of documents and return results that contain one or more keywords from your search query. This method has certain limitations, which we will review next.

Loss of semantic meaning

Suppose you enter "raining cats and dogs" in the search bar. A keyword-based search engine returns every document that contains the words "cats," "raining," "and," or "dogs," including various cat images, rain forecasts, and so forth. However, what you want is the meaning of the idiomatic expression, which refers to heavy rainfall. Although some results may be relevant, this approach generates a large number of false positives. The underlying problem is that the engine never determines the semantic relationship between the words of the query.
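The false-positive problem is easy to reproduce with a toy example. The sketch below is only an illustration, not how any real search engine is built: the function name `keyword_search` and the four sample documents are made up for this purpose, and matching is reduced to plain word overlap.

```python
def keyword_search(query, documents):
    """Return every document that shares at least one word with the query."""
    query_terms = set(query.lower().split())
    results = []
    for doc in documents:
        doc_terms = set(doc.lower().split())
        # A single shared word is enough to count as a match.
        if query_terms & doc_terms:
            results.append(doc)
    return results


# Hypothetical mini-corpus, invented for this illustration.
documents = [
    "Cute cats playing with yarn",
    "Weather forecast: raining all weekend",
    "How to train dogs to sit",
    "The idiom means very heavy rainfall",
]

hits = keyword_search("raining cats and dogs", documents)
for doc in hits:
    print(doc)
```

The first three documents match because each shares one word with the query, while the only document that actually explains the idiom shares none and is left out.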
Faster query times: When searching for similar documents in a large dataset, a traditional database system may need to scan every document and compute its similarity to the query. A vector database can perform such searches much faster because it uses Approximate Nearest Neighbour (ANN) algorithms, which index the vectors so that only a small subset of candidates has to be compared with the query, at the cost of occasionally missing the exact nearest match.
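To make the difference concrete, here is a minimal sketch that compares an exhaustive scan with one simple ANN technique, random-hyperplane hashing. It is an assumption chosen for illustration, since the text does not say which ANN algorithm a given vector database uses. The class `HyperplaneIndex`, its parameters, and the random test vectors are all hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def brute_force_nearest(query, vectors):
    """Exact search: compare the query with every stored vector."""
    return max(range(len(vectors)), key=lambda i: dot(query, vectors[i]))


class HyperplaneIndex:
    """Toy ANN index: vectors on the same side of every random
    hyperplane land in the same bucket (hypothetical helper class)."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_planes)]
        self.vectors = []
        self.buckets = {}

    def _key(self, vec):
        # One boolean per hyperplane: which side the vector falls on.
        return tuple(dot(vec, plane) >= 0 for plane in self.planes)

    def add(self, vec):
        self.buckets.setdefault(self._key(vec), []).append(len(self.vectors))
        self.vectors.append(vec)

    def candidates(self, vec):
        return self.buckets.get(self._key(vec), [])

    def query(self, vec):
        # Only vectors in the query's own bucket are compared.
        ids = self.candidates(vec)
        if not ids:
            return None
        return max(ids, key=lambda i: dot(vec, self.vectors[i]))


# Random unit vectors stand in for document embeddings.
rng = random.Random(42)
vectors = [normalize([rng.gauss(0, 1) for _ in range(16)]) for _ in range(1000)]

index = HyperplaneIndex(dim=16)
for v in vectors:
    index.add(v)

query = vectors[5]
exact = brute_force_nearest(query, vectors)   # scans all 1000 vectors
approx = index.query(query)                   # scans one bucket only
compared = len(index.candidates(query))
print(exact, approx, compared)
```

The exhaustive scan always touches all 1000 vectors, whereas the index compares the query only with the members of one bucket. The trade-off is that a true neighbour sitting in a different bucket is missed, which is why the search is called approximate.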
Lower memory usage: Traditional databases may store documents in their original text form, which can require a lot of storage space. A vector embedding, by contrast, represents each document as a fixed-length list of numbers whose size does not grow with the length of the document, so long documents in particular can be represented far more compactly. This not only saves disk space but also reduces the amount of memory required to perform similarity searches.

Better scalability: As the amount of data grows, traditional database systems may struggle to maintain fast query times and efficient memory usage. Vector databases, on the other hand, are designed to handle large amounts of vector data and can scale efficiently with increasing data volume.
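The memory argument comes down to simple arithmetic, which the following sketch works through. The embedding size of 384 dimensions and the 4-byte float format are assumptions picked for illustration, and the helper names are hypothetical. Real models and databases differ in both.

```python
import struct


def embedding_bytes(dimensions, fmt="<f"):
    """Bytes needed for one embedding stored as packed 4-byte floats."""
    return dimensions * struct.calcsize(fmt)


def text_bytes(text):
    """Bytes needed to store the raw text as UTF-8."""
    return len(text.encode("utf-8"))


# Assumed embedding size; actual models use various dimensions.
vector_size = embedding_bytes(384)

short_doc = "It is raining cats and dogs."
long_doc = short_doc * 200  # stand-in for a long article

print(vector_size, text_bytes(short_doc), text_bytes(long_doc))
```

Under these assumptions the vector occupies the same number of bytes for both inputs. It is smaller than the long document but larger than the single sentence, so the saving depends on how long the stored documents are.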