Vector database


Hello guys, in the rapidly evolving world of AI, managing and searching through high-dimensional data efficiently is becoming a critical challenge. This is where vector databases step in as a powerful solution.

Earlier, we talked about The Complete AI and LLM Engineering Roadmap, RAG fundamentals, and 10 Must Read AI and LLM Engineering Books. In today’s article, we will discuss what a vector database is and why we need one.

For this, I have partnered with Rajendra Uppal, an Engineering Manager, Software Architect, Thought Leader, IIT Delhi alumnus, and the author behind the popular Kite newsletter, who unpacks the fundamentals of vector databases.

Rajendra shares practical insights on why vector databases matter, how they power AI applications like semantic search and recommendation systems, and what you need to know as a developer or architect to leverage them effectively.

Let’s dive into this essential building block for modern AI systems.

What is a Vector Database? (The Simple Answer)

Imagine you’re organizing a massive library, but instead of sorting books alphabetically, you organize them by how similar they are to each other.

Romance novels go near each other, science fiction clusters together, and cookbooks sit in their own section.

A vector database does something similar, but with any kind of data - text, images, audio, you name it.

Let’s Start with Vectors (Don’t Worry, It’s Simple!)

What’s a vector? Think of it as a list of numbers that describes something. Like a recipe for describing characteristics.

Toy Example: Let’s say I want to describe different fruits:

  • Apple: [5, 2, 8] (sweetness=5, sourness=2, size=8)
  • Lemon: [1, 9, 3] (sweetness=1, sourness=9, size=3)
  • Orange: [6, 4, 7] (sweetness=6, sourness=4, size=7)

See how each fruit becomes a list of numbers? That’s a vector!

Why Do We Need This?

Traditional databases are like filing cabinets - you need to know exactly what drawer to open.

If I ask a regular database “find me something like an apple,” it has no clue what “like” means.

But with vectors, I can ask: “Find me fruits similar to [5, 2, 8]” and it can calculate that oranges [6, 4, 7] are pretty close!
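To make “pretty close” concrete, here is a minimal sketch that measures the straight-line (Euclidean) distance between the fruit vectors above. The helper name `euclidean` is just an illustration, and Euclidean distance is only one of several measures a vector database might use.

```python
import math

# Fruit vectors from the toy example: [sweetness, sourness, size]
fruits = {
    "Apple": [5, 2, 8],
    "Lemon": [1, 9, 3],
    "Orange": [6, 4, 7],
}

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# "Find me fruits similar to [5, 2, 8]"
query = fruits["Apple"]
distances = {
    name: euclidean(query, vec)
    for name, vec in fruits.items()
    if name != "Apple"
}

# Smaller distance = more similar
for name, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{name}: {d:.2f}")
# Orange: 2.45
# Lemon: 9.49
```

Orange lands much closer to Apple than Lemon does, which matches the intuition.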

Real-World Example: Netflix Recommendations

When Netflix suggests movies, it’s not just looking at genres. It creates a vector for each movie based on hundreds of characteristics:

  • Movie: “The Matrix” → [0.8, 0.2, 0.9, 0.1, 0.7…] (action=0.8, romance=0.2, sci-fi=0.9, comedy=0.1, etc.)

When you like The Matrix, Netflix finds other movies with similar vectors. That’s why you get Blade Runner, not The Notebook!

How Does “Similarity” Work?

Think of it like measuring distance between points on a map. If two vectors are “close” in this mathematical space, the things they represent are similar.

Simple example:

  • Cat: [4, 1, 8, 9] (fluffy=4, barks=1, small=8, pet=9)
  • Dog: [3, 9, 7, 9] (fluffy=3, barks=9, small=7, pet=9)
  • Lion: [5, 2, 1, 2] (fluffy=5, barks=2, small=1, pet=2)

Cat and Dog are more similar to each other than either is to Lion, even though cats and lions are both felines!
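You can check this yourself. The sketch below compares the three animals using two common measures: Euclidean distance (smaller means closer) and cosine similarity (closer to 1 means the vectors point in a more similar direction). The choice of these two measures is my assumption for illustration; which one a real system uses depends on the database and the model.

```python
import math

# Animal vectors from the example: [fluffy, barks, small, pet]
animals = {
    "Cat": [4, 1, 8, 9],
    "Dog": [3, 9, 7, 9],
    "Lion": [5, 2, 1, 2],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

results = {}
for a, b in [("Cat", "Dog"), ("Cat", "Lion"), ("Dog", "Lion")]:
    distance = math.dist(animals[a], animals[b])  # Euclidean distance
    similarity = cosine_similarity(animals[a], animals[b])
    results[(a, b)] = (distance, similarity)
    print(f"{a} vs {b}: distance={distance:.2f}, cosine={similarity:.2f}")
```

With these numbers, Cat and Dog have the smallest distance (about 8.1, versus 10.0 for Cat and Lion) and the highest cosine similarity, so both measures agree with the claim.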

The Magic: How Do We Get These Vectors?

This is where it gets cool. We use AI models called embedding models (built on the same kind of neural networks behind ChatGPT) to automatically convert things into vectors:

  • Text: “I love pizza” → [0.23, -0.45, 0.78, 0.12, …]
  • Images: Photo of a sunset → [0.67, 0.34, -0.12, 0.89, …]
  • Audio: Beatles song → [0.45, -0.23, 0.56, 0.78, …]

The AI learns patterns from millions of examples and creates these numerical descriptions automatically.

Real Applications You Use Daily

Google Search: When you search “cute puppy videos,” semantic search converts your query into a vector and looks for content with similar vectors, so results can match the meaning and not just the exact words.

Spotify: Songs can be represented as vectors based on rhythm, genre, mood, etc. Recommendation features like Discover Weekly build on this kind of similarity!

ChatGPT-style assistants: When an assistant is connected to a knowledge base (the RAG setup we covered earlier), it converts your question to a vector and retrieves the most relevant documents to ground its answer.

Photo Apps: When you search “beach” in your photos, the app finds pictures with vectors similar to typical beach scenes.

Why Not Just Use Regular Databases?

Let me give you a concrete example:

Traditional Database Query: “Find all customers named John”

  • Perfect! Exact matches.

Vector Database Query: “Find customers similar to this one who might like our new product”

  • Traditional database: 🤷‍♀️ “I don’t know what ‘similar’ means”
  • Vector database: 💡 “Here are customers with similar purchase patterns, demographics, and behavior!”
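Here is a small sketch of the two kinds of query side by side. Everything in it is made up for illustration: the customer names, and the three-number vectors, which stand in for whatever purchase and behavior features a real system would use.

```python
import math

# Hypothetical customers. Each vector is an invented behavior profile
# on a 0-10 scale: [order frequency, basket size, interest in electronics]
customers = [
    {"name": "John", "vector": [4, 6, 8]},
    {"name": "Priya", "vector": [5, 6, 7]},
    {"name": "John", "vector": [1, 2, 0]},
    {"name": "Mei", "vector": [0, 9, 1]},
]

# Traditional query: exact match on a field
johns = [c for c in customers if c["name"] == "John"]
print(len(johns), "customers named John")

# Similarity query: rank everyone else by distance to a reference customer
reference = customers[0]
others = [c for c in customers if c is not reference]
ranked = sorted(
    others,
    key=lambda c: math.dist(reference["vector"], c["vector"]),
)
print("Most similar to the first John:", [c["name"] for c in ranked])
```

Notice that the two Johns match perfectly by name but behave very differently, while Priya shares no name with the reference customer and is still the closest match by behavior.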

The Technical Magic (Simplified)

  1. Convert everything to vectors using AI models
  2. Store vectors in a special database optimized for similarity search
  3. When querying: Convert your question to a vector too
  4. Find similar vectors using mathematical distance calculations
  5. Return the original data that those similar vectors represent
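The five steps above can be sketched in a few lines of Python. Two loud assumptions: `embed` here is a toy word-count function standing in for a real AI embedding model (it matches shared words, not meaning), and `TinyVectorStore` is a hypothetical in-memory class, not the API of any real vector database. Production systems typically use an approximate nearest-neighbor index instead of scanning every vector like this sketch does.

```python
import math
from collections import Counter

def embed(text, vocabulary):
    """Toy stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """Hypothetical in-memory store that does a linear scan on every search."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.items = []  # list of (vector, original_text)

    def add(self, text):
        # Steps 1 and 2: convert to a vector and store it with the original data
        self.items.append((embed(text, self.vocabulary), text))

    def search(self, query, k=2):
        # Step 3: convert the query to a vector too
        query_vector = embed(query, self.vocabulary)
        # Step 4: score every stored vector against the query
        scored = [
            (cosine_similarity(query_vector, vector), text)
            for vector, text in self.items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Step 5: return the original data behind the best matches
        return [text for _, text in scored[:k]]

documents = [
    "how to bake a chocolate cake",
    "training a puppy to sit",
    "best pizza dough recipe",
]
vocabulary = sorted({word for doc in documents for word in doc.split()})

store = TinyVectorStore(vocabulary)
for doc in documents:
    store.add(doc)

results = store.search("puppy training tips", k=1)
print(results)  # ['training a puppy to sit']
```

Swapping the toy `embed` for a real embedding model is what lets a query like “teach my dog commands” find the puppy document even though they share no words.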

Think of it like having a super-smart librarian who understands the essence of what you’re looking for, not just the exact words you use.

That’s all guys. If you liked this explanation of vector databases, don’t forget to subscribe to Rajendra’s Substack, Kite, where he shares interesting articles like this one.

All the best with your AI journey.