We'll build a semantic book recommendation system powered by OpenAI text embeddings, and visualize the results using Nomic Atlas, a powerful tool for interactive visualization of high-dimensional data.
We'll walk through:
Preparing the dataset (from Kaggle)
Generating embeddings with the OpenAI API
Performing semantic search using cosine similarity
Visualizing book relationships using Nomic Atlas
We'll use the Goodreads Books dataset available on Kaggle:
Download from Kaggle
Once downloaded, load it into a pandas DataFrame:
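Here is a minimal sketch of that loading step. The file name `books.csv` is an assumption, so use whatever name your download has. Skipping malformed rows is also an assumption, included in case a few lines of the CSV have stray commas.

```python
import pandas as pd

def load_books(path="books.csv"):
    """Load the Kaggle CSV into a DataFrame (the file name is an assumption)."""
    # on_bad_lines="skip" (pandas >= 1.3) drops rows that have too many fields
    df = pd.read_csv(path, on_bad_lines="skip")
    # Remove stray whitespace around column names
    df.columns = df.columns.str.strip()
    return df
```

Calling `load_books()` and then `df.head()` is a quick way to confirm the columns look the way you expect.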
Before generating embeddings, it's good to estimate token usage and cost:
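This uses OpenAI's text-embedding-ada-002 model, which launched at $0.0004 per 1K tokens and was later reduced to $0.0001 per 1K tokens. Check OpenAI's pricing page for the current rate before you run it on the full dataset.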
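One way to sketch the estimate is to count tokens with `tiktoken` and multiply by the per-1K price. The helper name `estimate_cost` and its parameters are hypothetical, and the price is passed in by the caller rather than hard-coded, since rates change.

```python
def estimate_cost(texts, price_per_1k, encode=None):
    """Return (total_tokens, estimated_cost_usd) for an iterable of strings.

    price_per_1k is supplied by the caller; take it from the pricing page.
    """
    if encode is None:
        # Lazy import so a custom tokenizer can be passed in instead
        import tiktoken
        encode = tiktoken.encoding_for_model("text-embedding-ada-002").encode
    # Count tokens across every text, coercing non-strings to str
    total_tokens = sum(len(encode(str(t))) for t in texts)
    return total_tokens, total_tokens / 1000 * price_per_1k
```

Pass it the column you plan to embed (for example the titles) together with the current price, and you get a rough upper bound before spending anything.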
Set up OpenAI's new SDK interface:
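A minimal sketch of an embedding helper, assuming the `openai` 1.x Python package and an `OPENAI_API_KEY` environment variable. The function name `get_embedding` and the optional `client` parameter are illustrative choices, not a fixed API.

```python
def get_embedding(text, client=None, model="text-embedding-ada-002"):
    """Embed one string and return the vector as a list of floats."""
    if client is None:
        # Lazy import; OpenAI() reads the OPENAI_API_KEY environment variable
        from openai import OpenAI
        client = OpenAI()
    # Replace newlines with spaces, a common preprocessing step
    text = text.replace("\n", " ")
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding
```

When embedding many rows, create one `OpenAI()` client up front and pass it in via `client=` so it is not rebuilt on every call.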
Apply it to your DataFrame and save the results:
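This step could look roughly like the sketch below. Here `embed_fn` stands in for whatever embedding function you set up in the previous step, and the column name `title` and the output file name are assumptions.

```python
import pandas as pd

def add_embeddings(df, embed_fn, text_col="title",
                   out_path="books_with_embeddings.pkl"):
    """Embed df[text_col] row by row, store the vectors, and pickle the result."""
    # Drop rows with no text, since there is nothing to embed for them
    df = df.dropna(subset=[text_col]).copy()
    # One call to embed_fn per row; each cell holds a list of floats
    df["embedding"] = df[text_col].astype(str).apply(embed_fn)
    # Pickle keeps the lists intact; a CSV would turn them into strings
    df.to_pickle(out_path)
    return df
```

Saving the result means you only pay for the embeddings once. Later sessions can reload the file with `pd.read_pickle`.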
Define cosine similarity and a recommendation function:
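As a sketch, cosine similarity is the dot product of two vectors divided by the product of their norms, and the recommender ranks every book by that score. The function names and the `embedding` column name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(df, query_embedding, top_n=5, embedding_col="embedding"):
    """Return the top_n rows of df most similar to query_embedding."""
    # Score every stored embedding against the query vector
    scores = df[embedding_col].apply(
        lambda e: cosine_similarity(e, query_embedding)
    )
    # Attach the scores and keep the highest-scoring rows
    ranked = df.assign(similarity=scores)
    return ranked.sort_values("similarity", ascending=False).head(top_n)
```

To search by free text, embed the query string with the same model and pass the resulting vector as `query_embedding`. To recommend books similar to a known title, pass that book's stored embedding instead.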
Install the Nomic Python client (pip install nomic):
Prepare the data and embeddings for visualization:
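A rough sketch of this preparation step: stack the per-row vectors into one 2-D NumPy array and collect the fields you want to see in the map as a list of dictionaries. The column names `embedding`, `title`, and `authors` are assumptions about your DataFrame.

```python
import numpy as np

def prepare_for_atlas(df, embedding_col="embedding",
                      meta_cols=("title", "authors")):
    """Return (embeddings, metadata): a 2-D float array and a list of dicts."""
    # Stack the per-row lists into an array of shape (n_books, n_dims)
    embeddings = np.array(df[embedding_col].tolist(), dtype=float)
    # One dict per book, with values cast to str for display
    metadata = df[list(meta_cols)].astype(str).to_dict(orient="records")
    return embeddings, metadata
```

The two return values stay aligned row for row, so the i-th metadata entry describes the i-th embedding.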
Now send it to Atlas:
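A hedged sketch of the upload, assuming a recent `nomic` release where `atlas.map_data` is available (older releases used `atlas.map_embeddings` instead) and that you have already authenticated, for example with `nomic login`. The wrapper name `publish_to_atlas` and the identifier string are hypothetical.

```python
def publish_to_atlas(embeddings, metadata,
                     identifier="book-recommendations", map_fn=None):
    """Upload embeddings plus metadata and return whatever Atlas returns."""
    if map_fn is None:
        # Lazy import; assumes the nomic package is installed and logged in
        from nomic import atlas
        map_fn = atlas.map_data
    # embeddings: 2-D array, metadata: one dict per row, in the same order
    return map_fn(embeddings=embeddings, data=metadata, identifier=identifier)
```

The `map_fn` parameter exists only so the upload call can be swapped out, for instance when testing offline.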
After a few moments, Atlas will give you a link to explore your interactive book map, where similar books cluster together!
A semantic book search that understands meaning, not just keywords
A recommender system using OpenAI embeddings
An interactive map of books using Nomic Atlas
An end-to-end pipeline from dataset → embeddings → visualization
Semantic search opens up new ways to explore data. With just a few tools:
OpenAI for embeddings
NumPy for math
Pandas for data handling
Nomic Atlas for beautiful visualization
…you can build real-world AI systems in just a few lines of code.