
📚 Book Recommendation System Using OpenAI Embeddings And Nomic Atlas Visualization



We'll build a semantic book recommendation system powered by OpenAI text embeddings and visualize the results using Nomic Atlas, a powerful tool for interactive visualization of high-dimensional data.

We'll walk through:

  • Preparing the dataset (from Kaggle)

  • Generating embeddings with the OpenAI API

  • Performing semantic search using cosine similarity

  • Visualizing book relationships using Nomic Atlas


📥 Step 1: Download the Dataset

We'll use the Goodreads Books dataset available on Kaggle:
👉 Download from Kaggle

Once downloaded, load it into a pandas DataFrame:

import pandas as pd

df_final = pd.read_csv('./goodreads_data.csv')
df_final.dropna(inplace=True)
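
To sanity-check the load, it helps to peek at the columns the later steps rely on (Book, Author, Description, Genres); the exact column names can vary slightly between Kaggle versions of the dataset, so adjust if yours differ:

# Quick look at the shape and the columns used later in the pipeline
print(df_final.shape)
print(df_final[['Book', 'Author', 'Description', 'Genres']].head())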

🧠 Step 2: Estimate Embedding Token Cost

Before generating embeddings, it's good to estimate token usage and cost:

import tiktoken

enc = tiktoken.encoding_for_model('text-embedding-ada-002')
description = list(df_final['Description'])
total_tokens = sum([len(enc.encode(item)) for item in description])
print(f'Total tokens : {total_tokens}')

cost = total_tokens * (0.0004 / 1000)
print(f'Estimated cost in dollar : {cost:.10f}')
  • This uses OpenAI's text-embedding-ada-002 which costs $0.0004 per 1K tokens.
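
At that rate, a quick back-of-the-envelope calculation (using a hypothetical 1,000,000 tokens, not the actual dataset count) looks like this:

# Hypothetical example: 1,000,000 tokens at $0.0004 per 1K tokens
example_tokens = 1_000_000
print(example_tokens * (0.0004 / 1000))   # ~$0.40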


🤖 Step 3: Generate Embeddings Using OpenAI

Set up OpenAI's new SDK interface:

import openai

client = openai.OpenAI(api_key="your-openai-api-key")

def get_embedding(text, model="text-embedding-ada-002"):
    text = str(text).replace("\n", " ")
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding
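
As a quick smoke test, you can embed a single made-up description and check the vector length; text-embedding-ada-002 returns 1536-dimensional vectors:

# Illustrative call with a made-up description
sample_vec = get_embedding("A sweeping historical novel set during the Napoleonic Wars.")
print(len(sample_vec))  # 1536 dimensions for text-embedding-ada-002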

Apply it to your DataFrame and save the results:

def get_embeddings_and_save_to_csv(embedding_cache_file):
    df_final['embedding'] = df_final['Description'].apply(lambda x: get_embedding(x))
    df_final.to_csv(embedding_cache_file, index=False)

embedding_cache_file = 'book_embedding.csv'
get_embeddings_and_save_to_csv(embedding_cache_file)
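
Since every embedding request is billed, a small guard that skips regeneration when the cache file already exists is worth adding; this is a minimal sketch around the function above:

import os

# Only call the API if the cached embeddings are not already on disk
if not os.path.exists(embedding_cache_file):
    get_embeddings_and_save_to_csv(embedding_cache_file)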

πŸ” Step 4: Load Embeddings for Recommendation

import numpy as np

df_embeddings = pd.read_csv('book_embedding.csv')

# Convert the embedding column from string back to an actual vector
df_embeddings['embedding'] = df_embeddings['embedding'].apply(eval).apply(np.array)
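
eval works here because the column holds plain Python lists, but if you prefer not to execute arbitrary strings, ast.literal_eval is a drop-in alternative (a sketch, same result):

import ast

# Safer parsing of the stored "[0.01, -0.02, ...]" strings
df_embeddings = pd.read_csv('book_embedding.csv')
df_embeddings['embedding'] = df_embeddings['embedding'].apply(ast.literal_eval).apply(np.array)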

🎯 Step 5: Recommend Books Using Cosine Similarity

Define cosine similarity and a recommendation function:

def cosine_similarity(vec1, vec2):
    # Cosine similarity: dot product divided by the product of the norms
    vec1 = np.array(vec1).flatten()
    vec2 = np.array(vec2).flatten()
    dot = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def get_recommendation_from_title(df_embeddings, title, k=5):
    if title not in df_embeddings['Book'].values:
        print(f"Book '{title}' not found.")
        return []
    # Embedding of the query book
    book_embedding = df_embeddings[df_embeddings['Book'] == title]['embedding'].iloc[0]
    # Score every book against the query book
    df_embeddings['similarity'] = df_embeddings['embedding'].apply(
        lambda emb: cosine_similarity(book_embedding, emb)
    )
    # Return the k most similar books, excluding the query itself
    recommendations = df_embeddings[df_embeddings['Book'] != title] \
        .sort_values(by='similarity', ascending=False) \
        .head(k)
    return recommendations[['Book', 'similarity']]

πŸ” Example:

get_recommendation_from_title(df_embeddings, 'War and Peace', 5)
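
Because every embedding has the same length, the per-row apply can also be replaced with a single matrix product; this vectorized variant is an optional sketch, not part of the pipeline above:

# Optional: score all books at once with one matrix multiplication
def get_recommendation_vectorized(df_embeddings, title, k=5):
    matrix = np.stack(df_embeddings['embedding'].to_numpy())   # shape: (n_books, 1536)
    query = df_embeddings.loc[df_embeddings['Book'] == title, 'embedding'].iloc[0]
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    scores = (matrix @ query) / np.where(norms == 0, 1, norms)
    result = df_embeddings.assign(similarity=scores)
    return result[result['Book'] != title].nlargest(k, 'similarity')[['Book', 'similarity']]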

🌍 Step 6: Visualize Embeddings with Nomic Atlas

Install Nomic:

pip install nomic
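
Atlas is a hosted service, so you also need to authenticate with your Nomic API key before uploading data; at the time of writing the CLI flow looks like this (check the Nomic docs for the current command):

nomic login your-nomic-api-key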

Prepare the data and embeddings for visualization:

from nomic import atlas

df_visualize = pd.read_csv('book_embedding.csv')
df_visualize['embedding'] = df_visualize['embedding'].apply(eval).apply(np.array)

data = df_visualize[['Book', 'Author', 'Genres']].to_dict(orient='records')
embeddings = np.array(df_visualize['embedding'].tolist())

Now send it to Atlas:

project = atlas.map_data(
    data=data,
    embeddings=embeddings
)
project.name = "Books"
project.save()

After a few moments, Atlas will give you a link to explore your interactive book map; similar books will cluster together!


📌 What You've Built

✔ A semantic book search that understands meaning, not just keywords
✔ A recommender system using OpenAI embeddings
✔ An interactive map of books using Nomic Atlas
✔ An end-to-end pipeline from dataset → embeddings → visualization


💡 Final Thoughts

Semantic search opens up new ways to explore data. With just a few tools:

  • OpenAI for embeddings

  • NumPy for math

  • Pandas for data handling

  • Nomic Atlas for beautiful visualization

…you can build real-world AI systems in just a few lines of code.


🔗 Resources