Integrating OpenAI with Qdrant and WordPress: A Comprehensive Guide to RAG Semantic Search

Introduction

Hey there! Ever wondered how cool it would be to have a super-smart chatbot that can search through your WordPress posts and give meaningful answers? Well, you’re in the right place! In this guide, we’ll walk you through the steps to create an intelligent chatbot using OpenAI and Qdrant with your WordPress content as the data source.

Imagine having a bot that can not only understand your queries but also dive deep into your WordPress posts to fetch relevant information. That’s what we’re going to build today. So, grab a cup of coffee, and let’s get started!

What We’ll Cover

  1. Listing posts from WordPress.
  2. Upserting WordPress post embeddings to Qdrant.
  3. Searching Qdrant based on the query using Qdrant’s Fastembed.
  4. Querying the OpenAI chatbot with the provided RAG context.
  5. Wrapping it all up with a neat FAQ section.

Step 1: List Posts from WordPress

First things first, we need to get our hands on the WordPress posts. We’ll use the WordPress REST API to fetch the posts. If you’re new to APIs, don’t worry. It’s just a way for different software to talk to each other.

Getting a WordPress Application Password and Storing Secrets in Google Colab

To fetch and interact with your WordPress posts programmatically, you’ll need an application password. This password allows your script to authenticate with the WordPress REST API. Once you have the application password, you can securely store your WordPress credentials in Google Colab using its secrets management feature.

Step-by-Step Guide to Getting a WordPress Application Password

  1. Log in to your WordPress Admin Dashboard: Go to your WordPress site and log in with an admin account.
  2. Navigate to the Users Section: From the dashboard, click on “Users” in the left-hand menu. This will display a list of all users registered on your site.
  3. Edit Your Profile: Find your username in the list and click “Edit” under your name. This will open your profile page.
  4. Generate a New Application Password:
    • Scroll down to the “Application Passwords” section.
    • Enter a name for your application (e.g., “Google Colab Integration”).
    • Click “Add New Application Password”.
  5. Copy the Generated Password: A new application password will be generated and displayed. Make sure to copy this password and store it securely. Once you navigate away from the page, you won’t be able to see the password again.

Storing Necessary Secrets in Google Colab

Google Colab’s Secrets feature lets you store sensitive values, such as passwords and API keys, outside your notebook code. This ensures your sensitive data is not exposed when you share the notebook.

Step-by-Step Guide to Storing Secrets in Google Colab

  1. Open Google Colab: Go to Google Colab and open a new or existing notebook.
  2. Access the Secrets Management Interface:
    • In the left sidebar of your notebook, click the key icon (“Secrets”).
    • You will see an interface to add new secrets.
  3. Add Your Secrets:
    • Click on the “Add new secret” button.
    • Add the following secrets one by one, and switch on “Notebook access” for each of them:
      • Key: WP_SITE_URL, Value: https://your-wordpress-site.com (Replace with your WordPress site URL).
      • Key: WP_USERNAME, Value: your-username (Replace with your WordPress admin username).
      • Key: WP_PASSWORD, Value: your-application-password (Replace with the application password you generated).
      • Key: QDRANT_URL and QDRANT_API_KEY: your Qdrant Cloud cluster URL and API key (used in Step 2).
      • Key: OPENAI_API_KEY: your OpenAI API key (used in Step 4).
  4. Accessing Secrets in Your Notebook:
    • Use the following code snippet to access the stored secrets in your Colab notebook:
from google.colab import userdata

# Colab secrets are read with userdata.get(); they are not exposed through os.environ
wp_site_url = userdata.get('WP_SITE_URL').rstrip('/')  # drop a trailing slash, if any
base_url = f'{wp_site_url}/wp-json/wp/v2'
wp_username = userdata.get('WP_USERNAME')
wp_password = userdata.get('WP_PASSWORD')

print("WordPress Site URL:", wp_site_url)
print("WordPress Username:", wp_username)
# Never print the password itself; just confirm that it was loaded
print("WordPress Password loaded:", bool(wp_password))
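  5. Using Secrets to Authenticate with the WordPress API:
    • Now you can use these secrets to make authenticated requests to your WordPress site. With requests, pass them as HTTP Basic credentials, for example requests.get(url, auth=(wp_username, wp_password)).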
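WordPress accepts application passwords over HTTP Basic authentication. As a quick credential check, the sketch below builds that header with Python’s standard library and calls the REST API’s /users/me endpoint, which only returns the current user when authentication succeeds. The helper names basic_auth_header and fetch_current_user are hypothetical, and base_url is assumed to be the /wp-json/wp/v2 base URL built in the secrets snippet. With requests, passing auth=(wp_username, wp_password) produces the same header.

```python
import base64
import json
import urllib.request


def basic_auth_header(username: str, app_password: str) -> dict:
    """Build the HTTP Basic Authorization header for a WordPress application password."""
    raw = f"{username}:{app_password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def fetch_current_user(base_url: str, username: str, app_password: str) -> dict:
    """Call GET {base_url}/users/me, which only succeeds for an authenticated user."""
    request = urllib.request.Request(
        f"{base_url}/users/me",
        headers=basic_auth_header(username, app_password),
    )
    # urlopen raises urllib.error.HTTPError (for example, 401) if the credentials are rejected
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# In the notebook, after running the secrets snippet (this makes a real network request):
# me = fetch_current_user(base_url, wp_username, wp_password)
# print("Authenticated as:", me["name"])
```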

Conclusion

By following these steps, you ensure that your WordPress credentials are securely stored and used in your Google Colab environment. This allows you to safely interact with your WordPress site without exposing sensitive information in your code. Secure handling of credentials is crucial for maintaining the integrity and security of your applications.
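If you also want to run the same notebook outside Colab (for example, locally with environment variables), one option is a small fallback helper. This is an assumption layered on top of the guide: get_secret is a hypothetical name, it uses userdata.get inside Colab (which raises an error if the secret is missing), and it reads os.environ everywhere else.

```python
import os
from typing import Optional


def get_secret(name: str, default: Optional[str] = None) -> Optional[str]:
    """Read a secret from Colab's userdata, or from environment variables outside Colab."""
    try:
        from google.colab import userdata  # only importable inside Google Colab
    except ImportError:
        # Not running in Colab: fall back to environment variables
        return os.environ.get(name, default)
    return userdata.get(name)


# wp_site_url = get_secret("WP_SITE_URL")
```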

Getting Started with WordPress API

To interact with your WordPress site, we’ll use Python’s requests library. If you’re not familiar with it, think of it as a way to ask the internet for information politely.

Code Snippet

import requests

posts_endpoint = f"{base_url}/posts"

# Initialize variables
per_page = 100  # the WordPress REST API returns at most 100 items per request
offset = 0
all_posts = []

while True:
    # Request posts with the current offset
    response = requests.get(
        posts_endpoint,
        params={"per_page": per_page, "offset": offset},
        timeout=30,
    )

    # Check if the request was successful
    if response.status_code != 200:
        print(f"Failed to fetch posts: {response.status_code}")
        break

    # Get the posts from the response
    posts = response.json()

    # If no more posts are returned, break the loop
    if not posts:
        break

    # Add the fetched posts to the list of all posts
    all_posts.extend(posts)

    # Increment the offset for the next request
    offset += per_page

print(f'{len(all_posts)} posts retrieved')

Explanation

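  • base_url / posts_endpoint: The REST API base URL built from your secrets, and the /posts endpoint under it.
  • per_page / offset: Pagination settings; each request fetches the next per_page posts, starting at offset.
  • response: The response from the API request.
  • posts: The JSON list of posts in the current page, which is appended to all_posts.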

Running this fetches all your published posts into all_posts and prints how many were retrieved. Simple, right?
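The snippet above only prints a count. To see the titles themselves, keep in mind that title.rendered is HTML, so characters like curly apostrophes often arrive as entities such as &#8217;. A small sketch (post_titles is a hypothetical helper) that decodes them with the standard library:

```python
import html
from typing import Dict, List


def post_titles(posts: List[Dict]) -> List[str]:
    """Return plain-text titles from WordPress REST API post objects."""
    # title.rendered is HTML, so decode entities like &#8217; and &amp;
    return [html.unescape(post["title"]["rendered"]) for post in posts]


# for title in post_titles(all_posts):
#     print(title)
```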

Step 2: Upsert WordPress Post Embeddings to Qdrant

Next, we’ll convert these posts into embeddings and store them in Qdrant. Embeddings are numerical representations of text that make it easier for computers to understand and compare.
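To make “compare” concrete: the collection created later in this step uses cosine distance (Distance.COSINE), which scores two embeddings by the cosine of the angle between them, dot(a, b) / (|a| |b|). Scores near 1 mean the vectors point the same way (similar meaning); scores near 0 mean they are unrelated. A toy sketch with made-up 3-dimensional vectors, whereas real BAAI/bge-small-en-v1.5 embeddings have 384 dimensions:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for illustration only
chart_post = [0.9, 0.1, 0.0]
chart_query = [0.8, 0.2, 0.1]
erp_post = [0.0, 0.1, 0.9]

print(round(cosine_similarity(chart_query, chart_post), 3))  # high: similar direction
print(round(cosine_similarity(chart_query, erp_post), 3))    # low: mostly unrelated
```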

Setting Up Qdrant

Qdrant is a vector search engine. Think of it as a smart database that stores embeddings and quickly finds the ones closest in meaning to a query. You can start for free with a free-tier cloud instance with 1 GB of storage.

Code Snippet

First, install the necessary libraries:

!pip install requests qdrant-client fastembed openai

Now, load the FastEmbed TextEmbedding model:

from fastembed import TextEmbedding
from typing import List

# TextEmbedding() loads the default model, BAAI/bge-small-en-v1.5 (384-dimensional vectors).
# This will trigger the model download and initialization.
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")

# Optional sanity check: embed() returns a generator of numpy arrays
# documents: List[str] = ["fastembed is supported by and maintained by Qdrant."]
# embeddings_list = list(embedding_model.embed(documents))
# len(embeddings_list[0])  # Vector of 384 dimensions

Embed the documents. This may take several minutes if you have a lot of posts.

import json
import math
from tqdm.notebook import tqdm

# Each document is the full JSON of one post
documents = [json.dumps(post) for post in all_posts]

batch_size = 10
total_batches = math.ceil(len(documents) / batch_size)  # no empty trailing batch

pbar = tqdm(total=len(documents), desc="Generating embeddings")

# Embed in batches so the progress bar can be updated as we go
embeddings = []
for i in range(total_batches):
    start_idx = i * batch_size
    end_idx = min((i + 1) * batch_size, len(documents))
    batch = documents[start_idx:end_idx]

    # embed() returns a generator of numpy arrays; extend() consumes it
    batch_embeddings = embedding_model.embed(batch, batch_size=batch_size)
    embeddings.extend(batch_embeddings)
    pbar.update(len(batch))

pbar.close()
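The loop above embeds json.dumps(post), that is, the entire REST API object, including IDs, dates, links, and other metadata that carry little meaning. Since embedding models only look at a limited number of input tokens (512 for bge-small-en-v1.5), much of that budget goes to metadata. One possible variation, not the approach used in the rest of this guide, is to embed only the title and the body text with HTML removed. The sketch uses the standard library’s html.parser; TextExtractor, html_to_text, and post_to_text are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Tags that get a space around them so words from adjacent blocks don't merge
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "h4", "h5", "h6",
              "tr", "td", "pre", "blockquote"}


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML fragment, dropping tags, scripts, and styles."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes entities such as &amp; and &#8217;
        self.parts: List[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flushes any buffered text
    # Collapse the whitespace runs left behind by removed tags
    return " ".join("".join(parser.parts).split())


def post_to_text(post: Dict) -> str:
    """Plain-text title and body of a WordPress REST API post object."""
    title = html_to_text(post["title"]["rendered"])
    body = html_to_text(post["content"]["rendered"])
    return f"{title}\n\n{body}"


# To use it, build the documents this way and re-run the embedding loop above:
# documents = [post_to_text(post) for post in all_posts]
```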

Create a DataFrame holding all posts. We will process this DataFrame later.

import pandas as pd
from tqdm.notebook import tqdm

tqdm.pandas()
df = pd.DataFrame(all_posts)

Generate PointStruct objects based on the documents and their embeddings:

import os
from qdrant_client import QdrantClient
from qdrant_client.http import models
from qdrant_client.http.models import PointStruct
from qdrant_client.http.models import Distance, VectorParams


def generate_points_from_dataframe(df: pd.DataFrame, embeddings: list) -> List[PointStruct]:
    # Convert numpy embedding arrays to plain lists of floats
    embeddings_list = [embedding.tolist() for embedding in embeddings]

    # Copy the DataFrame so the embeddings can be attached without modifying df
    temp_df = df.copy()
    temp_df["embeddings"] = embeddings_list

    # Build one PointStruct per post; the WordPress post ID is reused as the Qdrant point ID
    points = temp_df.progress_apply(
        lambda row: PointStruct(
            id=int(row["id"]),
            vector=row["embeddings"],
            payload={
                'content': row['content']['rendered'],
                'metadata': {
                    'id': int(row['id']),
                    "slug": row["slug"],
                    "title": row["title"]['rendered'],
                    'link': row['link'],
                    'excerpt': row['excerpt']['rendered'],
                }
            },
        ),
        axis=1,
    ).tolist()

    return points

points = generate_points_from_dataframe(df, embeddings)

Upsert WordPress Posts with Embeddings to Qdrant

from google.colab import userdata
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams


qdrant_client = QdrantClient(
    url=userdata.get("QDRANT_URL"), api_key=userdata.get("QDRANT_API_KEY"), timeout=6000, prefer_grpc=True
)

collection_name = "soluvas-website-posts"
# Create the collection on the first run only; size 384 matches BAAI/bge-small-en-v1.5
if not qdrant_client.collection_exists(collection_name):
    qdrant_client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

# Upsert inserts new points and overwrites points whose ID already exists
operation_info = qdrant_client.upsert(
    collection_name=collection_name, wait=True, points=points
)
print(operation_info)
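The single upsert call above sends every point in one request. For a site with many posts, a request that large may run into size or time limits, so one option is to send the points in fixed-size batches. The sketch below is hypothetical (upsert_in_batches and the batch size of 64 are illustrative choices) and relies only on the client.upsert(collection_name=..., points=..., wait=...) call already used above. Because each point’s ID is the WordPress post ID, re-running it overwrites existing points instead of creating duplicates.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def upsert_in_batches(client, collection_name: str, points: Iterable, batch_size: int = 64) -> int:
    """Upsert points batch by batch; returns the number of points sent."""
    sent = 0
    for chunk in batched(points, batch_size):
        # Same call as above, applied to one slice of the points at a time
        client.upsert(collection_name=collection_name, points=chunk, wait=True)
        sent += len(chunk)
    return sent


# upsert_in_batches(qdrant_client, collection_name, points, batch_size=64)
```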

You now have code that fetches your WordPress posts, converts them to embeddings, and stores them in Qdrant. Now your posts are ready for fast and smart searching!

Step 3: Search Qdrant Based on the Query

Now that we have our data in Qdrant, let’s see how we can search through it using a query.

Searching with Fastembed

We’ll use Fastembed again to convert our query into an embedding and then search Qdrant.

Code Snippet

# query = 'How to create chart?'
# query = 'How to create a  Figma frame in Looker Studio?'
# query = 'How to use PostgreSQL database in Looker Studio?'
# query = 'How to install ERPNext?'
# query = 'How to Install Frappe/Erpnext Development Environment using Docker for Windows?'
# query = 'How to create a new ERPNext app'
# query = 'what is the best project management tool?'
# query = 'who are you?'
# query = 'what can you do for me?'
query = 'what Flowlu is good at?'
query_embedding = list(embedding_model.embed([query]))[0].tolist()

# Query Qdrant for relevant text chunks
q1 = qdrant_client.search(
    collection_name=collection_name,
    query_vector=query_embedding,
    with_payload=True,
    limit=1,
)
# q1[0].payload.keys()
q1[0].payload['metadata']['title']

Explanation

  • query: The query you want to search for.
  • embedding_model.embed([query]): Converts the query text into an embedding.
  • qdrant_client.search: Searches Qdrant for the closest embeddings to the query.

This returns the most relevant post from your WordPress data for the query (raise limit to get more than one). How cool is that?
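With limit=1 you only get the single closest post, and Qdrant returns a hit even when nothing on your site is relevant (for example, for “who are you?”). One possible refinement, sketched here under assumptions: request a few more hits, drop those whose score is below a threshold, and merge the rest into one context string. The 0.5 threshold and the 4,000-character budget are arbitrary placeholders to tune on your own data, and build_context is a hypothetical helper; it only reads the score and payload attributes of Qdrant’s search results and the payload layout created in Step 2.

```python
from types import SimpleNamespace
from typing import List, Sequence


def build_context(hits: Sequence, min_score: float = 0.5, max_chars: int = 4000) -> str:
    """Merge sufficiently similar search hits into one context string.

    Each hit must expose .score and .payload, like Qdrant search results,
    with the payload layout created in Step 2.
    """
    sections: List[str] = []
    used = 0
    for hit in hits:
        if hit.score < min_score:
            continue  # with cosine similarity, a low score means a weak match
        remaining = max_chars - used
        if remaining <= 0:
            break  # rough size budget to keep the prompt small
        meta = hit.payload["metadata"]
        section = f"Title: {meta['title']}\nLink: {meta['link']}\n\n{hit.payload['content']}"
        section = section[:remaining]
        sections.append(section)
        used += len(section)
    return "\n\n---\n\n".join(sections)


# Offline example with stand-ins for Qdrant hits
demo_hits = [
    SimpleNamespace(score=0.82, payload={"content": "<p>Flowlu is good at managing projects.</p>",
                                         "metadata": {"title": "Flowlu Review", "link": "https://example.com/flowlu"}}),
    SimpleNamespace(score=0.31, payload={"content": "<p>Unrelated</p>",
                                         "metadata": {"title": "Other Post", "link": "https://example.com/other"}}),
]
print(build_context(demo_hits))  # only the Flowlu Review section passes the 0.5 threshold

# In the notebook, with a real search:
# hits = qdrant_client.search(collection_name=collection_name, query_vector=query_embedding, with_payload=True, limit=3)
# context_text = build_context(hits)
```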

Step 4: Query OpenAI Chatbot with the Provided RAG Context

Finally, we’ll use the search results to provide context for an OpenAI chatbot. This will allow the chatbot to answer queries more accurately by leveraging relevant content from your WordPress posts.

Setting Up OpenAI Chatbot

Now, we’ll pass the retrieved content to OpenAI’s GPT-3.5-turbo model. You can also use the GPT-4o model if you like by setting model='gpt-4o'.

Code Snippet

from google.colab import userdata
from openai import OpenAI

OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

client = OpenAI(
    # defaults to os.environ.get('OPENAI_API_KEY')
    api_key=OPENAI_API_KEY,
)
print(f'Query: {query}')

# Use the top search result from Step 3 as the RAG context
context = q1[0]

prompt = f"""Can you answer the following question based on the given context? If you can, then also provide link. If you cannot answer it, say, I don't know.

QUESTION: {query}

START OF CONTEXT

Title: {context.payload['metadata']['title']}

Slug: {context.payload['metadata']['slug']}

Link:  {context.payload['metadata']['link']}

Excerpt: {context.payload['metadata']['excerpt']}

```html
{context.payload['content']}
```

END OF CONTEXT

ANSWER:"""

completion = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[
        # {"role": "system", "content": "You are Soluvas, a helpful assistant. Answer the user's question directly. Additionally, provide a relevant link if possible."},
        {"role": "system", "content": "You are Soluvas, a helpful assistant."},
        {
            "role": "user",
            "content": prompt,
        },
    ],
)

print(completion.choices[0].message.content)

Explanation

  • context: The top search result (q1[0]); its title, slug, link, excerpt, and HTML content are inserted into the prompt.
  • prompt: Constructs a prompt for the OpenAI model with the context and the query.
  • client.chat.completions.create: Sends the prompt to OpenAI’s GPT-3.5-turbo model; the answer text is in completion.choices[0].message.content.

This grounds the chatbot’s answers in specific content from your WordPress posts, so they are more likely to be accurate and relevant than answers from the model alone.
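If you merge several hits into one context string, or simply want to keep prompt construction separate from the API call, you could factor it into a function. The sketch below is an assumption, not the code that produced the example answers: build_messages is a hypothetical helper that puts the instructions in the system message and the question plus context in the user message, returning the messages list that client.chat.completions.create accepts.

```python
from typing import Dict, List

SYSTEM_PROMPT = (
    "You are Soluvas, a helpful assistant. Answer only from the provided context "
    "and include the relevant link. If the context does not contain the answer, "
    "say: I don't know."
)


def build_messages(query: str, context_text: str) -> List[Dict[str, str]]:
    """Build chat messages for a RAG query from a question and a context string."""
    if not context_text.strip():
        # No usable context: make that explicit so the model can say it doesn't know
        context_text = "(no relevant posts found)"
    user_content = (
        f"QUESTION: {query}\n\n"
        f"START OF CONTEXT\n{context_text}\nEND OF CONTEXT\n\n"
        "ANSWER:"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]


# completion = client.chat.completions.create(
#     model="gpt-3.5-turbo", messages=build_messages(query, context_text)
# )
# print(completion.choices[0].message.content)
```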

Example Queries and Answers

Query: How to create a chart?
To create a chart using Google Sheets and Looker Studio, you can follow these steps:

  1. Prepare your data in a Google Sheets spreadsheet in a table format.
  2. Go to Looker Studio and select “Blank Report.”
  3. Add your Google Sheets file as a data source, ensuring to use the first row as headers.
  4. Add the data to the report and create a new chart representing your data.
  5. To create a Stacked Bar Chart:
    • Click on “Add a chart” in the toolbar.
    • Choose Bar > Stacked column chart.
    • Set the Dimension to “Date” and configure other settings as desired.

For a detailed walkthrough with visuals, you can visit the link provided: How to Use Google Sheets and Looker Studio to Track Your Website or Business KPIs

Query: what Flowlu is good at?
Flowlu is good at managing projects, tracking workloads, prioritizing tasks, accessing a CRM for handling sales funnels, creating invoices, automating billing, tracking revenue, and optimizing expenses. It is best for small- to medium-sized businesses in creative, consulting, and IT industries looking to centralize their processes.

For more information, you can visit the following link: Flowlu Review: Comprehensive Project and Customer Management Platform

Query: How to install ERPNext?
To install ERPNext using Docker on Windows, you can follow the detailed instructions provided in the tutorial at the following link: How to Install Frappe/ERPNext Development Environment using Docker for Windows.

Conclusion

And there you have it! We’ve built a powerful, context-aware chatbot that can search through WordPress posts using Qdrant and answer queries intelligently with OpenAI. This setup can be incredibly useful for creating advanced support systems, content recommendation engines, or any application where understanding and retrieving specific information from a large corpus of text is essential.

Remember, the key steps are:

  1. Fetching posts from WordPress.
  2. Storing embeddings in Qdrant.
  3. Searching Qdrant with a query.
  4. Using the search results as context for the OpenAI chatbot.

Feel free to customize and expand upon this setup to fit your specific needs. Happy coding!

FAQ

Q: What is Qdrant?

A: Qdrant is a vector search engine that helps store and search through text embeddings efficiently. It’s great for applications that require semantic search capabilities.

Q: What are embeddings?

A: Embeddings are numerical representations of text. They capture the semantic meaning of the text and make it easier for machines to compare and understand different pieces of text.

Q: Why use Fastembed with Qdrant?

A: Fastembed provides a quick and efficient way to generate embeddings, which can then be stored and searched using Qdrant. This combination is powerful for creating intelligent search systems.

Q: How does OpenAI’s GPT-3.5-turbo / GPT-4o fit into this?

A: OpenAI’s GPT-3.5-turbo / GPT-4o is used to process the context retrieved from Qdrant and provide intelligent, context-aware responses. It enhances the chatbot’s ability to understand and answer queries accurately.

Q: Can I use other models for generating embeddings?

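A: Yes, you can use other embedding models, such as OpenAI’s text-embedding-ada-002 or text-embedding-3-small. Just make sure the Qdrant collection’s vector size matches the model’s output dimension (for example, 1536 for text-embedding-ada-002 instead of the 384 used here), and embed your posts and your queries with the same model.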

Q: Is this setup scalable?

A: Absolutely! Both Qdrant and OpenAI are designed to handle large datasets and high traffic. You can scale your setup as needed to accommodate more data.


Disclosure: We may get a small commission if you buy certain products linked in this article. However, our opinions are our own and we only promote the products and services that we trust.