Build a Retrieval-Augmented Generation Chatbot using pgvector

Introduction
The adoption of AI-driven chatbots is accelerating because they respond quickly and save the effort of answering each question manually. Whether searching over documents or helping users with questions about a product, AI is everywhere.
But what goes into configuring these chatbots so that they understand unique content? How are they able to generate relevant answers to user queries? Retrieval-Augmented Generation is an AI framework that allows LLMs to extend their existing knowledge with unique content retrieved from external sources.
In this tutorial, you will use OpenAI's embeddings API alongside pgvector, an open-source vector similarity search extension for PostgreSQL, to create and deploy a RAG chatbot on Koyeb. It will generate relevant responses with AI while continuously updating its knowledge base in real time.
You can deploy the Retrieval-Augmented Generation chatbot as configured in this guide using the Deploy to Koyeb button below:
Note: You will need to replace the values of the environment variables in the configuration with your own REPLICATE_API_TOKEN, OPENAI_API_KEY, and POSTGRES_URL. Remember to add the ?sslmode=require parameter to the POSTGRES_URL value.
Requirements
To successfully follow this tutorial, you will need the following:
- Node.js and npm installed. The demo app in this tutorial uses version 18 of Node.js.
- Git installed.
- An OpenAI account.
- A Replicate account.
- A Koyeb account to deploy the application.
Steps
To complete this guide and deploy the Retrieval-Augmented Generation chatbot, you'll need to follow these steps:
- Generate the Replicate API token
- Generate the OpenAI API token
- Create a PostgreSQL database on Koyeb
- Create a new Remix application
- Add Tailwind CSS to the application
- Create vector embeddings of a text using OpenAI and LiteLLM
- Add seed data to the database
- Build the components of our application
- Define the Remix application routes
- Build the homepage as the chatbot interface
- Build the chat API endpoint
- Deploy the Remix app to Koyeb
Generate the Replicate API token
HTTP requests to the Replicate API require an authorization token. To generate this token, log in to your Replicate account and navigate to the API Tokens page. Enter a name for your token and click the Create token button to generate a new token. Copy and securely store this token for later use as the REPLICATE_API_TOKEN environment variable.
Locally, set and export the REPLICATE_API_TOKEN environment variable by executing the following command:
export REPLICATE_API_TOKEN="<YOUR_REPLICATE_TOKEN>"
Generate the OpenAI API token
HTTP requests to the OpenAI API require an authorization token. To generate this token, log in to your OpenAI account and navigate to the API Keys page. Enter a name for your token and click the Create new secret key button to generate a new key. Copy and securely store this token for later use as the OPENAI_API_KEY environment variable.
Locally, set and export the OPENAI_API_KEY environment variable by executing the following command:
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
Create a PostgreSQL database on Koyeb
To create a PostgreSQL database, log in to the Koyeb control panel and navigate to the Databases tab. Next, click on the Create Database Service button. Here, either accept or replace the default generated name, choose your preferred region, and confirm or customize the default role. When you are ready, click the Create Database Service button to provision your PostgreSQL database service.
Once you've created the database service, a list of your existing database services will be displayed. From there, select the newly created database service, copy the database connection string, and securely store it for later use as the POSTGRES_URL environment variable.
Create a new Remix application
To start building the application, create a new Remix project. Open your terminal and run the following command:
npx create-remix@latest rag-chatbot
npx allows us to execute npm package binaries (create-remix in our case) without having to first install them globally.
When prompted, choose:
- Yes when prompted to initialize a new git repository
- Yes when prompted to install the npm dependencies
Once the installation is done, you can move into the project directory and start the app. In another terminal window, start the development server by typing:
cd rag-chatbot
npm run dev
The app should be running on localhost:3000. Currently, it just displays a welcome page with some links to documentation. We are going to leave this running as we continue to build the app.
Note: According to one of the Remix Decisions, using .server in the filename is the only guaranteed way to exclude code from the client. You'll see later how we create the database connection and seed the data using server only modules.
Next, in your first terminal window, run the command below to install the necessary libraries and packages for building the application:
npm install pg pgvector
npm install -D dotenv tsx @types/pg
The above command installs the packages passed to the install command, with the -D flag specifying the libraries intended for development purposes only.
The libraries installed include:
- pg: A PostgreSQL client for Node.js.
- pgvector: A vector similarity search library for Node.js.
The development-specific libraries include:
- @types/pg: Type definitions for pg.
- tsx: To execute and rebuild TypeScript efficiently.
- dotenv: A library for handling environment variables.
Add Tailwind CSS to the application
For styling the app, we will be using Tailwind CSS. Install and set up Tailwind at the root of our project's directory by running:
npm install -D tailwindcss
Next, run the init command to create tailwind.config.ts:
npx tailwindcss init --ts
Next, we need to make use of Tailwind directives in our CSS file. Directives are custom Tailwind-specific at-rules that offer special functionalities for Tailwind CSS projects.
Create a tailwind.css file in the app directory, and add the snippet below in it:
/* File: app/tailwind.css */
@tailwind base;
@tailwind components;
@tailwind utilities;
Tailwind scans our HTML, JavaScript/TypeScript components, and any other template files for class names, then generates all of the corresponding CSS for those styles. We need to configure our template paths so that Tailwind can generate the CSS we need by updating the content array of tailwind.config.ts as below:
// File: tailwind.config.ts
import type { Config } from 'tailwindcss'
export default {
  content: ['./app/**/*.{ts,tsx,js,jsx}'],
  theme: {
    extend: {},
  },
  plugins: [],
} satisfies Config
Lastly, you'll import and use the compiled app/tailwind.css inside app/root.tsx. Make the following changes to the default root.tsx file to finish setting up Tailwind with your Remix app:
// File: app/root.tsx
 import { cssBundleHref } from "@remix-run/css-bundle"; 
 import stylesheet from '~/tailwind.css'
. . .
export const links: LinksFunction = () => [
  ...(cssBundleHref ? [{ rel: "stylesheet", href: cssBundleHref }] : []), 
  { rel: 'stylesheet', href: stylesheet } 
];
. . .
Create vector embeddings of a text using OpenAI and LiteLLM
OpenAI provides an embeddings API to generate vector embeddings of a text string. Among the multiple models offered is text-embedding-3-small, one of OpenAI's newest and most performant embedding models. By default, the length of the embedding vector is 1536. You'll use this model to generate embeddings for the seed data to be added to the database, and you'll use the litellm package to call OpenAI embedding models.
In your terminal window, execute the following to install LiteLLM:
npm install litellm
To create vector embeddings of a text, you'll simply use the asynchronous embedding method from litellm with text-embedding-3-small as the model name. For example, you could obtain the vector embedding with the flow described above like this:
import { embedding } from 'litellm'
// Generate embeddings of a message using OpenAI via LiteLLM
const embeddingData = await embedding({
  model: 'text-embedding-3-small',
  input: 'Rishi is enjoying using LiteLLM',
})
// Using the OpenAI output format, obtain the embedding vector stored in
// the first object of the data array
const getEmbeddingVector = embeddingData.data[0].embedding
Note: You need to make sure that the OPENAI_API_KEY exists as an environment variable. Refer to the earlier section Generate the OpenAI API token on how to generate it.
In this section, we'll implement a similar flow in our application.
Set up the database connection
The node-postgres (pg) library provides a low-level interface to interact directly with PostgreSQL databases using raw SQL queries.
To initiate the setup of a database connection, generate a .env file in the root directory of your project and include the following code, replacing the placeholder values with your own:
# Koyeb Managed Postgres Instance URL
POSTGRES_URL="<YOUR_DATABASE_CONNECTION_URL>?sslmode=require"
The addition of the sslmode=require parameter to the POSTGRES_URL value above indicates that the database connection should be established with SSL enabled.
The values added to the .env file should be kept secret and not included in Git history. By default, Remix CLI ensures that .env is added to the .gitignore file in your project.
Create the database client
Following that, establish a database client to connect to the database. To achieve this, create a postgres directory in the app directory by running the following command:
mkdir app/postgres
Inside this app/postgres directory, create a db.server.ts file with the following code:
// File: app/postgres/db.server.ts
// Load the environment variables
import 'dotenv/config'
// Load the postgres module
import pg from 'pg'
// Create a connection string to the Koyeb managed postgres instance
const connectionString: string = `${process.env.POSTGRES_URL}`
// Create an in-memory pool so that it's cached for multiple calls
export default new pg.Pool({ connectionString })
The code imports the dotenv configuration, making sure that all the environment variables in the .env file are present at runtime. Afterwards, the code imports the pg library, retrieves the database URL from the environment variables, and uses it to create a new pool instance, which is subsequently exported.
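As a quick illustration (a hypothetical snippet, not one of the tutorial's files), any server-only module can now import this pool and run parameterized queries against the database:
// Hypothetical usage of the exported pool from a server-only module
import pool from '~/postgres/db.server'
// The pool reuses connections across calls instead of opening a new one each
// time; $1-style placeholders keep the query parameterized
const { rows } = await pool.query('SELECT $1::text AS greeting', ['hello'])
console.log(rows[0].greeting) // → 'hello'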
Create the database schema
Next, create a schema.server.ts file within the app/postgres directory and add the following code to it:
// File: app/postgres/schema.server.ts
import pool from './db.server'
async function createSchema() {
  // Create the vector extension if it does not exist
  await pool.query('CREATE EXTENSION IF NOT EXISTS vector;')
  // Create the data table if it does not exist
  await pool.query(
    'CREATE TABLE IF NOT EXISTS data (id SERIAL PRIMARY KEY, metadata text, embedding vector(1536));'
  )
  console.log('Finished setting up the database.')
}
createSchema()
The code above defines how data will be stored, organized, and managed in the database. Using the pool database instance, it executes an SQL query to create the vector extension within the database if it does not already exist.
The vector extension enables PostgreSQL databases to store vector embeddings. After creating the vector extension, a subsequent SQL query creates a data table within the database. This table comprises three columns:
- An id column for storing auto-incrementing unique identifiers for each row in the table.
- A metadata column with a text data type for storing text data.
- An embedding column with a vector(1536) data type. This column will store vector data with a length of 1536 elements.
After the SQL queries execute successfully, a message confirming the database setup is printed to the console.
To execute the code added to the schema file, update the script section of your package.json file with the following code:
{
. . .
  "scripts": {
   "db:setup": "tsx app/postgres/schema.server", 
    . . .
  }
. . .
}
The db:setup script runs the code within the schema.server.ts file when executed.
Test the database setup locally
To execute it, run the following command in your terminal window:
npm run db:setup
If the command executes successfully, you will see the message Finished setting up the database. in your terminal window, marking the completion of the database connection setup.
Add seed data to the database
The RAG chatbot will operate by retrieving metadata from the database whose vector embeddings most closely match the vector embedding of the user's query. In this section, we will insert 5 facts about Rishi into the database.
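Under the hood, pgvector provides distance operators for exactly this kind of lookup: <-> computes the Euclidean (L2) distance between two vectors, so finding the closest matches is simply an ORDER BY on that distance. A minimal sketch of such a query, assuming the data table created earlier:
// Minimal nearest-neighbor sketch using pgvector's <-> (L2 distance) operator
import { toSql } from 'pgvector/pg'
import pool from '~/postgres/db.server'
// A stand-in for the 1536-element embedding of the user's query
const queryVector = toSql(Array(1536).fill(0))
const { rows } = await pool.query(
  'SELECT metadata, embedding <-> $1 AS distance FROM data ORDER BY distance LIMIT 3',
  [queryVector]
)
// rows holds the three stored facts whose embeddings sit closest to the query vector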
To get started, create an embedding.server.ts file in the app/postgres directory and include the following code within the file:
// File: app/postgres/embedding.server.ts
import { embedding } from 'litellm'
import { toSql } from 'pgvector/pg'
import pool from './db.server'
interface Row {
  metadata: string
  distance: number
}
const getErrorMessage = (error: unknown) => {
  if (error instanceof Error) return error.message
  return String(error)
}
// Utility to save embeddings in the Koyeb managed postgres instance
export const saveEmbedding = async (metadata: string, embedding: string): Promise<void> => {
  try {
    await pool.query({
      text: 'INSERT INTO data (metadata, embedding) VALUES ($1, $2)',
      values: [metadata, embedding],
    })
  } catch (e) {
    console.log(getErrorMessage(e))
  }
}
// Utility to find relevant embeddings from the Koyeb managed postgres instance
export const findRelevantEmbeddings = async (embedding: string): Promise<Row[] | undefined> => {
  try {
    const res = await pool.query(
      'SELECT metadata, embedding <-> $1 AS distance FROM data ORDER BY distance LIMIT 3',
      [embedding]
    )
    return res.rows
  } catch (e) {
    console.log(getErrorMessage(e))
  }
}
// Utility to create embedding vector using OpenAI via LiteLLM
export const generateEmbeddingQuery = async (input: string): Promise<string | undefined> => {
  try {
    // Generate embeddings of a message using OpenAI via LiteLLM
    const embeddingData = await embedding({
      input,
      model: 'text-embedding-3-small',
    })
    return toSql(embeddingData.data[0].embedding)
  } catch (e) {
    console.log(getErrorMessage(e))
  }
}
The code starts by importing various libraries:
- The pool database instance for connecting to the database
- litellm for creating embeddings via OpenAI
- pgvector for handling vector embeddings
The code further defines four utility functions, the last three of which are exported (a combined usage sketch follows this list):
- getErrorMessage: takes an error parameter, checks if it's an instance of Error, and returns its message if so; otherwise it returns the string representation of the error. This is the recommended way of handling error messages (by Kent C. Dodds).
- saveEmbedding: executes an SQL query to insert a text and its corresponding vector embedding into the data table.
- findRelevantEmbeddings: given a vector embedding, executes an SQL query to retrieve the top 3 most similar vector embeddings from the database along with their corresponding metadata.
- generateEmbeddingQuery: receives a text input and obtains a vector embedding from the OpenAI API's response. It then transforms it into an SQL vector using the toSql method from pgvector and returns it.
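Taken together, the utilities compose as in this hypothetical snippet (not one of the tutorial's files):
// Hypothetical composition of the embedding utilities
import { generateEmbeddingQuery, findRelevantEmbeddings, saveEmbedding } from '~/postgres/embedding.server'
const vector = await generateEmbeddingQuery('What is Rishi excited about?')
if (vector) {
  // Store a new fact alongside its embedding...
  await saveEmbedding('Rishi is excited about RAG chatbots.', vector)
  // ...or retrieve the three stored facts closest to the query
  const matches = await findRelevantEmbeddings(vector)
  console.log(matches) // → [{ metadata: '...', distance: 0.12 }, ...]
}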
To generate seed data for the database, create a seed.server.ts file in the app/postgres directory. Add the code below to the file:
// File: app/postgres/seed.server.ts
import { generateEmbeddingQuery, saveEmbedding } from './embedding.server'
const About = [
  'Rishi is a quick learner.',
  "Rishi is blown away by Koyeb's service.",
  'Rishi has been happy using Postgres so far.',
  'Rishi is having fun marketing www.launchfa.st.',
  'Rishi is super excited to collaborate on technical writing.',
]
async function seed() {
  await Promise.all(
    About.map(async (information: string) => {
      const embedding = await generateEmbeddingQuery(information)
      if (embedding) saveEmbedding(information, embedding)
    })
  )
  console.log('Finished seeding the database.')
}
seed()
The code above imports the generateEmbeddingQuery and saveEmbedding methods and defines an About array, which contains 5 facts related to Rishi.
For each fact in the About array, a vector embedding query is generated using the generateEmbeddingQuery function and then saved to the database using the saveEmbedding function. Errors that occur while creating and saving a vector embedding are logged on the console.
To execute the code in the seed file, update the scripts section of your package.json file with the code below:
{
. . .
  "scripts": {
  "db:seed": "tsx app/postgres/seed.server", 
    . . .
  }
. . .
}
Test the database locally
The db:seed script added above executes the seed.server.ts file. To run the script, run the code below in your terminal window:
npm run db:seed
The db:seed script added above executes the seed.server.ts file. Successfully running the command should print Finished seeding the database. in your terminal window.
In this section, you have added 5 facts about Rishi and their corresponding vector embeddings to the database.
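To confirm the seed data landed, a quick hypothetical check (not part of the app) counts the rows in the data table:
// Hypothetical check: count the seeded rows
import pool from '~/postgres/db.server'
const { rows } = await pool.query('SELECT COUNT(*) FROM data;')
// node-postgres returns bigint counts as strings
console.log(rows[0].count) // → '5' after seeding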
Build the components of our application
It is now time to create the components that'll help you quickly prototype the UI and handle the complexities of creating a chatbot application with Remix.
Using shadcn/ui components
To quickly prototype the chat interface, you'll set up shadcn/ui with Remix. From shadcn/ui, you'll be able to show toasts and use accessible, pre-built input and button elements. In your terminal window, run the command below to start setting up shadcn/ui:
npx shadcn-ui@latest init
You will be asked a few questions to configure components.json. Answer with the following:
✔ Would you like to use TypeScript (recommended)? no / **yes**
✔ Which style would you like to use? › **Default**
✔ Which color would you like to use as base color? › **Slate**
✔ Where is your global CSS file? **app/tailwind.css**
✔ Would you like to use CSS variables for colors? no / **yes**
✔ Are you using a custom tailwind prefix eg. tw-? (Leave blank if not)
✔ Where is your tailwind.config.js located? **tailwind.config.ts**
✔ Configure the import alias for components: **~/components**
✔ Configure the import alias for utils: **~/lib/utils**
✔ Are you using React Server Components? **no** / yes
✔ Write configuration to components.json. Proceed? **yes**
With the above, you've set up a CLI that allows you to easily add React components to your Remix application.
In your terminal window, run the command below to get the button, input and toast elements:
npx shadcn-ui@latest add button
npx shadcn-ui@latest add input
npx shadcn-ui@latest add toast
With the above, you should now see a ui directory inside the app/components directory containing button.tsx, input.tsx, toaster.tsx, toast.tsx, and use-toast.ts.
Open app/root.tsx and make the following changes:
import stylesheet from '~/tailwind.css'
import { Toaster } from '~/components/ui/toaster'
import type { LinksFunction } from '@remix-run/node'
import { Links, LiveReload, Meta, Outlet, Scripts, ScrollRestoration } from '@remix-run/react'
export const links: LinksFunction = () => [{ rel: 'stylesheet', href: stylesheet }]
export default function App() {
  return (
    <html lang="en">
      <head>
        <meta charSet="utf-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />
        <Meta />
        <Links />
      </head>
      <body>
        <Outlet />
        <ScrollRestoration />
        <Scripts />
        <LiveReload />
        <Toaster />
      </body>
    </html>
  )
}
In the code above, you import the Toaster component (created by shadcn/ui) and make sure that it's present on each page of your Remix application. This allows you to use the useToast hook (exported by use-toast.ts) in your React components to show toasts in your application.
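With the Toaster mounted globally, any component can trigger a toast. A minimal hypothetical example (not one of the tutorial's files):
// Hypothetical component demonstrating the useToast hook
import { useToast } from '~/components/ui/use-toast'
export function SaveButton() {
  const { toast } = useToast()
  return (
    <button type="button" onClick={() => toast({ description: 'Saved successfully.' })}>
      Save
    </button>
  )
}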
Highlight code blocks with react-syntax-highlighter
To render code blocks in the AI's responses in a visually appealing manner, we'll use the react-syntax-highlighter library. In your terminal window, install react-syntax-highlighter via the following command:
npm install react-syntax-highlighter
Next, create a code-block.tsx file inside the app/components directory:
// File: app/components/code-block.tsx
// Inspired by Chatbot-UI and modified to fit the needs of this project
// https://github.com/mckaywrigley/chatbot-ui/blob/main/components/messages/message-codeblock.tsx
import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter'
interface CodeBlockProps {
  language: string
  value: string
}
const CodeBlock = ({ language, value }: CodeBlockProps) => {
  return (
    <div className="relative w-full font-sans codeblock bg-zinc-950">
      <div className="flex items-center justify-between w-full px-6 py-2 pr-4 bg-zinc-800 text-zinc-100">
        <span className="text-xs lowercase">{language}</span>
      </div>
      <SyntaxHighlighter
        PreTag="div"
        showLineNumbers
        language={language}
        customStyle={{
          margin: 0,
          width: '100%',
          background: 'transparent',
          padding: '1.5rem 1rem',
        }}
        codeTagProps={{
          style: {
            fontSize: '0.9rem',
            fontFamily: 'var(--font-mono)',
          },
        }}
      >
        {value}
      </SyntaxHighlighter>
    </div>
  )
}
CodeBlock.displayName = 'CodeBlock'
export { CodeBlock }
The code above begins by importing the React-compatible Prism syntax highlighter component from the react-syntax-highlighter library. Afterwards, it exports a CodeBlock React component that shows the language of the code block (for example, JavaScript) above the rendered code block.
Creating a memoized React markdown component
You'll want to render the responses from the AI as quickly as possible. For this, you'll set up an endpoint with streaming enabled that'll return the response in the form of tokens. To save re-renders of your React component responsible for showing the response from AI to the user, you'll use memo from React. With memo, you are able to make your UI render faster by skipping re-renders if the props of your React component have not changed.
To render responses from the AI in an HTML-friendly manner, we'll use the react-markdown library. In your terminal window, install react-markdown via the following command:
npm install react-markdown
Next, create a mark.tsx file inside the app/components directory with the following contents:
// File: app/components/mark.tsx
import { memo } from 'react'
import ReactMarkdown from 'react-markdown'
export const MemoizedReactMarkdown = memo(
  ReactMarkdown,
  (prevProps, nextProps) =>
    prevProps.children === nextProps.children && prevProps.className === nextProps.className
)
The code above begins by importing memo from React and the Markdown renderer component from react-markdown. You're now done with optimizing re-renders in your Remix application.
Next, you'll use this component to create another component that renders code blocks, mathematical expressions, and GitHub Flavored Markdown (such as tables) beautifully. In your terminal window, execute the following command:
npm install remark-gfm remark-math
The command above installs remark plugins that detect and render mathematical expressions (remark-math) and GitHub Flavored Markdown syntax (remark-gfm) in a given text.
Create a memoized-react-markdown.tsx file inside the app/components directory with the following content:
// File: app/components/memoized-react-markdown.tsx
import clsx from 'clsx'
import remarkGfm from 'remark-gfm'
import remarkMath from 'remark-math'
import { CodeBlock } from '~/components/code-block'
import { MemoizedReactMarkdown } from '~/components/mark'
const MemoizedMD = ({ message, index }: { message: string; index: number }) => {
  return (
    <MemoizedReactMarkdown
      remarkPlugins={[remarkGfm, remarkMath]}
      components={{
        p({ children }) {
          return <p className="mb-2 last:mb-0">{children}</p>
        },
        code({ node, inline, className, children, ...props }) {
          const match = /language-(\w+)/.exec(className || '')
          if (inline) {
            return (
              <code className={className} {...props}>
                {children}
              </code>
            )
          }
          return (
            <CodeBlock
              key={Math.random()}
              language={(match && match[1]) || ''}
              value={String(children).replace(/\n$/, '')}
              {...props}
            />
          )
        },
      }}
      className={clsx(
        'prose dark:prose-invert prose-p:leading-relaxed prose-pre:p-0 mt-4 w-full break-words pt-4',
        index !== 0 && 'border-t'
      )}
    >
      {message}
    </MemoizedReactMarkdown>
  )
}
export default MemoizedMD
The code above begins by importing the packages you've just installed and the code block component we created in the previous subsection. It then uses the components prop from react-markdown, which allows you to style each HTML element in your own desired way.
Create a knowledge base component
Let's say you want to update the database (aka knowledge base) with new information so that the chatbot can learn and give out responses based on the latest data. You'll create a component that'll take in the new information as sentences separated by commas (,) and call an API that will take care of inserting them into the database.
Create a knowledge-base.tsx file inside app/components directory with the following content:
// File: app/components/knowledge-base.tsx
import { useState } from 'react'
import { Maximize2 } from 'lucide-react'
import { Form, useNavigation } from '@remix-run/react'
export default function KnowledgeBase() {
  const { state } = useNavigation()
  const [expanded, setExpanded] = useState(true)
  return (
    <Form id="rag" method="post" className="absolute top-0 border p-3 m-2 rounded right-0 flex flex-col items-start">
      <div className="cursor-pointer absolute top-1.5 right-1.5">
        <Maximize2
          size={12}
          className="fill-black"
          onClick={() => {
            setExpanded((expanded) => !expanded)
          }}
        />
      </div>
      {expanded && <span className="text-xs font-medium">Update Knowledge Base</span>}
      {expanded && (
        <textarea
          id="content"
          name="content"
          autoComplete="off"
          placeholder="Add to the existing knowledge base. Seperate sentences with comma (,)"
          className="mt-2 p-1 border border-black/25 outline-none text-xs h-[45px] w-[280px] rounded"
        />
      )}
      {expanded && (
        <button disabled={state === 'submitting'} className="mt-3 text-sm px-2 py-1 border rounded" type="submit">
          {state === 'submitting' ? <>Submitting...</> : <>Submit →</>}
        </button>
      )}
    </Form>
  )
}
The code above imports the Form component and useNavigation hook from Remix, which are used to handle form submissions via route actions. The code also informs the user supplying the chatbot with new information when their content is being processed.
Use Vercel's ai package to prototype the chat UI
To handle the complexity of managing messages between a user and the AI, and of calling the API for responses based on the conversation, you'll use the open source ai package from Vercel. In your terminal window, execute the following command to install it:
npm install ai
Define the Remix application routes
With Remix, creating a JavaScript or TypeScript file in the app/routes directory maps it to a route in your application. The name of the file created maps to the route's URL pathname (with the exception of _index.tsx, which is the index route).
To create nested paths that do not rely on the parent layout, append a trailing underscore to the first segment of the file name. For example, to serve requests to /api/something without relying on any parent layout, you would name the file api_.something.tsx: api_ is the first segment of the route, followed by .something.
The structure below is what our routes folder will look like at the end of this section:
├── _index.tsx
└── api_.chat.tsx
- _index.tsx will serve as the homepage, i.e. localhost:3000.
- api_.chat.tsx will serve responses to localhost:3000/api/chat.
| URL | Matched Routes | 
|---|---|
| / | app/routes/_index.tsx | 
| /api/chat | app/routes/api_.chat.tsx | 
Build the homepage as the chatbot interface
To get started, open the app/routes/_index.tsx file and replace the existing code with the following:
import { useChat } from 'ai/react'
import { ChevronRight } from 'lucide-react'
import { Input } from '~/components/ui/input'
import KnowledgeBase from '~/components/knowledge-base'
import MemoizedMD from '~/components/memoized-react-markdown'
export default function Index() {
  const { messages, input, handleInputChange, handleSubmit } = useChat()
  return <>
    <KnowledgeBase />
    <div className="flex flex-col items-center">
      <div className="relative flex flex-col items-start w-full max-w-lg px-5 overflow-hidden">
        <form onSubmit={handleSubmit} className="flex flex-row w-[75vw] max-w-[500px] items-center space-x-2 fixed bottom-4">
          <Input
            id="message"
            value={input}
            type="message"
            autoComplete="off"
            onChange={handleInputChange}
            placeholder="What's your next question?"
            className="border-black/25 hover:border-black placeholder:text-black/75 rounded"
          />
          <button className="size-6 flex flex-col border border-black/50 items-center justify-center absolute right-3 rounded-full hover:bg-black hover:text-white" type="submit">
            <ChevronRight size={18} />
          </button>
        </form>
        <div className="w-full flex flex-col max-h-[90vh] overflow-y-scroll">
          {messages.map((i, _) => (
            <MemoizedMD key={_} index={_} message={i.content} />
          ))}
        </div>
      </div>
    </div>
  </>
}
The code above begins by importing the useChat hook from the ai package, the markdown component that you created earlier to render each message, the Input element from shadcn/ui, and the knowledge base component. In the React component on the homepage, you deconstruct the following from the useChat hook (the message shape is sketched after this list):
- The reactive messages array, which contains the conversation between the user and AI
- The reactive input value inserted by the user into the input field
- The handleInputChange method to make sure the input value stays in sync with changes
- The handleSubmit method to call the API (/api/chat) to get a response for the user's latest message
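Each entry in the messages array follows roughly this shape (a simplified sketch; the ai package's full Message type has a few more optional fields):
// Simplified sketch of the shape of each entry in the messages array
type ChatMessage = {
  id: string // unique identifier for the message
  role: 'system' | 'user' | 'assistant' // who produced the message
  content: string // the message text, streamed in for assistant replies
}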
Now, remember that the KnowledgeBase component is a form element. To handle form submissions in Remix on the server, you'll use Remix route actions. Update the homepage code in the app/routes/_index.tsx file with the following:
import { useChat } from 'ai/react'
import { useEffect } from 'react'
import { ChevronRight } from 'lucide-react'
import { Input } from '~/components/ui/input'
import { useActionData } from '@remix-run/react'
import { useToast } from '~/components/ui/use-toast'
import KnowledgeBase from '~/components/knowledge-base'
import { ActionFunctionArgs, json } from '@remix-run/node'
import MemoizedMD from '~/components/memoized-react-markdown'
import { generateEmbeddingQuery, saveEmbedding } from '~/postgres/embedding.server'
export const action = async ({ request }: ActionFunctionArgs) => { 
  const formData = await request.formData() 
  const content = formData.get('content') as string
  if (content) { 
    const messages = content.split(',').map((i: string) => i.trim()) 
    if (messages.length > 0) { 
      await Promise.all( 
        messages.map(async (information: string) => { 
          const embedding = await generateEmbeddingQuery(information) 
          if (embedding) saveEmbedding(information, embedding) 
        }), 
      ) 
      return json({ code: 1 }) 
    } 
  } 
  return json({ code: 0 }) 
} 
export default function Index() {
  const { toast } = useToast() 
  const actionData = useActionData<typeof action>() 
  const { messages, input, handleInputChange, handleSubmit } = useChat()
  useEffect(() => { 
    if (actionData) { 
      if (actionData['code'] === 1) { 
        toast({ 
          description: 'Knowledge base updated successfully.', 
        }) 
        const formSelector = document.getElementById('rag') as HTMLFormElement
        if (formSelector) formSelector.reset() 
      } else { 
        toast({ 
          description: 'There was an error in updating the knowledge base.', 
        }) 
      } 
    } 
  }, [actionData]) 
  return (
    <>
      <KnowledgeBase />
      {/* Rest of the component as is */}
    </>
  )
}
The changes above begin by importing the following:
- The functions that create embedding queries and save them to the database
- The useActionData hook from Remix, which is responsible for managing the state of the form response
- The useToast hook from shadcn/ui, which allows you to show toasts with a function call
- The json method from Remix, which creates Response objects as defined by web standards
The changes then show a creation of an action function that's responsible for:
- Listening only to non-GET requests (for example, POST, PUT, DELETE) on the homepage
- Parsing the form data from the request
- Splitting the content on commas (,) to get an array of text when content is found inside the form data
- Creating and saving the embedding vector along with the respective text into the database
The additions also include use of the useToast and useActionData hooks. Once the form is submitted, the data returned by the action function is accessible via the useActionData hook. From the response returned, you'll be able to show toasts with suitable messages indicating whether the knowledge base update succeeded.
Build the chat API endpoint
Create a file named api_.chat.tsx in the app/routes directory to handle the POST request created by the useChat hook in our React component.
Use vector search to create relevant context from the query
Before we continue, let's briefly discuss why relevant context creation is important when building a RAG chatbot. By default, an AI API can respond using only the knowledge that it has been trained on. We want to make sure that the chatbot's knowledge base is updated with the specific content that a user will ask questions about.
To create such context in real time, you will search for relevant vector embeddings that closely match the vector embedding of the user's query. Afterwards, you can obtain the metadata associated with the relevant vectors and set the relevant context to a string containing all of that metadata together.
To do all of that, put the following code in the app/routes/api_.chat.tsx file:
import { json } from '@remix-run/node'
import type { ActionFunctionArgs } from '@remix-run/node'
import { findRelevantEmbeddings, generateEmbeddingQuery } from '~/postgres/embedding.server'
export const action = async ({ request }: ActionFunctionArgs) => {
  // Set of messages between user and chatbot
  const { messages = [] } = await request.json()
  if (messages.length < 1) return json({ message: 'No conversation found.' })
  // Get the latest question stored in the last message of the chat array
  const userMessages = messages.filter((i: { role: string }) => i.role === 'user')
  const input = userMessages[userMessages.length - 1].content
  // Generate embeddings of the latest question using OpenAI
  const embedding = await generateEmbeddingQuery(input)
  if (!embedding) return json({ message: 'Error while generating embedding vector.' })
  // Fetch the relevant set of records based on the embedding
  let similarQuestions = await findRelevantEmbeddings(embedding)
  if (!similarQuestions) {
    similarQuestions = []
    console.log({ message: 'Error while finding relevant vectors.' })
  }
  // Combine all the metadata of the relevant vectors
  const contextFromMetadata = similarQuestions.map((i) => i.metadata).join('\n')
}
Use Replicate to obtain LLAMA 2 70B chat model responses
To easily fetch model responses from the Replicate platform, we'll use the replicate SDK. In your terminal window, execute the following command:
npm install replicate
Previously, you created the relevant context for the user's query. It's now time to prompt LLAMA 2 70B, a chat model from Meta, to enhance the AI response by inserting the context as part of the system knowledge. Because we want to get the response to the user as quickly as possible, we'll enable streaming using the ReplicateStream helper exported by the ai package.
To do all of that, update the app/routes/api_.chat.tsx file with the following code:
import { json } from '@remix-run/node'
import type { ActionFunctionArgs } from '@remix-run/node'
import { ReplicateStream, StreamingTextResponse } from 'ai'
import { experimental_buildLlama2Prompt } from 'ai/prompts'
import Replicate from 'replicate'
import { findRelevantEmbeddings, generateEmbeddingQuery } from '~/postgres/embedding.server'
// Instantiate the Replicate API
const replicate = new Replicate({ 
  auth: process.env.REPLICATE_API_TOKEN, 
}) 
export const action = async ({ request }: ActionFunctionArgs) => {
  // Set of messages between user and chatbot
  const { messages = [] } = await request.json()
  if (messages.length < 1) return json({ message: 'No conversation found.' })
  // Get the latest question stored in the last message of the chat array
  const userMessages = messages.filter((i: { role: string }) => i.role === 'user')
  const input = userMessages[userMessages.length - 1].content
  // Generate embeddings of the latest question using OpenAI
  const embedding = await generateEmbeddingQuery(input)
  if (!embedding) return json({ message: 'Error while generating embedding vector.' })
  // Fetch the relevant set of records based on the embedding
  let similarQuestions = await findRelevantEmbeddings(embedding)
  if (!similarQuestions) {
    similarQuestions = []
    console.log({ message: 'Error while finding relevant vectors.' })
  }
  // Combine all the metadata of the relevant vectors
  const contextFromMetadata = similarQuestions.map((i) => i.metadata).join('\n')
  // Now use Replicate LLAMA 70B streaming to perform the autocompletion with context
  const response = await replicate.predictions.create({ 
    // You must enable streaming.
    stream: true, 
    // The model must support streaming. See https://replicate.com/docs/streaming
    model: 'meta/llama-2-70b-chat', 
    // Format the message list into the format expected by Llama 2
    // @see https://github.com/vercel/ai/blob/99cf16edf0a09405d15d3867f997c96a8da869c6/packages/core/prompts/huggingface.ts#L53C1-L78C2
    input: { 
      prompt: experimental_buildLlama2Prompt([ 
        { 
          // create a system content message to be added as
          // the llama2prompt generator will supply it as the context with the API
          role: 'system', 
          content: contextFromMetadata.substring(0, Math.min(contextFromMetadata.length, 2000)), 
        }, 
        // also, pass the whole conversation!
        ...messages, 
      ]), 
    }, 
  }) 
  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response) 
  // Respond with the stream
  return new StreamingTextResponse(stream) 
}
The changes above create an instance of Replicate using their SDK and then prompt the LLAMA 2 70B chat model using the syntax defined for the experimental_buildLlama2Prompt function of the ai package. Each item in the array passed to the prompt function contains a role key which, in our case, may be (a sketch of the resulting prompt follows this list):
- system: representing the system knowledge
- user: representing the user message
- assistant: representing the responses from the model
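For intuition, the Llama 2 chat format wraps the system message in <<SYS>> tags and each conversation turn in [INST] tags. The prompt built from our array looks roughly like this (an illustrative sketch, not the function's exact output):
// Illustrative sketch of the generated Llama 2 prompt (not exact library output)
const examplePrompt = `<s>[INST] <<SYS>>
Rishi is a quick learner.
Rishi is blown away by Koyeb's service.
<</SYS>>

What is Rishi excited about? [/INST]`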
You've successfully created a chat endpoint that uses Retrieval Augmented Generation to provide results closely tied to user input. In the upcoming section, you will proceed to deploy the application online on the Koyeb platform.
Deploy the Remix app to Koyeb
Koyeb is a developer-friendly serverless platform for deploying apps globally. No ops, servers, or infrastructure management is required, and it supports many tech stacks, including Rust, Golang, Python, PHP, Node.js, Ruby, and Docker.
With the app now complete, the final step is to deploy it online on Koyeb. Since the app uses a managed PostgreSQL service, the deployment process doesn't include a database setup.
We will use git-driven deployment to deploy on Koyeb. To do this, we need to create a new GitHub repository from the GitHub web interface or by using the GitHub CLI with the following command:
gh repo create <YOUR_GITHUB_REPOSITORY> --private
Initialize a new git repository on your machine and add a new remote pointing to your GitHub repository:
git init
git remote add origin git@github.com:<YOUR_GITHUB_USERNAME>/<YOUR_GITHUB_REPOSITORY>.git
git branch -M main
Add all the files in your project directory to the git repository and push them to GitHub:
git add .
git commit -m "Initial commit"
git push -u origin main
To deploy the code on the GitHub repository, visit the Koyeb control panel, and while on the Overview tab, click Create Web Service to start the deployment process:
- Select the GitHub deployment method.
- Choose the repository for your code from the repository drop-down menu.
- In the Environment variables section, click Add variable to include additional environment variables. Add the POSTGRES_URL, OPENAI_API_KEY, and REPLICATE_API_TOKEN environment variables. For each variable, input the variable name, select the Secret type, and in the value field, choose the Create secret option. In the form that appears, specify the secret name along with its corresponding value, and finally, click the Create button. Remember to add the ?sslmode=require parameter to the POSTGRES_URL value.
- Choose a name for your App and Service and click Deploy.
During the deployment on Koyeb, the process identifies the build and start scripts outlined in the package.json file, using them to build and launch the application. You can track the deployment progress through the displayed log output. When the deployment completes and the health checks return successfully, your application will be operational. You can visit it using Koyeb's application URL, which should look something like this:
https://<YOUR_APP_NAME>-<KOYEB_ORG_NAME>.koyeb.app/
If you would like to look at the code for the demo application, you can find it in the repository associated with this tutorial.
Conclusion
In this tutorial, you created a Retrieval-Augmented Generation chatbot using vector embeddings and the LLAMA 2 70B chat model with the Remix framework. With Koyeb's managed PostgreSQL service supporting the pgvector extension, you are able to perform vector search in the database and create context relevant to user messages in real time.
Since the application was deployed using the Git deployment option, subsequent code push to the deployed branch will automatically initiate a new build for your application. Changes to your application will become live once the deployment is successful. In the event of a failed deployment, Koyeb retains the last operational production deployment, ensuring the uninterrupted operation of your application.