RAG

Retrieval-Augmented Generation (RAG) is a technique that combines the strengths of retrieval-based and generative models to improve the quality of generated responses. This guide shows how to implement RAG with FreeToken.

How RAG Works

RAG takes your curated knowledge base and uses it to enhance the responses generated by a language model. The process involves two main steps:

Embedding: Convert your knowledge base into embeddings that can be efficiently searched. Embeddings are essentially numerical representations of your documents that capture their semantic meaning. This allows the model to judge how relevant each document is to a query.
Retrieval: When a query is made, retrieve the most relevant documents from the knowledge base using the embeddings.

The retrieved documents are fed back into the language model to generate a response that is informed by the specific context of the query.
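
The two steps above can be sketched in a few lines of Swift. This is a toy illustration of the general idea and not FreeToken's implementation: the three-dimensional vectors are hand-made stand-ins for real embeddings, and EmbeddedDocument, cosineSimilarity, and retrieve are hypothetical names introduced only for this example.

```swift
import Foundation

// A toy document paired with a hand-made embedding vector (an assumption for
// illustration; real embeddings come from a model and are much longer).
struct EmbeddedDocument {
    let text: String
    let embedding: [Double]
}

// Cosine similarity: close to 1.0 when two vectors point in the same direction.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "Vectors must have the same length")
    let dot = zip(a, b).reduce(0.0) { $0 + $1.0 * $1.1 }
    let normA = sqrt(a.reduce(0.0) { $0 + $1 * $1 })
    let normB = sqrt(b.reduce(0.0) { $0 + $1 * $1 })
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA * normB)
}

// Retrieval step: rank documents by similarity to the query and keep the top k.
func retrieve(queryEmbedding: [Double], from documents: [EmbeddedDocument], topK: Int) -> [EmbeddedDocument] {
    let ranked = documents.sorted {
        cosineSimilarity($0.embedding, queryEmbedding) > cosineSimilarity($1.embedding, queryEmbedding)
    }
    return Array(ranked.prefix(topK))
}

let knowledgeBase = [
    EmbeddedDocument(text: "How to reset your password", embedding: [0.9, 0.1, 0.0]),
    EmbeddedDocument(text: "Shipping times and costs", embedding: [0.0, 0.2, 0.9]),
    EmbeddedDocument(text: "Changing your account email", embedding: [0.8, 0.3, 0.1]),
]

// Pretend this vector is the embedding of the query "I forgot my login".
let queryEmbedding = [1.0, 0.2, 0.0]
let retrieved = retrieve(queryEmbedding: queryEmbedding, from: knowledgeBase, topK: 2)

// The retrieved text is what would be handed to the language model as context.
let context = retrieved.map { $0.text }.joined(separator: "\n")
print(context)
```

With these toy vectors, the two account-related documents rank above the shipping document, so only they would be passed to the model.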

Implementing RAG with FreeToken

Private Document Store vs Documents

There are two ways to store documents in FreeToken for use with RAG:

Private Document Store: This is a dedicated vector storage solution for a specific group of documents. It is private in that once the store is created, it can only be accessed by its UUID via the API. This is useful for storing sensitive information for a specific application or user. When encryption is enabled, all documents in a Private Document Store are also encrypted with the user's encryption key (not the public key), ensuring that only the user can access them.
Documents: This is general storage that is meant to be shared across all users of your App. Documents are stored in a single vector store and can be accessed by all users of your App. This is useful for storing public information that is relevant to all users of that App.

Example use cases:

Private Document Store: A user of your App might upload their personal documents, such as manuals, reports, personal notes, etc. You create the Private Document Store and upload the documents to it. Then when the user asks a question, the documents are retrieved and used to generate a response.
Documents: You might have a set of documents that are relevant to all users of your App, such as FAQs, help guides, product documentation, or marketing information. You upload these documents to the Documents storage, and then during any conversation with that App, the documents are available to be retrieved and used to generate responses. These documents are not private and should be considered public to all users of the App.

Preparing Documents

The first step is preparing your documents. Presently, the FreeToken API does not support documents that contain tables or images. You will need to convert this information into readable text for best results. We suggest using another LLM to convert this information into paragraph format. We hope to support these formats soon.
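
For simple tables, a deterministic conversion may be enough before reaching for an LLM. The sketch below is one possible approach and not part of the FreeToken SDK: describeTable is a hypothetical helper that pairs each cell with its column header and emits one sentence per row, producing plain text that can be uploaded as document content.

```swift
// Hypothetical helper (not a FreeToken API): turns a simple table into one
// sentence per row by pairing each cell with its column header.
func describeTable(headers: [String], rows: [[String]]) -> String {
    let sentences = rows.map { row -> String in
        // zip stops at the shorter of the two arrays, so extra cells are ignored.
        let pairs = zip(headers, row).map { "\($0.0): \($0.1)" }
        return pairs.joined(separator: ", ") + "."
    }
    return sentences.joined(separator: "\n")
}

let tableText = describeTable(
    headers: ["Plan", "Price", "Seats"],
    rows: [
        ["Starter", "$10 per month", "3"],
        ["Team", "$25 per month", "10"],
    ]
)
print(tableText)
```

Tables with merged cells, nested headers, or long free-text cells are likely to need the LLM-based rewrite suggested above.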

Upload documents to a Private Document Store

await FreeToken.shared.createPrivateDocumentStore(
    name: "USER ID: 1" // This is for your reference to identify the store in the FreeToken console
) { store in
    FreeToken.shared.createDocument(
        content: "This is a test document for the private document store.",
        searchScope: "test-document",
        privateDocumentStoreID: store.id
    )
} error: { error in
    // Handle error
    print("Error creating private document store: \(error)")
}

Upload documents to Documents storage

await FreeToken.shared.createDocument(
    content: "This is a test document for the public documents storage.", searchScope: "test-document"
) { document in
    // Document created
    print("Document created with ID: \(document.id)")
} error: { error in
    // Handle error
    print("Error creating document: \(error)")
}

Enabling RAG

To enable RAG, you need to enable it at the Agent level. You can change this option on the agent settings page in the FreeToken console. Once enabled, we will inject a specialized tool call named article_lookup into the system prompt of each AI run and handle the lookups when the AI makes a call to that tool. This is automatic, and you do not need to implement any additional logic in your application to handle RAG after uploading documents and enabling RAG in the Agent settings.

For more information on how tool calling works, check out our tool calling guide.

What is the Search Scope?

The searchScope parameter acts as a filter during document search. It lets you categorize documents at upload time and retrieve only those in a matching scope. For example, if your documents cover several topics, assign each document a searchScope for its topic when you upload it, then pass that scope as documentSearchScope when running a message thread so that only documents for that topic are searched.

// Example of running a message thread and scoping the search to specific public documents
await FreeToken.shared.runMessageThread(
    id: "msg_thr_1123",
    documentSearchScope: "sales-docs"
) { responseMessage in
    // Successfully ran message thread
    print("Response: \(responseMessage.content)")
} error: { error in
    // Handle error
    print("Error running message thread: \(error)") 
}

// Example of running a message thread and scoping the search to specific private documents
// Note: You can pass in more than one private document store ID to search across multiple stores.
await FreeToken.shared.runMessageThread(
    id: "msg_thr_1123",
    documentSearchScope: "manuals",
    privateDocumentStoreIds: ["id-1", "id-2"]
) { responseMessage in
    // Successfully ran message thread
    print("Response: \(responseMessage.content)")
} error: { error in
    // Handle error
    print("Error running message thread: \(error)") 
}
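
Conceptually, a search scope behaves like a filter applied before relevance ranking. The sketch below illustrates that idea only; it makes no claim about how FreeToken implements scoping internally, and ScopedDocument and candidates are hypothetical names that are not part of the SDK.

```swift
// Hypothetical stand-in for a stored document; not a FreeToken type.
struct ScopedDocument {
    let content: String
    let searchScope: String
}

// Keep only the documents whose scope matches. In a real pipeline, relevance
// ranking would then run over this reduced set.
func candidates(in documents: [ScopedDocument], scope: String) -> [ScopedDocument] {
    return documents.filter { $0.searchScope == scope }
}

let stored = [
    ScopedDocument(content: "Q3 pricing sheet", searchScope: "sales-docs"),
    ScopedDocument(content: "Router setup manual", searchScope: "manuals"),
    ScopedDocument(content: "Discount policy", searchScope: "sales-docs"),
]

let salesOnly = candidates(in: stored, scope: "sales-docs")
print(salesOnly.map { $0.content })
```

Choosing scope names that map cleanly onto topics or product areas keeps this filter predictable.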

Document Metadata

When you upload documents, you can also provide metadata that will be associated with the document. This metadata can be used to provide additional context or information about the document to the AI model when it retrieves the document during a query.

Metadata is written in plain text format as it is only to be used by the AI model and not for filtering or searching documents. You can provide metadata when creating a document by passing the metadata parameter.

await FreeToken.shared.createDocument(
    content: "This is a test document for the public documents storage.",
    metadata: "TITLE: Test Document\nAUTHOR: John Doe\nDATE: 2023-10-01\nURL: https://example.com/test-document",
    searchScope: "test-document"
) { document in
    // Document created
    print("Document created with ID: \(document.id)")
} error: { error in
    // Handle error
    print("Error creating document: \(error)")
}
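
Because metadata is plain text, it can be convenient to build it from structured fields instead of writing the string by hand. The sketch below assumes the "KEY: value" line layout used in the example above is simply a readable convention; makeMetadata is a hypothetical helper and not part of the FreeToken SDK.

```swift
// Hypothetical helper (not a FreeToken API): joins ordered key/value pairs
// into "KEY: value" lines. An array of tuples preserves the field order.
func makeMetadata(_ fields: [(key: String, value: String)]) -> String {
    return fields
        .map { "\($0.key.uppercased()): \($0.value)" }
        .joined(separator: "\n")
}

let metadata = makeMetadata([
    (key: "title", value: "Test Document"),
    (key: "author", value: "John Doe"),
    (key: "date", value: "2023-10-01"),
    (key: "url", value: "https://example.com/test-document"),
])
print(metadata)
```

The resulting string can be passed as the metadata argument when creating a document.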

Manually Searching Documents

You can also search documents manually with the searchDocuments method, which retrieves document chunks for a query and a search scope. Document chunks are the pieces of your documents most relevant to the search query. This is useful when you want to retrieve documents without running a message thread, for example from other areas of your application.

await FreeToken.shared.searchDocuments(
    query: "test document",
    searchScope: "test-document"
) { documentSearchResults in
    // Successfully retrieved document chunks
    for chunk in documentSearchResults.documentChunks {
        print("Document Chunk: \(chunk.id), Content: \(chunk.contentChunk), Type (privateDocument, publicDocument): \(chunk.documentType), documentMetadata: \(chunk.documentMetadata)")
    }
} error: { error in
    // Handle error
    print("Error searching documents: \(error)")
}
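
Once chunks are retrieved, a common next step is to format them for display or for use elsewhere in your application. The sketch below is an assumption-laden illustration: RetrievedChunk is a hypothetical stand-in that borrows the contentChunk and documentMetadata field names from the example above, and treating documentMetadata as optional is a guess and not confirmed SDK behavior.

```swift
// Hypothetical stand-in mirroring two of the chunk fields printed above;
// this is not the SDK type, and the optional metadata is an assumption.
struct RetrievedChunk {
    let contentChunk: String
    let documentMetadata: String?
}

// Formats chunks as numbered excerpts, adding an indented metadata line when present.
func formatExcerpts(_ chunks: [RetrievedChunk]) -> String {
    let blocks = chunks.enumerated().map { entry -> String in
        var lines = ["[\(entry.offset + 1)] \(entry.element.contentChunk)"]
        if let metadata = entry.element.documentMetadata {
            lines.append("    \(metadata)")
        }
        return lines.joined(separator: "\n")
    }
    return blocks.joined(separator: "\n\n")
}

let excerpts = formatExcerpts([
    RetrievedChunk(contentChunk: "Hold reset for 10 seconds.", documentMetadata: "TITLE: Router Manual"),
    RetrievedChunk(contentChunk: "Default password is on the label.", documentMetadata: nil),
])
print(excerpts)
```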

Privacy and Encryption

If encryption is enabled, documents are automatically encrypted when uploaded to a Private Document Store or to Documents storage. This ensures that your documents are secure and only accessible by the intended users. Private Document Stores are encrypted with the user's encryption key, while Documents storage is encrypted with the App's public key that is provided to the FreeToken client.

Before a query is sent to the FreeToken API, it is automatically embedded on-device, which converts it into a vector. This keeps the query secure and avoids exposing sensitive information to FreeToken systems. This happens even on devices that do not support AI models.