This repository provides a simple example of using Retrieval Augmented Generation (RAG) to answer questions about your personal documents.
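For intuition, the retrieval step behind RAG can be sketched in a few lines. This is a toy illustration, not the code in src/train.py or src/app.py: the `embed`, `cosine`, `retrieve`, and `build_prompt` names are hypothetical, and a bag-of-words count vector stands in for a real embedding model so the example runs without an API key.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Prepend the retrieved context to the question for a language model."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical two-document corpus, for illustration only
docs = [
    "The quarterly report is due on the first Monday of April.",
    "Our office wifi password is rotated every ninety days.",
]
prompt = build_prompt("When is the quarterly report due?", docs)
```

In a real setup the resulting prompt would be sent to a language model, which answers using the retrieved context instead of relying only on what it memorized during training.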
Make your own private fork of this repository, and clone it. Then all you need to do is to put any text files you want into the data/ directory, and run:
    export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
    python src/train.py
    gradio src/app.py
See below for more instructions to export from Notion or Slack.
With XetHub, you can easily check in all your data and store everything in one place (code, data, embeddings).
To store everything, simply run:
    git add .
    git commit -a -m "adding all my data"
    git push
And now you will be able to fetch and run your own personal question answering service from anywhere simply by cloning this repository.
    pip install -r requirements.txt
    export OPENAI_API_KEY=YOUR_OPENAI_API_KEY
    # Retrain from scratch
    python src/train.py
    # Run the app
    gradio src/app.py
- Download your text files; any directory structure is fine
- Put them into the data directory of this repository
- Train the app with python src/train.py
- To export from Notion, follow the steps here: https://www.notion.so/help/export-your-content#export-as-markdown-&-csv
- Unzip the downloaded archive
- Move the unzipped folder/directory into the data directory of this repo and then train!
- To export from Slack, follow the steps here: https://slack.com/help/articles/201658943-Export-your-workspace-data
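The unzip-and-move steps above can also be scripted. The following is a minimal sketch, assuming the export arrives as a standard zip archive; `extract_export` is a hypothetical helper that is not part of this repository, and the archive name in the usage comment is a placeholder.

```python
import zipfile
from pathlib import Path


def extract_export(archive_path, data_dir="data"):
    """Extract a zip export into a subfolder of data_dir named after the archive."""
    archive = Path(archive_path)
    target = Path(data_dir) / archive.stem
    # Create data/<archive-name>/ if it does not exist yet
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


# Example usage (placeholder file name):
# extract_export("my-export.zip")
```

After extracting, rerun python src/train.py so the new documents are included.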
A sample dataset is provided in the sample-data directory. To try a very simple corpus of documents, copy the gen-ai folder into the data directory and train on that.
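Copying the sample corpus can be done by hand or with a short script. This is a sketch under the assumption that it is run from the repository root; `copy_sample` is a hypothetical helper, not something the repository ships.

```python
import shutil
from pathlib import Path


def copy_sample(sample_dir="sample-data/gen-ai", data_dir="data"):
    """Copy the sample corpus folder into the data directory, keeping its name."""
    src = Path(sample_dir)
    dst = Path(data_dir) / src.name
    # dirs_exist_ok requires Python 3.8 or newer; it allows re-running the copy
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Example usage from the repository root:
# copy_sample()
```

Once the folder is in place, run python src/train.py to build the embeddings for it.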