MyGPT Workshop: Build a ChatGPT For Your Own Data in One Hour

MyGPT

This repository provides a simple example of using Retrieval Augmented Generation (RAG) to answer questions about your personal documents.
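To make the idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern behind RAG. It is not the code in src/: a real pipeline would use learned embeddings and send the prompt to a language model, while this illustration ranks documents by plain word-count similarity. The function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The prompt returned by `build_prompt` is what would be passed to the language model, so the model answers from your documents rather than from memory alone.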

Make your own private fork of this repository and clone it. Then put any text files you want into the data/ directory and run:

export OPENAI_API_KEY=YOUR_OPENAI_API_KEY
python src/train.py
gradio src/app.py

See Getting Data below for instructions on exporting from Notion or Slack.

With XetHub, you can easily check in all your data and store everything in one place (code, data, embeddings).

To store everything, simply run:

git add .
git commit -a -m "adding all my data"
git push

You can now fetch and run your own personal question answering service from anywhere, simply by cloning this repository.

Requirements

We use langchain - index and openai.

pip install -r requirements.txt
export OPENAI_API_KEY=YOUR_OPENAI_API_KEY
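If the key is not exported, the failure may only surface once the first API call is made. As a hedged sketch (the helper `require_api_key` is hypothetical and not part of this repository), a script could check the environment up front:

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running src/train.py")
    return key
```

Calling `require_api_key()` at the top of a script stops early with a readable error instead of failing midway through training.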

Usage

Train

# Retrain from scratch
python src/train.py

Run the app

gradio src/app.py

Getting Data

All

  1. Download your text files; any directory structure is fine
  2. Put them into the data directory of this repository
  3. Train the app with python src/train.py
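Because any directory structure is accepted, it can help to check which files will be found before training. The sketch below walks the data directory recursively. The list of suffixes is an assumption made for illustration, since this README does not say which extensions src/train.py actually reads, and `collect_text_files` is a hypothetical helper.

```python
from pathlib import Path


def collect_text_files(data_dir="data", suffixes=(".txt", ".md")):
    """Recursively list files under data_dir whose suffix is in suffixes."""
    root = Path(data_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )
```

For example, `for path in collect_text_files(): print(path)` prints every matching file under data/, however deeply nested.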

Notion

  1. Follow the steps here: https://www.notion.so/help/export-your-content#export-as-markdown-&-csv
  2. Unzip the downloaded archive
  3. Move the unzipped folder/directory into the data directory of this repo and then train!
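Steps 2 and 3 can also be scripted. This is a minimal sketch using only the standard library; the function name `extract_export` and the archive file name in the usage note are hypothetical, and placing the export in a subfolder named after the archive is a choice made here, not a requirement of this repository.

```python
import zipfile
from pathlib import Path


def extract_export(archive_path, data_dir="data"):
    """Unzip an export archive into a subfolder of data_dir named after the archive."""
    archive = Path(archive_path)
    target = Path(data_dir) / archive.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

For example, `extract_export("notion-export.zip")` would unpack the archive into data/notion-export, after which you can run python src/train.py.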

Slack

  1. Follow the steps here: https://slack.com/help/articles/201658943-Export-your-workspace-data

Sample Data

A sample dataset is provided in the sample-data directory. Copy the gen-ai folder into the data directory to use it as a very simple corpus of documents.
