Fine-tune your own private Copilot

Introduction

Colab integrates poorly with the rest of our tooling. You can open a notebook in Colab from a GitHub link, but none of the other repository content is brought into the Colab runtime. This makes it cumbersome to use other materials saved in your repo, such as dataset preprocessing scripts, structured training code, and perhaps the dataset itself. People have worked around this by splitting the fine-tuning lifecycle into the following steps:

1. Create a dataset and put it in Google Drive or a Hugging Face dataset repo.
2. Write the training code in a notebook and run it in Colab, loading models from a Hugging Face model repo.
3. Save the fine-tuned model back to a Hugging Face model repo.
4. Evaluate the fine-tuned model. If the results are not good enough, go back to step 1.

This breaks one project into three pieces stored in different places:

a dataset repo,
a source code (notebook) repo,
and a model repo

There is no good way to cross-reference the individual versions of these three pieces. For example, if one fine-tuning run makes the model worse, you have to search back manually through three parallel histories, and reverting to a known-good base is harder still.

In this guide we demonstrate that one can

Version all three pieces together in one XetHub repo.
Clone only what you need for training into the Colab runtime using the lazy clone feature.

This fine-tuning example applies LoRA on top of Code Llama: the base model is quantized to int8, its weights are frozen, and only an adapter is trained. Please accept Meta's license at https://ai.meta.com/resources/models-and-libraries/llama-downloads/ before using the model. Much of the code is refactored from [1], [2], [3].
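To make the approach above concrete, here is a toy sketch in pure Python of a single linear layer with an int8-quantized, frozen base weight and a low-rank adapter on top. It is not the code from the notebook, which works on the real Code Llama weights. The names `LoraLinear`, `quantize_int8`, `rank`, and `alpha` are illustrative assumptions, and simple per-row absmax quantization stands in for whatever scheme the notebook actually uses.

```python
import random


def quantize_int8(row):
    """Absmax-quantize a row of floats to int8 codes plus one float scale."""
    scale = max(abs(x) for x in row) / 127 or 1.0  # fall back to 1.0 for an all-zero row
    return [round(x / scale) for x in row], scale


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoraLinear:
    """Toy linear layer: frozen int8 base weight plus a trainable low-rank adapter."""

    def __init__(self, weight, rank, alpha=16, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        # Frozen base: stored as int8 codes and never updated during training.
        self.base = [quantize_int8(row) for row in weight]
        # Trainable adapter: A starts small and random, B starts at zero,
        # so the adapter contributes nothing before any training step.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scaling = alpha / rank

    def forward(self, x):
        # Dequantize the base weight on the fly, then add the adapter path B(Ax).
        dequantized = [[q * scale for q in codes] for codes, scale in self.base]
        base_out = matvec(dequantized, x)
        adapter_out = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base_out, adapter_out)]

    def param_counts(self):
        """Return (frozen, trainable) parameter counts."""
        frozen = sum(len(codes) for codes, _ in self.base)
        trainable = sum(len(r) for r in self.A) + sum(len(r) for r in self.B)
        return frozen, trainable


layer = LoraLinear([[1.0, -2.0, 0.5, 0.0], [0.25, 0.5, -1.0, 2.0]], rank=1)
print(layer.forward([1.0, 1.0, 1.0, 1.0]))
print(layer.param_counts())
```

At this toy size the adapter is barely smaller than the base weight. The saving shows up at realistic sizes: for a 4096 x 4096 weight and rank 8, the adapter holds 8 x (4096 + 4096) = 65,536 trainable values against 16,777,216 frozen ones.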

How to use this repository?

This repository already contains a copy of Code Llama in Hugging Face format. Fork this repository and open fine-tune-code-llama.ipynb in Colab, then follow the instructions in the notebook to fine-tune your private Copilot and save it back to your repo.

File List			Total items: 5
Name	Last Commit	Size	Last Modified
CodeLlama-7b-hf	Migrated repo from GitHub		6 months ago
scripts	Migrated repo from GitHub		6 months ago
.gitattributes	Initial commit	79 B	6 months ago
README.md	tweaked notebook	2.5 KiB	6 months ago
fine-tune-code-llama.ipynb	tweaked notebook	50 KiB	6 months ago
