Fine tune your own private Copilot

Introduction

Integrating GitHub with Colab has been annoyingly difficult. You can open a notebook in Colab from a GitHub link, but none of the rest of the repository is brought into the Colab runtime. This makes it cumbersome to use other materials saved in your repo, such as dataset preprocessing scripts, structured training code, and maybe even the dataset itself. People have compromised and resorted to workarounds to complete a fine tuning lifecycle:

1. Create a dataset and put it in Google Drive or a Hugging Face dataset repo.
2. Write some code in a notebook and run it in Colab, loading models from a Hugging Face model repo.
3. Save the fine tuned model back into a Hugging Face model repo.
4. Evaluate the fine tuned model. If it's not good enough, go back to step 1.

This breaks one project into three pieces stored in different places: a dataset repo, a source code (notebook) repo, and a model repo, with no good way to cross-reference their individual versions. For example, if one fine tuning cycle makes the model worse, you have to search manually through three parallel histories, and reverting to a known-good state is harder still.

In this guide we demonstrate that you can:

Version all three pieces together in one GitHub repo managed by the XetData GitHub app.
Clone only what you need for training into the Colab runtime using the Lazy clone feature.

This fine tuning example applies LoRA on top of Code Llama: it quantizes the base model to int8, freezes its weights, and trains only an adapter. Please accept the Code Llama license at https://ai.meta.com/resources/models-and-libraries/llama-downloads/ before using the model. Much of the code is refactored from [1], [2], [3].
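To make the idea concrete, here is a minimal pure-Python sketch of a LoRA layer sitting on top of a frozen, int8-quantized weight matrix. It is a toy illustration and not the code used in the notebook: the names `LoRALinear`, `quantize_int8`, and `lora_param_count`, the per-row absmax quantization scheme, and the matrix sizes are all assumptions made for this example.

```python
import random


def quantize_int8(row):
    """Absmax-quantize a list of floats to int8 codes plus one float scale."""
    scale = max(abs(v) for v in row) / 127 or 1.0  # avoid a zero scale for all-zero rows
    return [round(v / scale) for v in row], scale


class LoRALinear:
    """Toy linear layer: frozen int8 base weight W plus trainable low-rank B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        # Base weights are quantized once and never updated (frozen).
        self.base = [quantize_int8(row) for row in weight]
        # Adapter: A starts small and random, B starts at zero, so the
        # adapter contributes nothing until training updates B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def forward(self, x):
        # Dequantize on the fly: W x with W ~= codes * scale.
        base_out = [scale * sum(q * xi for q, xi in zip(codes, x))
                    for codes, scale in self.base]
        # Low-rank path: B (A x), scaled by alpha / rank.
        ax = [sum(a * xi for a, xi in zip(row, x)) for row in self.A]
        delta = [sum(b * h for b, h in zip(row, ax)) for row in self.B]
        return [y + self.scaling * d for y, d in zip(base_out, delta)]


def lora_param_count(d_out, d_in, rank):
    """Number of trainable values in the adapter (A is rank x d_in, B is d_out x rank)."""
    return rank * (d_in + d_out)


layer = LoRALinear([[0.5, -1.0], [2.0, 0.25]], rank=1, alpha=2)
print(layer.forward([1.0, 2.0]))         # close to the unquantized result [-1.5, 2.5]
print(lora_param_count(4096, 4096, 16))  # 131072
```

For a 4096 x 4096 projection (an illustrative size) and rank 16, the adapter holds 131,072 trainable values against 16,777,216 in the frozen matrix, which is why only the adapter needs to be trained and saved.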

How to use this repository?

This repository already contains a copy of Code Llama in Hugging Face format. You can fork this repository and open fine-tune-code-llama.ipynb in Colab. Follow the instructions in the notebook to fine tune your private Copilot and save it back to your repo!

