Mount and load Llama 2 models and weights on XetHub in minutes.
Llama 2
Accept Terms & Acceptable Use Policy
Visit the Meta website to request access, then accept the license and acceptable use policy before accessing these models.
Note: Your XetHub user account email address must match the email you provide on the Meta website.
Llama 2 is distributed for both research and commercial use under the license and acceptable use policy listed above. It is hosted on XetHub as a convenience. Please refer to the Meta documentation for more information about these models.
Why Llama 2 on XetHub?
Downloading models is time-consuming, and the Llama 2 models take up 331 GB on disk. With XetHub, you can mount this repository in seconds and load a model within minutes for fast inference from an EC2 instance.
Repo mounted in 8.6s (FAST!)
ubuntu@ip-10-0-30-1:~$ sudo xet mount xet://XetHub/Llama2/main Llama2
Mounting to "/home/ubuntu/Llama2"
Cloning into temporary directory "/tmp/.tmpf834wy"
Mounting as a background task...
Setting up mount point...
0 bytes in 2 objects mounted
Mount at "/home/ubuntu/Llama2" successful. Unmount with 'umount "/home/ubuntu/Llama2"'
Mount complete in 8.629213s
Inference in seconds - model loaded in 306s (!)
(venv-test) ubuntu@ip-10-0-30-1:~/Llama2/code$ torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir ../models/llama-2-7b-chat/ \
--tokenizer_path ../models/tokenizer.model \
--max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 306.17 seconds
User: what is a recipe for mayonnaise?
> Assistant: Thank you for asking! Mayonnaise is a popular condiment made from a mixture of egg yolks, oil, vinegar or lemon juice, and seasonings. Here is a basic recipe for homemade mayonnaise:
...
Using Llama 2 with XetHub
Llama 2 requires a machine with a GPU and CUDA drivers installed. Once you have met those requirements, using Llama 2 is as easy as:
xet mount xet://XetHub/Llama2/main Llama
cd Llama/code
pip3 install -r requirements.txt
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir ../models/llama-2-7b-chat/ \
--tokenizer_path ../models/tokenizer.model \
--max_seq_len 512 --max_batch_size 4
# To switch between the Llama 2 models, change the --ckpt_dir param to one of:
--ckpt_dir ../models/llama-2-7b/
--ckpt_dir ../models/llama-2-13b/
--ckpt_dir ../models/llama-2-70b/
--ckpt_dir ../models/llama-2-7b-chat/
--ckpt_dir ../models/llama-2-13b-chat/
--ckpt_dir ../models/llama-2-70b-chat/
This uses xet mount, which is included in PyXet, XetHub's Python SDK.
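If you prefer to stream files without mounting, PyXet also exposes an fsspec-compatible filesystem. The snippet below is a minimal sketch assuming PyXet's XetFS interface and xet://user/repo/branch path layout as described in the PyXet docs; adjust the paths to your needs.

# Minimal sketch (assumes pyxet's fsspec-compatible XetFS interface):
# browse the repository and stream a small file without mounting it.
import pyxet

fs = pyxet.XetFS()

# List the top level of the repo at the main branch
print(fs.ls("XetHub/Llama2/main", detail=False))

# Stream the model card without downloading the full 331 GB repository
with fs.open("XetHub/Llama2/main/models/MODEL_CARD.md", "r") as f:
    print(f.read())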
Detailed example using an AWS EC2 GPU instance
If you are not using a machine with a CUDA GPU, you can always launch a cloud GPU instance to use Llama 2. Here are detailed steps on how to launch an EC2 instance and set it up to run Llama 2 using XetHub.
# I launched an AWS g4dn.8xlarge EC2 instance running Ubuntu 22.04 LTS (x86_64)
# NVIDIA CUDA installation steps
# For the GPU driver install, I followed the steps here to download/install the public NVIDIA driver: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html
# Then followed these steps:
# https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts
# And finally followed the post-installation steps described here:
# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
export PATH=/usr/local/cuda-12.2/bin${PATH:+:${PATH}}
# Python setup
sudo apt install python3-pip
sudo apt install python3.10-venv
—
# XetHub setup
# Install git-xet, PyXet, and configure authentication
wget https://github.com/xetdata/xet-tools/releases/download/v0.11.0/xet-linux-x86_64.deb
sudo dpkg -i xet-linux-x86_64.deb
sudo pip install pyxet
# PyXet docs are available at https://pyxet.readthedocs.io/en/latest/
# Went to https://xethub.com/explore/install
# Clicked 'Create Token'
# Pasted the git xet login command into a terminal on the EC2 instance
# Verified that ~/.xetconfig now contains the credentials
git config --global user.name <YOUR NAME>
git config --global user.email <YOUR EMAIL>
# On Linux, calling xet mount requires root permissions (it uses NFSv3)
# Once the XetHub credentials are installed for your user, copy them over for root so xet mount can use the same credentials
sudo cp ~/.xetconfig /root/.xetconfig
sudo git config --global user.name <YOUR NAME>
sudo git config --global user.email <YOUR EMAIL>
# You might need the nfs-common package for xet mount.
sudo apt install nfs-common
—
ubuntu@ip-10-0-30-1:~$ sudo xet mount xet://XetHub/Llama2/main Llama2
Mounting to "/home/ubuntu/Llama2"
Cloning into temporary directory "/tmp/.tmpf834wy"
Mounting as a background task...
Setting up mount point...
Mount at "/home/ubuntu/Llama2" successful. Unmount with 'umount "/home/ubuntu/Llama2"'
Mount complete in 8.629213s
cd Llama2/code
—
# Install Python requirements to run Llama 2 models
python3 -m venv ~/venv-test
source ~/venv-test/bin/activate
pip3 install -r requirements.txt
—
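Before launching torchrun, it can help to confirm that PyTorch can see the GPU and the CUDA runtime installed earlier. This is an optional sanity check and is not part of the repository:

# check_cuda.py - optional sanity check, not part of the repository:
# confirm that PyTorch can see the GPU and CUDA runtime before running torchrun.
import torch

print("torch version:", torch.__version__)
print("built with cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))  # e.g. Tesla T4 on a g4dn instance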
(venv-test) ubuntu@ip-10-0-30-1:~/Llama2/code$ torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir ../models/llama-2-7b-chat/ \
--tokenizer_path ../models/tokenizer.model \
--max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 306.17 seconds
...
Repository Structure
Models are available in the models directory, and the example code from https://github.com/facebookresearch/llama is in the code directory.
To run the models, make sure you specify the model checkpoint directory; the sketch after the tree below shows a quick way to list the available options.
.
├── LICENSE
├── README.md
├── Responsible-Use-Guide.pdf
├── USE_POLICY.md
├── code
│ ├── CODE_OF_CONDUCT.md
│ ├── CONTRIBUTING.md
│ ├── README.md
│ ├── download.sh
│ ├── example_chat_completion.py
│ ├── example_text_completion.py
│ ├── llama
│ ├── requirements.txt
│ └── setup.py
└── models
├── MODEL_CARD.md
├── llama-2-13b
├── llama-2-13b-chat
├── llama-2-70b
├── llama-2-70b-chat
├── llama-2-7b
├── llama-2-7b-chat
├── tokenizer.model
└── tokenizer_checklist.chk
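As a quick way to see which checkpoints are available once the repository is mounted, the hypothetical helper below (not part of the repository) lists the directories under models/ and prints a matching torchrun invocation for each; adjust MODELS_DIR to your mount point.

# list_checkpoints.py - hypothetical helper, not part of the repository.
# Lists the checkpoint directories under the mounted models/ folder and
# prints a matching torchrun command for each one.
from pathlib import Path

MODELS_DIR = Path.home() / "Llama2" / "models"  # adjust to your mount point

for ckpt_dir in sorted(p for p in MODELS_DIR.iterdir() if p.is_dir()):
    # Each model ships as one or more consolidated.*.pth shards;
    # --nproc_per_node should match the number of shards.
    shards = len(list(ckpt_dir.glob("consolidated.*.pth")))
    print(
        f"torchrun --nproc_per_node {max(shards, 1)} example_chat_completion.py "
        f"--ckpt_dir {ckpt_dir}/ "
        f"--tokenizer_path {MODELS_DIR / 'tokenizer.model'} "
        "--max_seq_len 512 --max_batch_size 4"
    )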