Mount and load Llama 2 model and weights on XetHub in minutes.

Llama 2

Accept Terms & Acceptable Use Policy

Visit the Meta website to request access, then accept the license and acceptable use policy before accessing these models.

Note: Your XetHub user account email address must match the email you provide on this Meta website.

Llama 2 is distributed for both research and commercial use under the license and acceptable use policy listed above. It is hosted on XetHub as a convenience. Please refer to Meta's documentation for more information about these models.

Why Llama 2 on XetHub?

Downloading models is time-consuming, and Llama 2 takes 331 GB on disk. With XetHub, you can mount this repository in seconds and load a model within minutes for fast inference from an EC2 instance.

Repo mounted in 8.6s (FAST!)

ubuntu@ip-10-0-30-1:~$ sudo xet mount xet://XetHub/Llama2/main Llama2
Mounting to "/home/ubuntu/Llama2"
Cloning into temporary directory "/tmp/.tmpf834wy"
Mounting as a background task...
Setting up mount point...
0 bytes in 2 objects mounted
Mount at "/home/ubuntu/Llama2" successful. Unmount with 'umount "/home/ubuntu/Llama2"'
Mount complete in 8.629213s

Inference in seconds - model loaded in 306s (!)

(venv-test) ubuntu@ip-10-0-30-1:~/Llama2/code$ torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir ../models/llama-2-7b-chat/ \
    --tokenizer_path ../models/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1

Loaded in 306.17 seconds

User: what is a recipe for mayonnaise?

> Assistant:  Thank you for asking! Mayonnaise is a popular condiment made from a mixture of egg yolks, oil, vinegar or lemon juice, and seasonings. Here is a basic recipe for homemade mayonnaise:
...

Using Llama 2 with XetHub

Llama 2 requires a machine with a GPU and CUDA drivers installed. Once you have met those requirements, using Llama 2 is as easy as:

xet mount xet://XetHub/Llama2/main Llama

cd Llama/code
pip3 install -r requirements.txt

torchrun --nproc_per_node 1 example_chat_completion.py \
         --ckpt_dir ../models/llama-2-7b-chat/ \
         --tokenizer_path ../models/tokenizer.model \
         --max_seq_len 512 --max_batch_size 4

# To switch Llama models, change the --ckpt_dir parameter to one of:
         --ckpt_dir ../models/llama-2-7b/
         --ckpt_dir ../models/llama-2-13b/
         --ckpt_dir ../models/llama-2-70b/
         --ckpt_dir ../models/llama-2-7b-chat/
         --ckpt_dir ../models/llama-2-13b-chat/
         --ckpt_dir ../models/llama-2-70b-chat/

This uses xet mount, which is included in PyXet, XetHub's Python SDK.
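
PyXet also exposes the repository as an fsspec-compatible filesystem, so you can browse or stream individual files without mounting anything. Below is a minimal sketch, assuming the XetFS interface shown in the PyXet docs; the paths follow the same user/repo/branch layout as the mount URL above:

import pyxet  # pip install pyxet

# Open XetHub as an fsspec-style filesystem.
fs = pyxet.XetFS()

# List the model directories without downloading any weights.
print(fs.ls("XetHub/Llama2/main/models", detail=False))

# Stream a small file, e.g. the model card, directly from the repo.
with fs.open("XetHub/Llama2/main/models/MODEL_CARD.md", "r") as f:
    print(f.read())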

Detailed example using an AWS EC2 GPU instance

If you are not using a CUDA GPU, you can always launch a cloud GPU instance to run Llama 2. Here are detailed steps for setting up an EC2 instance to run Llama 2 using XetHub.

# I launched an AWS g4dn.8xlarge EC2 instance running Ubuntu 22.04 LTS (x86_64)
# NVIDIA CUDA installation steps

# For the GPU driver install, I followed the steps here to download and install the public NVIDIA driver: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html

# Then followed these steps: 
# https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts

# And finally followed the post-installation steps described here: 
# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions

export PATH=/usr/local/cuda-12.2/bin${PATH:+:${PATH}}

# Python setup
sudo apt install python3-pip
sudo apt install python3.10-venv

—

# XetHub setup
# Install git-xet, PyXet, and configure authentication

wget https://github.com/xetdata/xet-tools/releases/download/v0.11.0/xet-linux-x86_64.deb
sudo dpkg -i xet-linux-x86_64.deb
sudo pip install pyxet

# PyXet docs are available at https://pyxet.readthedocs.io/en/latest/

# Went to https://xethub.com/explore/install
# Clicked 'Create Token'
# Pasted the git xet login command into the terminal on the EC2 instance
# Verified that ~/.xetconfig now contains credentials
git config --global user.name <YOUR NAME>
git config --global user.email <YOUR EMAIL>

# On Linux, calling xet mount requires root permissions (it uses NFSv3)
# Once XetHub credentials are installed for your user, copy them to root so xet mount can use the same credentials
sudo cp ~/.xetconfig /root/.xetconfig
sudo git config --global user.name <YOUR NAME>
sudo git config --global user.email <YOUR EMAIL>

# You might need the nfs-common package for xet mount.
sudo apt install nfs-common

—

ubuntu@ip-10-0-30-1:~$ sudo xet mount xet://XetHub/Llama2/main Llama2
Mounting to "/home/ubuntu/Llama2"
Cloning into temporary directory "/tmp/.tmpf834wy"
Mounting as a background task...
Setting up mount point...

Mount at "/home/ubuntu/Llama2" successful. Unmount with 'umount "/home/ubuntu/Llama2"'
Mount complete in 8.629213s

cd Llama2/code

—

# Install Python requirements to run Llama 2 models

python3 -m venv ~/venv-test
source ~/venv-test/bin/activate
pip3 install -r requirements.txt
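
# Optional sanity check: requirements.txt installs PyTorch, so you can
# confirm that CUDA is visible from Python before launching torchrun.
# (A hedged sketch; adjust if your requirements pin a different torch.)
python3 - <<'EOF'
import torch

# Expect True plus the GPU name if the NVIDIA driver install above worked.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
EOF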

—

(venv-test) ubuntu@ip-10-0-30-1:~/Llama2/code$ torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir ../models/llama-2-7b-chat/ \
    --tokenizer_path ../models/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1

Loaded in 306.17 seconds

...

Repository Structure

Models are available in the models directory, and the example code from https://github.com/facebookresearch/llama is in the code directory.

To run the models, make sure you specify the model checkpoint directory; a minimal scripting sketch follows the directory tree below.

.
├── LICENSE
├── README.md
├── Responsible-Use-Guide.pdf
├── USE_POLICY.md
├── code
│   ├── CODE_OF_CONDUCT.md
│   ├── CONTRIBUTING.md
│   ├── README.md
│   ├── download.sh
│   ├── example_chat_completion.py
│   ├── example_text_completion.py
│   ├── llama
│   ├── requirements.txt
│   └── setup.py
└── models
    ├── MODEL_CARD.md
    ├── llama-2-13b
    ├── llama-2-13b-chat
    ├── llama-2-70b
    ├── llama-2-70b-chat
    ├── llama-2-7b
    ├── llama-2-7b-chat
    ├── tokenizer.model
    └── tokenizer_checklist.chk
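
If you would rather call the models from your own script than from the bundled examples, the code directory exposes the upstream Llama API. Here is a minimal sketch mirroring example_chat_completion.py from https://github.com/facebookresearch/llama; the Llama.build parameters match the torchrun flags above, and like the examples it must be launched with torchrun:

from llama import Llama

# Build a generator from one of the checkpoint directories under models/.
generator = Llama.build(
    ckpt_dir="../models/llama-2-7b-chat/",
    tokenizer_path="../models/tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)

# One dialog per batch entry; roles follow the upstream repo's chat format.
dialogs = [[{"role": "user", "content": "what is a recipe for mayonnaise?"}]]

results = generator.chat_completion(
    dialogs,
    max_gen_len=None,
    temperature=0.6,
    top_p=0.9,
)
print(results[0]["generation"]["content"])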
