Llama 2

Accept Terms & Acceptable Use Policy

Please visit the Meta website, request access and accept the license and acceptable use policy before accessing these models.

Note: Your XetHub user account email address must match the email address you provide on this Meta website.

Llama 2 is distributed for both research and commercial use, following the license and acceptable use policy listed above.

It is hosted on XetHub as a convenience. Please refer to Meta's documentation as the authoritative reference for these models.

Why Llama 2 on XetHub?

Downloading the models is time-consuming and uses 331 GB on disk. With XetHub, you can start running these models in seconds; see the following example output from mounting and running Llama 2 on an EC2 instance.

Repo mounted in 8.6s (FAST!)

ubuntu@ip-10-0-30-1:~$ sudo xet mount xet://XetHub/Llama2/main Llama2
Mounting to "/home/ubuntu/Llama2"
Cloning into temporary directory "/tmp/.tmpf834wy"
Mounting as a background task...
Setting up mount point...
0 bytes in 2 objects mounted
Mount at "/home/ubuntu/Llama2" successful. Unmount with 'umount "/home/ubuntu/Llama2"'
Mount complete in 8.629213s

Inference in minutes - model loaded in 306s (!)

(venv-test) ubuntu@ip-10-0-30-1:~/LLama2/code$ torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir ../models/llama-2-7b-chat/ \
    --tokenizer_path ../models/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1

Loaded in 306.17 seconds

User: what is the recipe of mayonnaise?

> Assistant:  Thank you for asking! Mayonnaise is a popular condiment made from a mixture of egg yolks, oil, vinegar or lemon juice, and seasonings. Here is a basic recipe for homemade mayonnaise:

...

Using Llama 2 with XetHub

Llama 2 requires a machine with a GPU and CUDA drivers installed. Once you have met those requirements, using Llama 2 is as easy as:

xet mount xet://XetHub/Llama2/main Llama

cd Llama/code
pip3 install -r requirements.txt

torchrun --nproc_per_node 1 example_chat_completion.py \
         --ckpt_dir ../models/llama-2-7b-chat/ \
         --tokenizer_path ../models/tokenizer.model \
         --max_seq_len 512 --max_batch_size 4

# To switch Llama models, change the --ckpt_dir param to one of:
         --ckpt_dir ../models/llama-2-7b/
         --ckpt_dir ../models/llama-2-13b/
         --ckpt_dir ../models/llama-2-70b/
         --ckpt_dir ../models/llama-2-7b-chat/
         --ckpt_dir ../models/llama-2-13b-chat/
         --ckpt_dir ../models/llama-2-70b-chat/
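
When you are done, unmount the repo; the mount step prints the exact command for your mount point:

# Run from the directory containing the mount point (sudo may be needed on Linux)
umount Llama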

This uses xet mount, which is included in pyxet, with docs available at https://pyxet.readthedocs.io/en/latest/.
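
Because the mount exposes the repo as a regular directory, file contents are fetched on demand and everyday tools just work. A quick sketch, assuming the Llama mount point from above:

# Browse the mounted repo like any local directory
ls -lh Llama/models/
# Reading a file streams its contents from XetHub on first access
head Llama/models/MODEL_CARD.md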

Detailed example using an AWS EC2 GPU instance

If you do not have a CUDA GPU, you can always launch a cloud GPU instance to use Llama 2. Here are detailed steps on how to launch an EC2 instance and set it up to run Llama 2 using XetHub.

# I launched an AWS g4dn.8xlarge EC2 instance running Ubuntu 22.04 LTS (x86_64)
# NVIDIA CUDA installation steps

# For GPU driver install, I followed the steps here to download/install public NVIDIA driver: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html

# Then followed these steps: 
# https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts

# And finally followed the post-installation steps described here:
# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions

export PATH=/usr/local/cuda-12.2/bin${PATH:+:${PATH}}
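
# Before moving on, it is worth sanity-checking the driver and toolkit installs:
nvidia-smi       # should list the instance GPU (a T4 on g4dn instances)
nvcc --version   # should report the CUDA toolkit version now on PATH (12.2 here)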

# Python setup
sudo apt install python3-pip
sudo apt install python3.10-venv
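
# Quick check that the Python tooling is in place
python3 --version
pip3 --version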

—

# XetHub setup
# Install git-xet, pyxet, and configure authentication

wget https://github.com/xetdata/xet-tools/releases/download/v0.11.0/xet-linux-x86_64.deb
sudo dpkg -i xet-linux-x86_64.deb
sudo pip install pyxet

# pyxet docs here: https://pyxet.readthedocs.io/en/latest/

# Went to https://xethub.com/explore/install
# Clicked 'Create Token'
# Pasted the git xet login command into a terminal on the EC2 instance
# Verified ~/.xetconfig is now populated
git config --global user.name "<YOUR NAME>"
git config --global user.email "<YOUR EMAIL>"
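
# To double-check, print the config; it should now contain your token
cat ~/.xetconfig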


# On Linux, calling xet mount requires root permissions (it uses NFSv3)
# Once XetHub credentials are installed for your user, copy them over for root so xet mount can use the same creds
sudo cp ~/.xetconfig /root/.xetconfig
sudo git config --global user.name "<YOUR NAME>"
sudo git config --global user.email "<YOUR EMAIL>"

# You might need the nfs-common package for xet mount.
sudo apt install nfs-common

—

ubuntu@ip-10-0-30-1:~$ sudo xet mount xet://XetHub/Llama2/main Llama2
Mounting to "/home/ubuntu/Llama2"
Cloning into temporary directory "/tmp/.tmpf834wy"
Mounting as a background task...
Setting up mount point...

Mount at "/home/ubuntu/Llama2" successful. Unmount with 'umount "/home/ubuntu/Llama2"'
Mount complete in 8.629213s

cd Llama2/code

—

# Install Python requirements to run Llama 2 models

python3 -m venv ~/venv-test
. ~/venv-test/bin/activate
pip3 install -r requirements.txt
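
# Optional sanity check: confirm PyTorch can see the GPU before launching torchrun
python3 -c "import torch; print(torch.cuda.is_available())"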

—

(venv-test) ubuntu@ip-10-0-30-1:~/Llama2/code$ torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir ../models/llama-2-7b-chat/ \
    --tokenizer_path ../models/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1

Loaded in 306.17 seconds

...

Xet Layout

Models are available in the models directory, and the example code from https://github.com/facebookresearch/llama is in the code directory.

To run the models, make sure you specify the model checkpoint directory; see the sketch after the layout below.

.
├── LICENSE
├── README.md
├── Responsible-Use-Guide.pdf
├── USE_POLICY.md
├── code
│   ├── CODE_OF_CONDUCT.md
│   ├── CONTRIBUTING.md
│   ├── README.md
│   ├── download.sh
│   ├── example_chat_completion.py
│   ├── example_text_completion.py
│   ├── llama
│   ├── requirements.txt
│   └── setup.py
└── models
    ├── MODEL_CARD.md
    ├── llama-2-13b
    ├── llama-2-13b-chat
    ├── llama-2-70b
    ├── llama-2-70b-chat
    ├── llama-2-7b
    ├── llama-2-7b-chat
    ├── tokenizer.model
    └── tokenizer_checklist.chk
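
For example, to try text completion with the base 13B model (a sketch mirroring the chat example above; note that the larger checkpoints are sharded, so --nproc_per_node must match the shard count: 1 for 7B, 2 for 13B, 8 for 70B):

torchrun --nproc_per_node 2 example_text_completion.py \
         --ckpt_dir ../models/llama-2-13b/ \
         --tokenizer_path ../models/tokenizer.model \
         --max_seq_len 512 --max_batch_size 4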
