Train LLAMA-2 on a Small GPU
Training or fully fine-tuning an LLM is hard without serious GPU resources, but parameter-efficient methods such as LoRA and QLoRA make it feasible on a single consumer GPU.
4-bit quantization via QLoRA enables efficient fine-tuning of large LLMs on consumer hardware while retaining most of the original model quality, which makes these models far more accessible for real-world applications.
QLoRA quantizes a pre-trained language model to 4 bits and freezes its parameters.
During fine-tuning, gradients are backpropagated through the frozen 4-bit quantized model into only the Low-Rank Adapter (LoRA) layers.
In practice, 4-bit quantization causes only a minimal loss in model performance.
Quantization
In this post we use 4-bit quantization via bitsandbytes:
pip install bitsandbytes
...
quant_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=False,
)
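To see why 4-bit loading matters, here is a rough back-of-the-envelope estimate of weight memory, assuming a 7B-parameter model (illustrative only; it ignores activations, optimizer state, and quantization constants):

```python
# Rough weight-memory estimate for a 7B-parameter model (illustrative numbers).
params = 7_000_000_000

bytes_fp16 = params * 2    # 16-bit weights: 2 bytes per parameter
bytes_nf4 = params * 0.5   # 4-bit NF4 weights: 0.5 bytes per parameter

print(f"fp16: {bytes_fp16 / 1e9:.1f} GB")  # fp16: 14.0 GB
print(f"nf4:  {bytes_nf4 / 1e9:.1f} GB")   # nf4:  3.5 GB
```

Dropping from 16-bit to 4-bit weights is what lets a 7B model fit comfortably in the memory of a single consumer GPU.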
Training parameters
pip install peft
...
peft_params = LoraConfig(
lora_alpha=16,
lora_dropout=0.1,
r=64,
bias="none",
task_type="CAUSAL_LM",
)
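With rank r=64, each adapted weight matrix W of shape (d, k) gets two small trainable matrices, A of shape (r, k) and B of shape (d, r), so the adapter adds r·(d + k) parameters while W stays frozen. A quick sketch (the 4096 × 4096 shape is a hypothetical attention projection, roughly LLAMA-2-7B-sized):

```python
# Trainable parameters added by one LoRA adapter of rank r on a (d, k) weight.
def lora_params(d: int, k: int, r: int) -> int:
    return r * (d + k)  # A is (r, k), B is (d, r); W itself stays frozen

# Hypothetical 4096x4096 projection with r=64, as in the config above:
print(lora_params(4096, 4096, 64))  # 524288 trainable vs ~16.8M frozen params
```

This is why LoRA fine-tuning is so cheap: only a few percent of the parameters ever receive gradients.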
training_params = TrainingArguments(
output_dir="./results",
num_train_epochs=1,
per_device_train_batch_size=4,
gradient_accumulation_steps=1,
optim="paged_adamw_32bit",
save_steps=25,
logging_steps=25,
learning_rate=2e-4,
weight_decay=0.001,
fp16=False,
bf16=False,
max_grad_norm=0.3,
max_steps=-1,
warmup_ratio=0.03,
group_by_length=True,
lr_scheduler_type="constant"
)
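The effective batch size is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs, and with max_steps=-1 the run length is driven by epochs. A small sketch of how the step count falls out of the values above (the dataset size of 1,000 examples is made up for illustration):

```python
import math

per_device_batch = 4   # per_device_train_batch_size above
grad_accum = 1         # gradient_accumulation_steps above
num_gpus = 1           # single small GPU

effective_batch = per_device_batch * grad_accum * num_gpus

dataset_size = 1_000   # hypothetical; substitute your dataset's length
steps_per_epoch = math.ceil(dataset_size / effective_batch)
print(effective_batch, steps_per_epoch)  # 4 250
```

If you hit out-of-memory errors, lowering per_device_train_batch_size and raising gradient_accumulation_steps keeps the effective batch size unchanged while reducing peak memory.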
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
peft_config=peft_params,
dataset_text_field="text",
max_seq_length=None,
tokenizer=tokenizer,
args=training_params,
packing=False,
)
Prepare Docker
Below is how to configure a Docker image that runs on an RTX A3000.
Base Image
FROM nvidia/cuda:12.1.1-base-ubuntu20.04
Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    python3-pip \
    python3-dev \
    python3-opencv \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*
Install PyTorch and torchvision
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Install any python packages you need
COPY requirements.txt requirements.txt
RUN python3 -m pip install notebook accelerate peft bitsandbytes transformers trl tensorboard
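For docker compose to expose the GPU, the service needs a GPU reservation. A minimal docker-compose.yml sketch (the service name, port, and layout are assumptions; adapt them to the repository's actual compose file):

```yaml
# Hypothetical compose file; the service name and port are placeholders.
services:
  trainer:
    build: .
    ports:
      - "8888:8888"   # Jupyter
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Without the `devices` reservation (or an equivalent `--gpus` flag), the container will start but nvidia-smi and CUDA will not see the GPU.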
Full code
The full code is available at:
https://github.com/venergiac/training_LLAMA-2_QLORA
Build the Docker image:
docker compose build
Start the container:
docker compose run
Open Jupyter and verify GPU access with nvidia-smi.