This guide walks you through fine-tuning an open-source LLM using Unsloth.
https://colab.research.google.com/drive/1oHPzDpApeuGSdOEFribp6Bt85hZphe39#scrollTo=6bZsfBuZDeCL
We will install the following packages:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
<aside>
❗ If pyarrow throws an error, run `pip install -U pyarrow==15.0.2`.
</aside>
```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.
```
More background on 4-bit quantization:
https://huggingface.co/blog/4bit-transformers-bitsandbytes
https://www.youtube.com/watch?v=TPcXVJ1VSRI
```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
```
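As an optional sanity check, you can confirm that the weights really loaded in 4-bit before moving on. The snippet below is a minimal sketch using the standard `transformers` helper `get_memory_footprint()` (it is not part of the original notebook); exact numbers will vary with your GPU and library versions.

```python
# In float16 an 8B-parameter model needs roughly 16 GB just for the weights
# (8B params x 2 bytes). Loaded in 4-bit, the reported footprint should be
# only a few GB, which is what lets it fit on a free-tier T4.
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
print(f"Compute dtype: {model.dtype}")
```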
We add LoRA adapters so that only 1-10% of all parameters need to be updated:
```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)
```
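To see how small that trainable fraction actually is, you can print PEFT's parameter summary. This is a sketch relying on the standard `print_trainable_parameters()` method of the PEFT model returned by `get_peft_model`; the exact counts depend on the `r` value and target modules you chose.

```python
# With r = 16 on the seven projection modules above, the LoRA adapters add
# on the order of tens of millions of trainable parameters while the 8B
# base model stays frozen, so only a small percentage of weights is updated.
model.print_trainable_parameters()
```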
Now we will download and prepare our dataset. We will be using a medical Q&A dataset from Kaggle:
https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset
| Question | Answer |
| --- | --- |
| what research (or clinical trials) is being done for Zellweger Syndrome ? | The National Institute of Neurological Disorders and Stroke (NINDS), and other institutes of the National Institutes of Health (NIH), conduct research exploring the molecular and genetic basis of Zellweger syndrome and the other PBDs, and also support additional research through grants to major research institutions across the country. Much of this research focuses on finding better ways to prevent, treat, and ultimately cure disorders such as Zellweger syndrome. |
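Below is a minimal sketch of how these Question/Answer pairs could be prepared for training, assuming the Kaggle CSV has been downloaded locally. The file name `train.csv` and the prompt template are illustrative placeholders rather than part of the original notebook; the general pattern of mapping every row to a single `text` field that ends with the tokenizer's EOS token follows the usual Unsloth examples.

```python
from datasets import load_dataset

# Illustrative prompt template built around the Question/Answer columns shown above.
medical_prompt = """Below is a medical question. Write a response that answers it.

### Question:
{}

### Answer:
{}"""

EOS_TOKEN = tokenizer.eos_token  # must be appended so the model learns to stop generating

def formatting_prompts_func(examples):
    questions = examples["Question"]
    answers = examples["Answer"]
    texts = [medical_prompt.format(q, a) + EOS_TOKEN
             for q, a in zip(questions, answers)]
    return {"text": texts}

# "train.csv" is a placeholder for wherever the Kaggle file was saved.
dataset = load_dataset("csv", data_files="train.csv", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)
```

After this step each example carries a single `text` column, which is the format the supervised fine-tuning trainer from `trl` (installed earlier) can consume.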