This guide walks you through fine-tuning an open-source LLM using Unsloth.

Unsloth documentation: https://docs.unsloth.ai/

Reference Colab notebook: https://colab.research.google.com/drive/1oHPzDpApeuGSdOEFribp6Bt85hZphe39#scrollTo=6bZsfBuZDeCL

1. Installation

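Install Unsloth and its dependencies. These commands are meant for a Colab notebook cell, where the leading ! runs each line as a shell command: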

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

<aside> ❗ If pyarrow throws an error, run: pip install -U pyarrow==15.0.2

</aside>
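After installation, it can help to confirm which versions actually ended up in the environment, given the version pins above and the pyarrow note. This is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    packages = ["unsloth", "xformers", "trl", "peft", "accelerate", "bitsandbytes", "pyarrow"]
    for name, ver in installed_versions(packages).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in a notebook cell prints one line per package, which makes it easy to spot a missing package or an unexpected version before loading the model.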

2. Load the model

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
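The dtype comment reflects native bfloat16 support. The Tesla T4 (compute capability 7.5) and V100 (7.0) lack it, while Ampere and newer GPUs (compute capability 8.0 and up) have it. Leaving dtype = None lets Unsloth detect this for you; the sketch below only illustrates the rule, and the helper pick_dtype_name is hypothetical. In PyTorch, torch.cuda.get_device_capability() returns the (major, minor) pair this helper expects.

```python
def pick_dtype_name(compute_capability):
    """Return 'bfloat16' for Ampere+ GPUs (major version >= 8), else 'float16'.

    compute_capability is a (major, minor) tuple, such as the value returned
    by torch.cuda.get_device_capability().
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


print(pick_dtype_name((7, 5)))  # Tesla T4
print(pick_dtype_name((8, 0)))  # A100 (Ampere)
```

To turn the name into an actual dtype you could use getattr(torch, pick_dtype_name(...)), but for this guide dtype = None is the simpler choice.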

Background on 4-bit quantization with bitsandbytes: https://huggingface.co/blog/4bit-transformers-bitsandbytes

https://www.youtube.com/watch?v=TPcXVJ1VSRI
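To see why 4-bit loading matters, a rough weights-only memory estimate is parameter count × bits per parameter ÷ 8 bytes. This sketch ignores quantization constants, activations, optimizer state, and any layers kept in higher precision, so real usage is higher; the 8B parameter count is rounded.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Weights-only memory estimate in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 8e9  # Llama 3 8B, rounded
for bits in (32, 16, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(n_params, bits):.1f} GB")
```

At 16 bits the weights alone would take about 16 GB, the entire memory of a T4, while at 4 bits they take about 4 GB. That leaves room for the LoRA adapters, activations, and optimizer state during training.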

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

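Next, add LoRA adapters. The original weights stay frozen and only the small adapter matrices are trained. For the r = 16 configuration below, that is about 42M parameters, roughly 0.5% of Llama 3 8B.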
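Conceptually, LoRA keeps a pretrained weight matrix W (d_out × d_in) frozen and learns a low-rank update B·A, where A is r × d_in and B is d_out × r, scaled by lora_alpha / r. The plain-Python sketch below illustrates that forward pass on toy matrices. It is not Unsloth's implementation, and the helper names are hypothetical. Because B starts at zero, the adapted layer initially behaves exactly like the base layer.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


d_in, d_out, r = 4, 3, 2
alpha = r  # alpha == r gives a scale of 1.0, like r = lora_alpha = 16 in this guide
random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]      # trainable
B = [[0.0] * r for _ in range(d_out)]                                     # trainable, starts at zero
x = [1.0, 2.0, 3.0, 4.0]

# With B = 0 the adapter contributes nothing yet, so the output equals W x.
print(lora_forward(W, A, B, x, alpha, r) == matvec(W, x))
```

Only A and B (r × (d_in + d_out) values) are updated, which is why the trainable parameter count stays small even for large layers.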

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
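To check what fraction of the model these settings train, count the LoRA parameters. Each adapted linear layer mapping d_in to d_out adds r × (d_in + d_out) parameters. The sketch below plugs in Llama 3 8B's published layer shapes (hidden size 4096, MLP size 14336, 32 layers, and 8 key/value heads of dimension 128, so k_proj and v_proj output 1024). These shape values come from the model's public config, not from this guide, so treat them as assumptions.

```python
# Llama 3 8B layer shapes (from the public model config; assumptions here)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 = 1024
NUM_LAYERS = 32

# (d_in, d_out) for each target module passed to get_peft_model above
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r, shapes=MODULE_SHAPES, num_layers=NUM_LAYERS):
    """LoRA adds A (r x d_in) and B (d_out x r) to every adapted module in every layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(r=16)
total = 8.03e9  # approximate Llama 3 8B parameter count
print(f"{trainable:,} trainable LoRA parameters (~{100 * trainable / total:.2f}% of the base model)")
```

With r = 16 this gives 41,943,040 trainable parameters, about 0.5% of the base model. The count scales linearly with r, so r = 32 doubles it.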

3. Data Preparation

Next, download and prepare the dataset. This guide uses a medical Q&A dataset from Kaggle:

https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset

Example row (columns: Question, Answer):

Question: what research (or clinical trials) is being done for Zellweger Syndrome ?
Answer: The National Institute of Neurological Disorders and Stroke (NINDS), and other institutes of the National Institutes of Health (NIH), conduct research exploring the molecular and genetic basis of Zellweger syndrome and the other PBDs, and also support additional research through grants to major research institutions across the country. Much of this research focuses on finding better ways to prevent, treat, and ultimately cure disorders such as Zellweger syndrome.
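Before training, each Question/Answer pair needs to become a single text example. The sketch below reads rows with the standard csv module and wraps each pair in a simple instruction-style template that ends with the tokenizer's EOS token. It assumes the downloaded CSV has Question and Answer columns, as in the example row, and leaves the file path to you. The template text and helper names are hypothetical, not a format this dataset or Unsloth requires. In real use, pass tokenizer.eos_token instead of the placeholder.

```python
import csv
import io

# Hypothetical prompt template; adapt it to the instruction or chat format you train with.
PROMPT_TEMPLATE = (
    "Below is a medical question. Write an answer.\n\n"
    "### Question:\n{question}\n\n"
    "### Answer:\n{answer}"
)


def read_qa_pairs(fileobj):
    """Yield (question, answer) tuples from a CSV with 'Question' and 'Answer' columns."""
    for row in csv.DictReader(fileobj):
        question = (row.get("Question") or "").strip()
        answer = (row.get("Answer") or "").strip()
        if question and answer:  # skip rows with a missing question or answer
            yield question, answer


def format_example(question, answer, eos_token):
    """Fill the template and append EOS so the model learns where an answer ends."""
    return PROMPT_TEMPLATE.format(question=question, answer=answer) + eos_token


# Tiny in-memory stand-in for the downloaded CSV file (placeholder content).
sample_csv = io.StringIO(
    "Question,Answer\n"
    '"What is X ?","X is an example."\n'
)
texts = [format_example(q, a, eos_token="</s>") for q, a in read_qa_pairs(sample_csv)]
print(texts[0])
```

For the real file, open it with open(path, newline="", encoding="utf-8") and pass the file object to read_qa_pairs. The resulting strings can then be wrapped in a Hugging Face dataset, for example datasets.Dataset.from_dict({"text": texts}).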