This guide covers how to use Ollama for LLM inference on the CPU.

Install Python

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11
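
Verify the new interpreter is available:

python3.11 --version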

Install pip

curl https://bootstrap.pypa.io/get-pip.py > get-pip.py
python3.11 get-pip.py
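
Check that pip is bound to the 3.11 interpreter:

python3.11 -m pip --version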

Install venv support

sudo apt install python3.11-venv
python3.11 -m venv .venv
source .venv/bin/activate

pip install -U "huggingface_hub[cli]"
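
The Hub CLI is optional here (Ollama can pull GGUF files from hf.co directly, as shown below), but it is handy if you want to fetch model files yourself. A sketch using the same bartowski repo referenced later; the filter pattern and target directory are illustrative:

# filter pattern and target dir are illustrative
huggingface-cli download bartowski/Meta-Llama-3.1-70B-Instruct-GGUF --include "*Q5_K_S*" --local-dir ./models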

Configure Ollama + WebUI (optional)
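
The docker exec step below assumes Ollama is already running in a container named ollama. If it isn't yet, here is a minimal sketch using the upstream images (ports, volume names, and container names are the documented defaults):

# CPU-only Ollama container, API on port 11434
sudo docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Open WebUI (optional), served on port 3000
sudo docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Open WebUI is then reachable at http://localhost:3000 and the Ollama API at http://localhost:11434.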

Download your model

sudo docker exec -it ollama bash
ollama pull hf.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF:Q5_K_S
ollama run hf.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF:Q5_K_S
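
You can confirm the model downloaded successfully:

ollama list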

Now we can use Ollama to run your model via its HTTP API:

https://github.com/ollama/ollama/blob/main/docs/api.md
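
For example, a non-streaming generation request against the default port (the model name must match the tag pulled above; the prompt is just an example):

curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF:Q5_K_S",
  "prompt": "Why is the sky blue?",
  "stream": false
}'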