This covers how to use Ollama for LLM inference on the CPU.
Install Python 3.11
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11
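A quick sanity check that the interpreter installed and is on your PATH:
python3.11 --version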
Install pip
curl https://bootstrap.pypa.io/get-pip.py > get-pip.py
python3.11 get-pip.py
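To verify pip is bound to the 3.11 interpreter rather than the system default, you can run:
python3.11 -m pip --version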
Install venv support
sudo apt install python3.11-venv
python3.11 -m venv .venv
source .venv/bin/activate
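Once activated, python should resolve inside the virtual environment; a quick check:
which python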
pip install -U "huggingface_hub[cli]"
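If you prefer to fetch GGUF weights yourself instead of letting Ollama pull them, here is a sketch using the huggingface-cli tool installed above (the repo and quant match the model used in the Ollama step below; the --include pattern and ./models directory are just example choices):
# download only the Q5_K_S quant files from the repo used below
huggingface-cli download bartowski/Meta-Llama-3.1-70B-Instruct-GGUF --include "*Q5_K_S*" --local-dir ./models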
Configure Ollama + WebUI (optional)
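If the containers are not running yet, a minimal sketch of starting Ollama and Open WebUI with Docker, following the projects' documented defaults (the volume names and the port 3000 mapping are assumptions you can change):
# start Ollama, exposing its API on port 11434
sudo docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# start Open WebUI on port 3000, pointed at the host's Ollama instance
sudo docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main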
Open a shell inside the Ollama container:
sudo docker exec -it ollama bash
ollama pull hf.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF:Q5_K_S
ollama run hf.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF:Q5_K_S
Now you can use Ollama to run your model.
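Once the model is loaded, you can also query it over Ollama's HTTP API from the host; a minimal sketch against the default port (the prompt is only an example):
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF:Q5_K_S",
  "prompt": "Why is the sky blue?",
  "stream": false
}'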