This guide covers how to use Ollama for LLM inference on the CPU.

Install Python

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11
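
Verify the new interpreter is available:

python3.11 --version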

Install pip

curl https://bootstrap.pypa.io/get-pip.py > get-pip.py
python3.11 get-pip.py
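
Check that pip is bound to the 3.11 interpreter:

python3.11 -m pip --version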

Install venv support

sudo apt install python3.11-venv
python3.11 -m venv .venv
source .venv/bin/activate

pip install -U "huggingface_hub[cli]"
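
The Hub CLI is optional here (Ollama can pull GGUF files from hf.co directly, as shown below), but it is handy if you want to fetch model files yourself. A sketch using the same bartowski repo referenced later; the filter pattern and target directory are illustrative:

# filter pattern and target dir are illustrative
huggingface-cli download bartowski/Meta-Llama-3.1-70B-Instruct-GGUF --include "*Q5_K_S*" --local-dir ./models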

Configure Ollama + WebUI (optional)
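
The docker exec step below assumes Ollama is already running in a container named ollama. If it isn't yet, here is a minimal sketch using the upstream images (ports, volume names, and container names are the documented defaults):

# CPU-only Ollama container, API on port 11434
sudo docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Open WebUI (optional), served on port 3000
sudo docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Open WebUI is then reachable at http://localhost:3000 and the Ollama API at http://localhost:11434.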

Download your model

sudo docker exec -it ollama bash
ollama pull hf.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF:Q5_K_S
ollama run hf.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF:Q5_K_S
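
You can confirm the model downloaded successfully:

ollama list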

Now we can use Ollama to run your model via its HTTP API:

https://github.com/ollama/ollama/blob/main/docs/api.md
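
For example, a non-streaming generation request against the default port (the model name must match the tag pulled above; the prompt is just an example):

curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF:Q5_K_S",
  "prompt": "Why is the sky blue?",
  "stream": false
}'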