Model Setup

AIProfile runs with local GGUF models supported by node-llama-cpp.

Default model selection

By default, setup detects your machine and downloads the strongest curated model that is comfortably recommended for the available RAM or VRAM:

bash
npm run setup-model
npx --yes --ignore-scripts=false aiprofile setup-model

Use an explicit model ID when you want to override automatic selection:

bash
npm run setup-model -- --model qwen3-4b
npx --yes --ignore-scripts=false aiprofile setup-model --model qwen3-4b

qwen3-4b is the safe fallback model if no curated model is recommended for the detected machine.

Choose a model

List curated GGUF models with RAM- and VRAM-aware recommendations:

bash
npm run setup-model -- --list-models
npx --yes --ignore-scripts=false aiprofile setup-model --list-models

Install a curated model by ID:

bash
npm run setup-model -- --model qwen3-14b --write-config
npx --yes --ignore-scripts=false aiprofile setup-model --model qwen3-14b --write-config

--write-config updates only llm.model and llm.model_path in config.yaml. Without it, the command prints the exact config values to set manually.
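
For reference, here is a sketch of the two keys as they might appear in config.yaml, assuming the dotted names map to a nested llm block (the file path is illustrative; setup-model prints the exact values for your download):

yaml
llm:
  model: qwen3-14b
  # illustrative path; use the value the command prints for your machine
  model_path: ./models/qwen3-14b-q4_k_m.gguf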

Mac RAM guide

The curated list uses GGUF files compatible with node-llama-cpp. Most options use Q4_K_M because it offers a good balance of quality, size, and speed on Apple Silicon. A few tiny models use Q8_0 where the file is still small.

| Mac memory | Try first | Notes |
| --- | --- | --- |
| 8 GB | llama-3.2-3b | qwen3-4b remains a good explicit fallback. |
| 16 GB | qwen3-8b | Good laptop tier for stronger local answers. |
| 32 GB | mistral-small-3.2-24b | Better quality, slower startup and generation. |
| 64 GB | deepseek-r1-qwen-32b | Avoids 70B models unless they are comfortably recommended. |
| 96 GB | llama-3.3-70b, deepseek-r1-llama-70b | High-end dense 70B-class models. |
| 128 GB | mistral-large-2411 | Split GGUF top-end curated option. |

For 192 GB, 256 GB, or 512 GB Mac Studio machines, use custom Hugging Face GGUF URIs for larger models or higher-quality quantizations. The curated catalogue intentionally stops at models that are practical on 128 GB RAM.
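
For example, a higher-quality quantization can be pulled straight from Hugging Face (the Q8_0 tag here is illustrative; check the repository for the quantizations it actually publishes):

bash
npx --yes --ignore-scripts=false aiprofile setup-model --model hf:Qwen/Qwen3-235B-A22B-GGUF:Q8_0 --write-config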

Smaller model

Llama-3.2-3B-Instruct Q4_K_M is about 2 GB and has a lower RAM requirement:

bash
npm run setup-model -- --model llama-3.2-3b
npx --yes --ignore-scripts=false aiprofile setup-model --model llama-3.2-3b

Manual download:

text
https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF

Save as:

text
./models/llama-3.2-3b-instruct-q4_k_m.gguf
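
Then set the config values by hand, mirroring what --write-config would write (a sketch; the llm.model value shown is an assumption):

yaml
llm:
  model: llama-3.2-3b
  model_path: ./models/llama-3.2-3b-instruct-q4_k_m.gguf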

Custom Hugging Face or HTTP GGUF model

bash
npx --yes --ignore-scripts=false aiprofile setup-model --model hf:Qwen/Qwen3-8B-GGUF:Q4_K_M --write-config
npx --yes --ignore-scripts=false aiprofile setup-model --model hf:Qwen/Qwen3-235B-A22B-GGUF:Q4_K_M
npx --yes --ignore-scripts=false aiprofile setup-model --model https://huggingface.co/user/repo/resolve/main/model.gguf

Only GGUF models supported by node-llama-cpp can be loaded by AIProfile. Split GGUF models are supported when the model URI resolves to the first -00001-of-000NN.gguf part. All downloaded parts must remain in the same directory.
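
For example, a split model would be referenced through its first part (the repository and file names below are hypothetical):

bash
# parts model-00002-of-00004.gguf onward must sit in the same directory
npx --yes --ignore-scripts=false aiprofile setup-model --model https://huggingface.co/user/repo/resolve/main/model-00001-of-00004.gguf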