Building llama.cpp for MacBook M4
Running local models on your own hardware in macOS
I have a new MacBook Pro with an M4 chip and 24 GB of memory, and the first thing I wanted to do with it was test content generation performance.
First, I downloaded the prebuilt llama.cpp release directly, but macOS asked for permission to run every file and every library, and in the end it crashed.
After that I learned that on macOS, software tends to work best when you build it on the machine itself.
So I decided to build llama.cpp and run the gpt-oss:20b model on it.
First, let’s check whether our laptop supports Metal:
system_profiler SPDisplaysDataType
After running this command, I get the following output:
Graphics/Displays:
    Apple M4:
      Chipset Model: Apple M4
      Type: GPU
      Bus: Built-In
      Total Number of Cores: 10
      Vendor: Apple (0x106b)
      Metal Support: Metal 3
      Displays:
        Color LCD:
          Display Type: Built-in Liquid Retina XDR Display
          Resolution: 3024 x 1964 Retina
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: Yes
          Connection Type: Internal
        LG HDR 4K:
          Resolution: 3840 x 2160 (2160p/4K UHD 1 - Ultra High Definition)
          UI Looks like: 1920 x 1080 @ 60.00Hz
          Mirror: Off
          Online: Yes
          Rotation: Supported
Looks good, we have Metal 3.
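To check only the relevant field instead of reading the whole report, the output can be filtered. This is a minimal sketch, assuming the report keeps the "Metal Support:" label seen above; metal_support is a hypothetical helper name, not part of macOS or llama.cpp.

```shell
# Hypothetical helper: read system_profiler-style text on stdin and
# print the value of the first "Metal Support:" line, if there is one.
metal_support() {
  sed -n 's/^[[:space:]]*Metal Support:[[:space:]]*//p' | head -n 1
}

# On a Mac, feed it the real report. The guard keeps the script
# harmless on systems that do not have system_profiler.
if command -v system_profiler >/dev/null 2>&1; then
  system_profiler SPDisplaysDataType | metal_support
fi
```

If the helper prints nothing, the report did not contain a Metal Support line.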
Next, we need to install some build packages:
brew install cmake ninja git
After installing, let’s use git to clone the project:
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
And build it:
cmake -B build -G Ninja -DGGML_METAL=ON
cmake --build build --config Release -j 8
Then we need to run the locally built llama.cpp:
./build/bin/llama-cli --gpt-oss-20b-default -ngl 999 -p "test"
The -ngl 999 parameter sets the number of model layers that are offloaded to the GPU. A value this large exceeds the model's layer count, so in our case every layer goes to the GPU.
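Since the goal is to test generation performance, it can be useful to compare full GPU offload with no offload at all (with -ngl 0, no layers are offloaded). The sketch below only prints the two command lines and does not run them; llama_cmd and the LLAMA_CLI override are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: print the llama-cli command line for a given
# number of GPU-offloaded layers. LLAMA_CLI is an assumed override
# for the binary path; it defaults to the path used in this post.
llama_cmd() {
  printf '%s --gpt-oss-20b-default -ngl %s -p "test"\n' \
    "${LLAMA_CLI:-./build/bin/llama-cli}" "$1"
}

# Print both variants: 0 offloads no layers, 999 offloads all of them.
for ngl in 0 999; do
  llama_cmd "$ngl"
done
```

Run each printed command from the llama.cpp directory and compare the timing figures that llama-cli reports.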
