Building llama.cpp for MacBook M4

Running local models on your own hardware in macOS

I have got a new MacBook Pro with an M4 chip and 24 GB of memory, and the first thing I wanted to do on it was test content generation performance.

First, I downloaded the prebuilt distribution of llama.cpp directly, but macOS asked for permission to run every file and every library, and in the end it crashed.

After that I learned that on macOS, software works best when you build it on the machine itself.

Therefore, I decided to build llama.cpp and run the gpt-oss:20b model on it.

First of all, let’s check whether our laptop supports Metal:

system_profiler SPDisplaysDataType

Running this command gives me the following output:

Graphics/Displays:
    Apple M4:
      Chipset Model: Apple M4
      Type: GPU
      Bus: Built-In
      Total Number of Cores: 10
      Vendor: Apple (0x106b)
      Metal Support: Metal 3
      Displays:
        Color LCD:
          Display Type: Built-in Liquid Retina XDR Display
          Resolution: 3024 x 1964 Retina
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: Yes
          Connection Type: Internal
        LG HDR 4K:
          Resolution: 3840 x 2160 (2160p/4K UHD 1 - Ultra High Definition)
          UI Looks like: 1920 x 1080 @ 60.00Hz
          Mirror: Off
          Online: Yes
          Rotation: Supported

Looks good: we have Metal 3.
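If you do not want to scan the whole report by eye, the check can be scripted. This is only a minimal sketch: the helper name metal_support is my own, and it assumes the report contains a "Metal Support:" line like the one above.

```shell
#!/bin/sh
# Hypothetical helper: print the "Metal Support" value found in
# system_profiler-style text on stdin. Exits non-zero if no such line exists.
metal_support() {
    # Keep only what follows "Metal Support:", take the first match,
    # and let grep fail when nothing was found.
    sed -n 's/^[[:space:]]*Metal Support:[[:space:]]*//p' | head -n 1 | grep .
}

# On a Mac you would run: system_profiler SPDisplaysDataType | metal_support
# Here two lines copied from the output above stand in for the real command.
printf '      Vendor: Apple (0x106b)\n      Metal Support: Metal 3\n' | metal_support
```

With the sample lines this prints Metal 3. If the line is missing, the function prints nothing and returns a non-zero status, so it can be used in an if statement.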

Next, we need to install the build tools with Homebrew:

brew install cmake ninja git
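Before moving on, it can help to confirm that the tools really ended up on the PATH. The snippet below is a small sketch; missing_tools is a hypothetical helper name, not something Homebrew provides.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every argument that is not
# available as a command on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools cmake ninja git)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are joined on one line.
    echo "Still missing:" $missing
else
    echo "All build tools found"
fi
```

If anything is reported as missing, rerunning brew install for that package is the first thing to try.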

After installing them, let’s use git to clone the project:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

And build it with Metal enabled:

cmake -B build -G Ninja -DGGML_METAL=ON
cmake --build build --config Release -j 8
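The -j 8 value is simply the number of parallel build jobs I picked. As a rough sketch, the job count could instead be derived from the machine. The cpu_count helper below is my own naming, and the fallback of 4 is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the number of logical CPU cores.
# sysctl -n hw.ncpu is the macOS way; getconf is a portable fallback.
cpu_count() {
    n=$(sysctl -n hw.ncpu 2>/dev/null) \
        || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) \
        || n=4
    # Guard against empty or non-numeric answers (the 4 is an assumed default).
    case $n in
        ''|*[!0-9]*) n=4 ;;
    esac
    printf '%s\n' "$n"
}

# Show the build command that would be used on this machine.
echo "cmake --build build --config Release -j $(cpu_count)"
```

The script only prints the command, so it is safe to run anywhere. Drop the echo and the quotes to actually start the build.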

Then we need to run the locally built llama.cpp:

./build/bin/llama-cli --gpt-oss-20b-default -ngl 999 -p "test"

The -ngl 999 parameter sets the number of model layers that are placed on the GPU. 999 is more than the model has, so in our case all of them go to the GPU.
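To make the effect of an oversized -ngl value concrete, here is a simplified model of the arithmetic, not llama.cpp's actual code. The offloaded_layers helper is hypothetical, and the layer count of 25 is a made-up number used only for illustration.

```shell
#!/bin/sh
# Simplified model (an assumption, not llama.cpp source): the number of layers
# placed on the GPU is the requested value, capped at what the model has.
offloaded_layers() {
    requested=$1
    available=$2
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# 25 is a hypothetical layer count chosen for the example.
offloaded_layers 999 25   # every layer goes to the GPU
offloaded_layers 10 25    # only part of the model goes to the GPU
```

This is why a large number such as 999 is a convenient way to say "everything" without looking up the real layer count of the model.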
