NVIDIA GPU for llm.cpp; 20%+ smaller compiled model sizes than llama.