Gonna have to shill some FOSS LLMs here
You need an inference engine. Just use a llama.cpp derivative (fuck ollama, for a few reasons) and download an open model from Hugging Face. I heavily recommend the Mistral series; a lot of those are Apache 2.0 licensed, but not all of them, so check the license on the model card.
You need to find a “quantization” of the model. You can find those in the “Model tree” box on the right side of the model page on Hugging Face. You need one in GGUF format, to be exact, since that is the format llama.cpp loads.
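If you're not sure which quantization to grab, a back-of-the-envelope size estimate helps. Here's a rough sketch in Python: the helper names (`estimate_gib`, `largest_fit`) are made up for illustration, and it only counts the weights. Real GGUF files keep some tensors at higher precision and you also need room for the context, so treat the numbers as a lower bound. The K-quants (Q4_K_M and friends) land somewhere between the values listed.

```python
# Bits per weight for a few GGUF quantization types.
# Q4_0 and Q8_0 follow from their block layout: 32 weights plus one fp16 scale.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_0": 4.5,
}


def estimate_gib(n_params: float, quant: str) -> float:
    """Rough size of the weights in GiB; ignores metadata and the KV cache."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30


def largest_fit(n_params: float, budget_gib: float):
    """Return the highest-precision quant whose weights fit the budget, or None."""
    for quant in sorted(BITS_PER_WEIGHT, key=BITS_PER_WEIGHT.get, reverse=True):
        if estimate_gib(n_params, quant) <= budget_gib:
            return quant
    return None


if __name__ == "__main__":
    # Example: a 7-billion-parameter model.
    for quant in BITS_PER_WEIGHT:
        print(f"7B at {quant}: ~{estimate_gib(7e9, quant):.1f} GiB")
    print("Best fit for 8 GiB:", largest_fit(7e9, 8))
```

Leave yourself a couple of GiB of headroom on top of whatever this says.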
Then all you need to do is tune some inference parameters (temperature, top-k, top-p and so on) and you’re golden.
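For anyone wondering what those sampling knobs actually do, here's a minimal pure-Python sketch of temperature, top-k and top-p applied to a list of logits. This is not llama.cpp's actual sampler chain (that one has more stages and its own ordering); `sample_token` and its default values are just illustrative. In llama.cpp itself the matching flags are `--temp`, `--top-k` and `--top-p`.

```python
import math
import random


def sample_token(logits, temperature=0.8, top_k=40, top_p=0.95, rng=random):
    """Pick a token index from raw logits using temperature, top-k and top-p."""
    if temperature <= 0:
        # Greedy decoding: always take the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)

    # Top-k: keep only the k most likely tokens.
    if top_k > 0:
        probs = probs[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # Sample from what is left; random.choices renormalizes the weights.
    return rng.choices([i for _, i in kept], weights=[p for p, _ in kept])[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-token vocabulary
    rng = random.Random(0)
    print([sample_token(toy_logits, rng=rng) for _ in range(10)])
```

Lower temperature and tighter top-k/top-p make output more predictable; higher values make it more varied and more likely to go off the rails.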