• vivendi@programming.dev

    Gonna have to shill some FOSS LLMs here

You need an inference engine. Just use a llama.cpp derivative (fuck ollama, for a few reasons) and download an open model from HuggingFace (I heavily recommend the Mistral series, which I think is Apache 2.0 licensed, but I don’t remember exactly).
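If you’d rather not drive the llama.cpp binaries by hand, the llama-cpp-python bindings wrap the same engine. A minimal sketch, assuming you already have a model file on disk (the path below is made up):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Point this at whatever model file you downloaded; this path is just an example.
llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU if you have one
)

out = llm("Explain what an inference engine is in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```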

You need to find a “quantization” of the model; you can find those in the model tree on the right side of the screen on HuggingFace. To be exact, you need a GGUF-format file. See the sketch below for grabbing one programmatically.
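For example, you can pull a single quant file down with huggingface_hub. The repo id and filename here are just illustrative, check the actual model tree for what quants exist:

```python
from huggingface_hub import hf_hub_download  # pip install huggingface-hub

# Example repo/filename only; browse the model's quantizations on HuggingFace
# and pick the one that fits your RAM/VRAM (Q4_K_M is a common middle ground).
path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)
print(path)  # local path you can feed straight to llama.cpp or the bindings
```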

Then all you need to do is tune a few inference parameters (context size, sampling temperature, top-p, and so on) and you’re golden.
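As a rough idea of what tuning those parameters looks like through the Python bindings (the values here are common starting points, not gospel, and the model path is again made up):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_gpu_layers=-1)

out = llm(
    "Write a haiku about running LLMs locally.",
    max_tokens=256,
    temperature=0.7,     # lower = more deterministic, higher = more "creative"
    top_p=0.95,          # nucleus sampling cutoff
    top_k=40,            # only sample from the 40 most likely next tokens
    repeat_penalty=1.1,  # discourage the model from looping on itself
)
print(out["choices"][0]["text"])
```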