gguf
Here are 756 public repositories matching this topic...
Distribute and run LLMs with a single file.
Updated Jun 5, 2026 - C++
⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.
Updated Jun 5, 2026 - Rust
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
Updated Jun 5, 2026 - Python
Maid is a free and open source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral and OpenAI models remotely.
Updated Apr 7, 2026 - TypeScript
Hands-on Ollama: deploy and run large models on a CPU. Read online at: https://datawhalechina.github.io/handy-ollama/
Updated Jan 15, 2026 - Jupyter Notebook
The Swiss Army Knife of Offline AI. Chat, Speak, and Generate Images - Privacy First, Zero Internet. Download an LLM and use it on your mobile device. No data ever leaves your phone. Supports text-to-text, vision, text-to-image
Updated Jun 5, 2026 - TypeScript
LLM Agent Framework in ComfyUI. Includes an MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; provides access to Feishu and Discord; and adapts to all LLMs with OpenAI-like or aisuite-like interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, and doubao. Also adapted to local LLMs, VLMs, and GGUF models such as llama-3.3 and Janus-Pro, with graphRAG linkage.
Updated Mar 8, 2026 - Python
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
Updated Jun 7, 2026 - TypeScript
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
Updated Jun 5, 2026 - Python
Interface for OuteTTS models.
Updated Mar 23, 2026 - Python
An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.
Updated Jun 3, 2026 - Go
A CLI to estimate inference memory requirements for Hugging Face models, written in Python.
Updated May 18, 2026 - Python
Local AI app and inference engine for agents. Run open-weight LLMs locally — private, 100% offline on your computer.
Updated Jun 5, 2026 - TypeScript
Llama 3+ inference in pure Java.
Updated Apr 24, 2026 - Java
Go library for embedded vector search and semantic embeddings using llama.cpp.
Updated Mar 6, 2026 - Go