Show HN: Local LLM Notepad – run a GPT-style model from a USB stick

What it is: A single 45 MB Windows .exe that embeds llama.cpp and a minimal Tk UI. Copy it (plus any .gguf model) to a flash drive, double-click on any Windows PC, and you're chatting with an LLM: no admin rights, no cloud, no network.

Why I built it: Existing "local LLM" GUIs assume you can pip install packages, pass long CLI flags, or download gigabytes of extras.

I wanted something my less-technical colleagues could run during a client visit by literally plugging in a USB drive.

How it works: A PyInstaller one-file build bundles the Python runtime, llama_cpp_python, and the Tk UI into a single PE.

On first launch, it memory-maps the .gguf; subsequent prompts stream at ~20 tok/s on an i7-10750H with gemma-3-1b-it-Q4_K_M.gguf (0.8 GB).
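For context on why "loading" a 0.8 GB model feels instant: llama.cpp memory-maps the .gguf rather than reading it into RAM, so the OS pages weights in lazily as they are touched. A minimal illustration of the same technique with Python's stdlib (the file name and contents here are made up stand-ins, not the app's code):

```python
import mmap
import os

# Create a tiny stand-in for a .gguf file (in reality ~0.8 GB).
with open("model.bin", "wb") as f:
    f.write(b"GGUF" + b"\x00" * 1020)

# Memory-map it: no bytes are copied up front; the OS faults pages
# in on demand, which is why the first launch returns immediately.
with open("model.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    magic = mm[:4]      # touching these bytes pages in one 4 KB page
    size = mm.size()
    mm.close()

os.remove("model.bin")
print(magic, size)  # b'GGUF' 1024
```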

A tick-driven render loop keeps the UI responsive while llama.cpp crunches.
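The pattern (sketched here from the description, not the app's actual code) is the usual worker-thread-plus-UI-tick setup: generation runs on a background thread and pushes tokens into a queue, and a periodic tick drains the queue and repaints. A Tk-free sketch, with the simulated generator standing in for llama.cpp:

```python
import queue
import threading
import time

tokens = queue.Queue()

def generate():
    # Stand-in for llama.cpp emitting tokens on a worker thread.
    for tok in ["Hello", ",", " world", "!"]:
        tokens.put(tok)
        time.sleep(0.01)
    tokens.put(None)  # sentinel: generation finished

threading.Thread(target=generate, daemon=True).start()

rendered = []
done = False
while not done:
    # One "tick": drain whatever arrived, then yield control back.
    # In Tkinter this body would live in a callback re-armed with
    # root.after(interval_ms, tick), so the event loop never blocks.
    try:
        while True:
            tok = tokens.get_nowait()
            if tok is None:
                done = True
                break
            rendered.append(tok)
    except queue.Empty:
        pass
    time.sleep(0.01)  # stand-in for the tick interval

print("".join(rendered))  # Hello, world!
```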

A parser bold-underlines every token that originated in the prompt; Ctrl+click pops a “source viewer” to trace facts. (Helps spot hallucinations fast.)
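A rough sketch of the idea (my guess at the approach, not the actual parser): normalize the prompt's words into a set, then flag each generated word that appears in it. The real app would apply bold-underline tags in the Tk text widget rather than returning flags; `mark_prompt_words` and the sample strings are hypothetical.

```python
import re

def mark_prompt_words(prompt: str, output: str):
    """Return (word, came_from_prompt) pairs for each word in the output."""
    norm = lambda w: w.lower().strip(".,!?\"'")
    prompt_words = {norm(w) for w in re.findall(r"\S+", prompt)}
    return [(w, norm(w) in prompt_words) for w in re.findall(r"\S+", output)]

marks = mark_prompt_words(
    "When did the Apollo 11 mission land?",
    "Apollo 11 landed in 1969.",
)
print(marks)
```

Words echoed from the prompt ("Apollo", "11") are flagged True, while model-introduced words ("landed", "1969.") are not, so unflagged facts are exactly the ones worth double-checking.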

github.com

40 points | davidye324 | 13 days ago | 9 comments

gxonatano 13 days ago

> walk up to any computer

Windows users seem to think their OS is ubiquitous. But in fact for most hackers reading this site, using Windows is a huge step backwards in productivity and capability.

exe34 12 days ago

Why not llamafile? Runs on everything from toothbrushes to toasters...

  • romperstomper 12 days ago

    Seconded for llamafile; here's the link for reference: https://github.com/Mozilla-Ocho/llamafile . It does work on all major platforms, and its tooling makes it easy to create new llamafiles from new models. The only caveat is Windows, which caps executable files at 4 GB, so there you have to run a llamafile launcher alongside the .gguf file itself. But that approach works anywhere anyway.

ensocode 13 days ago

Interesting, I'll definitely try it. What performance can be expected? Which other models perform OK with this?

ge96 13 days ago

Wonder if you can use/interface with those Coral accelerator boards