Projects done with the help of AI (Average Intelligence) tools have "Sloppy" in their name; the others never used it.
Don't get me wrong, AI is a great tool for getting stuff done fast, but it's dumb as hell and has to be carefully guided.
Some people compare it to a parrot with a gigantic brain. But it's still just a parrot facerolling on the keyboard.
The current Transformer LLM architecture is basically the T9 text prediction from old phones, and the models are its dictionaries.
Repeatedly tapping your phone's text predictions - that's the current state of AI. We need "Conception-Text" models.
Then again, autocomplete was always a good tool anyway. Now, with proper expectations, you're ready to start building.
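If the analogy sounds too dumb to be true, here is the whole idea as toy Python: a dictionary of word pairs that you "tap" repeatedly. This only illustrates the analogy; a real transformer is this concept with a vastly bigger learned dictionary and actual context.

```python
# Toy "repeatedly tap the suggestion" predictor: a bigram dictionary
# plus greedy next-word picking. An illustration of the T9 analogy,
# not of how a transformer is actually implemented.
from collections import Counter, defaultdict

corpus = "the parrot taps the keyboard and the parrot writes the code".split()

# Build the "dictionary": which word most often follows each word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def autocomplete(word: str, taps: int) -> list[str]:
    """Greedily tap the top suggestion `taps` times, T9-style."""
    out = [word]
    for _ in range(taps):
        if word not in following:
            break
        word = following[word].most_common(1)[0][0]  # the top suggestion
        out.append(word)
    return out

print(" ".join(autocomplete("the", 5)))  # the parrot taps the parrot taps
```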
Oh, BTW: stop using cloud AI services, start with your own local LM Studio/ComfyUI machine. Save monies.
You need 16+ GB of VRAM on an Nvidia card (preferably 24-32GB), 32+ GB of RAM (preferably 64-128GB), and any half-decent CPU.
Three weeks of pure suffering and you're ready for an actual AI future; it'll pay off in less than a year.
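Back-of-the-napkin math for where those numbers come from. The bits-per-weight values below are rough averages for common GGUF quants, not exact figures:

```python
# Rough VRAM estimate: weight size ~= params * bits_per_weight / 8.
# The bits-per-weight values are approximate averages for common GGUF
# quants; real files vary, and KV cache / context adds a few GB on top.
APPROX_BPW = {"q8_0": 8.5, "q5_k_m": 5.7, "q4_k_m": 4.8, "iq4_nl": 4.5, "q2_k": 2.6}

def weights_gb(params_billions: float, quant: str) -> float:
    bits = params_billions * 1e9 * APPROX_BPW[quant]
    return bits / 8 / 1024**3

# A 31B model at iq4_nl is ~16 GB of weights: tight on a 16GB card,
# comfortable on 24GB with room left for context.
print(f"{weights_gb(31, 'iq4_nl'):.1f} GB")  # -> 16.2 GB
```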
Our video cards can now not only run games but also write somewhat useful code. That's pretty cool, right?
And if part of your job or pipeline can actually be replaced by a parrot - maybe it should be replaced.
Think of writing and updating tests. If you're blank-staring at the wall right now, you get it.
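For instance, here is the kind of mechanical chore the parrot handles fine. slugify() is a made-up example function; the test below it is the part you'd delegate:

```python
# A made-up function and the kind of boilerplate test you'd delegate
# to the model instead of writing by hand.
def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())

def test_slugify():
    assert slugify("Sloppy AI Projects") == "sloppy-ai-projects"
    assert slugify("  extra   spaces  ") == "extra-spaces"
```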
Models under 26B parameters are absolutely useless; don't get your hopes up.
Give the model a "Think with attention to details, stop the thought at 10 paragraphs." rule, or use my portable caveman prompt.
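The rule route is just a system message. A minimal sketch of the chat payload; the user turn is a hypothetical task:

```python
# The rule goes into the system message verbatim; the user turn is a
# hypothetical example of a real task.
messages = [
    {"role": "system",
     "content": "Think with attention to details, stop the thought at 10 paragraphs."},
    {"role": "user",
     "content": "Write pytest tests for the slugify() function below."},
]
```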
Quantize your vision .mmproj file to Q8_0 so you don't have to blind the model completely.
Don't use uncensored/abliterated crap; every single bit of KL divergence makes a huge difference.
Never use Q8_0 KV cache: it introduces typos that kill tool calls, and it lobotomizes the model.
When short on memory, always disable Unified KV Cache and set Max Concurrent Prediction to 1.
Use the OpenAI-compatible API to connect to LM Studio. The best open-source agentic IDE atm seems to be https://zed.dev/
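A minimal connection sketch using the openai Python package. LM Studio serves its OpenAI-compatible endpoint on http://localhost:1234/v1 by default, and the API key only has to be non-empty:

```python
# Minimal LM Studio client over its OpenAI-compatible endpoint.
# Assumes the local server is running on the default port 1234;
# LM Studio ignores the api_key, but the client requires one.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

reply = client.chat.completions.create(
    model="unsloth/gemma-4-31b-it",  # identifier of whatever model you loaded
    messages=[{"role": "user", "content": "Write a haiku about parrots and keyboards."}],
    temperature=1.0,  # per-model recommendations are in the list below
)
print(reply.choices[0].message.content)
```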
Useful models I've found so far that have some idea what they're doing and don't die mid-task:
- Wasserman - unsloth/gemma-4-31b-it@iq4_nl (for 24GB GPU, very heavy and reliable, use temperature 1.0)
- Drunk Wasserman - unsloth/gemma-4-31b-it@q2_k_xl (for 24GB GPU, same as above but with a bigger context window)
- Pentester - xortron.criminalcomputing.2026.27b.next@q5_k_m (Qwen3.5 for 24GB GPU, use temperature 0.6)
- Crackhead - google/gemma-4-26b-a4b@q4_k_m (for 16GB GPU + 32GB RAM, starter option, temperature 1.0)
On a 16GB VRAM card you'll get a 48k context window while computing 8 layers on the CPU, and it's still a Crackhead model. It's basically for testing the waters before buying hardware for a fat model.
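Why context length eats memory: the KV cache costs 2 (K and V) x layers x KV heads x head dim x bytes per element, per token. A sketch with hypothetical architecture numbers, just to show the shape of the math:

```python
# Per-token KV cache = 2 (K and V) * layers * kv_heads * head_dim * bytes.
# The architecture numbers in the example call are hypothetical, chosen
# only to show why 48k of context costs gigabytes on its own.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * context / 1024**3

# e.g. a 32-layer model with 4 KV heads of dim 128, 48k context, fp16 cache:
print(f"{kv_cache_gb(32, 4, 128, 48_000):.1f} GB")  # -> 5.9 GB
```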
- ComfyUI-Enhancement-Utils - PC resource monitor and execution follower
- ComfyUI-SloppyAudio - Audio editing tools based on SoX and BS-RoFormer
- smol-caveman - Portable Caveman prompt designed for local LLMs. Read less slop and get much better results.
- ComfyUI-SloppyInstall.bat - Simplified pip install -r "requirements.txt" for custom nodes in portable ComfyUI.
- SloppyServer.bat - Single-file local/Wi-Fi server for debugging multithreaded mobile Unity WebGL builds and other apps

