Internet in a box

JTH

I've been going down numerous rabbit holes testing Llama AI models and running them in a VM. I'm losing about 10-15% of my resources to virtualization overhead, but we're still in beta and I don't want to alter the host system.

It started with the idea that I wanted an "Internet in a box" type of cyberdeck (a fully offline device) for when the world ends.
You can download all of Wikipedia, and of course any other documents you deem relevant to rebuilding the world.
I can then build an extensive library and use purpose-built AI models to sift through it when I need quick indexing or answers.
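For the "quick indexing" side, even a trivial inverted index gets you surprisingly far before any model is involved. A minimal sketch in Python (the file names and document contents are made-up placeholders, not anything from my actual library):

```python
# Minimal offline keyword index over a local document library.
# File names and contents here are hypothetical placeholders.
from collections import defaultdict
import re

def build_index(docs):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return documents containing every word in the query."""
    words = [w.lower() for w in query.split()]
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for w in words[1:]:
        results &= index.get(w, set())
    return results

docs = {
    "first_aid.txt": "Clean the wound and apply pressure to stop bleeding.",
    "farming.txt": "Rotate crops and compost to keep the soil healthy.",
}
index = build_index(docs)
print(search(index, "wound pressure"))
```

A real setup would swap this for embeddings and retrieval, but on a CPU-only machine a plain index like this is free, instant, and works with no model loaded.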

If you have the right data and models, you can build advisors: a doctor, a farmer, etc.

Anyhow, I rebuilt the laptop, since I needed one large partition to store the models and databases in one place. An 8-core AMD Ryzen 7 4700U with 16GB of RAM and 500GB of storage (no dedicated GPU) can run "small" Llama 3 8B Instruct models if they are heavily quantized. That's roughly the equivalent of GPT-3.5, but with about half the context memory, so I can run similar (but shorter) tasks.
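Rough math on why the heavy quantization is non-negotiable at 16GB: an 8B model's weights alone at 16 bits per parameter are about 16GB, before the OS or KV cache get anything. A quick back-of-envelope sketch (the bits-per-parameter figures are approximations, not exact file sizes):

```python
# Back-of-envelope RAM estimate for an 8B-parameter model.
# Bits-per-parameter values are rough approximations.
params = 8e9

def weight_gb(bits_per_param):
    """Approximate weight size in GB for the given precision."""
    return params * bits_per_param / 8 / 1e9

fp16 = weight_gb(16)   # full 16-bit weights: won't fit alongside the OS
q4 = weight_gb(4.5)    # ~4.5 bits/param for a Q4_K-style quant

print(f"fp16: ~{fp16:.0f} GB, Q4: ~{q4:.1f} GB")
```

So a Q4-class quant of an 8B model lands around 4-5GB of weights, which leaves room for the context cache and the rest of the system in 16GB.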
 
Interesting. Good luck.
 
Cool stuff JTH,

So, which models have stood out to you so far? And what is the largest one you've been able to run on your VM?

When I find time this week, I'm thinking of downloading Qwen3-4B-Thinking-2507 from LM Studio and playing with it.
 
Nothing stands out just yet; I've only been running basic reasoning tests to compare how they answer questions. Truth be told, it's both better than I expected (but slow) and also disappointing, because I know I don't have the specs to run larger models. I'm slamming the CPUs hard, which means that while a model is running I have no headroom for real-world tasks.
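One way to claw back some headroom is to cap the inference thread count below the core count. llama.cpp's tools accept a `--threads` flag; the sketch below just computes a sensible value to pass it (the reserve of 2 cores is my guess, tune it for your workload):

```python
import os

# Reserve a couple of cores for the rest of the system instead of
# letting inference saturate every CPU. The reserve count is a guess.
def inference_threads(reserve=2):
    total = os.cpu_count() or 1
    return max(1, total - reserve)

print(inference_threads())
```

On the 8-core 4700U that yields 6, e.g. `llama-cli --threads 6 ...`, trading some tokens-per-second for a usable desktop.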

We both know what's going to happen here: I'm going to have to build a machine...

[Attachment: Screenshot_2026-03-23_19-57-41.png]
 
Progress report.

I've moved the AI software onto the host. I also managed to get the integrated graphics to take on some of the work for the smaller models. The Llama server was easier; Open WebUI was harder, but it has a lot of options.

There are a ton of features under the hood here. Everything you wished GPT could do, this already has: you can control many parameters, like the logic/creativity temperature, context size, and much more. I have the voice module loaded, but I need to configure it as its own server service. Next I'll work on project folders, so I can store specialized instruction sets. Then at some point I'll expand out and feed it news (it's not connected to the internet), since some of these models are unaware of present circumstances.
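For anyone curious how those parameters get set under the hood: llama.cpp's llama-server exposes an OpenAI-compatible chat endpoint, and Open WebUI (or any script) just sends them in the request body. A sketch, assuming the server's default port 8080 on localhost (the URL and prompt are placeholders):

```python
import json
import urllib.request

# Hypothetical local endpoint; llama-server listens on port 8080 by default.
URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_request(prompt, temperature=0.7, max_tokens=256):
    """Assemble an OpenAI-style chat payload with sampling controls."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # lower = more deterministic/logical
        "max_tokens": max_tokens,    # cap the reply length
    }

payload = build_request("Summarize crop rotation in two sentences.",
                        temperature=0.2)
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it once the server is running.
```

The sliders in the Open WebUI settings map onto fields like these, which is why the same model can feel very different depending on how you dial them.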

[Attachment: Screenshot_2026-03-27_18-24-45.png]
 