I agree with the other commenters, this post does not explain why you would not just run Ollama or Koboldcpp on Windows. What exactly makes running Ollama within virtualized NixOS in WSL in some way better than running natively?
If it's just the novelty aspect of it or some ideological reason, that's fine, but it should be explained in the blog post before someone thinks this is a sane and logical way to run Ollama on a gaming PC.
Reproducibility and security. You're getting the managed security of a Windows install, plus you're isolating it from Windows telemetry.
> you're isolating it from Windows telemetry
1. Maybe, 2. For now
Yes that's true but that's how it always is.
Not if you can migrate to linux on a whim.
^This. People act so helpless against big tech, and especially Windows. Linux is still more effort for normies, but it should not be an issue for most people on this sub to learn it; you are already doing things on Windows that would not be much different on Linux. AI with Nvidia alone justifies Linux, because Windows has performance issues. Then you have the freedom of leaving Windows if it continues to get hostile.
Unless you're working for a company where you have no choice. Then you have no freedom to leave Windows. You have, obviously, the freedom to leave your job, but let's be adults.
Because it's adult to accept being spied on and your info sold to whomever Microsoft deems worthy? Sure, I can understand it, but I don't think it has anything to do with being an adult. Just what degree of personal privacy you are willing to part with.
Brave to assume linux is more secure than Windows. I didn't read the rest of your comment.
Hard to say but I believe at this point it somewhat is if you use a minimal setup. But once more people move to it, we can be sure there will be more attacks like the infamous xz-utils backdoor.
I would dare guess the author just doesn’t know there is a perfectly functional Windows native Ollama release. I was doing the same thing until I realized that it makes no sense because I can just install ollama on Windows and then connect to it from within WSL.
As others have already pointed out, if you're going to run Ollama on Windows anyway, why not use the native build? And if you want to use WSL, then I'd suggest using something like LocalAI, which gives you a lot more control and support for additional formats (GGML, GGUF, GPTQ, ONNX, etc).
https://github.com/mudler/LocalAI
Just to answer the question of why not just use Windows: here are some ideas why the author might have used Nix instead:
- reproducible (with minor adjustments even on non-WSL systems)
- if you are used to nix, there is not much which beats it in terms of stability, maintainability, upgradability, and fun (?)
- additional services are typically easier to set up, like tailscale-acl (used by the author), which uses Pulumi under the hood; see the sketch after this list
- despite some downsides (disk speed was an issue when I used it), WSL is surprisingly capable
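For the reproducibility and services points, here's roughly what a minimal NixOS-WSL configuration with Ollama and plain Tailscale enabled might look like. This is just a sketch with placeholder values (user name, state version), assuming the NixOS-WSL module is imported; the author's actual flake does considerably more than this.

```nix
# Minimal sketch of a NixOS-WSL configuration.nix with Ollama and Tailscale.
{ config, pkgs, ... }:
{
  wsl.enable = true;            # provided by the NixOS-WSL module
  wsl.defaultUser = "nixos";    # placeholder user name

  services.ollama = {
    enable = true;
    acceleration = "cuda";      # use the NVIDIA GPU exposed by WSL2
    host = "0.0.0.0";           # listen beyond localhost so other machines can reach it
  };

  services.tailscale.enable = true;  # join the tailnet so the API is reachable remotely

  system.stateVersion = "24.05";     # placeholder, match your install
}
```

The same module drops onto a non-WSL NixOS machine almost unchanged; only the `wsl.*` lines are specific to WSL.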
IME storage speed is only an issue if you're using the shared folder capability. It's as fast as any VM if you stick to the WSL block device.
Correct. I ran into that when I was developing Windows-native software but wanted to use my Linux tools.
If the author likes NixOS, they can run NixOS natively and get native Ollama too. No need to run Nix on WSL.
The clue seems to be "on your gaming PC". Many Linux users still run Windows on their gaming machines due to compatibility. This would also probably be the device with their most powerful GPU.
They cannot game on it (yet). That was the point of this post: having a 2-in-1 device, for gaming and on-demand AI.
Given that Ollama runs quite fine on Windows if you have NVIDIA, why such a complicated setup?
It would make more sense for AMD I suppose where Ollama's Windows support is lacking compared to Linux.
That said, neat tricks useful for other stuff as well.
Yeah… was about to ask that. I just run Ollama on Windows. What am I actually solving by running it virtualized in WSL?
It's probably marginally faster for inference as long as you're willing to pay the extremely annoying cost of all the stupid filesystem issues. Given that ollama specifically has very good cross platform support it seems like a fool's errand.
it's unclear to me what you gain running ollama in wsl like this compared to switching to a productive native operating system [like nixos] or just installing the windows release of ollama and quietly forgetting about it.
i use nixos.wsl at work to have the same emacs configuration as on my laptop, and that's fine except the windows filesystem performance makes me want to throw the whole system in a dumpster. but on my home gaming machine i have some games that only run on windows so i just installed ollama's windows installer which works with my GPU and installs an autostart entry.
these days the windows box sits in a dark corner on my network with tailscale (again just the windows install), running sunshine too to start steam games on my laptop.
Running models at home seems like a waste of money while at the same time they are currently heavily subsidized in the cloud by dumb money.
How is it a waste of money if a lot of people, including the author of the article, already have an nVidia GPU in their PC?
Running locally has a lot of advantages - privacy, getting to learn how to run LLMs, not having to deal with quotas, logins, outages.
The home electricity cost of a decent GPU alone is usually higher than renting GPUs on demand.
Do you have any numbers to back that statement up?
E.g. I have solar panels and a home battery and pay less than a $100 a year for electricity.
Solar panels aren't free. And aren't hassle-free either.
I wouldn't even bring that into the equation since a large part of the population can't use solar panels.
Not to mention GPU value depreciation.
I use ChatGPT for most practical stuff but I really enjoy running local models. I find it really interesting and I think it's important for people to know how to run these without being beholden to big tech. If you have a used 3090 you can already run some really strong models. There are some really interesting local models as well, like the abliterated ones.
it's nice to be able to have private conversations.
There's also big efficiency increases when batching multiple requests, making clouds inherently more cost effective for normal use cases.
Way better utilization of expensive hardware as well ofc.
Heavily subsidized, but on heavily price-discriminated hardware (Nvidia's datacenter-licensed cards), so it kind of cancels out.
Sometimes you don't want your workflow to be subject to the whims of some LLM API provider.
Just did the opposite. Decided it’s time for Linux on desktop. Better for programming. Better for AI.
The bet being that I can get most games to work on it; that was the sticking point. (Thanks to Valve I think it'll work out.)
I recently gave this another go, and was pleasantly surprised that I could install Steam easily, the nVidia card was detected and the driver installed, and I quickly installed the "Resident Evil 4" remake with awesome performance and no glitchiness.
Then I rebooted my system and found that Steam had broken Gnome and I couldn't log in and had to go into safe mode and debug from the command line. 1 hour in, 1 thing installed.
I'll try again in 10 years.
I do not play games. I quietly celebrate each time my Linux-certified Fedora laptop successfully wakes up from sleep.
What distro was this on?
Installing Steam breaking GNOME sounds wild.
I'm sorry but "Steam broke Gnome" makes no sense. This is like the person who says they installed Firefox and it broke the internet.
This is great work, it solves an exact problem I too am having. Now I just need to upgrade my 12 year old GPUs to something that can run an LLM.
In my case, I would have to upgrade my >12-year-old PC. :D
I love the idea of this flake to run Ollama even on Windows, but just pointing people to your _everything_ flake is going to confuse people and make it look harder than it is to run Ollama on Nix.
If you are using a system-controlling Nix (nix-darwin, NixOS…), it’s as easy as `services.ollama.enable = true`, with maybe adding `services.ollama.acceleration = "cuda"` to force GPU usage or `services.ollama.host = "0.0.0.0"` to allow connections to Ollama that are not local to your system. In a home-manager situation it is even easier: just include `pkgs.ollama` in your `home.packages`, with an `.override {}` for the same options above. That should be it, really.
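To make that concrete, here are two small sketches of the approaches described above. Option and argument names are as in recent nixpkgs; treat them as assumptions and check your own channel if something doesn't evaluate.

```nix
# NixOS (configuration.nix) sketch of the module options mentioned above.
{
  services.ollama = {
    enable = true;
    acceleration = "cuda";   # force GPU acceleration
    host = "0.0.0.0";        # allow non-local connections
  };
}
```

And the home-manager variant, where the GPU choice goes through a package override rather than a service option:

```nix
# home-manager sketch: add an overridden Ollama package to home.packages.
{ pkgs, ... }:
{
  home.packages = [
    (pkgs.ollama.override { acceleration = "cuda"; })  # assumes the package exposes an `acceleration` arg
  ];
}
```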
I will say that if you have a more complex NixOS setup that patches the kernel, or can't lean on cachix for some reason, the ollama package takes a long time to compile. My setup at home runs on a 3950X Threadripper, and when Ollama compiles it uses all the cores at 99% for about 16 minutes.
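For the cachix case specifically, one way to skip the long local CUDA build is to wire up the community CUDA binary cache as a substituter. A rough sketch; the public key below is a placeholder you'd copy from the cache's cachix page, and this only helps for packages that haven't been forced to rebuild by your own overlays or patches.

```nix
# Sketch: pull CUDA-enabled builds from the cuda-maintainers cache instead of compiling.
{
  nix.settings = {
    substituters = [ "https://cuda-maintainers.cachix.org" ];
    trusted-public-keys = [ "cuda-maintainers.cachix.org-1:<key from cachix>" ];  # placeholder key
  };
}
```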
Remove windows. And this is amazing.
I've been running Ollama with DeepSeek in a container on TrueNAS k8s for several months. It's hooked up to the Continue extension in VSCode. I also mix in cloud-hosted "dumb" models for other tasks like code completion; Ollama with DeepSeek is reserved for heavier chat and code tasks.
It's fast as hell. Though you will need at least two GPUs if you want to split them between Ollama and something else that needs one (display/games/Proxmox).
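If you do carve out a card for Ollama, one generic way to enforce the split is to expose only that GPU to the service via `CUDA_VISIBLE_DEVICES`, which Ollama respects like any CUDA program. Sketched in Nix to match the rest of the thread, and assuming your nixpkgs has the `services.ollama.environmentVariables` option; in a container setup the same env var goes in the pod spec instead.

```nix
# Hypothetical sketch: pin Ollama to the second GPU so the first stays free for display/games.
{
  services.ollama = {
    enable = true;
    acceleration = "cuda";
    environmentVariables = {
      CUDA_VISIBLE_DEVICES = "1";  # GPU index 1 = the second card
    };
  };
}
```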
Nvidia never fixed their sysmem fallback policy for WSL2 though; running on WSL2 rather than native Windows just spells so many performance problems when VRAM overflows.
Gaming PC can run Linux to begin with.
That's what mine does, as it's a gaming PC secondarily... that being said, I am running into some limitations, e.g. certain sim racing titles and VR support (though both ALVR and WiVRN are excellent).
Quoting the article:
> I refused to manage a separate Ubuntu box that would need reconfiguring from scratch.
Immediately followed by:
> After hacking away at it for a number of weeks
Hmmm
Oh dear god, just use containers, or a proper OS instead of that disk-chewing monstrosity...
Ollama also runs on Windows.
>gaming PC
>LLM
stinky