I worry that 7-Zip is going to lose relevance because of its lack of zstd support. zlib's performance is intolerable for large files, and zlib-ng's SIMD implementation only helps a bit here. Which is a shame, because 7z is a pretty amazing container format, especially with its encryption and file-splitting capabilities.
I use ZSTD a ton in my programming work where efficiency matters.
But for sharing files with other people, ZIP is still king. Even 7z or RAR is niche. Everyone can open a ZIP file, and they don't really care if the file is a few MBs bigger.
> Everyone can open a ZIP file, and they don't really care if the file is a few MBs bigger.
You can use ZSTD with ZIP files too! It's compression method 93 (see https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT which is the official ZIP file specification).
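For the curious, here's roughly what that looks like with libzip; a minimal, untested sketch, assuming libzip 1.8+ built with zstd support (where the method 93 constant is ZIP_CM_ZSTD), and with placeholder file names:

    /* Sketch: add one file to a ZIP archive and request Zstandard (method 93).
       Assumes libzip >= 1.8 compiled with zstd support; note that many common
       unzip tools will not be able to extract the result. */
    #include <stdio.h>
    #include <zip.h>

    int main(void) {
        int err = 0;
        zip_t *za = zip_open("out.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
        if (!za) { fprintf(stderr, "zip_open failed: %d\n", err); return 1; }

        zip_source_t *src = zip_source_file(za, "input.bin", 0, -1);
        zip_int64_t idx = src ? zip_file_add(za, "input.bin", src, ZIP_FL_OVERWRITE) : -1;
        if (idx < 0) { fprintf(stderr, "%s\n", zip_strerror(za)); return 1; }

        /* Compression method 93 = Zstandard, per the APPNOTE method registry. */
        if (zip_set_file_compression(za, (zip_uint64_t)idx, ZIP_CM_ZSTD, 0) < 0) {
            fprintf(stderr, "%s\n", zip_strerror(za));
            return 1;
        }
        return zip_close(za) == 0 ? 0 : 1;
    }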
Which reveals that "everyone can open a ZIP file" is a lie. Sure, everyone can open a ZIP file, as long as that file uses only a limited subset of the ZIP format features. Which is why formats which use ZIP as a base (Java JAR files, OpenDocument files, new Office files) standardize such a subset; but for general-purpose ZIP files, there's no such standard.
(I have encountered such ZIP files in the wild; "unzip" can't decompress them, though p7zip worked for these particular ZIP files.)
> new Office files
I know what you mean, I’m not being pedantic, but I just realized it’s been 19 years. I wonder when we’ll start calling them “Office files”.
Well, only a lunatic would use ZIP with anything but DEFLATE/DEFLATE64
Not that many people care about zstd; I would assume most 7-zip users care about the convenience of the gui.
.. but 7-zip has a pretty terrible GUI?
Hence why PeaZip is so popular, and J-Zip used to be before it was stuffed with adware.
Most people won't use that GUI, but will right-click a file or folder -> 7-Zip -> Add To ... and it will spit out a file without questions.
Granted, Windows 11 has started doing the same for its ZIP and 7z compressors.
The same trick works for opening archives or executables (installers) as archives.
All the GUI I need is right click-> extract here or to folder. And 7zip is doing that nicely.
> .. but 7-zip has a pretty terrible GUI?
Since you're asking, the answer is no. 7-Zip has an efficient and elegant UI.
If by GUI you mean the ability to right-click a .zip file and unzip it just through the little window that pops up, you're totally right. At least that, plus the unzipping progress bar, is what I appreciate 7-Zip for.
https://github.com/mcmilk/7-Zip-zstd
https://github.com/M2Team/NanaZip
It includes the above patches as well as a few QoL features.
Thanks! Any ideas why it didn't get merged? Clearly 7-Zip has some development activity going on and so does this fork...
Being a bit faster or efficient won't make most people switch. 7z offers great UX (convenient GUI and support for many formats) that keeps people around.
If anything, the GUI and UX are terrible compared to WinRAR.
Why was there a limitation on Windows? I can't find any such limit for Linux.
A lot of synchronization primitives in the NT kernel are based on a register width bit mask of a CPU set, so each collection of 64 hardware threads on 64 bit systems kind of runs in its own instance of the scheduler. It's also unfortunately part of the driver ABI since these ops were implemented as macros and inline functions.
Because of that, transitioning a software thread to another processor group is a manual process that has to be managed by user space.
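To make the "register width" point concrete: the per-group mask in the public headers is literally one pointer-sized integer, so a group tops out at 64 logical processors on x64 (tiny sketch, just printing the size):

    /* Sketch: GROUP_AFFINITY.Mask is a KAFFINITY (ULONG_PTR), i.e. one bit per
       logical processor, register-wide - hence the 64-per-group ceiling on x64. */
    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        GROUP_AFFINITY ga = {0};   /* { KAFFINITY Mask; WORD Group; ... } */
        printf("bits in a group mask: %u\n", (unsigned)(sizeof(ga.Mask) * 8)); /* 64 on x64 */
        return 0;
    }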
Wow. That's surprisingly lame.
The NT kernel dates back to 1993. Computers didn’t exceed 64 logical processors per system until around 2014. And doing it back then required a ridiculously expensive server with 8 Intel CPUs.
The technical decision Microsoft made initially worked well for over two decades. I don’t think it was lame; I believe it was a solid choice back then.
I mean, x86 didn't, but other systems had been exceeding 64 cores since the late 90s.
And x86 arguably didn't ship >64 hardware thread systems until then because NT didn't support it.
> And x86 arguably didn't ship >64 hardware thread systems until then because NT didn't support it.
If that were the case the above system wouldn't have needed 8 sockets. With NUMA systems the app needs to be scheduling group aware anyways. The difference here really appears when you have a single socket with more than 64 hardware threads, which took until ~2019 for x86.
There were single image systems with hundreds of cores in the late 90s and thousands of cores in the early 2000s.
I absolutely stand by the fact that Intel and AMD didn't pursue high core count systems until that point because they were so focused on single-core perf, in part because Windows didn't support high core counts. The end of Dennard scaling forced their hand, and forced Microsoft's processor group hack.
Do you have anything to say regarding NUMA for the 90s core counts, though? As I said, it's not enough that there were a lot of cores - they have to be monolithically scheduled to matter. The largest UMA design I can recall was the CS6400 in 1993; to go past that, they started to introduce NUMA designs.
> other systems had been exceeding 64 cores since the late 90s.
Windows didn’t run on these other systems, why would Microsoft care about them?
> x86 arguably didn't ship >64 hardware thread systems until then because NT didn't support it
For publicly accessible web servers, Linux overtook Windows around 2005. Then in 2006 Amazon launched EC2, and the industry started that massive transition to the clouds. Linux is better suited for clouds, due to OS licensing and other reasons.
> Windows didn’t run on these other systems, why would Microsoft care about them?
Because it was clear that high core count, single system image platforms were a viable server architecture, and NT was vying for the entire server space, intending to kill off the vendor Unices.
For publicly accessible web servers, Linux overtook Windows around 2005. Then in 2006 Amazon launched EC2, and the industry started that massive transition to the clouds. Linux is better suited for clouds, due to OS licensing and other reasons.
Linux wasn't the only OS. Solaris and AIX were NT's competitors too back then, and supported higher core counts.
Seems like this is a general Windows thing per https://learn.microsoft.com/en-us/windows/win32/procthread/p... - applications that want to run on more than 64 CPUs need to be written with dedicated support for doing so.
The linked Processor Group documentation also says:
> Applications that do not call any functions that use processor affinity masks or processor numbers will operate correctly on all systems, regardless of the number of processors.
I suspect the limitation 7-Zip encountered was in how it checked how many logical processors a system has, to determine how many threads to spawn. GetActiveProcessorCount can tell you how many logical processors are on the system if you pass ALL_PROCESSOR_GROUPS, but that API was only added in Windows 7 (that said, that was more than 15 years ago; they probably could've found a moment to add and test a conditional call to it).
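For example, an untested sketch of the difference on a >64-thread machine; the legacy query only sees the calling thread's group, while the ALL_PROCESSOR_GROUPS form sees the whole machine:

    /* Sketch: compare the per-group processor count with the machine-wide count.
       GetActiveProcessorCount is Windows 7+; ALL_PROCESSOR_GROUPS asks for all groups. */
    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        SYSTEM_INFO si;
        GetSystemInfo(&si);   /* dwNumberOfProcessors: processors in the current group */

        DWORD group0 = GetActiveProcessorCount(0);
        DWORD total  = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);

        printf("GetSystemInfo: %lu, group 0: %lu, all groups: %lu\n",
               (unsigned long)si.dwNumberOfProcessors,
               (unsigned long)group0, (unsigned long)total);
        return 0;
    }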
It isn't just detecting the extra logical processors, you have to do work to utilise them. From the linked text:
"If there are more than one processor group in Windows (on systems with more than 64 cpu threads), 7-Zip distributes running CPU threads across different processor groups."
The OS does not do that for you under Windows. Other OSs handle that many cores differently.
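Roughly, the application-side distribution has to look something like this (untested sketch; error handling omitted, and worker is just a placeholder for the real compression job):

    /* Sketch: spread worker threads round-robin across all processor groups.
       On older Windows, threads left alone stay in the process's primary group. */
    #include <windows.h>

    static DWORD WINAPI worker(LPVOID arg) { return 0; }   /* placeholder job */

    static void spawn_workers(DWORD n_threads) {
        WORD n_groups = GetActiveProcessorGroupCount();
        for (DWORD i = 0; i < n_threads; i++) {
            HANDLE h = CreateThread(NULL, 0, worker, NULL, CREATE_SUSPENDED, NULL);

            GROUP_AFFINITY ga = {0};
            ga.Group = (WORD)(i % n_groups);                 /* round-robin over groups */
            DWORD cpus = GetActiveProcessorCount(ga.Group);
            ga.Mask = (cpus >= sizeof(KAFFINITY) * 8) ? ~(KAFFINITY)0
                                                      : (((KAFFINITY)1 << cpus) - 1);
            SetThreadGroupAffinity(h, &ga, NULL);            /* move thread into that group */

            ResumeThread(h);
            CloseHandle(h);
        }
    }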
> more than 15 years ago, they probably could've found a moment to add and test a conditional call to it
I suspect it hasn't been an issue much at all until recently. Any single block of data worth spinning up that many threads to compress is going to be very large; you don't want to split something into chunks that are too small, or you lose some of the benefit of the dynamic compression dictionary (sharing that between threads would add a lot of inter-thread coordination work, killing any performance gain even if the threads are running local enough on the CPU to share cache). Compression is not an inherently parallelizable task, at least not “embarrassingly” so like some processes.
Even when you do have something to compress that would benefit from more than 64 separate tasks in theory, unless it is all in RAM (or on an incredibly quick & low-latency drive/array) the process is likely to be IO-starved long before it is compute-starved, when you have that much compute resource to hand.
Recent improvements in storage options & CPUs (and the bandwidth between them) have presumably pushed the occurrences of this being worthwhile (outside of artificial tests) from “practically zero” to “near zero, but it happens”, hence the change has been made.
Note that two or more 7-zip instances working on different data could always use more than 64 threads between them, if enough cores to make that useful were available.
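To put the chunking trade-off in code, here's a rough, untested sketch using zstd's simple one-shot API purely for illustration (the chunk size is arbitrary): each chunk is an independent work item that could go to its own thread, and each one restarts its own window/dictionary, which is exactly the ratio cost described above.

    /* Sketch: compress fixed-size chunks independently. Each chunk starts with
       an empty window, so the overall ratio is worse than one big compression,
       but the chunks can be processed by separate threads with no coordination. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>

    #define CHUNK_SIZE (8u * 1024 * 1024)   /* arbitrary 8 MiB work items */

    int main(void) {
        size_t src_size = 64u * 1024 * 1024;
        char *src = calloc(src_size, 1);     /* stand-in for real input data */
        if (!src) return 1;

        for (size_t off = 0; off < src_size; off += CHUNK_SIZE) {
            size_t n   = (src_size - off < CHUNK_SIZE) ? src_size - off : CHUNK_SIZE;
            size_t cap = ZSTD_compressBound(n);
            void *dst  = malloc(cap);
            if (!dst) { free(src); return 1; }

            size_t written = ZSTD_compress(dst, cap, src + off, n, 3);
            if (ZSTD_isError(written)) {
                fprintf(stderr, "chunk failed: %s\n", ZSTD_getErrorName(written));
                free(dst); free(src); return 1;
            }
            printf("chunk at %zu: %zu -> %zu bytes\n", off, n, written);
            free(dst);
        }
        free(src);
        return 0;
    }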
Are you sure that if you don't attempt to set any affinities, Windows won't schedule 64+ threads over other processor groups? I don't have any system handy that'll produce more than 64 logical processors to test this, but I'd be surprised if Windows' scheduler won't distribute a process's threads over other processor groups if you exceed the number of cores in the group it launches into.
The referenced text suggests applications will "work", but that isn't really explicit.
They're either wrong or thinking about Windows 7/8/10. That page is quite clear.
> starting with Windows 11 and Windows Server 2022 the OS has changed to make processes and their threads span all processors in the system, across all processor groups, by default.
> Each process is assigned a primary group at creation, and by default all of its threads' primary group is the same. Each thread's ideal processor is in the thread's primary group, so threads will preferentially be scheduled to processors on their primary group, but they are able to be scheduled to processors on any other group.
That depends on what format you're using. Zip compresses every file separately. Bzip and zstd have pretty small maximum block sizes and gzip doesn't gain much from large blocks anyway. And even when you're making large blocks, you can dump a lot of parallelism into searching for repeat data.
Windows has a concept of processor groups, each of which can contain up to 64 (hardware) threads. I assume they updated 7-Zip to support multiple processor groups.
Maybe the WaitForMultipleObjects limit of 64 (MAXIMUM_WAIT_OBJECTS) applies?
An ugly limitation on an API that initially looks superior to the Linux equivalents.
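For reference, that limit is per call: WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64) handles, so code that waits on more handles has to batch them (untested sketch):

    /* Sketch: wait for an arbitrary number of handles by batching the calls,
       since WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64). */
    #include <windows.h>

    static void wait_for_all(HANDLE *handles, DWORD count) {
        DWORD off = 0;
        while (off < count) {
            DWORD n = count - off;
            if (n > MAXIMUM_WAIT_OBJECTS)
                n = MAXIMUM_WAIT_OBJECTS;
            /* Block until every handle in this batch is signaled, then move on. */
            WaitForMultipleObjects(n, handles + off, TRUE, INFINITE);
            off += n;
        }
    }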
7-Zip is one of the pieces of software I've missed since I moved to macOS.
Keka is also really nice!
https://www.keka.io/
Never heard of it, I'll give it a try!
If you're talking about the program you use in the terminal, you can install it via Homebrew.
How about PeaZip?
I've used PeaZip in the past, but only on Windows; I was not aware that a macOS version exists! I'll give it a try. Cheers
https://xkcd.com/619/