First off, the display looks great!
Second, I didn't realize how deep the dep tree would be for this type of program -- 141 total! Much of it comes from the url crate, itself a dep of the git crate, but there are a bunch of others too. I'm just getting into learning Rust -- is this typical of Rust projects or perhaps typical of TUI projects in general?
(EDIT to strikeout) ~~The binary is also 53M as a result whereas /usr/sbin/tree is 80K on my machine -- not really a problem on today's storage, but very roughly 500-1000x different in size isn't nothing.~~
Maybe it's linking-related? I don't know how to check really.
(EDIT: many have pointed out that you can run `cargo build --release` with other options to get a much smaller binary. Thanks for teaching me!)
> The binary is also 53M
That's a debug binary, and the vast majority of that is debug symbols. A release build of this project is 4.3M, an order of magnitude smaller.
Also, compiling out the default features of the git2 crate eliminates several dependencies and reduces it further to 3.6M.
https://github.com/bgreenwell/lstr/pull/5
https://github.com/rust-lang/git2-rs/pull/1168
Stripping the binary further improves it to 2.9M, and some further optimizations get it to 2.2M without any compromise to performance. (You can get it smaller by optimizing for size, but I wouldn't recommend that unless you really do value size more than performance.)
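To put those figures side by side, here is a small sketch that computes how much smaller each step is than the 53M debug build. The sizes are the ones quoted in this comment; `shrink_factor` is just an illustrative helper name, not anything from the project.

```rust
/// How many times smaller `after` is than `before` (both in the same unit).
fn shrink_factor(before: f64, after: f64) -> f64 {
    before / after
}

fn main() {
    // Sizes in megabytes, as quoted in the comment above.
    let debug = 53.0;
    let steps = [
        ("release build", 4.3),
        ("git2 default features off", 3.6),
        ("stripped", 2.9),
        ("further optimizations", 2.2),
    ];
    for (label, size) in steps {
        println!(
            "{}: {}M, {:.1}x smaller than the debug build",
            label,
            size,
            shrink_factor(debug, size)
        );
    }
}
```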
No offense, but 4.3MB is huge for what it does. Most shells take less space than that! Where's all the bloat coming from?
> Most shells take less space than that!
Most shells dynamically link to a runtime your OS provides "for free". The 4.3 MiB binary in question is bundling the Rust runtime and its dependencies.
For reference, a statically-compiled C++ "Hello, World" is 2.2 MiB after stripping.
    % cat hello.nix
    { pkgs ? import <nixpkgs> { crossSystem = "aarch64-linux"; } }:

    pkgs.stdenv.mkDerivation {
      name = "hello-static";
      src = pkgs.writeText "hello.cpp" ''
        #include <iostream>
        int main() {
          std::cout << "Hello, World!" << std::endl;
          return 0;
        }
      '';
      dontUnpack = true;
      buildInputs = [ pkgs.glibc.static ];
      buildPhase = "$CXX -std=c++17 -static -o hello $src";
      installPhase = "mkdir -p $out/bin; cp hello $out/bin/";
    }

    % nix-build hello.nix
    ...
    % wc -c result/bin/hello
    2224640 result/bin/hello
2.2MiB for "Hello, World"? I must be getting old...
The executable takes 33KB in C, 75KB in nim.
By switching to e.g. musl, you can go down to a single megabyte ;)
But in all seriousness, my example is quite cherrypicked, since nobody will actually statically link glibc. And even if they did, one can make use of link-time optimization to remove lots of patches of unused code. Note that this is the same strategy one would employ to debloat their Rust binaries. (Use LTO, don't aggressively inline code, etc.)
Just a `puts("Hello world!")` with -Os statically linked to musl is 22k
Just for fun, I wondered how small a canonical hello world program could be in macOS running an ARM processor. Below is based on what I found here[0] with minor command-line switch alterations to account for a newer OS version.
ARM64 assembly program (hw.s):

    //
    // Assembler program to print "Hello World!"
    // to stdout.
    //
    // X0-X2 - parameters to linux function services
    // X16 - linux function number
    //
    .global _start             // Provide program starting address to linker
    .align 2

    // Setup the parameters to print hello world
    // and then call Linux to do it.

    _start: mov X0, #1          // 1 = StdOut
            adr X1, helloworld  // string to print
            mov X2, #13         // length of our string
            mov X16, #4         // MacOS write system call
            svc 0               // Call linux to output the string

    // Setup the parameters to exit the program
    // and then call Linux to do it.

            mov X0, #0          // Use 0 return code
            mov X16, #1         // Service command code 1 terminates this program
            svc 0               // Call MacOS to terminate the program

    helloworld: .ascii "Hello World!\n"

Assembling and linking commands:

    as -o hw.o hw.s && ld -macos_version_min 14.0.0 -o hw hw.o -lSystem -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.2.sdk -e _start -arch arm64

Resulting file sizes:

    -rwxr-xr-x 1 <uid> <gid> 16K Jun 18 21:23 hw
    -rw-r--r-- 1 <uid> <gid> 440B Jun 18 21:23 hw.o
    -rw-r--r-- 1 <uid> <gid> 862B Jun 18 21:21 hw.s

0 - https://smist08.wordpress.com/2021/01/08/apple-m1-assembly-l...
> The executable takes 33KB in C, 75KB in nim.
Did you statically link Glibc...? Or is this with a non-GNU libc?
Either way, it probably is true that this 2.2MiB number would be smaller on, say, Debian 5. And much smaller on PDP Unix.
We just have large standard libraries now
lto will remove most of it.
> Most shells dynamically link to a runtime your OS provides "for free"
Rust binaries also dynamically link to and rely on this runtime.
That's not intrinsically or pervasively true, although it's not uncommon.
Yes. The main areas where Rust code uses dynamic linking by default are Glibc and OpenSSL (through the popular `native-tls` crate). Most things outside of that will be statically linked by default. There is room to improve the situation by making more C wrapper libraries support either method.
For Rust code talking to Rust libraries (such as `std`), it's a totally different challenge, similar to what you face in C++. You can compile your pure Rust app in a way to divide it up into DLLs. The problem is that the polymorphism in Rust means these DLLs must all be built together. Calling a polymorphic function means code has to be generated for it. The only way around this is for your Rust library to speak the C ABI, which bloats code as building your C-style API surface will resolve all the polymorphized functions/structs, but at least gets you swappable dynamic linking.
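A minimal sketch of that last point, not taken from any project in this thread: a generic function has no single compiled form, so a C-style API surface has to name each concrete instantiation it wants to expose. The names `largest`, `largest_i32`, and `largest_f64` are made up for illustration.

```rust
/// Generic: the compiler emits a separate copy of this function for every
/// concrete `T` it is called with (monomorphization), so there is no single
/// symbol a pre-built shared library could export for "largest<T>".
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

/// C ABI wrapper that pins one instantiation (`T = i32`).
/// A real library would also mark this with a no-mangle attribute so the
/// symbol name is stable; it is left out here to keep the sketch portable
/// across Rust editions. Returns 0 for a null or empty input.
pub extern "C" fn largest_i32(ptr: *const i32, len: usize) -> i32 {
    if ptr.is_null() || len == 0 {
        return 0;
    }
    // Assumed contract: the caller passes `len` valid, initialized values.
    let items = unsafe { std::slice::from_raw_parts(ptr, len) };
    largest(items).unwrap_or(0)
}

/// A second wrapper is needed for every further type the C side wants.
/// Returns NaN for a null or empty input.
pub extern "C" fn largest_f64(ptr: *const f64, len: usize) -> f64 {
    if ptr.is_null() || len == 0 {
        return f64::NAN;
    }
    // Same assumed contract as above.
    let items = unsafe { std::slice::from_raw_parts(ptr, len) };
    largest(items).unwrap_or(f64::NAN)
}

fn main() {
    let ints = [3, 9, 2];
    let floats = [1.5, -2.0];
    println!("{}", largest_i32(ints.as_ptr(), ints.len()));
    println!("{}", largest_f64(floats.as_ptr(), floats.len()));
}
```

Each wrapper forces one concrete copy of `largest` into the library, which is where the extra code size mentioned above comes from.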
The only way to avoid it is to be on linux with no_std, use musl statically or be on embedded. You cannot (or at least are supposed to not be able to) link to glibc statically and on every other OS you can only call syscalls via the system libraries. (Well, technically you can on most systems, it's just not stable across updates. OpenBSD will actively block it though)
Unless the majority of Rust builds on Linux statically link musl libc or use no_std, then it's pervasively true. And it's true on most non-Linux targets, including the BSDs and macOS. It's the same situation with Go.
why did you embed the c++ code in the .nix file?
just to have everything in one file? or to show how to do it with nix?
because it seems simpler to have a separate C++ file, and a simple shell script or makefile to compile it.
e.g. although I could figure out roughly what the .nix file does, many more people would know plain unix shell than nix.
and where is $out defined in the .nix file?
The nix file is beside the point - it gives you a totally hermetic build environment. Not OP, but it’s the only way I know how to get gcc to use a static glibc. All you should pay attention to is that it’s using a static glibc.
$out is a magic variable in nix that means the output of the derivation - the directory that nix moves to its final destination
> Not OP, but it’s the only way I know how to get gcc to use a static glibc.
    /tmp$ gcc -O3 test.c -o test
    /tmp$ ldd test
            linux-vdso.so.1 (0x00007f3d9fbfe000)
            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3d9f9e8000)
            /lib64/ld-linux-x86-64.so.2 (0x00007f3d9fc00000)
    /tmp$ gcc -static -O3 test.c -o test
    /tmp$ ldd test
            not a dynamic executable
>> Not OP, but it’s the only way I know how to get gcc to use a static glibc.
> /tmp$ gcc -static -O3 test.c -o test
> /tmp$ ldd test
> not a dynamic executable
yes, that last line above means it's a statically linked executable.
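One rough way to make the same check programmatically is to look for a `PT_INTERP` program header, which is where a dynamically linked ELF executable names its loader (the `/lib64/ld-linux-x86-64.so.2` line in the `ldd` output above). The sketch below assumes a 64-bit little-endian ELF file and is a simplification, not a reimplementation of `ldd`; `requests_dynamic_loader` is a made-up name.

```rust
use std::convert::TryInto;
use std::{env, fs};

/// Program header type for the "interpreter" (dynamic loader) path.
const PT_INTERP: u32 = 3;

/// Returns Some(true) if the ELF image has a PT_INTERP program header,
/// Some(false) if it has none, and None if the input is not a 64-bit
/// little-endian ELF file or is truncated.
fn requests_dynamic_loader(bytes: &[u8]) -> Option<bool> {
    let header = bytes.get(..64)?;
    // e_ident: magic, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian).
    if &header[..4] != b"\x7fELF" || header[4] != 2 || header[5] != 1 {
        return None;
    }
    // Offsets below follow the ELF64 file header layout.
    let e_phoff = u64::from_le_bytes(header[32..40].try_into().ok()?) as usize;
    let e_phentsize = u16::from_le_bytes(header[54..56].try_into().ok()?) as usize;
    let e_phnum = u16::from_le_bytes(header[56..58].try_into().ok()?) as usize;

    for i in 0..e_phnum {
        let start = e_phoff.checked_add(i.checked_mul(e_phentsize)?)?;
        let end = start.checked_add(4)?;
        // p_type is the first 4 bytes of each program header entry.
        let p_type = u32::from_le_bytes(bytes.get(start..end)?.try_into().ok()?);
        if p_type == PT_INTERP {
            return Some(true);
        }
    }
    Some(false)
}

fn main() {
    // Usage: pass the path of the binary to inspect as the first argument.
    let path = match env::args().nth(1) {
        Some(p) => p,
        None => {
            eprintln!("usage: pass the path to an ELF binary");
            return;
        }
    };
    match fs::read(&path).ok().and_then(|b| requests_dynamic_loader(&b)) {
        Some(true) => println!("{}: requests a dynamic loader", path),
        Some(false) => println!("{}: no PT_INTERP, looks statically linked", path),
        None => println!("{}: not a readable 64-bit little-endian ELF file", path),
    }
}
```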
yes, i had a doubt about what the GP said, about their nix way being the only way to create a statically linked executable.
but I didn't remember all the details, because it's been a while since I worked with C in depth (moved to Java, Ruby, Python, etc.)(though I did a lot of that earlier, even in pre-Linux years), so I didn't say anything else. thanks, Josh Triplett for clarifying.
but one thing I do remember, is that static linking was the only option in the beginning, at least on Unix, and dynamic linking came only some time later.
when I started working on UNIX and C, there was no dynamic linking at all, IIRC.
https://en.m.wikipedia.org/wiki/Static_library
("dynamic linking" topic in above page links to the below page in Wikipedia: )
I thought glibc had some hacks in it to prevent it from working fully when statically linked? Is this just a myth or outdated or only affects C/C++ or what?
The issue is that some features of glibc want to dlopen additional libraries, most notably NSS. If you call `gethostbyname`, even a static glibc will try to dlopen NSS libraries based on /etc/nsswitch.conf, and if the dynamic NSS libraries are incompatible with your statically linked glibc, you'll have problems.
musl, by contrast, doesn't support NSS at all, only /etc/hosts and DNS servers listed in /etc/resolv.conf, so whether you statically or dynamically link musl, you just won't have any support for (for instance) mDNS, or dynamic users, or local containers, or various other bits of name resolution users may expect to Just Work.
thanks.
and by the way, ignorant mindlessly downvoting dudes who don't even bother to check if a comment is right or not, can shove it up, and take a flying leap into Lake Titicaca. they'll meet a lot of their brothers there, like giant frogs.
from a Google search:
>Overview Lake Titicaca, straddling the border between Peru and Bolivia in the Andes Mountains, is one of South America's largest lakes and the world’s highest navigable body of water. Said to be the birthplace of the Incas, it’s home to numerous ruins. Its waters are famously still and brightly reflective. Around it is Titicaca National Reserve, sheltering rare aquatic wildlife such as giant frogs.
:)
For reference, some statically-linked shells on my system:

    2288K /bin/bash-static (per manual, "too big and too slow")
    1936K /bin/busybox-static (including tools not just the shell)
    192K /usr/lib/klibc/bin/mksh
    2456K zsh-static

For comparison, some dynamically-linked binaries (some old):

    804K ./bin/bash-3.2
    888K ./bin/bash-4.0
    908K ./bin/bash-4.1
    956K ./bin/bash-4.2
    1016K ./bin/bash-4.3
    1092K ./bin/bash-4.4
    1176K ./bin/bash-5.0
    1208K ./bin/bash-5.1
    1236K /bin/bash (5.2)
    124K /bin/dash
    1448K /bin/ksh93 (fattest when excluding libc!)
    292K /bin/mksh
    144K /bin/posh
    424K /bin/yash
    848K /bin/zsh

(The reason I don't have static binaries handy is because they no longer run on modern systems. As long as you aren't using shitty libraries, dynamic binaries are more portable and reliable, contrary to internet "wisdom".)
Among the features it has: an interactive terminal GUI, threaded parallel directory walking, and git repository support. In around a thousand lines of code, total, including tests, half of which is the GUI.
*TUI. Not GUI
I feel like that's just the result of having a native package manager making natural bloat and a compiler which hasn't had decades of work.
Likely needs features tuned. I compared Eza, similarly in Rust, and it's 1.6 MiB compiled. Looking at the Cargo.toml, it includes git2 with default-features = false. https://github.com/eza-community/eza/blob/main/Cargo.toml
Building in release:

    cargo build --release
    du -sh ./target/release/lstr -> 4.4M

Building with other release options brings it down to 2.3M:

    [profile.release]
    codegen-units = 1
    opt-level = "s"
    lto = true
    panic = "abort"
    strip = "symbols"
I did some benchmarks on one of our CLI and found that `opt-level = "z"` reduced the size from 2.68M to 2.28M, and shaved 10% on the build time, worth a try.
I'll try with `panic = "abort"` for our next release, thanks for the reminder.
You are probably looking at a debug build. On Linux, a release build (cargo build -r) is ~4.3M, and down to ~3.5M once stripped. This could be reduced further with some tricks applied to the release build profile.
Great catch! Comments mentioned getting it down to ~2MB but that’s still humongous.
If you just think about how roughly (napkin math) 2MB can be 100k loc, that’s nuts
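Spelling out that napkin math, taking 2MB as 2 MiB and using the 100k-line figure from the comment (both are rough, and `bytes_per_line` is just an illustrative name):

```rust
/// Average bytes of binary per line of code, for a rough sense of scale.
fn bytes_per_line(binary_bytes: u64, lines_of_code: u64) -> f64 {
    binary_bytes as f64 / lines_of_code as f64
}

fn main() {
    let two_mib: u64 = 2 * 1024 * 1024; // 2,097,152 bytes
    let loc: u64 = 100_000; // the rough figure from the comment
    println!(
        "{:.1} bytes of binary per line",
        bytes_per_line(two_mib, loc)
    );
}
```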
Is it though? You won't get it on an embedded device (maybe) but you could install a thousand of these tools and barely even notice the space being taken up on most machines
I think that’s a lame argument. First because it’s kind of a fallacy. Size is absolute not relative to something. Especially for software. No one thinks of software size primarily in the context of their disk space.
Further I think everyone keeps getting larger and larger memory because software keeps getting more and more bloated.
I remember when 64gb iPhone was more than enough (I don’t take pictures so just apps and data) Now my 128 is getting uncomfortable due to the os and app sizes. My next phone likely will be a 256
I’m usually the first to complain about bloat but your counterpoints to the GP’s “lame arguments” are themselves fallacies.
> First because it’s kind of a fallacy. Size is absolute not relative to something. Especially for software. No one thinks of software size primarily in the context of their disk space.
That’s exactly how most people think about file sizes.
When your disk is full, you don’t delete the smallest files first. You delete the biggest.
> Further I think everyone keeps getting larger and larger memory because software keeps getting more and more bloated.
RAM sizes have actually stagnated over the last decade.
> I remember when 64gb iPhone was more than enough (I don’t take pictures so just apps and data) Now my 128 is getting uncomfortable due to the os and app sizes. My next phone likely will be a 256
That’s because media sizes increase, not executable sizes.
And people do want higher resolution cameras, higher definition videos, improved audio quality, etc. These are genuinely desirable features.
Couple that with improved internet bandwidth allowing content providers to push higher-bitrate media, while that media still needs to be cached locally.
> That’s because media sizes increase, not executable sizes.
Part of it is app sizes on mobile. But it's apps in the 200mb - 2gb range that are the problem, not ones that are single-digit megabytes.
200MB apps wouldn’t even make a dent on a 64GB device.
The 2GB apps are usually so large because they include high quality media assets. For example, Spotify will frequently consume multiple GBs of storage but the vast majority of that is audio cache.
I currently have 355 apps installed on my phone, so if they were all 200mb then they wouldn't fit on a 64GB device.
I agree that the largest data use tends to be media assets.
I’m intrigued, how many of them are actual 3rd party apps though? And how many are different layers around an existing app or part of Apple / Google's base OS? The latter, in fairness, consumes several GBs of storage too.
I’m not trying to dismiss your point here. Genuinely curious how you’ve accumulated so many app installs.
It's an interesting question. Some of them are definitely from the OS (either Google or Samsung).
Looking through at categories of app where I have multiple, I'm seeing:
- Transport provider apps (Airlines, Trains, Buses, Taxis etc)
- Parking payment apps
- Food delivery apps
- Hotel apps
- Payment apps
- Messaging / Video calling apps
- Banking apps
- Mapping apps
It's especially easy to accumulate a lot of apps if you travel through multiple countries, as for a lot of these apps you need different ones in different countries.
> No one thinks of software size primarily in the context of their disk space.
This is wrong. The reason many old tools are so small is that you had far less space back then. If you have a 20tb harddrive you wouldn't care whether ls took up 1kb or 2mb; on a 1gb harddrive it matters (and mattered) much more.
Optimization takes time. I'm sure if OP wanted he could shrink the binary size by quite a lot, but doing so has its costs, and nowadays it's rarely worth paying them since nobody even notices whether a program is 2kb or 2mb. It doesn't matter anymore in the age of 1TB boot drives.
So bloated software is motivating you to spend more for the larger capacity phone?
What incentive does Apple have to help iOS devs get package sizes down, then?
Size may be absolute, but bigness and smallness are inherently and inescapably relative.
When you include the code for all the dependency features this uses, you probably do end up close to 100k LoC net, no?
lib.rs has a nifty (and occasionally shocking) portrayal of this on their crate pages.
says for this one the deps clock in at: ~19–29MB ~487K SLoC
Try `cargo build --release --no-default-features` to get a much smaller binary (~5-10MB) - Rust statically links dependencies but supports conditional compilation for optional features.
Glancing at the Cargo.toml, the package doesn't define any features anyways. `cargo b --no-default-features` only applies to the packages you're building, not their dependencies -- that would lead to very unpredictable behavior
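As a rough illustration of the conditional compilation being discussed (a sketch, not code from lstr): a crate gates code behind a named feature with `#[cfg(feature = "...")]`, and that feature has to be declared in the crate's own Cargo.toml. The feature name `git` and the function `status_line` are hypothetical.

```rust
/// Compiled only when this crate is built with its (hypothetical) `git` feature.
#[cfg(feature = "git")]
fn status_line() -> &'static str {
    "git integration compiled in"
}

/// Fallback compiled when the feature is off, for example via
/// `--no-default-features` if `git` were listed under `default`
/// in this crate's own Cargo.toml.
#[cfg(not(feature = "git"))]
fn status_line() -> &'static str {
    "built without git integration"
}

fn main() {
    println!("{}", status_line());
}
```

Since a crate with no declared features has nothing to switch off, the flag changes nothing there. Trimming a dependency's features is done where the dependency is declared, as with `default-features = false` on git2 mentioned earlier in the thread.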