of those 847,249,408 bytes, all but 22m are in a squashfs file called linuxfs. so this is really about that file, and the files that are in it.
if you decompress the file, you can find out how much space everything in it would take up once installed on your machine. the installation doesn't technically need to copy every file from there; but if it did, those files would take up 2,584,420,683 bytes, or 2.41g. squashfs is getting the compressed file to just under 1/3 of the actual size.
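if you want to check that number yourself, here is a minimal sketch, assuming the iso is mounted at /mnt/iso (pick whatever paths you like):

Code: Select all
unsquashfs -d /tmp/linuxfs-root /mnt/iso/antiX/linuxfs  # extract the whole tree (-d creates the directory)
du -sb /tmp/linuxfs-root  # total uncompressed size, in bytes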
but not every file compresses the same; you can't (necessarily) pick 100 files at random, divide their size by 3 and take that off the cd size.
you can remove some files, recompress the whole thing using mksquashfs, and then you will have the actual size; but every time you do that, it takes 15-25 minutes to compress everything. wouldn't it be nice if we could do better? i've spent a few hours (more than 5) trying to figure out how. perhaps searching the internet would have been a better use of that time, but i enjoy experimenting; i learn stuff that i can sometimes put to immediate use.
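for reference, the slow-but-exact route is something like this, assuming you extracted the tree to /tmp/linuxfs-root as above:

Code: Select all
mksquashfs /tmp/linuxfs-root /tmp/linuxfs.new -comp xz  # the 15-25 minute step
du -b /tmp/linuxfs.new  # the real compressed size, no guessing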
i actually use mksquashfs, and when i do i use xz compression; i think that's typical. my first idea was to find out whether compressing the files with xz would let us use tar, or xz/unxz with --list, to read per-file compressed sizes...
nope. tar archives ALL the files first, and xz compresses that single archive; we only get the compressed size of the entire thing, and we already have that information: it's the size of linuxfs (or, basically, the size of the iso.)
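to see what i mean, assuming a tarball named linuxfs.tar.xz (made-up name):

Code: Select all
tar -tvJf linuxfs.tar.xz  # lists a size for every file, but those are UNcompressed sizes
xz --list linuxfs.tar.xz  # gives a compressed size, but only for the archive as a whole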
unsquashfs has -lls to list files without decompressing, but that doesn't help either. the size of libxul.so from -lls:
73,274,760
and the size if we mount linuxfs and du -b the same file:
73,274,760 ...that's the uncompressed size again; -lls reports what a file will take up once extracted, not what it occupies inside the squashfs.
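the check itself is easy to reproduce; a sketch, assuming linuxfs is in the current directory and /mnt/linuxfs exists:

Code: Select all
unsquashfs -lls linuxfs | grep libxul.so  # size from the listing, no decompression
mount -o loop linuxfs /mnt/linuxfs  # or mount it...
find /mnt/linuxfs -name libxul.so -exec du -b {} +  # ...and measure the same file directly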
now suppose we mounted the iso, then mounted linuxfs, and used xz (with the default settings) to compress each file in it INDIVIDUALLY. how long would that take?
about an HOUR AND A HALF. that's on an i-series intel processor, not a core2. it might go faster on a solid-state drive, or after loading it all into ram first.
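one thing i have NOT timed, sketched here in case you want to try it: running one xz per core instead of one at a time (nproc and xargs -P are gnu extensions):

Code: Select all
# same per-file pass as the script at the end of this post, minus the running total
find /root/squashdu/linuxfs -type f -print0 |
  xargs -0 -n1 -P"$(nproc)" sh -c 'printf "%s\t%s\n" "$(xz -c "$1" | wc -c)" "$1"' sh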
the actual size of the linuxfs file: 824,418,304 bytes
and the total if every file inside it is run through xz -c | wc -c to get a compressed byte count, with all the counts tallied?
886,843,744. compressing INDIVIDUAL files with similar settings, our total is off by 59.5m (886,843,744 - 824,418,304 = 62,425,440 bytes). not bad; at first i thought i'd compressed with the wrong setting. (the fairly safe xz default is -6... it can go higher.)
i haven't tried compressing each file individually with mksquashfs. however, i did try running tar -cvJf on the whole thing, to find out if that was more efficient. i figured it would be (but it doesn't give us the per-file information we need.)
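that test was along these lines, assuming the mounted tree from the script at the end of this post:

Code: Select all
tar -cJf /tmp/linuxfs.tar.xz -C /root/squashdu/linuxfs .  # one tar, compressed once by xz
du -b /tmp/linuxfs.tar.xz  # a single compressed size for everything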
the same files tarred and compressed for comparison: 698,450,164. now someone's wondering what we get if we simply compress the antix 17 iso for a faster download. ok, i will do that for you:
xz antix17.iso ; du -b antix17.iso.xz # 834,423,476 (antix17.iso is a copy of the antix 17 iso)
a savings of only 12.2m (847,249,408 - 834,423,476 = 12,825,932 bytes). weird, huh? (then again, the iso is mostly xz-compressed squashfs already.) so to review:
linuxfs: 824,418,304
linuxfs unsquashed and tarred into one file, then compressed with xz: 698,450,164 (an entirely useless number for this.)
linuxfs unsquashed and then each file processed individually with xz (so we can estimate compressed size in linuxfs): 886,843,744
in linuxfs there are 131,430 files. on average, we have OVER-estimated each file's compressed size by nearly 475 bytes (62,425,440 / 131,430 is about 475).
practically speaking, that means that if we take our table of estimated compressed sizes and SUBTRACT the estimates for the files we intend to delete from the current iso size, we OVER-estimate the savings and underestimate the final compressed size.
but it also means that if we ADD up the estimates for the remaining files (the ones we intend to keep), we UNDER-estimate the savings and overestimate the final compressed size; overestimating is the safer direction when you are trying to squeeze under a size limit.
so if you're looking for a MUCH faster, more accurate way to estimate the iso size after compression, take the number 22m (23068672, the part of the iso outside linuxfs) and add up the estimated compressed sizes of the files from this table that you plan to keep in the iso (a sketch for automating the sum follows the table):
Code: Select all
compressed  uncompressed  compressed total  file
493132 1265272 493132 /root/squashdu/linuxfs/bin/bash
261604 621700 754736 /root/squashdu/linuxfs/bin/btrfs
130780 297444 885516 /root/squashdu/linuxfs/bin/btrfs-calc-size
142084 326180 1027600 /root/squashdu/linuxfs/bin/btrfs-convert
130928 297444 1158528 /root/squashdu/linuxfs/bin/btrfs-debug-tree
129164 293348 1287692 /root/squashdu/linuxfs/bin/btrfs-find-root
140940 322084 1428632 /root/squashdu/linuxfs/bin/btrfs-image
130208 297444 1558840 /root/squashdu/linuxfs/bin/btrfs-map-logical
128684 293348 1687524 /root/squashdu/linuxfs/bin/btrfs-select-super
131236 301764 1818760 /root/squashdu/linuxfs/bin/btrfs-show-super
128748 293348 1947508 /root/squashdu/linuxfs/bin/btrfs-zero-log
130588 297444 2078096 /root/squashdu/linuxfs/bin/btrfstune
13212 34480 2091308 /root/squashdu/linuxfs/bin/bunzip2
335072 625828 2426380 /root/squashdu/linuxfs/bin/busybox
13212 34480 2439592 /root/squashdu/linuxfs/bin/bzcat
988 2140 2440580 /root/squashdu/linuxfs/bin/bzdiff
2092 4877 2442672 /root/squashdu/linuxfs/bin/bzexe
1688 3642 2444360 /root/squashdu/linuxfs/bin/bzgrep
13212 34480 2457572 /root/squashdu/linuxfs/bin/bzip2
and you should get a pretty accurate guess there.
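if you save the full table to a file, the tally is easy to automate. a sketch, assuming the table is in sizes.tsv and the files you plan to drop are listed one per line in deleted.list (both names are made up, and it assumes no spaces in the paths):

Code: Select all
# sum column 1 (estimated compressed size) of every file NOT on the delete list,
# then add the 22m of iso content that lives outside linuxfs
awk 'NR==FNR { drop[$0]=1 ; next }
     FNR > 1 && $1 ~ /^[0-9]+$/ && !($4 in drop) { sum += $1 }
     END { print sum + 23068672 }' deleted.list sizes.tsv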
is this overkill? it depends on how many times you've created a squashfs file only to end up with an iso that's too large. if you want to guess how many files you need to get rid of to make an iso fit on a cd, this could get you closer, faster.
but it still takes one person and one machine running this script for an hour and a half or more to produce the table.
and no, i don't have the entire table; the first time i ran it, i didn't redirect the output to a file. but i timed it!
and here is the script. i wrote it as a one-liner (i keep those in my bash history and grep for them as needed), but it is split into lines here so you can read it.
Code: Select all
iso="antiX-17.b1_386-full.iso"
mkdir -p /root/squashdu/linuxfs /root/squashdu/iso
mount -o loop "$iso" /root/squashdu/iso/
mount -o loop /root/squashdu/iso/antiX/linuxfs /root/squashdu/linuxfs/
d=$(date) ; tot=0
echo -e "compressed\tuncompressed\tcompressed total\tfile"
# cat -A marks line ends with $ and tr hides spaces as ^ so the for loop does not split paths
for p in $(find /root/squashdu/linuxfs -type f | cat -A | tr ' ' '^' | sed "s/\$$//g") ; do
    pf="$(echo $p | tr '^' ' ')"  # put the spaces back
    cs=$(xz -c "$pf" 2> /dev/null | wc -c)  # compressed size of this one file
    uc=$(du -b "$pf" 2> /dev/null | cut -f 1)  # uncompressed size
    tot=$(($tot+cs))  # running total of the compressed sizes
    echo -e "$cs\t\t$uc\t\t$tot\t\t$pf"
done
echo ; echo "start: $d" ; echo -n "complete: " ; date
umount /root/squashdu/linuxfs/ ; umount /root/squashdu/iso/
rmdir /root/squashdu/linuxfs/ /root/squashdu/iso/ /root/squashdu/
#### license: creative commons cc0 1.0 (public domain) http://creativecommons.org/publicdomain/zero/1.0/
this entire post is in the public domain, if you want to flatter me by posting it online somewhere.
oh, and it's entirely possible someone else has done this already. i haven't checked, but i did try a lot of obvious alternatives. if you know another way, post it here!