LuaSrcDiet reduces the size of Lua 5.1.x source files by
aggressively removing
all unnecessary whitespace and comments, optimizing constant tokens,
and renaming local variables to shorter names. See below for sample
output. The 5.1.x version is being actively worked on, while the older
5.0.x version is unmaintained. The older 5.0.x version cannot optimize
constant tokens or rename local variable names. The 5.1.x version
should not be considered completely error-free, but currently it can
reprocess its own source files without errors.
LuaSrcDiet is broadly similar to Luiz's lstrip
for Lua 5.1, which can be found on Luiz's Libraries and
tools for Lua page. Both work on tokens; LuaSrcDiet's modified
lexer and parser allow most optimization options to be enabled or
disabled separately, while lstrip cannot optimize constant tokens or
rename local variable names. LuaSrcDiet squeezes its own (heavily
commented) sources from 121KB down to 28KB. Further compression (bzip2
or lzma) brings the size down to under 10KB, representing a 12X
reduction in size.
Squeezing the sources and renaming locals can be used as a weak form
of obfuscation. However, note that the structure and arrangement of the
source code stays exactly the same, so do not depend on such a weak
form of obfuscation if you really need heavy-duty obfuscation.
For now, LuaSrcDiet has only optimization options that do not
require user assistance. There are many optimizations that can be
performed when there is user assistance, but user assistance requires
more code to maintain and keep consistent. If such user-assisted
optimizations are needed, it is easy to take advantage of the modular
nature of LuaSrcDiet to, say, reuse the lexer to scan and replace
special constants.
See below for more information,
statistics and a discussion on the performance of LuaSrcDiet.
[New, 20090606] Uh, not really, this is just some information for
clarification purposes. Owing to the use of LuaSrcDiet among WoW
add-ons, I've added the
following so that users of LuaSrcDiet can get a better picture of this
author's intentions.
LuaSrcDiet features include the following:
List of optimizations:
If comment removal is disabled, LuaSrcDiet only removes trailing
whitespace. LuaSrcDiet does not remove trailing whitespace in long
strings, electing to generate a warning instead. If empty line removal
is disabled, LuaSrcDiet keeps all significant code on the same lines
(this has not been extensively tested.) Thus, a user is able to debug
using the original sources since the line numbering is unchanged.
String optimization deals mainly with optimizing escape sequences,
but delimiters can be switched between single quotes and double quotes
if the source size of the string can be reduced. For long strings and
long comments, LuaSrcDiet also tries to reduce the '=' separators in
the delimiters if possible. For number optimization, LuaSrcDiet saves
space by trying to generate the shortest possible sequence, and in the
process it does not produce 'proper' scientific notation (e.g. 1.23e5)
but does away with the decimal point (e.g. 123e3) instead.
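As a small illustration of the token rewrites described above, the following Lua snippet checks that a longer and a shorter spelling of the same constant denote the same value. The pairs are hand-picked for this sketch and are not output captured from LuaSrcDiet; the table name `equivalents` is likewise made up.

```lua
-- Hand-picked illustrations; not actual LuaSrcDiet output.
local long_number, short_number = 1.23e5, 123e3          -- decimal point dropped
local long_quoted, short_quoted = "say \"hi\"", 'say "hi"' -- delimiter switched
local long_bracket = [==[long string]==]
local short_bracket = [[long string]]                    -- fewer '=' separators

-- Each entry holds a longer and a shorter spelling of one constant.
local equivalents = {
  { long_number, short_number },
  { long_quoted, short_quoted },
  { long_bracket, short_bracket },
}

for index, pair in ipairs(equivalents) do
  assert(pair[1] == pair[2], "pair " .. index .. " differs")
end
```

In each pair the second spelling takes fewer bytes in the source file while the value seen by the program is unchanged.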
The local variable name optimizer uses a full parser of Lua 5.1
source
code, thus it can rename all local variables, including upvalues and
function parameters. It should handle the implicit "self" parameter
gracefully. In addition, local variable names are renamed into the
shortest possible names, starting with 'e', following English frequent
letter usage. Variable names are also reused whenever possible,
reducing the number of unique variable names. For example, in
LuaSrcDiet.lua (version
0.11.0), 683 local identifiers representing 88 unique
names were optimized into 32 unique names, all of which are one character
in length, saving over 2600 bytes.
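The effect of renaming can be sketched with a before/after pair. The "after" version below is written by hand to approximate the behaviour described above (short names starting with 'e', a name reused where the earlier variable is no longer needed); the exact names LuaSrcDiet would pick may differ.

```lua
-- Before: descriptive names for a local function, a parameter and an upvalue.
local function make_counter(start_value)
  local current_count = start_value   -- becomes an upvalue of the closure
  return function(step_size)
    current_count = current_count + step_size
    return current_count
  end
end

-- After: a hand-written approximation of a renamed version.
local function e(t)
  local a = t
  return function(t)   -- 't' is reused; the outer 't' is not needed in here
    a = a + t
    return a
  end
end

-- Both versions behave identically.
local before, after = make_counter(10), e(10)
assert(before(5) == after(5))
assert(before(1) == after(1))
```

Only local names change; globals, table fields and string contents are left alone, which is why the two functions remain interchangeable.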
The --opt-entropy option
(version 0.11.1) calculates actual symbol frequencies so that local
variable names reuse the more popular letters first. This is slightly
better than following an English letter-frequency
table. Note that the output file size stays the same, but you
will get a very slight improvement when the file is compressed. Initial
tests indicate the improvement for --maximum
over LuaSrcDiet 0.11.0 to be about 1% or less, comparing
bzip2-compressed file sizes.
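The idea of ranking letters by their actual frequency can be sketched as follows. This is not LuaSrcDiet's implementation: the function name `letter_ranking` is hypothetical, and counting every letter in the whole source string is a simplifying assumption made for this example.

```lua
-- Count how often each identifier character occurs in `source` and return
-- the characters ordered from most to least frequent (ties broken
-- alphabetically so the result is deterministic).
local function letter_ranking(source)
  local counts = {}
  for ch in source:gmatch("[%a_]") do
    counts[ch] = (counts[ch] or 0) + 1
  end
  local letters = {}
  for ch in pairs(counts) do
    letters[#letters + 1] = ch
  end
  table.sort(letters, function(a, b)
    if counts[a] ~= counts[b] then
      return counts[a] > counts[b]
    end
    return a < b
  end)
  return letters, counts
end

local ranking = letter_ranking("local x = x + tee")
print(table.concat(ranking, " "))
```

A renamer could then hand out single-character names in this order, so that the letters it introduces are ones the compressor already sees often.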
Project Links on LuaForge: LuaSrcDiet project page | File releases
Latest version: LuaSrcDiet-0.11.2 (106KB tar.gz)
What's new?
The following is the result of processing llex.lua from LuaSrcDiet 0.11.0
using various optimization options:

| LuaSrcDiet Option | Size (bytes) | Link to Sample File |
|---|---|---|
| Original | 12,421 | llex.lua.txt |
| Empty lines only | 12,395 | llex_emptylines.lua.txt |
| Whitespace only | 9,372 | llex_whitespace.lua.txt |
| Local rename only | 11,794 | llex_locals.lua.txt |
| --basic (token optimizations only) | 3,835 | llex_basic.lua.txt |
| Program default | 3,208 | llex_default.lua.txt |
| --maximum | 3,130 | llex_maximum.lua.txt |

A sample output of LuaSrcDiet 0.11.0 for processing llex.lua at maximum settings is
as follows:
Statistics for: llex.lua -> sample/llex.lua
Overall, the file size was reduced by more than 9KB. Tokens are
classified into 'real' or actual tokens, and 'fake' or whitespace
tokens. The number of 'real' tokens remained the same. Short comments
and long comments were completely eliminated. The number of line
endings was reduced by 59, while all but 152 whitespace characters were
optimized away. So, token separators (whitespace, including line
endings) now take up just 10% of the total file size. No optimization
of number tokens was possible, while 2 bytes were saved for string
tokens.
For local variable name optimization, the report shows that 38
unique local variable names were reduced to 20 unique names. The number
of identifier tokens should stay the same (there is currently no
optimization option to optimize away non-essential or unused 'real'
tokens.) Since the 20 names fit within the 53 possible single-character
identifiers (52 letters plus the underscore), all local variables are
now one character in length. Over 600 bytes were saved.
File sizes of LuaSrcDiet 0.11.0 main files in various forms:

| Source File | Original Size (bytes) | luac normal (bytes) | luac stripped (bytes) | LuaSrcDiet --basic (bytes) | LuaSrcDiet --maximum (bytes) |
|---|---|---|---|---|---|
| LuaSrcDiet.lua | 21,961 | 20,952 | 11,000 | 11,005 | 8,159 |
| llex.lua | 12,421 | 8,613 | 4,247 | 3,835 | 3,130 |
| lparser.lua | 41,757 | 27,215 | 12,506 | 11,755 | 7,666 |
| optlex.lua | 31,009 | 16,992 | 8,021 | 9,129 | 6,858 |
| optparser.lua | 16,511 | 9,021 | 3,520 | 5,087 | 2,999 |
| Total | 123,659 | 82,793 | 39,294 | 40,811 | 28,812 |

Compressibility of LuaSrcDiet 0.11.0 main files in various forms:

| Compression Method | Original size (bytes) | luac normal (bytes) | luac stripped (bytes) | LuaSrcDiet --basic (bytes) | LuaSrcDiet --maximum (bytes) |
|---|---|---|---|---|---|
| Uncompressed originals | 123,659 | 82,793 | 39,294 | 40,811 | 28,812 |
| gzip -9 | 28,288 | 29,210 | 17,732 | 12,041 | 10,451 |
| bzip2 -9 | 24,407 | 27,232 | 16,856 | 11,480 | 9,815 |
| 7-zip (max) (lzma) | 25,530 | 23,908 | 15,741 | 11,241 | 9,685 |

So, squeezed source code is smaller than stripped binary chunks and
also compresses better, at a ratio of 2.9 for squeezed source code
versus 2.3 for stripped binary chunks (going by the bzip2 figures).
Compressed binary chunks are still a very efficient way of storing Lua
scripts, because using only binary chunks allows the parts of Lua
needed to compile from sources (llex.o, lparser.o, lcode.o, ldump.o) to
be omitted, saving over 24KB in the process.
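The two compression ratios can be reproduced from the totals in the tables above. The sketch below assumes that the quoted ratios refer to the bzip2 -9 row, since those are the figures that round to 2.9 and 2.3; the helper name `ratio` is made up for this example.

```lua
-- Sizes in bytes, copied from the "Total" and "bzip2 -9" figures above.
local squeezed_raw, squeezed_bz2 = 28812, 9815
local stripped_raw, stripped_bz2 = 39294, 16856

-- Compression ratio: uncompressed size divided by compressed size.
local function ratio(raw, compressed)
  return raw / compressed
end

print(string.format("squeezed source: %.1f", ratio(squeezed_raw, squeezed_bz2)))
print(string.format("stripped chunks: %.1f", ratio(stripped_raw, stripped_bz2)))
```

This prints 2.9 for the squeezed sources and 2.3 for the stripped binary chunks.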
Note that LuaSrcDiet does not answer
the question of whether embedding source code is better or embedding
binary chunks is better. It is simply a utility for producing smaller
source code files and an exercise in processing Lua source code using a
Lua-based lexer and parser skeleton.
The following is a primitive attempt to analyze in-memory Lua script
loading performance (using the loadstring
function in Lua.)
The LuaSrcDiet 0.11.0 files (original, squeezed with --maximum and
stripped binary chunks versions) are loaded into memory first before a
loop runs to repeatedly load the script files for 10 seconds. A null
loop is also performed (processing empty strings) and the time taken
per null iteration is subtracted as a form of null adjustment. Then,
various performance parameters are calculated. Note that LuaSrcDiet.lua was slightly
modified (#! line removed) to let the loadstring function run. The
results below were obtained with a Lua 5.1.3 executable compiled using "make generic" on Cygwin/Windows
XP SP2 on a Sempron 3000+ (1.8GHz). The LuaSrcDiet 0.11.0 source files
have 11,180 'real' tokens in total.
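The measurement procedure described above can be sketched roughly as follows. This is not the script that produced the results; the function names `time_loads` and `adjusted_ms` and the sample chunk are assumptions for illustration, and the duration is a parameter so the sketch can be tried with a short run instead of 10 seconds.

```lua
-- Lua 5.1 provides loadstring; later versions fold it into load.
local loadstring = loadstring or load

-- Repeatedly compile `chunk` for about `seconds` of CPU time.
-- Returns milliseconds per iteration and the iteration count.
local function time_loads(chunk, seconds)
  local iterations = 0
  local start = os.clock()
  local elapsed
  repeat
    assert(loadstring(chunk))
    iterations = iterations + 1
    elapsed = os.clock() - start
  until elapsed >= seconds
  return elapsed * 1000 / iterations, iterations
end

-- Null adjustment: subtract the per-iteration cost measured on an empty string.
local function adjusted_ms(chunk, seconds)
  local null_ms = time_loads("", seconds)
  local chunk_ms = time_loads(chunk, seconds)
  return chunk_ms - null_ms
end

local sample = "local x = 1 return x + 1"   -- stand-in for a real script file
print(adjusted_ms(sample, 0.1))
```

With very small chunks or short durations the adjusted figure is dominated by timer noise, so longer runs on full-size scripts give more meaningful numbers.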
| Measurement | Unit | Null loop | Stripped binary chunk | Original Sources | Squeezed Sources |
|---|---|---|---|---|---|
| Total Size | bytes | 0 | 39,294 | 123,640 | 28,793 |
| Iterations | | 312,155 | 9,680 | 1,306 | 1,592 |
| Duration | sec | 10 | 10 | 10 | 10 |
| Time/iteration | msec | 0.032 | 1.033 | 7.657 | 6.281 |
| Time/iteration, adjusted for null | msec | - | 1.001 | 7.625 | 6.249 |
| Load rate | MB/sec | - | 37.44 | 15.46 | 4.39 |
| Load time per byte | ns | - | 25.5 | 61.7 | 217.0 |
| Load time per token | ns | - | - | 682 | 559 |
| Source time vs binary chunk time ratio | | - | 1.00 | 7.62 | 6.24 |
| Binary chunk rate vs. source rate ratio | | - | 1.00 | 2.42 | 8.53 |

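The derived rows of the table follow from the sizes and the null-adjusted times. The sketch below recomputes them; it assumes that "MB" in the table means 2^20 bytes, because that is the unit that reproduces the published load rates. The helper name `derive` is made up for this example.

```lua
local MIB = 1024 * 1024      -- assumption: the table's "MB" is 2^20 bytes
local REAL_TOKENS = 11180    -- 'real' tokens in the 0.11.0 source files

-- size in bytes, null-adjusted time per iteration in milliseconds
local function derive(size_bytes, adjusted_ms)
  local seconds = adjusted_ms / 1000
  return {
    mb_per_sec   = size_bytes / seconds / MIB,
    ns_per_byte  = adjusted_ms * 1e6 / size_bytes,
    -- only meaningful for source files, not for binary chunks
    ns_per_token = adjusted_ms * 1e6 / REAL_TOKENS,
  }
end

local stripped = derive(39294, 1.001)
local original = derive(123640, 7.625)
local squeezed = derive(28793, 6.249)

print(string.format("%.2f %.2f %.2f",
  stripped.mb_per_sec, original.mb_per_sec, squeezed.mb_per_sec))
```

This prints 37.44 15.46 4.39, matching the load rate row; the per-byte and per-token fields round to the remaining rows in the same way.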
The above shows that stripped binary chunks are still, in many ways,
the highest-performance form of fixed Lua scripts. On a very average
machine, scripts load at over 37MB/sec (in memory). This is very
comparable to the burst speeds of common desktop hard disks of 2008. If
instant response is paramount, stripped binary chunks have little
competition.
By contrast, source code that is squeezed to the maximum using
LuaSrcDiet can only muster an in-memory load rate of 4.4MB/sec. The
original sources load at about 15.5MB/sec, but most of the speed is
from the lexer scanning over comments and whitespace. A quick
calculation indicates that the speed of the lexer over comments and
whitespace can be as much as 65MB/sec, but note that the speed is all
for naught. What really matters are the real tokens, and the squeezed
source code manages to load faster than the original sources by 18%.
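The quick calculation mentioned above can be reconstructed from the table figures. The sketch assumes that every byte removed by squeezing is comment or whitespace (renamed locals also account for some of the difference, so the result is only approximate) and that "MB" means 2^20 bytes.

```lua
local MIB = 1024 * 1024   -- assumption: "MB" is 2^20 bytes
local original_bytes, squeezed_bytes = 123640, 28793
local original_ms, squeezed_ms = 7.625, 6.249   -- null-adjusted times

-- Bytes that disappear when squeezing, and the load time they accounted for.
local skipped_bytes = original_bytes - squeezed_bytes
local skipped_ms = original_ms - squeezed_ms

-- Approximate lexer speed over the removed bytes, in MB/sec.
local skip_rate = skipped_bytes / (skipped_ms / 1000) / MIB

-- Fraction of the original load time saved by squeezing.
local time_saved = skipped_ms / original_ms

print(string.format("%.1f MB/sec, %.0f%% faster", skip_rate, time_saved * 100))
```

The result is a little under 66MB/sec for the skipped bytes and an 18% reduction in load time, in line with the figures quoted in the text.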
So, the loading of stripped binary chunks is faster than squeezed
source code by a bit over 6X. The 4.4MB/sec speed for squeezed source
code is still quite respectable. When an application considers the time
taken to load data from the disk and perhaps the time taken to
decompress, loading source code may be perfectly fine in terms of
performance. For programs that already embed source code, using
LuaSrcDiet to squeeze the source code probably speeds loading up by a
tiny bit in addition to making programs smaller.
Thanks to the LuaForge team for hosting this material. This page was
written on SeaMonkey. LuaSrcDiet was developed using the SciTE editor on
Cygwin, and managed using SVN. Parts of LuaSrcDiet are based on
Yueliang, which is in turn based on the Lua sources.