LuaSrcDiet


Introduction

LuaSrcDiet reduces the size of Lua 5.1.x source files by aggressively removing all unnecessary whitespace and comments, optimizing constant tokens, and renaming local variables to shorter names. See below for sample output. The 5.1.x version is being actively worked on, while the older 5.0.x version is unmaintained and cannot optimize constant tokens or rename local variables. The 5.1.x version should not be considered completely error-free, but it can currently reprocess its own source files without errors.

LuaSrcDiet is broadly similar to Luiz's lstrip for Lua 5.1, which can be found on Luiz's Libraries and tools for Lua page. Both work on tokens; LuaSrcDiet's modified lexer and parser allow most optimization options to be enabled or disabled separately, while lstrip cannot optimize constant tokens or rename local variables. LuaSrcDiet squeezes its own (heavily commented) sources from 121KB down to 28KB. Further compression (bzip2 or lzma) brings the size down to under 10KB, representing a 12X reduction in size.

Squeezing the sources and renaming locals can be used as a weak form of obfuscation. However, note that the structure and arrangement of the source code stays exactly the same, so do not depend on such a weak form of obfuscation if you really need heavy-duty obfuscation.

For now, LuaSrcDiet has only optimization options that do not require user assistance. There are many optimizations that can be performed when there is user assistance, but user assistance requires more code to maintain and keep consistent. If such user-assisted optimizations are needed, it is easy to take advantage of the modular nature of LuaSrcDiet to, say, reuse the lexer to scan and replace special constants.

See below for more information, statistics and a discussion on the performance of LuaSrcDiet.


Frequently Asked Questions

[New, 20090606] Uh, not really, this is just some information for clarification purposes. Owing to the use of LuaSrcDiet among WoW add-ons, I've added the following so that users of LuaSrcDiet can get a better picture of this author's intentions.



Features

LuaSrcDiet features include the following:

List of optimizations:

If comment removal is disabled, LuaSrcDiet only removes trailing whitespace. LuaSrcDiet does not remove trailing whitespace in long strings, electing to generate a warning instead. If empty line removal is disabled, LuaSrcDiet keeps all significant code on the same lines (this has not been extensively tested). Thus, a user is able to debug using the original sources since the line numbering is unchanged.

String optimization deals mainly with optimizing escape sequences, but delimiters can be switched between single quotes and double quotes if the source size of the string can be reduced. For long strings and long comments, LuaSrcDiet also tries to reduce the '=' separators in the delimiters if possible. For number optimization, LuaSrcDiet saves space by trying to generate the shortest possible sequence, and in the process it does not produce 'proper' scientific notation (e.g. 1.23e5) but does away with the decimal point (e.g. 123e3) instead.
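
As a rough illustration of the number optimization described above, the following sketch folds the trailing zeros of a plain decimal integer literal into an exponent and keeps the result only when it is strictly shorter. The function name shorten_integer is hypothetical and this is not LuaSrcDiet's actual code; it ignores fractions, hexadecimal numbers and existing exponents, all of which a real optimizer has to handle.

```lua
-- Hypothetical helper, not taken from LuaSrcDiet: shorten a decimal integer
-- literal such as "123000" by folding its trailing zeros into an exponent.
local function shorten_integer(text)
  -- Split into leading digits and a (possibly empty) run of trailing zeros.
  local mantissa, zeros = text:match("^(%d-)(0*)$")
  if not mantissa or mantissa == "" or zeros == "" then
    -- Not a plain integer, all zeros, or nothing to fold: leave it alone.
    return text
  end
  local candidate = mantissa .. "e" .. #zeros
  -- Keep the original unless the exponent form is strictly shorter.
  if #candidate < #text then
    return candidate
  end
  return text
end

print(shorten_integer("123000"))  -- 123e3
print(shorten_integer("100"))     -- 100 ("1e2" is no shorter)
```

In Lua 5.1 all numbers are doubles, so 123e3 and 123000 denote the same value; the sketch assumes that setting.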

The local variable name optimizer uses a full parser of Lua 5.1 source code, thus it can rename all local variables, including upvalues and function parameters. It should handle the implicit "self" parameter gracefully. In addition, local variable names are renamed into the shortest possible names, starting with 'e', following English letter frequency. Variable names are also reused whenever possible, reducing the number of unique variable names. For example, in LuaSrcDiet.lua (version 0.11.0), 683 local identifiers representing 88 unique names were optimized into 32 unique names, all of which are one character in length, saving over 2600 bytes.
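
The name assignment described above can be sketched as a small allocator that hands out the shortest unused identifiers in letter-frequency order. Everything here is an assumption made for illustration: the exact letter table (only its first letter, 'e', comes from the text), the names nth_name and make_allocator, and the reserved-name set. Scope analysis and name reuse, which the real optimizer performs with its parser, are not shown.

```lua
-- Assumed frequency order; only the leading 'e' is confirmed by the text above.
local LOWER = "etaoinshrdlucmfwypvbgkqjxz"
local FIRST = LOWER .. LOWER:upper() .. "_"   -- 53 legal first characters
local REST = FIRST .. "0123456789"            -- later characters may be digits

local KEYWORDS = {}
for word in ("and break do else elseif end false for function if in local " ..
             "nil not or repeat return then true until while"):gmatch("%a+") do
  KEYWORDS[word] = true
end

-- Map 0, 1, 2, ... to "e", "t", ..., "_", "ee", "te", ... (each index is unique).
local function nth_name(n)
  local i = n % #FIRST
  local name = FIRST:sub(i + 1, i + 1)
  n = math.floor(n / #FIRST)
  while n > 0 do
    n = n - 1
    i = n % #REST
    name = name .. REST:sub(i + 1, i + 1)
    n = math.floor(n / #REST)
  end
  return name
end

-- Return a function that yields the next short name, skipping keywords and
-- any names in `reserved` (for example, globals the code still refers to).
local function make_allocator(reserved)
  local counter = 0
  return function()
    while true do
      local name = nth_name(counter)
      counter = counter + 1
      if not KEYWORDS[name] and not reserved[name] then
        return name
      end
    end
  end
end

local next_name = make_allocator({})
print(next_name(), next_name(), next_name())  -- e  t  a
```

With 26 lowercase letters, 26 uppercase letters and the underscore, the first 53 names are one character long, which is why small files can end up with nothing but single-character locals.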

The --opt-entropy option (version 0.11.1) calculates actual symbol frequencies so that local variable names reuse the more popular letters first. This is slightly better than following an English letter frequency table. Note that the output file size stays the same, but you will get a very slight improvement when the file is compressed. Initial tests indicate the improvement for --maximum over LuaSrcDiet 0.11.0 to be about 1% or less, comparing bzip2-compressed file sizes.
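
The idea behind frequency-driven naming can be sketched as follows: count how often each candidate letter already occurs in the text, then order the candidates from most to least common. This is only an illustration under stated assumptions; how LuaSrcDiet actually gathers its statistics (for example, which tokens it counts) is not described here, and the function name letters_by_frequency is hypothetical.

```lua
-- The 53 characters that can form a single-character identifier.
local CANDIDATES = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"

-- Hypothetical helper: return the candidate letters ordered by how often
-- they occur in `source`, most frequent first.
local function letters_by_frequency(source)
  local counts = {}
  for ch in source:gmatch("[%a_]") do
    counts[ch] = (counts[ch] or 0) + 1
  end
  local letters = {}
  for ch in CANDIDATES:gmatch(".") do
    letters[#letters + 1] = ch
  end
  table.sort(letters, function(a, b)
    local ca, cb = counts[a] or 0, counts[b] or 0
    if ca ~= cb then
      return ca > cb      -- more frequent letters come first
    end
    return a < b          -- break ties by character for a stable result
  end)
  return table.concat(letters)
end

print(letters_by_frequency("local function foo() end"):sub(1, 5))
```

Reusing letters that are already common in the file does not change the output size, but it skews the symbol distribution further, which is what a general-purpose compressor benefits from.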


Download

Project Links on LuaForge: LuaSrcDiet project page | File releases

Latest version: LuaSrcDiet-0.11.2 (106KB tar.gz)

What's new?


Sample Output

The following is the result of processing llex.lua from LuaSrcDiet 0.11.0 using various optimization options:

LuaSrcDiet Option                   Size (bytes)  Link to Sample File
-------------------------------------------------------------------------
Original                                  12,421  llex.lua.txt
Empty lines only                          12,395  llex_emptylines.lua.txt
Whitespace only                            9,372  llex_whitespace.lua.txt
Local rename only                         11,794  llex_locals.lua.txt
--basic (token optimizations only)         3,835  llex_basic.lua.txt
Program default                            3,208  llex_default.lua.txt
--maximum                                  3,130  llex_maximum.lua.txt

A sample output of LuaSrcDiet 0.11.0 for processing llex.lua at maximum settings is as follows:

Statistics for: llex.lua -> sample/llex.lua

*** local variable optimization summary ***
----------------------------------------------------------
Variable          Unique   Decl.   Token    Size   Average
Types              Names   Count   Count   Bytes     Bytes
----------------------------------------------------------
Global                11       0      22     138      6.27
----------------------------------------------------------
Local (in)            38      82     393    1020      2.60
TOTAL (in)            49      82     415    1158      2.79
----------------------------------------------------------
Local (out)           20      82     393     393      1.00
TOTAL (out)           31      82     415     531      1.28
----------------------------------------------------------

*** lexer-based optimizations summary ***
--------------------------------------------------------------------
Lexical           Input    Input    Input   Output   Output   Output
Elements          Count    Bytes  Average    Count    Bytes  Average
--------------------------------------------------------------------
TK_KEYWORD          249      956     3.84      249      956     3.84
TK_NAME             423     1202     2.84      423      575     1.36
TK_NUMBER            42       44     1.05       42       44     1.05
TK_STRING            63      630    10.00       63      628     9.97
TK_LSTRING            1      111   111.00        1      111   111.00
TK_OP               467      494     1.06      467      494     1.06
TK_EOS                1        0     0.00        1        0     0.00
--------------------------------------------------------------------
TK_COMMENT          125     4262    34.10        0        0     0.00
TK_LCOMMENT           2     1188   594.00        0        0     0.00
TK_EOL              329      329     1.00      170      170     1.00
TK_SPACE            897     3205     3.57      152      152     1.00
--------------------------------------------------------------------
Total Elements     2599    12421     4.78     1568     3130     2.00
--------------------------------------------------------------------
Total Tokens       1246     3437     2.76     1246     2808     2.25
--------------------------------------------------------------------

Overall, the file size was reduced by more than 9KB. Tokens are classified into 'real' or actual tokens, and 'fake' or whitespace tokens. The number of 'real' tokens remained the same. Short comments and long comments were completely eliminated. The number of line endings was reduced by 159 (from 329 to 170), while all but 152 whitespace characters were optimized away. So, token separators (whitespace, including line endings) now take up just 10% of the total file size. No optimization of number tokens was possible, while 2 bytes were saved for string tokens.

For local variable name optimization, the report shows that 38 unique local variable names were reduced to 20 unique names. The number of identifier tokens should stay the same (there is currently no optimization option to optimize away non-essential or unused 'real' tokens). Since only 20 names are needed and there can be up to 53 single-character identifiers, all local variables are now one character in length. Over 600 bytes were saved.


File Sizes of Squeezed Sources versus Binary Chunks

File sizes of LuaSrcDiet 0.11.0 main files in various forms:

                    Original          luac          luac    LuaSrcDiet    LuaSrcDiet
Source File             Size        normal      stripped       --basic     --maximum
                     (bytes)       (bytes)       (bytes)       (bytes)       (bytes)
------------------------------------------------------------------------------------
LuaSrcDiet.lua        21,961        20,952        11,000        11,005         8,159
llex.lua              12,421         8,613         4,247         3,835         3,130
lparser.lua           41,757        27,215        12,506        11,755         7,666
optlex.lua            31,009        16,992         8,021         9,129         6,858
optparser.lua         16,511         9,021         3,520         5,087         2,999
------------------------------------------------------------------------------------
Total                123,659        82,793        39,294        40,811        28,812

Compressibility of LuaSrcDiet 0.11.0 main files in various forms:

                            Original          luac          luac    LuaSrcDiet    LuaSrcDiet
Compression Method              size        normal      stripped       --basic     --maximum
                             (bytes)       (bytes)       (bytes)       (bytes)       (bytes)
--------------------------------------------------------------------------------------------
Uncompressed originals       123,659        82,793        39,294        40,811        28,812
gzip -9                       28,288        29,210        17,732        12,041        10,451
bzip2 -9                      24,407        27,232        16,856        11,480         9,815
7-zip (max) (lzma)            25,530        23,908        15,741        11,241         9,685

So, source code squeezed with --maximum is smaller than stripped binary chunks and also compresses better, at a ratio of 2.9 for squeezed source code versus 2.3 for stripped binary chunks. Compressed binary chunks are still a very efficient way of storing Lua scripts, because using only binary chunks allows the parts of Lua needed to compile from sources to be omitted (llex.o, lparser.o, lcode.o, ldump.o), saving over 24KB in the process.
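
The quoted ratios can be re-derived from the tables. The 2.9 and 2.3 figures appear to correspond to the bzip2 -9 row (that pairing is inferred from the arithmetic, not stated in the text), so the sketch below uses those sizes.

```lua
-- Sizes in bytes, copied from the two tables above (bzip2 -9 row).
local sizes = {
  stripped = { raw = 39294, bzip2 = 16856 },  -- luac stripped
  squeezed = { raw = 28812, bzip2 = 9815 },   -- LuaSrcDiet --maximum
}
local original_bytes = 123659                 -- unprocessed sources

-- Compression ratio: uncompressed size divided by compressed size.
local function ratio(entry)
  return entry.raw / entry.bzip2
end

print(string.format("squeezed sources: %.1f", ratio(sizes.squeezed)))  -- 2.9
print(string.format("stripped chunks:  %.1f", ratio(sizes.stripped)))  -- 2.3
-- Overall reduction from the original sources to squeezed plus bzip2:
print(string.format("overall: %.1fX", original_bytes / sizes.squeezed.bzip2))
```

The last line gives a little over 12X, in line with the reduction quoted in the introduction.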

Note that LuaSrcDiet does not answer the question of whether embedding source code is better or embedding binary chunks is better. It is simply a utility for producing smaller source code files and an exercise in processing Lua source code using a Lua-based lexer and parser skeleton.


Performance of Squeezed Sources versus Binary Chunks

The following is a primitive attempt to analyze in-memory Lua script loading performance (using the loadstring function in Lua).

The LuaSrcDiet 0.11.0 files (original, squeezed with --maximum and stripped binary chunks versions) are loaded into memory first before a loop runs to repeatedly load the script files for 10 seconds. A null loop is also performed (processing empty strings) and the time taken per null iteration is subtracted as a form of null adjustment. Then, various performance parameters are calculated. Note that LuaSrcDiet.lua was slightly modified (#! line removed) to let the loadstring function run. The results below were obtained with a Lua 5.1.3 executable compiled using "make generic" on Cygwin/Windows XP SP2 on a Sempron 3000+ (1.8GHz). The LuaSrcDiet 0.11.0 source files have 11,180 'real' tokens in total.
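
A minimal sketch of this kind of measurement is shown below. It is not the script that produced the results: the function names are hypothetical, and timing with os.clock (CPU time) is an assumption, since the text does not say how time was measured. The script text is assumed to be already in memory.

```lua
-- loadstring exists in Lua 5.1; later versions load strings with load.
local load_chunk = loadstring or load

-- Repeatedly compile `text` for roughly `seconds` and count the iterations.
local function count_loads(text, seconds)
  local iterations = 0
  local start = os.clock()
  while os.clock() - start < seconds do
    assert(load_chunk(text))
    iterations = iterations + 1
  end
  return iterations
end

-- Seconds per load of `script`, adjusted by the cost of loading an empty string.
local function time_per_load(script, seconds)
  local null_time = seconds / count_loads("", seconds)
  local script_time = seconds / count_loads(script, seconds)
  return script_time - null_time
end

-- A short run on a toy chunk; the measurements above used 10 seconds per loop.
local sample = "local t = {} for i = 1, 10 do t[i] = i * 2 end return t"
print(string.format("%.3f msec per load", time_per_load(sample, 0.1) * 1000))
```

For binary chunks, the same loop applies to a string produced by string.dump or luac, since loadstring accepts precompiled chunks in Lua 5.1.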



                                                 Null     Stripped     Original     Squeezed
                                                 loop binary chunk      Sources      Sources
--------------------------------------------------------------------------------------------
Total Size (bytes)                                  0       39,294      123,640       28,793
Iterations                                    312,155        9,680        1,306        1,592
Duration (sec)                                     10           10           10           10
Time/iteration (msec)                           0.032        1.033        7.657        6.281
Time/iteration, adjusted for null (msec)            -        1.001        7.625        6.249
Load rate (MB/sec)                                  -        37.44        15.46         4.39
Load time per byte (ns)                             -         25.5         61.7        217.0
Load time per token (ns)                            -            -          682          559
Source time vs binary chunk time ratio              -         1.00         7.62         6.24
Binary chunk rate vs. source rate ratio             -         1.00         2.42         8.53

The above shows that stripped binary chunks are still, in many ways, the highest-performance form of fixed Lua scripts. On a very average machine, scripts load at over 37MB/sec (in memory). This is very comparable to the burst speeds of common desktop hard disks of 2008. If instant response is paramount, stripped binary chunks have little competition.

By contrast, source code that is squeezed to the maximum using LuaSrcDiet can only muster an in-memory load rate of 4.4MB/sec. The original sources load at about 15.5MB/sec, but most of the speed is from the lexer scanning over comments and whitespace. A quick calculation indicates that the speed of the lexer over comments and whitespace can be as much as 65MB/sec, but note that the speed is all for naught. What really matters are the real tokens, and the squeezed source code takes about 18% less time to load than the original sources.
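
The quick calculation can be reproduced from the table figures. The sketch treats 1MB as 2^20 bytes, which is an inference: it is the unit that makes the table's load rates come out as printed.

```lua
local MB = 1024 * 1024  -- assumed unit; it reproduces the table's load rates

-- Null-adjusted figures from the table above.
local original = { bytes = 123640, msec = 7.625 }
local squeezed = { bytes = 28793, msec = 6.249 }

local function rate_mb_per_sec(bytes, msec)
  return bytes / (msec / 1000) / MB
end

-- Bytes that only the original sources contain (comments and whitespace)
-- and the extra time spent scanning over them.
local skipped_bytes = original.bytes - squeezed.bytes
local skipped_msec = original.msec - squeezed.msec
local skip_rate = rate_mb_per_sec(skipped_bytes, skipped_msec)

-- Fraction of load time saved by squeezing.
local time_saving = 1 - squeezed.msec / original.msec

print(string.format("squeezed load rate: %.2f MB/sec", rate_mb_per_sec(squeezed.bytes, squeezed.msec)))
print(string.format("comment/whitespace scan rate: %.1f MB/sec", skip_rate))
print(string.format("load time saved: %.0f%%", time_saving * 100))
```

The scan rate comes out at roughly 65MB/sec and the time saving at about 18%, matching the figures in the text.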

So, the loading of stripped binary chunks is faster than squeezed source code by a bit over 6X. The 4.4MB/sec speed for squeezed source code is still quite respectable. When an application considers the time taken to load data from the disk and perhaps the time taken to decompress, loading source code may be perfectly fine in terms of performance. For programs that already embed source code, using LuaSrcDiet to squeeze the source code probably speeds loading up by a tiny bit in addition to making programs smaller.


Acknowledgements

Thanks to the LuaForge team for hosting this material. This page was written on SeaMonkey. LuaSrcDiet was developed using the SciTE editor on Cygwin, and managed using SVN. Parts of LuaSrcDiet are based on Yueliang, which is in turn based on the Lua sources.


This page Copyright © 2008 KHMan. Last Revised: 2009-06-06.
Canonical URL: http://luasrcdiet.luaforge.net/