Script Compiler
From OpenLaszlo
Source Organization
The script compiler translates ECMAScript source text into SWF bytecode. It consists of a parser, a code generator, and an assembler. The parser is generated with JavaCC and resides in /WEB-INF/lps/server/src. The code generator is written in Jython and resides in /WEB-INF/lps/server/sc. The assembler is written in Java and resides in /WEB-INF/lps/server/src/com/laszlosystems/sc.
Interactive Development
To do interactive development on the script compiler:
> cd $LPS_HOME/server/sc
> sci
The README.TXT file in that directory has additional notes on script compiler development.
Example
> cd $LPS_HOME/server/sc
> sci
Jython 2.1 on java1.4.1_01 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from testing import *
>>> c("var a=['a', 1, 'b', 2]", printInstructions=1)
constants 'a'
push 'a', 2, 'b', 1, 'a', 4
initArray
setVariable
>>> c("var o={a: 1, b: 2}")
push 'o', 'a', 1, 'b', 2, 2
initObject
setVariable
>>> c("var r=f('a', 1, 'b', 2)")
push 'r', 2, 'b', 1, 'a', 4, 'f'
callFunction
setVariable
Design
Overview
The script compiler compiles annotated ECMAScript source code into ActionScript bytecode. The annotations are #file, #line, and #pragma directives that allow the script compiler to be used as the back end of the element compilation phase in the application compiler.
The script compiler consists of these passes:
- Parsing: the source text is transformed into a parse tree.
- Code generation: the parse tree is transformed into a sequence of objects that represent instructions.
- Assembly: the instruction sequence is transformed into a sequence of bytes.
The script compiler supports some features and optimizations that are specific to its use within the Laszlo application compiler:
- Source location
- Instrumentation for debugging, profiling, and kranking
- Constraint compilation
And for the Flash execution environment:
- Manipulation of the activation scope
- Flash-specific optimizations
Parsing
JavaCC is used to scan (tokenize) and parse the source text. A post-parsing phase normalizes the parse tree to match evaluation order. This is necessary because the parser is right-recursive and generates flat trees for expressions such as a+b+c and a.b.c, whereas code generation needs nested binary trees that follow left-to-right evaluation order: (a+b)+c and (a.b).c.
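The normalization step can be pictured as folding a flat n-ary node into a left-nested chain of binary nodes. The sketch below is written in modern Python for readability and is not the actual post-parsing code; the node classes `Flat` and `Binary` and the helper `show` are hypothetical stand-ins for the real parse tree nodes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Flat:
    """Hypothetical flat n-ary node, e.g. a+b+c as Flat('+', ['a', 'b', 'c'])."""
    op: str
    operands: List["Node"]


@dataclass
class Binary:
    """Hypothetical binary node produced by normalization."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[str, Flat, Binary]


def normalize(node: Node) -> Node:
    """Rewrite flat n-ary nodes into left-nested binary nodes."""
    if isinstance(node, Flat):
        operands = [normalize(n) for n in node.operands]
        tree = operands[0]
        # Fold from the left so the first pair is evaluated first: (a op b) op c
        for operand in operands[1:]:
            tree = Binary(node.op, tree, operand)
        return tree
    if isinstance(node, Binary):
        return Binary(node.op, normalize(node.left), normalize(node.right))
    return node  # leaves (identifiers) are unchanged


def show(node: Node) -> str:
    """Render a tree with explicit parentheses to make the nesting visible."""
    if isinstance(node, Binary):
        return f"({show(node.left)}{node.op}{show(node.right)})"
    return str(node)
```

For example, `show(normalize(Flat('+', ['a', 'b', 'c'])))` yields `((a+b)+c)`, and the same fold applied to `'.'` yields `((a.b).c)`.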
Code Generation
The code generator uses the Visitor design pattern to transform the parse tree into a sequence of objects that represent ActionScript instructions.
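As a rough illustration of the Visitor approach, the sketch below dispatches on each node's class name and appends instruction tuples to a list. The node classes and the instruction names (`PUSH`, `GET_VARIABLE`, `ADD`, `SET_VARIABLE`) are hypothetical simplifications, not the code generator's actual classes; the push-name, push-value, set-variable shape mirrors the interactive examples earlier on this page.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Literal:
    value: Any


@dataclass
class Identifier:
    name: str


@dataclass
class Add:
    left: Any
    right: Any


@dataclass
class Assign:
    name: str
    value: Any


class CodeGenerator:
    """Visitor that appends instruction tuples to self.instructions."""

    def __init__(self) -> None:
        self.instructions: List[Tuple] = []

    def emit(self, *instr: Any) -> None:
        self.instructions.append(instr)

    def visit(self, node: Any) -> None:
        # Dispatch on the node's class name, e.g. Literal -> visit_Literal
        getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Literal(self, node: Literal) -> None:
        self.emit("PUSH", node.value)

    def visit_Identifier(self, node: Identifier) -> None:
        self.emit("PUSH", node.name)
        self.emit("GET_VARIABLE")

    def visit_Add(self, node: Add) -> None:
        # Operands are evaluated left to right, then combined
        self.visit(node.left)
        self.visit(node.right)
        self.emit("ADD")

    def visit_Assign(self, node: Assign) -> None:
        self.emit("PUSH", node.name)
        self.visit(node.value)
        self.emit("SET_VARIABLE")
```

Visiting `Assign("x", Add(Identifier("a"), Literal(1)))` produces `PUSH 'x'`, `PUSH 'a'`, `GET_VARIABLE`, `PUSH 1`, `ADD`, `SET_VARIABLE`.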
Internally, the code generator performs source transformations (parse tree to parse tree transformations) to normalize certain constructs. These transformations are listed at the end of this chapter. (This compilation technique is similar to intentional programming, although it was more directly inspired by syntactic macros in Dylan and Scheme.)
Assembly
The assembler turns objects that represent ActionScript instructions into bytecode sequences, and resolves branch references into offsets.
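Branch resolution is naturally a two-pass process: the first pass assigns a byte address to every label, and the second replaces each branch target with a relative offset. The sketch below assumes each instruction already knows its encoded size and that offsets are measured from the end of the branch instruction; the `Label` and `Instr` classes and the sizes in the example are hypothetical, not the assembler's real data structures.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Label:
    """Pseudo-instruction: marks a branch target and occupies no bytes."""
    name: str


@dataclass
class Instr:
    op: str
    size: int                     # encoded size in bytes (assumed known)
    target: Optional[str] = None  # label name, for branch instructions only


def resolve_branches(program: List[Union[Label, Instr]]) -> List[Tuple]:
    """Return (op,) or (op, offset) tuples with label references resolved."""
    # Pass 1: record the byte address that each label marks
    addresses = {}
    pc = 0
    for item in program:
        if isinstance(item, Label):
            addresses[item.name] = pc
        else:
            pc += item.size
    # Pass 2: compute offsets relative to the end of each branch instruction
    resolved = []
    pc = 0
    for item in program:
        if isinstance(item, Label):
            continue  # labels vanish from the output
        pc += item.size
        if item.target is not None:
            resolved.append((item.op, addresses[item.target] - pc))
        else:
            resolved.append((item.op,))
    return resolved
```

With a 4-byte `PUSH`, a 5-byte `IF` to `end`, a 1-byte `POP`, and a 5-byte `JUMP` back to `top`, the forward branch resolves to `+6` and the backward jump to `-15`.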
The assembler also performs these peephole optimizations:
- Replace adjacent PUSH instructions with a single PUSH instruction that has multiple arguments.
- Transform PUSH; DUP into a PUSH with a repeated argument.
- Replace integer-valued floats with integers.
Note that PUSH merging is only valid because of the treatment of labels as pseudo-instructions, which break up a sequence of PUSHes if any but the first is a branch target. This is more conventionally done by optimizing within a basic block, but the compiler doesn't currently create basic blocks.
Note also that PUSH merging and the other optimizations may interfere with each other. Since both types of operations preserve program semantics, either application order is valid; the current implementation doesn't ensure that the order is optimal.
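The three peephole rules, and the way a label pseudo-instruction stops PUSH merging, can be sketched as a single pass over a list of instruction tuples. This is an illustration under stated assumptions, not the assembler's code: the tuple format, the `LABEL` pseudo-instruction name, and the 32-bit integer range are hypothetical. The sketch also declines to turn -0.0 into 0, since ECMAScript distinguishes the two (for instance 1/-0 is -Infinity).

```python
import math
from typing import Any, List, Tuple

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1  # assumed range of an integer push


def normalize_value(v: Any) -> Any:
    """Replace an integer-valued float such as 2.0 with the integer 2."""
    if not isinstance(v, float) or not v.is_integer():
        return v
    if v == 0 and math.copysign(1.0, v) < 0:
        return v  # keep -0.0, which is observably different from 0
    if INT32_MIN <= v <= INT32_MAX:
        return int(v)
    return v


def peephole(instrs: List[Tuple]) -> List[Tuple]:
    out: List[Tuple] = []
    for instr in instrs:
        op = instr[0]
        prev = out[-1] if out else None
        if op == "PUSH":
            args = [normalize_value(a) for a in instr[1]]
            if prev is not None and prev[0] == "PUSH":
                # Merge with the preceding PUSH. A LABEL in between would be
                # `prev` instead, so a branch target is never merged away.
                out[-1] = ("PUSH", prev[1] + args)
            else:
                out.append(("PUSH", args))
        elif op == "DUP" and prev is not None and prev[0] == "PUSH" and prev[1]:
            # PUSH ..., x; DUP  ->  PUSH ..., x, x
            out[-1] = ("PUSH", prev[1] + [prev[1][-1]])
        else:
            out.append(instr)
    return out
```

Applied to `PUSH 'a'; PUSH 2.0; DUP; LABEL L1; PUSH 1.5; PUSH -0.0; ADD`, this yields `PUSH 'a', 2, 2; LABEL L1; PUSH 1.5, -0.0; ADD`. Because merging and DUP folding happen in one pass here, the sketch fixes one particular application order, which, as noted above, need not be optimal.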
Constraints
TBD
Implementation
The scanner and parser are written in Java (using JavaCC and jjtree). Subsequent stages are written in Python; the source for them is compiled to JVM bytecodes via Jython. Specific stages or classes may be rewritten in Java, depending on profiling, but this will take place after algorithm-level optimizations have petered out.
The source files are heavily commented, and should be consulted as the primary reference. This document contains information that doesn't fit into the sources.
The compiler sources and testing infrastructure are in these files:
- actions.py: SWF literals
- instructions.py: instructions and assembler
- compiler.py: parser interface and compiler
- testing.py: interactive testing functions and testing framework
- tests.py: test cases
- Parser.jjt: parser grammar
Appendix: Source Transformations
Expressions
function f(args) {body}
=>
function f(args) {
$$ = {...}
with (_root)
with ($$) {body}}
function (args) {body}
=>
function (args) {
$$ = {...}
with (_root)
with ($$) {body}}
a instanceof b
=>
$instanceof(a, b)
super.m(a, b)
=>
this.callInherited('m', a, b)
super(a, b)
=>
this.callInherited('constructor', a, b)
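The instanceof and super rules above are parse tree to parse tree rewrites. The sketch below applies them bottom-up to a hypothetical tuple-based tree (`("instanceof", a, b)`, and `("super_call", method, args)` where `method` is `None` for a bare `super(...)` call) and renders the result back to source text. The node shapes are assumptions made for illustration; the real code generator works on JavaCC/jjtree nodes.

```python
from typing import Any


def transform(node: Any) -> Any:
    """Bottom-up rewrite of instanceof and super calls."""
    if not isinstance(node, tuple):
        return node  # identifiers (str), None, etc. are left alone
    kind, *children = node
    # Transform children first so nested constructs are rewritten too
    children = [[transform(c) for c in ch] if isinstance(ch, list) else transform(ch)
                for ch in children]
    if kind == "instanceof":
        a, b = children
        return ("call", "$instanceof", [a, b])
    if kind == "super_call":
        method, args = children
        name = "constructor" if method is None else method
        callee = ("member", "this", "callInherited")
        return ("call", callee, [("string", name)] + args)
    return (kind, *children)


def unparse(node: Any) -> str:
    """Render the small tuple language back to ECMAScript text."""
    if isinstance(node, str):
        return node
    kind = node[0]
    if kind == "string":
        return f"'{node[1]}'"
    if kind == "member":
        return f"{unparse(node[1])}.{node[2]}"
    if kind == "call":
        return f"{unparse(node[1])}({', '.join(unparse(a) for a in node[2])})"
    raise ValueError(f"unknown node kind {kind!r}")
```

For example, `unparse(transform(("super_call", "m", ["a", "b"])))` gives `this.callInherited('m', a, b)`, matching the rule above.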
$$ is initialized to an object that binds each argument to its value, each local variable to undefined, and each local function definition name to its function. Its purpose is to insert an object that corresponds to the JavaScript activation object at the front of the scope chain, in front of _root.
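The effect of `with (_root) with ($$)` can be modeled as a scope chain that is searched front to back, with `$$` ahead of `_root`. The sketch below builds such an activation object following the binding rules described above (missing arguments and local variables start as undefined; a `var` does not overwrite a parameter of the same name, while a local function definition does, as in ECMAScript). The `UNDEFINED` sentinel and the function names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence

UNDEFINED = object()  # hypothetical stand-in for JavaScript's undefined


def make_activation(params: Sequence[str], args: Sequence[Any],
                    local_vars: Sequence[str],
                    local_functions: Dict[str, Callable]) -> Dict[str, Any]:
    """Build a dict playing the role of the $$ activation object."""
    act: Dict[str, Any] = {}
    for i, name in enumerate(params):
        act[name] = args[i] if i < len(args) else UNDEFINED
    for name in local_vars:
        act.setdefault(name, UNDEFINED)  # var does not reset a parameter
    act.update(local_functions)          # function definitions do replace
    return act


def lookup(name: str, scope_chain: List[Dict[str, Any]]) -> Any:
    """Resolve a name by searching the scope chain front to back."""
    for scope in scope_chain:
        if name in scope:
            return scope[name]
    raise NameError(name)
```

With `chain = [act, root]`, a parameter `x` shadows a global `x` in `root`, while names absent from `$$` fall through to `_root`.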
The compiler generates variables whose names begin with "$lzsc$". User code shouldn't use names that begin with this prefix. (Variables beginning with "$" are reserved in JavaScript for machine-generated code, but an lzx file could itself be machine generated, so this prefix is used as a second level of protection.)
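A compiler that reserves a prefix typically pairs a fresh-name generator with a check on user identifiers. The sketch below shows that idea; only the "$lzsc$" prefix comes from the text, while the `NameGenerator` class, the hint-plus-counter suffix format, and the `check_user_identifier` helper are hypothetical. Whether the script compiler actually rejects user names with this prefix is not stated here.

```python
import itertools

PREFIX = "$lzsc$"  # reserved prefix for compiler-generated names


class NameGenerator:
    """Hands out unique compiler-private names (suffix format is hypothetical)."""

    def __init__(self) -> None:
        self._counter = itertools.count()

    def fresh(self, hint: str = "tmp") -> str:
        return f"{PREFIX}{hint}{next(self._counter)}"


def check_user_identifier(name: str) -> str:
    """Reject a user identifier that could collide with generated names."""
    if name.startswith(PREFIX):
        raise ValueError(f"identifier {name!r} uses the reserved prefix {PREFIX!r}")
    return name
```

Successive calls to `fresh()` return `$lzsc$tmp0`, `$lzsc$tmp1`, and so on, so generated names never collide with each other or with conforming user code.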
Class Definitions
class C {}
=>
function C() {}
class C extends B {}
=>
function C() {}
Object['class']['extends'](B, C)
class C {var a}
=>
function C() {}
C.prototype.a = undefined
class C {var a=1}
=>
function C() {}
C.prototype.a = 1
class C extends B {var a=1}
=>
function C() {}
Object['class']['extends'](B, C)
C.prototype.a = 1
class C {var a=1, b=2}
=>
function C() {}
C.prototype.a = 1
C.prototype.b = 2
class C {function C(x) {this.x=x}}
=>
function C(x) {this.x=x}
class C extends B {function C(x) {this.x=x}}
=>
function C(x) {this.x=x}
Object['class']['extends'](B, C)
class C {function f(args) {body}}
=>
function C() {}
C.prototype.f = function(args) {body}
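The class rules above can be read as a small code template. The sketch below reproduces them for a hypothetical `ClassDef` description: constructor (or an empty function), then the `extends` call, then one prototype assignment per field and per method. The text does not show a class with both fields and methods, so emitting fields before methods is an assumption, as is representing initializers and bodies as raw source strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ClassDef:
    name: str
    superclass: Optional[str] = None
    # (field name, initializer source or None for an uninitialized var)
    fields: List[Tuple[str, Optional[str]]] = field(default_factory=list)
    # (parameter list, body) of a method named after the class, if any
    constructor: Optional[Tuple[str, str]] = None
    # (method name, parameter list, body)
    methods: List[Tuple[str, str, str]] = field(default_factory=list)


def desugar(cls: ClassDef) -> List[str]:
    """Return the statements the class definition expands into."""
    if cls.constructor is not None:
        params, body = cls.constructor
        lines = [f"function {cls.name}({params}) {{{body}}}"]
    else:
        lines = [f"function {cls.name}() {{}}"]
    if cls.superclass is not None:
        lines.append(f"Object['class']['extends']({cls.superclass}, {cls.name})")
    for fname, init in cls.fields:
        value = "undefined" if init is None else init
        lines.append(f"{cls.name}.prototype.{fname} = {value}")
    for mname, params, body in cls.methods:
        lines.append(f"{cls.name}.prototype.{mname} = function({params}) {{{body}}}")
    return lines
```

For instance, `desugar(ClassDef("C", "B", [("a", "1")]))` returns the three statements shown for `class C extends B {var a=1}`.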

