CallProfiler
From OpenLaszlo
The call profiler generates statistics that can be used to measures the time spent in each function, and to generate a call graph.
The call profiler is an unsupported component of OpenLaszlo. It works, but it is tricky and lacks a front end.
There is also a way to use the Flex Builder (now Flash Builder) profiling tool to profile a Flash 10 application. See FlexProfiling
Also note that the profiler only works reliably on Mac OS X, and only works at all with the swf8 and DHTML runtimes. The Windows Flash player versions tested to date use a low-resolution timer that is not sufficient to give reasonable results.
Contents |
Starting
Run your program by loading it with ?profile=true. The profile option is only available if you have a build with it enabled. (Which it is normally the default. If the compiler complains, you will have to get the source and build it yourself. See SubversionBuildInstructions for details. You'll also have to enable the profiler in the build.properties file, located at the root of trunk, as follows:
# LFC variants to build build.lfc.runtime.debug = true # build.lfc.runtime.krank = true build.lfc.runtime.profile = true
.)
The program that you are profiling must not be a debug build. (You wouldn't want to profile all the debug overhead anyway, would you?)
Note that data collection can outstrip the spooling of data to the Tomcat listener. Typically you will have to let your application idle and watch the tail of the .profiler file for inactivity to know when you can stop your application (and thus stop the data collection and spooling). See Raw Data below for hints on trimming the data to analyze just a portion of a run.
Configuring Tomcat to Accept Large Data
It turns out there is a Tomcat server configuration maxPostSize, which defaults to 2 MB, but which can be set to a specified value, or zero for unlimited post size. For profiling DHTML, I needed to increase the size beyond the 2MB default.
jakarta-tomcat-5.0.30/conf/server.xml:
<Connector port="8080"
maxPostSize="8000000"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
LFC Node Events
A special set of events is logged for LFC Nodes (over and above the function calls that result from the construction and operation of the nodes).
- Every node that is constructed is given a _profile_name. That name is:
- #id if the node has an id
- .name if the node has a name
- constructor if the node has neither
- When a node with an id or name is constructed, an event start: _profile_name is logged as the first thing in construction.
- When a node with an id or name is done instantiating, an event done: _profile_name is logged as the last thing in instantiation.
- When a node queues up children to be instantiated in the background, the background instantiator will add profile records so that when it is actually making the children, it looks like the call came from the parent's _profile_name. This is done transitively, so a nodes sub-subnodes are also charged to the parent, etc.
The named nodes will show up in the profile report as if they were function calls. This will be slightly confusing because these named nodes will appear to be called for every child view that is built, but we believe this gives a more useful gauge on how costly individual nodes are.
User Events
Users can log their own events that will show up in the profile report. This is done by:
LzProfiler.event( message );
where message is any arbitrary string. These events will be timestamped, spooled out the profiling data stream, and they will appear in the fourth section of the analyzed report, which is a chronological listing of all user events. This can be helpful if you are interested in metering a particular sequence of events that may be spread across several functions or may occur asynchronously.
Analyzing the Data
The output can then be analyzed using:
perl $LZ_TOOLS_ROOT/profiler/profile.pl < your-app.profiler > profile.out
At the top of the report are some summary statistics:
Profiling: http://127.0.0.1:8080/lps/sample-apps/calendar/calendar.lzx?lzt=swf&profile=true Date: Tue Oct 28 21:33:04 GMT-0500 2003 Flash Player version: MAC 7,0,14,0 LPS version: 2.0.0 Total time: 16207ms(cpu) 51.64s(overhead) 0.69s(unmetered) 2115ms(unknown) 70.66s(elapsed)
The cpu time is the elapsed time spend in all metered functions (in milliseconds), the overhead is time spent spooling the data to the listener (in seconds), the unmetered time is time that elapsed while not in any metered function (in seconds, probably time sliced away from the player by the O/S or browser), the unknown time is the error after the other values are subtracted from the elapsed time (in milliseconds), the elapsed time is simply the difference between the last and first metered event (in seconds).
Note If you are trying to profile startup time, it may be worthwhile to manually trim the end of the raw data file to eliminate the idle calls to sendEvent.
The report format is in Unix gprof format, so you can man gprof for information. Each entry is a report for a node in the call graph: it lists the arcs to the node (the callers), the node, and the arcs from the node (the callees). The nodes are listed in order of time spent below that node (in the node and its callees, the arcs out of the node). Note that since the metered code is called from several entry points (during loading, in response to events, from the idle loop), the call graph is a forest, not a tree.
Here is a quick summary from an example:
4.56 64.61 5/627 lzAddLocalData [258]
[179] 3.7% 571.85 8102.27 627+0 "core/LzNode.as"#17/12 [179]
890.70 3680.40 536/538 LzNode.prototype.applyArgs [180]
the lines above the line with % are callers (arcs in), those below are callees (arcs out). The numbers in brackets are uids, which I use below:
for callers:
- 4.56
- time in 258 when calling 179
- 64.61
- time in 179's children due to 258 calling
- 5/627
- 258 calls 179 5 of the 627 times it was called
- lzAddLocalData
- name of 258
for functions:
- [179]
- uid
- 3.7%
- % total time below 179 (due to 179 and its callees)
- 571.85
- time in 179
- 8102.27
- time in callees
- 627+0
- total calls of 179 in this run + calls to self (including indirect)
- "core/LzNo..."
- name of 179, as file/line because it is an unnamed function
- [179]
- uid
for callees:
- 890.70
- time in 180 due to calls from 179
- 3680.40
- time in 180's callees due to calls from 179
- 536/538
- 179 calls 180 536 of the 538 times it was called
- LzNode.pro...
- name of 180
Other Reports
The report analyzer has 3 other sections. The second section is a 'flat' report that ranks all the functions by % of total time they contributed to the run. The third section is an index of functions to id's which can be helpful in searching for a particular function elsewhere in the report. The fourth section is a chronological listing of the user events that were logged.
Caveats
The profiler records real times as reported to it by the Flash player. It does not know when time elapses while the player is not running (due to either OS or browser time-slicing), but we believe this may only happen when at the top of the stack so may be the 'unmetered' time -- elapsed time that was not spent in any function known to us.
The profiler tries to record the times as efficiently as possible, but the only data structure available to it is an object, which is a hash table, and it is entirely likely that when adding a record to the object the object will have to be grown and/or rehashed which will look like time spent in a user function. We have not tried to measure this error but hope that it will be insignificant over a large program.
Raw data
If you want to do your own analysis, the raw output is in the format:
<timestamp> [CRSEV] <function>
- C
- function Call
- R
- function Return
- S
- Start of overhead (dump stats)
- E
- End of overhead
- V
- user eVent
The profile analyzer does not depend on the order of the raw data, so you can sort it and trim it to analyze a smaller portion of the raw data (just so long as you include complete call chains).
Note that the raw output is not sorted. If you want to analyze the raw output yourself or with another tool, you will want to first sort it by the timestamp field.
Gory details: The collector tries to be as efficient as possible. Currently, the most efficient way to collect data is to use objects, one for calls, one for events, and one for returns. These objects have timestamps as keys and function names or strings as values. (There is code to ensure that the timestamps are always unique.) This way the only allocation the data collector does is if the hash (object) has to grow (we don't currently have a way to measure and subtract this overhead, so it does perturb the data somewhat). There is no allocation of pairs (e.g., {<fnname>: <timestamp>}) or strings (e.g., <fnname> + ": " + <timestamp>), there is no array overhead (for players where arrays are not specialized -- perhaps this should be reevaluated this for FP7 where it appears there may be a special implementation of integer-keyed arrays).
The data spooler similarly to dumps the collected statistics as efficiently as possible. It runs in an idle loop, and breaks up the work to avoid being aborted by the player idle limits. It simply outputs the objects in hash order (using 'for in') to a socket. Note that the collection can get ahead of the spooling, so you typically need watch the output stream to decide when to stop your application (which will also stop collection).
The overhead of sorting and analyzing the data is all done off-line. The report generator does its own sorting, but if you want to analyze the raw data yourself, you will have to sort it first, to make any sense of it. The report generator is not sensitive to the data order, so you can sort and trim the raw data as you see fit and still run it through the report generator if you want to analyze just a portion of a run.
Controlling profiling
You can turn profiling off for a range of code using:
#pragma 'profile=false'
[... this code will not get profiled ...]
#pragma 'profile=true'
You can turn profiling off and on at runtime by:
var tmp = $lzprofiler; $lzprofiler = null;
and back on:
$lzprofiler = tmp;

