Personal tools

CallProfiler

From OpenLaszlo

The call profiler generates statistics that can be used to measures the time spent in each function, and to generate a call graph.

The call profiler is an unsupported part of the product. It works, but it is difficult to configure, and lacks a front end.

Also note that the profiler only works reliably on Mac OS X, and only works at all with the Flash 7 runtime. The Windows Flash player versions tested to date use a low-resolution timer that is not sufficient to give reasonable results.

NOTE: These instructions have not been updated to include the Legals DHTML profiler. They will be updated very soon to include that information.

Contents

Starting

This step is no longer required in Legals/LPS 4.0

The call profiler streams to port 4444 on the LPS server. You will need to attach a program to that port, that copies the streamed data to a file. To do this use netcat, for example:

 nc -l -p 4444 > profile.out

Then run your program by loading it with ?profile=true. The profile option is only available if you have a build with it enabled, which it is not the default. Therefore you must get the source and build it yourself. See SubversionBuildInstructions for details. You must also enable the profiler in the build.properties file, located at the root of trunk.

 # LFC variants to build
 build.lfc.runtime.debug = true
 # build.lfc.runtime.krank = true
 build.lfc.runtime.profile = true

The program that you are profiling must not be a debug build. (You wouldn't want to profile all the debug overhead anyway, would you?)

Note that data collection can outstrip the spooling of data to the Java listener. Typically you will have to let your application idle and watch the tail of the spool file for inactivity to know when you can stop your application (and thus stop the data collection and spooling). See Raw Data below for hints on trimming the data to analyze just a portion of a run.

LFC Node Events

A special set of events is logged for LFC Nodes (over and above the function calls that result from the construction and operation of the nodes).

  • Every node that is constructed is given a _profile_name. That name is:
    • #id if the node has an id
    • .name if the node has a name
    • constructor if the node has neither
  • When a node with an id or name is constructed, an event start: _profile_name is logged as the first thing in construction.
  • When a node with an id or name is done instantiating, an event done: _profile_name is logged as the last thing in instantiation.
  • When a node queues up children to be instantiated in the background, the background instantiator will add profile records so that when it is actually making the children, it looks like the call came from the parent's _profile_name. This is done transitively, so a nodes sub-subnodes are also charged to the parent, etc.

The named nodes will show up in the profile report as if they were function calls. This will be slightly confusing because these named nodes will appear to be called for every child view that is built, but we believe this gives a more useful gauge on how costly individual nodes are.

User Events

Users can log their own events that will show up in the profile report. This is done by:

 LzProfiler.event( message );

where message is any arbitrary string. These events will be timestamped, spooled out the profiling data stream, and they will appear in the fourth section of the analyzed report, which is a chronological listing of all user events. This can be helpful if you are interested in metering a particular sequence of events that may be spread across several functions or may occur asynchronously.

Analyzing the Data

The output can then be analyzed using:

 perl $LZ_TOOLS_ROOT/profiler/profile.pl < profile.out

At the top of the report are some summary statistics:

Profiling: http://127.0.0.1:8080/lps/sample-apps/calendar/calendar.lzx?lzt=swf&profile=true
 Date: Tue Oct 28 21:33:04 GMT-0500 2003
 Flash Player version: MAC 7,0,14,0
 LPS version: 2.0.0

Total time: 16207ms(cpu) 51.64s(overhead) 0.69s(unmetered) 2115ms(unknown) 70.66s(elapsed)

The cpu time is the elapsed time spend in all metered functions (in milliseconds), the overhead is time spent spooling the data to the listener (in seconds), the unmetered time is time that elapsed while not in any metered function (in seconds, probably time sliced away from the player by the O/S or browser), the unknown time is the error after the other values are subtracted from the elapsed time (in milliseconds), the elapsed time is simply the difference between the last and first metered event (in seconds).

Note If you are trying to profile startup time, it may be worthwhile to manually trim the end of the raw data file to eliminate the idle calls to sendEvent.

The report format is in Unix gprof format, so you can man gprof for information. Each entry is a report for a node in the call graph: it lists the arcs to the node (the callers), the node, and the arcs from the node (the callees). The nodes are listed in order of time spent below that node (in the node and its callees, the arcs out of the node). Note that since the metered code is called from several entry points (during loading, in response to events, from the idle loop), the call graph is a forest, not a tree.

Here is a quick summary from an example:

                        4.56    64.61    5/627   lzAddLocalData                   [258]
[179]          3.7%   571.85  8102.27  627+0    "core/LzNode.as"#17/12            [179]
                      890.70  3680.40  536/538   LzNode.prototype.applyArgs       [180]

the lines above the line with % are callers (arcs in), those below are callees (arcs out). The numbers in brackets are uids, which I use below:

for callers:

4.56 
time in 258 when calling 179
64.61 
time in 179's children due to 258 calling
5/627 
258 calls 179 5 of the 627 times it was called
lzAddLocalData 
name of 258

for functions:

[179] 
uid
3.7% 
% total time below 179 (due to 179 and its callees)
571.85 
time in 179
8102.27 
time in callees
627+0 
total calls of 179 in this run + calls to self (including indirect)
"core/LzNo..." 
name of 179, as file/line because it is an unnamed function
[179] 
uid

for callees:

890.70 
time in 180 due to calls from 179
3680.40 
time in 180's callees due to calls from 179
536/538 
179 calls 180 536 of the 538 times it was called
LzNode.pro... 
name of 180

Other Reports

The report analyzer has 3 other sections. The second section is a 'flat' report that ranks all the functions by % of total time they contributed to the run. The third section is an index of functions to id's which can be helpful in searching for a particular function elsewhere in the report. The fourth section is a chronological listing of the user events that were logged.

Caveats

The profiler records real times as reported to it by the Flash player. It does not know when time elapses while the player is not running (due to either OS or browser time-slicing), but we believe this may only happen when at the top of the stack so may be the 'unmetered' time -- elapsed time that was not spent in any function known to us.

The profiler tries to record the times as efficiently as possible, but the only data structure available to it is an object, which is a hash table, and it is entirely likely that when adding a record to the object the object will have to be grown and/or rehashed which will look like time spent in a user function. We have not tried to measure this error but hope that it will be insignificant over a large program.

Raw data

If you want to do your own analysis, the raw output is in the format:

 <timestamp> [CRSEV] <function>
function Call
function Return
Start of overhead (dump stats)
End of overhead
user eVent

The profile analyzer does not depend on the order of the raw data, so you can sort it and trim it to analyze a smaller portion of the raw data (just so long as you include complete call chains).

Note that the raw output is not sorted. If you want to analyze the raw output yourself or with another tool, you will want to first sort it by the timestamp field.

Gory details: The collector tries to be as efficient as possible. Currently, the most efficient way to collect data is to use objects, one for calls, one for events, and one for returns. These objects have timestamps as keys and function names or strings as values. (There is code to ensure that the timestamps are always unique.) This way the only allocation the data collector does is if the hash (object) has to grow (we don't currently have a way to measure and subtract this overhead, so it does perturb the data somewhat). There is no allocation of pairs (e.g., {<fnname>: <timestamp>}) or strings (e.g., <fnname> + ": " + <timestamp>), there is no array overhead (for players where arrays are not specialized -- perhaps this should be reevaluated this for FP7 where it appears there may be a special implementation of integer-keyed arrays).

The data spooler similarly to dumps the collected statistics as efficiently as possible. It runs in an idle loop, and breaks up the work to avoid being aborted by the player idle limits. It simply outputs the objects in hash order (using 'for in') to a socket. Note that the collection can get ahead of the spooling, so you typically need watch the output stream to decide when to stop your application (which will also stop collection).

The overhead of sorting and analyzing the data is all done off-line. The report generator does its own sorting, but if you want to analyze the raw data yourself, you will have to sort it first, to make any sense of it. The report generator is not sensitive to the data order, so you can sort and trim the raw data as you see fit and still run it through the report generator if you want to analyze just a portion of a run.

Controlling profiling

You can turn profiling off for a range of code using:

 #pragma 'profile=false'
 [... this code will not get profiled ...]
 #pragma 'profile=true'

You can turn profiling off and on at runtime by:

 var tmp = $lzprofiler;
 $lzprofiler = null;

and back on:

 $lzprofiler = tmp;

[_pw9_]

nvnv