
JSPUTF8



Using International Text with Laszlo and JSP

Henry Minsky hminsky@laszlosystems.com

As of Laszlo 3.0, Unicode text is supported by Laszlo applications. The Flash Player, as of version 6, supports Unicode using the UTF-8 character set encoding.

A common goal is to write a back-end JSP (Java Server Pages) servlet to handle communication with a Laszlo or Flash application. However, there are several subtle issues when using non-ASCII text with the Flash runtime.

Below is a Laszlo application named test-char.lzx that shows how to get Unicode data into and out of a Laszlo application. The app sends a Japanese character to the server; a JSP page named echoUTF8.jsp echoes the data back to the app, where it is displayed.

There are two stumbling blocks you will encounter: one is the way the Flash runtime encodes HTTP request content, and the other is the way the Java JSP implementation on the server decodes character strings from HTTP requests.

Example code

First, here's the code for the Laszlo test application.

'''test-char.lzx:'''

<canvas debug="true" width="1200" height="600">
  <debug y="150" height="400" width="800" fontsize="14" />
  <dataset name="foo" src="http:echoUTF8.jsp" 
           request="false" 
           ontimeout="Debug.write(this.name + ': timed out')"
           onerror="Debug.write(this.name + ': error')"
           ondata="Debug.write('Response: ' + this.getPointer().serialize())" >
  </dataset>
  <simplelayout/>
  <button text="clickme">
    <method event="onclick">
      foo.setQueryParam('j', "&#20844;"); 
      foo.setQueryType(r1.value);
      Debug.write("setQueryType", r1.value);
      Debug.write("calling foo.doRequest()", foo);
      foo.doRequest();
    </method>
  </button>
  <radiogroup id="r1">
    <radiobutton value="'GET'">GET</radiobutton>
    <radiobutton value="'POST'" selected="true">POST</radiobutton >
  </radiogroup>
        

  <view fontsize="18" layout="axis:x"           
        datapath="foo:/response/formcomponent">
    <text bgcolor="#cccccc" width="200" visible="true"
          datapath="@name"/>
    <text bgcolor="#cccccc" width="300" visible="true" multiline="true"
          datapath="text()"/>
    <text bgcolor="#cccccc" width="300" visible="true"  multiline="true"
          datapath="@hex"/>
  </view>
</canvas>

And this is the JSP page that echoes the request data back to the client.

'''echoUTF8.jsp:'''

<%@ page import="java.util.*" %>
<%@ page contentType="text/xml; charset=UTF-8" %>
<response>
<%
    String method = request.getMethod();
    // Must be called before any parameter is read. It applies to the request
    // body (POST data); how a GET query string is decoded is up to the container.
    request.setCharacterEncoding("UTF-8");
    Enumeration params = request.getParameterNames();
    while (params.hasMoreElements()) {
        String n = (String) params.nextElement();
        String[] v = request.getParameterValues(n);
        for (int i = 0; i < v.length; i++) {
            // Turn the already URL-decoded value back into raw bytes, using
            // the encoding that matches the request method.
            byte[] p;
            if (method.equals("POST")) {
                p = v[i].getBytes("UTF-8");       // WORKS for PROXIED POST
            } else {
                p = v[i].getBytes("ISO-8859-1");  // OK for GET, proxied and direct
            }
            // Decode the raw bytes as UTF-8 to get the Unicode string
            String nstr = new String(p, 0, p.length, "UTF-8");
            // Print out the hex encoding of the bytes for debugging
            StringBuffer debug = new StringBuffer();
            for (int ci = 0; ci < p.length; ci++) {
                debug.append(" 0x" + Integer.toString(p[ci] & 0xff, 16));
            }
            out.print("<formcomponent method='" + method + "' name='" +
                      n + "' hex='" + debug.toString() + "'><![CDATA[");
            out.print(nstr);
            out.println("]]></formcomponent>");
        }
    }
%>
</response>
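When the round trip works, the hex column in the app should show the UTF-8 bytes of the character that was sent. The following Java sketch computes that expected string offline, using the same " 0x.." formatting as the JSP. The class name ExpectedHex is made up for illustration, and the sketch assumes the character is the one written as &#20844; (U+516C) in test-char.lzx.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: computes the string echoUTF8.jsp would put in the
// hex attribute if the parameter arrived intact.
class ExpectedHex {
    static String hex(String s) {
        byte[] p = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder debug = new StringBuilder();
        for (byte b : p) {
            // Mask to 0..255 so bytes >= 0x80 are not printed as negative numbers
            debug.append(" 0x").append(Integer.toString(b & 0xff, 16));
        }
        return debug.toString();
    }

    public static void main(String[] args) {
        // U+516C is the character written as &#20844; in test-char.lzx
        String j = "\u516C";
        // hex(j) returns " 0xe5 0x85 0xac" (note the leading space)
        System.out.println(hex(j));
    }
}
```

If the app shows a different byte sequence for this character, one of the two decoding steps described below has gone wrong.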

Encoding of data in HTTP requests from the app

There are four possible cases for how the app makes requests to the server, depending on whether the app is proxied or serverless and whether it uses the GET or POST method.

Through empirical observation, I have discovered that the Flash runtime will use ISO-8859-1 encoding when making GET requests, and will use UTF-8 encoding for POST requests.

Request method   Proxied      Serverless (SOLO)
GET              ISO-8859-1   ISO-8859-1
POST             UTF-8        UTF-8


As far as I can deduce, what the Flash runtime does when making an HTTP GET request is to treat each string as a list of byte values and then URL-encode the byte values in that array as if they were ISO-8859-1 characters. In my opinion, this is a confusing and incorrect thing to do. For POST requests, the strings seem to be encoded as their raw UTF-8 byte codes (one, two, or three bytes per character), which seems like the correct thing to do.
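The GET behavior described above can be modeled in plain Java. This sketch assumes that the byte values Flash percent-encodes are the UTF-8 bytes of the string, and that the server decodes the query string as ISO-8859-1. Both are assumptions, but they are what the JSP's GET branch relies on when it re-encodes the value as ISO-8859-1 and then decodes it as UTF-8. The class GetEncodingDemo is hypothetical, and the Charset overloads of URLEncoder/URLDecoder require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetEncodingDemo {
    // Percent-encode the UTF-8 bytes of s: one %XX escape per byte.
    static String toWire(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Percent-decode, mapping the resulting bytes to characters with cs.
    static String fromWire(String wire, Charset cs) {
        return URLDecoder.decode(wire, cs);
    }

    public static void main(String[] args) {
        String j = "\u516C"; // the character sent by test-char.lzx
        String wire = toWire(j);
        System.out.println(wire); // %E5%85%AC
        // Decoded as ISO-8859-1, each of the three bytes becomes its own character
        System.out.println(fromWire(wire, StandardCharsets.ISO_8859_1).length()); // 3
        // Decoded as UTF-8, the three bytes become the original character again
        System.out.println(fromWire(wire, StandardCharsets.UTF_8).equals(j)); // true
    }
}
```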

Because the encoding depends on the request method, the JSP recovers the raw bytes of each incoming parameter value (after it has been URL-decoded implicitly by the getParameterValues() method) using the encoding that matches the method, and then decodes those bytes as UTF-8 to get a Java String.
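The per-method conversion can be pulled out of the JSP into a small helper, which makes it possible to test without a servlet container. This is a sketch: the class ParamRecovery and its recover() method are hypothetical names, and the GET branch assumes the container decoded the query string as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the conversion in echoUTF8.jsp.
class ParamRecovery {
    static String recover(String value, String method) {
        // POST: the body was already decoded as UTF-8 (setCharacterEncoding),
        //       so this round trip leaves the value unchanged.
        // GET:  assumes the container decoded the query string as ISO-8859-1,
        //       so re-encoding as ISO-8859-1 restores the raw bytes.
        byte[] raw = "POST".equals(method)
                ? value.getBytes(StandardCharsets.UTF_8)
                : value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u516C";
        // What a GET parameter looks like if the UTF-8 bytes were decoded as ISO-8859-1
        String getValue = new String(sent.getBytes(StandardCharsets.UTF_8),
                                     StandardCharsets.ISO_8859_1);
        System.out.println(recover(getValue, "GET").equals(sent)); // true
        System.out.println(recover(sent, "POST").equals(sent));    // true
    }
}
```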

Data is sent back from the JSP to the client using XML with UTF-8 encoding. This appears to work, regardless of whether the client app is running in proxied or serverless mode.
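One caveat with building the XML by string concatenation, as echoUTF8.jsp does: a value containing ]]> would end the CDATA section early, and a parameter name containing a quote would break the attribute. Below is a hedged sketch of a hypothetical helper, FormComponentXml, that builds the same formcomponent element while guarding against both; it is not part of the original example.

```java
// Hypothetical helper producing the <formcomponent> element the JSP writes.
class FormComponentXml {
    // Escape characters not allowed literally in a single-quoted attribute value.
    static String attr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("'", "&#39;");
    }

    // Wrap text in CDATA, splitting any "]]>" so it cannot close the section early.
    static String cdata(String s) {
        return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    static String element(String method, String name, String hex, String value) {
        return "<formcomponent method='" + attr(method) + "' name='" + attr(name)
                + "' hex='" + attr(hex) + "'>" + cdata(value) + "</formcomponent>";
    }

    public static void main(String[] args) {
        System.out.println(element("POST", "j", " 0xe5 0x85 0xac", "\u516C"));
    }
}
```

The response still has to be sent as UTF-8, which the page's contentType directive already takes care of.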


Oracle JSP integration

Someone on the forums mentioned that they had trouble with a connector from Oracle that sends XML via JSP pages, until they did the following:

Using the XSU utility, Oracle gives back the row in XML. In the calling JSP
it is necessary to write the line

   <%@ page language="java" contentType="text/html;charset=UTF-8"%>

To write UTF-8 text into Oracle, we use request type POST!


A more detailed example of using Oracle with JSP is here: OracleJSP