
Example Analysis of Chinese and related problems in J2EE Web components

Updated 2025-01-18. From: SLTechnology News&Howtos


This article shares a sample analysis of Chinese-text and related encoding problems in J2EE Web components. The editor finds it practical and offers it here as a reference.


3. Send Chinese to the server

Although we generally do not use JSP, or even a servlet, to process data submitted by clients or to read request parameters, a JSP is always easier to write and update than a Servlet or JavaBean (at least in Tomcat 4.0.4, where modifying a Servlet or JavaBean often means restarting the server). So here we still use JSP to access the request parameters.

Whether in JSP or Servlet, we use the getParameter(String name) method of ServletRequest (or one of its subinterfaces) to read a request parameter. It returns a String, which means that what we get is a string already decoded from the byte stream that arrived from the Internet. It would be perfect if the server decoded these bytes correctly. In principle this is simple: the server only needs to know which encoding the client used to produce the bytes. As shown in Figure 3-1, we want decoding == encoding.


Figure 3-1 data from browser to server

But ideal and reality seldom meet: we have no way to let the server know which encoding these bytes were produced with. Even in the W3C (World Wide Web Consortium) "HTML 4.01 Specification" and in RFC 2616 ("Hypertext Transfer Protocol -- HTTP/1.1") from the Internet Engineering Task Force (IETF), the main bodies that design the World Wide Web, I found no recommendation that would let the server learn this encoding with existing browsers and HTML Web pages (please let me know if you know a way). So by default Tomcat 4.0.4 decodes the data submitted by the client as ISO8859-1. In Figure 3-1, then, decoding == ISO8859-1, and if encoding != ISO8859-1 the data is obviously corrupted in transit. Note that the data discussed here is only the data in the Entity Body of what the client sends to the server.
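The mismatch described above can be reproduced without a server. The following is a minimal sketch, not Tomcat's actual code: the class and method names are hypothetical, and it assumes the JVM ships the GBK charset (most full JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodingMismatchDemo {
    // Simulates the trip in Figure 3-1: the browser encodes, the server decodes.
    static String transmit(String text, Charset encoding, Charset decoding) {
        byte[] wire = text.getBytes(encoding);   // what the browser puts on the wire
        return new String(wire, decoding);       // what getParameter() would hand back
    }

    public static void main(String[] args) {
        // "I am Chinese" written with Unicode escapes so the source file stays ASCII.
        String sent = "\u6211\u662f\u4e2d\u56fd\u4eba";
        Charset gbk = Charset.forName("GBK");    // assumption: GBK is available

        // decoding == encoding: the text survives.
        System.out.println(sent.equals(transmit(sent, gbk, gbk)));

        // decoding == ISO8859-1 but encoding == GBK: the text is garbled.
        System.out.println(sent.equals(transmit(sent, gbk, StandardCharsets.ISO_8859_1)));
    }
}
```

The second call does not lose any bytes; it only interprets them wrongly, which is what makes the repair described later possible.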

1. Who decides Encoding?

Who decides the encoding of the byte stream that the browser sends to the Internet (ultimately, to the server) when the current Web page submits data through a Form? The browser, of course. And how does the browser choose this encoding? It inherits the decoding it used to decode the current page (the page itself had to be decoded too; remember that any file, and anything on the Internet, is fundamentally bytes). In fact, the encoding is not always inherited, as you will see later.

2. What determines Encoding?

Six kinds of information make the browser decide which encoding to use:

1) determined by xsl

2) Special tags in entities (Entity Body)

3) the decoding manually set by the user on the Web page

4) Content-Type in response header (Response Header Field)

5) charset in HTML element META

6) decoding previously used by browsers

Their priority may vary from browser to browser, but in IE 6.0 it decreases in the order listed. Microsoft's approach is ambiguous and intriguing, so it is no wonder it won the browser war. The following explains these six points one by one.

XSL (eXtensible Stylesheet Language) can easily convert XML (eXtensible Markup Language) into many other kinds of content. Here we care only about its effect on the browser's encoding setting when it converts XML to HTML. If the XSL follows the first XSL working draft released by W3C in 1998 (that is, the XSL file sets xmlns:xsl="http://www.w3.org/TR/WD-xsl"), you can set the browser encoding by adding a meta element with the corresponding charset setting to the XSL.

If the XSL is the de facto standard XSLT (eXtensible Stylesheet Language Transformations), that is, xmlns:xsl="http://www.w3.org/1999/XSL/Transform", the browser unconditionally uses UTF-16, even if a meta element is added to the XSL file to set the encoding.

The entity is the entity in the data the server returns to the browser; put simply, it is everything returned except the headers and the blank line that follows them. As mentioned earlier, Windows 2000 Server (the operating system I used in my experiments) adds three rogue bytes (0xEF 0xBB 0xBF) to the start of a file saved as UTF-8. If the current Web page is a static resource, the server returns it to the client without any processing. If that static page was also created on Windows 2000 Server, the first three bytes of the entity will be 0xEF 0xBB 0xBF; the browser detects these three bytes at once and decodes the page as UTF-8, and the remaining four points are ignored.

Setting the decoding manually means choosing the encoding in the browser window, which tells the browser explicitly which encoding to use for the Web page. The user is always right.
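The three bytes 0xEF 0xBB 0xBF are the UTF-8 byte order mark. As a rough illustration of the check the browser is described as making, here is a minimal sketch; the class name, method names, and the fallback parameter are hypothetical, and real browsers apply more rules than this.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8BomSniffer {
    // True when the entity starts with the three bytes 0xEF 0xBB 0xBF.
    static boolean startsWithUtf8Bom(byte[] entity) {
        return entity.length >= 3
                && (entity[0] & 0xFF) == 0xEF
                && (entity[1] & 0xFF) == 0xBB
                && (entity[2] & 0xFF) == 0xBF;
    }

    // Decodes as UTF-8 (skipping the mark) when it is present,
    // otherwise falls back to whatever the other rules selected.
    static String decodeEntity(byte[] entity, Charset fallback) {
        if (startsWithUtf8Bom(entity)) {
            return new String(entity, 3, entity.length - 3, StandardCharsets.UTF_8);
        }
        return new String(entity, fallback);
    }
}
```

Masking each byte with 0xFF matters because Java bytes are signed, so a direct comparison with 0xEF would never match.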

Tomcat 4.0.4 does not set Content-Type in the response header when it returns a static Web page. In a JSP we set it with the contentType attribute of the page directive; in a Servlet we use:

response.setContentType("text/html;charset=gb2312");

This sets Content-Type in the response header (Response Header). Its value follows the MIME (Multipurpose Internet Mail Extensions) specification. Figure 3-2 shows all the data the client receives when it requests the JSP page http://localhost/scqdac/t.jsp; 0x7d is the length of the valid data in the entity.
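To make the shape of the header value concrete, the sketch below pulls the charset parameter out of a Content-Type value. It is a simplified illustration with hypothetical names, not a full MIME parser: it assumes parameters are separated by semicolons and that quoted values contain no semicolons.

```java
import java.util.Locale;

final class ContentTypeCharset {
    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Parameter names are case-insensitive, so compare in lower case.
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }
}
```

For the value used in this article, charsetOf("text/html;charset=gb2312") yields "gb2312", and a value without a charset parameter yields null.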

Figure 3-2 all the data returned by the server to the browser through the http protocol

As the figure shows, the Content-Type in the response header is independent of the Content-Type in the HTML meta element. All the source code for t.jsp is as follows:

In a JSP file, the Content-Type of the meta element no longer has any effect, and nothing pays attention to it. For a detailed definition and description of Content-Type in the response header, see RFC 2616: http://www.ietf.org/rfc/rfc2616.txt.

We can also set the encoding of an HTML page through the charset in the HTML meta element of the Web page.

For the document character set (Document Character Set) of HTML documents, see the "HTML 4.01 Specification", http://www.w3c.org/TR/html401/html401.html (which also explains the enctype attribute of the HTML form element and what it does).

If the charset and related information cannot be found, the browser uses the most recently used encoding.

As mentioned earlier, the browser's encoding is not always inherited from its decoding. When the page was decoded as UTF-16, the entity sent to the server is still encoded as UTF-8, at least by default in IE 6.0. Another interesting case is encoding == ISO8859-1 with the request header (Request Header Field)

Content-Type: application/x-www-form-urlencoded

When the Chinese text "我是中国人" ("I am Chinese") is sent to the server through a Form field named text, it is encoded as:

%26%2325105%3B%26%2326159%3B%26%2320013%3B%26%2322269%3B%26%2320154%3B

This is indeed an interesting encoding. After a little analysis we find that the unescaped string is

&#25105;&#26159;&#20013;&#22269;&#20154;

So on the server side, the string we obtain through

request.getParameter("text")

will also be "&#25105;&#26159;&#20013;&#22269;&#20154;". These are obviously the numeric character references used by SGML (Standard Generalized Markup Language), which HTML handles well. So if the JSP named by the form's action attribute writes this string back into the page, the browser will display "我是中国人" ("I am Chinese") again. We can leave it as it is, or we can easily process it ourselves; after all, it is a standard form.
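If we do want to process the references ourselves, the two steps (URL unescaping, then resolving the decimal references) can be sketched as follows. The class and method names are hypothetical, and only decimal references of the form &#NNNNN; are handled; inside a servlet container the first step has already been done by getParameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericReferenceDecoder {
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // Step 1: undo the %XX escaping of application/x-www-form-urlencoded.
    static String unescape(String formValue) {
        try {
            return URLDecoder.decode(formValue, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Step 2: replace each decimal numeric character reference with its character.
    static String resolveReferences(String text) {
        Matcher m = DECIMAL_REF.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group());   // leave out-of-range references untouched
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

Applied to the encoded form value shown above, unescape produces the five numeric references and resolveReferences turns them into the five Chinese characters U+6211, U+662F, U+4E2D, U+56FD, U+4EBA.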

3. Give us back the byte string.

Without telling us, the server takes it upon itself to execute something like

String str = new String(bytes, "ISO8859-1");

and does not let us read the byte string of a request parameter directly (the Servlet API has no method for this). But we can still make ServletRequest give the bytes back by performing the inverse operation, encoding with ISO8859-1:

String str = request.getParameter("text");

byte[] bs = str.getBytes("ISO8859-1");

At this point we have good reason to believe that bs is exactly what the client sent to the server. Decoding a byte stream as ISO8859-1 never distorts it: every byte becomes one character whose high-order byte is 0. So when we encode that string with ISO8859-1 again, we lose nothing and get back the original byte string.
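The claim that ISO8859-1 decoding loses nothing can be checked exhaustively, since there are only 256 byte values. The following is a small sketch with hypothetical names; it runs outside any servlet container.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    // The inverse operation from the text: string back to the bytes it was decoded from.
    static byte[] recoverBytes(String decodedAsLatin1) {
        return decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Checks every possible byte value: each decodes to a char whose high byte is 0,
    // and encoding the string again yields the original bytes.
    static boolean roundTripsAllByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, StandardCharsets.ISO_8859_1);
        for (int i = 0; i < decoded.length(); i++) {
            if ((decoded.charAt(i) >> 8) != 0) {
                return false;
            }
        }
        return Arrays.equals(all, recoverBytes(decoded));
    }
}
```

The same check would fail for most multi-byte charsets, where some byte sequences are invalid and get replaced during decoding; that is why the default of ISO8859-1 is what makes the recovery safe.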

4. Re-decode

Once we re-decode this byte string with the correct encoding, we get the real characters the client submitted. Their internal codes may differ from those on the client, but they are definitely the same characters.

String str = request.getParameter("text");

byte[] bs = str.getBytes("ISO8859-1");

String text = new String(bs, "GBK");

Of course, the simplest and most effective way is:

request.setCharacterEncoding("GBK");

String text = request.getParameter("text");
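The whole repair can be exercised without a servlet container by simulating what the container does. This is a sketch under stated assumptions: containerDecode stands in for the server's default ISO8859-1 decoding, all names are hypothetical, and the GBK charset must be available in the JVM for the main method to run.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RedecodeDemo {
    // Stand-in for the container: decodes the request bytes as ISO8859-1 by default.
    static String containerDecode(byte[] wire) {
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The repair from the text: recover the raw bytes, then decode with the real encoding.
    static String redecode(String fromContainer, Charset actualEncoding) {
        byte[] bs = fromContainer.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bs, actualEncoding);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");            // assumption: GBK is available
        String typed = "\u6211\u662f\u4e2d\u56fd\u4eba"; // what the user typed
        String garbled = containerDecode(typed.getBytes(gbk));
        System.out.println(typed.equals(garbled));                // false: wrong decoding
        System.out.println(typed.equals(redecode(garbled, gbk))); // true: repaired
    }
}
```

Calling request.setCharacterEncoding("GBK") before the first getParameter call avoids the detour entirely, which is why the text recommends it.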

Sometimes, when experimenting, we omit the second argument of the String constructor and write

String text = new String(bs);

This works only because the default encoding of our system is GBK, and this String constructor uses the platform default.
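The dependence on the platform default is easy to see in isolation. In this sketch (hypothetical names), the one-argument constructor is compared with an explicit charset; note that on recent JDKs (18 and later) the default is UTF-8 unless configured otherwise, so the shortcut would not behave like GBK there.

```java
import java.nio.charset.Charset;

final class DefaultCharsetDemo {
    // What new String(bs) does implicitly: decode with the platform default charset.
    static String decodeWithDefault(byte[] bs) {
        return new String(bs);
    }

    // The explicit form, which behaves the same on every machine.
    static String decodeExplicit(byte[] bs, Charset charset) {
        return new String(bs, charset);
    }

    public static void main(String[] args) {
        // Shows which charset the one-argument constructor would use on this machine.
        System.out.println(Charset.defaultCharset());
    }
}
```

Passing the charset explicitly keeps the code correct when it is deployed on a server whose default differs from the development machine.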

At this point we may really feel how nice it would be to know which encoding the client browser used, but we cannot. Do not expect ServletRequest.getCharacterEncoding() to tell you anything. Unless you explicitly call setCharacterEncoding(String encoding) on the ServletRequest object on the server side to set the encoding of the request data, getCharacterEncoding() on that object returns null. That makes the following a "perfect" combination:

request.setCharacterEncoding(request.getCharacterEncoding());

Either way, getCharacterEncoding() is no help: it only knows what I told it, so I do not need it to tell me.

Three of the six factors that determine the browser's encoding can be controlled by the server (though we disdain the first one, even if it is the most effective against IE). If the form that submits data to the server is in a static Web page, we set the charset in the HTML meta element; if the form is in a JSP, we set contentType in the page directive. Then, when processing the data submitted by the form, we can re-decode the byte string with the corresponding encoding. But do not be careless: this approach is not completely reliable, because the user may have reset the browser's encoding manually (the third of the six factors). Fortunately, as long as the Web page contains anything other than English characters, users will not bother to perform this illegitimate operation unless they really want to see garbled text.
