2025-04-01 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report
This article explains the correct way to read a string from a UTF-8 stream in C#. Many developers run into this problem in real projects, so let's walk through how to handle it.
The following code reads a UTF-8-encoded string from a stream. Consider first what might go wrong with it.
string ReadString(Stream stream)
{
    var sb = new StringBuilder();
    var buffer = new byte[4096];
    int readCount;
    while ((readCount = stream.Read(buffer)) > 0)
    {
        var s = Encoding.UTF8.GetString(buffer, 0, readCount);
        sb.Append(s);
    }
    return sb.ToString();
}
The problem is that in some cases the returned string differs from the string that was originally encoded.
For example, the smiley face emoji 😊 is sometimes decoded as four unknown characters (the Unicode replacement character U+FFFD, displayed as �):
Encoded string: 😊
Decoded string: ����
UTF-8 uses 1 to 4 bytes to represent a single Unicode character (code point). For more information about string encoding, please refer to the article on character encoding.
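To make the 1-to-4-byte range concrete, the following sketch prints the UTF-8 bytes of four characters taken from different Unicode ranges. It assumes .NET 5 or later (top-level statements), and the sample characters are arbitrary. Note that 😊 lies outside the Basic Multilingual Plane, so inside a .NET string it also occupies two UTF-16 chars (a surrogate pair).

```csharp
using System;
using System.Text;

// Arbitrary sample characters from different Unicode ranges.
// Escapes are used so the encoding of the source file does not matter.
string[] samples = { "A", "\u00E9", "\u20AC", "\U0001F60A" }; // A, é, €, 😊

foreach (string s in samples)
{
    byte[] bytes = Encoding.UTF8.GetBytes(s);
    // char.ConvertToUtf32 combines a surrogate pair into a single code point.
    Console.WriteLine($"U+{char.ConvertToUtf32(s, 0):X4}: {bytes.Length} byte(s) [{string.Join(", ", bytes)}]");
}

// Output:
// U+0041: 1 byte(s) [65]
// U+00E9: 2 byte(s) [195, 169]
// U+20AC: 3 byte(s) [226, 130, 172]
// U+1F60A: 4 byte(s) [240, 159, 152, 138]
```

The last line matches the four bytes of the smiley face used in the demonstration that follows.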
The Stream.Read method can return anywhere from 1 to buffer.Length bytes per call (it returns 0 only at the end of the stream), even when more data is still to come. This means the buffer may end partway through a UTF-8 character.
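If the UTF-8 encoding of the last character in the buffer is incomplete, Encoding.UTF8.GetString receives an invalid UTF-8 sequence. Because it cannot guess the missing bytes, it substitutes the replacement character U+FFFD for the truncated sequence, and the leftover bytes at the start of the next chunk are replaced the same way, so the method returns a corrupted string.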
We demonstrate the above behavior using the following code:
var bytes = Encoding.UTF8.GetBytes("😊");
// bytes = new byte[4] { 240, 159, 152, 138 }
var sb = new StringBuilder();
// Simulate reading the data stream byte by byte
for (var i = 0; i < bytes.Length; i++)
{
    sb.Append(Encoding.UTF8.GetString(bytes, i, 1));
}
Console.WriteLine(sb.ToString());
// "����" instead of "😊"
Encoding.UTF8.GetBytes(sb.ToString());
// new byte[12] { 239, 191, 189, 239, 191, 189, 239, 191, 189, 239, 191, 189 }
How to fix the code
There are several ways to fix the code.
The first method: convert the byte array to a string only after you have received all of the data.
string ReadString(Stream stream)
{
    using var ms = new MemoryStream();
    var buffer = new byte[4096];
    int readCount;
    while ((readCount = stream.Read(buffer)) > 0)
    {
        // Accumulate raw bytes; nothing is decoded until the stream is exhausted
        ms.Write(buffer, 0, readCount);
    }
    return Encoding.UTF8.GetString(ms.ToArray());
}
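To check that the first method really survives a character split across reads, the sketch below runs the original per-chunk loop and the MemoryStream version over the same bytes with a deliberately tiny buffer. The method names ReadStringPerChunk and ReadStringAllAtOnce and the bufferSize parameter are hypothetical additions for this demo, and it assumes .NET 5 or later (top-level statements). MemoryStream.Read returns as many bytes as fit in the buffer, so a 3-byte buffer delivers the 4-byte emoji as a 3-byte chunk followed by a 1-byte chunk.

```csharp
using System;
using System.IO;
using System.Text;

// Same logic as the original ReadString: each chunk is decoded on its own (buggy).
string ReadStringPerChunk(Stream stream, int bufferSize)
{
    var sb = new StringBuilder();
    var buffer = new byte[bufferSize];
    int readCount;
    while ((readCount = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        sb.Append(Encoding.UTF8.GetString(buffer, 0, readCount));
    }
    return sb.ToString();
}

// Same logic as the first fix: collect every byte, then decode once.
string ReadStringAllAtOnce(Stream stream, int bufferSize)
{
    using var ms = new MemoryStream();
    var buffer = new byte[bufferSize];
    int readCount;
    while ((readCount = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        ms.Write(buffer, 0, readCount);
    }
    return Encoding.UTF8.GetString(ms.ToArray());
}

string original = "\U0001F60A"; // 😊, 4 bytes in UTF-8
byte[] data = Encoding.UTF8.GetBytes(original);

Console.WriteLine(ReadStringPerChunk(new MemoryStream(data), 3) == original);  // False
Console.WriteLine(ReadStringAllAtOnce(new MemoryStream(data), 3) == original); // True
```

The trade-off of this approach is that the whole payload is held in memory as bytes (and copied once more by ToArray) before any decoding happens, which is usually acceptable for small messages.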
The second method: wrap the stream in a StreamReader that uses the correct encoding.
string ReadString(Stream stream)
{
    using var sr = new StreamReader(stream, Encoding.UTF8);
    return sr.ReadToEnd();
}
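StreamReader handles split characters correctly because it decodes through a stateful System.Text.Decoder internally. One detail worth knowing: with the two-argument constructor used above, disposing the reader also closes the underlying stream. If the caller still owns the stream, a sketch using the leaveOpen overload might look like this. ReadStringWithReader is a hypothetical name, the bufferSize of 128 and the test input are arbitrary choices for the demo, and .NET 5 or later is assumed.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: like the second method, but leaves the caller's stream open.
string ReadStringWithReader(Stream stream)
{
    // StreamReader decodes through a stateful Decoder, so a character whose bytes
    // straddle two reads from the stream is still decoded correctly.
    // leaveOpen: true means disposing the reader does not dispose the stream.
    using var sr = new StreamReader(stream, Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true, bufferSize: 128, leaveOpen: true);
    return sr.ReadToEnd();
}

// 100 repetitions of "a😊" (5 bytes each, 500 bytes total), so some read
// boundary is likely to fall inside an emoji.
string original = string.Concat(Enumerable.Repeat("a\U0001F60A", 100));
var input = new MemoryStream(Encoding.UTF8.GetBytes(original));

string decoded = ReadStringWithReader(input);
Console.WriteLine(decoded == original); // True
Console.WriteLine(input.CanRead);       // True: the stream is still open
```

Whether leaveOpen should be true depends on who owns the stream; the original two-argument version is fine when the reader is the stream's last user.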
In addition, you can use the System.Text.Decoder class, which keeps track of incomplete byte sequences between calls, to decode the characters in each buffer correctly. When performance matters, the PipeReader and Rune types can be used to read the data with fewer memory allocations.
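A minimal sketch of the Decoder approach follows, assuming .NET 5 or later; ReadStringWithDecoder is a hypothetical name. The key difference from the original loop is that the Decoder returned by Encoding.UTF8.GetDecoder() remembers an incomplete byte sequence between calls, so a fixed-size buffer never has to contain a whole character.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: decodes chunk by chunk with a stateful Decoder.
string ReadStringWithDecoder(Stream stream, int bufferSize = 4096)
{
    var sb = new StringBuilder();
    var bytes = new byte[bufferSize];
    // Upper bound on the chars one GetChars call can produce; the Encoding docs
    // note this bound accounts for bytes left over from a previous decoder call.
    var chars = new char[Encoding.UTF8.GetMaxCharCount(bufferSize)];
    Decoder decoder = Encoding.UTF8.GetDecoder();

    int readCount;
    while ((readCount = stream.Read(bytes, 0, bytes.Length)) > 0)
    {
        // flush: false keeps an incomplete trailing sequence inside the decoder,
        // so the next chunk can complete it instead of it becoming U+FFFD.
        int charCount = decoder.GetChars(bytes, 0, readCount, chars, 0, flush: false);
        sb.Append(chars, 0, charCount);
    }

    // End of stream: flush so a truncated sequence still pending in the decoder
    // is emitted as U+FFFD rather than silently dropped.
    int tailCount = decoder.GetChars(Array.Empty<byte>(), 0, 0, chars, 0, flush: true);
    sb.Append(chars, 0, tailCount);
    return sb.ToString();
}

string original = "\U0001F60A"; // 😊
// A 1-byte buffer is the worst case: every read ends mid-character.
string decoded = ReadStringWithDecoder(new MemoryStream(Encoding.UTF8.GetBytes(original)), bufferSize: 1);
Console.WriteLine(decoded == original); // True
```

Compared with the first method, this never holds the entire byte payload in memory at once; only the decoded text accumulates in the StringBuilder.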