How VBS strings are implemented internally

Updated 2025-02-24. From: SLTechnology News&Howtos (shulou)

This article explains how VBS (VBScript) strings are implemented internally, a topic most VBScript users never look into.

VBS is built on Microsoft's ActiveX/COM technology. To be usable from any language, COM defines a set of language-neutral data types, which Microsoft calls Automation data types. One of them is BSTR. VBS represents a string internally as a BSTR, and BSTR is defined in WTypes.h:

typedef wchar_t WCHAR;
typedef WCHAR OLECHAR;
typedef OLECHAR *BSTR;

As the definition shows, BSTR is a pointer to wchar_t (the wide-character type in C, a 16-bit Unicode code unit on Windows), but it is not an ordinary wchar_t pointer. A standard BSTR points to a wchar_t array that carries a length prefix and a NUL terminator. The prefix occupies the 4 bytes immediately before the address the BSTR points to. Its value is the length of the string in bytes, not counting the NUL terminator. For the commonly used BSTR functions, see the MSDN documentation.
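The layout just described can be sketched in portable C. This is only an illustration under stated assumptions: the `demo_` names are hypothetical stand-ins, `uint16_t` models the 2-byte `wchar_t` used on Windows, and the code is not how `SysAllocStringLen` is actually implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for OLECHAR and BSTR so the sketch builds anywhere. */
typedef uint16_t demo_olechar;
typedef demo_olechar *demo_bstr;

/* Allocate [4-byte byte count][len code units][2-byte NUL] and return a
   pointer to the first code unit. Sketch only: no overflow check on len. */
demo_bstr demo_alloc_string_len(const demo_olechar *src, uint32_t len)
{
    uint32_t byte_len = len * (uint32_t)sizeof(demo_olechar);
    unsigned char *block = malloc(sizeof(uint32_t) + byte_len + sizeof(demo_olechar));
    demo_bstr s;

    if (block == NULL)
        return NULL;
    memcpy(block, &byte_len, sizeof byte_len);       /* length prefix, in bytes */
    s = (demo_bstr)(block + sizeof(uint32_t));       /* the pointer the caller holds */
    if (src != NULL)
        memcpy(s, src, byte_len);                    /* copies embedded NULs too */
    else
        memset(s, 0, byte_len);
    s[len] = 0;                                      /* terminator, not counted */
    return s;
}

/* Read the byte count stored immediately before the character data. */
uint32_t demo_byte_len(demo_bstr s)
{
    uint32_t n;

    assert(s != NULL);
    memcpy(&n, (unsigned char *)s - sizeof n, sizeof n);
    return n;
}

/* Free the whole block, which starts 4 bytes before the pointer. */
void demo_free_string(demo_bstr s)
{
    if (s != NULL)
        free((unsigned char *)s - sizeof(uint32_t));
}
```

Allocating the 11 code units of "Hello", NUL, "world" this way gives a byte count of 22, with the terminator sitting at index 11, just past the counted data.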

The theory is a little abstract, so let's use some code to illustrate it:

str = "Hello" & Chr(0) & "world"

This is a single line of VBS, but what does the VBScript interpreter do internally? Leaving the string concatenation aside, it ends up initializing a BSTR variable:

/* for demonstration only; the real interpreter code certainly looks different */
BSTR str = SysAllocStringLen(L"Hello\0world", 11);

To better understand the structure of a BSTR, let's write it another way:

/* the BSTR has a length prefix, but the pointer itself points to the first character */
wchar_t arr[] = {22, 0, L'H', L'e', L'l', L'l', L'o', 0, L'w', L'o', L'r', L'l', L'd', 0};
BSTR str = &arr[2];

The structure of this BSTR in memory is:

00000000  16 00 00 00 48 00 65 00 6C 00 6C 00 6F 00 00 00
00000010  77 00 6F 00 72 00 6C 00 64 00 00 00

The first four bytes (16 00 00 00, offsets 00 to 03) are the length prefix: 0x16 is 22, the length in bytes. The BSTR pointer itself points at offset 04, the 48 00 of "H". The 00 00 at offsets 0E and 0F is the Chr(0) character inside the string, and the final 00 00 at offsets 1A and 1B is the closing NUL of the BSTR. This character is added by the SysAllocStringLen function and takes up two bytes because the string is Unicode. In other words, if you ignore the four bytes in front of it, a BSTR is an ordinary null-terminated wide string in C.
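The dump above can be reproduced with a small portable sketch. The helper names `demo_layout` and `demo_hexdump` are hypothetical, and the bytes are written out in little-endian order explicitly (the order Windows uses) so the result does not depend on the host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Serialize a BSTR-style block byte by byte: 4-byte byte count, the UTF-16
   code units, then a 2-byte NUL. Returns the number of bytes written, or 0
   if the output buffer is too small. */
size_t demo_layout(const uint16_t *units, uint32_t len,
                   unsigned char *out, size_t cap)
{
    size_t need = 4 + (size_t)len * 2 + 2;
    uint32_t byte_len = len * 2;
    size_t pos = 0;

    assert(units != NULL && out != NULL);
    if (cap < need)
        return 0;
    for (int i = 0; i < 4; i++)                  /* length prefix, low byte first */
        out[pos++] = (unsigned char)((byte_len >> (8 * i)) & 0xFF);
    for (uint32_t i = 0; i < len; i++) {         /* character data */
        out[pos++] = (unsigned char)(units[i] & 0xFF);
        out[pos++] = (unsigned char)(units[i] >> 8);
    }
    out[pos++] = 0;                              /* two-byte terminator */
    out[pos++] = 0;
    return pos;
}

/* Print the bytes 16 per row with a hexadecimal offset column. */
void demo_hexdump(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i % 16 == 0)
            printf("%s%08zX ", i ? "\n" : "", i);
        printf(" %02X", buf[i]);
    }
    printf("\n");
}
```

For the 11 code units of the example string, `demo_layout` writes 28 bytes: 4 for the prefix, 22 for the characters, and 2 for the terminator.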

Look at another piece of VBS code:

MsgBox Len(str)

This uses MsgBox to display the length of the string just defined. What happens inside the VBScript interpreter? Does it work like the C standard library function strlen, walking the string until it reaches a NUL?

/* a simple implementation of the C strlen function */
size_t strlen(const char *str)
{
    const char *eos = str;
    while (*eos++)
        ;
    return (size_t)(eos - str - 1);
}

The answer is obviously no. The string contains Chr(0), so a strlen-style implementation would stop there and return 5, whereas Len actually returns the correct value, 11.

The internal implementation of VBS's Len function is presumably something like this:

/* as above, for demonstration only */
size_t Len(const BSTR str)
{
    return SysStringLen(str);
}

Or it could skip the Windows API altogether. The 4-byte prefix in front of the BSTR holds the length of the string in bytes (excluding the NUL character at the end), so it only needs to step the pointer back:

/* cast to an int pointer, step back one element and read it, then divide by 2 (a Unicode character takes two bytes) */
size_t Len(const BSTR str)
{
    return *((int *)str - 1) / 2;
}

As you can see, because the length of a BSTR comes from its prefix, NUL is not needed as a string terminator. In other words, VBS strings are binary safe.
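To check this on the example string, here is a minimal portable sketch that compares the two ways of measuring length. It assumes 16-bit code units modeled as `uint16_t` and a little-endian prefix as on Windows, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count code units up to the first NUL, the way a strlen-style scan would. */
size_t demo_scan_len(const uint16_t *s)
{
    const uint16_t *p = s;

    assert(s != NULL);
    while (*p != 0)
        p++;
    return (size_t)(p - s);
}

/* Length from the prefix: the two 16-bit units before the data hold the
   32-bit byte count, low half first. Divide by 2 to get characters.
   The caller must pass a pointer that really has such a prefix before it. */
size_t demo_prefix_len(const uint16_t *s)
{
    uint32_t byte_len;

    assert(s != NULL);
    byte_len = (uint32_t)s[-2] | ((uint32_t)s[-1] << 16);
    return byte_len / 2;
}
```

With the array `{22, 0, 'H', 'e', 'l', 'l', 'o', 0, 'w', 'o', 'r', 'l', 'd', 0}` and a pointer to element 2, the scan stops at the embedded NUL and yields 5, while the prefix yields 11.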

So why does the following code display only Hello?

MsgBox str

This seems to contradict what was said above, but it does not. VBS strings really can contain Chr(0) characters. MsgBox truncates at Chr(0) because it internally calls the MessageBox function, which treats NUL as the string terminator.

/*
 * For simplicity, only one parameter is implemented.
 * The second parameter of MessageBox is NUL-terminated:
 *   "Pointer to a null-terminated string that contains the message to be displayed."
 * so a Chr(0) inside the VBS string truncates the displayed text.
 */
int MsgBox(const BSTR str)
{
    return MessageBoxW(NULL, str, L"", 0);
}

That is, whenever a VBS built-in function or a COM component method passes the string to a Windows API that treats NUL as the terminator, the string is truncated at the Chr(0) character.
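One way to see where such a truncation would happen is to look for a NUL inside the counted length. The sketch below is an illustration, not part of VBScript or the Windows API: `demo_first_nul` is a hypothetical helper, and `uint16_t` again models the 2-byte code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first NUL code unit within the first len units,
   or -1 if there is none. A NUL-terminated consumer would stop reading at
   that index, so everything after it is lost to that consumer. */
long demo_first_nul(const uint16_t *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (s[i] == 0)
            return (long)i;
    return -1;
}
```

For the example string with length 11 the result is 5, which matches the "Hello" that MsgBox displays. For a string with no embedded Chr(0) the result is -1 and nothing is cut off.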
