Data Affinity Analysis of web programming language

2025-04-27 Update. From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02 Report


Programming languages seem to have entered a period of vigorous development. Newer languages such as JavaScript, Perl, Python, Ruby and Groovy are becoming more familiar and more widely used, while mainstream languages such as C++, C# and Java keep absorbing functional and dynamic features. Programmers have more and more tools to choose from, and the comparisons and debates among languages in the community are growing more heated. We often see comparisons of "procedural versus object-oriented", "dynamic versus static languages", "imperative versus functional paradigms" and so on. I have noticed that most of these discussions focus on design-related topics, such as "duck-typing polymorphism in dynamic languages versus inheritance polymorphism in static languages" or "prototype-based versus class-based objects". But I think another very important aspect deserves attention: data processing.

Data processing matters because both local storage and information exchange between systems rely on some data format. Moreover, whatever paradigm a language belongs to and whatever patterns a design uses, a large part of a program's work at the micro level is data processing. Comparing languages from the perspective of data processing is therefore of real practical value. Although data is usually platform-independent, languages differ in how hard it is to handle data in a given format, and some data formats can in practice be handled only in specific languages. This is the difference in data affinity.

The data affinity of a language is the degree of fit between the language and a particular data format. It depends mainly on the language's data model, its type system, and its library support. The stronger a language's affinity for a data format, the easier it is to manipulate data in that format.

Binary byte block format

In low-level software such as operating systems, embedded systems and communication systems, binary byte blocks are the most common data format. Because the layout is compact and close to the machine, binary data is often used for inter-system communication and for system files. High-level languages, however, generally do not work on raw bits directly; they operate on structured representations such as records, structs and classes. This raises the problem of converting between low-level byte blocks and high-level structured data.

As the most important systems language, C has a high affinity for byte-block data. This is not only because C has pointers that can access memory directly, but also because a C struct can be mapped directly onto a byte block. For example, in a distributed system based on socket connections, the server and the client communicate through binary byte data. As long as both sides agree on a common struct in advance, the sender creates a variable of that struct type, fills in its fields, and copies the variable's memory block to the socket. The receiver reads the byte block from the socket and interprets it as the corresponding struct in order to read the fields. Neither side needs a complex encoding or decoding step. The sample code is as follows:

    struct t_data {
        int version;
        char type[10];
        float value;
    };

    // sender (sockfd is an already connected socket descriptor)
    struct t_data data;
    data.version = 1;
    strcpy(data.type, "degree");
    data.value = 189.0f;
    send(sockfd, (char *)&data, sizeof(data), 0);

    // receiver
    struct t_data data;
    read(sockfd, (char *)&data, sizeof(data));
    printf("%d, %s, %f\n", data.version, data.type, data.value);

In practice this approach also has to deal with memory alignment and byte order (endianness). Alignment can be controlled with compiler directives (for example #pragma pack) so that the struct's layout in memory matches the layout of the transmitted byte block. Byte order requires both sides of the communication to use the same endianness; otherwise the values must be converted.
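To illustrate the byte-order issue, here is a minimal JavaScript sketch (not part of the original example) using the standard DataView API. It writes the integer 1 in little-endian order and reads the same four bytes back under both conventions; the variable names are illustrative only.

```javascript
// Four raw bytes shared by a hypothetical sender and receiver.
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);

// The sender stores the 32-bit integer 1 in little-endian order: 01 00 00 00.
view.setInt32(0, 1, true);

// A receiver that assumes the same byte order recovers the value.
const asLittleEndian = view.getInt32(0, true);

// A receiver that assumes big-endian order reads 0x01000000 instead.
const asBigEndian = view.getInt32(0, false);

console.log(asLittleEndian); // 1
console.log(asBigEndian);    // 16777216
```

The mismatch is silent: nothing fails, the receiver simply sees a different number, which is why both sides have to fix the byte order as part of the protocol.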

C++ is fully compatible with C structs, but once virtual functions are defined in a C++ class (whether declared with class or struct), the byte-block affinity is lost; this is a tradeoff to weigh in C++ programming. Byte-block data affinity is hard to find in languages other than C and C++, because it depends on being able to control the memory layout of structs and objects and on non-type-safe pointer casts, neither of which is allowed in languages such as Java and C#. In Java and C#, therefore, byte blocks have to be encoded and decoded field by field, using the offset and length of each field as defined by the protocol. Pointers and the direct mapping between structs and memory give C and C++ their affinity for byte-block data, but they also leave hidden risks in memory access and type safety. Java and C# gain reference safety and type safety, and in exchange lose their affinity for byte-block data.
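JavaScript is in the same position as Java and C# here: it cannot map a struct onto memory, so a byte block has to be decoded field by field. The following is a minimal sketch of that offset-and-length style for the t_data record. The wire layout (little-endian, 4-byte int, 10-byte type field, 2 padding bytes, 4-byte float, 20 bytes in total) is an assumption that mirrors a typical C ABI, and the function names are hypothetical.

```javascript
// Assumed wire layout for struct t_data (an assumption, not from the article):
//   offset 0  : int32   version (little-endian)
//   offset 4  : char[10] type   (NUL-padded ASCII)
//   offset 14 : 2 padding bytes
//   offset 16 : float32 value   (little-endian)
const RECORD_SIZE = 20;
const TYPE_OFFSET = 4;
const TYPE_LENGTH = 10;
const VALUE_OFFSET = 16;

// Hypothetical encoder: writes each field at its agreed offset.
function encodeRecord({ version, type, value }) {
  const buffer = new ArrayBuffer(RECORD_SIZE);
  const view = new DataView(buffer);
  view.setInt32(0, version, true);
  const typeBytes = new Uint8Array(buffer, TYPE_OFFSET, TYPE_LENGTH);
  // Leave at least one trailing NUL, as strcpy would for a short string.
  for (let i = 0; i < Math.min(type.length, TYPE_LENGTH - 1); i++) {
    typeBytes[i] = type.charCodeAt(i) & 0x7f;
  }
  view.setFloat32(VALUE_OFFSET, value, true);
  return buffer;
}

// Hypothetical decoder: reads each field back by offset and length.
function decodeRecord(buffer) {
  const view = new DataView(buffer);
  const typeBytes = new Uint8Array(buffer, TYPE_OFFSET, TYPE_LENGTH);
  let end = typeBytes.indexOf(0);
  if (end === -1) end = typeBytes.length;
  return {
    version: view.getInt32(0, true),
    type: String.fromCharCode(...typeBytes.subarray(0, end)),
    value: view.getFloat32(VALUE_OFFSET, true),
  };
}

const decoded = decodeRecord(
  encodeRecord({ version: 1, type: "degree", value: 189.0 })
);
console.log(decoded); // { version: 1, type: 'degree', value: 189 }
```

Compared with the C version, every offset, length and byte order has to be spelled out by hand, which is the loss of affinity the paragraph describes.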

Text format

Text is another very common data format. "The Art of Unix Programming" says of it: "Text streams are a valuable universal format because they're easy for human beings to read, write, and edit without specialized tools". Pipeline processing based on text streams is an acclaimed part of the Unix style. The shell connects single-purpose commands with pipes and lets text flow through them, so shell languages have a good affinity for text data. Many text-processing tasks can be done with a single line of Bash, the one-liner style that hackers love.

Let's look at two examples of text processing with Bash:

1. Count the number of gz files in the current directory:

    ls -l *.gz | wc -l

2. Count the page views (PV) of each page per day for June 26 and 27, 2011, in the Web server log service.log:

    cat service.log | grep '^2011-06-2[6-7]' | cut -d' ' -f 1,3 | sort | uniq -c

service.log:

    2011-06-25 13:00:55 /music/c.htm Safari
    ...
    2011-06-26 08:01:23 /main.htm IE
    2011-06-26 08:03:01 /sports/b.htm Chrome
    ...
    2011-06-27 11:41:06 /main.htm IE
    2011-06-27 11:52:41 /news/a.htm Firefox

Output:

    2011-06-26 /main.htm
    231 2011-06-26 /news/a.htm
    2011-06-26 /sports/b.htm
    288 2011-06-27 /main.htm
    2011-06-27 /news/a.htm
    2011-06-27 /sports/b.htm
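To make explicit what the second pipeline computes, here is a rough JavaScript sketch of the same filter, cut, sort and count steps. It is only an illustration: the function name is hypothetical, and the input is limited to the five sample lines shown in the log excerpt, so the counts are not those of the real log.

```javascript
// The sample lines visible in the service.log excerpt.
const logLines = [
  "2011-06-25 13:00:55 /music/c.htm Safari",
  "2011-06-26 08:01:23 /main.htm IE",
  "2011-06-26 08:03:01 /sports/b.htm Chrome",
  "2011-06-27 11:41:06 /main.htm IE",
  "2011-06-27 11:52:41 /news/a.htm Firefox",
];

// Hypothetical helper mirroring: grep | cut -f 1,3 | sort | uniq -c
function countPageViews(lines) {
  const counts = new Map();
  for (const line of lines) {
    // grep '^2011-06-2[6-7]': keep only June 26 and 27.
    if (!/^2011-06-2[67]/.test(line)) continue;
    // cut -d' ' -f 1,3: keep the date and the page path.
    const fields = line.split(" ");
    const key = `${fields[0]} ${fields[2]}`;
    // uniq -c: count identical keys.
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // sort: order the keys lexicographically.
  return [...counts.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

const pageViews = countPageViews(logLines);
for (const [key, count] of pageViews) {
  console.log(`${count} ${key}`);
}
```

Even in a scripting language the task takes a small program, where the shell expresses it as one line of composed commands.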

Implementing these two simple text-processing tasks in C or C++ would be far more troublesome: at least a dozen or several dozen lines of code, plus compiling and debugging, so overall development efficiency may be an order of magnitude lower than with the shell. Besides the shell, Perl is also known for its powerful text processing. Let's look at an example of Perl regular expressions:

    while (<>) {
        if (/hello\s(\w+)/i) {
            print "say hello to $1\n";
        } elsif (/goodbye\s(\w+)/i) {
            print "say goodbye to $1\n";
        }
    }

Input:

    HeLLo world
    Goodbye bug

Output:

    say hello to world
    say goodbye to bug

The example above shows how directly Perl can match strings and extract data. Perl's regular-expression-based string processing is not only far more powerful than that of systems languages such as C and C++, but also more powerful and convenient than that of dynamic languages such as Python. This is because regular expressions are "first-class citizens" of the Perl language, which gives Perl better text data affinity than languages that support regular expressions only through a library. Ruby later followed Perl in supporting regular expressions directly in the language.
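JavaScript also has regular expression literals in the language itself, so the Perl example can be mirrored fairly closely. The following is a small sketch for comparison, not a claim about how the author would write it; the function name respond is hypothetical.

```javascript
// Hypothetical helper: answers one input line, or returns null if nothing matches.
function respond(line) {
  let match;
  // Regex literals are part of the syntax; the i flag ignores case as in Perl.
  if ((match = /hello\s(\w+)/i.exec(line))) {
    return `say hello to ${match[1]}`;
  }
  if ((match = /goodbye\s(\w+)/i.exec(line))) {
    return `say goodbye to ${match[1]}`;
  }
  return null;
}

const replies = ["HeLLo world", "Goodbye bug"].map(respond);
console.log(replies); // [ 'say hello to world', 'say goodbye to bug' ]
```

The literal syntax is close to Perl's, but the captured group still has to be fetched from a match object rather than from an implicit variable such as $1.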

Structured text format

XML is a general-purpose (semi-structured) text format for data exchange that has become popular in recent years. Besides the advantages of plain text, XML offers hierarchical structure and extensibility, so it has been widely used in configuration files and Web services since its introduction. Modern programming can hardly avoid XML, yet handling XML in statically typed languages such as C++, Java and C# is not easy. Let's first look at Java code that parses and builds the following XML:

    <langs type="current">
        <language>Java</language>
        <language>Groovy</language>
        <language>JavaScript</language>
    </langs>

    // Requires: import javax.xml.parsers.*; import org.w3c.dom.*;

    // Java: parse XML
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse("src/languages.xml");
        Element langs = doc.getDocumentElement();
        System.out.println("type = " + langs.getAttribute("type"));
        NodeList list = langs.getElementsByTagName("language");
        for (int i = 0; i < list.getLength(); i++) {
            Element language = (Element) list.item(i);
            System.out.println(language.getTextContent());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

    // Java: create XML
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.newDocument();
        Element langs = doc.createElement("langs");
        langs.setAttribute("type", "current");
        doc.appendChild(langs);

        Element language1 = doc.createElement("language");
        Text text1 = doc.createTextNode("Java");
        language1.appendChild(text1);
        langs.appendChild(language1);

        Element language2 = doc.createElement("language");
        Text text2 = doc.createTextNode("Groovy");
        language2.appendChild(text2);
        langs.appendChild(language2);

        Element language3 = doc.createElement("language");
        Text text3 = doc.createTextNode("JavaScript");
        language3.appendChild(text3);
        langs.appendChild(language3);
    } catch (Exception e) {
        e.printStackTrace();
    }

Parsing and creating even this small piece of XML takes lengthy Java code, whereas the same functionality in the dynamic language Groovy is very simple:

    // Groovy: parse XML
    def langs = new XmlParser().parse("languages.xml")
    println "type = ${langs.attribute("type")}"
    langs.language.each {
        println it.text()
    }

    // Groovy: create XML
    def xml = new groovy.xml.MarkupBuilder()
    xml.langs(type: "current") {
        language("Java")
        language("Groovy")
        language("JavaScript")
    }

The Groovy code above is concise and expressive. It corresponds almost one-to-one with the XML, like a DSL that operates directly on XML, whereas in the corresponding Java code the shape of the XML is nowhere to be seen. This shows that Groovy has a high affinity for XML data. Why do Java and Groovy differ so much in XML affinity? Java requires every method and attribute to be defined before it is called, and this strict static type checking means Java can only treat XML elements as "second-class citizens". Groovy is not bound by static type checking and is free to use method and attribute names to express XML structure. In the Groovy example that creates XML, the groovy.xml.MarkupBuilder class actually has no methods named langs or language; the corresponding XML structure is created automatically when they are called.
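The idea of handling calls to methods that were never defined can be sketched in JavaScript with the standard Proxy object. This is a toy illustration of the mechanism, not how MarkupBuilder is implemented; createBuilder and its argument conventions are hypothetical, and the sketch does no XML escaping.

```javascript
// Hypothetical mini builder: any method name called on `builder` becomes a tag name.
function createBuilder() {
  const lines = [];
  let depth = 0;
  const builder = new Proxy({}, {
    get(_target, tag) {
      if (typeof tag !== "string") return undefined;
      // No method called `tag` is defined anywhere; one is fabricated on demand.
      return (...args) => {
        const attrs = args.find((a) => a !== null && typeof a === "object") || {};
        const text = args.find((a) => typeof a === "string");
        const body = args.find((a) => typeof a === "function");
        const attrText = Object.entries(attrs)
          .map(([name, value]) => ` ${name}="${value}"`)
          .join("");
        const indent = "  ".repeat(depth);
        if (body) {
          // An element with children: emit the open tag, run the body, then close.
          lines.push(`${indent}<${tag}${attrText}>`);
          depth++;
          body(builder);
          depth--;
          lines.push(`${indent}</${tag}>`);
        } else {
          lines.push(`${indent}<${tag}${attrText}>${text ?? ""}</${tag}>`);
        }
        return builder;
      };
    },
  });
  return { builder, toString: () => lines.join("\n") };
}

const markup = createBuilder();
markup.builder.langs({ type: "current" }, (x) => {
  x.language("Java");
  x.language("Groovy");
  x.language("JavaScript");
});
const xmlText = markup.toString();
console.log(xmlText);
```

As in the Groovy version, the calling code mirrors the shape of the XML, and langs and language exist only as names intercepted at call time.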

Besides XML, JSON is another general-purpose, semi-structured plain-text data exchange format, often regarded as a lightweight alternative to XML. JSON stands for JavaScript Object Notation; its syntax is a subset of JavaScript's, and JavaScript supports it natively. Here is an example of creating a JSON object in JavaScript:

    var json = {
        "langs": {
            "type": "current",
            "language": ["Java", "Groovy", "Javascript"]
        }
    };

Many JavaScript programs fetch a JSON string from the server via AJAX and then parse it into an object. Because JSON is a subset of JavaScript syntax, a JSON string can be parsed with the general-purpose eval, for example:

    var json = eval("(" + jsonStr + ")");
    alert(json.langs.type);

You can even:

    eval("var json = " + jsonStr);
    alert(json.langs.type);

However, the generality of eval brings security risks, so parsing JSON with eval is recommended only for trusted data sources; for untrusted sources, use a dedicated JSON parser. In any case, JavaScript's native support for JSON gives it a high JSON data affinity. Groovy 1.8 also added native support for JSON, which makes JSON as easy to work with there as in JavaScript.
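A dedicated parser is now built into JavaScript engines as JSON.parse. The sketch below shows the difference in behavior on an untrusted string: JSON.parse accepts only JSON syntax, so input containing executable code is rejected rather than run. The hostile string is a made-up example.

```javascript
const jsonStr =
  '{"langs": {"type": "current", "language": ["Java", "Groovy", "Javascript"]}}';

// Well-formed JSON parses into ordinary objects and arrays.
const data = JSON.parse(jsonStr);
console.log(data.langs.type); // current

// Made-up hostile input: valid JavaScript that eval would execute, but not valid JSON.
const hostile = '{"langs": (function () { return "code ran"; })()}';

let rejected = false;
try {
  JSON.parse(hostile);
} catch (err) {
  // JSON.parse throws a SyntaxError instead of evaluating the function call.
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true
```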
