What is the model of generics and metaprogramming 07/06 Update SLTechnology News&Howtos

What is the model of generics and metaprogramming

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article mainly introduces "what is the model of generics and metaprogramming". In daily operations, I believe that many people have doubts about what the models of generics and metaprogramming are. The editor consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful for you to answer the doubts about "what is the model of generics and metaprogramming?" Next, please follow the editor to study!

Overview

The following figure contains generic systems for all the languages discussed in this article to outline the main contents of this article and how they fit together.

Basic idea

Assuming that we are programming in a language that does not have a generic system, we want to implement a generic stack data structure that is valid for any data type. The difficulty is that every function and type definition we write is valid only for data of the same size, the same way of replication, and the same behavior.

How to solve this problem? There are two basic ideas, one is to find a way for all data types to behave the same way in our data structure, and the other is to make multiple copies of our data structure and slightly adjust it to deal with each data type in a specific way. These two ideas constitute two basic approaches to solving generic problems, namely, "boxing" and "monomorphism".

Boxing means that we put everything in a unified "box" so that they all behave the same way. This is usually done by allocating memory on the heap and only placing pointers in the data structure. We can make different types of pointers behave the same way, so that the same code can handle all data types. However, this may come at the cost of additional memory allocation, dynamic lookup, and cache loss. In C, this is equivalent to having your data structure store void* pointers, and you also need to convert your data pointers to void* or type conversion from void* (if the data is not already on the heap, allocate it on the heap).

Monomorphism is to copy code multiple times for different types of data we are dealing with. In this way, each code directly uses the corresponding data structures and functions without any dynamic lookup. This is fast enough, but at the cost of inflated code size and compilation time, because the same code will be compiled multiple times with minor adjustments. In C, this is equivalent to defining your entire data structure in a macro and calling the macro where it is used.

In general, boxing helps reduce compilation time, but harms run-time performance, while monomorphism produces code that is run-time efficient, but requires extra time to compile and optimize the generated code. Of course, they are also different in how to expand. Boxing allows more dynamic behavior at run time, while monomorphism gives you more flexibility to deal with different instances of common code. It is also worth noting that in some large programs, the performance benefits of monomorphism may be offset by additional instructions from additional generated code that cause cache misses.

Each of the two basic schools has many directions that can be extended to add additional capabilities or security, and different languages have taken them both in a very interesting direction. Some languages such as Rust and C # even offer both options!

Boxing

Let's take the go language as an example:

Type Stack struct {values [] interface {} func (this * Stack) Push (value interface {}) {this.values = append (this.values, value)} func (this * Stack) Pop () interface {} {x: = this.values [len (this.values)-1] this.values = this.values [: len (this.values)-1] return x}

Examples of languages that use boxing. C (void*), Go (interface {}), Java without generics (Object), Objective-C without generics (id)

Generics based on type erasure boxing

Here are some basic packing problems.

Depending on the language, we often need to do type conversion every time we read and write data structures.

It is difficult to prevent consumers from putting different types of elements into data structures, which can lead to run-time exceptions.

The solution is to add generics to the type system while still fully using the basic boxing method at run time as before. This method is often referred to as type erasure because all types in the type system are "erased" and become the same type (such as Object).

Both Java and Objective-C started with basic boxing and later added generics based on type erasure, even using exactly the same collection types as before for compatibility, but you can choose generic parameters. Take a look at the example below, which comes from the Wikipedia article on Java generics.

List v = new ArrayList (); v.add ("test"); / / A String that cannot be cast to an Integer Integer i = (Integer) v.get (0); / / Run time error List v = new ArrayList (); v.add ("test"); Integer I = v.get (0); / / (type error) compilation-time error

Inferential boxing generics with unified expression

OCaml takes this idea a step further, using a unified representation, with no primitive types that need to be boxed and assigned (just as int in Java needs to be Integer to enter ArrayList), because all objects are either boxed or represented by a pointer-sized integer, so everything is a machine word. However, when the garbage collector looks at data stored in a common structure, it needs to distinguish between pointers and integers, so integers are marked with 1 bit (the pointer will not have this bit), leaving only a range of 31 or 63 bits.

OCaml also has a type inference system, so you can write a function, and if you don't comment it, the compiler will infer the most common type, which may cause the function to look like a dynamically typed language.

Let first (head:: tail) = head (* inferred type:'a list->'a *)

Inferring a type infers a function from a list of elements of type'a'to an element of type'a'. The code confirms the relationship that the return type is the same as the list type, but can be any type.

Interface

Another limitation of the underlying boxing method is that the boxing type is completely opaque. This is fine for data structures such as stacks, but features such as generic sort functions require some additional functions, such as specific types of comparison functions. There are many different ways to implement and export this feature at run time, and you can use multiple ways in the same language. However, most of the different languages are implemented in a specific way, and then the language extension takes full advantage of the selected implementation. This means that based on different runtime methods, there are two main options: vtables and dictionary delivery.

Interface vtables

If we want to expose typed functions while sticking to the boxing strategy, we just need to make sure that there is a unified way to find functions of a given type from the object. This method is called "vtables" (abbreviated from the "virtual method table"), and it is implemented by having a pointer to the function pointer table where the offset of each object in the general structure is 0. These tables allow generic code to find a specific type of function pointer for each type in the same way by indexing certain pointers at a fixed offset.

Note to the translator, the picture is as follows:

This is how interface types are implemented in Go and how dyn trait objects are implemented in Rust. When you convert a type to an interface type, it creates a wrapper that contains a pointer to the original object and a pointer to the vtable of the interface-specific type function. However, this requires additional pointers and memory, which is why sorting in Go requires slicing to implement the Sort.Interface interface rather than slicing elements to implement the Comparable interface.

Translator's note:

To sort slice in Go language, you need to implement the Sort.Interface interface on slice (slice), as follows:

Type Interface interface {Len () int / / Len is the total number of elements in the collection Less (I, j int) bool / / returns true if the element with index I is less than the element with index j, otherwise returns false Swap (I, j int) / / Swap elements with index I and j}

Mode of use:

Package main import ("fmt"sort") / / defines interface {} and implements three methods of sort.Interface interface: type IntSlice [] int func (c IntSlice) Len () int {return len (c)} func (c IntSlice) Swap (I, j int) {c [I], c [j] = c [j], c [I]} func (c IntSlice) Less (I, j int) bool {return c [I]

< c[j] } func main() { a := IntSlice{1, 3, 5, 7, 2} b := []float64{1.1, 2.3, 5.3, 3.4} c := []int{1, 3, 5, 4, 2} fmt.Println(sort.IsSorted(a)) //false if !sort.IsSorted(a) { sort.Sort(a) } if !sort.Float64sAreSorted(b) { sort.Float64s(b) } if !sort.IntsAreSorted(c) { sort.Ints(c) } fmt.Println(a)//[1 2 3 5 7] fmt.Println(b)//[1.1 2.3 3.4 5.3] fmt.Println(c)// [1 2 3 4 5] } 对于Java来说，对数组排序需要在数组/集合元素上实现Comparable 接口，代码如下： class Simpson implements Comparable { String name; Simpson(String name) { this.name = name; } @Override public int compareTo(Simpson simpson) { return this.name.compareTo(simpson.name); } } public class SimpsonSorting { public static void main(String... sortingWithList) { List simpsons = new ArrayList(); simpsons.add(new SimpsonCharacter("Homer ")); simpsons.add(new SimpsonCharacter("Marge ")); simpsons.add(new SimpsonCharacter("Bart ")); simpsons.add(new SimpsonCharacter("Lisa ")); Collections.sort(simpsons); simpsons.stream().map(s ->

S.name) .forEach (System.out::print); Collections.reverse (simpsons); simpsons.stream () .forEach (System.out::print);}}

Object oriented programming

Object-oriented programming languages make good use of the power of vtables. Object-oriented languages like Java do not have separate interface objects that contain vtables, but instead have a vtable pointer at the beginning of each object. Languages like Java have inheritance and interface systems, which can be implemented with these object vtables.

In addition to providing additional functionality, embedding vtables in each object also solves the problem of constructing new types before. Unlike Go, in Java, sorting functions can use the Comparable interface on this type.

Reflection

Once you have vtables, it is not difficult for the compiler to generate other type information, such as field name, type, and location. This allows you to access all the data in one type with the same code, which can check the data in any other type. You can add a "reflection" function, which can be used to implement functions such as any type of serialization. As an extension of the boxing paradigm, it has the same problem, that is, it requires only one piece of code, but requires a lot of dynamic lookups, which can lead to poor serialization performance.

Languages with reflection capabilities and examples of their use for serialization include Java, C #, and Go.

Dynamically typed language

Reflection is very powerful and can accomplish many different metaprogramming tasks, but one thing it can't do is create a new type or edit the type information of an existing field. If we add this capability and implement it through reflection, we will eventually get a dynamically typed language. In languages like Python and Ruby, its super reflective system brings amazing metaprogramming capabilities, and code that uses its metaprogramming capabilities is ubiquitous.

"but Tristan, dynamic languages don't work this way, they just use hash tables to do everything!" Some people might say that. Well, a hash table is just a data structure that implements an editable type information table. And that's just how some interpreters like CPython work. If you take a look at how a high-performance JIT like V8 is implemented, it behaves like vtables and reflect information! V8's hidden classes (vtables and reflection information) and object layout are similar to those you see in the Java virtual machine, except that objects can be changed to the new vtable at run time.

Dictionary delivery

In addition to associating vtables with objects, another way to implement dynamic interfaces is to pass the required function pointer tables to the generic functions that need them. This approach is somewhat similar to constructing a Go-style interface object on call, except that the function pointer table is passed as a hidden parameter rather than packaged together as one of the existing parameters.

Although this approach is used by Haskell type classes, GHC (GHC is the Haskell compiler) can also do singularization optimization through inlining and specialization. Dictionary delivery is also used by OCaml, which provides an explicit parameter passing dictionary in the form of a first-class module, but there is also a mechanism for suggesting the addition of implicit parameters.

Swift Witness Tables

The generic implementation of Swift is more interesting, passing it through a dictionary while putting the size of the type and how to move, copy, and release them into a function pointer table that provides all the information you need to handle any type in a uniform manner without boxing. In this way, Swift can implement generics without singularization, and there is no need to use a uniform expression for all types. Although there are still all dynamic lookup costs, it also saves the cost of allocating memory, memory, and cache incoherence. The Swift compiler can monomorphize and inline generics within and across modules using functions annotated with @ inlinable to avoid these costs, using heuristic algorithms to estimate how much code will swell.

This feature also explains why Swift allows you to add and rearrange fields in your structure to achieve ABI stability, although they provide the @ Frozen attribute to opt out of dynamic lookups for performance reasons.

Connotation type analysis

Another way to implement an interface for a boxing type is to add the type ID to a fixed part of the object, like where the vtable pointer accesses, and then generate a function for each interface method, with a large switch statement on all types that implement the interface method and dispatch it to the correct specific type method.

I don't know what languages use this technique, but C++ compilers and Java virtual machines do similar things when using profile-guided optimization to understand that a common call point works primarily on certain types of objects. They check each generic type instead of the call point, and then statically schedule that generic type, usually dynamic scheduling as a backup. In this way, the branch predictor can predict the general situation branch that will be taken and continue to schedule instructions through static calls.

Monomorphism

Another way to implement generics is monomorphism. In this way, you need to find a way to output multiple versions of the code for each type. When the compiler compiles, the code goes through multiple stages of expression, and theoretically we can copy it at any one of these stages.

Generate source code

The easiest way to monomorphic is to copy at the source code level. In this way, the compiler does not even need to support generics, as users of languages such as C and Go (the compiler does not support generics) sometimes do.

In C, you can use a preprocessor to define your data structure in a macro or header file and include # defines multiple times. In Go, there are scripts like genny that simplify the code generation process.

The disadvantage of this is that copying the source code has many drawbacks and edge conditions that need to be noted, and multiple parsing and type checking of basically the same code also brings a lot of extra work to the compiler. Secondly, depending on the language and tools, this generic method will be ugly to write and use, for example, if you write a macro in C language macro, each line should end with a backslash, and all types and function names need to be manually connected with identifiers to avoid collisions.

D string mixins

However, code generation does have some benefits, that is, you can use an omnipotent programming language to generate code, and it uses methods that users are already familiar with.

Some languages that implement generics in other ways also include a clean way of code generation to address more general metaprogramming use cases that are not covered by their generic systems. The most obvious example is the D language's string mixin, which uses all the features of D to generate D code into strings in the middle of compilation.

Rust process macro

A similar example is Rust's procedure macro, which takes the token stream as input and outputs the token stream, while the provider converts the token stream to a string or from a string to a token stream. The advantage of this approach is that the token stream can save source code location information. Using macros, you can directly paste user-written code from input to output in the form of token. If the user's code causes a compiler error in the macro output, the error message from the compiler output will correctly point to the file, row, and column where the user code is located, but if the macro generates an error, then the error message will point to the macro call. For example, if you use a macro that encapsulates a function in a log call, and an error occurs in the implementation of the encapsulated function, the compiler's error will point directly to your code where the error is, not to the macro.

Syntax tree macro

Some languages do go a step further, providing the ability to consume and generate abstract grammar tree (AST) types in macros. Examples of this include templates Haskell, Nim macros, OCaml PPX, and almost all Lisps.

The problem with AST macros is that you don't want users to learn a bunch of functions that construct AST types. The Lisp family of languages solves this problem, and its syntax corresponds very directly to AST, but the construction process is still tedious. Therefore, all the languages I mentioned have some form of "reference" primitive, and if you provide a code snippet in the language, it will return the syntax tree. These reference primitives also provide methods to concatenate the values of the syntax tree, just like string concatenation. The following is an example of a template Haskell.

-- using AST construction functions genFn:: Name-> Q Exp genFn f = do x $(varE f) x |]

One disadvantage of doing procedural macros at the syntax tree level rather than the token level is that syntax tree types often change with new language features, while token types remain compatible. For example, OCaml's PPX system requires a special infrastructure to migrate the parse tree to the language version used by the macro. Rust's related libraries add parsing and referencing utilities, so you can write syntax tree macros in a style similar to procedural macros. Rust even has an experimental library that provides reflection in this way.

Template

The next way to implement generics is to push the generated code to the next stage of compilation. The templates used in C++ and D use this way, you can specify "template parameters" on types and functions, and when you instantiate a template of a specific type, the type is replaced into the function, and the function is then type-checked to make sure the combination is valid.

Template T myMax (T a, T b) {return (a > b?a:b);} template struct Pair {T values [2];}; int main () {myMax (5Power6); Pair p {5pm 6}}; / / This would give us a compile error inside myMax / / about Pair being an invalid operand to `>`: / / myMax (p, p);}

The problem with the template system is that if you include a template function in your library and the user instantiates it with the wrong type, the compilation error is difficult to understand. This is very similar to what can happen when a library in a dynamically typed language handles the user passing the wrong type. The D language has an interesting solution, similar to the popular practice in dynamic languages: just use the helper function to check whether the type is valid, and if it fails, the error message points to the helper function! The following is an example from the D language.

/ / We're going to use the isNumeric function in std.traits import std.traits; / / The `if` is optional (without it you'll get an error inside like C++) / / The `if` is also included in docs and participates in overloading! T myMax (T) (T a, T b) if (isNumerichia T) {return (a > b?a:b);} struct Pair (T) {T [2] values;} void main () {myMax (5Power6) / / This would give a compile error saying that `(airplane int, airplane int)` / / doesn't match the available instance `myMax (T a, T b) if (isNumericint) `: / / myMax (p, p);}

Clocking 20 has a feature called "concepts" that works the same way except that it is designed to be more like defining interfaces and type constraints.

Compile time function

D's template has many extensions that allow you to use compile-time function evaluation and static if to make the template behave like a function, accepting a set of parameters at compile time and returning a non-generic runtime function. This makes D template a full-featured metaprogramming system. As far as I know, modern C++ templates have similar functions, but the implementation mechanism is not clean enough.

Other languages take the concept that generics are just compile-time functions a step further, such as Zig.

Fn Stack (comptime T: type) type {return struct {items: [] T, len: usize, const Self = @ This (); pub fn push (self: Self, item: t) {/ /...}};}

Zig uses the same language at compile time and run time, and functions are distinguished based on whether or not parameters are marked as comptime. There is also a language that uses a separate but similar language at the meta level level, called Terra. Terra is a dialect of Lua that allows you to build low-level functions similar to the C language and then manipulate them at the meta-level using Lua API and the original language of reference and splicing.

Function MakeStack (T) local struct Stack {items: & T;-& T is a pointer to T len: int;} terra Stack:push (item: t) -. End return Stack end

Terra's crazy metaprogramming capabilities allow it to do a lot of things, such as implementing domain-specific language compilers as simple functions, or implementing Java and Go interfaces and object systems in libraries with a small amount of code. It can then save the generated runtime code as an undependent object file.

Rust generics

The next type of monomorphic generics is to take the code generation process a step further after type checking. As mentioned above, C++ can be used to obtain error types in generic library functions like those in dynamically typed languages, because there is basically only one type in the template parameters. So this means that we can solve this problem by adding type systems to our meta-level and statically check whether they support the operations you use. This is how generics work in Rust, and at the language level, in Swift and Haskell.

In Rust, you need to declare "trait bounds" on your type parameters, where trait, like interfaces in other languages, declares a series of functions provided by types. The Rust compiler checks to see if the body of your generic function works with any type of trait bounds, and does not allow you to use functions that are not declared by trait bounds. This way, when a generic function in Rust is instantiated, it will never get a compiler error in the library function. The compiler also needs to do type checking only once for each generic function.

Fn my_max (a: t, b: t)-> T {if a > b {a} else {b}} struct Pair {values: [T; 2],} fn main () {my_max (5Pair 6); let p: Pair = Pair {values: [5J 6]}; / / Would give a compile error saying that / / PartialOrd is not implemented for Pair: / / my_max (pjingp);}

At the language level, the type system required for boxed generics is very similar, which is why Rust can use the same type system to support both generics! Rust 2018 even adds a uniform syntax where the v: & impl SomeTrait parameter is monomorphic, but the v: & dyn SomeTrait parameter is boxed. This approach also allows compilers such as Swift's compiler and Haskell's GHC to use boxing as a means of optimization even if boxing is used by default to implement generics.

Machine code monomorphism

The next step in monomorphic generics is to move forward in the compiler backend. Just as we can copy source code templates with generic type placeholders, we can generate machine code with specific type placeholders. Then we can work like a linker, and with memcpy and some patches, we can quickly mark up these templates! The disadvantage is that each monomorphic copy cannot be specifically optimized by the optimizer, but because there is no repetitive optimization, the compilation speed can be much faster. We can even make the code stamper into a small JIT that is included in the binary file and mark the monomorphic copy at run time to avoid the expansion of the binary file.

In fact, I do not know which language generics work in this way, this is just an idea when I am writing this article, as a natural extension of this taxonomy, this is what I hope to get out of it! I hope this article will give you a better understanding of generic systems in different languages, how to classify them, and promote your thinking, and maybe we will find new cool programming language directions.

At this point, the study of "what is the model of generics and metaprogramming" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.