How to Deeply Optimize the Rendering Pipeline with Unity 2018
There haven't been any new articles for a while. Besides a busy school schedule, I've spent a lot of time sorting out and optimizing old code. A large part of that work was tightening up memory management, and I even used some new language features to deeply optimize the rendering pipeline code that runs on the CPU.
First, let me introduce Unity's three existing ways of compiling and executing code: the Mono VM, IL2CPP, and the Burst compiler.
The Mono virtual machine is the traditional, long-standing implementation; its cross-platform advantage is why early Unity relied on it. However, Mono's own performance is quite poor — code runs at only about half the speed of native C++ — which saddled early Unity with labels like "can't do PC", "can't do AAA", and "poor engine performance". Today, apart from the Editor, which still uses the Mono VM for iteration speed, demand for it on console, PC, mobile, and other platforms keeps falling.
IL2CPP uses IL as an intermediate layer and compiles the code to native C++. Since the official 2018.3 release, IL2CPP's compilation and optimization have made great progress, and its various odd little bugs have gradually been ironed out. Unity now strongly recommends it over Mono on capable platforms, and it is the main compilation method we currently use. But this approach is not pit-free; on the contrary, it has more pitfalls than the Mono VM. After translation through the IL layer, some advanced C# constructs are very likely to incur extra performance cost, so we should think in C++ terms while developing — after all, there is still a huge gap between game development and traditional .NET development. Of course, that doesn't mean every developer has to fly pointers around their code like I do and write C# as if it were C++, but at the very least we should watch our programming habits, even at some cost to development comfort (laughs).
The Burst compiler is dedicated to numerical computation. It supports no managed types, so Burst-compiled code cannot contain any access to a managed type; and since managed types are out, so is object-oriented programming — abstraction, polymorphism, and the rest. You could call it the most primitive C-style programming. Burst is designed for scientific computation, such as processing matrices and vectors, and it provides optimization goodies such as SIMD. Unity has been deifying Burst, claiming performance that can far outstrip C++, but in my view Burst is only worth using for isolated data-crunching methods: on ordinary code it is not much faster than C++, and allowing only unmanaged types while banning all OOP is usually impractical for real project development (at least until ECS is fully adopted). So for now, Burst serves us only as a development seasoning.
Our main optimization ideas fall into the following categories:
Use unmanaged data types and manage memory manually wherever possible, to achieve truly zero GC at runtime.
Write C# as if it were C++, for the reasons described above.
Move computation into the Job System wherever possible, to take full advantage of multithreading.
Compile pure math code with the Burst compiler wherever possible.
The GPU Driven Pipeline code introduced in the previous article actually still has plenty of room for optimization, so we started with the lighting part. Our pipeline uses Tile/Cluster Based Deferred Rendering, so the Compute Buffer data must be prepared on the CPU, and the light-and-shadow work requires generating frustum matrices; all of that can be thrown into the Job System. Much of this section accesses managed types, and most of the work is logic rather than math, so here we use traditional IL2CPP compilation instead of Burst. The code structure looks something like this:
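(The original code screenshot is not preserved; below is a minimal sketch of that structure under stated assumptions — PointLightData and LightDataJob are hypothetical names, and the Unity.Jobs / Unity.Collections / Unity.Mathematics packages are assumed.)

```csharp
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Per-light data laid out for the GPU; must be unmanaged/blittable.
public struct PointLightData
{
    public float3 position;
    public float range;
    public float3 color;
    public float intensity;
}

// Processes one light per iteration on worker threads. Compiled by
// IL2CPP rather than Burst, since this step is mostly logic, not math.
public struct LightDataJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<PointLightData> lights;
    public float4x4 worldToView;
    [WriteOnly] public NativeArray<PointLightData> viewSpaceLights;

    public void Execute(int i)
    {
        PointLightData l = lights[i];
        // Move the light into view space so the cluster pass can bin it.
        l.position = math.transform(worldToView, l.position);
        viewSpaceLights[i] = l;
    }
}
```

After the job's handle is completed, the finished NativeArray is uploaded to the GPU via the Compute Buffer's SetData.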
Because the whole rendering pipeline exists as a singleton, it is perfectly safe to use static members directly. Besides, instance variables are not allowed inside the Job System, so static is the only direct option here. If you don't want to use static but still must touch a managed type, it isn't impossible; my solution is as follows:
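(Again the screenshot is lost; here is a minimal sketch of the idea using Unity's UnsafeUtility pinning API. The surrounding class and array are hypothetical.)

```csharp
using Unity.Collections.LowLevel.Unsafe;

public static class ManagedPinningExample
{
    public static unsafe void Run()
    {
        float[] managed = new float[64];

        // Pin the array so the GC can neither collect nor move it,
        // and take a raw pointer to its data.
        void* ptr = UnsafeUtility.PinGCArrayAndGetDataAddress(managed, out ulong gcHandle);
        // (UnsafeUtility.PinGCObjectAndGetAddress works for non-array objects.)

        // ... hand 'ptr' to unmanaged code, e.g. as a field in a job struct ...

        // Unpin once the unmanaged code is done with it.
        UnsafeUtility.ReleaseGCObject(gcHandle);
    }
}
```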
This approach forcibly takes the address of a managed object and stores it in a raw pointer. That pointer is outside the GC's jurisdiction, so you must take care with the GCHandle to ensure the managed object isn't collected while the unmanaged code is running. Unity Components, of course, are kept alive by the engine itself, so for those there's no need to worry about the C# GC acting up.
As you can see, the Job above processes the data and matrices of point lights and spotlights, writes the results into the NativeArray and NativeList, and finally passes the values straight into the Compute Buffer. This step has always been a logic-heavy hotspot; for example, a cubemap needs view-projection matrices computed for all 6 faces (see the sketch below). After moving this code into a Job, on my high-end PC test machine with a large number of lights, main-thread execution time dropped by 0.1–0.2 ms. That is gratifying progress: main-thread pressure late in a project's development can easily become the performance bottleneck, so this kind of optimization is very meaningful.
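As an aside, the 6-face view-matrix construction mentioned above might look like the following sketch (CubemapFace and FaceView are hypothetical names; the per-face forward/up conventions may differ from the actual pipeline):

```csharp
using Unity.Mathematics;

public static class CubemapFace
{
    // Build the view matrix for one face of a point light's shadow cubemap.
    public static float4x4 FaceView(float3 lightPos, int face)
    {
        // +X, -X, +Y, -Y, +Z, -Z forward vectors; ups chosen to match.
        float3 fwd, up;
        switch (face)
        {
            case 0: fwd = new float3( 1, 0, 0); up = new float3(0, 1,  0); break;
            case 1: fwd = new float3(-1, 0, 0); up = new float3(0, 1,  0); break;
            case 2: fwd = new float3( 0, 1, 0); up = new float3(0, 0, -1); break;
            case 3: fwd = new float3( 0,-1, 0); up = new float3(0, 0,  1); break;
            case 4: fwd = new float3( 0, 0, 1); up = new float3(0, 1,  0); break;
            default: fwd = new float3(0, 0, -1); up = new float3(0, 1,  0); break;
        }
        // LookAt builds the camera's local-to-world transform;
        // inverting it yields the view matrix.
        return math.inverse(float4x4.LookAt(lightPos, lightPos + fwd, up));
    }
}
```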
Another heavyweight computation is the matrix layout and calculation of the Cascaded Shadowmap, which needs no introduction here. The expensive parts are obtaining the bounding volume from the frustum corners, recovering world coordinates through the inverse of the ViewProjection matrix, and then snapping to shadowmap texels to prevent the shadow shimmering caused by camera movement. Deriving the inverse matrix looks innocuous, but it actually involves quite a lot of arithmetic, and that cost should not be underestimated on CPUs with weak single-core performance, so moving it into the Job System is a good choice. Notably, this process is pure numerical computation, so the Burst compiler can be used:
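(The screenshot is lost; this is a hypothetical skeleton showing the shape such a Burst job takes — note how everything, including the graphics-API flag, must be a plain unmanaged field.)

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical skeleton of the per-cascade shadow job. Burst code
// cannot touch statics or managed types, so all inputs are passed in.
[BurstCompile]
public struct CascadeShadowJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float4x4> invViewProj; // one per cascade
    public float resolution;  // shadowmap resolution
    public byte isD3D;        // manual API flag (bool marshalling is finicky)

    [WriteOnly] public NativeArray<float4x4> shadowVP;   // result per cascade

    public void Execute(int cascade)
    {
        // 1. Recover the cascade's frustum corners via invViewProj.
        // 2. Fit an orthographic volume around them.
        // 3. Snap the volume to texels (see the snippet further down).
        // 4. Write the final shadow view-projection matrix.
    }
}
```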
We pass in all the attribute values that need to be read or returned, and do the actual calculation in Execute. It's worth mentioning that Unity's existing GL.GetGPUProjectionMatrix supports neither Burst nor execution on worker threads, which is genuinely awkward, so we can only write our own standard D3D/OpenGL projection-matrix conversion; and because Burst does not support static variables, even a flag like isD3D has to be passed in manually (faintly painful).
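For illustration, a hand-rolled conversion might look like the sketch below. This is an assumption-laden stand-in, not the author's actual function: it assumes classic [-1,1] OpenGL clip depth with no reversed-Z (modern Unity uses reversed-Z on D3D, which changes the remap), plus a flipped Y for rendering into a texture.

```csharp
using Unity.Mathematics;

// Stand-in for GL.GetGPUProjectionMatrix, callable from Burst.
// float4x4 is column-major: cN.y is row 1 of column N, and so on.
public static class ProjectionUtil
{
    public static float4x4 ToGPUProjection(float4x4 glProj, byte isD3D)
    {
        if (isD3D == 0) return glProj; // GL conventions: nothing to do
        float4x4 m = glProj;
        // Flip Y: D3D render targets use a top-left origin.
        m.c0.y = -m.c0.y; m.c1.y = -m.c1.y; m.c2.y = -m.c2.y; m.c3.y = -m.c3.y;
        // Remap clip depth from GL's [-w, w] to D3D's [0, w]:
        // new row2 = 0.5 * (row2 + row3).
        m.c0.z = (m.c0.z + m.c0.w) * 0.5f;
        m.c1.z = (m.c1.z + m.c1.w) * 0.5f;
        m.c2.z = (m.c2.z + m.c2.w) * 0.5f;
        m.c3.z = (m.c3.z + m.c3.w) * 0.5f;
        return m;
    }
}
```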
Although Burst's bizarre restrictions make the code extremely ugly, it just works! The calculation part looks roughly like this:
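(In place of the lost screenshot, here are two of the core calculations in hedged form — hypothetical helpers, not the original code.)

```csharp
using Unity.Mathematics;

public static class ShadowMath
{
    // Recover a frustum corner's world position from an NDC-space
    // corner via the inverse view-projection matrix.
    public static float3 CornerWorldPos(float4x4 invViewProj, float3 ndc)
    {
        float4 p = math.mul(invViewProj, new float4(ndc, 1f));
        return p.xyz / p.w; // perspective divide
    }

    // Snap the cascade's light-space center to whole shadowmap texels
    // so shadows don't shimmer as the camera moves.
    public static float3 SnapToTexel(float3 lightSpaceCenter, float orthoSize, float resolution)
    {
        float unitsPerTexel = 2f * orthoSize / resolution;
        lightSpaceCenter.x = math.floor(lightSpaceCenter.x / unitsPerTexel) * unitsPerTexel;
        lightSpaceCenter.y = math.floor(lightSpaceCenter.y / unitsPerTexel) * unitsPerTexel;
        return lightSpaceCenter;
    }
}
```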
There are 4 cascades in total, so this code runs on 4 worker threads, saving us that 0.1 ms on the main thread.
Processing lighting data is basically the most expensive part of the whole rendering pipeline; after all, SRP already encapsulates the more complex scene-culling logic, so what's left for us is mostly these custom data types. The frustums of some custom Volume components also have to be calculated manually — for example the Fog Volume mentioned earlier in the Froxel volumetric rendering article, i.e. the component that manages local fog. We accelerated that with the Burst compiler as well, without hesitation:
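(Sketch in lieu of the lost screenshot: a hypothetical Burst job that computes each Fog Volume's eight world-space corners from its local-to-world matrix, ready for froxel binning.)

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct FogVolumeJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float4x4> localToWorld;
    // Each iteration writes 8 entries (outside its own index), hence
    // the parallel-for restriction must be disabled.
    [WriteOnly, NativeDisableParallelForRestriction]
    public NativeArray<float3> corners; // 8 per volume

    public void Execute(int i)
    {
        float4x4 m = localToWorld[i];
        int offset = i * 8;
        for (int c = 0; c < 8; c++)
        {
            // Unit-cube corner in local space: each component is ±0.5.
            float3 local = new float3((c & 1) - 0.5f,
                                      ((c >> 1) & 1) - 0.5f,
                                      ((c >> 2) & 1) - 0.5f);
            corners[offset + c] = math.transform(m, local);
        }
    }
}
```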
There are many other Jobs, large and small. The way Unity's Job System works is that tasks accumulate in a queue and all begin computing when JobHandle.ScheduleBatchedJobs() is called. Starting worker threads is itself relatively expensive, so we should pile as many tasks as possible into the job queue, let that one call kick off execution, and finally guarantee completion — and thus thread safety — through Complete. To sum up, the Job System should be used on the principle of "the more the better, the more varied the better, but never scattered". "More and varied" because every struct is memcpy'd when enqueued, so no single job should be too large, and many smaller tasks make it easier for Unity's job threads to balance the load across cores; in other words, tasks big and small should be written as Jobs rather than piled onto the main thread. "Never scattered" reflects the ScheduleBatchedJobs principle just mentioned: all tasks should be started together as far as possible to keep execution unified.
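A minimal sketch of that batching pattern (the job names refer to the hypothetical examples above; PipelinePerFrame is likewise made up):

```csharp
using Unity.Jobs;
using UnityEngine;

// Queue everything first, kick the workers once, overlap main-thread
// work, and only block right before the results are needed.
public class PipelinePerFrame : MonoBehaviour
{
    void Update()
    {
        // Hypothetical jobs from the sketches above:
        JobHandle lights  = default; // new LightDataJob    {...}.Schedule(n, 16);
        JobHandle shadows = default; // new CascadeShadowJob {...}.Schedule(4, 1);
        JobHandle fog     = default; // new FogVolumeJob     {...}.Schedule(m, 8);

        // One wake-up for the whole queue: starting threads is expensive,
        // so batch the kick-off instead of doing it per job.
        JobHandle.ScheduleBatchedJobs();

        // ... main-thread work runs concurrently with the jobs here ...

        // Complete as late as possible, just before the data is consumed.
        JobHandle.CompleteAll(ref lights, ref shadows, ref fog);
    }
}
```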
Beyond multithreading, we have also done a lot of work on memory optimization. The first thing worth mentioning is Unity's memory allocators. Anyone who has learned C knows malloc and free, so no need to elaborate; Unity provides three allocators: Temp, TempJob, and Persistent.
Temp: the memory is released at the end of the frame, so it suits data that lives only within the current frame, but it is also the fastest to allocate — second only to stack memory (stackalloc). Prefer it for data passed around within a frame; no Free is needed, the allocator reclaims it when the frame ends, so there is no leak risk.
TempJob: second only to Temp in speed; it lives for up to 4 frames and is reclaimed automatically afterwards. Frankly, I'm not fond of this design — the "4 frames" limit feels somewhat arbitrary — but it presumably exists for a reason.
Persistent: permanently allocated memory, the slowest to obtain; you must call Free manually or it leaks. This is the most traditional malloc style, suited to resident data structures. A usage sketch of all three follows.
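(AllocatorExamples is a hypothetical name; the allocation sizes are arbitrary.)

```csharp
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;

public static unsafe class AllocatorExamples
{
    public static void PerFrame()
    {
        // Temp: fastest; reclaimed automatically at the end of the frame.
        var scratch = new NativeArray<float>(256, Allocator.Temp);
        // ... use within this frame only; no manual Free required ...

        // TempJob: for data handed to jobs; lives at most 4 frames.
        var jobData = new NativeArray<float>(256, Allocator.TempJob);
        // ... schedule jobs, Complete(), then:
        jobData.Dispose();

        // Persistent: classic malloc/free; leaks unless freed manually.
        void* block = UnsafeUtility.Malloc(1024, 16, Allocator.Persistent);
        // ... long-lived data ...
        UnsafeUtility.Free(block, Allocator.Persistent);
    }
}
```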
With the "unmanaged" generic constraint introduced in C# 7.3, you can build generics on top of this memory-management system that manage memory by hand, truly achieving 0 GC. That said, the engine itself still has limitations — for example, SRP cannot access ECS Component types — so the code still can't quite reach pure "performance-oriented programming" and still carries a fair amount of object-oriented baggage. I hope that over the next few years Unity keeps improving ECS and SRP and gradually abandons traditional object-oriented development entirely.
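Returning to the unmanaged-generics point above: a manually-managed generic container enabled by that constraint might look like this (UnmanagedList is a hypothetical sketch, not the author's actual container):

```csharp
using System;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;

// C# 7.3's 'unmanaged' constraint guarantees T holds no managed
// references, so sizeof(T) and raw pointers are legal and nothing
// here ever touches the GC.
public unsafe struct UnmanagedList<T> : IDisposable where T : unmanaged
{
    T* data;
    int capacity;
    public int Length { get; private set; }

    public UnmanagedList(int initialCapacity)
    {
        capacity = initialCapacity;
        Length = 0;
        data = (T*)UnsafeUtility.Malloc(sizeof(T) * capacity,
                                        UnsafeUtility.AlignOf<T>(),
                                        Allocator.Persistent);
    }

    public void Add(in T value)
    {
        if (Length == capacity) Grow();
        data[Length++] = value;
    }

    void Grow()
    {
        int newCapacity = capacity * 2;
        T* grown = (T*)UnsafeUtility.Malloc(sizeof(T) * newCapacity,
                                            UnsafeUtility.AlignOf<T>(),
                                            Allocator.Persistent);
        UnsafeUtility.MemCpy(grown, data, sizeof(T) * Length);
        UnsafeUtility.Free(data, Allocator.Persistent);
        data = grown;
        capacity = newCapacity;
    }

    public T this[int index]
    {
        get => data[index];
        set => data[index] = value;
    }

    // Persistent memory leaks unless freed by hand.
    public void Dispose() => UnsafeUtility.Free(data, Allocator.Persistent);
}
```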