2025-01-17 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/03 Report--
As the Internet reaches deeper into every industry, data volumes keep growing and enterprises pay more and more attention to the value of data. Getui, a professional data intelligence company, started out with a message push service. Over years of work it has accumulated a large amount of data and has explored and practiced data visualization in depth.
Getui's data visualization work is driven by demand. It has moved from open source platforms to custom development for individual needs, producing works such as the real-time message delivery map and the crowd distribution heat map. Along the way, Getui has built up a large set of data visualization components and refined its own visualization capabilities. The Getui heat map is now used in smart cities, population spatial planning, public services and other fields, where it provides strong data support.
Getui message delivery map
A population heat map of the lakeside business area
This article shares Getui's data visualization practice, the problems encountered and the ideas used to solve them. We hope you find it useful.
I. The composition of data visualization
Data visualization consists of four types of visual elements: background information, scales (rulers), coordinate systems, and visual cues.
1.1 Background information
Background information is supplementary information such as titles, units of measurement and annotations. Its main purpose is to help the audience of a large-screen display understand the context, that is, the 5W information: who, what, when, where, and why.
1.2 Scale
A scale (ruler) measures the size of data along different directions and dimensions. Examples are numeric scales, categorical scales and time scales, similar to the graduated rulers we are familiar with.
1.3 Coordinate system
A coordinate system defines a structured space, together with rules that specify where shapes and colors are drawn. When encoding data, objects are placed at specific locations in that space, which gives meaning to X and Y coordinates or to latitude and longitude. The common coordinate systems are the Cartesian coordinate system, the polar coordinate system and the geographic coordinate system. A pie chart uses the polar coordinate system; a column chart, with its X and Y axes, uses the Cartesian coordinate system; and a heat map uses the geographic coordinate system.
1.4 Visual cues
Visual cues are the elements used to encode data, such as position, length, size and direction. In 1985, Bell Labs released a ranked list of visual cues. As the list shows, from top to bottom, the cues are ordered by how sensitive the brain's perceptual system is to them, from highest to lowest: position, length, angle, direction, shape, area / volume, hue and saturation.
Ranked list of visual cues released by Bell Labs in 1985
II. The application of data visualization
The application of data visualization differs according to the type of data structure. The main categories are statistical data charts, relational data charts and geospatial data charts.
2.1 Statistical data chart
The commonly used statistical charts are the line chart, bar chart, pie chart and radar chart. The visual cue in a line chart is direction, from which we perceive the trend; the visual cue in a bar chart is length, from which we perceive the size of the value the data represents; and the visual cues in the pie chart and radar chart are angle and area respectively.
2.2 Relational data chart
The commonly used relational data charts are relationship diagrams, flow charts, tree diagrams and Sankey diagrams. The most important thing in a relational data chart is the relationships. At the rendering level, the two main difficulties are layout and clustering. Layout is how the data to be displayed is distributed; relationship diagrams, flow charts and tree diagrams each lay out data differently. Clustering simulates and visualizes the real relationships, for example which entities belong to the same category, are close to each other, or are subordinate to one another.
2.3 Geospatial data chart
Geospatial data charts include the scatter map, path map, heat map, distribution map and so on. What characterizes a geospatial data chart is that it is based on a geographic coordinate system.
There is a good deal of work on geospatial data visualization in the industry, such as Loca from Amap and kepler.gl from Uber and Mapbox, both excellent examples of geospatial data visualization.
A commuter map of work and housing between cities in Britain uses visual cues of direction and color.
The earthquake density map of a city shown in kepler.gl uses visual cues of location, time and color.
Besides the three common categories of data visualization charts above, there are many other chart types, such as word clouds and time series charts.
III. The basic principles of maps
In the practice of visualization of geospatial data, map rendering is a very important step.
Map rendering steps
The above picture clearly shows the steps of map rendering:
First, the earth is projected onto a flat map using the Mercator projection.
Then, following the real-world scene, the flat map is divided into layers of different precision, which are arranged into a pyramid.
Finally, each layer of the map is cut into tiles.
Map rendering involves two important terms, map projection and map tiles, which are explained in detail below:
3.1 Map projection
By the form of the projection surface, there are three kinds of map projection: conic, cylindrical and azimuthal. By the orientation of the projection surface, there are also three kinds: normal, transverse and oblique. The point to note is that a projection can never reproduce the globe exactly: the flat map produced by unrolling the projection always has some distortion. By the kind of distortion they preserve or allow, projections are classed as conformal (equal-angle), equal-area, arbitrary and so on.
Different map scenarios call for different projection algorithms. Many projection algorithms are now available off the shelf and do not need to be written by hand. Conformal projection is a common choice, and within that family the Mercator projection is widely used by map vendors.
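To make the Mercator projection concrete, here is a minimal sketch of the spherical (Web Mercator) forward and inverse formulas. The function names and the use of the WGS84 radius are assumptions made for illustration; in practice a ready-made library function would normally be used instead.

```python
import math

# Assumption: spherical Web Mercator with the WGS84 semi-major axis as radius.
EARTH_RADIUS_M = 6378137.0


def mercator_project(lon_deg, lat_deg):
    """Project longitude/latitude (degrees) to Mercator x/y (metres)."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def mercator_unproject(x, y):
    """Inverse of mercator_project: Mercator x/y (metres) to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat
```

Because y grows without bound toward the poles, tiled web maps usually clip latitude to roughly 85 degrees north and south so that the projected world is a square.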
Different ways of map projection
3.2 Map tiles
After the Web Mercator projection, the map becomes a flat map. Sometimes we need to look at macro-level map information (such as national boundaries on a world map), and sometimes at very fine-grained information (such as road conditions during navigation). To support both, we divide the map into zoom levels.
Pyramid coordinate system of map tiles
At the highest level (zoom=0), the least information is needed and only the most important macro information is kept, so the whole map can be represented by a picture of 256x256 pixels. At the next level (zoom=1), the amount of information increases, and the map is represented by a picture of 512x512 pixels. And so on: the lower the level, the more pixels, with each level holding four times as many pixels as the level above it. In this way, a pyramid coordinate system is formed from the highest level down to the lowest.
At each level, we cut the picture into pieces of 256x256 pixels, called tiles. Thus, at the highest level (zoom=0), there is only one tile; at the next level (zoom=1), there are 4 tiles; at the level after that (zoom=2), there are 16 tiles, and so on.
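The arithmetic of the tile pyramid can be sketched as follows. The tile numbering used in lonlat_to_tile (column and row counted from the top-left corner, as in the common XYZ tile scheme) is an assumption; the article itself does not say which numbering it uses, and the function names are illustrative.

```python
import math

TILE_SIZE = 256  # pixels per tile edge, as described in the text


def map_size_px(zoom):
    """Edge length in pixels of the whole map at a zoom level."""
    return TILE_SIZE * 2 ** zoom


def tile_count(zoom):
    """Number of tiles at a zoom level: 1, 4, 16, ..."""
    return 4 ** zoom


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (column, row) of the tile containing a point.

    Assumption: XYZ numbering with (0, 0) at the top-left corner.
    """
    n = 2 ** zoom
    col = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    row = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edge stay inside the grid.
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)
```

For example, map_size_px(1) gives 512 and tile_count(2) gives 16, matching the levels listed above.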
IV. Getui's data visualization practice
Getui's data visualization work includes the message delivery map, heat maps and so on.
1) The message delivery map shows in real time the cumulative number of messages Getui has delivered that day and the profile of the audience reached by each app (including gender ratio, age distribution, the top 5 cities of delivery, etc.).
Getui message delivery map
2) The regional population heat map visually presents regional population distribution, gender ratio, age and other data.
A population heat map of the lakeside business area
Next, we take the message delivery map and the heat map as examples to walk through Getui's data visualization practice.
4.1 Preliminary technology selection
For the sake of efficiency and cost, we first investigated whether a ready-made solution could meet the requirements.
Option 1: map application
As mentioned earlier, map applications render maps as tiles, and they cannot achieve the effect in the design draft, so this option is not feasible.
Option 2: chart application
Comprehensive chart libraries such as ECharts can achieve some of the map effects, can change the viewing angle, and are simple to configure. However, the flying line effects in ECharts are very limited and cannot produce the gradient and landing effects in the design draft, so we had to give this option up.
Option 3: D3.js
D3.js is very good; we call it the jQuery of the chart world, and it can achieve most of the effects we want. However, it also has a problem: it uses SVG. SVG is a vector graphics format that keeps images sharp when rendered, but it has performance problems when used for animation.
Here we compare the performance of SVG and Canvas. With 100 flying lines, the SVG animation runs at only 12-43 FPS, while Canvas does much better: roughly 42-60 FPS with a CPU usage of about 20%-30%, and it also comes out ahead on other measures such as memory usage.
Performance comparison between SVG and Canvas with 100 flying lines
Taken together, none of the three options above is ideal. So, in the end, we decided to build it ourselves.
4.2 Step 1: layering
First, as shown in the following figure, before rendering the geographic data, we split it into layers by data type:
1) Base map layer
2) Heat map layer
3) Flying line layer
4) Any other geospatial data layer, such as a bar chart or traffic map
Layering according to data type
4.3 Step 2: implementing the base map layer
1) Data preparation: get the map data of China from Aliyun DataV, then convert it with the Mercator projection algorithm.
2) Canvas rendering: render the data to Canvas. Here we use the Mercator projection function of D3.js, then its .context method to render to Canvas.
3) Effect tuning: after rendering the map, adjust effects such as shadows, borders and transforms.
4.4 Step 3: implementing the heat map
A heat map uses special highlighting to show the page areas that visitors favor or the geographic areas where visitors are located.
A heat map has two important parameters: Max (threshold) and Radius (radius).
Max: the threshold, which is the scale (ruler) discussed earlier; it tells us what each color band means. In this figure, 0 maps to the lowest opacity and the lightest color, and 100 maps to an opacity of 1 and the darkest color.
Radius: the radius, which represents the effective scope and influence of the data.
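As a rough illustration of how the two parameters interact, the sketch below sums the contribution of nearby points and maps the result to an opacity. The linear falloff inside the radius and the function names are assumptions made for this example; they are not taken from Getui's implementation.

```python
import math


def value_to_alpha(value, max_value):
    """Map a heat value to an opacity in [0, 1]; values at or above Max saturate at 1."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return min(max(value / max_value, 0.0), 1.0)


def heat_at(px, py, points, radius):
    """Heat at pixel (px, py) from points given as (x, y, value).

    Assumption: each point's influence falls off linearly to zero at `radius`.
    """
    total = 0.0
    for x, y, value in points:
        distance = math.hypot(px - x, py - y)
        if distance < radius:
            total += value * (1.0 - distance / radius)
    return total
```

With Max set to 100, a pixel whose summed heat is 50 would be drawn at half opacity, and anything at 100 or above at full opacity. A larger radius spreads each point over more pixels, so neighbouring points overlap and reinforce each other more.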
For the details of implementing the heat map, see Getui's earlier article: "Data visualization: how to implement a heat map in the front end".
4.5 Step 4: implementing the flying line layer
The implementation of the flying line layer can be divided into three parts: curve, animation and light effect.
For the details of implementing the flying line layer, see Getui's article on the message delivery map in its data visualization practice. Space is limited, so we do not repeat it here.
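For readers who want a feel for the curve part, the sketch below samples points along a quadratic Bezier curve between two screen positions. The choice of a quadratic Bezier, the way the control point is offset from the chord, and all names are assumptions for illustration; the article does not state which curve Getui uses.

```python
def control_point(start, end, curvature):
    """Offset the chord midpoint perpendicular to the chord.

    Assumption: the offset is `curvature` times the chord length.
    """
    mid_x = (start[0] + end[0]) / 2
    mid_y = (start[1] + end[1]) / 2
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return (mid_x - dy * curvature, mid_y + dx * curvature)


def quadratic_bezier(p0, p1, p2, t):
    """Point on the quadratic Bezier curve p0 -> p2 with control point p1, 0 <= t <= 1."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def sample_flying_line(start, end, curvature=0.3, steps=20):
    """Return steps + 1 points along the curve, from start to end."""
    ctrl = control_point(start, end, curvature)
    return [quadratic_bezier(start, ctrl, end, i / steps) for i in range(steps + 1)]
```

An animation can then be driven by advancing t from 0 to 1 on each frame and drawing only the most recent few sampled points as the bright head of the line.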
V. Problems encountered
In the course of our data visualization practice, we also ran into some problems. Here we share two of them: rendering stutter in heat maps across zoom levels, and aligning the data layers after the style transform.
Problem 1: rendering stutter in heat maps across zoom levels
Heat map data is very large to begin with. When the view jumps across zoom levels, the volume of data grows exponentially, which puts heavy pressure on performance and ends up making the visualization stutter.
In order to solve this problem, we have made several optimizations:
Request optimization: first, we split the request into 6 pieces, cut according to the viewport, similar to lazy loading of images.
Caching and debouncing: next, we cache the converted heat map data and debounce frequent operations to avoid request congestion.
Data aggregation: finally, we also aggregate the data we receive. A heat map is itself a form of data fusion, so is another round of aggregation necessary? In practice, this aggregation proved effective for heat maps with a large amount of data or a very deep zoom level.
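The first two optimizations can be sketched as follows. Splitting the viewport into a 3 x 2 grid to get 6 pieces, the cache key, and the fetch callback are all assumptions for illustration; the article only says that the request is cut into 6 pieces by viewport and that converted data is cached. Debouncing is timing-dependent and is left out of this sketch.

```python
def split_viewport(min_x, min_y, max_x, max_y, cols=3, rows=2):
    """Split a viewport bounding box into cols * rows sub-boxes (6 by default)."""
    width = (max_x - min_x) / cols
    height = (max_y - min_y) / rows
    return [
        (min_x + c * width, min_y + r * height,
         min_x + (c + 1) * width, min_y + (r + 1) * height)
        for r in range(rows)
        for c in range(cols)
    ]


class HeatDataCache:
    """Cache converted heat map data per (zoom, bounding box).

    `fetch` is a hypothetical callback that loads and converts the data.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}

    def get(self, zoom, bbox):
        key = (zoom, bbox)
        if key not in self._store:
            self._store[key] = self._fetch(zoom, bbox)
        return self._store[key]
```

Each of the 6 sub-boxes can then be requested independently, and panning back to an area already seen is served from the cache instead of triggering a new request.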
For data aggregation, we studied four schemes: K-means, the grid method, the distance method and the grid distance method.
K-means: first, randomly select n cluster centroids, then compute the distance from each point to each centroid and assign the point to the nearest cluster, and repeat the assignment over many iterations. However, this scheme is not suitable for heat maps and is better suited to relationship diagrams.
Grid method: the grid method is relatively simple. It divides the screen into grid cells, finds which data points fall into each cell, and aggregates those points to the center of the cell. The drawback is that individual points can be displaced quite far.
Distance method: the distance method iterates over each point and does collision detection using a bounding square set around the point. If the square intersects that of an aggregation point, the point is merged into that aggregation point, so the result can differ from one aggregation to the next.
Grid distance method: as the name implies, this combines the previous two methods. First, iterate over the grid cells and compute the centroid of each cell; then iterate over the aggregated points and compute the centroids again with the distance method. The grid distance method takes a little more time than either the grid method or the distance method alone, but its results are more accurate. This is the method we use, and it makes the stutter much less noticeable.
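The grid distance method described above can be sketched roughly as follows. Points are (x, y, weight) tuples with positive weights. The weighted centroid, the greedy merge order and the half_side parameter for the bounding square are assumptions made for this example, not details confirmed by the article.

```python
from collections import defaultdict


def grid_centroids(points, cell_size):
    """Grid step: bucket points by cell and return one weighted centroid per cell."""
    cells = defaultdict(lambda: [0.0, 0.0, 0.0])  # sum of x*w, sum of y*w, sum of w
    for x, y, w in points:
        acc = cells[(int(x // cell_size), int(y // cell_size))]
        acc[0] += x * w
        acc[1] += y * w
        acc[2] += w
    return [(sx / sw, sy / sw, sw) for sx, sy, sw in cells.values()]


def merge_by_distance(clusters, half_side):
    """Distance step: merge clusters whose bounding squares intersect.

    Each cluster gets a square of side 2 * half_side; two squares intersect
    when the centres differ by at most 2 * half_side on both axes.
    """
    merged = []
    for x, y, w in clusters:
        for i, (mx, my, mw) in enumerate(merged):
            if abs(x - mx) <= 2 * half_side and abs(y - my) <= 2 * half_side:
                total = mw + w
                merged[i] = ((mx * mw + x * w) / total, (my * mw + y * w) / total, total)
                break
        else:
            merged.append((x, y, w))
    return merged


def grid_distance_aggregate(points, cell_size, half_side):
    """Grid distance method: grid centroids first, then a distance merge."""
    return merge_by_distance(grid_centroids(points, cell_size), half_side)
```

Running the grid step first shrinks the input to the distance step from the number of points to the number of occupied cells, which is where most of the saving comes from. The merge step then repairs the case where two nearby points happen to fall on opposite sides of a cell boundary.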
Problem 2: data layer alignment after the style transform
The second problem is keeping the data layers aligned with the map after the style transform.
When rendering the map, we use a CSS transform to simulate perspective, which produces the effect shown below.
Heat maps and base maps can use a style transform to simulate perspective because they are flat, but flying lines and points are 3D effects. Think of watching fireworks: a firework flying straight toward us looks like a straight line, while one viewed from 90 degrees to the side shows its full arc.
This matches the cosine law, so as a simulation the effect can be achieved simply by converting the curvature of the curve with the cosine of the viewing angle.
But this method is not accurate enough. For example, do the control points of the curve also change as the perspective changes?
Let's look at another picture. We can simulate a 3D effect on a flat screen because the eye can be fooled: as long as we draw a picture that matches what we would actually see, we perceive it as 3D.
For the map, we use a style transform with three parameters, rotateX, rotateY and rotateZ. Rotation is in fact just a series of trigonometric transformations.
Perspective assumes that we are sitting at a certain distance in front of the screen. With this distance as a set value, we can reproduce the CSS style transform in our own calculations.
Of course, perspective algorithms can be very complex, with one-point, two-point and scattered perspective among them. Here we simply map the model onto the screen.
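A minimal sketch of this mapping, for a single rotation about the X axis followed by a perspective divide, is shown below. It follows the usual CSS conventions (y pointing down, z pointing toward the viewer, the viewer at the perspective distance in front of the z = 0 plane); the function names are illustrative and transform-origin handling is left out.

```python
import math


def rotate_x(point, angle_deg):
    """Rotate a 3D point about the X axis, as CSS rotateX() does."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))


def perspective_project(point, distance):
    """Project a 3D point to the screen for a viewer `distance` in front of z = 0."""
    x, y, z = point
    if z >= distance:
        raise ValueError("point is at or behind the viewer")
    scale = distance / (distance - z)
    return (x * scale, y * scale)


def project_to_screen(point, tilt_deg, distance):
    """Tilt the scene about the X axis, then apply the perspective divide."""
    return perspective_project(rotate_x(point, tilt_deg), distance)
```

Feeding every sampled point of a flying line, including its height above the map, through project_to_screen places the line on screen consistently with the tilted base map, without having to guess how the curve's control points should move.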
VI. Conclusion
Data visualization reveals the patterns hidden in data to the audience in an intuitive, visually striking way, and conveys the value of the data. Behind the visual effects, data visualization practice ultimately depends on the accumulation of massive data and on mature data intelligence technology.
At present, the Getui heat map is being used in smart cities, population spatial planning, public services and other fields to provide strong data support. In the future, Getui will continue to explore applying data intelligence technology to vertical industries and using data intelligence to make those industries smarter.