Example Analysis of separated Architecture of Picture Server

2025-04-15 Update. From: SLTechnology News&Howtos

This article analyzes, with examples, how to separate the picture server from the rest of a website's architecture. It is shared here for reference.

1. Introduction

Many websites today use a large number of pictures. Pictures account for most of the data transferred when a web page loads, which makes them the main factor affecting website performance. Many websites therefore separate image storage from the website itself: they set up one or more servers to store the pictures, place the pictures in a virtual directory, and have every picture on a web page referenced by a URL that points to its address on those servers. This significantly improves website performance, and it is where the concept of the picture server (ImageServer) comes from.

1.1 Advantages of the picture server

1. Shares the load of the web server: separating out the resource-consuming image service improves the performance and stability of the server.

2. Allows the picture server to be optimized specifically: a caching scheme targeted at the picture service reduces bandwidth cost and improves access speed.

3. Improves the scalability of the website: image handling capacity grows by adding picture servers.

1.2 Points to note for the picture server

1. Select physical media and a file system suitable for image storage

2. Use a physically independent server

3. If you have multiple picture servers, consider how pictures are synchronized between the servers.

4. Use a separate domain name

5. Formulate a reasonable caching strategy

6. Use an image processing module to post-process the pictures uploaded by users.

1.3 Architecture of the picture server

Pictures are an essential part of a website. As the website grows, picture traffic grows with the number of visits, and the demands on picture handling keep rising. In the initial stage of a site everything is kept simple, and pictures usually live in the Images folder under the site.

As traffic and the pressure on IIS increase, splitting begins. The image folder is extracted as a separate site, for example http://images.***.com/ (it may be split into multiple image sites as needed, depending on the specific business environment). After the split, the load of a single IIS application pool is shared across two or more pools, which greatly relieves the access bottleneck. As traffic increases further and the server can no longer cope, the picture site needs to move to a stand-alone server. Serving pictures also raises another need: one picture may be required in multiple sizes. In the early stage we usually generate and save every size we need when the picture is saved, but as more and more sizes are required, how do we handle the growing number of sizes at save time?

The second problem is concurrent access to the IIS server: as the number of users grows further, a single image server is no longer enough. How can we scale out further at this point?

As shown in the figure above, both problems can be solved together by adding a Squid cache server at the front end and one or more dynamic image cutting (resizing) servers behind it. A Squid or Nginx proxy cache server greatly increases the concurrent access the picture system can handle and lets the system break through its existing limits. The main job of the dynamic image cutting server is to generate on demand, from the original image, a version in the requested size and return it. The original images can be stored on the picture server itself or on a separate server.

In this structure, the maximum concurrent access is limited by Squid or whichever proxy server is used. When the load on the image cutting service increases, you only need to add more image cutting servers, and growth in image storage can be handled by adding hard disks or servers.
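As a rough illustration of what a dynamic image cutting server has to decide for each request, the sketch below parses a requested size out of the URL and works out the output dimensions. The URL layout `/<width>x<height>/<original path>` and both function names are assumptions made for this example, not something the article specifies, and the actual pixel resizing (which would need an imaging library) is left out.

```python
import re

# Hypothetical URL layout: /<width>x<height>/<path of the original image>
SIZE_URL = re.compile(r"^/(\d+)x(\d+)(/.+)$")


def parse_size_request(path):
    """Return (max_width, max_height, original_path), or None if no size is requested."""
    match = SIZE_URL.match(path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def fit_within(orig_w, orig_h, max_w, max_h):
    """Scale the original to fit inside max_w x max_h, keeping the aspect ratio.

    The image is never enlarged beyond its original size.
    """
    scale = min(max_w / orig_w, max_h / orig_h, 1.0)
    return max(1, round(orig_w * scale)), max(1, round(orig_h * scale))


request = parse_size_request("/200x150/2008/10/16/a/b/abcde.jpg")
print(request)                             # (200, 150, '/2008/10/16/a/b/abcde.jpg')
print(fit_within(1600, 1200, 200, 150))    # (200, 150)
print(fit_within(1000, 500, 200, 200))     # (200, 100)
```

Because the requested size is part of the URL, each size of a picture is a distinct URL, so the proxy cache in front can cache every generated size independently.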

If visits to your site keep growing and Squid itself is about to become the bottleneck, what should we do?

As shown in the figure above, multiple Squid or Nginx servers are used, with F5 or LVS load balancing added in front (caching can also be enabled there). Concurrent capacity increases greatly, and servers can be added at any time as needed. There is one flaw: the same picture may be cached on several Squid servers, because a first request may land on squid1 and, after the F5 session expires, a second request may land on squid2 or another server. This small amount of redundancy is an acceptable price for solving the concurrency problem. After all this work, if conditions allow, putting a CDN in front of the image servers will greatly improve the image access quality of your site.

1.4 Image storage architecture

1.4.1 Necessity of deploying a stand-alone picture server

For both Apache and IIS, serving pictures is the most resource-consuming work. If the picture service and the application service run on the same server, the application server can easily collapse under the high I/O load of pictures, so large website projects need to separate the picture server from the application server. Deploying independent picture servers (or even server clusters) is the most basic solution for image storage on large websites, because only with independent picture servers can we optimize their performance specifically. On the hardware side, picture servers can be fitted with high-end hard drives (7200 rpm to 15000 rpm), while an ordinary CPU is enough. On the software side, we can give the picture server a special file system suited to image requests, such as Taobao's TFS, which handles the nightmare of huge numbers of small picture files well. We can also use nginx or squid to proxy picture requests.

1.4.2 Use a separate domain name

Note that this refers to independent domain names, not subdomains. For example, the yahoo.com image server uses the domain name yimg.com rather than a second-level domain such as img.yahoo.com. Why? Personally, I think the main reasons are as follows:

1. Browsers limit the number of concurrent connections to the same domain name, usually to between 2 and 6. The following figure lists the number of concurrent connections for each browser (for reference only).

If we give the image server a separate domain name, image loads on a page are no longer held back by that per-domain limit. In theory, adding one independent domain name doubles the number of simultaneous connections.

2. Cookies work against caching.

Take an image such as http://www.test.com/img/xx.gif. Every request for it carries the cookies of the www.test.com domain. Since most web caches only cache requests without cookies, each image request misses the cache and still goes to the origin server, so caching images achieves little. It is therefore better to give pictures their own independent domain name. The same idea applies to css and js files, not only pictures.

3. Facilitates CDN synchronization
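The cookie argument can be made concrete with a small check of which hosts a cookie would be sent to. This is a simplified sketch of the domain-matching rule browsers apply (per RFC 6265), assuming the site sets its cookies with a `Domain` attribute covering the whole site, as many sites do; the function name is made up for this example.

```python
def cookie_sent_to(cookie_domain, request_host):
    """Simplified domain match: the host equals the cookie domain or is a subdomain of it."""
    cookie_domain = cookie_domain.lower().lstrip(".")
    request_host = request_host.lower()
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


# A cookie scoped to yahoo.com travels with requests to any subdomain...
print(cookie_sent_to("yahoo.com", "img.yahoo.com"))   # True
# ...but not with requests to an independent domain.
print(cookie_sent_to("yahoo.com", "yimg.com"))        # False
```

This is why a subdomain alone may not be enough: image requests to it can still carry the site's cookies, while requests to an independent domain cannot.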

1.4.3 How to upload and synchronize pictures after separating the picture server

Everything has two sides. Separating the picture server certainly improves the efficiency of picture access and greatly relieves the server bottleneck caused by pictures, but after the separation, uploading and synchronizing pictures becomes a big problem. Here are several solutions based on my own thinking.

1. NFS sharing method

If you don't want to keep a synchronized copy of every image on each image server, NFS sharing is the simplest and most practical way. NFS is a distributed client/server file system. Its essence is sharing files between computers: users connect to the sharing computer and access its files as if they were on the local hard disk.

The implementation idea is as follows: the web server mounts, through NFS, the directories exported by the picture servers. The user first uploads a picture to the web server, and the program then copies the uploaded picture into the mounted directory, so the picture server can serve the picture just uploaded (note that the directory is only shared; no separate copy is synchronized to the picture server). Bind independent domain names to the image servers so that browsers access images through those separate domain names. This approach has essentially no synchronization delay, but it depends on NFS: if NFS goes down, the web server is affected. The figure below shows this setup.

For how to configure NFS, search http://so.jb51.net/.
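The copy step on the web server could look roughly like the sketch below. It assumes the picture server's export is already mounted on the web server; the function name and the paths are placeholders for this example. Writing to a temporary name and renaming afterwards is a choice made here so that the picture server does not serve a half-written file.

```python
import shutil
from pathlib import Path


def publish_upload(uploaded_file, mount_root):
    """Copy an uploaded picture into the NFS-mounted picture directory.

    mount_root is assumed to be the local mount point of a picture
    server's exported directory (a hypothetical path in this sketch).
    """
    src = Path(uploaded_file)
    dest_dir = Path(mount_root)
    dest_dir.mkdir(parents=True, exist_ok=True)

    final_path = dest_dir / src.name
    tmp_path = dest_dir / (src.name + ".tmp")

    # Write under a temporary name first, then rename into place,
    # so readers never see a partially copied picture.
    shutil.copyfile(src, tmp_path)
    tmp_path.replace(final_path)
    return final_path
```

A call such as `publish_upload("/tmp/uploads/abcde.jpg", "/mnt/img1/images")` would make the picture available under the mounted directory; both paths are illustrative only.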

2. Use FTP to synchronize

Unlike the NFS approach above, here the program uses FTP to copy each uploaded image to every image server. PHP, Java and ASP.NET can all work with FTP. Each picture server keeps a full copy of the pictures, which also serves as a backup. The disadvantage is that sending pictures over FTP takes time, and doing it asynchronously introduces a delay, though for ordinary small picture files this is not a problem.
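A minimal sketch of the FTP approach, using Python's standard `ftplib`, is shown below. The host list, credentials and function name are placeholders, and the `connect` parameter exists only so the function can be exercised without a real FTP server. Returning the hosts that failed is one simple way to let the caller retry them later.

```python
from ftplib import FTP, all_errors


def push_image(local_path, remote_name, hosts, user, password, connect=FTP):
    """Upload one picture to every picture server; return the hosts that failed."""
    failed = []
    for host in hosts:
        try:
            ftp = connect(host)          # FTP(host) connects immediately
            try:
                ftp.login(user, password)
                with open(local_path, "rb") as fh:
                    ftp.storbinary("STOR " + remote_name, fh)
            finally:
                ftp.close()
        except all_errors:
            # Record the failure so the upload can be retried for this server.
            failed.append(host)
    return failed
```

In a real deployment this would typically run in a background job after the upload completes, which is where the synchronization delay mentioned above comes from.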

2. Analysis of the URL hash architecture of the picture server

2.1 What is the url hash architecture

The url hash architecture runs a hash algorithm over the url and uses the hash result to find the corresponding server. Because a given url always hashes to the same value, in theory that url is always assigned to the same server. And because a good hash algorithm spreads urls very evenly, traffic is balanced at the same time.
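The idea can be sketched in a few lines. The example below uses MD5 and a simple modulo, which is one possible choice rather than the specific algorithm of any product mentioned in this article; the server names are placeholders.

```python
import hashlib
from collections import Counter


def pick_server(url, servers):
    """Map a url to one server: the same url always yields the same server."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["cache1", "cache2", "cache3", "cache4"]   # placeholder names
url = "/2008/10/16/a/b/abcde.jpg"

# Repeated lookups for one url agree, so its cached copy lives on one machine.
print(pick_server(url, servers) == pick_server(url, servers))   # True

# Many different urls spread roughly evenly over the servers.
counts = Counter(pick_server(f"/img/{i}.jpg", servers) for i in range(4000))
print(counts)
```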

2.2 Why use the url hash architecture

1. A picture server has two characteristics: very high traffic and very large capacity. Simple load balancing solves the traffic problem but does nothing for capacity, so it leads to the cache overflow problem described next.

2. The cache overflow problem: if the data accessed in a given period far exceeds the capacity of the smallest single machine in the cache cluster, that cache overflows. Many requests then pass straight through to the back end, which hits the back end's IO performance hard.

3. The problem can be eased by configuring larger caches, but memory is always limited, and fitting every machine with very large memory is expensive. A large disk cache in squid is not suitable either: the hash table inside squid becomes very large and performance becomes very poor.

4. With the hash architecture, the memory of the whole cache cluster is fully used: whether the cache overflows no longer depends on the capacity of the smallest single machine in the cluster but on the combined capacity of all its machines.
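The capacity argument in point 4 comes down to simple arithmetic, sketched below with made-up cache sizes. Under plain load balancing any machine may receive any url, so every machine has to hold the whole hot data set and the smallest one sets the limit. Under url hash each machine holds only its own share. The sketch follows the article's reasoning and assumes the shares are sized to match each machine's capacity.

```python
def cacheable_working_set_gb(machine_cache_gb, url_hash):
    """Largest hot data set (in GB) the cluster can hold without overflowing."""
    if url_hash:
        # Each machine caches only its own share of the urls.
        return sum(machine_cache_gb)
    # Every machine may see every url, so the smallest cache is the limit.
    return min(machine_cache_gb)


caches = [8, 16, 16, 32]   # hypothetical per-machine cache sizes in GB
print(cacheable_working_set_gb(caches, url_hash=False))   # 8
print(cacheable_working_set_gb(caches, url_hash=True))    # 72
```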

2.3 Types of url hash architecture

1) Hash architecture based on dns.

2) Automatic hash architecture based on nginx.

3) Manual hash architecture based on nginx.

2.3.1 Hash architecture based on dns

Diagram of the dns hash architecture

Description of the dns hash architecture

This architecture suits user-oriented picture systems, such as those handling pictures uploaded to forums, photo albums and blogs. In such systems the file names can be guaranteed to follow a consistent convention.

The diagram uses 36 domain names, and image file names are derived from an md5 value. One character taken from the md5 value indicates which domain name the file belongs to, and each domain name maps to a machine. Uploads and distribution are routed by the same character.
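A minimal sketch of this naming scheme follows. Several details are assumptions made for illustration: the md5 is taken over the file content, the first character is the one used for routing, and the domain pattern `img-<char>.example.com` is invented. Note also that a hexadecimal md5 string only contains the 16 characters 0-9 and a-f, so a layout with 36 domain names presumably encodes the value differently; the article does not say how.

```python
import hashlib


def image_name_and_host(image_bytes, position=0, base_domain="example.com"):
    """Derive a file name from the md5 of the content and pick a domain from one character."""
    name = hashlib.md5(image_bytes).hexdigest()
    routing_char = name[position]
    host = f"img-{routing_char}.{base_domain}"   # hypothetical domain pattern
    return name, host


name, host = image_name_and_host(b"...picture bytes...")
print(f"http://{host}/{name}.jpg")
```

Because the routing character is part of the file name itself, the upload program and the page templates can both compute the domain name without any lookup table.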

Advantages and disadvantages of the dns hash architecture

Advantages:

1) Traffic is split by dns, which is cheap, performs well and needs no maintenance.

2) It gets around IE's default limit of 2 connections per host.

Disadvantages:

1) Availability: if a machine goes down, the pictures assigned to it cannot be read.

2) Rebalancing: moving traffic between machines requires synchronizing the files, which is costly.

3) It only suits user-oriented systems.

2.3.2 Automatic and manual hash architectures based on nginx

Diagram of the nginx automatic hash architecture

Description of the nginx automatic hash architecture

1. This is a new cache architecture, with nginx as the front end proxying to the cache machines.

2. Behind nginx sits a group of cache machines. nginx hashes the url and routes each request to the corresponding cache machine.

3. This architecture makes it easy to upgrade a pure squid cache setup, since nginx can be installed on the existing squid machines.

4. Nginx has its own cache function, so heavily requested links, such as favicon.ico and the site logo, can be cached directly on nginx and avoid one extra proxy hop.

Advantages and disadvantages of the nginx automatic hash architecture

Advantages

1) High performance

2) Easy to use; it does not matter what runs in the back end.

3) High availability

4) A caching architecture that makes it easy to split traffic

5) Links can be cached directly in the nginx proxy cache.

Disadvantage

Because there is little control over how urls are split, adding or removing a cache machine reassigns urls to different caches, which effectively invalidates the existing cache.
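The effect of adding a machine can be estimated with a short simulation. The sketch below uses a plain "hash modulo number of machines" mapping as a stand-in; it is not the exact algorithm of the nginx upstream hash module, only an illustration of the same kind of reassignment.

```python
import hashlib


def bucket(url, machine_count):
    """Assign a url to a machine index with a simple hash-modulo mapping."""
    value = int.from_bytes(hashlib.md5(url.encode("utf-8")).digest()[:8], "big")
    return value % machine_count


def moved_fraction(urls, old_count, new_count):
    """Fraction of urls whose machine changes when the cluster is resized."""
    moved = sum(1 for u in urls if bucket(u, old_count) != bucket(u, new_count))
    return moved / len(urls)


urls = [f"/img/{i}.jpg" for i in range(5000)]
print(moved_fraction(urls, 4, 5))   # about 0.8 with this mapping
```

With this mapping, growing from 4 to 5 machines moves roughly four out of five urls to a different cache, so most requests miss the cache until it warms up again.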

Description of the nginx manual hash architecture

1. The architecture diagram is the same as for automatic hash. The only difference is the hash algorithm: automatic hash relies on the nginx upstream hash module to split traffic, while the manual architecture uses an algorithm we design ourselves.

2. The idea of the algorithm is to take one character from the url as the basis for splitting, for example the penultimate character of the link, which also gives an even distribution.

3. The manual architecture avoids the cache invalidation that adding or removing machines causes in the automatic architecture, and you know exactly which cache holds a given link.
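A hand-written routing table of this kind might look like the sketch below. Everything specific in it is an assumption for illustration: the character used is the penultimate character of the file name before the extension (the extension itself would be the same for every picture), file names are taken to be hexadecimal, and the cache names are placeholders.

```python
import posixpath

# Hypothetical routing table: which cache holds which routing character.
ROUTES = {
    **dict.fromkeys("01234567", "cache1"),
    **dict.fromkeys("89abcdef", "cache2"),
}
FALLBACK = "cache1"


def routing_char(url_path):
    """Penultimate character of the file name, ignoring the extension."""
    stem = posixpath.splitext(posixpath.basename(url_path))[0]
    return stem[-2].lower() if len(stem) >= 2 else ""


def cache_for(url_path):
    """Look up the cache machine for a url in the hand-written table."""
    return ROUTES.get(routing_char(url_path), FALLBACK)


print(cache_for("/a/b/3f9c2e.jpg"))   # cache1  (routing character '2')
print(cache_for("/a/b/77d1a9.jpg"))   # cache2  (routing character 'a')
```

Adding a third cache then means reassigning only the characters moved to it in `ROUTES`; urls routed by every other character stay where they are, which is the controllability the manual architecture is after.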

Advantages and disadvantages of the nginx manual hash architecture

Advantages

1) Inherits essentially all the advantages of the automatic architecture

2) Avoids the cache invalidation caused by adding or removing machines

3) You know exactly which cache a link is stored on

Disadvantage

The configuration is complex, and it is hard to roll out one uniform configuration.

Using the hash architecture to optimize a bbs architecture

1. The bbs architecture mentioned earlier uses lvs+squid as the front end, so when squidclient refreshes the cache it has to update every squid, which is very inefficient. With the hash architecture, squidclient only needs to purge one squid each time, which is far more efficient.

2. The nginx manual hash architecture is recommended: you know exactly which machine a link is stored on, so you can configure a precise backup machine for it.

3. Architecture scheme of an nginx picture server

A picture service usually holds a large volume of data and is accessed frequently. It therefore faces two kinds of problems: storage and traffic.

The storage problem is a problem of hard disk capacity. Spending money on hard disks solves it, which sounds simple but is also the hardest part. From experience so far, the best arrangement is: whenever disk space runs short, buy a hard disk, plug it in, change the configuration at most, and use it immediately. The disks should also be fully used, otherwise the cost of a large image store plus backups becomes frightening; ideally every disk is used to 100% of its space.

Traffic is also a big problem. If the service cannot apply hotlink protection, traffic brings problems such as bandwidth cost and server load. If you have the money, hand the problem to a CDN. If you have no money, or have money to spare, do it yourself. Based on the rule that "the older the picture, the fewer the visitors", split the pictures into two parts, the latest pictures and the old ones, and handle each part on its own terms. The latest pictures get many visits but need little storage; the old pictures get few visits but need a lot of storage.

3.1 Draw up a storage directory rule

On top of the existing hash layout such as /a/b/abcde.jpg, add a date to the directory: /200810/16/a/b/abcde.jpg or /2008/10/16/a/b/abcde.jpg. Once the directory rule is based on the date, storage can be split across machines by year and month.
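The directory rule can be expressed as a small helper. This sketch produces the second form shown above (`/2008/10/16/a/b/abcde.jpg`) and assumes, from the example path, that the two hash directories are the first two characters of the file name; the function name is made up for illustration.

```python
from datetime import date


def storage_path(filename, upload_day):
    """Build /YYYY/MM/DD/<c1>/<c2>/<filename> for a picture."""
    if len(filename) < 2:
        raise ValueError("file name needs at least two characters")
    return f"/{upload_day:%Y/%m/%d}/{filename[0]}/{filename[1]}/{filename}"


print(storage_path("abcde.jpg", date(2008, 10, 16)))
# /2008/10/16/a/b/abcde.jpg
```

Because the year and month lead the path, moving a whole year or month to another machine is a matter of moving one directory tree and pointing that path prefix at the new machine.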

3.2 Machines and hard disks

Following the plan above, the servers are divided into two groups. One group sits behind lvs as the load balancer and serves new images; the other group serves and backs up old images. For the new-picture machines, pick a few better servers with SCSI hard disks. The old-picture machines do not need much: ordinary PCs with enough hard disks will do. IDE 1T hard disks are no longer expensive, and building a raid saves trouble. What matters most is having plenty of these machines. As shown below:

Let me explain:

1. The image service uses lvs as the entry point, so its processing capacity is assured.

2. nginx serves external requests directly, with no need for squid.

3. The red line in the figure means that the master nginx proxies requests for /2006 and /2007 images to the two archive servers respectively. If the CPU load on the master nginx turns out to be relatively high, consider using nginx's proxy_store to store the images on the master server and clean them up regularly.

4. The storage allocation server in the figure is the single entry point through which the picture service updates pictures. When pictures are added or modified, this server is responsible for putting them on the correct server.

5. The old-image servers are currently divided by year. Add two servers, or two hard drives, every year. Be careful: do not trust raid. There must be two machines, ideally in two different cities.

6. Because the old data for 2006 and 2007 hardly changes, the two years of data can be merged onto one machine if the hard drive is big enough.

7. With careful tuning, it is fine for the hard disks of the old-image servers to be 100% full: old data barely grows, so a small reserve of 1-2G is enough.
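The routing decision described in points 3 and 4 can be sketched as a lookup on the year at the start of the path. The server names below are placeholders, and in the architecture described here the decision would live in the nginx configuration and in the storage allocation server rather than in a Python function; the sketch only shows the rule.

```python
# Hypothetical mapping from year prefix to archive server.
ARCHIVE_BY_YEAR = {
    "2006": "archive-2006.internal",
    "2007": "archive-2007.internal",
}
CURRENT_SERVER = "new-images.internal"   # placeholder for the new-picture group


def backend_for(path):
    """Pick the server that stores a picture, based on the year in its path."""
    first_segment = path.lstrip("/").split("/", 1)[0]
    year = first_segment[:4]             # works for /2006/10/... and /200610/...
    return ARCHIVE_BY_YEAR.get(year, CURRENT_SERVER)


print(backend_for("/2006/10/16/a/b/abcde.jpg"))   # archive-2006.internal
print(backend_for("/200710/16/a/b/abcde.jpg"))    # archive-2007.internal
print(backend_for("/2008/10/16/a/b/abcde.jpg"))   # new-images.internal
```

Each new year then needs only one more entry in the mapping, which matches the "add two servers every year" routine in point 5.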
