2025-01-16 Update. From: SLTechnology News&Howtos
How should HTTPS be put into practice outside the protocol layer? This article analyzes the question in detail and describes the corresponding solutions, in the hope of helping others who face the same problem find a simpler and more practical approach.
Practice outside the protocol layer
Preface
There are not many articles about HTTPS on the Internet, and few of them share practical experience of deploying HTTPS on large Internet sites. We had many questions of our own when we considered deploying it.
This article introduces Baidu's HTTPS practice and some of the trade-offs involved, in the hope that it prompts others to share better ideas.
Practical work outside the protocol layer
Why the whole site must be covered by HTTPS
Many people who are new to HTTPS wonder whether it is enough to switch only the site's main domain name to HTTPS. The answer is no.
The purpose of HTTPS is to secure data in transit. What happens if only the main domain name is served over HTTPS, while the resources the page loads, such as JS and CSS, are not?
In effect, the goal of securing the site's traffic is not achieved: the JS, CSS and images can still be hijacked, and once that content is tampered with or sniffed, HTTPS loses its meaning.
Browsers anticipated this situation long ago and respond to it with warnings. The exact behavior depends on the browser: the lock icon in the address bar changes from green to yellow, the request is blocked, or a prompt pops up (mainly in IE). These have a large impact on the user experience; users feel annoyed, confused and worried about security.
Many users habitually click "Yes" on this prompt, which stops the non-HTTPS resources from loading. Many non-IE browsers also block the more dangerous non-HTTPS resources (such as JS). We found that mobile browsers are currently slightly more lenient.
So if this is not handled well, in many cases even the basic functions of the website will not work properly.
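One way to see how exposed a page is before switching the main domain is to list the sub-resources it still references over plain http. The following is only a minimal sketch: the set of tag/attribute pairs and the name find_mixed_content are assumptions made for illustration, not part of Baidu's tooling.

```python
from html.parser import HTMLParser

# Tag/attribute pairs that commonly trigger a resource load.
# Only these pairs are checked (an assumption of this sketch).
RESOURCE_ATTRS = {
    ("script", "src"),
    ("img", "src"),
    ("iframe", "src"),
    ("link", "href"),
}


class MixedContentFinder(HTMLParser):
    """Collects sub-resources that would be loaded over plain http."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        for name, value in attrs:
            if (tag, name) in RESOURCE_ATTRS and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_mixed_content(html):
    """Return (tag, url) pairs for resources referenced with http://."""
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure
```

Links written as https:// or // are left alone, since they follow HTTPS when the page itself is served over HTTPS.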
Differences between sites
When many people first encounter HTTPS, they think it only means deploying a certificate and making the web server support HTTPS.
In fact, the work and difficulty of deploying HTTPS vary greatly from site to site. For a large site, making the web server support HTTPS and tuning it for HTTPS protocol features may account for only 20% to 40% of the migration work.
Let's consider the following scenarios for deploying HTTPS.
A simple personal site
Definition of simple: resources are loaded only from the site's main domain or its subdomains.
For example, axyz's personal blog has the domain name axyzblog.com and loads its JS and images from the main domain name.
When such a site deploys HTTPS, if you already have a certificate and the web server supports it, you only need to switch the main domain name to HTTPS access and then change the resource links to https:// or //.
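The link change described above can be sketched as a small rewrite step. This is an illustration under assumptions: the function name to_protocol_relative and the https_hosts allowlist are hypothetical, and a regular expression is used only to keep the example short (a real migration would more likely change the templates that emit the links).

```python
import re

# Matches src="http://host or href="http://host with single or double quotes.
_HTTP_ATTR = re.compile(
    r'''(?P<prefix>\b(?:src|href)\s*=\s*["'])http://(?P<host>[^/"'\s:]+)''',
    re.IGNORECASE,
)


def to_protocol_relative(html, https_hosts):
    """Rewrite http:// links to // for hosts known to support HTTPS.

    https_hosts is an assumed allowlist of lowercase host names;
    links to any other host are left untouched.
    """

    def swap(match):
        if match.group("host").lower() in https_hosts:
            return f"{match.group('prefix')}//{match.group('host')}"
        return match.group(0)

    return _HTTP_ATTR.sub(swap, html)
```

A link written as // inherits the scheme of the page, so the same markup works while the site is still reachable over both HTTP and HTTPS.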
A complex personal site
Definition of complex: resources need to be loaded from external domain names.
This is more troublesome. Resources on the main domain are easy to adapt to HTTPS, but resources loaded from a CDN also need the CDN provider to support HTTPS. The major CDN providers are gradually adding HTTPS support, so anyone planning a migration should check whether their CDN offers it. Some CDNs charge extra for HTTPS traffic.
Common ways for a CDN to provide HTTPS are:
The site owner provides the private key to the CDN, and origin-pull uses HTTP.
The CDN uses a shared domain name and a shared certificate, so the resource domain name cannot be customized. Origin-pull uses HTTP.
Only dynamic acceleration is provided; the CDN acts as a TCP proxy and does not cache content.
CloudFlare provides its Keyless SSL service for sites that need a CDN but do not want to hand over their private key or use a shared domain name and certificate.
A simple large site
Definition of simple: resources are loaded only from the site's main domain, its subdomains, or self-built / controllable CDN domain names, with almost no third-party resources. If the site is already like this, or you are willing to transform it into this type, deploying HTTPS is relatively easy. Google and Twitter are very good examples.
Advantages: with a site like this, it is easier to switch to HTTPS.
Disadvantages: if the site has to be transformed first, it takes a lot of determination, because you can use almost no third-party resources.
Complex large sites where access speed is less critical
Definition of complex: a large number of resources are loaded from domains other than the site's main domain, or from third-party sites. This is common for platform sites and sites with complex content presentation.
Access speed requirements: users stay for a long time or have a strong need for the service, so they tolerate slower access. Examples are portals, video sites and online transaction sites (such as train ticket and air ticket stores).
Such sites can push for all related domain names to be upgraded to support HTTPS. The following figure illustrates how such a change affects the links of a website.
The team responsible for traffic access transforms the environment it controls to support both HTTP and HTTPS, which keeps the front-end engineering work relatively small. Most of the time, links only need to be changed from http:// to //. If the main domain name is served over HTTPS, the other resources are then loaded over HTTPS automatically. What about third-party resources? Generally there are only two options: migrate them to your own CDN or IDC, or push the third parties to support HTTPS.
Take Facebook, which moved its whole site to HTTPS, as an example. A third-party vendor wants to launch a game on Facebook. Facebook says: please provide HTTPS access. The third party thinks: there is money to be made, so we had better provide HTTPS access. So if your platform is strong and attractive enough, and the partners are able to provide HTTPS, this is completely feasible. If your platform serves individual developers who do not make much money from it, it will not work.
Advantages: the front-end changes are relatively simple, and HTTP resources are unlikely to end up on HTTPS pages.
Disadvantages: with this approach access usually becomes slower for users, for example going from 3 seconds to 5 seconds; for the reasons given above, users can still accept this. It also places high demands on third parties.
Complex large sites with strict access speed requirements
Definition of complex: as above.
Access speed requirements: users do not stay long and expect fast access. If users treat the site as a tool and need a quick response, the approach above is not good enough. The following sections introduce the optimization choices for this case.
The choice of domain names
The number of domain names affects access speed in two ways: with more domain names, more time is spent on DNS resolution and connection setup; with fewer domain names, download concurrency may be insufficient.
Establishing a new connection costs more time under HTTPS than under HTTP. For the simple large sites mentioned above, 1 to 3 domain names are enough. For a search engine like Baidu, which has rich presentation styles, a page may display many kinds of resources. Because different types of resources are served from different domain names (different products or third-party products), searching for a new term may require new SSL connections for some resources, which makes the page feel sluggish to the user.
If the domain names are limited to a small set (usually about 2 to 6), then keeping the connections to these domain names alive, merging some data, and using SPDY or HTTP/2 to provide concurrency can meet our needs. Our actual situation is that Baidu search loads all kinds of resources from hundreds of resource domain names. The problem thus becomes how to serve hundreds of domain names through 2 to 6 domain names, which leads to the next section: proxy access and CDN.
Proxy access
Reducing the domain names from hundreds to single digits inevitably involves unified access, traffic forwarding and scheduling. Most site resources are loaded from the main domain name plus the CDN, so we can divide the domain names into these two categories and replace them accordingly.
After replacement, the several CDN domain names all point to the same CNAME, so the way users access the site becomes as follows.
In this way, the SSL handshake happens only between the user and the two types of nodes, connections are relatively easy to keep alive, and it is not necessary to apply for a certificate and deploy HTTPS access for every domain name.
This approach runs into a series of problems, such as domain name conversion, data transmission and traffic scheduling. It requires overall design and architecture work, optimization of many details, and a lot of investment in operations and R&D.
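The article does not describe how the domain name conversion is actually done. Purely as an illustration, one conceivable scheme is to carry the original resource host in the path of a unified access domain and let the access layer decode it again. Everything in the sketch below (the function names, the path layout, the example host names) is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_unified(url, access_host):
    """Map a resource URL onto a unified HTTPS access domain.

    Hypothetical layout: https://<access_host>/<original host>/<original path>
    """
    parts = urlsplit(url)
    path = f"/{parts.netloc}{parts.path or '/'}"
    return urlunsplit(("https", access_host, path, parts.query, ""))


def from_unified(url):
    """Recover the original resource URL on the access layer (origin-pull over HTTP)."""
    parts = urlsplit(url)
    origin_host, _, rest = parts.path.lstrip("/").partition("/")
    return urlunsplit(("http", origin_host, "/" + rest, parts.query, ""))
```

With a mapping of this kind the browser only ever performs handshakes with the access domain, while the access layer remains free to fetch from the original hosts however it likes.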
The ideal way: the user only needs an HTTPS handshake with CDN nodes, which greatly shortens the handshake RTT (CDN nodes are generally widely distributed and close to users, while main domain nodes are generally limited). This deployment places higher demands on the CDN's operations and R&D capabilities.
Have you noticed that this kind of access turns a complex site into a simple one?
Connection reuse
The connection reuse rate can be measured at different layers, such as TCP and SSL, which need to be analyzed and counted separately.
The significance of connection reuse
The HTTP protocol (RFC 2616) recommends that a client keep no more than 2 TCP connections to a single domain name. However, as the Internet has developed, web pages contain more and more elements and transfer more and more content, and a limit of two connections per domain name is far from enough for today's page load speed requirements.
No current browser follows this rule. The number of TCP connections each browser establishes for a single domain name is as follows:
Table 1: Maximum number of concurrent connections a browser establishes to a single domain name
As the table above shows, the number of connections to a single domain name is basically 6. Therefore, concurrency can only be increased by adding domain names. In the HTTP scenario there is nothing wrong with this approach. Under HTTPS, however, establishing a TLS connection is relatively expensive, so adding concurrent connections itself introduces more delay, and the number of domain names needs to be controlled carefully.
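To make the cost difference concrete, the delay before the first request can be sent on a new connection can be estimated in round trips. This is a rough model rather than a measurement from the article: it assumes a TLS 1.2-style handshake (two round trips for a full handshake, one when a session is resumed) and ignores DNS lookup and server processing time.

```python
def connection_setup_ms(rtt_ms, https=False, resumed=False):
    """Estimate the delay before the first request can be sent on a new connection."""
    round_trips = 1  # TCP three-way handshake
    if https:
        # Assumed TLS 1.2-style handshake: 2 round trips in full,
        # 1 round trip when an earlier session is resumed.
        round_trips += 1 if resumed else 2
    return round_trips * rtt_ms
```

With a 50 ms round trip, a fresh HTTPS connection under this model needs about three times as long to set up as a plain HTTP one, which is why every additional domain name costs more under HTTPS.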
In particular, HTTP/2 is about to be deployed on a large scale, and its most important feature is multiplexing. Using multiple domain names and multiple connections prevents multiplexing and compression from working effectively.
Under HTTPS, how many domain names should a web page use? There is no fixed answer; it depends on the number of elements the page needs to load.
Pre-established connections
Since the impact of the handshake on speed cannot be reduced at the protocol level, can a connection be established in advance to reduce the handshake delay the user perceives? Of course. The idea is to predict the URL the user will visit next and establish a connection in advance. When the user issues the real request, the TCP and TLS handshakes are already complete, and only the application-layer data needs to be sent over the connection.
The simplest and most effective way is to pre-establish connections to the main domain by requesting some static resources. Taking this to the extreme is not easy, however, because the browser decides which connection is used and how many connections are opened. For example, when you request one image from a domain, the browser may establish two connections; if you then request another image, the browser will most likely reuse a connection, but when domain A needs to load 10 images, the browser will probably create new connections.
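The idea of paying for the handshakes before the real request arrives can be illustrated outside the browser with a small client-side sketch. It is not how a browser implements connection reuse, and the class name WarmConnections and its methods are made up for this example; the connect function is injectable so the logic can be exercised without a network.

```python
import socket
import ssl


def tls_connect(host, port=443, timeout=5.0):
    """Open a TCP connection and complete the TLS handshake."""
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    return context.wrap_socket(raw, server_hostname=host)


class WarmConnections:
    """Keeps one ready connection per host so a later request skips the handshakes."""

    def __init__(self, connect=tls_connect):
        self._connect = connect
        self._ready = {}

    def warm(self, host):
        # Called ahead of time for hosts the user is predicted to need next.
        if host not in self._ready:
            self._ready[host] = self._connect(host)

    def take(self, host):
        # Reuse the warmed connection when there is one, otherwise connect now.
        conn = self._ready.pop(host, None)
        return conn if conn is not None else self._connect(host)
```

In a page, the equivalent of warm() is requesting a small static resource from the domain in advance, as described above; whether the browser then reuses that connection is outside the page's control.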
The influence of SPDY
SPDY is very effective at improving the connection reuse rate: because it supports concurrent requests on a single connection, browsers try to keep reusing that connection.
Other
You can also try other ways to make browsers establish HTTPS connections before users visit your website, so that the session can be reused. HSTS can also effectively reduce redirect time, but for complex websites there are many issues to consider.
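HSTS is delivered as a response header, Strict-Transport-Security (RFC 6797), sent on HTTPS responses; a browser that has seen it goes straight to HTTPS on later visits instead of waiting for a redirect. The sketch below only builds the header and the redirect it replaces. The helper names and the one-year max-age in the usage note are illustrative choices, not values from the article.

```python
def hsts_header(max_age_seconds, include_subdomains=False):
    """Build the Strict-Transport-Security header for an HTTPS response."""
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        # Only safe when every subdomain can already be served over HTTPS.
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


def redirect_to_https(host, path):
    """Status and headers of the redirect a plain HTTP request receives."""
    return 302, [("Location", f"https://{host}{path}")]
```

For example, hsts_header(31536000) asks browsers to remember the policy for one year. The includeSubDomains flag is where a complex site has to be careful, because it applies to every subdomain at once.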
Effect of optimization
In Baidu's optimization experience, if HSTS is not enabled and users access the main domain name directly in the browser and are then redirected to HTTPS with a 302, the average time increases by about 400 ms, of which the 302 redirect and the SSL handshake each account for about half. For subsequent requests, however, the vast majority of users barely notice any difference.
There is still a lot of room for optimization in this 400 ms+, and we will continue to improve the user experience.
Some common problems encountered in HTTPS migration
Passing the Referrer
We can move our own website to HTTPS, but most sites have outbound links, and making all of those HTTPS is not realistic at present. Many websites rely on the referrer to determine where their traffic comes from, so for sites such as search engines, passing the referrer is still important. If you do nothing, you will find that clicking an outbound link on an HTTPS page does not include the Referer header in the HTTP request (http://tools.ietf.org/html/rfc7231#section-5.5.2). Modern browsers can use a meta tag to control how the referrer is passed (http://w3c.github.io/webappsec/specs/referrer-policy).
For example, the page can pass only the site, without the path and parameters.
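As a sketch of what "only the site" means, the snippet below emits a referrer meta tag with the origin policy (one of the values defined in the Referrer Policy specification linked above) and computes the value a supporting browser would then send. The helper names are made up for illustration.

```python
from urllib.parse import urlsplit


def referrer_meta_tag(policy="origin"):
    """Meta tag to place in the <head> of the HTTPS page."""
    return f'<meta name="referrer" content="{policy}">'


def origin_only_referrer(page_url):
    """What is sent under the origin policy: scheme and host, no path or query."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/"
```

The destination site can still tell which site the visit came from, but not which page or search query led to the click.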
What do we do for browsers that do not support passing the referrer via meta, such as IE8?
We can use an intermediate redirect. Since the referrer cannot be passed from HTTPS to HTTP, we can first go from the HTTPS page to an HTTP site that we control, put the information to be passed into the URL of that HTTP site, and then redirect to the destination address.
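The intermediate redirect can be sketched as two small steps: the HTTPS page builds a link to the relay, and the relay reads the destination back out before sending the user on. The relay address and the parameter names url and from are hypothetical; the article does not say how Baidu encodes this information.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical HTTP relay under the site's own control.
RELAY = "http://relay.example.com/link"


def build_relay_url(target_url, source):
    """Link placed on the HTTPS page instead of the raw outbound link."""
    return f"{RELAY}?{urlencode({'url': target_url, 'from': source})}"


def resolve_relay_url(relay_url):
    """On the relay: recover the destination and the source to pass along."""
    query = parse_qs(urlsplit(relay_url).query)
    return query["url"][0], query["from"][0]
```

A real relay would also need to check that the destination is one it is willing to redirect to, otherwise anyone could use it to redirect visitors to arbitrary sites.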
Form submission
Sometimes a form needs to be submitted to a third-party site whose address is HTTP, and the browser shows an insecure-content warning. You can apply logic similar to the referrer redirect.
But this is not a perfect solution for either the referrer or forms, because it reintroduces risk (hijacking, privacy leakage and so on). Ideally, users upgrade to browsers that follow the latest specifications, and more sites are encouraged to migrate to HTTPS.
Video playback
Simply put, if you play videos over HTTP, browsers will still show insecure-content warnings. So you have two options: 1. have the video source provide HTTPS; 2. use a non-HTTP protocol, such as RTMP.
User exceptions
During the HTTPS migration, many enthusiastic users report the problems they encounter to us. The common situations are as follows:
The user's system time is set incorrectly, so the browser reports that the certificate has expired.
The user debugs through a proxy such as Fiddler but has not added that software's root certificate, so the browser reports that the certificate is invalid.
The user's DNS is set to a public DNS or a DNS on another operator's network, and some requests are blocked by operators as cross-network traffic.
Connectivity problems. We found a small operator with a very high HTTPS failure rate that we could not contact, so we had no choice but to leave its users on HTTP.
Slowness. Sometimes the user's network environment makes every website slow; pinging any website takes 500-2000 ms. In that case HTTPS will naturally be very slow.
For large and complex websites, deploying HTTPS involves a great deal of work.
In the face of these difficulties and challenges, we had plenty of motivation: after HTTPS went live, user feedback about broken functions and privacy leaks caused by hijacking and similar problems dropped sharply.
Enthusiastic users often report all kinds of problems to us. In the past, even when we had determined that hijacking was the cause, there was very little we could do about it. At such times I always felt somewhat powerless.
Deploying HTTPS across the whole site gives us a way to solve most of these problems. Seeing one's own efforts solve users' problems is the greatest reward for a technologist.
HTTPS is not as hard to use or as frightening as you might imagine; it just needs optimization. We share this to encourage everyone.