2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to design a short link generation system with Java. The explanation is kept simple and easy to follow, so let's work through the editor's line of thought step by step.
Introduction
Everyone receives plenty of text messages, especially during the recent Double Eleven promotions, and those messages share two characteristics. The first is that almost all of them are spam, which we can ignore here. The second is that the links in them are very short, such as the following:
Text messages have length limits, so a link stuffed with parameters is not practical, and you may not want to expose those parameters anyway. Short links bring the following benefits:
Long links easily run into length limits
Short links look clean, while long links are easy to misread
Safer, because the parameters are not exposed
Link conversion is handled in one place, which also makes it possible to count clicks and perform similar operations
What is the principle behind this? How is it done? How would you design such a system? [A question from an interviewer at the "goose farm", a nickname for Tencent]
Principles of Short Links
Logic of short link presentation
The key concept here is redirection. Let's first review the HTTP status codes:
1xx: Informational. The server received the request and needs the requester to continue the operation
2xx: Success. The operation was successfully received and processed
3xx: Redirection. Further action is required to complete the request
4xx: Client error. The request contains syntax errors or could not be completed
5xx: Server error. The server encountered an error while processing the request
Status codes starting with 3 mostly concern redirection:
300: Multiple Choices. The resource is available at several locations
301: Permanent redirect. The browser caches it and goes straight to the new address on later visits
302: Temporary redirect. The client should keep using the original URL for future requests
303: See Other. The client should fetch another address with a GET request; similar to 302
304: Not Modified. The requested resource has not changed, so the server returns this status code without a response body
305: Use Proxy. The resource must be accessed through a proxy (deprecated)
306: No longer used
307: Temporary redirect. Similar to 302, but the client must repeat the request with the same method
The whole redirect process:
1. The user visits the short link and the request reaches the server.
2. The server looks up the long link for the short link and returns a redirect status code (301 or 302) to the browser.
A 301 permanent redirect makes the browser cache the target address, so later visits skip the short link system and its visit counts become inaccurate.
A 302 temporary redirect keeps the counts accurate, but every visit then passes through the short link system, which increases server load.
3. The browser receives the redirect status code together with the address that really needs to be accessed, and jumps to the real long link.
As the figure below shows, the link is indeed redirected to the new address with a 302. The response headers contain a Location field, which holds the address to redirect to.
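The redirect step can be sketched with the HTTP server bundled in the JDK. This is only a minimal illustration, not the system's actual implementation: the class name RedirectDemo, the in-memory map standing in for the database, and port 8080 are all assumptions made for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: answers a short link with a 302 and a Location header.
class RedirectDemo {
    // Stand-in for the database table that maps a short key to its long link.
    static final Map<String, String> LINKS = new ConcurrentHashMap<>();

    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            // "/tzHLFw" -> "tzHLFw"
            String key = exchange.getRequestURI().getPath().substring(1);
            String longUrl = LINKS.get(key);
            if (longUrl == null) {
                exchange.sendResponseHeaders(404, -1); // unknown key, no body
            } else {
                exchange.getResponseHeaders().set("Location", longUrl);
                exchange.sendResponseHeaders(302, -1); // temporary redirect, no body
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        LINKS.put("tzHLFw", "https://gd.10086.cn/gmccapp/webpage/payPhonemoney/index.html?channel=");
        start(8080); // port chosen arbitrarily for the demo
    }
}
```

Returning 302 rather than 301 here follows the trade-off described above: every visit reaches the server, so visits can be counted.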
How are short links designed?
Global identifier
The first idea that comes to mind is compression: compress the link like a file, decompress it back to the original link, then redirect. Unfortunately this does not work. Have you ever seen a compression scheme that can shrink such a long link directly into such a short one? It is practically impossible. Huffman coding, for example, only compresses efficiently when the string contains many repeated characters. A link may carry many parameters in all sorts of irregular forms, so compressing it directly is unrealistic.
So how is https://dx.10086.cn/tzHLFw converted to and from https://gd.10086.cn/gmccapp/webpage/payPhonemoney/index.html?channel= ? Setting the host aside, what changes is the trailing part, that is, the conversion between tzHLFw and gmccapp/webpage/payPhonemoney/index.html?channel=.
It is actually very simple. Each link is a row in the database, where an id corresponds to a long link (the id acts as a global identifier, a globally unique ID):
id    url
1     gd.10086.cn/gmccapp/web…
The id used here is the distributed globally unique ID mentioned earlier. If we use the id directly as the path, that seems workable too: https://dx.10086.cn/1. When this link is visited, the server queries the database for the real url and then redirects.
A unique ID on a single machine is easy: the atomic class AtomicLong is enough. That does not work in a distributed setting, where simple options include Redis, a database auto-increment column, or ZooKeeper.
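For the single-machine case, a minimal sketch with AtomicLong might look like the following. The class name LocalIdGenerator and the starting value are assumptions for illustration; a distributed deployment would need one of the shared options named above instead.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-process id generator; not suitable across several machines.
class LocalIdGenerator {
    private final AtomicLong counter;

    // 'start' is the last id already issued (0 for a fresh system).
    LocalIdGenerator(long start) {
        this.counter = new AtomicLong(start);
    }

    // Atomically returns the next id, so concurrent callers never receive duplicates.
    long nextId() {
        return counter.incrementAndGet();
    }
}
```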
ID conversion strategy
But there are two disadvantages to using increasing numbers:
Once the numbers get large, they are still long.
Increasing numbers are unsafe because they are too predictable.
The links we usually see are clearly not purely numeric; they mix upper- and lower-case letters with digits. To shorten the link, we have to convert the id. For example, if our short links consist of a-z, A-Z and 0-9, that gives 62 symbols, so we convert the id to base 62:
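public class ShortUrl {
    // The 62 symbols used as digits: 0-9, a-z, A-Z.
    private static final String BASE = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Convert a non-negative decimal id to its base-62 key.
    public static String toBase62(long num) {
        StringBuilder result = new StringBuilder();
        do {
            int i = (int) (num % 62);
            result.append(BASE.charAt(i));
            num /= 62;
        } while (num > 0);
        // Digits are produced least-significant first, so reverse them.
        return result.reverse().toString();
    }

    // Convert a base-62 key back to its decimal id.
    public static long toBase10(String str) {
        long result = 0;
        for (int i = 0; i < str.length(); i++) {
            int digit = BASE.indexOf(str.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("Not a base-62 character: " + str.charAt(i));
            }
            result = result * 62 + digit;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toBase10("tzHLFw"));     // 27095455234
        System.out.println(toBase62(27095455234L)); // tzHLFw
    }
}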
Converting an id to a base-62 key and a key back to an id is now implemented, but computing it on every request still costs time. It is better to store the key in its own column, so the table becomes:
id            key      url
27095455234   tzHLFw   gd.10086.cn/gmccapp/web…
But the correspondence between id and key is still easy to guess, and if someone walks through the keys in order, the system is still unsafe. If that is a concern, you can randomly shuffle the character order of the short link, or insert randomly generated characters at chosen positions, for example making the first, fourth and fifth characters random while the others stay unchanged. Whatever we compute, the resulting key is stored in the database alongside the url, so we can always find the url from the key in the link. (Note that the key must be globally unique and must be regenerated if there is a conflict.)
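One way to read the "regenerate on conflict" rule is sketched below. It swaps in fully random keys rather than the partial scrambling described above, and the class name RandomKeyStore, the in-memory map standing in for a unique database column, and the method names are all assumptions for illustration.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store that issues random base-62 keys and retries on collision.
class RandomKeyStore {
    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private final SecureRandom random = new SecureRandom();
    // Stand-in for a table with a unique constraint on the key column.
    private final Map<String, String> keyToUrl = new ConcurrentHashMap<>();

    private String randomKey(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Stores the url under a fresh key. putIfAbsent returns null only when the
    // key was free, so a collision simply triggers another attempt.
    // Caveat: this loops forever once every key of the given length is taken.
    String save(String url, int keyLength) {
        while (true) {
            String key = randomKey(keyLength);
            if (keyToUrl.putIfAbsent(key, url) == null) {
                return key;
            }
        }
    }

    String find(String key) {
        return keyToUrl.get(key);
    }
}
```

With six characters there are 62^6 (about 56.8 billion) possible keys, so collisions stay rare until the table is very large.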
Short links generally have an expiration time, so the table also needs a corresponding column. On each visit we first check whether the link has expired, and an expired link is not redirected.
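The expiry check can be sketched as follows. The class name ExpiringLinks, the expiresAt field and the in-memory map are assumptions for the example; the current time is passed in as a parameter so the check is easy to test.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical link table with an expiration column.
class ExpiringLinks {
    private static final class Entry {
        final String url;
        final Instant expiresAt;

        Entry(String url, Instant expiresAt) {
            this.url = url;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> table = new ConcurrentHashMap<>();

    void put(String key, String url, Instant expiresAt) {
        table.put(key, new Entry(url, expiresAt));
    }

    // Returns the long link only if the key exists and 'now' is strictly
    // before the expiration time; otherwise the caller should not redirect.
    Optional<String> resolve(String key, Instant now) {
        Entry entry = table.get(key);
        if (entry == null || !now.isBefore(entry.expiresAt)) {
            return Optional.empty();
        }
        return Optional.of(entry.url);
    }
}
```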
Performance considerations
If many short links are in circulation and the database holds a lot of data, consider adding a cache: write the cache entry when the short link is generated, then serve reads from the cache. This works well because the mapping between a short link and its long link is not modified, or is modified only very rarely.
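A minimal sketch of that caching idea, assuming in-memory maps in place of a real cache and database; the class name CachedLinks, the evict method and the databaseReads counter exist only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache in front of the link table.
class CachedLinks {
    private final Map<String, String> database = new ConcurrentHashMap<>(); // stand-in for the table
    private final Map<String, String> cache = new ConcurrentHashMap<>();    // stand-in for the cache
    final AtomicInteger databaseReads = new AtomicInteger();                // counts cache misses

    // Writes the cache entry at the moment the short link is generated.
    void create(String key, String url) {
        database.put(key, url);
        cache.put(key, url);
    }

    // Simulates the cache dropping an entry, for example under memory pressure.
    void evict(String key) {
        cache.remove(key);
    }

    // Reads the cache first and falls back to the database on a miss.
    String resolve(String key) {
        String url = cache.get(key);
        if (url != null) {
            return url;
        }
        databaseReads.incrementAndGet();
        url = database.get(key);
        if (url != null) {
            cache.put(key, url); // repopulate for later reads
        }
        return url;
    }
}
```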
What if the system runs out of ids? This is very unlikely, but if it does happen, expired ids can be reused.
What if someone floods the system with requests for short links that do not exist? This is cache penetration: the data exists in neither the cache nor the database, yet it is requested in large volume. For example, an order number can never be -1, but a user sends a large number of requests for order number -1. Because the data does not exist, the cache has no entry for it, and every request goes straight through to the database. A malicious user who keeps requesting non-existent data can put excessive pressure on the database or even bring it down.
In this situation a Bloom filter is commonly used to filter out requests for non-existent data. Here, though, our ids are incremental and ordered, so we roughly know the valid range and the check is simpler: anything beyond the range certainly does not exist. Alternatively, when such a request arrives, putting an empty object in the cache for that key also works.
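Both defenses, the range check and the cached empty object, can be sketched together. The class name PenetrationGuard, the empty-string sentinel and the in-memory maps are assumptions for the example, and a real cache would normally give the empty entries a short time-to-live.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical guard against requests for ids that do not exist.
class PenetrationGuard {
    private static final String MISSING = ""; // sentinel cached for ids with no row

    private final Map<Long, String> database = new ConcurrentHashMap<>(); // stand-in for the table
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final AtomicLong maxIssuedId = new AtomicLong(0);
    final AtomicInteger databaseReads = new AtomicInteger();

    void create(long id, String url) {
        database.put(id, url);
        cache.put(id, url); // also overwrites any earlier MISSING entry
        maxIssuedId.accumulateAndGet(id, Math::max);
    }

    Optional<String> resolve(long id) {
        // Ids are incremental, so anything outside 1..max cannot exist.
        if (id < 1 || id > maxIssuedId.get()) {
            return Optional.empty();
        }
        String cached = cache.get(id);
        if (cached != null) {
            return cached.equals(MISSING) ? Optional.empty() : Optional.of(cached);
        }
        databaseReads.incrementAndGet();
        String url = database.get(id);
        // Cache the miss too, so repeated requests stop reaching the database.
        cache.put(id, url == null ? MISSING : url);
        return Optional.ofNullable(url);
    }
}
```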
Thank you for reading. That covers how to design a short link generation system with Java. How well the design holds up in a specific setting still needs to be verified in practice.