
Cloud Computing Interview Questions: Knowledge Summary and Interview Experience

2025-02-14 Update From: SLTechnology News&Howtos

A cloud computing job interview is actually not as complicated as many people think. It is mainly conducted by phone, presumably with relatively few candidates, and the questions are fairly simple. The first round covers both technical and business topics: on the technical side, the three layers of cloud computing and the current state of cloud computing development; on the business side, how to carry out business integration effectively. The second round mainly asks what projects you have done and how you did them. Below are several practical cloud computing interview questions and the knowledge behind them.

1. Given massive log data, extract the IP address that visited Baidu most often on a given day.

An IP address is 32 bits, so there are at most 2^32 distinct IPs. Use a mapping, for example hash the IP and take it modulo 1000, to split the large log file into 1000 small files. In each small file, find the IP with the highest frequency and its count (a hash_map works well for the frequency statistics). Finally, among the 1000 per-file winners, pick the IP with the largest frequency; that is the answer.
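A minimal Python sketch of this bucket-then-count approach (the bucket file names and the assumption that the IP is the first field of each log line are illustrative, not part of the original question):

    from collections import Counter

    def most_frequent_ip(log_path, buckets=1000):
        # Phase 1: scatter IPs into bucket files; identical IPs always land
        # in the same bucket, so per-bucket counts are exact.
        files = [open(f"ip_bucket_{i}.txt", "w") for i in range(buckets)]
        with open(log_path) as log:
            for line in log:
                if not line.strip():
                    continue
                ip = line.split()[0]                      # assume the IP is the first field
                files[hash(ip) % buckets].write(ip + "\n")
        for f in files:
            f.close()

        # Phase 2: find each bucket's most frequent IP, then the overall winner.
        best_ip, best_count = None, 0
        for i in range(buckets):
            with open(f"ip_bucket_{i}.txt") as f:
                counts = Counter(line.strip() for line in f)
            if counts:
                ip, cnt = counts.most_common(1)[0]
                if cnt > best_count:
                    best_ip, best_count = ip, cnt
        return best_ip, best_count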

2. A search engine records, via log files, every query string submitted by users; each query string is 1-255 bytes long.

Suppose there are currently 10 million records (these query strings are highly repetitive: although the total is 10 million, there are no more than 3 million distinct strings after deduplication; the more often a query string repeats, the more users queried it, i.e. the more popular it is). Count the 10 most popular query strings, using no more than 1 GB of memory.

The first step uses hash statistics for preprocessing: maintain a hash table whose key is the query string and whose value is its number of occurrences, i.e. HashMap(Query, Value). Read the queries one at a time; if a string is not yet in the table, add it with a value of 1; if it is already there, increment its count by one. This completes the statistics over the hash table in O(N) time (N is 10 million, since every record must be scanned to count how many times each query occurs).

The second step uses a heap to find the top 10 query strings, with time complexity N'·log K. Maintain a min-heap of size K (K = 10 here), then traverse the 3 million distinct queries, compare each count against the heap's root element, and keep the 10 queries with the largest counts.

The final time complexity is O(N) + N'·O(log K), where N is 10 million and N' is 3 million.
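A compact Python sketch of both steps (heapq provides the min-heap; the function name is illustrative):

    import heapq
    from collections import Counter

    def top_k_queries(queries, k=10):
        counts = Counter(queries)                 # step 1: HashMap(Query, Value), O(N)
        heap = []                                 # min-heap of (count, query), size <= k
        for query, cnt in counts.items():         # step 2: N' distinct queries, O(log k) each
            if len(heap) < k:
                heapq.heappush(heap, (cnt, query))
            elif cnt > heap[0][0]:                # beats the smallest of the current top k
                heapq.heapreplace(heap, (cnt, query))
        return [q for _, q in sorted(heap, reverse=True)]   # most popular first

    # e.g. top_k_queries(["a", "b", "a", "c", "a", "b"], k=2) -> ["a", "b"]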

Alternatively, use a trie in which each node's keyword field stores the number of times the corresponding query string appears (0 if it never appears as a complete query). Finally, rank the frequencies with a min-heap of 10 elements, as shown in the sketch below.
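A bare-bones sketch of that trie variant (the top-10 selection would then reuse the same 10-element min-heap as above; class and field names are illustrative):

    class TrieNode:
        __slots__ = ("children", "count")
        def __init__(self):
            self.children = {}   # next character -> TrieNode
            self.count = 0       # occurrences of the query string ending here (0 if none)

    def count_with_trie(queries):
        root = TrieNode()
        for q in queries:
            node = root
            for ch in q:
                node = node.children.setdefault(ch, TrieNode())
            node.count += 1      # the "keyword field": frequency of this exact string
        return root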

3. There is a 1 GB file in which each line is a word; no word exceeds 16 bytes, and the memory limit is 1 MB. Return the 100 words with the highest frequency.

The first step is divide and conquer with a hash map: read the file sequentially, and for each word x compute hash(x) % 5000, then write x to one of 5000 small files (denoted x0, x1, ..., x4999) according to that value. Each file is then roughly 200 KB. If any file still exceeds 1 MB, keep splitting it the same way until no small file exceeds 1 MB.

The second step is hash statistics on each small file: count the words appearing in each file and their frequencies (a trie or hash_map works), take the 100 most frequent words (using a min-heap of 100 nodes), and write those 100 words with their frequencies to a file, giving another 5000 files.

The third step is to merge these 5000 result files with a heap or merge sort. (If memory allows, load all the entries from the 5000 files and use a heap to get the top 100 directly.)
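A sketch of that merge step, assuming each intermediate file holds "word count" lines written by the previous step (the file format is an assumption for illustration):

    import heapq

    def merge_top_words(partial_paths, k=100):
        # A word appears in at most one partial file, because identical words
        # were hashed into the same bucket in step one.
        heap = []                                    # min-heap of (count, word), size <= k
        for path in partial_paths:
            with open(path) as f:
                for line in f:
                    word, cnt = line.rsplit(None, 1)
                    cnt = int(cnt)
                    if len(heap) < k:
                        heapq.heappush(heap, (cnt, word))
                    elif cnt > heap[0][0]:
                        heapq.heapreplace(heap, (cnt, word))
        return [w for _, w in sorted(heap, reverse=True)]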

4. Given two files, a and b, each storing 5 billion URLs, with each URL taking 64 bytes and a memory limit of 4 GB, find the URLs common to files a and b.

The size of each file can be estimated as 5 billion × 64 bytes = 320 GB, far larger than the 4 GB memory limit, so the files cannot be loaded entirely into memory. Consider a divide-and-conquer approach.

Traverse file a, compute hash(url) % 1000 for each URL, and store the URL in one of 1000 small files (denoted a0, a1, ..., a999) according to that value. Each small file is then roughly 320 MB.

Traverse file b and store its URLs into 1000 small files in the same way (denoted b0, b1, ..., b999). After this processing, any URL present in both files must fall into a corresponding pair of small files (a0 vs b0, a1 vs b1, ..., a999 vs b999); non-corresponding pairs cannot contain the same URL. We then only need to find the common URLs within each of the 1000 pairs of small files.

To find the URLs shared by each pair of small files, load the URLs of one file into a hash_set, then traverse each URL of the other file and check whether it is in the hash_set just built. If it is, it is a common URL and can be written to the output file.
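A minimal sketch of the per-pair check (one call per (a_i, b_i) pair; the file handling details are illustrative):

    def common_urls_in_pair(path_a, path_b, out_path):
        # Load one small file (~320 MB of URLs) into a set; the 4 GB limit allows this.
        with open(path_a) as fa:
            seen = {line.strip() for line in fa}
        # Stream the other file and emit every URL that is also in the set.
        with open(path_b) as fb, open(out_path, "w") as out:
            for line in fb:
                url = line.strip()
                if url in seen:
                    out.write(url + "\n")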

5. Tencent interview question: given 4 billion distinct unsigned int integers and then one more number, how do you quickly determine whether that number is among the 4 billion?

Scheme 1: allocate 512 MB of memory (2^32 bits / 8 = 512 MB), where each bit represents one unsigned int value. Read in the 4 billion numbers and set the corresponding bits; then read the number to be queried and check whether its bit is 1: 1 means present, 0 means absent.
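A sketch of Scheme 1 as a bitmap (the streaming of the input numbers is left abstract; the class name is illustrative):

    class UIntBitmap:
        # One bit per possible unsigned 32-bit value: 2^32 bits = 512 MB.
        def __init__(self):
            self.bits = bytearray(2**32 // 8)        # 512 MB, all zero

        def add(self, n):
            self.bits[n >> 3] |= 1 << (n & 7)        # set bit n

        def __contains__(self, n):
            return bool(self.bits[n >> 3] & (1 << (n & 7)))

    # bm = UIntBitmap()
    # for n in stream_of_numbers: bm.add(n)   # load the 4 billion numbers
    # print(12345 in bm)                      # O(1) membership query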

Scheme 2: since 2^32 is more than 4 billion, the given number may or may not be present. Represent each of the 4 billion numbers in 32-bit binary and assume they are stored in a file.

Split the 4 billion numbers into two groups: those whose highest bit is 0 and those whose highest bit is 1.

Write the two groups to two separate files; one of the files then contains at most 2 billion numbers (roughly half). Compare the highest bit of the target number with the groups and descend into the corresponding file to continue searching.

Split that file again into two groups: those whose next-highest bit is 0 and those whose next-highest bit is 1.

Write these two groups to two files; one of them then contains at most 1 billion numbers (roughly half). Compare the next-highest bit of the target number and descend into the corresponding file. Continuing in this way, the number can be located in O(log n) time.
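A simplified in-memory sketch of this bit-partition search (the real scheme spills each half to disk instead of keeping candidate lists in memory; the function name is illustrative):

    def contains_by_bit_partition(numbers, target):
        candidates = numbers
        for bit in range(31, -1, -1):                # from the highest bit downward
            want = (target >> bit) & 1
            candidates = [n for n in candidates if ((n >> bit) & 1) == want]
            if not candidates:                       # the matching half is empty
                return False
        return True                                  # only numbers equal to target remain

    # e.g. contains_by_bit_partition([7, 19, 250], 19) -> True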
