How to analyze the actual combat of Spark3 big data dealing with Streaming+Structured Streaming in real time

2025-01-18 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

Simulating a 12306 login with requests

1. Opening chat

Yesterday afternoon, while casually browsing for high-speed rail tickets, I suddenly got curious about how the 12306 login works, so I dug into it. Two points up front:

Verification code

This part calls a ready-made recognition API written by another developer. If CAPTCHA recognition itself is what interests you, I'm afraid this article won't help.

Login form

In fact, the 12306 login form is very simple: very few fields are generated by JS, and none of them are hard, so readers hoping for advanced JS reverse engineering may be disappointed.

If nothing here is difficult, you may ask why this article exists at all.

The reason is that the 12306 login flow still has plenty of pitfalls. Each individual request has few fields, but a single login involves more than a dozen requests, and along the way cookies are set both automatically and manually. If you naively request only the login URL, it will always fail (don't ask me how I know!), which cost me several hours.

2. Prepare tools

Browser developer tools

A packet capture tool such as Fiddler or Charles

A packet capture tool is recommended: the browser console is hard to read when there are many requests, and it is mainly useful for JS debugging, which logging in to 12306 hardly requires.

3. Request analysis

First, start the capture tool, then open the 12306 home page in the browser and log in manually once. After the login succeeds, look at what was captured. I use Charles here.

The list of results is long, but don't be put off: most entries are static files. Apart from those, only 15 or 16 requests are relevant.
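Separating static files from the relevant requests can also be done in code once the captured URLs are exported. The following is only a minimal sketch: it assumes the capture is available as a plain list of URLs, and the extension list, the helper names and the sample URLs are illustrative assumptions rather than data from the real capture.

```python
from urllib.parse import urlparse

# Extensions treated as static resources (an assumed, non-exhaustive list).
STATIC_EXTENSIONS = ('.js', '.css', '.png', '.jpg', '.gif', '.ico', '.woff')


def is_static(url: str) -> bool:
    """Return True when the URL path ends with a static-file extension."""
    path = urlparse(url).path.lower()
    return path.endswith(STATIC_EXTENSIONS)


def dynamic_requests(urls):
    """Keep only the URLs that are not static files, preserving order."""
    return [u for u in urls if not is_static(u)]


# Hypothetical sample of captured URLs, for illustration only.
captured = [
    'https://kyfw.12306.cn/otn/resources/js/framework/jquery.js',
    'https://kyfw.12306.cn/otn/resources/css/login.css',
    'https://kyfw.12306.cn/passport/captcha/captcha-image64?login_site=E',
    'https://kyfw.12306.cn/otn/HttpZF/logdevice?algID=y6fvmhGlLP',
]
for url in dynamic_requests(captured):
    print(url)
```

Only the path is checked, so a query string such as ?v=1 after a static file name does not hide it from the filter.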

At first, after N failed logins, I kept suspecting that 12306 had buried a trap somewhere in these requests, so I reproduced all of the dozen-plus requests one by one.

It later turned out that this is unnecessary. It is enough to find out where each Cookie field in the main login requests comes from and to issue only the requests that supply them. In the end, six requests are enough to log in successfully.

So this article goes straight to the key login requests and spares you the painful trial and error.

3.1 Get the CAPTCHA

The URL that generates the CAPTCHA is easy to find.

https://kyfw.12306.cn/passport/captcha/captcha-image64?login_site=E&module=login&rand=sjrand&1589797925252&callback=jQuery191020097810026343454_1589797924079&_=1589797924080

Request field
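To see the request fields at a glance, the query string of the CAPTCHA URL can be split with the standard library. This is a small sketch that uses only the URL quoted above; the variable names are my own.

```python
from urllib.parse import urlsplit, parse_qsl

captcha_url = (
    'https://kyfw.12306.cn/passport/captcha/captcha-image64'
    '?login_site=E&module=login&rand=sjrand&1589797925252'
    '&callback=jQuery191020097810026343454_1589797924079&_=1589797924080'
)

# keep_blank_values=True keeps the bare numeric key, which has no value.
fields = dict(parse_qsl(urlsplit(captcha_url).query, keep_blank_values=True))

for name, value in fields.items():
    print(f'{name!r}: {value!r}')
```

The bare numeric key and the `_` field look like millisecond timestamps, so they presumably change on every page load, while login_site, module and rand are fixed strings.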

The image field of the response is the base64-encoded CAPTCHA picture, so the picture can be taken directly from it.
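As an alternative to a regular expression, the image field can be read by parsing the body as JSON. This is a hedged sketch: it assumes the body is a JSON object, optionally wrapped in a JSONP callback. The function name, the callback name and the result_code field in the sample are hypothetical. Only the image field comes from the capture.

```python
import base64
import json
import re


def extract_image_bytes(response_text: str) -> bytes:
    """Pull the base64 'image' field out of a JSON or JSONP body and decode it."""
    # Grab the outermost {...} so an optional JSONP wrapper is ignored.
    match = re.search(r'\{.*\}', response_text, re.DOTALL)
    if match is None:
        raise ValueError('no JSON object found in response')
    payload = json.loads(match.group(0))
    return base64.b64decode(payload['image'])


# Hypothetical response body; a real one carries a full picture in "image".
sample = 'jQuery1910_1589797924079({"result_code":"0","image":"aGVsbG8="});'
print(extract_image_bytes(sample))
```

Parsing the JSON avoids depending on the order of the fields or on the exact characters that follow the image value.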

Now focus on the Cookie. There are quite a few fields, but only the ones set manually by JS need attention, because Session handles the cookies delivered in response headers for us. The key is to determine which fields are not set through a response header. You can search the capture for each field name directly, starting with the first one, _passport_session.
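The search for fields that no response header sets can be expressed as a simple comparison. The sketch below assumes you have copied the Cookie request header from the capture tool and collected the cookie names that appear in Set-Cookie response headers. The function name and all sample values, including the route cookie, are hypothetical.

```python
def manually_set_cookies(request_cookie_header: str, set_cookie_names) -> list:
    """Return cookie names sent in a request that no Set-Cookie header provided."""
    sent = []
    for part in request_cookie_header.split(';'):
        name = part.strip().split('=', 1)[0]
        if name:
            sent.append(name)
    from_headers = set(set_cookie_names)
    return [name for name in sent if name not in from_headers]


# Hypothetical values: a Cookie header copied from the capture tool and the
# names collected from Set-Cookie headers of earlier responses.
cookie_header = '_passport_session=abc; RAIL_EXPIRATION=123; RAIL_DEVICEID=xyz; route=r1'
seen_in_set_cookie = {'_passport_session', 'route'}
print(manually_set_cookies(cookie_header, seen_in_set_cookie))
```

Whatever this returns is the list of fields that have to be traced back to JS and added to the Session by hand.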

This field appears in a JS file, which means it is added manually and needs attention. A manually added field is either computed by JS or taken from a server response, so start by searching for the field's value.

First, look at the response that contains it. It has three fields: exp, cookieCode and dfp. Comparing the values shows that exp and dfp are the RAIL_EXPIRATION and RAIL_DEVICEID in the Cookie of the CAPTCHA request.
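The mapping from response fields to cookie names can be written down as a small helper. This is a sketch under assumptions: the response is taken to be a JSON object wrapped in a callback call, and the function name and sample body with its placeholder values are hypothetical. Only the field names exp, cookieCode and dfp and the two cookie names come from the analysis above.

```python
import json
import re


def device_cookies(response_text: str) -> dict:
    """Map the exp and dfp fields of the response to their cookie names."""
    # Grab the outermost {...} so any callback wrapper is ignored.
    match = re.search(r'\{.*\}', response_text, re.DOTALL)
    if match is None:
        raise ValueError('no JSON object found in response')
    data = json.loads(match.group(0))
    return {'RAIL_EXPIRATION': data['exp'], 'RAIL_DEVICEID': data['dfp']}


# Hypothetical response body with placeholder values.
sample = (
    'callbackFunction(\'{"exp":"1590000000000",'
    '"cookieCode":"FGH_placeholder","dfp":"dfp_placeholder"}\')'
)
print(device_cookies(sample))
```

The returned dict can be passed directly to session.cookies.update().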

Second, look at the request itself:

https://kyfw.12306.cn/otn/HttpZF/logdevice?algID=y6fvmhGlLP&hashCode=WRXET1wCtYsDWujgBBDiq2A4aqJOy-G6t5VK5OI0wNY&FMQw=0&q4f3=zh-CN&VPIf=1&custID=133&VEek=unknown&dzuS=0&yD16=0&EOQP=c227b88b01f5c513710d4b9f16a5ce52&jp76=52d67b2a5aa5e031084733d5006cc664&hAqN=MacIntel&platform=WEB&ks0Q=d22ca0b81584fbea62237b14bd04c866&TeRS=709x1280&tOHY=24xx800x1280&Fvje=i1l1o1s1&q5aJ=-8&wNLf=99115dfb07133750ba677d055874de87&0aew=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36&E3gR=92271eade53193a7130e280652b8e939&timestamp=1589810100160

Although there are many request parameters, opening this URL directly in the browser returns the result, so ignore how the parameters are generated for now. Their generation is actually rather interesting, and in a couple of days I will write a separate article analyzing the JS behind them. For now we can simply reuse the captured values.

Back to the Cookie of the CAPTCHA request: the analysis shows that every field except RAIL_EXPIRATION and RAIL_DEVICEID is set through a response header, so fetching the CAPTCHA picture takes only two requests:

import re
import time

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    'Referer': 'https://kyfw.12306.cn/otn/resources/login.html',
    'Host': 'kyfw.12306.cn',
}
session = requests.Session()

# p1: these fields mostly collect device information and then encrypt it;
# the values are the ones captured in the logdevice URL above
p1 = {
    'algID': 'y6fvmhGlLP',
    'hashCode': 'WRXET1wCtYsDWujgBBDiq2A4aqJOy-G6t5VK5OI0wNY',
    'FMQw': 0,
    'q4f3': 'zh-CN',
    'VPIf': 1,
    'custID': 133,
    'VEek': 'unknown',
    'dzuS': 0,
    'yD16': 0,
    'EOQP': 'c227b88b01f5c513710d4b9f16a5ce52',
    'jp76': '52d67b2a5aa5e031084733d5006cc664',
    'hAqN': 'MacIntel',
    'platform': 'WEB',
    'ks0Q': 'd22ca0b81584fbea62237b14bd04c866',
    'TeRS': '709x1280',
    'tOHY': '24xx800x1280',
    'Fvje': 'i1l1o1s1',
    'q5aJ': '-8',
    'wNLf': '99115dfb07133750ba677d055874de87',
    '0aew': headers['User-Agent'],
    'E3gR': '92271eade53193a7130e280652b8e939',
    'timestamp': 1589810100160,  # value from the captured URL
}

# first request: get exp and dfp
r1 = session.get('https://kyfw.12306.cn/otn/HttpZF/logdevice', params=p1, headers=headers)
exp = re.search(r'"exp":"(\d+)"', r1.text).group(1)
dfp = re.search(r'"dfp":"(.+?)"', r1.text).group(1)
cookieCode = re.search(r'"cookieCode":"(.+?)"', r1.text).group(1)  # the role of this field is unclear for now
session.cookies.update({'RAIL_DEVICEID': dfp, 'RAIL_EXPIRATION': exp})  # add the cookies manually

# second request: get the CAPTCHA
p2 = {'login_site': 'E', 'module': 'login', 'rand': 'sjrand', str(int(time.time() * 1000)): ''}
r2 = session.get('https://kyfw.12306.cn/passport/captcha/captcha-image64', params=p2)
image = re.search(r'"image":"(.+?)"', r2.text).group(1)

The CAPTCHA picture was successfully obtained.

3.2 Submit the verification result

First click anywhere on the CAPTCHA picture, then right-click and inspect the element.

Each time a point is marked on the picture, the page generates a div tag. It is easy to infer that randcode is the field that submits the answer, and that its value is the coordinates of the marked points on the picture. We can pick an arbitrary point inside each of the eight sub-pictures as that picture's coordinate mapping, for example:

position = {
    1: '49,48', 2: '124,52', 3: '200,43', 4: '259,47',
    5: '50,113', 6: '101,102', 7: '198,112', 8: '250,127',
}


def getVerifyResult(path: str):
    """
    Call the recognition API to get the CAPTCHA result.
    :param path: path of the CAPTCHA image
    :return: list of coordinate strings, one per selected picture
    """
    url = "http://littlebigluo.qicp.net:47720/"
    ret = []
    # send a POST request carrying the image data
    with open(path, 'rb') as file:
        response = requests.post(url, data={"type": "1"}, files={'pic_xxfile': file})
    # parse the recognition result: space-separated picture numbers
    # (the result is assumed to be wrapped in <B>...</B>)
    for i in re.findall("<B>(.*)</B>", response.text)[0].split(" "):
        ret.append(position[int(i)])
    return ret

Then call the recognition function above to get the answer.

import base64

code_path = "code.jpg"
imgdata = base64.b64decode(image)
with open(code_path, 'wb') as f:
    f.write(imgdata)
capchat = getVerifyResult(code_path)
answer = ','.join(capchat)
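The step from recognized picture numbers to the submitted answer string can be checked offline, without calling the recognition service. This is a minimal sketch: the coordinates are the ones read from the mapping above, the function name build_answer is hypothetical, and the input format (space-separated picture numbers) is an assumption based on the split in the recognition function.

```python
# Picture number -> one point inside that picture, as in the mapping above.
POSITION = {
    1: '49,48', 2: '124,52', 3: '200,43', 4: '259,47',
    5: '50,113', 6: '101,102', 7: '198,112', 8: '250,127',
}


def build_answer(recognized: str) -> str:
    """Turn recognizer output such as '2 7' into the 'x,y,x,y' answer string."""
    points = []
    for token in recognized.split():
        index = int(token)
        if index not in POSITION:
            raise ValueError(f'picture number out of range: {index}')
        points.append(POSITION[index])
    return ','.join(points)


print(build_answer('2 7'))
```

Keeping this step separate makes it easy to confirm the answer format before spending a real CAPTCHA attempt on it.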

The Cookie and the response of this request work the same way as before and need no special attention.
