In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article will explain in detail the example analysis that can be read from JSON to Python in JavaScript. The editor thinks it is very practical, so I share it with you for reference. I hope you can get something after reading this article.
Problem reproduction
The data is passed through JS code in a rough format (for example only, it is easy to view the hierarchy, actually on the same line):
Function (a, b, c, d) {return {title: a, data: [{data: B}, {data: C}, {data: d}]}} ('title', 2,3 4)
What I want to extract is the entire data in JSON format.
If you want to extract directly, you can use
Re.findall ('return (\ {. *?\})\}\ (', content)
Get the result, but if you want to parse the data, the following error will be reported:
The key in the key-value pair needs to be enclosed in double quotes.
So if we want to complete the task, we need to solve the following problems:
The key needs to be included with "".
The formal parameters a, b, c, d need to be converted to the actual parameters "title", 2, 3, 4.
Solution.
The corresponding relationship of shape participation parameters is easy to solve, so solve this problem first.
Shape participates in the corresponding relation of actual parameters
You can use the following code to get formal parameters.
'' .join (re.findall ('function\ ((. *?)\)\ {, content)) .split (',')
Use the following code to get the actual parameters.
'.join (re.findall ('}\ ((. *?)\), content)) .join (',')
Because the number of formal parameters is the same as the actual number of parameters, the corresponding relationship can be established according to the list index, and the list can be merged horizontally using np.c_.
The code is as follows:
# get the actual parameter Argument = '.join (re.findall ('}\ ((. *)\), content)). Split (',) # obtain the formal parameter Formal_parameter = '.join (re.findall (' function\ ((. *))\)\ {', content). Split (',) # establish the correspondence mapping = join (np.c_ [Formal_parameter, Argument]) # use the formal parameter as the index mapping.set_index (0 Drop=True, inplace=True)
The results are as follows:
Format JSON
To solve this problem, we need to extract the JSON string first, as follows:
String = '.join (re.findall (' return (\ {. *?\})\}\ (', content)
The results are as follows:
Then, we need to have an idea, as follows:
{or, after,: the previous part is the key and needs to be enclosed in double quotation marks.
:, [or {after,], the previous parts are all values and need to be identified and replaced.
Enclose the key in double quotation marks
Because only lists can handle this task in Python when it comes to inserting elements, we need to convert the string to a list first, as follows:
String = list ('.join (re.findall (' return (\ {. *?\})\}\ (', content)
Then, we need to set a variable to set 1 when it is identified as a key, otherwise 0.
Key_flag = 0
If we identify the key according to the rules above, we should start from the current bit until the previous part is enclosed in double quotes.
There are also some special circumstances, such as
Nested dictionaries, such as a list where the values are all dictionaries.
Empty dictionary, {followed by}.
Considering the special circumstances, the code is as follows:
Index = 0while True: # if the index is out of range, jump out of the loop if index > = len (string): break # enclose the key in double quotes if key_flag: # insert a double quotation mark string.insert (index,'') index + = 1 # read while True: # in the loop until ":" appears The part read in the loop is the key # add a double quotation mark if string [index] =':': string.insert (index) at the end of the key '") # reset key_flag key_flag = 0 # end loop break # read the next bit index + = 1 # the current character is" {"or", " If the key if string [index] in'{,': key_flag = 1 # is nested, the index will be moved to the next bit if string [index+1] in'{': index+ = 1 # if the dictionary is empty, reset key_value if string [index+1] in'}': key_flag = 0 index+ = 1
The result is as follows (the code cannot be truncated due to space constraints):
As you can see, all the keys have been enclosed in double quotes.
Identify and replace values
This part is still a little difficult.
First of all, as above, we still need a variable to record the status of the current identified value. 1 means recognized and 0 means no.
Value_flag = 0
But there are also special circumstances:
The value is already a string, but there is: in the string.
The value is the js statement, but it is not the data we want to extract.
Considering the special circumstances, the code is as follows:
While True: if index > = len (string): break # detected the value if value_flag: # take out the current character value = string.pop (index) # if the character is a number, "[" and "{" or the string if value in'"1234567890 [{'or is_value: value_flag = 0 string.insert (index)" Value) index + = is_value # does not match the above situation else: # Loop fetch value string while True: # if it is "," or "}" Then it means that if string [index] in',}': break value + = string.pop (index) # if the value string is in the corresponding relationship, replace if value in mapping.index: # because it is constantly inserted in the current bit So to reverse the data trans = mapping.loc [value] [::-1] # if not in Then directly replace it with the empty string else: trans ='""'# to calculate the index to move several bits length = len (trans) # to insert the corresponding data for c in trans: string.insert (index C) # Index mobile index + = length value_flag = 0 continue # if ":" and the ":" is not in the value string if string [index] in': 'and not is_value: value_flag = 1 # if the value is a string Then set is_value if string [index+1] in'"': is_value = 1 # identify the end of the value string, and reset is_value elif string [index] in'" 'and is_value: is_value = 0 index+ = 1 "
The results are as follows:
As you can see, the conversion is quite successful.
The total code content = 'function (Areco brecedicjd) {return {title:a,data: [{data:b}, {data:c}, {data:d}]} ("title", 2)' 'string = list (' '.join (re.findall (' return (\ {. *?\})\}\)\ (' Content)) is_value = 0key_flag = 0value_flag = 0index = 0while True: if index > = len (string): break if key_flag: string.insert (index,'") index + = 1 while True: if string [index] = =':': string.insert (index) '"') key_flag = 0 break index + = 1 elif value_flag: value = string.pop (index) if value in'" 1234567890 [{'or is_value: value_flag = 0 string.insert (index Value) index + = is_value else: storage = index while True: if string [index] in' }': break value + = string.pop (index) if value in mapping.index: trans = mapping.loc [value] [::-1] else: trans ='"'" length = len (trans) for c in trans: string.insert (index C) index + = length value_flag = 0 continue if string [index] in'{ ': key_flag = 1 if string [index+1] in' {': index+ = 1 if string [index+1] in'}': key_flag = 0 elif string [index] in': 'and not is_value: value_flag = 1 if string [index+1] in' "': is_value = 1 elif string [index] in'" 'and is _ value: is_value = 0 index + = 1'. Join (string)
The results are as follows:
At this point, you can use json.loads to extract data.
The results are as follows:
Deficiency
In fact, the above code is inadequate. Because this js code is special, it is on one line, and there are no extra spaces.
But there is a solution:
What if there is a space? -> use strip to remove the spaces on both sides when extracting formal parameters and using them to convert actual parameters.
What if the code is not on one line? -> use .replace ('\ tasking,''). Replace ('\ nmarker,'') to remove line feeds and tabs before formatting.
In fact, if it is just the problem that the code is not on one line, js.loads will help us remove\ nand directly extract the important parts.
However, if you want to use my code, only one-line js code is currently supported.
In addition, some set the value of the key-value pair to a logical expression, such as a | ", which is not easy to extract and has to be adjusted according to the problem.
These are just ideas, and you can try them on your own, and if you have any questions, you can raise them in time.
This is the end of the article on "sample Analysis of JSON in JavaScript to Python readable". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, please share it for more people to see.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.