2025-03-31 Update | From: SLTechnology News & Howtos > Internet Technology
Shulou(Shulou.com)06/02 Report--
This article explains in detail how to use Python to scrape product prices, titles, and review counts from JD.com. The walkthrough below is meant as a practical reference; after reading it you should have a working understanding of the approach.
Code implementation
import requests
from lxml import etree
import time
import random
import pandas as pd
import json
from sqlalchemy import create_engine
from sqlalchemy.dialects.oracle import DATE, FLOAT, NUMBER, VARCHAR2
import cx_Oracle
First, import the packages you need.
def create_table(table_name):
    conn = cx_Oracle.connect('user/password@IP:port/database')
    cursor = conn.cursor()
    # Column lengths are reconstructed here; the original listing omitted them.
    create_shouji = '''CREATE TABLE {} (
        product_id VARCHAR2(50),
        price NUMBER,
        shop_name VARCHAR2(200),
        shop_type VARCHAR2(50),
        title VARCHAR2(500),
        comment_count NUMBER(19),
        good_count NUMBER(19))'''.format(table_name)
    cursor.execute(create_shouji)
    cursor.close()
    conn.close()
Create the target table.
def mapping_df_types(df_pro):
    # Map each pandas dtype to a matching Oracle column type for to_sql.
    # (The original listing was truncated after the "object" check; the
    # float/int branches below follow the same standard pattern.)
    dtypedict = {}
    for i, j in zip(df_pro.columns, df_pro.dtypes):
        if "object" in str(j):
            dtypedict.update({i: VARCHAR2(500)})
        if "float" in str(j):
            dtypedict.update({i: FLOAT})
        if "int" in str(j):
            dtypedict.update({i: NUMBER(19)})
    return dtypedict
Define the mapping from pandas dtypes to database column types.
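As a quick sanity check, here is a standalone sketch of the same dtype-matching idea that maps to SQL type names as plain strings instead of SQLAlchemy Oracle type objects, so it runs without cx_Oracle installed; the function name and sample columns are invented for illustration:

```python
import pandas as pd

def mapping_df_types_demo(df):
    # Same matching logic as mapping_df_types, but mapping to SQL type
    # names as strings instead of SQLAlchemy Oracle type objects.
    dtypedict = {}
    for col, dtype in zip(df.columns, df.dtypes):
        if "object" in str(dtype):
            dtypedict[col] = "VARCHAR2(500)"
        if "float" in str(dtype):
            dtypedict[col] = "FLOAT"
        if "int" in str(dtype):
            dtypedict[col] = "NUMBER(19)"
    return dtypedict

df = pd.DataFrame({"title": ["phone"], "price": [1999.0], "comment_count": [12000]})
print(mapping_df_types_demo(df))
# -> {'title': 'VARCHAR2(500)', 'price': 'FLOAT', 'comment_count': 'NUMBER(19)'}
```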
def sava_oracle(df_pro):
    engine = create_engine('oracle://user:password@ip:port/database')
    dtypedict = mapping_df_types(df_pro)
    df_pro.to_sql("shouji", con=engine, index=False, if_exists='append', dtype=dtypedict)
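If you do not have an Oracle instance handy, the same to_sql flow can be exercised against an in-memory SQLite connection. This is a stand-in I am using purely for illustration: the dtype argument is dropped because the Oracle types do not apply, and the column names are illustrative.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the Oracle target database.
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"product_id": ["100012043978"], "price": [1999.0]})

# Same call shape as in sava_oracle: append rows, no index column.
df.to_sql("shouji", con=conn, index=False, if_exists="append")

back = pd.read_sql("SELECT * FROM shouji", con=conn)
print(len(back))  # -> 1
```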
Define the request headers and a request helper.
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36 Edg/83.0.478.37'}

def requesturl(url):
    session = requests.Session()
    rep = session.get(url, headers=headers)
    return rep
Request and parse the comment-count URL.
def commreq(url_comm):
    session = requests.Session()
    rep_comm = session.get(url_comm, headers=headers)
    # The response is JSON with a top-level CommentsCount array.
    comment = json.loads(rep_comm.text)['CommentsCount']
    comment_list = []
    for i in comment:
        comment_list.append({'product_id': str(i['ProductId']),
                             'comment_count': i['CommentCount'],
                             'good_count': i['GoodCount']})
    dd_commt = pd.DataFrame(comment_list,
                            columns=['product_id', 'comment_count', 'good_count'])
    return dd_commt
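The extraction step can be tried offline against a hand-written sample response that mimics the endpoint's shape (the field names CommentsCount/ProductId/CommentCount/GoodCount come from the code above; the ids and counts are invented):

```python
import json

# Hand-written sample mimicking the comment endpoint's response shape.
sample_text = json.dumps({"CommentsCount": [
    {"ProductId": 100012043978, "CommentCount": 200000, "GoodCount": 190000},
    {"ProductId": 100009082466, "CommentCount": 50000, "GoodCount": 47000},
]})

# Flatten each entry into a row, keeping the product id as a string.
rows = [{"product_id": str(c["ProductId"]),
         "comment_count": c["CommentCount"],
         "good_count": c["GoodCount"]}
        for c in json.loads(sample_text)["CommentsCount"]]

print(len(rows))               # -> 2
print(rows[0]["product_id"])   # -> 100012043978
```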
Parse the product listing page.
def parse(rep):
    html = etree.HTML(rep.text)
    all_pro = html.xpath("//ul[@class='gl-warp clearfix']/li")
    # 1. Comment counts: everything after referenceIds= (and before &callback)
    #    in the comment url is a comma-separated list of product ids, so we
    #    only need to join the ids collected from the listing page.
    proid = ','.join(html.xpath("//li/@data-sku"))
    url_comm = 'https://club.jd.com/comment/productCommentSummaries.action?referenceIds={}'.format(proid)
    dd_commt = commreq(url_comm)
    # 2. Product list information
    pro_list = []
    for product in all_pro:
        proid = ''.join(product.xpath("@data-sku"))
        price = ''.join(product.xpath("div[@class='gl-i-wrap']//strong/i/text()"))
        target = ''.join(product.xpath("div[@class='gl-i-wrap']//a/em//text()"))
        target = target.replace('\t', '').replace('\n', '').replace('\u2122', '')
        shopname = ''.join(product.xpath("div[@class='gl-i-wrap']//span/a/@title"))
        shoptips = product.xpath("div[@class='gl-i-wrap']//i[contains(@class, 'goods-icon')]/text()")
        shoptips = 'self-operated' if '自营' in shoptips else 'third-party'
        pro_list.append({'product_id': proid, 'price': price,
                         'shop_name': shopname, 'shop_type': shoptips,
                         'title': target})
    df = pd.DataFrame(pro_list,
                      columns=['product_id', 'price', 'shop_name', 'shop_type', 'title'])
    # 3. Merge product info with the comment counts
    df_pro = pd.merge(df, dd_commt, on='product_id')
    return df_pro
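The XPath expressions above can be tested offline against a minimal hand-written snippet that mimics JD's listing markup (the class names and attribute names come from the code; the product data is invented for illustration):

```python
from lxml import etree

# Minimal hand-written snippet mimicking one product <li> in JD's listing.
html_text = """
<ul class="gl-warp clearfix">
  <li data-sku="100012043978">
    <div class="gl-i-wrap">
      <strong><i>1999.00</i></strong>
      <a><em>Example Phone 8GB+256GB</em></a>
      <span><a title="Example Official Store"></a></span>
      <i class="goods-icon">自营</i>
    </div>
  </li>
</ul>
"""

html = etree.HTML(html_text)
product = html.xpath("//ul[@class='gl-warp clearfix']/li")[0]
sku = ''.join(product.xpath("@data-sku"))
price = ''.join(product.xpath("div[@class='gl-i-wrap']//strong/i/text()"))
shop = ''.join(product.xpath("div[@class='gl-i-wrap']//span/a/@title"))
print(sku, price, shop)  # -> 100012043978 1999.00 Example Official Store
```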
Finally, the main program.
if __name__ == "__main__":
    create_table('shouji')
    for i in range(1, 81):
        url = 'https://search.jd.com/s_new.php?keyword=手机&wq=手机&ev=3613_104528%5E&page={0}&s=30'.format(i)
        rep = requesturl(url)
        df_pro = parse(rep)
        sava_oracle(df_pro)
        # Pause a random 1-3 seconds between pages to avoid hammering the site.
        time.sleep(random.randrange(1, 4))
        print('done:', i)

That is all there is to share about using Python to scrape JD.com product prices, titles, and review counts. I hope the content above is of some help to you. If you found the article useful, feel free to share it so more people can see it.
© 2024 shulou.com SLNews company. All rights reserved.