How to implement fuzzy matching between two tables in Python pandas

Updated 2025-04-06. Source: SLTechnology News&Howtos (shulou), Development

Shulou(Shulou.com)06/02 Report--

Many newcomers are unsure how to fuzzy-match the contents of two tables in Python pandas. To help with that, this article explains two ways of doing it in detail, using a table of keywords and a table of sentences as the example.

1. Method 1

This method gives both tables an identical helper field, joins them on that field so that every keyword is paired with every sentence, and then filters out the pairs that match. The logic is simple and works well when the amount of data is small, but the joined table consumes a lot of memory.
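To make the memory cost concrete, here is a minimal sketch of how the joined table grows. The table sizes are hypothetical and chosen only for illustration; the point is that the join produces one row per keyword/sentence pair.

```python
import numpy as np
import pandas as pd

# Hypothetical table sizes, chosen only for illustration.
n_keywords, n_sentences = 50, 2_000

left = pd.DataFrame({"keyid": np.arange(n_keywords), "match": 1})
right = pd.DataFrame({"senid": np.arange(n_sentences), "match": 1})

# Joining on the constant "match" column pairs every left row with every right row.
joined = pd.merge(left, right, on="match")

# The joined table has n_keywords * n_sentences rows.
print(len(joined), "rows in the joined table")
```

With the seven sentences and five keywords used in this article the join has only 35 rows, so the cost is negligible; it becomes a concern once both tables are large.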

1. Import the libraries

    import pandas as pd
    import numpy as np
    import re

2. Construct the keywords

    # keyword data
    df_keyword = pd.DataFrame({
        "keyid": np.arange(5),
        "keyword": ["numpy", "pandas", "matplotlib", "sklearn", "tensorflow"]
    })
    df_keyword

3. Construct the sentences

    df_sentence = pd.DataFrame({
        "senid": np.arange(10, 17),
        "sentence": [
            "how to implement merge with pandas?",
            "detailed Numpy tutorials for Python",
            "how to use Pandas to split and merge Excel files in batches?",
            "how do I use the map and apply functions of pandas?",
            "introduction to tensorflow for Deep Learning",
            "relationship between tensorflow and numpy",
            "some machine learning code based on sklearn"
        ]
    })
    df_sentence

4. Set up a common join field

    df_keyword['match'] = 1
    df_sentence['match'] = 1

5. Join the tables

    # the only shared column is 'match', so every keyword is paired with every sentence
    df_merge = pd.merge(df_keyword, df_sentence)
    df_merge

6. Match the keywords

    def match_func(row):
        # True when the keyword occurs anywhere in the sentence, ignoring case
        return re.search(row["keyword"], row["sentence"], re.IGNORECASE) is not None

    df_merge[df_merge.apply(match_func, axis=1)]

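The filter keeps only the rows in which the keyword occurs in the sentence: "pandas" matches senid 10, 12 and 13; "numpy" matches senid 11 and 15; "tensorflow" matches senid 14 and 15; "sklearn" matches senid 16; "matplotlib" matches nothing.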
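As a variation on the join-then-filter approach, the following sketch uses `how="cross"` (available in pandas 1.2 and later) instead of a helper column, and wraps each keyword in `re.escape` so that it is matched literally. Neither choice comes from the original article, and the two small tables are hypothetical; the keyword "c++" is included because it would raise a regex error if passed to `re.search` unescaped.

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from the article.
df_keyword = pd.DataFrame({"keyid": [0, 1], "keyword": ["pandas", "c++"]})
df_sentence = pd.DataFrame({
    "senid": [10, 11],
    "sentence": ["Merging with Pandas", "Learning C++ basics"],
})

# how="cross" builds the Cartesian product without a constant helper column.
df_cross = pd.merge(df_keyword, df_sentence, how="cross")

def contains_keyword(row):
    # re.escape makes the keyword match literally, so "c++" is not parsed as a regex.
    pattern = re.escape(row["keyword"])
    return re.search(pattern, row["sentence"], re.IGNORECASE) is not None

df_matched = df_cross[df_cross.apply(contains_keyword, axis=1)]
print(df_matched[["keyid", "keyword", "senid", "sentence"]])
```

Escaping is worth considering whenever the keywords come from user data, since characters such as `+`, `.` or `(` otherwise change what the pattern matches.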

2. Method 2

This method requires a little more programming, but on a large data set it is much cheaper than Method 1 because it never builds the table that pairs every keyword with every sentence.

1. Build the dictionary

    key_word_dict = {row.keyword: row.keyid for row in df_keyword.itertuples()}
    key_word_dict
    # {'numpy': 0, 'pandas': 1, 'matplotlib': 2, 'sklearn': 3, 'tensorflow': 4}

2. Match the keywords

    def merge_func(row):
        # add a column holding the keyid of every keyword found in the sentence
        row["keyids"] = [
            keyid
            for key_word, keyid in key_word_dict.items()
            if re.search(key_word, row["sentence"], re.IGNORECASE)
        ]
        return row

    df_merge = df_sentence.apply(merge_func, axis=1)

3. Display the results

    df_merge

4. Expand the matching results

    # explode turns each list of keyids into one row per keyid
    df_result = pd.merge(
        left=df_merge.explode("keyids"),
        right=df_keyword,
        left_on="keyids",
        right_on="keyid"
    )
    df_result
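If the per-keyword loop in Method 2 becomes the bottleneck, one possible refinement is to combine all keywords into a single compiled pattern so that each sentence is scanned once. This is a sketch of a different technique, not the article's method; the sample tables and the names `pattern`, `keyid_by_word` and `find_keyids` are hypothetical. Note one limitation: a regex alternation reports non-overlapping matches only, so a keyword contained inside another keyword may be missed.

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from the article.
df_keyword = pd.DataFrame({"keyid": [0, 1, 2], "keyword": ["numpy", "pandas", "sklearn"]})
df_sentence = pd.DataFrame({
    "senid": [10, 11, 12],
    "sentence": ["Pandas and NumPy together", "plain Python only", "code based on sklearn"],
})

# One alternation pattern over all (escaped) keywords.
pattern = re.compile("|".join(re.escape(k) for k in df_keyword["keyword"]), re.IGNORECASE)
keyid_by_word = dict(zip(df_keyword["keyword"].str.lower(), df_keyword["keyid"]))

def find_keyids(sentence):
    # Collect the distinct keywords found, then return their keyids in ascending order.
    found = {m.group(0).lower() for m in pattern.finditer(sentence)}
    return sorted(keyid_by_word[w] for w in found)

df_sentence["keyids"] = df_sentence["sentence"].map(find_keyids)

# Sentences without any match explode to NaN; drop them before joining.
df_long = (
    df_sentence.explode("keyids")
    .dropna(subset=["keyids"])
    .astype({"keyids": "int64"})
)
df_result = df_long.merge(df_keyword, left_on="keyids", right_on="keyid")
print(df_result[["senid", "sentence", "keyid", "keyword"]])
```

Casting `keyids` back to an integer type before the merge keeps the join keys on both sides the same dtype, which avoids surprises when some sentences have no match.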
