Calculating the Jaccard coefficient sim(i, j) in SQL Server

2025-04-02 Update From: SLTechnology News&Howtos

A few days ago, someone in a QQ group asked how to implement the following calculation with SQL in SQL Server.

As the figure shows, the problem is to calculate the Jaccard coefficient. The Jaccard coefficient, also known as the Jaccard similarity coefficient, measures how similar or different two finite sample sets are. The higher the coefficient, the more similar the samples.
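For reference, the standard definition of the Jaccard coefficient for two finite sets can be written as follows. The symbols $A$ and $B$ are generic and are not taken from the figure in the original question.

```latex
\[
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1
\]
```

For example, with $A = \{1, 2, 3\}$ and $B = \{2, 3, 4\}$, the intersection has 2 elements and the union has 4, so $J(A, B) = 2/4 = 0.5$. The ratio is undefined when both sets are empty.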

SQL Server computes the intersection with INTERSECT and the union with UNION, as follows:

INTERSECT (intersection)

The code snippet for calculating the intersection is as follows: (1 intersect 0 = null, 1 intersect 1 = 1, 0 intersect 0 = 0)
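The author's original snippet is not preserved here. As a minimal illustration of the cases listed above, INTERSECT can be applied to single-value SELECT statements; the column alias `v` is only for readability.

```sql
-- INTERSECT keeps only the distinct values present in both inputs.
SELECT 1 AS v INTERSECT SELECT 0;  -- returns no rows (an empty result, not a NULL value)
SELECT 1 AS v INTERSECT SELECT 1;  -- returns one row: 1
SELECT 0 AS v INTERSECT SELECT 0;  -- returns one row: 0
```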

UNION (union)

The code snippet for calculating the union is as follows: (1 union 0 = 1, 0; 1 union 1 = 1; 0 union 0 = 0)
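The author's original snippet is not preserved here. As a minimal illustration of the same cases, UNION can be applied to single-value SELECT statements; the column alias `v` is only for readability.

```sql
-- UNION merges both inputs and removes duplicate values.
SELECT 1 AS v UNION SELECT 0;  -- returns two rows: 0 and 1
SELECT 1 AS v UNION SELECT 1;  -- returns one row: 1
SELECT 0 AS v UNION SELECT 0;  -- returns one row: 0
```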

How it works:

Use the sys.columns catalog view to get the column names and column IDs of the table, then loop over the column names and read the corresponding value from the table for each one. For example, for column A, the value for ID 1 is select A from test where id=1, and the value for ID 2 is select A from test where id=2.
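The two steps described above can be sketched as follows. This is only an illustration, assuming a table dbo.test (id, A, B, ...) as in the text. The variable names `@col`, `@id`, `@val`, and `@sql` are hypothetical, and the value is read with sp_executesql and an OUTPUT parameter, which is a swapped-in technique rather than the approach used in the author's script.

```sql
-- 1) List the data columns (everything after the id column).
SELECT name, column_id
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.test')
  AND column_id > 1
ORDER BY column_id;

-- 2) Read the value of one named column for one id via dynamic SQL.
DECLARE @col sysname = N'A',
        @id  INT     = 1,
        @val INT,
        @sql NVARCHAR(MAX);

-- QUOTENAME guards the column name; the id is passed as a parameter.
SET @sql = N'SELECT @v = ' + QUOTENAME(@col)
         + N' FROM dbo.test WHERE id = @i;';

EXEC sp_executesql @sql,
     N'@v INT OUTPUT, @i INT',
     @v = @val OUTPUT, @i = @id;

SELECT @val AS value_of_A_for_id_1;
```

Passing the id as a parameter instead of concatenating it into the statement keeps the dynamic part limited to the column name.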

The two values are then intersected and unioned. When the loop over the columns ends, we have @str_intersect and @str_union, and the ratio is calculated as len(@str_intersect) * 1.0 / len(@str_union).

The final results are as follows:

The complete code is as follows (the table-creation statements are in the commented-out section):

--create table Test (id int, A INT, B INT, C INT, D INT, E INT, F INT)
--INSERT INTO TEST SELECT [...]
--create table Test_result (id int, _1 numeric(10,4), _2 numeric(10,4), _3 numeric(10,4), _4 numeric(10,4), _5 numeric(10,4), _6 numeric(10,4))
--insert into Test_result select 1, null, null, null, null, null, null
--insert into Test_result select 2, null, null, null, null, null, null
--insert into Test_result select 3, null, null, null, null, null, null
--insert into Test_result select 4, null, null, null, null, null, null
--insert into Test_result select 5, null, null, null, null, null, null
--insert into Test_result select 6, null, null, null, null, null, null

select name, column_id into #test FROM SYS.COLUMNS WHERE object_id = object_id('dbo.test') and column_id > 1

declare @id_1 int=0, @id_2 int=0, @str_union varchar(max), @a_union int, @sql_union varchar(max), @str_intersect varchar(max), @a_intersect int, @sql_intersect varchar(max)
declare @name varchar(20), @column_id int
create table #a_union (num int)
create table #a_intersect (num int)
declare @min_id int=0, @max_id int=0, @global_min_id int=0, @global_max_id int=0
select @min_id=min(id), @max_id=max(id), @global_min_id=min(id), @global_max_id=max(id) from Test

while (@min_id [...]
[...]
        --select @str_union, @str_intersect, @column_id, @id_1, @id_2
        if (@id_2=1) update Test_result set _1 = convert(numeric(10,4), len(@str_intersect) * 1.0/len(@str_union)) where id=@id_1
        if (@id_2=2) update Test_result set _2 = convert(numeric(10,4), len(@str_intersect) * 1.0/len(@str_union)) where id=@id_1
        if (@id_2=3) update Test_result set _3 = convert(numeric(10,4), len(@str_intersect) * 1.0/len(@str_union)) where id=@id_1
        if (@id_2=4) update Test_result set _4 = convert(numeric(10,4), len(@str_intersect) * 1.0/len(@str_union)) where id=@id_1
        if (@id_2=5) update Test_result set _5 = convert(numeric(10,4), len(@str_intersect) * 1.0/len(@str_union)) where id=@id_1
        if (@id_2=6) update Test_result set _6 = convert(numeric(10,4), len(@str_intersect) * 1.0/len(@str_union)) where id=@id_1
        set @id_2=@id_2+1
    end
    set @min_id=@min_id+1
end

drop table #test
drop table #a_union
drop table #a_intersect

--create table test_str_column (id int, str_columns varchar(max))
--[...]
--insert into test_str_column select 4, null
--[...]

select [...] into #temp_test_table from test
select name, column_id into #tmp_test_columns from sys.columns where object_id=object_id('dbo.test') and column_id > 1

declare @id int, @col_name varchar(20), @col_id int, @string_columns varchar(max), @sql_rs varchar(max), @num_1 int
create table #tmp_rs (num int)

while (select count(1) from #temp_test_table) > 0
begin
    select top 1 @id=id from #temp_test_table order by id
    set @string_columns=''
    while (select count(1) from #tmp_test_columns) > 0
    begin
        select top 1 @col_name=name, @col_id=column_id from #tmp_test_columns order by column_id
        select @sql_rs='select '+@col_name+' from test where id='+convert(varchar, @id)
        insert into #tmp_rs exec (@sql_rs)
        select @num_1=num from #tmp_rs
        if (@num_1=1) set @string_columns=@string_columns+@col_name
        delete from #tmp_test_columns where @col_name=name and @col_id=column_id
        delete from #tmp_rs
    end
    insert into #tmp_test_columns select name, column_id from sys.columns where object_id=object_id('dbo.test') and column_id > 1
    update test_str_column set str_columns=@string_columns where id=@id
    delete from #temp_test_table where @id=id
end

drop table #temp_test_table
drop table #tmp_test_columns
drop table #tmp_rs

select * from test
select * from test_str_column
select * from Test_result
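As an alternative to the loop-based script above, the pairwise coefficients could also be computed in a single set-based query. The following is a minimal sketch under stated assumptions, not the author's method: it assumes each row's set consists of the columns whose value is 1, it uses a temporary table #Test with hypothetical sample values (the original INSERT values are not recoverable), and the names `ones`, `sizes`, `common`, and `sim` are made up for illustration.

```sql
-- Hypothetical sample data standing in for the Test table.
CREATE TABLE #Test (id INT, A INT, B INT, C INT, D INT, E INT, F INT);
INSERT INTO #Test VALUES
    (1, 1, 1, 0, 1, 0, 0),
    (2, 1, 0, 0, 1, 1, 0),
    (3, 0, 0, 1, 0, 0, 1);

-- ones:   one (id, col) pair per column whose value is 1
-- sizes:  number of 1-columns per row, i.e. the size of each row's set
-- common: size of the intersection for every pair sharing at least one column
WITH ones AS (
    SELECT t.id, v.col
    FROM #Test AS t
    CROSS APPLY (VALUES ('A', t.A), ('B', t.B), ('C', t.C),
                        ('D', t.D), ('E', t.E), ('F', t.F)) AS v(col, val)
    WHERE v.val = 1
),
sizes AS (
    SELECT id, COUNT(*) AS n FROM ones GROUP BY id
),
common AS (
    SELECT a.id AS id_i, b.id AS id_j, COUNT(*) AS n_common
    FROM ones AS a
    JOIN ones AS b ON a.col = b.col
    GROUP BY a.id, b.id
)
-- |union| = |set i| + |set j| - |intersection|
SELECT si.id AS id_i, sj.id AS id_j,
       CONVERT(NUMERIC(10,4),
               COALESCE(c.n_common, 0) * 1.0
               / (si.n + sj.n - COALESCE(c.n_common, 0))) AS sim
FROM sizes AS si
CROSS JOIN sizes AS sj
LEFT JOIN common AS c ON c.id_i = si.id AND c.id_j = sj.id
ORDER BY si.id, sj.id;

DROP TABLE #Test;
```

With this sample data, rows 1 and 2 share columns A and D out of four distinct 1-columns, so their coefficient is 0.5000, while rows 1 and 3 share none and get 0.0000. Rows without any 1 value do not appear in `sizes`, which also avoids a division by zero.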
