

Methods for deduplicating Java List collection objects, as whole objects and by attributes

2025-02-14 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article introduces how to deduplicate the elements of a Java List, both by comparing whole objects and by comparing object attributes. These situations come up often in real projects, so follow along with the examples below; I hope you read carefully and get something out of it!

1. An outline of this article

In this article I cover 8 ways to remove duplicate elements from a List. In fact, with flexible use and combination, there aren't necessarily just 8; there may well be 18.

Four methods that deduplicate by comparing the element objects as a whole

Four methods that deduplicate by comparing object attributes

To support the tests below, let's first set up some initialization data.

public class ListRmDuplicate {

    private List<String> list;
    private List<Player> playerList;

    @BeforeEach
    public void setup() {
        list = new ArrayList<>();
        list.add("kobe");
        list.add("james");
        list.add("curry");
        list.add("zimug");
        list.add("zimug");

        playerList = new ArrayList<>();
        playerList.add(new Player("kobe", "10000"));  // long live Bryant
        playerList.add(new Player("james", "32"));
        playerList.add(new Player("curry", "30"));
        playerList.add(new Player("zimug", "27"));    // notice that the name repeats
        playerList.add(new Player("zimug", "18"));    // notice that the name and age repeat
        playerList.add(new Player("zimug", "18"));    // notice that the name and age repeat here
    }
}

The Player object is a plain Java object with two member variables, name and age. It implements a parameterized constructor, the toString, equals and hashCode methods, and getter/setter methods.
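For reference, a minimal Player class matching that description might look like the sketch below (my reconstruction, not the author's original source file):

```java
import java.util.Objects;

class Player {
    private String name;
    private String age;

    public Player(String name, String age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAge() { return age; }
    public void setAge(String age) { this.age = age; }

    // equals and hashCode compare both member variables, which is what
    // the whole-object deduplication methods in section 2 rely on
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Player)) return false;
        Player p = (Player) o;
        return Objects.equals(name, p.name) && Objects.equals(age, p.age);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return "Player{name='" + name + "', age='" + age + "'}";
    }
}
```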

2. Deduplicating the collection elements as a whole

The following four methods deduplicate a List of String by comparing each collection element as a whole. If your List holds custom objects instead, you only need those objects to implement equals and hashCode; the deduplication code itself is the same as for a List of String.

The first method

It is the one everyone thinks of first: put the List data into a Set. Since the Set data structure itself rejects duplicates, converting the Set back into a List yields the deduplicated result. Note that this method changes the original order of the List elements: HashSet itself is unordered, and TreeSet sorts, which is generally not the original List order.

@Test
void testRemove1() {
    /* Set<String> set = new HashSet<>(list);
       List<String> newList = new ArrayList<>(set); */
    // TreeSet deduplicates and sorts (strings alphabetically; objects by their Comparable implementation)
    // List<String> newList = new ArrayList<>(new TreeSet<>(list));
    // shorthand form:
    List<String> newList = new ArrayList<>(new HashSet<>(list));
    System.out.println("deduplicated set: " + newList);
}

The console print results are as follows:

Deduplicated set: [kobe, james, zimug, curry]
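If you want this Set-based approach but also need to keep the original element order, a LinkedHashSet can be substituted for HashSet. This is my own addition, not one of the author's four methods, but it follows the same pattern:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class LinkedHashSetDedup {
    // Deduplicate while preserving first-occurrence order:
    // LinkedHashSet rejects duplicates but remembers insertion order
    static List<String> dedup(List<String> list) {
        return new ArrayList<>(new LinkedHashSet<>(list));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("kobe", "james", "curry", "zimug", "zimug");
        System.out.println(dedup(list)); // prints [kobe, james, curry, zimug]
    }
}
```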

The second method

It is quite straightforward: first convert the collection into a stream with the stream() method, then call distinct() to remove duplicates, and finally collect() the stream back into a List.

@Test
void testRemove2() {
    List<String> newList = list.stream().distinct().collect(Collectors.toList());
    System.out.println("deduplicated set: " + newList);
}

The console print results are as follows:

Deduplicated set: [kobe, james, curry, zimug]
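Note that stream().distinct() also works on lists of objects, but it relies on equals and hashCode, so it only removes fully identical objects. A quick sketch (using a hypothetical Player-like class P with both methods implemented, to keep the example self-contained):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class DistinctObjects {
    static class P {
        final String name, age;
        P(String name, String age) { this.name = name; this.age = age; }
        @Override public boolean equals(Object o) {
            return o instanceof P && Objects.equals(name, ((P) o).name)
                    && Objects.equals(age, ((P) o).age);
        }
        @Override public int hashCode() { return Objects.hash(name, age); }
    }

    static long countDistinct(List<P> players) {
        // distinct() compares elements via equals/hashCode
        return players.stream().distinct().count();
    }

    public static void main(String[] args) {
        List<P> players = Arrays.asList(
                new P("zimug", "27"), new P("zimug", "18"), new P("zimug", "18"));
        // only the two fully identical ("zimug", "18") objects collapse into one;
        // ("zimug", "27") survives because its age differs
        System.out.println(countDistinct(players)); // prints 2
    }
}
```

Deduplicating by a single attribute such as name needs the techniques in section 3 below.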

The third method takes advantage of Set.add(T), which returns false if the element T already exists in the set. This return value tells us whether the element is a duplicate; if it is not, it is added to a new newList, and newList is the final deduplicated result.

// three collections are used: list, newList and set; the order of elements is preserved
@Test
void testRemove3() {
    Set<String> set = new HashSet<>();
    List<String> newList = new ArrayList<>();
    for (String str : list) {
        if (set.add(str)) { // add returns false for a duplicate
            newList.add(str);
        }
    }
    System.out.println("deduplicated set: " + newList);
}

The console print results are consistent with the second method.

The fourth method abandons the idea of using a Set for deduplication. Instead, when adding data to the new List it uses newList.contains(T) to check whether the element already exists, and skips it if so, achieving the same deduplication effect.

// only list and newList are used; the order of elements is preserved
@Test
void testRemove4() {
    List<String> newList = new ArrayList<>();
    for (String cd : list) {
        if (!newList.contains(cd)) { // explicitly check for a duplicate
            newList.add(cd);
        }
    }
    System.out.println("deduplicated set: " + newList);
}

The console print results are consistent with the second method.

3. Deduplicating by the object attributes of the collection elements

In real work, deduplicating by the whole element object is actually the less common case; more often we need to deduplicate by certain attributes of the element objects. At this point, go back and look at the playerList initialization data constructed above, paying special attention to which elements repeat entirely and which only repeat in some member variables.

The first method builds a TreeSet with a Comparator. If we want to deduplicate by the Player's name attribute, we compare name in the Comparator. Two ways of writing the Comparator are shown below:

Lambda expression: (o1, o2) -> o1.getName().compareTo(o2.getName())

Method reference: Comparator.comparing(Player::getName)

@Test
void testRemove5() {
    // Set<Player> playerSet = new TreeSet<>((o1, o2) -> o1.getName().compareTo(o2.getName()));
    Set<Player> playerSet = new TreeSet<>(Comparator.comparing(Player::getName));
    playerSet.addAll(playerList);
    /* new ArrayList<>(playerSet).forEach(player -> {
        System.out.println(player.toString());
    }); */
    // print the deduplicated result
    new ArrayList<>(playerSet).forEach(System.out::println);
}

The output is as follows: the three zimug entries collapse into one because the name repeats, and the other duplicates are removed. However, the elements are reordered, because TreeSet sorts them rather than keeping the List's original order.

Player{name='curry', age='30'}
Player{name='james', age='32'}
Player{name='kobe', age='10000'}
Player{name='zimug', age='27'}

The second method shows up in many online articles, where authors use it to show off, but in my view it is needlessly roundabout (the Chinese idiom is "taking off your pants to fart"). Still, since everyone presents it, I won't leave it out. Why do I call it roundabout?

First, convert the list collection to a stream with stream().

Then use collect and Collectors.toCollection to convert the stream into a TreeSet (with the same name comparator as the first method).

The rest is the same as the first method.

Aren't the first two steps pointless? Converting to a stream and straight back into a collection has little practical value here, but as an exercise in using Stream it is a reasonable example.

@Test
void testRemove6() {
    List<Player> newList = playerList.stream().collect(
        Collectors.collectingAndThen(
            Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Player::getName))),
            ArrayList::new));
    newList.forEach(System.out::println);
}

The console printout is the same as the first method.

The third method

This method is also the one I recommend. At first glance it looks like more code, but in practice it is quite convenient.

Predicate (some people translate it as "assertion"; as a noun it can be rendered "predicate", as a verb "to assert"). A predicate modifies a subject: in "birds that like singing", "like singing" is the predicate, limiting the scope of the subject. Since we use it here for filter filtering, that is, to limit the scope of elements, I think "predicate" is the more fitting translation; use whichever term you find easier to remember.

First, we define a Predicate for filtering, whose condition is distinctByKey. Elements for which the predicate returns true are kept; elements for which it returns false are filtered out.

Our requirement, of course, is to filter out repeated elements. The deduplication logic is implemented via the map's putIfAbsent method, which adds a key-value pair: if the map has no value for the key, the pair is added and null is returned; if a value already exists, the original value is kept and returned.

So if putIfAbsent returns null, the key was added successfully (the element is not a duplicate) and the distinctByKey predicate returns true, keeping the element; if it returns a non-null value, the element is a duplicate and is filtered out.
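The putIfAbsent behavior described above can be seen in isolation in this small sketch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PutIfAbsentDemo {
    public static void main(String[] args) {
        Map<String, Boolean> seen = new ConcurrentHashMap<>();
        // first insertion: key absent, value stored, null returned
        System.out.println(seen.putIfAbsent("zimug", Boolean.TRUE));  // prints null
        // second insertion: key present, existing value kept and returned
        System.out.println(seen.putIfAbsent("zimug", Boolean.FALSE)); // prints true
        // the original value was retained
        System.out.println(seen.get("zimug"));                        // prints true
    }
}
```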

Although this approach seems to increase the amount of code, the distinctByKey predicate method only needs to be defined once and can then be reused indefinitely.

@Test
void testRemove7() {
    List<Player> newList = new ArrayList<>();
    playerList.stream()
        .filter(distinctByKey(p -> p.getName())) // filter keeps elements for which the predicate is true
        .forEach(newList::add);
    newList.forEach(System.out::println);
}

static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
    Map<Object, Boolean> seen = new ConcurrentHashMap<>();
    // putIfAbsent adds the pair and returns null if the key is absent;
    // if the key already exists, the original value is kept and returned.
    // A null return therefore means the element has not been seen before, so it is kept.
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}

The output is as follows: the three zimug entries collapse into one because the name repeats, and the duplicates are removed. Moreover, the original order of the List is preserved.

Player{name='kobe', age='10000'}
Player{name='james', age='32'}
Player{name='curry', age='30'}
Player{name='zimug', age='27'}

The fourth method is not really a new method. The examples above all deduplicate by a single object attribute; if we want to deduplicate by several attributes, we just need to adapt the three methods above. I will only adapt one of them; the others follow the same principle, which is to concatenate the multiple attributes being compared and treat them as a single String key.
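As an alternative to concatenating attributes into one String (which can be ambiguous without a separator), a chained comparator can compare multiple keys directly. This variant is my addition, not from the original article; P is a hypothetical stand-in for Player:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class MultiKeyDedup {
    static class P {
        final String name, age;
        P(String name, String age) { this.name = name; this.age = age; }
        String getName() { return name; }
        String getAge() { return age; }
    }

    // Deduplicate by (name, age) using comparing(...).thenComparing(...)
    // instead of building a concatenated String key
    static List<P> dedupByNameAndAge(List<P> players) {
        Set<P> set = new TreeSet<>(
                Comparator.comparing(P::getName).thenComparing(P::getAge));
        set.addAll(players);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<P> players = Arrays.asList(
                new P("zimug", "27"), new P("zimug", "18"), new P("zimug", "18"));
        // the two ("zimug", "18") entries collapse; ("zimug", "27") survives
        System.out.println(dedupByNameAndAge(players).size()); // prints 2
    }
}
```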

@Test
void testRemove8() {
    Set<Player> playerSet = new TreeSet<>(
        Comparator.comparing((Player o) -> o.getName() + ";" + o.getAge()));
    playerSet.addAll(playerList);
    new ArrayList<>(playerSet).forEach(System.out::println);
}

That's all for the methods of deduplicating Java List collection objects as a whole and by attributes. Thank you for reading. If you want to learn more about the topic, you can follow the site, where the editor will keep publishing practical articles for you!
