Java Stream: removing duplicates

Removing duplicates by an object field with the Stream API

We have a list of domain objects (Person). A Person has three fields: id, name, and surname. The task is to find the duplicates and build a list from them (or a set, it does not matter), discarding all other objects. Two objects are duplicates when their name fields match. The mechanism has to be implemented with streams. The solution:

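    import java.util.HashSet;
    import java.util.Set;
    import java.util.function.Function;
    import java.util.function.Predicate;

    private static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = new HashSet<>();
        return t -> seen.add(keyExtractor.apply(t));
    }

    persons.stream().filter(distinctByKey(Person::getName))
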
  1. Apparently the HashSet is created only once. But why? I would have expected the set to be created on every iteration. I would like a detailed explanation of this.
  2. The return line raises a question. Where does the implementation of apply() come from? I never explicitly implement Function, and therefore never implement apply() either.

1 Answer

For every lambda and method reference, Java generates a synthetic class at run time that implements the functional interface. The distinctByKey method is declared as taking the functional interface Function, so when it is called the virtual machine creates a class like this

    class Example$$Lambda$1 implements Function<Person, String> {
        public String apply(Person person) {
            return person.getName();
        }
    }

and passes an instance of it in place of the method reference Person::getName. With all the sugar removed, it may become clearer why the set seen is created only once. At run time your code is transformed into something roughly equivalent to this:

    class Example$$Lambda$2 implements Predicate<String> {
        private final java.util.Set<String> arg$1;

        Example$$Lambda$2(java.util.Set<String> seen) {
            this.arg$1 = seen;
        }

        public boolean test(String name) {
            return arg$1.add(name);
        }
    }

    private static Example$$Lambda$2 distinctByKey(Example$$Lambda$1 keyExtractor) {
        Set<String> seen = new HashSet<>();
        // Here seen is captured once and stored in the
        // arg$1 field of the returned object
        return new Example$$Lambda$2(seen);
    }

    Example$$Lambda$1 keyExtractor = new Example$$Lambda$1();
    Example$$Lambda$2 predicate = distinctByKey(keyExtractor);

    Iterator<Person> stream = persons.iterator();
    List<Person> result = new ArrayList<>();
    while (stream.hasNext()) {
        Person person = stream.next();
        String key = keyExtractor.apply(person);
        // true only the first time this key is seen
        boolean firstOccurrence = predicate.test(key);
        if (firstOccurrence) {
            result.add(person);
        }
    }
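One way to check this behaviour is to count the calls. The sketch below is only an illustration: the class name CaptureDemo and both counters are assumptions added here, not part of the original answer. It records how often the body of distinctByKey runs and how often the returned lambda runs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo class; the counters exist only to make the calls visible.
class CaptureDemo {
    static final AtomicInteger FACTORY_CALLS = new AtomicInteger();
    static final AtomicInteger PREDICATE_CALLS = new AtomicInteger();

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        FACTORY_CALLS.incrementAndGet();       // runs once per distinctByKey(...) call
        Set<Object> seen = new HashSet<>();    // so the set is created once as well
        return t -> {
            PREDICATE_CALLS.incrementAndGet(); // runs once per stream element
            return seen.add(keyExtractor.apply(t));
        };
    }

    static List<String> run() {
        FACTORY_CALLS.set(0);
        PREDICATE_CALLS.set(0);
        List<String> names = Arrays.asList("Ann", "Bob", "Ann", "Bob", "Cid");
        // distinctByKey(...) is an ordinary argument expression: it is evaluated
        // before filter(...) is called, and filter only receives its result.
        return names.stream()
                .filter(distinctByKey(name -> name))
                .collect(Collectors.toList());
    }
}
```

After run(), FACTORY_CALLS holds 1 and PREDICATE_CALLS holds 5: the stream calls test() on the one predicate object for each element, and never calls distinctByKey again.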

Incidentally, filter keeps the elements that match the predicate, and the add method of a set returns true for elements that were not yet in the set. In other words, your code does the opposite of what you want: it removes the duplicates from the stream. You need to invert the predicate:

    persons.stream()
            .filter(distinctByKey(Person::getName).negate());
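Put together, the approach could look like the sketch below. The nested Person class is a stand-in with assumed field types, and DuplicateFinder, repeatedByName, and repeatedIds are hypothetical names chosen for this illustration. The predicate is stored in a typed local variable before negate() is called, which keeps type inference simple.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DuplicateFinder {
    // Minimal stand-in for the Person class from the question (field types assumed).
    static class Person {
        private final long id;
        private final String name;
        private final String surname;

        Person(long id, String name, String surname) {
            this.id = id;
            this.name = name;
            this.surname = surname;
        }

        public long getId() { return id; }
        public String getName() { return name; }
        public String getSurname() { return surname; }
    }

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = new HashSet<>();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Keeps every Person whose name already occurred earlier in the list.
    static List<Person> repeatedByName(List<Person> persons) {
        Predicate<Person> firstWithName = distinctByKey(Person::getName);
        return persons.stream()   // sequential on purpose: HashSet is not thread-safe
                .filter(firstWithName.negate())
                .collect(Collectors.toList());
    }

    // Convenience helper: the ids of the repeated persons, in encounter order.
    static List<Long> repeatedIds(List<Person> persons) {
        return repeatedByName(persons).stream()
                .map(Person::getId)
                .collect(Collectors.toList());
    }
}
```

Note that the first Person with a given name is not part of the result, only the later ones. If every member of a duplicate group is needed, grouping by name would be a different technique.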


Java Stream – Find, Count and Remove Duplicates

A few simple examples of how to find and count the duplicates in a Stream, and how to remove them, in Java 8 and later. We will use an ArrayList to provide a Stream of elements that includes duplicates.

1. Stream.distinct() – To Remove Duplicates

1.1. Remove Duplicate Strings

The distinct() method returns a Stream consisting of the distinct elements of the given stream. The object equality is checked according to the object’s equals() method.

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    List<String> list = Arrays.asList("A", "B", "C", "D", "A", "B", "C");

    // Get list without duplicates
    List<String> distinctItems = list.stream()
            .distinct()
            .collect(Collectors.toList());

    // Let's verify distinct elements
    System.out.println(distinctItems);   // [A, B, C, D]

1.2. Remove Duplicate Custom Objects

The same syntax can be used to remove the duplicate objects from List. To do so, we need to be very careful about the object’s equals() method, because it will decide if an object is duplicate or unique.

Consider a Person class in which two instances are considered equal if both have the same id value.

Let us see an example of how we can remove duplicate Person objects from a List.

    // Add some random persons
    Collection<Person> list = Arrays.asList(p1, p2, p3, p4, p5, p6);

    // Get distinct people by id
    List<Person> distinctElements = list.stream()
            .distinct()
            .collect(Collectors.toList());
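The snippet above only works if Person overrides equals() and hashCode(). The following is a minimal sketch of such a class, assuming an int id and the fname/lname fields implied by the getters used later in the article. DistinctByIdDemo and distinctById are hypothetical names used for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class DistinctByIdDemo {
    // Assumed shape of Person: equality is based on id only.
    static class Person {
        private final int id;
        private final String fname;
        private final String lname;

        Person(int id, String fname, String lname) {
            this.id = id;
            this.fname = fname;
            this.lname = lname;
        }

        public int getId() { return id; }
        public String getFname() { return fname; }
        public String getLname() { return lname; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            return id == ((Person) o).id;
        }

        @Override
        public int hashCode() {
            // Must agree with equals(): equal ids give equal hash codes.
            return Integer.hashCode(id);
        }
    }

    // distinct() relies on the equals()/hashCode() pair defined above.
    static List<Person> distinctById(List<Person> people) {
        return people.stream()
                .distinct()
                .collect(Collectors.toList());
    }
}
```

With this definition, two Person objects with the same id collapse into one, and the first one in the list is the one that is kept.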

To find all unique objects using a different equality condition, we can use the following distinctByKey() method. In this example, we find all unique objects by the Person's full name.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;
    import java.util.function.Predicate;

    // Add some random persons
    List<Person> list = Arrays.asList(p1, p2, p3, p4, p5, p6);

    // Get distinct people by full name
    List<Person> distinctPeople = list.stream()
            .filter(distinctByKey(p -> p.getFname() + " " + p.getLname()))
            .collect(Collectors.toList());

    // The distinctByKey() method needs to be created
    public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Map<Object, Boolean> map = new ConcurrentHashMap<>();
        // putIfAbsent returns null only the first time a key is stored
        return t -> map.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
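The predicate above carries mutable state between elements. As a swapped-in alternative (not the article's method), the same result can be sketched with a collector: elements go into a LinkedHashMap keyed by the extracted value, and the first element wins on a key clash. DistinctByKeyCollector and distinctBy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class DistinctByKeyCollector {
    // Returns the first element seen for each key, in encounter order.
    static <T, K> List<T> distinctBy(Collection<T> items,
                                     Function<? super T, ? extends K> keyExtractor) {
        return new ArrayList<>(items.stream()
                .collect(Collectors.toMap(
                        keyExtractor,
                        Function.identity(),
                        (first, later) -> first,  // on a key clash keep the earlier element
                        LinkedHashMap::new))      // keeps insertion (encounter) order
                .values());
    }
}
```

A call such as distinctBy(list, p -> p.getFname() + " " + p.getLname()) would then stand in for the filter-based version. One caveat: Collectors.toMap does not accept null values, so this sketch assumes the collection contains no null elements.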

2. Collectors.toSet() – To Remove Duplicates

Another simple and very useful way is to store all the elements in a Set. Sets, by definition, store only distinct elements. Note that a Set decides whether two items are distinct by comparing them with the equals() method.

Here, we cannot compare the objects using a custom equality condition.

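    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Set;
    import java.util.stream.Collectors;

    ArrayList<Integer> numbersList
            = new ArrayList<>(Arrays.asList(1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8));

    Set<Integer> setWithoutDuplicates = numbersList.stream()
            .collect(Collectors.toSet());

    System.out.println(setWithoutDuplicates);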

3. Collectors.toMap() – To Count Duplicates

Sometimes, we are interested in finding out which elements are duplicates and how many times they appeared in the original list. We can use a Map to store this information.

We iterate over the list, using each element as the Map key and the number of its occurrences as the Map value.

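    import java.util.Map;
    import java.util.function.Function;

    // ArrayList with duplicate elements
    ArrayList<Integer> numbersList
            = new ArrayList<>(Arrays.asList(1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8));

    // Each element starts with a count of 1; Long::sum merges repeated keys
    Map<Integer, Long> elementCountMap = numbersList.stream()
            .collect(Collectors.toMap(Function.identity(), v -> 1L, Long::sum));

    System.out.println(elementCountMap);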
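The count map contains every element, including those that occur only once. To list just the duplicates, the entries can be filtered by count, as in the sketch below. DuplicateCounter and duplicatesWithCounts are hypothetical names, and the TreeMap is used only to get a predictable key order when printing.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateCounter {
    // Counts occurrences, then keeps only the entries seen more than once.
    static Map<Integer, Long> duplicatesWithCounts(List<Integer> numbers) {
        Map<Integer, Long> counts = numbers.stream()
                .collect(Collectors.toMap(Function.identity(), v -> 1L, Long::sum));

        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,      // keys are already unique; never invoked here
                        TreeMap::new));   // sorted keys for stable output
    }
}
```

For the list used above (1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8), the result is {1=2, 3=3, 6=3}.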


Java Stream distinct() Function to Remove Duplicates


Java Stream distinct() method returns a new stream of distinct elements. It’s useful in removing duplicate elements from the collection before processing them.

Java Stream distinct() Method

  • The elements are compared using the equals() method, so the stream elements need a proper implementation of equals().
  • If the stream is ordered, the encounter order is preserved. It means that the element occurring first will be present in the distinct elements stream.
  • If the stream is unordered, then the resulting stream elements can be in any order.
  • Stream distinct() is a stateful intermediate operation.
  • Using distinct() with an ordered parallel stream can have poor performance because of significant buffering overhead. In that case, go with sequential stream processing.
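The ordering rule in the list above can be seen with elements that are equal but still distinguishable. In this sketch, StableDistinctDemo and Tagged are hypothetical names: two Tagged objects are equal when their ids match, and the tag shows which one survived.

```java
import java.util.List;
import java.util.stream.Collectors;

class StableDistinctDemo {
    // Equality depends on id only; tag records which instance is which.
    static final class Tagged {
        final int id;
        final String tag;

        Tagged(int id, String tag) {
            this.id = id;
            this.tag = tag;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tagged && ((Tagged) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }

        @Override
        public String toString() {
            return id + ":" + tag;
        }
    }

    // A List is an ordered source, so distinct() keeps the first of each equal group.
    static List<String> distinctTags(List<Tagged> items) {
        return items.stream()
                .distinct()
                .map(Tagged::toString)
                .collect(Collectors.toList());
    }
}
```

For the input 1:a, 2:b, 1:c, 2:d, 3:e this yields [1:a, 2:b, 3:e]: the later elements 1:c and 2:d are dropped in favour of the earlier equal ones.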

Remove Duplicate Elements using distinct()

Let's see how to use the Stream distinct() method to remove duplicate elements from a collection.

    jshell> List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
    list ==> [1, 2, 3, 4, 3, 2, 1]

    jshell> List<Integer> distinctInts = list.stream().distinct().collect(Collectors.toList());
    distinctInts ==> [1, 2, 3, 4]


Processing only Unique Elements using Stream distinct() and forEach()

Since distinct() is an intermediate operation, we can follow it with the forEach() method to process only the unique elements.

    jshell> List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
    list ==> [1, 2, 3, 4, 3, 2, 1]

    jshell> list.stream().distinct().forEach(x -> System.out.println("Processing " + x));
    Processing 1
    Processing 2
    Processing 3
    Processing 4


Stream distinct() with custom objects

Let’s look at a simple example of using distinct() to remove duplicate elements from a list.

    package com.journaldev.java;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.stream.Collectors;

    public class JavaStreamDistinct {

        public static void main(String[] args) {
            List<Data> dataList = new ArrayList<>();
            dataList.add(new Data(10));
            dataList.add(new Data(20));
            dataList.add(new Data(10));
            dataList.add(new Data(20));

            System.out.println("Data List = " + dataList);

            List<Data> uniqueDataList = dataList.stream().distinct().collect(Collectors.toList());

            System.out.println("Unique Data List = " + uniqueDataList);
        }
    }

    class Data {
        private int id;

        Data(int i) {
            this.setId(i);
        }

        public int getId() {
            return id;
        }

        public void setId(int id) {
            this.id = id;
        }

        @Override
        public String toString() {
            return String.format("Data[%d]", this.id);
        }
    }

Output:

    Data List = [Data[10], Data[20], Data[10], Data[20]]
    Unique Data List = [Data[10], Data[20], Data[10], Data[20]]

The distinct() method didn't remove the duplicate elements, because we didn't implement the equals() method in the Data class. As a result, the equals() method of the superclass Object was used to identify equal elements. The Object class equals() method implementation is:

    public boolean equals(Object obj) {
        return (this == obj);
    }

Although the Data objects had the same ids, they were different object instances, so they were considered not equal. That's why it's very important to implement the equals() method if you plan to use the stream distinct() method with custom objects. Note that the collection classes use both equals() and hashCode() to check whether two objects are equal, so it's better to provide an implementation for both of them.

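    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + id;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        // Body reconstructed (assumption): the source shows only the signature.
        // The println matches the "Data equals method" lines in the output.
        System.out.println("Data equals method");
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return id == ((Data) obj).id;
    }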

Tip: You can easily generate the equals() and hashCode() methods using the "Eclipse > Source > Generate equals() and hashCode()" menu option. The output after adding the equals() and hashCode() implementations is:

    Data List = [Data[10], Data[20], Data[10], Data[20]]
    Data equals method
    Data equals method
    Unique Data List = [Data[10], Data[20]]
