347. Top K Frequent Elements
Description
See: https://leetcode.com/problems/top-k-frequent-elements/
Given an integer array
nums
and an integerk
, return thek
most frequent elements. You may return the answer in any order.Constraints:
1 <= nums.length <= 105
k
is in the range[1, the number of unique elements in the array]
.It is guaranteed that the answer is unique.
Solution
There are various solutions to this problem. The simple approach is to use the built in sort function. But, the follow up to question is to provide a solution that performs in under O(n log n) time. I did it in O(n log k) time using a heap. However, there are two approaches that give a O(n) solution.
The bucket sort solution is the best answer for this problem as it is fast (O(n)) and the code is straight forward.
In all solutions, the first step is to create a dict that maps the numbers to their frequencies.
Solution 1 - Sort - O(n log n)
This approach uses the built in sort function. The frequency dict is created. The built in sorted function is used to sort on the dict keys, using their values as the sort key.
Time complexity: O(n log n) - This is the time complexity of the built in sort function. There is also the O(n) time to build the hash and reverse the sorted array. But since these are additions, the complexity reduces to O(n log n)
Space complexity: O(n) - Building the hash is O(n) and the sorted function creates a new list which is also O(n). O(n) + O(n) reduces to O(n)
Solution 2 - Heap - O(n log k)
Use a heap as the max frequency will be at the root. Popping the root k times will give the k most frequent items.
Python has the built in heapq module which makes working with heaps easy. It has the heapq.nlargest function which does which this question asks for.
By doing the len(nums) == k
check at the start ensures that the code below it will operate in k time, since k will be less than n.
Time Complexity: O(n log k) - heapify and retrieve the k elements. The n scenario is covered by the len(nums) == k
check, which runs in O(1) time.
Space Complexity: O(n + k) - dict of n
elements and Heap of k
elements
Solution 3 - Bucket Sort - O(n)
In this solution an array where the indexes represent the frequencies is created. The numbers are sorted into these buckets. This is known as a bucket sort.
Example: nums = [1, 4, 1, 3, 4, 2, 1, 2]
buckets = [[], [3], [4, 2], [1]]
index 0 - will always be empty, as it represents frequency 0. We do not consider numbers that are non-existent in the array.
index 1 - represents the numbers that have a frequency of 1. In this case the number 3
index 2 - represents the numbers that have a frequency of 2. In this case the numbers 4 and 2
index 3 - represent the nubers that have a frequency of 3. In this case the number 1.
The size of the array is the size of the nums array + 1. Why? Because index 0 will never be used. For the case where all the numbers are the same, example nums = [5, 5, 5] the frequency will be len(nums) + 1 (3 in this case).
Time Complexity: O(n) - The nested for loops are not O(n^2) because they are iterating over different things. The outer loop is iterating over the bucket array. The inner loop is iterating over the array in each bucket. See Big O Notation
Space Complexity: O(n) - Requires space for the freq dict, the buckets array and the result array
Solution 4 - Quickselect - O(n)
This solution uses the quickselect algorithm.
Last updated