347. Top K Frequent Elements

Description

See: https://leetcode.com/problems/top-k-frequent-elements/

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Constraints:

  • 1 <= nums.length <= 105

  • k is in the range [1, the number of unique elements in the array].

  • It is guaranteed that the answer is unique.

Solution

There are various solutions to this problem. The simple approach is to use the built in sort function. But, the follow up to question is to provide a solution that performs in under O(n log n) time. I did it in O(n log k) time using a heap. However, there are two approaches that give a O(n) solution.

The bucket sort solution is the best answer for this problem as it is fast (O(n)) and the code is straight forward.

In all solutions, the first step is to create a dict that maps the numbers to their frequencies.

Solution 1 - Sort - O(n log n)

This approach uses the built in sort function. The frequency dict is created. The built in sorted function is used to sort on the dict keys, using their values as the sort key.

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        freq = Counter(nums)
        result = sorted(freq.keys(), key=lambda n: freq[n], reverse=True)
        return result[:k]

Time complexity: O(n log n) - This is the time complexity of the built in sort function. There is also the O(n) time to build the hash and reverse the sorted array. But since these are additions, the complexity reduces to O(n log n)

Space complexity: O(n) - Building the hash is O(n) and the sorted function creates a new list which is also O(n). O(n) + O(n) reduces to O(n)

Solution 2 - Heap - O(n log k)

Use a heap as the max frequency will be at the root. Popping the root k times will give the k most frequent items.

Python has the built in heapq module which makes working with heaps easy. It has the heapq.nlargest function which does which this question asks for.

By doing the len(nums) == k check at the start ensures that the code below it will operate in k time, since k will be less than n.

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # This is the case where nums is unique numbers, so everything has a frequency of 1.
        # Example nums=[1,2,3,4], k=4
        if len(nums) == k:
            return nums
        freq = Counter(nums)
        # return k elements for the iterable freq.keys() (numbers), sort by freq values (which is the frequency of each number)
        return heapq.nlargest(k, freq.keys(), key=freq.get)
        

Time Complexity: O(n log k) - heapify and retrieve the k elements. The n scenario is covered by the len(nums) == k check, which runs in O(1) time.

Space Complexity: O(n + k) - dict of n elements and Heap of k elements

Solution 3 - Bucket Sort - O(n)

In this solution an array where the indexes represent the frequencies is created. The numbers are sorted into these buckets. This is known as a bucket sort.

Example: nums = [1, 4, 1, 3, 4, 2, 1, 2]

buckets = [[], [3], [4, 2], [1]]

  • index 0 - will always be empty, as it represents frequency 0. We do not consider numbers that are non-existent in the array.

  • index 1 - represents the numbers that have a frequency of 1. In this case the number 3

  • index 2 - represents the numbers that have a frequency of 2. In this case the numbers 4 and 2

  • index 3 - represent the nubers that have a frequency of 3. In this case the number 1.

The size of the array is the size of the nums array + 1. Why? Because index 0 will never be used. For the case where all the numbers are the same, example nums = [5, 5, 5] the frequency will be len(nums) + 1 (3 in this case).

class Solution:
    def topKFrequent_bucket_sort(self, nums: List[int], k: int) -> List[int]:
        if len(nums) == k:
            return nums
        freq = Counter(nums)
        buckets = [[] for i in range(len(nums) + 1)]
        for num, count in freq.items():
            buckets[count].append(num)

        result = []
        for i in range(len(buckets) - 1, 0, -1):
            for num in buckets[i]:
                result.append(num)
                if len(result) == k:
                    return result

Time Complexity: O(n) - The nested for loops are not O(n^2) because they are iterating over different things. The outer loop is iterating over the bucket array. The inner loop is iterating over the array in each bucket. See Big O Notation

Space Complexity: O(n) - Requires space for the freq dict, the buckets array and the result array

Solution 4 - Quickselect - O(n)

This solution uses the quickselect algorithm.

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        if len(nums) == k:
            return nums
        counts = Counter(nums)
        unique_nums = list(counts.keys())

        def partition(left, right):
            pivot_index = (left + right) // 2
            unique_nums[right], unique_nums[pivot_index] = unique_nums[pivot_index], unique_nums[right]
            mark = left
            for i in range(left, right):
                if counts[unique_nums[i]] < counts[unique_nums[right]]:
                    unique_nums[i], unique_nums[mark] = unique_nums[mark], unique_nums[i]
                    mark += 1
            unique_nums[mark], unique_nums[right] = unique_nums[right], unique_nums[mark]
            return mark

        def select(left, right, kth_position):
            val = partition(left, right)
            if val < kth_position:
                select(val + 1, right, kth_position)
            elif val > kth_position:
                select(left, val - 1, kth_position)
            else:
                return

        k_position = len(unique_nums) - k
        select(0, len(unique_nums) - 1, k_position)
        return unique_nums[k_position:]
        

Last updated