🤖
Data Structures and Algorithms
  • CS Theory And Problems
  • Big O
    • Big O Notation
  • Algorithms
    • Binary Search
    • Breadth First Search (BFS)
    • Depth First Search (DFS)
    • Dynamic Programming
    • Kadane's Algorithm
    • Kahn's Algorithm
    • Quickselect
    • Recursion
    • Sorting Algorithms
    • Sliding Window
  • Data Structures
    • Binary Heap
    • Graph
    • Linked List
    • Trees
  • Problems
    • LeetCode
      • 1. Two Sum
      • 3. Longest Substring Without Repeating Characters
      • 5. Longest Palindromic Substring
      • 11. Container With Most Water
      • 15. 3 Sum
      • 19. Remove Nth Node From End of List
      • 20. Valid Parentheses
      • 33. Search in Rotated Sorted Array
      • 49. Group Anagrams
      • 53. Maximum Subarray
      • 55. Jump Game
      • 56. Merge Intervals
      • 76. Minimum Window Substring
      • 98. Validate Binary Search Tree
      • 105. Construct Binary Tree from Preorder and Inorder Traversal
      • 121. Best Time to Buy and Sell Stock
      • 133. Clone Graph
      • 141. Linked List Cycle
      • 152. Maximum Product Subarray
      • 153. Find Minimum in Rotated Sorted Array
      • 200. Number of Islands
      • 206. Reverse Linked List
      • 207. Course Schedule
      • 217. Contains Duplicate
      • 226. Invert Binary Tree
      • 238. Product of Array Except Self
      • 242. Valid Anagram
      • 297. Serialize and Deserialize Binary Tree
      • 347. Top K Frequent Elements
      • 417. Pacific Atlantic Water Flow
      • 424. Longest Repeating Character Replacement
      • 435. Non-overlapping Intervals
      • 647. Palindromic Substrings
Powered by GitBook
On this page
  • Description
  • Solution
  • Solution 1 - Sort - O(n log n)
  • Solution 2 - Heap - O(n log k)
  • Solution 3 - Bucket Sort - O(n)
  • Solution 4 - Quickselect - O(n)
  1. Problems
  2. LeetCode

347. Top K Frequent Elements

Previous297. Serialize and Deserialize Binary TreeNext417. Pacific Atlantic Water Flow

Last updated 3 years ago

Description

See:

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Constraints:

  • 1 <= nums.length <= 105

  • k is in the range [1, the number of unique elements in the array].

  • It is guaranteed that the answer is unique.

Solution

There are various solutions to this problem. The simple approach is to use the built in sort function. But, the follow up to question is to provide a solution that performs in under O(n log n) time. I did it in O(n log k) time using a . However, there are two approaches that give a O(n) solution.

The is the best answer for this problem as it is fast (O(n)) and the code is straight forward.

In all solutions, the first step is to create a dict that maps the numbers to their frequencies.

Solution 1 - Sort - O(n log n)

This approach uses the built in sort function. The frequency dict is created. The built in sorted function is used to sort on the dict keys, using their values as the sort key.

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        freq = Counter(nums)
        result = sorted(freq.keys(), key=lambda n: freq[n], reverse=True)
        return result[:k]

Time complexity: O(n log n) - This is the time complexity of the built in sort function. There is also the O(n) time to build the hash and reverse the sorted array. But since these are additions, the complexity reduces to O(n log n)

Space complexity: O(n) - Building the hash is O(n) and the sorted function creates a new list which is also O(n). O(n) + O(n) reduces to O(n)

Solution 2 - Heap - O(n log k)

By doing the len(nums) == k check at the start ensures that the code below it will operate in k time, since k will be less than n.

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # This is the case where nums is unique numbers, so everything has a frequency of 1.
        # Example nums=[1,2,3,4], k=4
        if len(nums) == k:
            return nums
        freq = Counter(nums)
        # return k elements for the iterable freq.keys() (numbers), sort by freq values (which is the frequency of each number)
        return heapq.nlargest(k, freq.keys(), key=freq.get)
        

Time Complexity: O(n log k) - heapify and retrieve the k elements. The n scenario is covered by the len(nums) == k check, which runs in O(1) time.

Space Complexity: O(n + k) - dict of n elements and Heap of k elements

Solution 3 - Bucket Sort - O(n)

In this solution an array where the indexes represent the frequencies is created. The numbers are sorted into these buckets. This is known as a bucket sort.

Example: nums = [1, 4, 1, 3, 4, 2, 1, 2]

buckets = [[], [3], [4, 2], [1]]

  • index 0 - will always be empty, as it represents frequency 0. We do not consider numbers that are non-existent in the array.

  • index 1 - represents the numbers that have a frequency of 1. In this case the number 3

  • index 2 - represents the numbers that have a frequency of 2. In this case the numbers 4 and 2

  • index 3 - represent the nubers that have a frequency of 3. In this case the number 1.

The size of the array is the size of the nums array + 1. Why? Because index 0 will never be used. For the case where all the numbers are the same, example nums = [5, 5, 5] the frequency will be len(nums) + 1 (3 in this case).

class Solution:
    def topKFrequent_bucket_sort(self, nums: List[int], k: int) -> List[int]:
        if len(nums) == k:
            return nums
        freq = Counter(nums)
        buckets = [[] for i in range(len(nums) + 1)]
        for num, count in freq.items():
            buckets[count].append(num)

        result = []
        for i in range(len(buckets) - 1, 0, -1):
            for num in buckets[i]:
                result.append(num)
                if len(result) == k:
                    return result

Space Complexity: O(n) - Requires space for the freq dict, the buckets array and the result array

Solution 4 - Quickselect - O(n)

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        if len(nums) == k:
            return nums
        counts = Counter(nums)
        unique_nums = list(counts.keys())

        def partition(left, right):
            pivot_index = (left + right) // 2
            unique_nums[right], unique_nums[pivot_index] = unique_nums[pivot_index], unique_nums[right]
            mark = left
            for i in range(left, right):
                if counts[unique_nums[i]] < counts[unique_nums[right]]:
                    unique_nums[i], unique_nums[mark] = unique_nums[mark], unique_nums[i]
                    mark += 1
            unique_nums[mark], unique_nums[right] = unique_nums[right], unique_nums[mark]
            return mark

        def select(left, right, kth_position):
            val = partition(left, right)
            if val < kth_position:
                select(val + 1, right, kth_position)
            elif val > kth_position:
                select(left, val - 1, kth_position)
            else:
                return

        k_position = len(unique_nums) - k
        select(0, len(unique_nums) - 1, k_position)
        return unique_nums[k_position:]
        

Use a as the max frequency will be at the root. Popping the root k times will give the k most frequent items.

Python has the built in module which makes working with heaps easy. It has the function which does which this question asks for.

Time Complexity: O(n) - The nested for loops are not O(n^2) because they are iterating over different things. The outer loop is iterating over the bucket array. The inner loop is iterating over the array in each bucket. See

This solution uses the algorithm.

heap
heapq
heapq.nlargest
Big O Notation
quickselect
https://leetcode.com/problems/top-k-frequent-elements/
heap
bucket sort solution