Given a list of reviews
, a list of keywords
and an integer k
. Find the most popular k
keywords in order of most to least frequently mentioned.
The comparison of strings is case-insensitive.
Multiple occurances of a keyword in a review should be considred as a single mention.
If keywords are mentioned an equal number of times in reviews, sort alphabetically.
Example 1:
Input: k = 2keywords = ["anacell", "cetracular", "betacellular"] reviews = [ "Anacell provides the best services in the city", "betacellular has awesome services", "Best services provided by anacell, everyone should use anacell", ] Output: ["anacell", "betacellular"] Explanation:"anacell" is occuring in 2 different reviews and "betacellular" is only occuring in 1 review.
Example 2:
Input: k = 2keywords = ["anacell", "betacellular", "cetracular", "deltacellular", "eurocell"] reviews = [ "I love anacell Best services; Best services provided by anacell", "betacellular has great services", "deltacellular provides much better services than betacellular", "cetracular is worse than anacell", "Betacellular is better than deltacellular.", ] Output: ["betacellular", "anacell"] Explanation:"betacellular" is occuring in 3 different reviews. "anacell" and "deltacellular" are occuring in 2 reviews, but "anacell" is lexicographically smaller.
Related problems:
k = 2 keywords = ["anacell", "betacellular", "cetracular", "deltacellular", "eurocell"] reviews = [ "I love anacell Best services; Best services provided by anacell", "betacellular has great services", "deltacellular provides much better services than betacellular", "cetracular is worse than anacell", "Betacellular is better than deltacellular.", ] import re from collections import Counter import heapq import collections class Element: def __init__(self, word, freq): self.word = word self.freq = freq def __lt__(self, other): if self.freq == other.freq: return self.word > other.word return self.freq < other.freq def topKFrequent(k, keywords, reviews): ''' k: int keywwords: list of string reviews: list of string ''' word_list = [] for review in reviews: word_list += set(review.lower().replace('[^a-z0-9]', '').split()) table = collections.defaultdict(int) for word in word_list: table[word] += 1 heap = [] for word, freq in table.items(): if word in keywords: heapq.heappush(heap, Element(word, freq)) if len(heap) > k: heapq.heappop(heap) return [heapq.heappop(heap).word for i in range(k)][::-1] print(topKFrequent(k, keywords, reviews))