python - Why is using a dictionary slower than sorting a list to generate frequency array?

admin 管理员组

文章数量: 1087131

I was doing this question on Codeforces (2057B - Gorilla and the Exam) where I had to use the frequencies of different numbers in a list. First I tried calculating it with a dictionary but I got TLE verdict (Time limit exceeded). After reading the editorial I implemented it by sorting the input list and then using comparisons with the previous element to generate the frequency array. This was faster than the approach using the dictionary but sorting is an O(nlogn) operation which should take more time than creating a dictionary which only requires O(n) time.

Dictionary approach (slower)

t= int(input())
for _ in range(t):
    n, k = map(int, input().split())
    arr = map(int, input().split())
    frequence_table = {}
    for num in arr:
        try:
            frequence_table[num] += 1
        except:
            frequence_table[num] = 1
    freq = list(frequence_table.values())
    freqs = sorted(freq, reverse=True)
    while k>0:
        if k>=freqs[-1]:
            k -= freqs.pop()
        else:
            break
    print(max(len(freqs), 1))

Sorting based approach (faster)

t= int(input())
for _ in range(t):
    n, k = map(int, input().split())
    arr = map(int, input().split())
    arr = sorted(arr)
    freq = [1]
    for i in range(1, len(arr)):
        if arr[i] == arr[i-1]:
            freq[-1] += 1
        else:
            freq.append(1)
    freqs = sorted(freq, reverse=True)
    while k>0:
        if k>=freqs[-1]:
            k -= freqs.pop()
        else:
            break
    print(max(len(freqs), 1))

Dictionary approach (slower)

t= int(input())
for _ in range(t):
    n, k = map(int, input().split())
    arr = map(int, input().split())
    frequence_table = {}
    for num in arr:
        try:
            frequence_table[num] += 1
        except:
            frequence_table[num] = 1
    freq = list(frequence_table.values())
    freqs = sorted(freq, reverse=True)
    while k>0:
        if k>=freqs[-1]:
            k -= freqs.pop()
        else:
            break
    print(max(len(freqs), 1))

Sorting based approach (faster)

t= int(input())
for _ in range(t):
    n, k = map(int, input().split())
    arr = map(int, input().split())
    arr = sorted(arr)
    freq = [1]
    for i in range(1, len(arr)):
        if arr[i] == arr[i-1]:
            freq[-1] += 1
        else:
            freq.append(1)
    freqs = sorted(freq, reverse=True)
    while k>0:
        if k>=freqs[-1]:
            k -= freqs.pop()
        else:
            break
    print(max(len(freqs), 1))

Share Improve this question edited Mar 27 at 9:58 Kelly Bundy 27.7k7 gold badges34 silver badges73 bronze badges asked Mar 27 at 9:33 User 234 bronze badges

1 I dont think its slower, you are converting dictionary to list also. freq = list(frequence_table.values()) which also takes time. Do the benchmarking with timeit module. – Albin Paul Commented Mar 27 at 9:38

Add a comment |

2 Answers 2

Sorted by: Reset to default 1

Why is using a dictionary slower than sorting a list to generate frequency array?

Apparently because one input is specially crafted to hack Python's dict implementation, so that building your dict doesn't take the expected linear time. It can take up to quadratic time, and likely they did that. See Python's TimeComplexity page (showing O(n) worst case time for each dict access) and Tutorial: How to hack hash table in Python 3 on CodeForces itself.

One way to defeat it is to simply add 1 to all the numbers, i.e., change

    for num in arr:

to this:

    for num in arr:
        num += 1

Then the solution doesn't get stopped and rejected at the 1 second time limit but gets accepted with around 0.18 seconds. Despite doing more work, and without really changing the data like randomizing or sorting would. The effect is simply that Python's collision resolution takes different paths then, which is enough to thwart the brittle attack.

Another way, also accepted in around 0.18 seconds, is to simply not convert to ints, instead counting the strings. I.e., change

    arr = map(int, input().split())

to this:

    arr = input().split()

That's also safer, as Python deliberately and unpredictably randomizes string hashes exactly to defeat such attacks:

the __hash__() values of str and bytes objects are “salted” with an unpredictable random value. [...] This is intended to provide protection against a denial-of-service caused by carefully chosen inputs that exploit the worst case performance of a dict insertion, O(n²) complexity.

Modifying the frequency list (pop) is unnecessary and may be impacting on runtime performance.

You can optimise your code (to a certain extent) by:

from collections import Counter

t = int(input())

for _ in range(t):
    _, k = map(int, input().split())
    arr = map(int, input().split())
    freqs = sorted(Counter(arr).values())
    for i, f in enumerate(freqs):
        if k <= 0:
            break
        k -= f
    print(max(len(freqs)-i, 1))

本文标签： pythonWhy is using a dictionary slower than sorting a list to generate frequency arrayStack Overflow

版权声明：本文标题：python - Why is using a dictionary slower than sorting a list to generate frequency array? - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://roclinux.cn/p/1744099861a2533449.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

Linux大棚 – 不忘初心的技术博客，浮躁时代的安静角落

python - Why is using a dictionary slower than sorting a list to generate frequency array? - Stack Overflow

2 Answers 2

更多相关文章