Input size        1024    10240   102400  1024000  10240000
add numpy        0.009    0.043    1.217   12.059   158.210
add corepy 1     0.050    0.100    1.312   11.653   171.419
add corepy 2     0.080    0.112    1.070    9.352   109.685
add corepy 4     0.179    0.206    1.053    9.355    98.217
REDUCE
add numpy        0.009    0.042    0.377    4.154    41.192
add corepy 1     0.049    0.057    0.212    3.901    37.618
add corepy 2     0.068    0.071    0.178    3.084    23.808
add corepy 4     0.155    0.156    0.255    2.950    22.188
Input size        1024    10240   102400  1024000  10240000
mul numpy        0.008    0.047    1.078   12.415   162.024
mul corepy 1     0.057    0.122    1.040    8.687   173.505
mul corepy 2     0.066    0.098    0.424    7.309   126.512
mul corepy 4     0.124    0.173    0.625    8.103   115.533
REDUCE
mul numpy        0.013    0.058    0.528    5.486    54.841
mul corepy 1     0.041    0.056    0.152    1.078    10.369
mul corepy 2     0.068    0.085    0.129    0.751     6.586
mul corepy 4     0.158    0.156    0.221    0.849     5.797
Times are in milliseconds, averaged over 10 runs. For the CorePy cases, I'm running with 1/2/4 cores (basic parallelization over cores was pretty easy!). CorePy's overhead is visible, as mentioned in my previous post. There are some unexpected things in these numbers too -- for example, multiplication is faster than addition for CorePy but not NumPy. I don't know why; multiply instructions take longer to execute, and I am otherwise doing the same thing in both cases.
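For reference, the shape of a harness like this boils down to something simple with timeit (this is a sketch matching the table layout above, not my actual benchmark script; the `bench` helper is mine):

```python
import timeit
import numpy as np

def bench(fn, number=10):
    """Average wall time of fn over `number` runs, in milliseconds."""
    return timeit.timeit(fn, number=number) / number * 1e3

for n in (1024, 10240, 102400):
    x = np.arange(n, dtype=np.int64)
    out = np.empty_like(x)
    t_add = bench(lambda: np.add(x, x, out))   # elementwise ufunc
    t_red = bench(lambda: np.add.reduce(x))    # reduce form
    print("%8d  add %.3f ms   reduce %.3f ms" % (n, t_add, t_red))
```

Preallocating `out` keeps the elementwise case from timing array allocation along with the ufunc itself.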
While testing ufuncs with various inputs (and input sizes), I came across some differences in behavior when values reach or exceed 2**63 (I'm using dtype=int64). This sent me off trying to understand what NumPy's behavior is. What I expected was for integers to wrap at 2**63 (signed) or 2**64 (unsigned), but what I see is this:
>>> x = numpy.array((2**62,) * 1)
>>> hex(numpy.add.reduce(x))
'0x4000000000000000L'
>>> x = numpy.array((2**62,) * 2)
>>> hex(numpy.add.reduce(x))
'-0x8000000000000000L'
Huh? Last time I checked, 4 + 4 = 8, not -8... am I missing something? Will have to check with the NumPy people to see what's going on.
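For comparison, here is what straight two's-complement wraparound would predict for that sum, sketched in pure Python (the `wrap_int64` helper is my own, not a NumPy function):

```python
def wrap_int64(x):
    """Reduce an arbitrary Python int to the signed 64-bit value it
    would become under two's-complement wraparound."""
    return (x + 2**63) % 2**64 - 2**63

# 2**62 + 2**62 = 2**63, which is one past int64's maximum of
# 2**63 - 1, so it wraps around to the minimum value, -2**63.
print(hex(wrap_int64(2**62 + 2**62)))  # -0x8000000000000000
```

That matches the `-0x8000000000000000L` the reduce returned, so the output is at least consistent with ordinary int64 overflow.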