| Input size | 1024 | 10240 | 102400 | 1024000 | 10240000 |
|---|---:|---:|---:|---:|---:|
| add numpy | 0.009 | 0.043 | 1.217 | 12.059 | 158.210 |
| add corepy 1 | 0.050 | 0.100 | 1.312 | 11.653 | 171.419 |
| add corepy 2 | 0.080 | 0.112 | 1.070 | 9.352 | 109.685 |
| add corepy 4 | 0.179 | 0.206 | 1.053 | 9.355 | 98.217 |

REDUCE

| Input size | 1024 | 10240 | 102400 | 1024000 | 10240000 |
|---|---:|---:|---:|---:|---:|
| add numpy | 0.009 | 0.042 | 0.377 | 4.154 | 41.192 |
| add corepy 1 | 0.049 | 0.057 | 0.212 | 3.901 | 37.618 |
| add corepy 2 | 0.068 | 0.071 | 0.178 | 3.084 | 23.808 |
| add corepy 4 | 0.155 | 0.156 | 0.255 | 2.950 | 22.188 |

| Input size | 1024 | 10240 | 102400 | 1024000 | 10240000 |
|---|---:|---:|---:|---:|---:|
| mul numpy | 0.008 | 0.047 | 1.078 | 12.415 | 162.024 |
| mul corepy 1 | 0.057 | 0.122 | 1.040 | 8.687 | 173.505 |
| mul corepy 2 | 0.066 | 0.098 | 0.424 | 7.309 | 126.512 |
| mul corepy 4 | 0.124 | 0.173 | 0.625 | 8.103 | 115.533 |

REDUCE

| Input size | 1024 | 10240 | 102400 | 1024000 | 10240000 |
|---|---:|---:|---:|---:|---:|
| mul numpy | 0.013 | 0.058 | 0.528 | 5.486 | 54.841 |
| mul corepy 1 | 0.041 | 0.056 | 0.152 | 1.078 | 10.369 |
| mul corepy 2 | 0.068 | 0.085 | 0.129 | 0.751 | 6.586 |
| mul corepy 4 | 0.158 | 0.156 | 0.221 | 0.849 | 5.797 |

Times are in milliseconds, averaged over 10 runs. For the CorePy cases I'm running with 1/2/4 cores (basic parallelization over cores was pretty easy!). CorePy's overhead is visible, as mentioned in my previous post. There are some unexpected things in these numbers too -- for example, multiplication is faster than addition for CorePy but not for NumPy. I don't know why; multiply instructions take longer to execute, and I am otherwise doing the same thing in both cases.
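For reference, the averaging above can be reproduced with a small stdlib-only timing harness like the one below. The `avg_ms` helper and the pure-Python list comprehension it times are illustrative stand-ins, not the actual NumPy/CorePy benchmark code:

```python
import time

def avg_ms(fn, runs=10):
    """Average wall-clock time of fn() over `runs` runs, in milliseconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += (time.perf_counter() - start) * 1000.0
    return total / runs

# Example: time a pure-Python elementwise add as a stand-in for the ufunc
data = list(range(10240))
print("%.3f ms" % avg_ms(lambda: [a + a for a in data]))
```

In the real benchmark, `fn` would be a call like `numpy.add(x, x)` or the CorePy equivalent.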

While testing ufuncs with various inputs (and input sizes) I came across some differences in behavior when values reach or exceed 2**63 or 2**64 (I'm using dtype=int64). This sent me off trying to understand what NumPy's behavior is. What I expected was for integers to wrap at 2**63 (signed) or 2**64 (unsigned), but what I see is this:

```python
>>> x = numpy.array((2**62,) * 1)
>>> hex(numpy.add.reduce(x))
'0x4000000000000000L'
>>> x = numpy.array((2**62,) * 2)
>>> hex(numpy.add.reduce(x))
'-0x8000000000000000L'
```

Huh? Last time I checked, 4 + 4 = 8, not -8... am I missing something? I'll have to check with the NumPy people to see what's going on.
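For what it's worth, that result looks consistent with signed two's-complement wraparound: 2**62 + 2**62 = 2**63, which doesn't fit in a signed 64-bit integer and wraps to -2**63 (Python 2's `hex` then prints the sign explicitly). A pure-Python sketch of that reinterpretation, with a hypothetical `wrap_int64` helper (no NumPy required):

```python
def wrap_int64(x):
    """Reduce x modulo 2**64, then reinterpret as signed two's complement."""
    x &= (1 << 64) - 1          # keep the low 64 bits
    if x >= (1 << 63):          # high bit set -> negative in two's complement
        x -= 1 << 64
    return x

print(wrap_int64(2**62 + 2**62))  # -9223372036854775808, i.e. -2**63
```

Whether NumPy actually guarantees this wraparound for int64 ufuncs (rather than leaving overflow unspecified) is exactly the question to put to the NumPy people.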
