Diving into NumPy

name: inverse
layout: true
class: center, middle, inverse
---
background-image:url(pictures/diving_baby.jpg)

# Diving into NumPy

### .white[[3 hours of breaking and fixing NumPy]]

.footnote[.up20[.white[By David Cournapeau & Stéfan van der Walt]]]

---

class: center, bottom, inverse
background-image:url(pictures/tinguely1.jpg)

# Learn how NumPy works internally ...

---
background-image:url(pictures/Tinguely.jpg)
class:right,bottom

.up30[
# ... so that you can improve it ?
]

---
# What will you learn ?
---

## Find your way into the codebase

---

## work efficiently on numpy

---

## Implement array method, ufunc, dtype

---

## and some tools on the way

---

# Beware

---

## most features best implemented outside numpy

---

# When is modifying NumPy appropriate ?

---

## to fix a numpy bug

---

## to improve some existing NumPy features

---

## to follow this tutorial
---
layout:false

.left-column[
## Setting up vagrant
]
.right-column[
Strongly advised to do the exercises:

- install virtualbox (included in usb)
- install vagrant 1.2.1 (included in usb)

Creating the VM:

```bash
mkdir numpy_tuto 
cd numpy_tuto
cp /usbdisk/Vagrantfile .
cp /usbdisk/numpy_tuto.box .
vagrant box add numpy_tuto numpy_tuto.box
vagrant up
# You can ssh into it as followed
vagrant ssh -- -A
```
]
---
---
template: inverse

# 1. Concept overview

### Data-types, strides, broadcasting, indexing

---
layout: false

.left-column[
  ## Data types and types

**Scalar types**: ``np.float64``, ``np.int32``, etc.

**Array scalars** (e.g. ``x[0]``) are somewhat special: each array scalar has
its own type (``np.float32``, ``np.uint32``, etc.) -- but also has an
attached dtype.  Acts as a bridge between arrays and Python scalars.

]
.right-column[
.up30[
![Scalar hierarchy](pictures/sctype-hierarchy.png)

- From this, we can build a dtype:
  ```
  # Build a 32-bit, Big Endian, floating point dtype
  d = np.dtype(np.float32).newbyteorder('>')
  ```
- Dtypes describe memory layout of arrays; i.e, all arrays are of type
  ``ndarray``, with attached data-type.
]]

---

.left-column[
 ## Strides
 <img src="pictures/array_memory_dtype.png" width="430%"/>
]

---
.left-column[
 ## Strides - transpose
 <img src="pictures/array_memory_strides_transpose.png" width="500%"/>
]

---

.left-column[
  ## Broadcasting
]

.right-column[

Combine arrays of different shapes sensibly:

```
>>> x = np.zeros((3, 5))
>>> y = np.zeros(8)
>>> (x[..., np.newaxis] + y).shape
(3, 5, 8)
```

![broadcasting two vectors to form a higher-dimensional array](pictures/array_3x5x8.png)

]

---

.left-column[
  ## Broadcasting compatibility
]

.right-column[

When combining two arrays of different shapes, shapes are matched from right to
left.  Match when:

- Dimensions are equal.
- One dimension is either None or 1.

```
   (5, 10)      (5, 10)    (5, 10, 1)
(3, 5, 10)      (6, 10)       (10, 5)
----------      -------    ----------
(3, 5, 10) OK   BAD        (5, 10, 5) OK
```
]

---

.left-column[
  ## Indexing with broadcasting
]

.right-column[

```
>>> x = np.array([[1, 2], [3, 4]])
array([[1, 2],
       [3, 4]])

>>> ix0 = np.array([0, 0, 1, 1])
>>> ix1 = np.array([[1], [0]])
array([[1],
       [0]])

>>> x[ix0, ix1]
array([[2, 2, 4, 4],
       [1, 1, 3, 3]])
```
```python
>>> np.broadcast_arrays(ix0, ix1)
[array([[0, 0, 1, 1],
       [0, 0, 1, 1]]),
 array([[1, 1, 1, 1],
       [0, 0, 0, 0]])]

```

.tip[.red[TIP] Best to avoid ``:`` and ``...`` in broadcasting--output shape is
sometimes hard to predict.]

]
---
template: inverse

# 2. Building NumPy

---
layout:false

.left-column[
  ## Building with distutils
]

.right-column[

Simple in-place build with default compiler

```bash
$ python setup.py build_ext -i
```

Running the test suite on numpy.core

```bash
$ nosetests numpy/core
```
]

---
template:inverse

# That was not too hard !

---
template:inverse
# Building with Bento

---

layout:false
.left-column[
 ## Building with Bento
 ### Installation and configuration
]

.right-column[

Setting up Bento requires a few steps:

(It is recommended to do this in a virtualenv)

```bash
git clone https://github.com/cournape/Bento \
	~/src/bento-git

cd ~/src/bento-git
python setup.py install --user

git clone https://code.google.com/p/waf ~/src/waf-git
# Tells bento where to look for waf
# (waf has no setup.py)
export WAFDIR=~/src/waf-git

# In NumPy source tree, do an in-place build
bentomaker build -i
```

Set up the Python path (only done once):

```bash
export PYTHONPATH=$PYTHONPATH:~/src/numpy-git
```

]

---

.left-column[
  ## Building with Bento
  ### Features for development
]

.right-column[

Bento is nifty for NumPy development.

- Parallel builds
```bash
bentomaker build -i -j4
```

- Reliable partial rebuilds:
```bash
bentomaker build -i -j4
# Hack to bypass autoconfigure
bentomaker --disable-autoconfigure build -i -j4
```

- Easy compilation customization:
```bash
CC=clang CFLAGS="-O0 -g -Wall -Wextra" \
	bentomaker build -i -j4
```
]

---
template:inverse
# Practice
---

layout:false
.left-column[
 ## A first exercise
]

.right-column[

After setting up Bento, build NumPy with warning on

```
CFLAGS="-O0 -g -Wall -Wextra -W" bentomaker build -i
```

Lots of warnings of the type:

```
../numpy/linalg/lapack_litemodule.c:863:58: warning: \
	unused parameter 'args' [-Wunused-parameter]
lapack_lite_xerbla(PyObject *NPY_UNUSED(self), \
	PyObject *args)
1 warning generated.
```

NumPy has a special macro to decorate unused argument and give an error if they are used

```
void foo(PyObject *NPY_UNUSED(self), ...)
```

Try fixing one .c file so that there is no warning anymore
]
---
template:inverse

background-image:url(pictures/KyotoFushimiInariLarge.jpg)

# 3 Finding your way

---
template:inverse

background-image:url(pictures/pollock.jpg)

# 3.1 Code organization

---

layout:false

.left-column[
  ## Main sub-packages
]

.right-column[
.red[numpy/core]: the meat of NumPy (focus of the tutorial):

- code for multiarray (src/multiarray), ufunc extensions (src/umath)

- various support libraries (npymath, npysort)

- public headers in include

Other important parts:

- .red[numpy/lib]: various tools on top of core

- .red[numpy/fft], .red[numpy/linalg], .red[numpy/random]

Other parts not on topic for this tutorial
]

---
layout:false

.left-column[
  ##npymath
]

.right-column[
.red[npymath] is a small C99 abstraction for cross platform math operations

- static library linked into the extensions that need it

- implement fundamental IEEE 754-related features (npy_isnan/npy_isinf/etc...)

- half float implementation

- C99 layer for functions, macros and constant definitions

npy_* functions should be used throughout numpy C code (e.g. npy_exp, not exp)

```
#include <numpy/npy_math.h>

void foo()
{
	double x = npy_exp(2.0);
}
```

API "documented" in numpy/npy_math.h header
]

---
layout:false

.left-column[
  ## Multiarray extension bird's eye view
]

.right-column[Contain the implementation of the array, dtype and iterators
objects:

- .red[PyArrayObject] struct (numpy/ndarraytypes.h): array object (hidden in recent versions)
- .red[PyArray_Descr] struct (ditto): dtype object
- .red[PyArrayMultiIterObject] struct (ditto): iterator object (used in broadcasting)

```
/* in numpy/ndarraytypes.h  */
struct PyArrayObject {
  char *data; /* Pointer to the raw data buffer */
  int nd; /* number of dimensions */
  npy_intp *dimensions; /* The size in each dimension */
  npy_intp *strides; /* strides array */
  PyObject *base;
  PyArray_Descr *descr; /* Pointer to type structure */
  int flags;
  PyObject *weakreflist;
}
```

One numpy array -> one PyArrayObject instance

]

---
layout:false

.left-column[
  ## dtype implementation details
]

.right-column[
PyArray_Descr contains the instance-spefic data of a dtype

```c
/* in numpy/ndarraytypes.h  */
struct PyArray_Descr {
  ...
  char kind;
  ...
  /* used for structured array */
  struct _arr_descr *subarray;
  PyObject *fields;
  PyObject *names;
  /* a table of functions specific for each */
  PyArray_ArrFuncs *f;
  /* Metadata about this dtype */
  PyObject *metadata;
  NpyAuxData *c_metadata;
}
```

One dtype object -> one PyArray_Descr instance.

```c
/* each dtype has its own set of function pointers */
PyArray_ArrFuncs {
  ...
  PyArray_CopySwapNFunc *copyswapn;
  PyArray_CopySwapFunc *copyswap;
  ...
}
```

]

---

.left-column[
 ## PyArray_Type: your main ticket to follow code flow
]

.right-column[
.up30[
PyArrayType is an extension type (singleton) which defines the array behavior

```c
// in src/multiarray/arrayobject.c
// code simplified for presentation
PyTypeObject PyArray_Type = {
   ...
   array_repr, /* __repr__ */
   &array_as_number, /* number protocol */
   &array_as_sequence, /* sequence protocol */
   &array_as_mapping, /* mapping protocol */
   ...
   array_str, /* __str__ */
   &array_as_buffer, /* buffer protocol */
   ...
   array_iter, /* iter protocol */
   ...
   array_methods, /* array methods */
   ...
}
```

Critical to understand code flow of an operation, e.g.:

```
a = np.random.randn(100)
# How to find below op entry point in the C code ?
b = a + a
```

Addition is part of the number protocol -> look into array_as_number array.
]
]

---
layout:false

.left-column[
  ## Example
]

.right-column[
First, let`s compile numpy in debug mode:

```
# Create a virtualenv with debug python
virtualenv debug-env -p python2.7-dbg
source debug-env/bin/activate
# install bentomaker in that venv
cd $bento_sources && python setup.py install
# Build numpy in debug mode
CFLAGS="-O0 -g" LDFLAGS="-g" bentomaker build -i
```

We want to look into the snippet below to 'catch' the main entry point for '+'

```
# test_add.py
import numpy as np
a = np.array([1, 2, 3, 4])
b = a + a
```

In gdb

```
gdb python
> break array_add
> run test_add.py
> p ((int*)PyArray_DATA(m1))[0]
```

]

---
layout:false

.left-column[
.down50[
.down50[
.down50[
.rot90[
  ## PyArrayDescr_Type
]]]]]

.right-column[
PyArrayDescr_Type is an extension type (singleton) which defines the *dtype* class

```
/* in src/multiarray/descriptor.c */
/* code simplified for presentation */
PyTypeObject PyArrayDescr_Type = {
    ...
    "numpy.dtype",
    ...
    /* sequence protocol 
       used e.g. in structured dtype */
    &descr_as_sequence,
    /* mapping protocol 
       used e.g. in structured dtype */
    &descr_as_mapping,
    ...
    arraydescr_methods, /* methods */
    arraydescr_members, /* members */
    arraydescr_getsets, /* getset (properties) */
}
```

Less critical to understand Python <-> C layer.
]

<!--
---
template:inverse
background-image:url(pictures/confusing_2.jpg)

## What if you can`t find your way ?

---

.left-column[
  ## Poor man's callgraphs
]

.right-column[
Sometimes, you have no clue where to start (or just lazy):

- you can use dtrace to get calltrace at runtime (dapptrace in dtrace toolkit)

- poor man's replacement with Linux's perf tool:

```
# included in the USB key, also avail at 
# http://bit.ly/10OI2UD
import numpy as np
from minilib import under_perf

a = np.random.randn(10)

while True
    with under_perf():
        a + a
```

```bash
$ python test.py
$ Ctrl+c
# you should now get a perf.data file
```
]

---

.left-column[
  ## Poor man's callgraphs (Cont.)
]

.right-column[
To look into the data:

```
$ perf -G -g --stdio
```

This will produce something like:

```
# condensed version
3.96%   python  libm-2.17.so
         |
         --- __libc_start_main
             main
    	 ...
             PyNumber_Add
             binary_op1
             array_add
             PyArray_GenericBinaryFunction
             _PyObject_CallFunction_SizeT
             call_function_tail
             PyObject_Call
             ufunc_generic_call
             PyUFunc_GenericFunction
	     ...
```

`
Caveat: sampling based (hence while True hack for short-lived functions).
Better support for dynamic probes in user space on the way.
`
]

---

.left-column[
  ## Practice
]

.right-column[
Try to fix https://github.com/numpy/numpy/issues/2592

> numpy.fromiter with a count argument hides any exception that might be raised
> by the iterator as: ValueError: iterator too short

```
# test_fromiter_bug.py
import numpy as np
def load_data(n):
    for e in xrange(n):
        if e == 42:
            raise Exception('42 is really a bad value')
        yield e

a = np.fromiter(load_data(50), dtype=int, count=50)
```

Questions:

- find out where C is called
- can you understand the bug ?
- fix it !
]

---
template: inverse

# 4. Universal Functions

---

.left-column[
  ## API
]

.right-column[
.up30[
``numpy/core/include/numpy/ufuncobject.h``

- Vectorized function that takes a fixed number of scalar inputs and produces a
fixed number of scalar outputs.

- Supports array broadcasting, type casting, and other standard features

Pure Python ufunc:

```python
def range_sum(a, b):
    return np.arange(a, b).sum()

rs = np.frompyfunc(range_sum, 2, 1)

x = np.array([[1, 2, 3, 4]])

>>> rs(x, x + 1)
array([[1, 2, 3, 4]], dtype=object)

>>> rs(x, x.T)
array([[0, 0, 0, 0],
       [1, 0, 0, 0],
       [3, 2, 0, 0],
       [6, 5, 3, 0]], dtype=object)
```

- Note that broadcasting is supported
- Unfortunately, the output is always an object array.  Slow, because
  we're wrapping a Python call.
]
]

---

.left-column[
  ## API: Calling
]

.right-column[

Keywords:

- ``out`` : Output array, useful for in-place computation.
- ``where`` : Ufunc only calculated where ``broadcast(mask, inputs) == True``.
  Careful! Undefined where no elements are encountered.
- ``casting`` : Casting behavior (more later).
- ``order`` : Calculation iteration order and memory layout of output array.
- ``dtype`` : Output *and calculation* dtype.  Often important for
  accumulators.
- ``sig`` : Data-type or signature string; indicates which underlying 1-D loop
  is executed (typically the loops are found automatically).  See ``types``
  attribute.  Example: ``ff->f``.
- ``extobj`` : Specify ufunc buffer size, error mode integer, error call-back
  function.

]

---

.left-column[
  ## Ufunc output type
]

.right-column[

- Determined by input class with highest `__array_priority__`.

```
class StrongArray(np.ndarray):
    __array_priority__ = 10

class WeakArray(np.ndarray):
    __array_priority__ = 1

>>> s = StrongArray([2, 2]); w = WeakArray([2, 2])

>>> type(s + w).__name__
StrongArray
```
]

---

.left-column[
  ## Ufunc output type (continued)
]

.right-column[
.up20[
Otherwise, by ``output`` parameter.  Output class may have following methods:

- ``__array_prepare__`` :
  Called before ufunc. Knows some context about the ufunc, and may be used to
  add, e.g., meta-data.  Output then passed to ufunc.

- ``__array_wrap__`` : Called after execution of ufunc.

```python
In [159]: class MyArray(np.ndarray):
     ...:     def __array_prepare__(self,
                     array, (ufunc, inputs, domain)):
     ...:         print 'Array:', array
     ...:         print 'Ufunc:', ufunc
     ...:         print 'Inputs:', inputs
     ...:         print 'Domain:', domain
     ...:         return array
     ...:
     ...: m = MyArray((1, 2))

In [160]: np.add([1, 2], [3, 4], out=m)
Array: [[ 6.93023165e-310 1.33936849e-316]]
Ufunc: <ufunc 'add'>
Inputs: ([1, 2], [3, 4], MyArray([[ 6.93023165e-310,
 1.33936849e-316]]))
Domain: 0

Out[160]: MyArray([[ 4.,  6.]])
```

<!-- Internally, buffers are used for misaligned data, swapped data, and data
that has to be converted from one data type to another. The size of internal
buffers is settable on a per-thread basis. There can be up to buffers of the
specified size created to handle the data from all the inputs and outputs of a
ufunc. The default size of a buffer is 10,000 elements. Whenever buffer-based
calculation would be needed, but all input arrays are smaller than the buffer
size, those misbehaved or incorrectly-typed arrays will be copied before the
calculation proceeds. Adjusting the size of the buffer may therefore alter the
speed at which ufunc calculations of various sorts are completed. A simple
interface for setting this variable is accessible using the function

setbufsize(size)        Set the size of the buffer used in ufuncs.
-->
]]

---

.left-column[
  ## Type handling
]

.right-column[

``types`` attribute of ufunc, e.g. ``np.add.types``.

Common types:

```
 * byte -> b
 * short -> h
 * intc -> i
 * double -> d
 * single -> f
 * longfloat -> g
 * complex double -> D
```

See also ``np.sctypeDict``.

If no suitable ufunc loop exists, try to find one to which can be cast safely
(``np.can_cast``).  Can specify casting policy to ufunc via ``casting`` keyword
(``no``, ``equiv``, ``same_kind``, or ``unsafe``--default is the safe 
``same_kind``).  Linear search through available functions--use first match.
]

---

.left-column[
  ## Other ufunc ops
]

.right-column[
.up30[

- ``ufunc.reduce(a[, axis, dtype, out, keepdims])`` 
 Reduces a‘s dimension by one, by applying ufunc along one axis.

- ``ufunc.accumulate(array[, axis, dtype, out])`` 
 Accumulate the result of applying the operator to all elements.

- ``ufunc.reduceat(a, indices[, axis, dtype, out])`` 
 Performs a (local) reduce with specified slices over a single axis.

```python
  >>> x = np.arange(12).reshape((3, 4))
  array([[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]])

>>> np.add.reduceat(x, [0, 1, 3], axis=1)
  array([[ 0,  3,  3],
         [ 4, 11,  7],
         [ 8, 19, 11]])
  ```

- ``ufunc.outer(A, B)`` 
 Apply the ufunc op to all pairs (a, b) with a in A and b in B.

]]

---

.left-column[
  ## Implementing a ufunc - the kernel
]

.right-column[
.up20[
```
static void square_d(char** args,
                     npy_intp* dimensions,
                     npy_intp* steps,
                     void *data)
{
    npy_intp n = dimensions[0]; /* input length */

char *input = args[0];
    char *output = args[1];

npy_intp in_step = steps[0]; /* input stride */
    npy_intp out_step = steps[1]; /* output stride */

double tmp;
    npy_intp i;

for (i = 0; i < n; i++) {
 tmp = *( (double *)input );
 tmp = tmp * tmp;

*((double *)output) = tmp;

input += in_step;
        output += out_step;
    }
}
```

*Credit:* Chris Jordan-Squire, who wrote the numpy user guide entry.
]]

---

.left-column[
  ## Implementing a ufunc - slotting into NumPy
]

.right-column[
``PyUFunc_FromFuncAndData`` - return a universal function based on a
kernel + meta-data
```
// Set up the numpy structures
import_array();
import_umath();

PyUFuncGenericFunction funcs[1] = {&square_d};
void *data[1] = {NULL};
char types[2] = {NPY_DOUBLE, NPY_DOUBLE};

// The above structures are not copied
//    -- must remain visible --

PyObject* ufunc =
  PyUFunc_FromFuncAndData(funcs, data, types,
                          1, /* nr of data types */
                          1, 1, /* nr in & out */
                          PyUFunc_None, /* identity */
                          "square", /* function name */
                          "docstring", /* docstring */
                          0); /* unused */
```
Use the resulting ``ufunc`` inside your module.  See the
[full source](https://github.com/enthought/davidc-scipy-2013/blob/master/examples/ufuncs/my_ufunc.c).
]

---
.left-column[
  ## Implementing a ufunc - provided inner loops
]

.right-column[
.up20[.up30[
```
double kernel(double d) {
    ...;
}

static void *data[] = {&square_d};
static char types[] = {NPY_DOUBLE, NPY_DOUBLE};
static PyUFuncGenericFunction funcs[sizeof(data)];

PyMODINIT_FUNC initmy_ufunc_noloop(void) {
...
    funcs[0] = &PyUFunc_d_d;
    ufunc = PyUFunc_FromFuncAndData(
                    funcs, data, types,
                    1,    /* nr of data types */
                    1, 1, /* nr in & out */
                    PyUFunc_None, /* identity */
                    "square", /* function name */
                    "docstring", /* docstring */
                    0); /* unused */
...
}
```
Examples ([full source](https://github.com/enthought/davidc-scipy-2013/blob/master/examples/ufuncs/my_ufunc_noloop.c)):
```
PyUfunc_f_f(args, dimensions, steps, func)
  float elementwise_func(float in1)
PyUfunc_dd_d
  double elementwise_func(double in1, double in2)

PyUFunc_ff_f_As_dd_d
  double elementwise_func(double in1, double in2)
```
]]]

---

.left-column[
  ## Implementing a ufunc - multiple types
]

.right-column[.down30[.down30[
You probably need a templating engine.

You probably don't need NumPy's templating engine.

Here's [an example using Tempita](https://github.com/enthought/davidc-scipy-2013/blob/master/examples/ufuncs/my_ufunc_types.c.tmpl)
(as well as the
[setup.py](https://github.com/enthought/davidc-scipy-2013/blob/master/examples/ufuncs/setup.py)).

```python
{{
# *NOTE*: Order of this list is important.  First match
#         (even with casting) will be used.
ctypes = ('int8_t', 'int32_t', 'int64_t',
          'float', 'double')
dtypes = ('NPY_INT8', 'NPY_INT32', 'NPY_INT64',
          'NPY_FLOAT', 'NPY_DOUBLE')
}}

{{for type in ctypes}}
static void square_{{type}}(char** args,
             npy_intp* dimensions, npy_intp* steps,
             void *NPY_UNUSED(data))
...
```
]]]

---

.left-column[
  ## Hands-on
]

.right-column[

1. Construct a custom ufunc ``poly(x, a)`` to evaluate the polynomial
  ``(x**2 + 3) - a*x + 5``.
2. Do a timing comparison with a NumPy implementation using IPython's
  ``%timeit``.

]
---
template:inverse

# 5. Writing a custom dtype

---
.left-column[
## Synopsis
]
.right-column[
We will create the skeleton for quad precision dtype:

- IEEE-754 2008 float128 standard

- available in recent gcc with __float128

Creating a new dtype involves:

- wrapping the basic type as a python object (no numpy support), implement number protocol

- creating a new dtype on top of it

- add some basic ufunc for it
]

---

.left-column[
## A few words on quadruple precision
]

.right-column[
Specified in the recent IEEE-754 standard:

- few CPU support it natively

- 113 effective bits of precision (~36 decimal digits)

- gcc >= 4.6 + libquadmath add software support for it for Intel architectures

Example

```c
#include <quadmath.h>

int main()
{
	__float128 a = 1.0;
	char str[1024];
	int st;

a *= 2.3;

st = quadmath_snprintf(str, sizeof(str), "%Qf", q);
	if (st < 0) {
		printf("Error while converting to string\n");
	}
}
```
]
---
template:inverse
## 5.1 Wrapping a float128 into a python object

---
.left-column[
## A simple quad object
]
.right-column[
The smallest thing we need:

- create a qdouble object type
- allow to create one (from a string, a double, etc...)
- allow to print it

```
typedef __float128 qdouble;
/* boxed qdouble */
typedef struct {
        PyObject_HEAD
        qdouble obval;
} PyQuad;

static PyTypeObject PyQuad_Type = {
...
pyquad_repr,
...
pyquad_new,
}

PyMODINIT_FUNC init_quad(void) {
 ...
 if (PyType_Ready(&PyQuad_Type) < 0) {
 return;
 }
 Py_INCREF(&PyQuad_Type);
 PyModule_AddObject(m, "qdouble",
		(PyObject*)&PyQuad_Type);
}
```

]

---
.left-column[
## Exercise
]

.right-column[
See usb/material/examples/quad_dtype/step1:

- compile the extension module

- confirms you can create a quad object, and display it

- add new features such as hashing support, more types for ctor, or comparison
]
---

.left-column[
## Number protocol
]
.right-column[
Arithmetic operations are supported through the number protocol.

```c
PyObject*
pyquad_add(PyObject *a, PyObject *b)
{
   qdouble x, y;
   x = PyQuad_AsQuad(a);
   y = PyQuad_AsQuad(b);
   return PyQuad_FromQuad(x + y);
}

PyNumberMethods pyquad_as_number {
    pyquad_add,          /* nb_add */
    ...
}
```

This can then be registered into PyQuad_Type tb_as_number:

```c
PyTypeObject PyQuad_Type = {
    ..
    0,
    pyquad_repr,
    &pyquad_as_number,
    0,
}
```
]
---
.left-column[
## Exercise
]

.right-column[
See usb/material/examples/quad_dtype/step2:

- compile the extension module

- confirms you can add 2 quad objects

- add a few more operations
]

---
template:inverse
## 5.2 Wrapping a python quad into a dtype

---

.left-column[
## Registering a new dtype
]
.right-column[
The fundamental structure is npyquad_descr:

```c
PyArray_ArrFuncs NpyQuad_ArrFuncs;

typedef struct { char c; qdouble q; } align_test;

PyArray_Descr npyquad_descr = {
   PyObject_HEAD_INIT(0)
   &PyQuad_Type, /* typeobj */
   'f', /* kind (matters for coercion) */
   'q', /* type */
   '=', /* byteorder */
   NPY_NEEDS_PYAPI | NPY_USE_GETITEM | \
	NPY_USE_SETITEM, /* hasobject */
   0,                      /* type_num */
   sizeof(qdouble),       /* elsize */
   offsetof(align_test, q), /* alignment */
   ...
   &NpyQuad_ArrFuncs,  /* f */
}
```

Registered as followed:

```
npyquad_descr.ob_type = &PyArrayDescr_Type;
npy_registered_quadnum = \
	PyArray_RegisterDataType(&npyquad_descr);
```
]
---

.left-column[
## Registering a new dtype
]
.right-column[
NpyQuad_ArrFuncs is an array of function pointers doing most of the
implementation:

```
/* pick up one item from data buffer into a scalar */
static PyObject *
npyquad_getitem(char *data, PyArrayObject arr)
{
    qdouble q;

memcpy(&q, data, sizeof(q));
    return PyQuad_FromQuad(q);
}

init_quad_descriptor(PyObject *np_module)
{
    PyArray_InitArrFuncs(&NpyQuad_ArrFuncs);
    NpyQuad_ArrFuncs.getitem = \
	(PyArray_GetItemFunc*)npyquad_getitem;
    NpyQuad_ArrFuncs.setitem = \
	(PyArray_SetItemFunc*)npyquad_setitem;
    NpyQuad_ArrFuncs.copyswap = \
	(PyArray_CopySwapFunc*)npyquad_copyswap;

...
}
```
]

---
.left-column[
## Example of usage
]
.right-column[
In material/quad_dtype/step3, example of usage:

```python
import numpy as np

import _quad

a = np.array([1, 2, 3, 4], np.float)
b = a.astype(_quad.qdouble)

print b[0] + b[1]
```

But the following fails
```
a = np.array([1, 2, 3, 4], _quad.qdouble)
```

Can you fix it ?
]
---
.left-column[
## Adding ufunc
]
.right-column[
Conceptually, a ufunc needs at least the following:

- a core loop that is given a 1d buffer (but not contiguous)

- to be registered

```c
void
quad_ufunc_add(char** args, npy_intp* dimensions,
	       npy_intp* steps, void* data);

int
register_ufuncs(PyObject* np_module,
	        int npy_registered_quadnum)
{
    PyUFuncObject* ufunc = (PyUFuncObject*)
		PyObject_GetAttrString(np_module, "add");
    int args[3] = {npy_registered_quadnum,
		   npy_registered_quadnum,
		   npy_registered_quadnum};

...
 if (PyUFunc_RegisterLoopForType(ufunc,
		 npy_registered_quadnum,
		 quad_ufunc_add, args, 0) < 0) {
 return -1;
 }

return 0;
}
```
]

---
.left-column[
## Exercise
]
.right-column[
Try to add a few more ufuncs
]
---
template:inverse

background-image:url(pictures/flying_baby.jpg)