name: inverse layout: true class: center, middle, inverse --- background-image:url(pictures/diving_baby.jpg) # Diving into NumPy ### .white[[3 hours of breaking and fixing NumPy]] .footnote[.up20[.white[By David Cournapeau & Stéfan van der Walt]]] --- class: center, bottom, inverse background-image:url(pictures/tinguely1.jpg) # Learn how NumPy works internally ... --- background-image:url(pictures/Tinguely.jpg) class:right,bottom .up30[ # ... so that you can improve it ? ] --- # What will you learn ? --- ## Find your way into the codebase --- ## work efficiently on numpy --- ## Implement array method, ufunc, dtype --- ## and some tools on the way --- # Beware --- ## most features best implemented outside numpy --- # When is modifying NumPy appropriate ? --- ## to fix a numpy bug --- ## to improve some existing NumPy features --- ## to follow this tutorial --- layout:false .left-column[ ## Setting up vagrant ] .right-column[ Strongly advised to do the exercises: - install virtualbox (included in usb) - install vagrant 1.2.1 (included in usb) Creating the VM: ```bash mkdir numpy_tuto cd numpy_tuto cp /usbdisk/Vagrantfile . cp /usbdisk/numpy_tuto.box . vagrant box add numpy_tuto numpy_tuto.box vagrant up # You can ssh into it as followed vagrant ssh -- -A ``` ] --- --- template: inverse # 1. Concept overview ### Data-types, strides, broadcasting, indexing --- layout: false .left-column[ ## Data types and types **Scalar types**: ``np.float64``, ``np.int32``, etc. **Array scalars** (e.g. ``x[0]``) are somewhat special: each array scalar has its own type (``np.float32``, ``np.uint32``, etc.) -- but also has an attached dtype. Acts as a bridge between arrays and Python scalars. ] .right-column[ .up30[ ![Scalar hierarchy](pictures/sctype-hierarchy.png) - From this, we can build a dtype: ``` # Build a 32-bit, Big Endian, floating point dtype d = np.dtype(np.float32).newbyteorder('>') ``` - Dtypes describe memory layout of arrays; i.e, all arrays are of type ``ndarray``, with attached data-type. ]] --- .left-column[ ## Strides
] --- .left-column[ ## Strides - transpose
] --- .left-column[ ## Broadcasting ] .right-column[ Combine arrays of different shapes sensibly: ``` >>> x = np.zeros((3, 5)) >>> y = np.zeros(8) >>> (x[..., np.newaxis] + y).shape (3, 5, 8) ``` ![broadcasting two vectors to form a higher-dimensional array](pictures/array_3x5x8.png) ] --- .left-column[ ## Broadcasting compatibility ] .right-column[ When combining two arrays of different shapes, shapes are matched from right to left. Match when: - Dimensions are equal. - One dimension is either None or 1. ``` (5, 10) (5, 10) (5, 10, 1) (3, 5, 10) (6, 10) (10, 5) ---------- ------- ---------- (3, 5, 10) OK BAD (5, 10, 5) OK ``` ] --- .left-column[ ## Indexing with broadcasting ] .right-column[ ``` >>> x = np.array([[1, 2], [3, 4]]) array([[1, 2], [3, 4]]) >>> ix0 = np.array([0, 0, 1, 1]) >>> ix1 = np.array([[1], [0]]) array([[1], [0]]) >>> x[ix0, ix1] array([[2, 2, 4, 4], [1, 1, 3, 3]]) ``` ```python >>> np.broadcast_arrays(ix0, ix1) [array([[0, 0, 1, 1], [0, 0, 1, 1]]), array([[1, 1, 1, 1], [0, 0, 0, 0]])] ``` .tip[.red[TIP] Best to avoid ``:`` and ``...`` in broadcasting--output shape is sometimes hard to predict.] ] --- template: inverse # 2. Building NumPy --- layout:false .left-column[ ## Building with distutils ] .right-column[ Simple in-place build with default compiler ```bash $ python setup.py build_ext -i ``` Running the test suite on numpy.core ```bash $ nosetests numpy/core ``` ] --- template:inverse # That was not too hard ! --- template:inverse # Building with Bento --- layout:false .left-column[ ## Building with Bento ### Installation and configuration ] .right-column[ Setting up Bento requires a few steps: (It is recommended to do this in a virtualenv) ```bash git clone https://github.com/cournape/Bento \ ~/src/bento-git cd ~/src/bento-git python setup.py install --user git clone https://code.google.com/p/waf ~/src/waf-git # Tells bento where to look for waf # (waf has no setup.py) export WAFDIR=~/src/waf-git # In NumPy source tree, do an in-place build bentomaker build -i ``` Set up the Python path (only done once): ```bash export PYTHONPATH=$PYTHONPATH:~/src/numpy-git ``` ] --- .left-column[ ## Building with Bento ### Features for development ] .right-column[ Bento is nifty for NumPy development. - Parallel builds ```bash bentomaker build -i -j4 ``` - Reliable partial rebuilds: ```bash bentomaker build -i -j4 # Hack to bypass autoconfigure bentomaker --disable-autoconfigure build -i -j4 ``` - Easy compilation customization: ```bash CC=clang CFLAGS="-O0 -g -Wall -Wextra" \ bentomaker build -i -j4 ``` ] --- template:inverse # Practice --- layout:false .left-column[ ## A first exercise ] .right-column[ After setting up Bento, build NumPy with warning on ``` CFLAGS="-O0 -g -Wall -Wextra -W" bentomaker build -i ``` Lots of warnings of the type: ``` ../numpy/linalg/lapack_litemodule.c:863:58: warning: \ unused parameter 'args' [-Wunused-parameter] lapack_lite_xerbla(PyObject *NPY_UNUSED(self), \ PyObject *args) 1 warning generated. ``` NumPy has a special macro to decorate unused argument and give an error if they are used ``` void foo(PyObject *NPY_UNUSED(self), ...) ``` Try fixing one .c file so that there is no warning anymore ] --- template:inverse background-image:url(pictures/KyotoFushimiInariLarge.jpg) # 3 Finding your way --- template:inverse background-image:url(pictures/pollock.jpg) # 3.1 Code organization --- layout:false .left-column[ ## Main sub-packages ] .right-column[ .red[numpy/core]: the meat of NumPy (focus of the tutorial): - code for multiarray (src/multiarray), ufunc extensions (src/umath) - various support libraries (npymath, npysort) - public headers in include Other important parts: - .red[numpy/lib]: various tools on top of core - .red[numpy/fft], .red[numpy/linalg], .red[numpy/random] Other parts not on topic for this tutorial ] --- layout:false .left-column[ ##npymath ] .right-column[ .red[npymath] is a small C99 abstraction for cross platform math operations - static library linked into the extensions that need it - implement fundamental IEEE 754-related features (npy_isnan/npy_isinf/etc...) - half float implementation - C99 layer for functions, macros and constant definitions npy_* functions should be used throughout numpy C code (e.g. npy_exp, not exp) ``` #include
void foo() { double x = npy_exp(2.0); } ``` API "documented" in numpy/npy_math.h header ] --- layout:false .left-column[ ## Multiarray extension bird's eye view ] .right-column[Contain the implementation of the array, dtype and iterators objects: - .red[PyArrayObject] struct (numpy/ndarraytypes.h): array object (hidden in recent versions) - .red[PyArray_Descr] struct (ditto): dtype object - .red[PyArrayMultiIterObject] struct (ditto): iterator object (used in broadcasting) ``` /* in numpy/ndarraytypes.h */ struct PyArrayObject { char *data; /* Pointer to the raw data buffer */ int nd; /* number of dimensions */ npy_intp *dimensions; /* The size in each dimension */ npy_intp *strides; /* strides array */ PyObject *base; PyArray_Descr *descr; /* Pointer to type structure */ int flags; PyObject *weakreflist; } ``` One numpy array -> one PyArrayObject instance ] --- layout:false .left-column[ ## dtype implementation details ] .right-column[ PyArray_Descr contains the instance-spefic data of a dtype ```c /* in numpy/ndarraytypes.h */ struct PyArray_Descr { ... char kind; ... /* used for structured array */ struct _arr_descr *subarray; PyObject *fields; PyObject *names; /* a table of functions specific for each */ PyArray_ArrFuncs *f; /* Metadata about this dtype */ PyObject *metadata; NpyAuxData *c_metadata; } ``` One dtype object -> one PyArray_Descr instance. ```c /* each dtype has its own set of function pointers */ PyArray_ArrFuncs { ... PyArray_CopySwapNFunc *copyswapn; PyArray_CopySwapFunc *copyswap; ... } ``` ] --- .left-column[ ## PyArray_Type: your main ticket to follow code flow ] .right-column[ .up30[ PyArrayType is an extension type (singleton) which defines the array behavior ```c // in src/multiarray/arrayobject.c // code simplified for presentation PyTypeObject PyArray_Type = { ... array_repr, /* __repr__ */ &array_as_number, /* number protocol */ &array_as_sequence, /* sequence protocol */ &array_as_mapping, /* mapping protocol */ ... array_str, /* __str__ */ &array_as_buffer, /* buffer protocol */ ... array_iter, /* iter protocol */ ... array_methods, /* array methods */ ... } ``` Critical to understand code flow of an operation, e.g.: ``` a = np.random.randn(100) # How to find below op entry point in the C code ? b = a + a ``` Addition is part of the number protocol -> look into array_as_number array. ] ] --- layout:false .left-column[ ## Example ] .right-column[ First, let`s compile numpy in debug mode: ``` # Create a virtualenv with debug python virtualenv debug-env -p python2.7-dbg source debug-env/bin/activate # install bentomaker in that venv cd $bento_sources && python setup.py install # Build numpy in debug mode CFLAGS="-O0 -g" LDFLAGS="-g" bentomaker build -i ``` We want to look into the snippet below to 'catch' the main entry point for '+' ``` # test_add.py import numpy as np a = np.array([1, 2, 3, 4]) b = a + a ``` In gdb ``` gdb python > break array_add > run test_add.py > p ((int*)PyArray_DATA(m1))[0] ``` ] --- layout:false .left-column[ .down50[ .down50[ .down50[ .rot90[ ## PyArrayDescr_Type ]]]]] .right-column[ PyArrayDescr_Type is an extension type (singleton) which defines the *dtype* class ``` /* in src/multiarray/descriptor.c */ /* code simplified for presentation */ PyTypeObject PyArrayDescr_Type = { ... "numpy.dtype", ... /* sequence protocol used e.g. in structured dtype */ &descr_as_sequence, /* mapping protocol used e.g. in structured dtype */ &descr_as_mapping, ... arraydescr_methods, /* methods */ arraydescr_members, /* members */ arraydescr_getsets, /* getset (properties) */ } ``` Less critical to understand Python <-> C layer. ] --- template: inverse # 4. Universal Functions --- .left-column[ ## API ] .right-column[ .up30[ ``numpy/core/include/numpy/ufuncobject.h`` - Vectorized function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs. - Supports array broadcasting, type casting, and other standard features Pure Python ufunc: ```python def range_sum(a, b): return np.arange(a, b).sum() rs = np.frompyfunc(range_sum, 2, 1) x = np.array([[1, 2, 3, 4]]) >>> rs(x, x + 1) array([[1, 2, 3, 4]], dtype=object) >>> rs(x, x.T) array([[0, 0, 0, 0], [1, 0, 0, 0], [3, 2, 0, 0], [6, 5, 3, 0]], dtype=object) ``` - Note that broadcasting is supported - Unfortunately, the output is always an object array. Slow, because we're wrapping a Python call. ] ] --- .left-column[ ## API: Calling ] .right-column[ Keywords: - ``out`` : Output array, useful for in-place computation. - ``where`` : Ufunc only calculated where ``broadcast(mask, inputs) == True``. Careful! Undefined where no elements are encountered. - ``casting`` : Casting behavior (more later). - ``order`` : Calculation iteration order and memory layout of output array. - ``dtype`` : Output *and calculation* dtype. Often important for accumulators. - ``sig`` : Data-type or signature string; indicates which underlying 1-D loop is executed (typically the loops are found automatically). See ``types`` attribute. Example: ``ff->f``. - ``extobj`` : Specify ufunc buffer size, error mode integer, error call-back function. ] --- .left-column[ ## Ufunc output type ] .right-column[ - Determined by input class with highest `__array_priority__`. ``` class StrongArray(np.ndarray): __array_priority__ = 10 class WeakArray(np.ndarray): __array_priority__ = 1 >>> s = StrongArray([2, 2]); w = WeakArray([2, 2]) >>> type(s + w).__name__ StrongArray ``` ] --- .left-column[ ## Ufunc output type (continued) ] .right-column[ .up20[ Otherwise, by ``output`` parameter. Output class may have following methods: - ``__array_prepare__`` : Called before ufunc. Knows some context about the ufunc, and may be used to add, e.g., meta-data. Output then passed to ufunc. - ``__array_wrap__`` : Called after execution of ufunc. ```python In [159]: class MyArray(np.ndarray): ...: def __array_prepare__(self, array, (ufunc, inputs, domain)): ...: print 'Array:', array ...: print 'Ufunc:', ufunc ...: print 'Inputs:', inputs ...: print 'Domain:', domain ...: return array ...: ...: m = MyArray((1, 2)) In [160]: np.add([1, 2], [3, 4], out=m) Array: [[ 6.93023165e-310 1.33936849e-316]] Ufunc:
Inputs: ([1, 2], [3, 4], MyArray([[ 6.93023165e-310, 1.33936849e-316]])) Domain: 0 Out[160]: MyArray([[ 4., 6.]]) ``` ]] --- .left-column[ ## Type handling ] .right-column[ ``types`` attribute of ufunc, e.g. ``np.add.types``. Common types: ``` * byte -> b * short -> h * intc -> i * double -> d * single -> f * longfloat -> g * complex double -> D ``` See also ``np.sctypeDict``. If no suitable ufunc loop exists, try to find one to which can be cast safely (``np.can_cast``). Can specify casting policy to ufunc via ``casting`` keyword (``no``, ``equiv``, ``same_kind``, or ``unsafe``--default is the safe ``same_kind``). Linear search through available functions--use first match. ] --- .left-column[ ## Other ufunc ops ] .right-column[ .up30[ - ``ufunc.reduce(a[, axis, dtype, out, keepdims])``
Reduces a‘s dimension by one, by applying ufunc along one axis. - ``ufunc.accumulate(array[, axis, dtype, out])``
Accumulate the result of applying the operator to all elements. - ``ufunc.reduceat(a, indices[, axis, dtype, out])``
Performs a (local) reduce with specified slices over a single axis. ```python >>> x = np.arange(12).reshape((3, 4)) array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> np.add.reduceat(x, [0, 1, 3], axis=1) array([[ 0, 3, 3], [ 4, 11, 7], [ 8, 19, 11]]) ``` - ``ufunc.outer(A, B)``
Apply the ufunc op to all pairs (a, b) with a in A and b in B. ]] --- .left-column[ ## Implementing a ufunc - the kernel ] .right-column[ .up20[ ``` static void square_d(char** args, npy_intp* dimensions, npy_intp* steps, void *data) { npy_intp n = dimensions[0]; /* input length */ char *input = args[0]; char *output = args[1]; npy_intp in_step = steps[0]; /* input stride */ npy_intp out_step = steps[1]; /* output stride */ double tmp; npy_intp i; for (i = 0; i < n; i++) { tmp = *( (double *)input ); tmp = tmp * tmp; *((double *)output) = tmp; input += in_step; output += out_step; } } ``` *Credit:* Chris Jordan-Squire, who wrote the numpy user guide entry. ]] --- .left-column[ ## Implementing a ufunc - slotting into NumPy ] .right-column[ ``PyUFunc_FromFuncAndData`` - return a universal function based on a kernel + meta-data ``` // Set up the numpy structures import_array(); import_umath(); PyUFuncGenericFunction funcs[1] = {&square_d}; void *data[1] = {NULL}; char types[2] = {NPY_DOUBLE, NPY_DOUBLE}; // The above structures are not copied // -- must remain visible -- PyObject* ufunc = PyUFunc_FromFuncAndData(funcs, data, types, 1, /* nr of data types */ 1, 1, /* nr in & out */ PyUFunc_None, /* identity */ "square", /* function name */ "docstring", /* docstring */ 0); /* unused */ ``` Use the resulting ``ufunc`` inside your module. See the [full source](https://github.com/enthought/davidc-scipy-2013/blob/master/examples/ufuncs/my_ufunc.c). ] --- .left-column[ ## Implementing a ufunc - provided inner loops ] .right-column[ .up20[.up30[ ``` double kernel(double d) { ...; } static void *data[] = {&square_d}; static char types[] = {NPY_DOUBLE, NPY_DOUBLE}; static PyUFuncGenericFunction funcs[sizeof(data)]; PyMODINIT_FUNC initmy_ufunc_noloop(void) { ... funcs[0] = &PyUFunc_d_d; ufunc = PyUFunc_FromFuncAndData( funcs, data, types, 1, /* nr of data types */ 1, 1, /* nr in & out */ PyUFunc_None, /* identity */ "square", /* function name */ "docstring", /* docstring */ 0); /* unused */ ... } ``` Examples ([full source](https://github.com/enthought/davidc-scipy-2013/blob/master/examples/ufuncs/my_ufunc_noloop.c)): ``` PyUfunc_f_f(args, dimensions, steps, func) float elementwise_func(float in1) PyUfunc_dd_d double elementwise_func(double in1, double in2) PyUFunc_ff_f_As_dd_d double elementwise_func(double in1, double in2) ``` ]]] --- .left-column[ ## Implementing a ufunc - multiple types ] .right-column[.down30[.down30[ You probably need a templating engine. You probably don't need NumPy's templating engine.
Here's [an example using Tempita](https://github.com/enthought/davidc-scipy-2013/blob/master/examples/ufuncs/my_ufunc_types.c.tmpl) (as well as the [setup.py](https://github.com/enthought/davidc-scipy-2013/blob/master/examples/ufuncs/setup.py)). ```python {{ # *NOTE*: Order of this list is important. First match # (even with casting) will be used. ctypes = ('int8_t', 'int32_t', 'int64_t', 'float', 'double') dtypes = ('NPY_INT8', 'NPY_INT32', 'NPY_INT64', 'NPY_FLOAT', 'NPY_DOUBLE') }} {{for type in ctypes}} static void square_{{type}}(char** args, npy_intp* dimensions, npy_intp* steps, void *NPY_UNUSED(data)) ... ``` ]]] --- .left-column[ ## Hands-on ] .right-column[ 1. Construct a custom ufunc ``poly(x, a)`` to evaluate the polynomial ``(x**2 + 3) - a*x + 5``. 2. Do a timing comparison with a NumPy implementation using IPython's ``%timeit``. ] --- template:inverse # 5. Writing a custom dtype --- .left-column[ ## Synopsis ] .right-column[ We will create the skeleton for quad precision dtype: - IEEE-754 2008 float128 standard - available in recent gcc with __float128 Creating a new dtype involves: - wrapping the basic type as a python object (no numpy support), implement number protocol - creating a new dtype on top of it - add some basic ufunc for it ] --- .left-column[ ## A few words on quadruple precision ] .right-column[ Specified in the recent IEEE-754 standard: - few CPU support it natively - 113 effective bits of precision (~36 decimal digits) - gcc >= 4.6 + libquadmath add software support for it for Intel architectures Example ```c #include
int main() { __float128 a = 1.0; char str[1024]; int st; a *= 2.3; st = quadmath_snprintf(str, sizeof(str), "%Qf", q); if (st < 0) { printf("Error while converting to string\n"); } } ``` ] --- template:inverse ## 5.1 Wrapping a float128 into a python object --- .left-column[ ## A simple quad object ] .right-column[ The smallest thing we need: - create a qdouble object type - allow to create one (from a string, a double, etc...) - allow to print it ``` typedef __float128 qdouble; /* boxed qdouble */ typedef struct { PyObject_HEAD qdouble obval; } PyQuad; static PyTypeObject PyQuad_Type = { ... pyquad_repr, ... pyquad_new, } PyMODINIT_FUNC init_quad(void) { ... if (PyType_Ready(&PyQuad_Type) < 0) { return; } Py_INCREF(&PyQuad_Type); PyModule_AddObject(m, "qdouble", (PyObject*)&PyQuad_Type); } ``` ] --- .left-column[ ## Exercise ] .right-column[ See usb/material/examples/quad_dtype/step1: - compile the extension module - confirms you can create a quad object, and display it - add new features such as hashing support, more types for ctor, or comparison ] --- .left-column[ ## Number protocol ] .right-column[ Arithmetic operations are supported through the number protocol. ```c PyObject* pyquad_add(PyObject *a, PyObject *b) { qdouble x, y; x = PyQuad_AsQuad(a); y = PyQuad_AsQuad(b); return PyQuad_FromQuad(x + y); } PyNumberMethods pyquad_as_number { pyquad_add, /* nb_add */ ... } ``` This can then be registered into PyQuad_Type tb_as_number: ```c PyTypeObject PyQuad_Type = { .. 0, pyquad_repr, &pyquad_as_number, 0, } ``` ] --- .left-column[ ## Exercise ] .right-column[ See usb/material/examples/quad_dtype/step2: - compile the extension module - confirms you can add 2 quad objects - add a few more operations ] --- template:inverse ## 5.2 Wrapping a python quad into a dtype --- .left-column[ ## Registering a new dtype ] .right-column[ The fundamental structure is npyquad_descr: ```c PyArray_ArrFuncs NpyQuad_ArrFuncs; typedef struct { char c; qdouble q; } align_test; PyArray_Descr npyquad_descr = { PyObject_HEAD_INIT(0) &PyQuad_Type, /* typeobj */ 'f', /* kind (matters for coercion) */ 'q', /* type */ '=', /* byteorder */ NPY_NEEDS_PYAPI | NPY_USE_GETITEM | \ NPY_USE_SETITEM, /* hasobject */ 0, /* type_num */ sizeof(qdouble), /* elsize */ offsetof(align_test, q), /* alignment */ ... &NpyQuad_ArrFuncs, /* f */ } ``` Registered as followed: ``` npyquad_descr.ob_type = &PyArrayDescr_Type; npy_registered_quadnum = \ PyArray_RegisterDataType(&npyquad_descr); ``` ] --- .left-column[ ## Registering a new dtype ] .right-column[ NpyQuad_ArrFuncs is an array of function pointers doing most of the implementation: ``` /* pick up one item from data buffer into a scalar */ static PyObject * npyquad_getitem(char *data, PyArrayObject arr) { qdouble q; memcpy(&q, data, sizeof(q)); return PyQuad_FromQuad(q); } init_quad_descriptor(PyObject *np_module) { PyArray_InitArrFuncs(&NpyQuad_ArrFuncs); NpyQuad_ArrFuncs.getitem = \ (PyArray_GetItemFunc*)npyquad_getitem; NpyQuad_ArrFuncs.setitem = \ (PyArray_SetItemFunc*)npyquad_setitem; NpyQuad_ArrFuncs.copyswap = \ (PyArray_CopySwapFunc*)npyquad_copyswap; ... } ``` ] --- .left-column[ ## Example of usage ] .right-column[ In material/quad_dtype/step3, example of usage: ```python import numpy as np import _quad a = np.array([1, 2, 3, 4], np.float) b = a.astype(_quad.qdouble) print b[0] + b[1] ``` But the following fails ``` a = np.array([1, 2, 3, 4], _quad.qdouble) ``` Can you fix it ? ] --- .left-column[ ## Adding ufunc ] .right-column[ Conceptually, a ufunc needs at least the following: - a core loop that is given a 1d buffer (but not contiguous) - to be registered ```c void quad_ufunc_add(char** args, npy_intp* dimensions, npy_intp* steps, void* data); int register_ufuncs(PyObject* np_module, int npy_registered_quadnum) { PyUFuncObject* ufunc = (PyUFuncObject*) PyObject_GetAttrString(np_module, "add"); int args[3] = {npy_registered_quadnum, npy_registered_quadnum, npy_registered_quadnum}; ... if (PyUFunc_RegisterLoopForType(ufunc, npy_registered_quadnum, quad_ufunc_add, args, 0) < 0) { return -1; } return 0; } ``` ] --- .left-column[ ## Exercise ] .right-column[ Try to add a few more ufuncs ] --- template:inverse background-image:url(pictures/flying_baby.jpg)