# Writing an Op to work on an `ndarray` in C

This section walks through a non-trivial example: an Op that does something
deliberately contrived and unrealistic, which is hard to express with existing Ops.
(Technically, we could use `Scan` to implement the Op we're about to describe,
but we ignore that possibility for the sake of example.)
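
For the curious, here is a minimal sketch (ours, not part of the original example) of how the same recurrence could be written with `Scan`; the variable names are purely illustrative:

```
import theano
import theano.tensor as tt

x = tt.dvector("x")

# Recurrence: y[i] = y[i-1] * y[i-2] + x[i], seeded with y[0] = x[0]
# and y[1] = x[1]. With taps=[-2, -1], scan passes (x_i, y[i-2], y[i-1])
# to the step function.
y_rest, _ = theano.scan(
    fn=lambda x_i, y_m2, y_m1: y_m1 * y_m2 + x_i,
    sequences=x[2:],
    outputs_info=[dict(initial=x[:2], taps=[-2, -1])],
)
y = tt.concatenate([x[:2], y_rest])
fibby_scan = theano.function([x], y)
```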

The following code works, but important error checking has been omitted for clarity. For example, when you write C code that assumes memory is contiguous, you should check the strides and alignment; a sketch of such a check appears after the C walkthrough below.

```
import theano
import theano.tensor as tt
from theano.graph.op import Op
from theano.graph.basic import Apply


class Fibby(Op):
    """
    An arbitrarily generalized Fibonacci sequence
    """
    __props__ = ()

    def make_node(self, x):
        x_ = tt.as_tensor_variable(x)
        assert x_.ndim == 1
        return Apply(self,
                     inputs=[x_],
                     outputs=[x_.type()])
        # using x_.type() is dangerous, it copies x's broadcasting behaviour

    def perform(self, node, inputs, output_storage):
        x, = inputs
        y = output_storage[0][0] = x.copy()
        for i in range(2, len(x)):
            y[i] = y[i - 1] * y[i - 2] + x[i]

    def c_code(self, node, name, inames, onames, sub):
        x, = inames
        y, = onames
        fail = sub['fail']
        return """
Py_XDECREF(%(y)s);
%(y)s = (PyArrayObject*)PyArray_FromArray(
    %(x)s, 0, NPY_ARRAY_ENSURECOPY);
if (!%(y)s)
    %(fail)s;
{ // New scope needed to make compilation work
    dtype_%(y)s* y = (dtype_%(y)s*)PyArray_DATA(%(y)s);
    dtype_%(x)s* x = (dtype_%(x)s*)PyArray_DATA(%(x)s);
    for (int i = 2; i < PyArray_DIMS(%(x)s)[0]; ++i)
        y[i] = y[i-1] * y[i-2] + x[i];
}
""" % locals()

    def c_code_cache_version(self):
        return (1,)


fibby = Fibby()
```
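
As a quick sanity check (our illustrative example, assuming `fibby` is defined as above), the Op can be applied like any other:

```
import numpy
import theano.tensor as tt
from theano import function

x = tt.dvector("x")
f = function([x], fibby(x))

# y[0] = x[0], y[1] = x[1], then y[i] = y[i-1]*y[i-2] + x[i]:
print(f(numpy.array([1.0, 1.0, 0.0, 0.0, 0.0])))  # -> [1. 1. 1. 1. 1.]
```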

In the first two lines of the C function, we make `%(y)s` point to a new array with
the correct size for the output. This is essentially simulating the line
`y = x.copy()`. The variables `%(x)s` and `%(y)s` are set up by the TensorType
to be `PyArrayObject` pointers. TensorType also sets up `dtype_%(x)s` to be a
typedef for the C element type of `x`.
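
Concretely, for a float64 input the generated support code contains a typedef along these lines (illustrative only; the actual generated names are mangled internal names, not `dtype_x`):

```
/* Roughly what TensorType emits for a float64 input (illustrative). */
typedef npy_float64 dtype_x;
```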

```
Py_XDECREF(%(y)s);
%(y)s = (PyArrayObject*)PyArray_FromArray(
    %(x)s, 0, NPY_ARRAY_ENSURECOPY);
```

The first line decrements the reference count of the array that `%(y)s` previously pointed to (if any). The second line allocates a new array, copies the contents of `%(x)s` into it, and makes `%(y)s` point to it.

In C code for a Theano Op, NumPy arrays are represented as `PyArrayObject` C
structs. This is part of the NumPy C API, documented at
http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html
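
As noted earlier, this example omits error checking. A minimal sketch (our suggestion, reusing the `%(x)s` and `%(fail)s` substitutions) of the kind of contiguity check you would add before the raw-pointer loop:

```
if (!PyArray_IS_C_CONTIGUOUS(%(x)s)) {
    PyErr_SetString(PyExc_ValueError,
                    "Fibby: input must be C-contiguous");
    %(fail)s;
}
```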

TODO: NEEDS MORE EXPLANATION.

## Writing an Optimization

`fibby` of a vector of zeros is another vector of zeros of the same size.
Theano does not attempt to infer this from the code provided via `Fibby.perform`
or `Fibby.c_code`.
However, we can write an optimization that makes use of this observation.
This sort of local substitution of special cases is common,
and there is a stage of optimization (specialization) devoted to such optimizations.
The following optimization (`fibby_of_zero`) tests whether the input is
guaranteed to be all zero, and if so it returns the input itself as a replacement
for the old output.

TODO: talk about OPTIMIZATION STAGES

```
import numpy

import theano
from theano.tensor.basic import get_scalar_constant_value
from theano.tensor.exceptions import NotScalarConstantError


# Remove any fibby(zeros(...))
@theano.tensor.basic_opt.register_specialize
@theano.graph.opt.local_optimizer([fibby])
def fibby_of_zero(fgraph, node):
    if node.op == fibby:
        x = node.inputs[0]
        try:
            if numpy.all(0 == get_scalar_constant_value(x)):
                return [x]
        except NotScalarConstantError:
            pass
```

The `register_specialize` decorator is what activates our optimization, and
tells Theano to use it in the specialization stage.
The `local_optimizer` decorator builds a class instance around our function.
The `[fibby]` argument is a hint that our optimizer works on nodes
whose `.op` attribute equals `fibby`.
The function (`fibby_of_zero`) receives a `FunctionGraph` and an `Apply`
instance as the arguments `fgraph` and `node`. It uses
`get_scalar_constant_value`, which determines whether a
Variable (`x`) is guaranteed to be a constant, and if so, which constant.
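
A small, illustrative sketch of that behaviour (ours, using the same imports as above):

```
import theano.tensor as tt
from theano.tensor.basic import get_scalar_constant_value
from theano.tensor.exceptions import NotScalarConstantError

# A graph that provably holds a single constant value: that value is returned.
assert get_scalar_constant_value(tt.zeros([5])) == 0.0

# A purely symbolic input: no constant can be extracted.
try:
    get_scalar_constant_value(tt.dvector())
except NotScalarConstantError:
    print("not a constant")
```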

## Test the optimization

Here is some code to test that the optimization is applied only when needed.

```
import numpy

import theano
import theano.tensor as tt
from theano import function

# Test that the optimization does not apply when it should not.
x = tt.dvector()
f = function([x], fibby(x))

# We call the function to make sure it runs.
# If you run in DebugMode, it will compare the C and Python outputs.
f(numpy.random.rand(5))
topo = f.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, Fibby)

# Test that the optimization gets applied.
f_zero = function([], fibby(tt.zeros([5])))

# If you run in DebugMode, it will compare the outputs before
# and after the optimization.
f_zero()

# Check that the optimization removed the Fibby Op.
# Theano's memory interface guarantees that a function's output is never
# aliased to its inputs; the DeepCopyOp that enforces this is why one op
# remains in the graph.
topo = f_zero.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, theano.compile.ops.DeepCopyOp)
```