If you look at the programming languages benchmarks game, Python is one of the slowest commonly used programming languages out there. Typical programs written in pure Python average around 40 times slower than the equivalent program written in C or C++.

Despite the performance penalty, Python is still probably the most popular language choice out there for doing Data Analysis and Machine Learning. Most of the recent Deep Learning frameworks target Python for development: TensorFlow, Theano, and Keras all use Python. Torch originally was written for Lua, which is substantially faster than Python when using LuaJIT - but Torch failed to gain traction until switching to Python with the release of PyTorch.

The reason for this is that the performance penalty in writing programs in Python isn’t as large as the programming language benchmarks game would suggest: Most of the best Python Data libraries have their core routines written as native extensions.

This all means that to get the most out of these libraries, you need to treat Python as a Declarative Language - and push as much control flow as possible down to a native layer, and just let the Python program describe what needs done.

Declarative Programming Languages focus on on describing what should be computed - and avoid mentioning how that computation should be performed. In practice this means avoiding expressions of control flow: loops and conditional statements are removed and replaced with higher level constructs that describe the logic of what needs to be computed.

The usual example of a declarative programming language is SQL. It lets you define what data you want computed - and translates that efficiently onto the database schema. This lets you avoid having to specify details of how to execute the query, and instead lets the query optimizer figure out the best index and query plan on a case by case basis.

Python isn’t a pure Declarative Language - but the same flexibility that contributes to its sluggish speed can be be leveraged to create Domain Specific API’s that use the same principles. I thought it would be kind of interesting to look at a couple specific examples of how this plays out.

The design of NumPy has a couple neat declarative programming features, that lead to code that is not only cleaner and easier to understand - but also substantially faster.

A simple example of this might be to apply TF-IDF weighting to a sparse matrix. I originally needed to do this in order to add some basic nearest neighbour recommendation code as a baseline for my implicit recommendation library, and I thought it would provide a good example of what I’m talking about here.

In an imperative style, TF-IDF weighting on a sparse matrix can be written like:

def tfidf_imperative(m): # count up unique occurrences of each column of the sparse matrix X = coo_matrix(m) df = bincount(X.col) # calculate inverse-document-frequency = log(N/df) N = float(X.shape[0]) idf = zeros(X.shape[1]) for i in range(X.shape[1]): idf[i] = log(N / (1.0 + df[i])) # adjust data by TF-IDF weighting X.data = X.data.copy() for i in range(X.nnz): X.data[i] = sqrt(X.data[i]) * idf[X.col[i]] return X

There are 2 different for loops in that code - and each of which can be replaced by different NumPy language constructs.

The first for loop calculates the IDF itself. Numpy lets you do almost all operations on arrays of values as well as on single values
and its much faster to use the vectorized form: ```
idf = log(N / (1.0 +
bincount(X.col)))
```

.

This tells NumPy to loop over the array returned from the `bincount(X.col)`

function, and create a new array with the
IDF value transformed appropriately. In effect we’re telling NumPy what result we want and letting
it figure out how to calculate it best itself.

The final TF-IDF weighting can be done in a similar fashion using NumPy’s array
indexing feature. This feature lets
you use one array as an index into another array. Going `idf[X.col]`

looks up each column value from X in IDF and returns an
array with the IDF weight for that column.

Putting it all together leads to code like:

def tfidf_declarative(m): X = coo_matrix(m) # calculate IDF N = float(X.shape[0]) idf = log(N / (1 + bincount(X.col))) # apply TF-IDF adjustment X.data = sqrt(X.data) * idf[X.col] return X

Not only is the declarative version shorter and more readable, its also substantially faster. By removing the for loops, the iteration can happen
in vectorized C calls and makes this code run **75 times** faster than the imperative version
on my laptop. By avoiding writing any loops, we’ve declared the operations that need to happen
and let the NumPy translate that to control flow.

One frequently asked question is why Deep Learning frameworks like TensorFlow are written for Python instead of a faster language like C++.

The answer is that for the most part these frameworks are *written* in a language like C++, but
provide a Python API to make it convenient to call. The Python code only describes the
computations that need to be performed, all the real work happens in the library in either C++ or
CUDA calls on the GPU.

Take a look at this simple function that uses TensorFlow to do Linear Regression:

def linear_regression(train_X, train_Y, learn_rate=0.005): # define placeholders for input data X, Y = tf.placeholder("float"), tf.placeholder("float") # define the variables we're learning slope, intercept = tf.Variable(0.0), tf.Variable(0.0) # learn the slope/intercept on a ordinary least squares loss function loss_function = (Y - (X * slope + intercept)) ** 2 # find the parameters slope/intercept using a basic GD optimizer train = tf.train.GradientDescentOptimizer(learn_rate).minimize(loss_function) with tf.Session() as sess: tf.global_variables_initializer().run() # Train the model for x in range(100): sess.run(train, feed_dict={X: train_X, Y: train_Y}) return sess.run(slope), sess.run(intercept)

In terms of line count, most of the function is in defining the variables to optimize, the linear model that we’re trying to learn and the loss function and optimization method to learn that function. None of these calls do any work though - all they do is declare a computation graph of what should be done.

Basically everything here happens in the `sess.run`

call on the training function. This call takes the computation
graph defined in the `train`

variable
binds the placeholder variables X and Y to the training data and runs the graph to learn the
regression coefficients.

Python is sort of like glue - it works well for binding different libraries together, but if you try to build a large fast program out of it you end up with a sticky mess that’s difficult to quickly move through.

The reason it has been so successful with Data Processing and Machine Learning tasks is that many of the libraries have adopted API’s where you declare the operations you want to perform, and the library executes those declarations in an efficient manner in a lower level language. This leads to the best of both worlds, code that’s easy to write in Python that runs as fast as code written in C++.

Using an imperative style means that you spend too much time wading through the glue, but declaring what operations you want leads to code that’s efficient and clean.

The side effect of this is that in order to be a great Python programmer, you have to learn to program in a lower level language too. All of the most popular Python data libraries have native extensions: TensorFlow, scikit-learn, NumPy, Pandas, SciPy, spaCY etc all have significant portions of their code written in a native language. If you are comfortable just using these libraries its enough to be just a good Python programmer; however, if you want to be the type of programmer that can produce libraries like these you really should be learning something like C++ or Cython too.

Published on 28 February 2017

Enter your email address to get an email whenever I write a new post: