Mobile Tech Talk

Why Python for Machine Learning?

Python is a deceptively simple yet elegant programming language. It is one of the go-to languages for numeric computing, scientific computing, data science and machine learning. Data wrangling libraries like Pandas, numeric computing libraries like NumPy and scientific computing libraries like SciPy all expose Python APIs, even though much of their performance-critical code is implemented in C, Cython or Fortran. Machine learning libraries like Scikit-learn, TensorFlow and Keras are likewise used from Python. Python data visualization libraries like Matplotlib, Seaborn, Bokeh and Plotly are also widely used. All these libraries are open source.

Why is Python famous in these domains?

One of the reasons for Python's elegance is that it is an orthogonal language. Programming language experts define orthogonality as “a property of language due to which a relatively small set of primitive constructs can be combined in a relatively small number of ways to build the control and data structures of the language”. This consistency means that, after working with Python for a while, one can start making informed, correct guesses about features that are new. Orthogonal features make the language very expressive, and this is evident in the succinct expressions used in libraries like Pandas and NumPy.
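
As a small illustration of this consistency, the sketch below applies the same few constructs (len, slicing and iteration) to three different built-in sequence types. The variable names are made up for this example.

```python
# The same small set of constructs works uniformly across sequence types.
samples = {
    "list": [10, 20, 30, 40, 50],
    "tuple": (10, 20, 30, 40, 50),
    "str": "abcde",
}

results = {}
for kind, seq in samples.items():
    results[kind] = {
        "length": len(seq),      # len() works on every sequence
        "first_two": seq[:2],    # slicing returns the same type it was given
        "reversed": seq[::-1],   # the step syntax is identical everywhere
    }

for kind, summary in results.items():
    print(kind, summary)
```

Having learned slicing on lists, one can guess correctly that it also works on tuples and strings.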

Python is one of the rare languages designed from the outset with the beauty and elegance of the programs written in it in mind. Python has its own definition of beauty, set out in the Zen of Python (PEP 20):

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let’s do more of those!

The importance the language designer places on beautiful programs makes the language a joy to work with.

Python supports operator overloading: the meaning of an operator can change depending on the operands used. Even function invocation (()), attribute access (.) and item access/slicing ([]) are operators in Python that a class can customize. Thus, the language allows the same operator to have different meanings according to context. This feature is used elegantly in data wrangling and numeric computing libraries like Pandas and NumPy.

Consider a data frame object created using the Pandas library.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

Now suppose we want a data frame containing only the rows where column A’s value is positive.

This is succinctly expressed in Pandas as df[df.A > 0]

Here the ‘>’ operator is overloaded to produce a boolean index Series, and the indexing operator [ ] on the data frame is overloaded to take that boolean Series and produce a filtered data frame. Many mathematical objects in Python numeric computing libraries use operator overloading to provide operations that look natural and make sense to domain experts.
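
To see how such overloading could work, here is a minimal sketch using a toy class. The class name Column and its behaviour are assumptions made for illustration only; this is not how Pandas is actually implemented.

```python
class Column:
    """A toy stand-in for a numeric column (hypothetical, not the Pandas API)."""

    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, threshold):
        # `column > x` returns a list of booleans, one per element.
        return [v > threshold for v in self.values]

    def __getitem__(self, key):
        # A list of booleans acts as a mask; anything else is plain indexing.
        if isinstance(key, list) and all(isinstance(k, bool) for k in key):
            if len(key) != len(self.values):
                raise IndexError("mask length must match column length")
            return Column(v for v, keep in zip(self.values, key) if keep)
        return self.values[key]

    def __add__(self, other):
        # `column + column` adds element by element.
        return Column(a + b for a, b in zip(self.values, other.values))

    def __repr__(self):
        return f"Column({self.values})"


a = Column([-1.5, 0.3, 2.0, -0.7])
mask = a > 0           # [False, True, True, False]
positive = a[a > 0]    # Column([0.3, 2.0])
doubled = a + a        # element-wise sum
print(mask, positive, doubled)
```

The expression a[a > 0] reads much like the Pandas filter above, because both the comparison and the indexing operator have been given domain-specific meanings.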

Python supports default arguments and also allows functions to be called using keyword arguments.

Consider the following function:

def greet(name, msg="Good morning!"):
    """
    This function greets
    the person with the
    provided message.

    If message is not provided,
    it defaults to "Good
    morning!"
    """
    print("Hello", name + ', ' + msg)

 

This function can be called with keyword arguments in a different order from the definition:

greet(msg="How do you do?", name="Bruce")

This language feature allows library designers to provide methods with sensible defaults. For example, methods in the machine learning library Scikit-learn have many parameters that allow the algorithm to be tuned. Whenever an operation requires a user-defined parameter, the library defines an appropriate default value. The default value causes the operation to be performed in a sensible way, giving a baseline solution for the task at hand.
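
The idea can be sketched with a toy class. ThresholdClassifier and its parameters are hypothetical names invented for this example and are not part of Scikit-learn; the point is only that every parameter has a default, so the object is usable with no arguments and tunable with keywords.

```python
class ThresholdClassifier:
    """Toy classifier (hypothetical, not part of Scikit-learn)."""

    def __init__(self, threshold=None, positive_label=1, negative_label=0):
        # Every parameter has a default, so ThresholdClassifier() works as-is.
        self.threshold = threshold
        self.positive_label = positive_label
        self.negative_label = negative_label

    def fit(self, values):
        # With no user-supplied threshold, fall back to the mean of the data.
        if self.threshold is None:
            self.threshold_ = sum(values) / len(values)
        else:
            self.threshold_ = self.threshold
        return self

    def predict(self, values):
        return [
            self.positive_label if v > self.threshold_ else self.negative_label
            for v in values
        ]


data = [1.0, 2.0, 3.0, 10.0]

# Baseline: rely entirely on the defaults.
baseline = ThresholdClassifier().fit(data)

# Tuned: override only the parameters of interest, by keyword.
tuned = ThresholdClassifier(threshold=2.5,
                            positive_label="high",
                            negative_label="low").fit(data)

print(baseline.predict(data))   # [0, 0, 0, 1]
print(tuned.predict(data))      # ['low', 'low', 'high', 'high']
```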

Everything in Python, including classes and functions, is an object. This makes it possible to write higher-order functions that take other functions and classes as arguments, and it enables library designers to provide elegant, composable APIs. For example, in the Scikit-learn library, many machine learning tasks are expressible as sequences or combinations of transformations to data. Some learning algorithms are also naturally viewed as meta-algorithms parametrised on other algorithms. Such algorithms are implemented and composed from existing building blocks.
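
A rough sketch of both ideas in plain Python follows. The names make_pipeline (defined locally here, unrelated to any library function of the same name), drop_missing, scale_to_unit and MajorityVote are all invented for illustration and do not reflect Scikit-learn's actual classes.

```python
def make_pipeline(*steps):
    """Compose single-argument functions, applied left to right."""
    def pipeline(data):
        for step in steps:
            data = step(data)
        return data
    return pipeline


def drop_missing(values):
    return [v for v in values if v is not None]


def scale_to_unit(values):
    largest = max(values)
    return [v / largest for v in values]


class MajorityVote:
    """A meta-algorithm parametrised on other algorithms (plain callables)."""

    def __init__(self, *classifiers):
        self.classifiers = classifiers

    def __call__(self, x):
        votes = [clf(x) for clf in self.classifiers]
        # Return the label that the most classifiers agreed on.
        return max(set(votes), key=votes.count)


# Functions are passed around as ordinary objects.
clean = make_pipeline(drop_missing, scale_to_unit)
cleaned = clean([2, None, 4, 8])   # [0.25, 0.5, 1.0]

# Three simple classifiers combined into one by the meta-algorithm.
vote = MajorityVote(lambda x: x > 0.2, lambda x: x > 0.4, lambda x: x > 0.9)
labels = [vote(x) for x in cleaned]
print(cleaned, labels)
```

Because the composed objects are themselves callables, they can be passed on to further higher-order functions in the same way.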

Generators and iterators provided by Python are excellent mechanisms for representing an infinite stream of data, which is often encountered in numeric computing and data science. An infinite stream cannot be stored in memory, but because a generator produces only one item at a time, it can represent such a stream. Generators can also be used to pipeline a series of operations. Scikit-learn applies a comparable pipelining idea to machine learning workflows, although it does so by composing estimator objects rather than generators. A distinguishing feature of the Scikit-learn API is its ability to compose new estimators from several base estimators. Composition mechanisms can be used to combine typical machine learning workflows into a single object which is itself an estimator, and can be employed wherever usual estimators can be used.
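
The generator side of this can be sketched with the standard library alone. The function names readings, squares and only_even are made up for the example.

```python
from itertools import islice


def readings():
    """Yield an endless stream of numbers: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


def squares(stream):
    for value in stream:
        yield value * value


def only_even(stream):
    for value in stream:
        if value % 2 == 0:
            yield value


# Stages are chained lazily; nothing runs until values are requested.
pipeline = only_even(squares(readings()))

# Take just five items from the infinite stream.
first_five = list(islice(pipeline, 5))
print(first_five)   # [0, 4, 16, 36, 64]
```

Only one item flows through the stages at a time, so memory use stays constant however long the stream is.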

Python, being an interpreted, dynamically typed language, has an excellent REPL and REPL-like environments such as IPython and Jupyter Notebook. This makes experimenting with ideas and data sets very easy.

Thus Python, an elegant language with a design philosophy emphasizing readability and ease of use, great libraries, a large community and great tools, is gaining traction in many domains of programming, especially in data science.

H N Ramkumar
H N Ramkumar is a Technical Architect at Robosoft and has led the design and development of many projects across Mac, iOS and Android.
