  • Python 101

    Python is a “high-level programming language and its core design philosophy is all about code readability and a syntax which allows programmers to express concepts in a few lines of code” created by Guido van Rossum.

    For me, the first reason to learn Python was that it is, in fact, a beautiful programming language. Coding in it feels natural, and it lets me express my thoughts directly.

    Another reason is that Python serves many purposes: data science, web development, and machine learning all shine here. Quora, Pinterest, and Spotify, for example, use Python for their backend web development. Read more here.

    Let's get started!

    Who is this guide for?

    This guide is basics 101. We will focus on Python and NumPy basics. This guide was generated from the Machine Learning MeetUp held on 2nd December 2017.

    First Steps

    To make sure we don’t run into any Python versioning or other installation issues, set up your environment using one of these options:

    Anaconda-Docker

    Installing Python on your local machine

    Google Colab

    Now you’re ready to follow the Basics 101 tutorial.


    Python Basics

    Python is a great programming language, and with numpy, scipy, and matplotlib it becomes a powerful environment for machine learning experiments.

    This section is taken from Stanford's cs231n course.


    DataTypes

    Integers and floats work as you would expect from other languages:

    x = 3
    print(type(x)) # Prints "<class 'int'>"
    print(x)       # Prints "3"
    print(x + 1)   # Addition; prints "4"
    print(x - 1)   # Subtraction; prints "2"
    print(x * 2)   # Multiplication; prints "6"
    print(x ** 2)  # Exponentiation; prints "9"
    x += 1
    print(x)  # Prints "4"
    x *= 2
    print(x)  # Prints "8"
    y = 2.5
    print(type(y)) # Prints "<class 'float'>"
    print(y, y + 1, y * 2, y ** 2) # Prints "2.5 3.5 5.0 6.25"
    

    Numbers

    Like most languages, Python has a number of basic types including integers, floats, booleans, and strings. These data types behave in ways that are familiar from other programming languages.

    Open Google Colab or your Python console and paste in the code from this section. Try out different values for x.

    Python also has built-in types for complex numbers; you can find all of the details in the documentation.
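
    As a small sketch of number behavior that often surprises newcomers (the variable names below are illustrative and not part of the original tutorial), this snippet contrasts true division with floor division and shows a complex number:

```python
a = 7
b = 2

true_div = a / b     # True division always returns a float
floor_div = a // b   # Floor division rounds down to an integer
remainder = a % b    # Modulo gives the remainder

z = 3 + 4j           # Complex literal: real part 3, imaginary part 4
magnitude = abs(z)   # abs() of a complex number is its magnitude

print(true_div, floor_div, remainder)  # Prints "3.5 3 1"
print(type(z), magnitude)              # Prints "<class 'complex'> 5.0"
```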


    English words instead of symbols && ||

    t = True
    f = False
    print(type(t)) # Prints "<class 'bool'>"
    print(t and f) # Logical AND; prints "False"
    print(t or f)  # Logical OR; prints "True"
    print(not t)   # Logical NOT; prints "False"
    print(t != f)  # Logical XOR; prints "True"
    

    Boolean

    Python implements all of the usual operators for Boolean logic but uses English words rather than symbols (&&, ||, etc.).

    Operation    Meaning
    x or y       if x is false, then y, else x
    x and y      if x is false, then x, else y
    not x        if x is false, then True, else False
    <            strictly less than
    <=           less than or equal to
    ==           equal to
    !=           not equal
    is           object identity
    is not       negated object identity
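
    The table says that or and and return one of their operands rather than always returning True or False. A minimal sketch of what that means in practice, using made-up variable names, and of the difference between == and is:

```python
name = ''
display_name = name or 'anonymous'   # '' is falsy, so the right operand is returned

items = [1, 2, 3]
first = items and items[0]           # A non-empty list is truthy, so items[0] is returned

a = [1, 2]
b = [1, 2]
print(display_name)      # Prints "anonymous"
print(first)             # Prints "1"
print(a == b, a is b)    # Equal contents, different objects; prints "True False"
```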

    Strings in Python

    hello = 'hello'    # String literals can use single quotes
    world = "world"    # or double quotes; it does not matter.
    print(hello)       # Prints "hello"
    print(len(hello))  # String length; prints "5"
    hw = hello + ' ' + world  # String concatenation
    print(hw)  # prints "hello world"
    hw12 = '%s %s %d' % (hello, world, 12)  # sprintf style string formatting
    print(hw12)  # prints "hello world 12"
    

    Useful methods

    s = "hello"
    print(s.capitalize())  # Capitalize a string; prints "Hello"
    print(s.upper())       # Convert a string to uppercase; prints "HELLO"
    print(s.rjust(7))      # Right-justify a string, padding with spaces; prints "  hello"
    print(s.center(7))     # Center a string, padding with spaces; prints " hello "
    print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another; prints "he(ell)(ell)o"
    print('  world '.strip())  # Strip leading and trailing whitespace; prints "world"
    

    Strings

    Python has great support for strings. A good command of strings helps if you plan to move into NLP in the future.

    Method            True if
    str.isalnum()     String consists of only alphanumeric characters (no symbols)
    str.isalpha()     String consists of only alphabetic characters (no symbols)
    str.islower()     String’s alphabetic characters are all lower case
    str.isnumeric()   String consists of only numeric characters
    str.isspace()     String consists of only whitespace characters
    str.istitle()     String is in title case
    str.isupper()     String’s alphabetic characters are all upper case
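
    To see these checks in action, here is a short sketch; the sample strings are arbitrary and chosen only for illustration:

```python
print('Python3'.isalnum())      # Letters and digits only; prints "True"
print('Python3'.isalpha())      # Contains a digit; prints "False"
print('python'.islower())       # Prints "True"
print('2017'.isnumeric())       # Prints "True"
print('   '.isspace())          # Prints "True"
print('Hello World'.istitle())  # Prints "True"
print('NLP'.isupper())          # Prints "True"
print('hello world'.isalnum())  # The space is not alphanumeric; prints "False"
```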

    Determining String Length

    The built-in function len() returns the number of characters in a string. It is useful when you need to enforce minimum or maximum password lengths, for example, or to truncate larger strings to be within certain limits for use as abbreviations.
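
    A minimal sketch of both uses follows. The helper names and the minimum length of 8 are hypothetical, chosen only for this example:

```python
MIN_PASSWORD_LENGTH = 8   # Hypothetical policy value

def is_long_enough(password):
    """Return True if the password meets the minimum length."""
    return len(password) >= MIN_PASSWORD_LENGTH

def abbreviate(text, limit):
    """Truncate text to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]

print(len('hello world'))                 # Prints "11"
print(is_long_enough('secret'))           # Prints "False"
print(is_long_enough('correct horse'))    # Prints "True"
print(abbreviate('Machine Learning', 7))  # Prints "Machine"
```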

    Python also has some very useful methods on strings. You'd need them for parsing and other text conversions.

    You can find a list of all string methods in the documentation.


    Python Containers - List

    xs = [3, 1, 2]    # Create a list
    print(xs, xs[2])  # Prints "[3, 1, 2] 2"
    print(xs[-1])     # Negative indices count from the end of the list; prints "2"
    xs[2] = 'foo'     # Lists can contain elements of different types
    print(xs)         # Prints "[3, 1, 'foo']"
    xs.append('bar')  # Add a new element to the end of the list
    print(xs)         # Prints "[3, 1, 'foo', 'bar']"
    x = xs.pop()      # Remove and return the last element of the list
    print(x, xs)      # Prints "bar [3, 1, 'foo']"
    

    Containers

    Python includes several built-in container types: lists, dictionaries, sets, and tuples.

    List

    A list is the Python equivalent of an array, but is resizeable and can contain elements of different types. The list is one of the simplest and most important data structures in Python. Lists are enclosed in square brackets [ ] and each item is separated by a comma. Lists are collections of items where each item in the list has an assigned index value. A list is mutable, meaning you can change its contents.


    Slicing

    nums = list(range(5))     # range is a built-in that yields integers; list() turns them into a list
    print(nums)               # Prints "[0, 1, 2, 3, 4]"
    print(nums[2:4])          # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
    print(nums[2:])           # Get a slice from index 2 to the end; prints "[2, 3, 4]"
    print(nums[:2])           # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
    print(nums[:])            # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
    print(nums[:-1])          # Slice indices can be negative; prints "[0, 1, 2, 3]"
    nums[2:4] = [8, 9]        # Assign a new sublist to a slice
    print(nums)               # Prints "[0, 1, 8, 9, 4]"
    

    Slicing

    In addition to accessing list elements one at a time, Python provides concise syntax to access sublists; this is known as slicing.

    A Python slice extracts elements, based on a start and stop. We take slices on many types in Python. We specify an optional first index, an optional last index, and an optional step.

    Example        Meaning
    values[1:3]    Index 1 up to, but not including, index 3.
    values[2:-1]   Index 2 up to, but not including, the last element.
    values[:2]     Start up to, but not including, index 2.
    values[2:]     Index 2 through end.
    values[::2]    Start through end, skipping ahead 2 places each time.
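
    The examples in the table can be tried on a concrete list. The contents of values below are arbitrary, and the last two lines add a negative step and a string slice as extra illustrations:

```python
values = [10, 20, 30, 40, 50]

print(values[1:3])    # Prints "[20, 30]"
print(values[2:-1])   # Prints "[30, 40]"
print(values[:2])     # Prints "[10, 20]"
print(values[2:])     # Prints "[30, 40, 50]"
print(values[::2])    # Prints "[10, 30, 50]"
print(values[::-1])   # A negative step walks backwards; prints "[50, 40, 30, 20, 10]"
print('python'[1:4])  # Strings can be sliced too; prints "yth"
```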

    Loops

    animals = ['cat', 'dog', 'monkey']
    for animal in animals:
        print(animal)
    # Prints "cat", "dog", "monkey", each on its own line.
    

    Enumerate

    animals = ['cat', 'dog', 'monkey']
    for idx, animal in enumerate(animals):
        print('#%d: %s' % (idx + 1, animal))
    # Prints "#1: cat", "#2: dog", "#3: monkey", each on its own line
    

    You can loop over the elements of a list with a for loop, as in the animals example in this section.

    for loops are traditionally used when you have a block of code which you want to repeat a fixed number of times.

    The Python for statement iterates over the members of a sequence in order, executing the block each time.

    Enumerate

    Python's enumerate function reduces visual clutter by hiding the index bookkeeping. It wraps the iterable in another iterable (an enumerate object) that yields two-item tuples of the index and the item that the original iterable would provide.
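
    To make the two-item tuples visible, the sketch below materializes the enumerate object as a list and also uses the optional start argument; the variable names pairs and numbered are ours:

```python
animals = ['cat', 'dog', 'monkey']

pairs = list(enumerate(animals))
print(pairs)   # Prints "[(0, 'cat'), (1, 'dog'), (2, 'monkey')]"

# The optional start argument shifts the first index
numbered = ['#%d: %s' % (idx, animal) for idx, animal in enumerate(animals, start=1)]
print(numbered)  # Prints "['#1: cat', '#2: dog', '#3: monkey']"
```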


    List Comprehension

    nums = [0, 1, 2, 3, 4]
    squares = []
    for x in nums:
        squares.append(x ** 2)
    print(squares)   # Prints [0, 1, 4, 9, 16]
    
    nums = [0, 1, 2, 3, 4]
    squares = [x ** 2 for x in nums]
    print(squares)   # Prints [0, 1, 4, 9, 16]
    
    # list comprehensions can also contain conditions
    
    nums = [0, 1, 2, 3, 4]
    even_squares = [x ** 2 for x in nums if x % 2 == 0]
    print(even_squares)  # Prints "[0, 4, 16]"
    

    List Comprehension

    When programming, we frequently want to transform one type of data into another. As a simple example, consider the code in this section that computes square numbers with a for loop.

    List comprehensions provide a concise way to create lists.

    A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The expression can be anything, meaning you can put all kinds of objects in lists.

    The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.

    You can make this code simpler using a list comprehension.
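
    Comprehensions can also combine several clauses. The sketch below, using made-up colors and sizes lists, shows two for clauses and then a for clause with an if clause applied to a non-numeric expression:

```python
colors = ['red', 'green']
sizes = ['S', 'M', 'L']

# Two for clauses: equivalent to a nested loop, outer loop first
combos = [(color, size) for color in colors for size in sizes]
print(combos)
# Prints "[('red', 'S'), ('red', 'M'), ('red', 'L'), ('green', 'S'), ('green', 'M'), ('green', 'L')]"

# A for clause followed by an if clause
shouted = [color.upper() for color in colors if color.startswith('g')]
print(shouted)  # Prints "['GREEN']"
```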


    Dictionaries

    d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
    print(d['cat'])       # Get an entry from a dictionary; prints "cute"
    print('cat' in d)     # Check if a dictionary has a given key; prints "True"
    d['fish'] = 'wet'     # Set an entry in a dictionary
    print(d['fish'])      # Prints "wet"
    # print(d['monkey'])  # KeyError: 'monkey' not a key of d
    print(d.get('monkey', 'N/A'))  # Get an element with a default; prints "N/A"
    print(d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"
    del d['fish']         # Remove an element from a dictionary
    print(d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"
    
    
    # It is easy to iterate over the keys in a dictionary
    
    d = {'person': 2, 'cat': 4, 'spider': 8}
    for animal in d:
        legs = d[animal]
        print('A %s has %d legs' % (animal, legs))
    # Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"
    
    

    Dictionary Comprehension

    # If you want access to keys and their corresponding values, use the items method:
    
    d = {'person': 2, 'cat': 4, 'spider': 8}
    for animal, legs in d.items():
        print('A %s has %d legs' % (animal, legs))
    # Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"
    
    # Dictionary comprehension
    
    nums = [0, 1, 2, 3, 4]
    even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
    print(even_num_to_square)  # Prints "{0: 0, 2: 4, 4: 16}"
    

    Dictionaries

    Dictionaries (or dict in Python) are a way of storing elements, just like a Python list. But rather than accessing an element by its index, you assign a fixed key to it and access the element using that key.

    What you now deal with is a "key-value" pair, which is sometimes a more appropriate data structure for many problems instead of a simple list.

    You will often have to deal with dictionaries when doing data science, which makes dictionary comprehension a skill that you will want to master.

    It is important to remember that a key has to be unique in a dictionary; no duplicates are allowed. However, in the case of duplicate keys, Python does not raise an error: it keeps the last value assigned to the key and simply discards the earlier key-value pair.
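
    A quick sketch of the duplicate-key behavior, with arbitrary example values:

```python
d = {'cat': 'cute', 'dog': 'furry', 'cat': 'grumpy'}  # 'cat' appears twice
print(d)         # Prints "{'cat': 'grumpy', 'dog': 'furry'}"
print(len(d))    # Prints "2"

d['dog'] = 'loyal'   # Assigning to an existing key overwrites its value
print(d['dog'])      # Prints "loyal"
```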

    Dictionary Comprehension

    Dictionary comprehension is a method for transforming one dictionary into another dictionary. During this transformation, items within the original dictionary can be conditionally included in the new dictionary and each item can be transformed as needed.

    A good comprehension can make your code more expressive and thus easier to read. The key to creating comprehensions is to not let them get so complex that your head spins when you try to decipher what they are actually doing. Keep the idea of "easy to read" alive.

    The way to do dictionary comprehension in Python is to be able to access the key objects and the value objects of a dictionary.
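
    As a sketch of transforming one dictionary into another, the snippet below iterates over items() to reach both keys and values. The names leg_pairs and by_count are ours, chosen for this example:

```python
legs = {'person': 2, 'cat': 4, 'spider': 8}

# Keep only animals with more than two legs and convert leg counts to pairs of legs
leg_pairs = {animal: count // 2 for animal, count in legs.items() if count > 2}
print(leg_pairs)  # Prints "{'cat': 2, 'spider': 4}"

# Swap keys and values (safe here because the values are unique)
by_count = {count: animal for animal, count in legs.items()}
print(by_count)   # Prints "{2: 'person', 4: 'cat', 8: 'spider'}"
```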

    You can find all you need to know about dictionaries in the documentation.


    Sets

    animals = {'cat', 'dog'}
    print('cat' in animals)   # Check if an element is in a set; prints "True"
    print('fish' in animals)  # prints "False"
    animals.add('fish')       # Add an element to a set
    print('fish' in animals)  # Prints "True"
    print(len(animals))       # Number of elements in a set; prints "3"
    animals.add('cat')        # Adding an element that is already in the set does nothing
    print(len(animals))       # Prints "3"
    animals.remove('cat')     # Remove an element from a set
    print(len(animals))       # Prints "2"
    

    Sets

    The built-in set type represents an unordered collection of unique elements.

    Common uses include membership testing, removing duplicates from a sequence, and computing standard math operations on sets such as intersection, union, difference, and symmetric difference.

    Like other collections, sets support x in set, len(set), and for x in set. Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior.
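
    The math operations mentioned above map onto operators. In this sketch the results are passed through sorted() only because sets have no guaranteed display order; the sets a and b are arbitrary:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(sorted(a | b))   # Union; prints "[1, 2, 3, 4, 5]"
print(sorted(a & b))   # Intersection; prints "[3, 4]"
print(sorted(a - b))   # Difference; prints "[1, 2]"
print(sorted(a ^ b))   # Symmetric difference; prints "[1, 2, 5]"

# Removing duplicates from a sequence (the original order is not preserved)
unique = set(['cat', 'dog', 'cat'])
print(len(unique))     # Prints "2"
```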


    Tuples

    d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
    t = (5, 6)        # Create a tuple
    print(type(t))    # Prints "<class 'tuple'>"
    print(d[t])       # Prints "5"
    print(d[(1, 2)])  # Prints "1"
    
    

    Tuples

    A tuple is an immutable sequence of Python objects. Tuples are sequences, just like lists. The differences between tuples and lists are that tuples cannot be changed, unlike lists, and that tuples use parentheses, whereas lists use square brackets.

    Creating a tuple is as simple as writing comma-separated values. Optionally, you can put these comma-separated values between parentheses.
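
    A short sketch of tuple creation and immutability follows; the names point, single, and distances are illustrative only:

```python
point = 3, 4          # Parentheses are optional when packing a tuple
single = (5,)         # A one-element tuple needs a trailing comma
x, y = point          # Unpacking assigns each element to a name
print(type(point), x, y)  # Prints "<class 'tuple'> 3 4"
print(len(single))        # Prints "1"

try:
    point[0] = 10     # Tuples are immutable, so item assignment fails
except TypeError as err:
    print('TypeError:', err)

# Unlike lists, tuples of hashable items can be used as dictionary keys
distances = {(0, 0): 0.0, point: 5.0}
print(distances[(3, 4)])  # Prints "5.0"
```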


    Functions

    def sign(x):
        if x > 0:
            return 'positive'
        elif x < 0:
            return 'negative'
        else:
            return 'zero'
    
    for x in [-1, 0, 1]:
        print(sign(x))
    # Prints "negative", "zero", "positive"
    

    Function Arguments

    def hello(name, loud=False):
        if loud:
            print('HELLO, %s!' % name.upper())
        else:
            print('Hello, %s' % name)
    
    hello('Bob') # Prints "Hello, Bob"
    hello('Fred', loud=True)  # Prints "HELLO, FRED!"
    

    A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide better modularity for your application and a high degree of code reuse.

    As you already know, Python gives you many built-in functions like print(), etc. but you can also create your own functions. These functions are called user-defined functions.

    Defining a function only gives it a name, specifies the parameters that are to be included in the function and structures the blocks of code.

    Once the basic structure of a function is finalized, you can execute it by calling it from another function or directly from the Python prompt.

    Function Arguments

    We will often define functions to take optional keyword arguments.

    You can call a function by using the "required", "keyword", "default" and "variable-length" types of formal arguments.
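
    The four kinds of arguments can appear together in one signature. The describe function below is a hypothetical example written for this guide, not part of any library:

```python
def describe(name, greeting='Hello', *hobbies, **details):
    """name is required, greeting has a default, hobbies and details are variable-length."""
    parts = ['%s, %s' % (greeting, name)]
    if hobbies:
        parts.append('hobbies: ' + ', '.join(hobbies))
    for key, value in details.items():
        parts.append('%s=%s' % (key, value))
    return '; '.join(parts)

print(describe('Bob'))                          # Prints "Hello, Bob"
print(describe(name='Bob', greeting='Hi'))      # Prints "Hi, Bob"
print(describe('Bob', 'Hi', 'chess', 'golf'))   # Prints "Hi, Bob; hobbies: chess, golf"
print(describe('Bob', city='Paris'))            # Prints "Hello, Bob; city=Paris"
```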


    NumPy

    import numpy as np
    
    a = np.array([1, 2, 3])   # Create a rank 1 array
    print(type(a))            # Prints "<class 'numpy.ndarray'>"
    print(a.shape)            # Prints "(3,)"
    print(a[0], a[1], a[2])   # Prints "1 2 3"
    a[0] = 5                  # Change an element of the array
    print(a)                  # Prints "[5 2 3]"
    
    b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array
    print(b.shape)                     # Prints "(2, 3)"
    print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"
    

    Some NumPy Functions

    import numpy as np
    
    a = np.zeros((2,2))   # Create an array of all zeros
    print(a)              # Prints "[[0. 0.]
                          #          [0. 0.]]"
    
    b = np.ones((1,2))    # Create an array of all ones
    print(b)              # Prints "[[1. 1.]]"
    
    c = np.full((2,2), 7)  # Create a constant array
    print(c)               # Prints "[[7 7]
                           #          [7 7]]"
    
    e = np.random.random((2,2))  # Create an array filled with random values
    print(e)                     # Might print "[[0.91940167 0.08143941]
                                 #               [0.68744134 0.87236687]]"
    

    NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. If you are already familiar with MATLAB, you might find this tutorial useful to get started with NumPy.

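    It contains, among other things:

    a powerful N-dimensional array object

    sophisticated (broadcasting) functions

    tools for integrating C/C++ and Fortran code

    useful linear algebra, Fourier transform, and random number capabilities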

    Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

    NumPy Array

    A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

    We can initialize numpy arrays from nested Python lists, and access elements using square brackets.

    NumPy also provides many functions to create arrays.
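
    As a sketch of how rank and shape relate, the snippet below inspects an array built from a nested list and two arrays built with creation functions. np.eye and np.arange are standard NumPy functions that the code samples in this guide do not otherwise use:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # Built from a nested Python list
print(a.ndim)     # Number of dimensions (rank); prints "2"
print(a.shape)    # Size along each dimension; prints "(2, 3)"
print(a.size)     # Total number of elements; prints "6"
print(a[1, 2])    # Indexed by a tuple of integers; prints "6"

i = np.eye(2)             # 2x2 identity matrix
r = np.arange(6)          # Integers 0 through 5
print(i.shape, r.shape)   # Prints "(2, 2) (6,)"
```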


    Mixing integer indexing with slice indexing

    import numpy as np
    
    # Create the following rank 2 array with shape (3, 4)
    # [[ 1  2  3  4]
    #  [ 5  6  7  8]
    #  [ 9 10 11 12]]
    a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
    
    # Two ways of accessing the data in the middle row of the array.
    row_r1 = a[1, :]    # Rank 1 view of the second row of a
    row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
    print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
    print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"
    
    

    Array Indexing

    NumPy offers several ways to index into arrays. We study slicing here.

    Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:
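
    For instance, the sketch below takes one slice per dimension of a 3x4 array (the name b is ours). It also shows that a slice is a view into the same data, so writing to it changes the original array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# First 2 rows, columns 1 and 2
b = a[:2, 1:3]
print(b)          # [[2 3]
                  #  [6 7]]

# A slice is a view into the same data, so modifying it modifies the original
b[0, 0] = 77
print(a[0, 1])    # Prints "77"
```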

    You can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array.

    Note that this is quite different from the way that MATLAB handles array slicing.

    For extensive details on numpy-slicing, please refer to this documentation.


    Datatypes

    import numpy as np
    
    x = np.array([1, 2])   # Let numpy choose the datatype
    print(x.dtype)         # Prints "int64"
    
    x = np.array([1.0, 2.0])   # Let numpy choose the datatype
    print(x.dtype)             # Prints "float64"
    
    x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
    print(x.dtype)                         # Prints "int64"
    

    NumPy Datatypes

    NumPy supports a much greater variety of numerical types than Python does. This section shows which are available, and how to modify an array’s data-type.

    The types most relevant to us are listed in this table:

    Data type   Description
    bool_       Boolean (True or False) stored as a byte
    int8        Byte (-128 to 127)
    int16       Integer (-32768 to 32767)
    int32       Integer (-2147483648 to 2147483647)
    uint8       Unsigned integer (0 to 255)
    float16     Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
    float32     Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
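
    One common way to change an array's data type is the astype method, which returns a converted copy. The sketch below uses arbitrary values and also checks the int8 range from the table with np.iinfo:

```python
import numpy as np

x = np.array([1.7, 2.2, 3.9])
print(x.dtype)               # Prints "float64"

as_int = x.astype(np.int32)  # astype returns a converted copy; fractions are truncated
print(as_int, as_int.dtype)  # Prints "[1 2 3] int32"

small = np.array([0, 128, 255], dtype=np.uint8)
print(small.itemsize)        # Bytes per element; prints "1"
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # Prints "-128 127"
```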

    Array Math

    import numpy as np
    
    x = np.array([[1,2],[3,4]], dtype=np.float64)
    y = np.array([[5,6],[7,8]], dtype=np.float64)
    
    print(np.add(x, y)) # or print(x + y)
    # [[ 6.0  8.0]
    #  [10.0 12.0]]
    
    print(np.subtract(x, y)) # or print(x - y)
    # [[-4.0 -4.0]
    #  [-4.0 -4.0]]
    
    print(np.multiply(x, y)) # or print(x * y)
    # [[ 5.0 12.0]
    #  [21.0 32.0]]
    
    print(np.divide(x, y)) # or print(x / y)
    # [[ 0.2         0.33333333]
    #  [ 0.42857143  0.5       ]]
    
    print(np.sqrt(x))
    # [[ 1.          1.41421356]
    #  [ 1.73205081  2.        ]]
    
    

    Dot function

    import numpy as np
    
    x = np.array([[1,2],[3,4]])
    y = np.array([[5,6],[7,8]])
    
    v = np.array([9,10])
    w = np.array([11, 12])
    
    # Inner product of vectors; both produce 219
    print(v.dot(w))
    print(np.dot(v, w))
    
    # Matrix / vector product; both produce the rank 1 array [29 67]
    print(x.dot(v))
    print(np.dot(x, v))
    
    # Matrix / matrix product; both produce the rank 2 array
    # [[19 22]
    #  [43 50]]
    print(x.dot(y))
    print(np.dot(x, y))
    

    Sum along an axis

    import numpy as np
    
    x = np.array([[1,2],[3,4]])
    
    print(np.sum(x))  # Compute sum of all elements; prints "10"
    print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
    print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"
    

    Array Math

    Basic mathematical functions operate elementwise on arrays and are available both as operator overloads and as functions in the Numpy module.

    You can perform operations like add, subtract, multiply, divide, etc. directly on the arrays.

    Dot Function

    Note that unlike MATLAB, * is elementwise multiplication, not matrix multiplication.

    We instead use the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects:
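
    To make the contrast concrete, this sketch applies * and dot to the same pair of matrices. It also shows the @ operator, which is standard Python syntax for matrix multiplication although this guide does not otherwise use it:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

elementwise = x * y      # Multiplies matching entries
matrix = np.dot(x, y)    # Rows of x times columns of y

print(elementwise)       # [[ 5 12]
                         #  [21 32]]
print(matrix)            # [[19 22]
                         #  [43 50]]
print(np.array_equal(x @ y, matrix))  # @ is equivalent to dot here; prints "True"
```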

    NumPy provides many useful functions for performing computations on arrays; one of the most useful is sum.

    Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays.

    The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object.
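
    A minimal sketch of transposing with the T attribute, plus a reshape for comparison; the arrays are arbitrary examples:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
print(x.shape)      # Prints "(2, 3)"
print(x.T)          # [[1 4]
                    #  [2 5]
                    #  [3 6]]
print(x.T.shape)    # Prints "(3, 2)"

v = np.array([1, 2, 3])
print(v.T.shape)    # Transposing a rank 1 array does nothing; prints "(3,)"

m = x.reshape(3, 2)  # reshape keeps the element order but changes the shape
print(m)             # [[1 2]
                     #  [3 4]
                     #  [5 6]]
```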

    Sum along the axes

    This is a very important concept, especially from the point of view of Machine Learning. You will often come across situations where you want to sum matrices "along an axis". Let's dig into a little detail to understand how it works:

    NumPy displays a 3D array of shape (2, 3, 5) as 2 blocks of 3x5 arrays (3 rows, 5 columns); you can also call them 'planes'. (MATLAB would show it as 5 blocks of 2x3.)

    The NumPy display matches a nested list: a list of two sublists, each containing 3 sublists, each of which is 5 elements long.

    In the 2D 3x5 case, axis 0 sums along the size-3 dimension, resulting in a 5-element array. Descriptions such as 'sum over rows' or 'sum along columns' are a little vague in English. Focus on the results, the change in shape, and which values are being summed, not on the description.

    In this 3D case, with the array holding the values 0 through 29 in order:

    With axis=0, it sums along the 1st dimension, effectively removing it and leaving us with a 3x5 array: 0+15=15, 1+16=17, etc.

    Axis 1 condenses the size-3 dimension; the result is 2x5: 0+5+10=15, etc.

    Axis 2 condenses the size-5 dimension; the result is 2x3: sum((0,1,2,3,4)) = 10, etc.
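
    The arithmetic above can be checked directly. This sketch assumes the example array is np.arange(30).reshape(2, 3, 5), which is what the quoted sums imply; the names s0, s1, and s2 are ours:

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)   # Assumed example array: values 0..29

s0 = x.sum(axis=0)   # Collapses the size-2 dimension
s1 = x.sum(axis=1)   # Collapses the size-3 dimension
s2 = x.sum(axis=2)   # Collapses the size-5 dimension

print(s0.shape, s0[0, 0], s0[0, 1])   # Prints "(3, 5) 15 17"
print(s1.shape, s1[0, 0])             # Prints "(2, 5) 15"
print(s2.shape, s2[0, 0])             # Prints "(2, 3) 10"
```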


    Matplotlib

    Matplotlib - Plotting Graphs

    import numpy as np
    import matplotlib.pyplot as plt
    
    # Compute the x and y coordinates for points on a sine curve
    x = np.arange(0, 3 * np.pi, 0.1)
    y = np.sin(x)
    
    # Plot the points using matplotlib
    plt.plot(x, y)
    plt.show()  # You must call plt.show() to make graphics appear.
    

    Matplotlib is a plotting library. In this section we give a very brief introduction to the matplotlib.pyplot module, which provides a plotting system similar to that of MATLAB.

    Plotting

    The most important function in matplotlib is plot, which allows you to plot 2D data. Here is a simple example:

    [Figure: graph of the sine curve produced by the code]


    Images

    import numpy as np
    import matplotlib.pyplot as plt
    
    # Uncomment the line below if you're on a notebook
    # %matplotlib inline
    # scipy.misc.imread has been removed from SciPy, so we read the file
    # with plt.imread instead (reading JPEG files requires Pillow).
    img = plt.imread('assets/cat.jpg')
    img_tinted = img * [1, 0.95, 0.9]
    
    # Show the original image
    plt.subplot(1, 2, 1)
    plt.imshow(img)
    
    # Show the tinted image
    plt.subplot(1, 2, 2)
    
    # A slight gotcha with imshow is that it might give strange results
    # if presented with data that is not uint8. To work around this, we
    # explicitly cast the image to uint8 before displaying it.
    plt.imshow(np.uint8(img_tinted))
    plt.show()
    

    Once you're done executing your ML program, the first thing you'd want to do is to view your results.

    You can use the imshow() function to show images.

    Most probably you will be working in an IPython or Jupyter notebook (or Colab). In that case, imshow() may not display anything unless inline plotting is enabled.

    Before any plotting is performed, execute the %matplotlib inline magic command. This performs the necessary behind-the-scenes setup for IPython/Jupyter/Colab to work correctly hand in hand with matplotlib.

    Here is an example:

    [Figure: cats]