python_1.ipynb - cs107-lecture-examples - Example codes used during Harvard CS107 lectures

      1 {
      2  "cells": [
      3   {
      4    "cell_type": "markdown",
      5    "metadata": {},
      6    "source": [
      7     "# Introductory Python"
      8    ]
      9   },
     10   {
     11    "cell_type": "markdown",
     12    "metadata": {},
     13    "source": [
     14     "The main topic for today's lecture is Python and some of its basic\n",
     15     "functionality.  We will cover the basics of \n",
     16     "\n",
     17     "* using Python as a calculator\n",
     18     "* the `print` function\n",
     19     "* the list concept\n",
     20     "* opening and reading from files\n",
     21     "* dictionaries\n",
     22     "* strings\n",
     23     "\n",
     24     "I will show you some very basic examples and you will put them all together in a\n",
     25     "small script for your exercise.  The exercise is displayed at the top of this\n",
     26     "notebook.  If you already know how to do it, then just write up your script now.\n",
     27     "However, you may need some guidance.  You will find such guidance throughout the\n",
     28     "rest of the notebook."
     29    ]
     30   },
     31   {
     32    "cell_type": "markdown",
     33    "metadata": {},
     34    "source": [
     35     "## Important, Useful Libraries"
     36    ]
     37   },
     38   {
     39    "cell_type": "markdown",
     40    "metadata": {},
     41    "source": [
     42     "You should always try to use existing technologies to accomplish your goals\n",
     43     "whenever possible.  For example, don't write your own function to compute the\n",
     44     "square root of a number.  That would be really hard and your implementation\n",
     45     "would most likely not be very efficient.  Instead, use built-in functionality or\n",
     46     "functionality from a nice library such as `numpy`\n",
     47     "([NUMericalPYthon](http://www.numpy.org/)).\n",
     48     "\n",
     49     "> NumPy is the fundamental package for scientific computing with Python. It\n",
     50     "> contains among other things:\n",
     51     ">\n",
     52     "> * a powerful N-dimensional array object \n",
     53     "> * sophisticated (broadcasting) functions \n",
     54     "> * tools for integrating C/C++ and Fortran code \n",
     55     "> * useful linear algebra, Fourier transform, and random number capabilities \n",
     56     ">\n",
     57     "> Besides its obvious scientific uses, NumPy can also be used as an efficient\n",
     58     "> multi-dimensional container of generic data. Arbitrary data-types can be\n",
     59     "> defined. This allows NumPy to seamlessly and speedily integrate with a wide\n",
     60     "> variety of databases.\n",
     61     "\n",
     62     "To import libraries into your Python application, do the following:"
     63    ]
     64   },
     65   {
     66    "cell_type": "code",
     67    "execution_count": 1,
     68    "metadata": {
     69     "collapsed": true
     70    },
     71    "outputs": [],
     72    "source": [
     73     "# The %... syntax is an IPython \"magic\" command and is not part of the Python language.\n",
     74     "# Here we are just telling the plotting library to draw figures in\n",
     75     "# the notebook instead of in a separate window.\n",
     76     "%matplotlib inline \n",
     77     "# the line above prepares the notebook for working with matplotlib\n",
     78     "\n",
     79     "import numpy as np # imports a fast numerical programming library\n",
     80     "import scipy as sp # scientific computing library (statistics live in scipy.stats)\n",
     81     "import matplotlib as mpl # this actually imports matplotlib\n",
     82     "import matplotlib.cm as cm #allows us easy access to colormaps\n",
     83     "import matplotlib.pyplot as plt #sets up plotting under plt\n",
     84     "import pandas as pd #lets us handle data as dataframes\n",
     85     "#sets up pandas table display\n",
     86     "pd.set_option('display.width', 500)\n",
     87     "pd.set_option('display.max_columns', 100)\n",
     88     "pd.set_option('display.notebook_repr_html', True)"
     89    ]
     90   },
     91   {
     92    "cell_type": "markdown",
     93    "metadata": {},
     94    "source": [
     95     "The way to understand these imports is as follows: _import the library `library`\n",
     96     "with the alias `lib`_ where `library` could be `numpy` or `matplotlib` or\n",
     97     "whatever you want and `lib` is the alias used to refer to that library in our\n",
     98     "code.  With these aliases, we can call functions like `plt.plot()` instead of\n",
     99     "`matplotlib.pyplot.plot()`.  It makes life easier."
    100    ]
    101   },
    102   {
    103    "cell_type": "markdown",
    104    "metadata": {},
    105    "source": [
    106     "**NOTE:** It is not necessary to import _all_ of these libraries all of the\n",
    107     "time.  You should only import the ones you really need.  I listed a bunch above\n",
    108     "to give you a sampling of what's available.\n",
    109     "\n",
    110     "**NOTE:** DO NOT include `%matplotlib inline` in your Python scripts unless\n",
    111     "you're working in the Jupyter notebook."
    112    ]
    113   },
    114   {
    115    "cell_type": "markdown",
    116    "metadata": {},
    117    "source": [
    118     "At the end of this course, someone should be able to `import\n",
    119     "your_kinetics_library` to use the kinetics library that you are about to start\n",
    120     "writing."
    121    ]
    122   },
    123   {
    124    "cell_type": "markdown",
    125    "metadata": {},
    126    "source": [
    127     "## The Very Basics"
    128    ]
    129   },
    130   {
    131    "cell_type": "markdown",
    132    "metadata": {},
    133    "source": [
    134     "We'll fly through this part because you should already know it.  If you don't\n",
    135     "understand something, please Google it and/or refer to the [Python\n",
    136     "Tutorial](https://docs.python.org/3/tutorial/).  I do not want to recreate the\n",
    137     "Python tutorial here; instead, I'll just summarize a few important ideas from\n",
    138     "Python.  We'll give more details a little later on how some of these language\n",
    139     "features work.\n",
    140     "\n",
    141     "Another very helpful resource that explains the basics below (and a few additional\n",
    142     "topics) can be found here:\n",
    143     "[https://learnxinyminutes.com/docs/python/](https://learnxinyminutes.com/docs/python/)."
    144    ]
    145   },
    146   {
    147    "cell_type": "markdown",
    148    "metadata": {},
    149    "source": [
    150     "### Calculating"
    151    ]
    152   },
    153   {
    154    "cell_type": "markdown",
    155    "metadata": {},
    156    "source": [
    157     "We can tell the type of a number or variable by using the `type` function."
    158    ]
    159   },
    160   {
    161    "cell_type": "code",
    162    "execution_count": 2,
    163    "metadata": {},
    164    "outputs": [
    165     {
    166      "data": {
    167       "text/plain": [
    168        "(int, float)"
    169       ]
    170      },
    171      "execution_count": 2,
    172      "metadata": {},
    173      "output_type": "execute_result"
    174     }
    175    ],
    176    "source": [
    177     "type(3), type(3.0)"
    178    ]
    179   },
    180   {
    181    "cell_type": "markdown",
    182    "metadata": {},
    183    "source": [
    184     "Remember, every value in Python has a type.  Python is a strongly typed\n",
    185     "language.  It is also a dynamic language, in the sense that types are determined at\n",
    186     "run time rather than at \"compile\" time, as in a language like C.  This tends to make\n",
    187     "Python slower: when the program starts, the interpreter does not know what a\n",
    188     "variable will point to, so it cannot choose an optimal storage layout up front."
    189    ]
    190   },
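          {
           "cell_type": "markdown",
           "metadata": {},
           "source": [
            "As a small sketch of what \"dynamic\" and \"strongly typed\" mean in practice (the\n",
            "variable name `value` is only illustrative and is not used elsewhere in this\n",
            "notebook): a name can be rebound to objects of different types at run time, but\n",
            "Python will not silently combine incompatible types."
           ]
          },
          {
           "cell_type": "code",
           "execution_count": null,
           "metadata": {},
           "outputs": [],
           "source": [
            "value = 3            # the name `value` refers to an int\n",
            "print(type(value))\n",
            "value = 'three'      # the same name now refers to a str\n",
            "print(type(value))\n",
            "try:\n",
            "    value + 1        # adding a str and an int is not allowed\n",
            "except TypeError as err:\n",
            "    print('TypeError:', err)"
           ]
          },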
    191   {
    192    "cell_type": "markdown",
    193    "metadata": {},
    194    "source": [
    195     "All the usual calculations can be done in Python."
    196    ]
    197   },
    198   {
    199    "cell_type": "code",
    200    "execution_count": 3,
    201    "metadata": {},
    202    "outputs": [
    203     {
    204      "data": {
    205       "text/plain": [
    206        "6.0"
    207       ]
    208      },
    209      "execution_count": 3,
    210      "metadata": {},
    211      "output_type": "execute_result"
    212     }
    213    ],
    214    "source": [
    215     "2.0 + 4.0 # Adding two floats"
    216    ]
    217   },
    218   {
    219    "cell_type": "code",
    220    "execution_count": 4,
    221    "metadata": {},
    222    "outputs": [
    223     {
    224      "data": {
    225       "text/plain": [
    226        "6"
    227       ]
    228      },
    229      "execution_count": 4,
    230      "metadata": {},
    231      "output_type": "execute_result"
    232     }
    233    ],
    234    "source": [
    235     "2 + 4     # Adding two ints"
    236    ]
    237   },
    238   {
    239    "cell_type": "code",
    240    "execution_count": 5,
    241    "metadata": {},
    242    "outputs": [
    243     {
    244      "data": {
    245       "text/plain": [
    246        "0.3333333333333333"
    247       ]
    248      },
    249      "execution_count": 5,
    250      "metadata": {},
    251      "output_type": "execute_result"
    252     }
    253    ],
    254    "source": [
    255     "1.0 / 3.0 # Dividing two floats"
    256    ]
    257   },
    258   {
    259    "cell_type": "code",
    260    "execution_count": 6,
    261    "metadata": {},
    262    "outputs": [
    263     {
    264      "data": {
    265       "text/plain": [
    266        "0.3333333333333333"
    267       ]
    268      },
    269      "execution_count": 6,
    270      "metadata": {},
    271      "output_type": "execute_result"
    272     }
    273    ],
    274    "source": [
    275     "1 / 3     # Dividing two ints"
    276    ]
    277   },
    278   {
    279    "cell_type": "markdown",
    280    "metadata": {},
    281    "source": [
    282     "Note that in Python 2, dividing two ints performed integer division rather than\n",
    283     "returning a float.  True division with `/` is new in Python 3!  Now, if you want\n",
    284     "integer (floor) division you have to use the `//` operator."
    285    ]
    286   },
    287   {
    288    "cell_type": "code",
    289    "execution_count": 7,
    290    "metadata": {},
    291    "outputs": [
    292     {
    293      "data": {
    294       "text/plain": [
    295        "0"
    296       ]
    297      },
    298      "execution_count": 7,
    299      "metadata": {},
    300      "output_type": "execute_result"
    301     }
    302    ],
    303    "source": [
    304     "1 // 3    # Integer division"
    305    ]
    306   },
    307   {
    308    "cell_type": "code",
    309    "execution_count": 8,
    310    "metadata": {},
    311    "outputs": [
    312     {
    313      "data": {
    314       "text/plain": [
    315        "32"
    316       ]
    317      },
    318      "execution_count": 8,
    319      "metadata": {},
    320      "output_type": "execute_result"
    321     }
    322    ],
    323    "source": [
    324     "2**5      # Powers"
    325    ]
    326   },
    327   {
    328    "cell_type": "code",
    329    "execution_count": 9,
    330    "metadata": {},
    331    "outputs": [
    332     {
    333      "data": {
    334       "text/plain": [
    335        "15"
    336       ]
    337      },
    338      "execution_count": 9,
    339      "metadata": {},
    340      "output_type": "execute_result"
    341     }
    342    ],
    343    "source": [
    344     "3 * 5     # Multiplication"
    345    ]
    346   },
    347   {
    348    "cell_type": "markdown",
    349    "metadata": {},
    350    "source": [
    351     "#### More advanced operations\n",
    352     "\n",
    353     "We can use `numpy` to do some more advanced operations."
    354    ]
    355   },
    356   {
    357    "cell_type": "code",
    358    "execution_count": 10,
    359    "metadata": {},
    360    "outputs": [
    361     {
    362      "data": {
    363       "text/plain": [
    364        "13.974998513319154"
    365       ]
    366      },
    367      "execution_count": 10,
    368      "metadata": {},
    369      "output_type": "execute_result"
    370     }
    371    ],
    372    "source": [
    373     "np.pi * np.exp(2.0) + np.tanh(1.0) - np.sqrt(100.0)"
    374    ]
    375   },
    376   {
    377    "cell_type": "markdown",
    378    "metadata": {},
    379    "source": [
    380     "Notice that I am always writing my floats with a decimal point.  You don't\n",
    381     "really need to do that because Python and NumPy will automatically convert\n",
    382     "ints to floats where needed.  For example:"
    383    ]
    384   },
    385   {
    386    "cell_type": "code",
    387    "execution_count": 11,
    388    "metadata": {},
    389    "outputs": [
    390     {
    391      "data": {
    392       "text/plain": [
    393        "(numpy.float64, numpy.float64)"
    394       ]
    395      },
    396      "execution_count": 11,
    397      "metadata": {},
    398      "output_type": "execute_result"
    399     }
    400    ],
    401    "source": [
    402     "type(np.pi * np.exp(2.0) + np.tanh(1.0) - np.sqrt(100.0)), type(np.pi * np.exp(2) + np.tanh(1) - np.sqrt(100))"
    403    ]
    404   },
    405   {
    406    "cell_type": "markdown",
    407    "metadata": {},
    408    "source": [
    409     "However, I like to make the types as explicit as I can so there's no confusion."
    410    ]
    411   },
    412   {
    413    "cell_type": "markdown",
    414    "metadata": {},
    415    "source": [
    416     "### `print`"
    417    ]
    418   },
    419   {
    420    "cell_type": "markdown",
    421    "metadata": {},
    422    "source": [
    423     "The `print` function is the basic way to write information out to the screen.  I\n",
    424     "will briefly review the new form of the `print` function.  In Python 2, `print`\n",
    425     "was a `statement` rather than a `function`."
    426    ]
    427   },
    428   {
    429    "cell_type": "code",
    430    "execution_count": 12,
    431    "metadata": {},
    432    "outputs": [
    433     {
    434      "name": "stdout",
    435      "output_type": "stream",
    436      "text": [
    437       "Good morning!  Today we are doing Python!\n",
    438       "3.0\n",
    439       "3.141592653589793 is a nice, transcendental number\n",
    440       "Eric is nice and so is Sarah\n",
    441       "  3.1415926535897931...: it goes on forever but 3 is just an int.\n"
    442      ]
    443     }
    444    ],
    445    "source": [
    446     "print('Good morning!  Today we are doing Python!')                                  # Basic print\n",
    447     "print(3.0)                                                                          # Print a float\n",
    448     "print('{} is a nice, transcendental number'.format(np.pi))                          # Print just one number\n",
    449     "print('{} is nice and so is {}'.format('Eric', 'Sarah'))                            # Print with two arguments\n",
    450     "print('{0:20.16f}...: it goes on forever but {1} is just an int.'.format(np.pi, 3)) # Print with formatting in argument 0"
    451    ]
    452   },
    453   {
    454    "cell_type": "markdown",
    455    "metadata": {},
    456    "source": [
    457     "Here are some additional resources for the `print` function and formatting:\n",
    458     "* [7. Input and Output](https://docs.python.org/3/tutorial/inputoutput.html)\n",
    459     "* [Formatted Output](https://www.python-course.eu/python3_formatted_output.php)\n",
    460     "* [`Print` function](https://docs.python.org/3/library/functions.html#print)"
    461    ]
    462   },
    463   {
    464    "cell_type": "markdown",
    465    "metadata": {},
    466    "source": [
    467     "### Variables"
    468    ]
    469   },
    470   {
    471    "cell_type": "markdown",
    472    "metadata": {},
    473    "source": [
    474     "We'll have more to say about variables in Python later.  For now, here's the\n",
    475     "syntax for assigning them:"
    476    ]
    477   },
    478   {
    479    "cell_type": "code",
    480    "execution_count": 13,
    481    "metadata": {},
    482    "outputs": [
    483     {
    484      "name": "stdout",
    485      "output_type": "stream",
    486      "text": [
    487       "1.0x^2 + -1.0x + -1.0 = 0.0\n"
    488      ]
    489     }
    490    ],
    491    "source": [
    492     "a = 1.0\n",
    493     "b = -1.0\n",
    494     "c = -1.0\n",
    495     "x = (1.0 + np.sqrt(5.0)) / 2.0\n",
    496     "val = a * x**2.0 + b * x + c\n",
    497     "print('{0}x^2 + {1}x + {2} = {3}'.format(a, b, c, val))"
    498    ]
    499   },
    500   {
    501    "cell_type": "markdown",
    502    "metadata": {},
    503    "source": [
    504     "Python has this nice feature where you can assign more than one variable all on\n",
    505     "one line.  It's called the multiple assignment statement."
    506    ]
    507   },
    508   {
    509    "cell_type": "code",
    510    "execution_count": 14,
    511    "metadata": {},
    512    "outputs": [
    513     {
    514      "name": "stdout",
    515      "output_type": "stream",
    516      "text": [
    517       "1.0x^2 + -1.0x + -1.0 = 0.0\n"
    518      ]
    519     }
    520    ],
    521    "source": [
    522     "a, b, c = 1.0, -1.0, -1.0\n",
    523     "x = (1.0 + np.sqrt(5.0)) / 2.0\n",
    524     "val = a * x**2.0 + b * x + c\n",
    525     "print('{0}x^2 + {1}x + {2} = {3}'.format(a, b, c, val))"
    526    ]
    527   },
    528   {
    529    "cell_type": "markdown",
    530    "metadata": {},
    531    "source": [
    532     "Looks a little cleaner now."
    533    ]
    534   },
    535   {
    536    "cell_type": "markdown",
    537    "metadata": {},
    538    "source": [
    539     "### Lists and `for` loops"
    540    ]
    541   },
    542   {
    543    "cell_type": "markdown",
    544    "metadata": {},
    545    "source": [
    546     "Lists are central to Python.  Many things behave like lists.  For now, we'll\n",
    547     "just look at how to create them and do basic operations with them.  I will not\n",
    548     "go through all the details.  Please refer to\n",
    549     "[Lists](https://docs.python.org/3/tutorial/introduction.html#lists) for\n",
    550     "additional examples."
    551    ]
    552   },
    553   {
    554    "cell_type": "code",
    555    "execution_count": 15,
    556    "metadata": {},
    557    "outputs": [
    558     {
    559      "name": "stdout",
    560      "output_type": "stream",
    561      "text": [
    562       "First few primes are: [2, 3, 5, 7, 11, 13]\n",
    563       "Here are the primes up to the number 20: [2, 3, 5, 7, 11, 13, 17, 19]\n"
    564      ]
    565     }
    566    ],
    567    "source": [
    568     "primes = [2, 3, 5, 7, 11, 13]     # A list of primes\n",
    569     "more_primes = primes + [17, 19]   # List concatenation\n",
    570     "print('First few primes are: {primes}'.format(primes=primes))\n",
    571     "print('Here are the primes up to the number 20: {}'.format(more_primes))"
    572    ]
    573   },
    574   {
    575    "cell_type": "markdown",
    576    "metadata": {},
    577    "source": [
    578     "Notice that Python knows the type of `primes`."
    579    ]
    580   },
    581   {
    582    "cell_type": "code",
    583    "execution_count": 16,
    584    "metadata": {},
    585    "outputs": [
    586     {
    587      "name": "stdout",
    588      "output_type": "stream",
    589      "text": [
    590       "primes is of type <class 'list'>\n"
    591      ]
    592     }
    593    ],
    594    "source": [
    595     "print('primes is of type {}'.format(type(primes)))"
    596    ]
    597   },
    598   {
    599    "cell_type": "markdown",
    600    "metadata": {},
    601    "source": [
    602     "The `len` function can provide the number of elements in the list."
    603    ]
    604   },
    605   {
    606    "cell_type": "code",
    607    "execution_count": 17,
    608    "metadata": {},
    609    "outputs": [
    610     {
    611      "name": "stdout",
    612      "output_type": "stream",
    613      "text": [
    614       "There are 8 prime numbers less than or equal to 20.\n"
    615      ]
    616     }
    617    ],
    618    "source": [
    619     "print('There are {} prime numbers less than or equal to 20.'.format(len(more_primes)))"
    620    ]
    621   },
    622   {
    623    "cell_type": "markdown",
    624    "metadata": {},
    625    "source": [
    626     "Now that we know what a list is, we can discuss `for` loops in Python.  The\n",
    627     "`for` loop iterates over an iterable such as a list.  For example:"
    628    ]
    629   },
    630   {
    631    "cell_type": "code",
    632    "execution_count": 18,
    633    "metadata": {},
    634    "outputs": [
    635     {
    636      "name": "stdout",
    637      "output_type": "stream",
    638      "text": [
    639       "2\n",
    640       "3\n",
    641       "5\n",
    642       "7\n",
    643       "11\n",
    644       "13\n",
    645       "17\n",
    646       "19\n"
    647      ]
    648     }
    649    ],
    650    "source": [
    651     "for p in more_primes:\n",
    652     "    print(p)"
    653    ]
    654   },
    655   {
    656    "cell_type": "markdown",
    657    "metadata": {},
    658    "source": [
    659     "A useful iterable (but not a list!) is the object returned by `range`."
    660    ]
    661   },
    662   {
    663    "cell_type": "code",
    664    "execution_count": 19,
    665    "metadata": {},
    666    "outputs": [
    667     {
    668      "name": "stdout",
    669      "output_type": "stream",
    670      "text": [
    671       "range(0, 10)\n",
    672       "<class 'range'>\n"
    673      ]
    674     }
    675    ],
    676    "source": [
    677     "print(range(10))\n",
    678     "print(type(range(10)))"
    679    ]
    680   },
    681   {
    682    "cell_type": "markdown",
    683    "metadata": {},
    684    "source": [
    685     "It's not a list anymore (it used to be in Python 2); it is a lazy sequence that\n",
    686     "produces its values on demand.  It can still be indexed and sliced, but slicing\n",
    687     "returns another `range` rather than a list.  It finds most of its use in `for` loops."
    688    ]
    689   },
    690   {
    691    "cell_type": "code",
    692    "execution_count": 20,
    693    "metadata": {},
    694    "outputs": [
    695     {
    696      "name": "stdout",
    697      "output_type": "stream",
    698      "text": [
    699       "0\n",
    700       "1\n",
    701       "2\n",
    702       "3\n",
    703       "4\n",
    704       "5\n",
    705       "6\n",
    706       "7\n",
    707       "8\n",
    708       "9\n"
    709      ]
    710     }
    711    ],
    712    "source": [
    713     "for n in range(10):\n",
    714     "    print(n)"
    715    ]
    716   },
    717   {
    718    "cell_type": "markdown",
    719    "metadata": {},
    720    "source": [
    721     "There is something called a _list comprehension_ in Python.  A list comprehension\n",
    722     "is a concise way to build a new list by transforming the elements of an iterable."
    723    ]
    724   },
    725   {
    726    "cell_type": "code",
    727    "execution_count": 21,
    728    "metadata": {},
    729    "outputs": [
    730     {
    731      "name": "stdout",
    732      "output_type": "stream",
    733      "text": [
    734       "The new list is [0, 1, 1, 2, 3, 4, 5, 6]\n"
    735      ]
    736     }
    737    ],
    738    "source": [
    739     "not_all_primes = [p // 3 for p in more_primes]\n",
    740     "print('The new list is {}'.format(not_all_primes))"
    741    ]
    742   },
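          {
           "cell_type": "markdown",
           "metadata": {},
           "source": [
            "A comprehension can also filter elements with a trailing `if` clause.  Here is a\n",
            "minimal sketch, using a made-up list `numbers` that is not part of the lecture\n",
            "data:"
           ]
          },
          {
           "cell_type": "code",
           "execution_count": null,
           "metadata": {},
           "outputs": [],
           "source": [
            "numbers = [1, 2, 3, 4, 5, 6]                              # illustrative data\n",
            "squares_of_evens = [n**2 for n in numbers if n % 2 == 0]  # keep evens, then square\n",
            "print(squares_of_evens)                                   # [4, 16, 36]"
           ]
          },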
    743   {
    744    "cell_type": "markdown",
    745    "metadata": {},
    746    "source": [
    747     "We can also count how many times each element occurs in the list.  There are a number of\n",
    748     "ways of doing this, but one convenient way is to use the `collections` module."
    749    ]
    750   },
    751   {
    752    "cell_type": "code",
    753    "execution_count": 22,
    754    "metadata": {},
    755    "outputs": [
    756     {
    757      "name": "stdout",
    758      "output_type": "stream",
    759      "text": [
    760       "Counter({1: 2, 0: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1})\n",
    761       "<class 'collections.Counter'>\n"
    762      ]
    763     }
    764    ],
    765    "source": [
    766     "import collections\n",
    767     "how_many = collections.Counter(not_all_primes)\n",
    768     "print(how_many)\n",
    769     "print(type(how_many))"
    770    ]
    771   },
    772   {
    773    "cell_type": "markdown",
    774    "metadata": {},
    775    "source": [
    776     "We see that there are 2 ones, 1 two, 1 three, etc.\n",
    777     "\n",
    778     "We can even get a list of the elements together with their counts, ordered from\n",
    779     "most common to least common."
    780    ]
    781   },
    782   {
    783    "cell_type": "code",
    784    "execution_count": 23,
    785    "metadata": {},
    786    "outputs": [
    787     {
    788      "name": "stdout",
    789      "output_type": "stream",
    790      "text": [
    791       "[(1, 2), (0, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)]\n",
    792       "<class 'list'>\n"
    793      ]
    794     }
    795    ],
    796    "source": [
    797     "how_many_list = how_many.most_common()\n",
    798     "print(how_many_list)\n",
    799     "print(type(how_many_list))"
    800    ]
    801   },
    802   {
    803    "cell_type": "markdown",
    804    "metadata": {},
    805    "source": [
    806     "We see that the result is a list of tuples with the most common element of our\n",
    807     "original list (`not_all_primes`) displayed first.  We want the most common\n",
    808     "element of our original list, so we just access the first element using a simple\n",
    809     "index."
    810    ]
    811   },
    812   {
    813    "cell_type": "code",
    814    "execution_count": 24,
    815    "metadata": {},
    816    "outputs": [
    817     {
    818      "name": "stdout",
    819      "output_type": "stream",
    820      "text": [
    821       "(1, 2)\n",
    822       "<class 'tuple'>\n"
    823      ]
    824     }
    825    ],
    826    "source": [
    827     "most_common = how_many_list[0]\n",
    828     "print(most_common)\n",
    829     "print(type(most_common))"
    830    ]
    831   },
    832   {
    833    "cell_type": "markdown",
    834    "metadata": {},
    835    "source": [
    836     "We're almost there.  We recall the first element of this tuple is the value from\n",
    837     "our original list and the second element in the tuple is its frequency.  We're\n",
    838     "finally ready to get our result!"
    839    ]
    840   },
    841   {
    842    "cell_type": "code",
    843    "execution_count": 25,
    844    "metadata": {},
    845    "outputs": [
    846     {
    847      "name": "stdout",
    848      "output_type": "stream",
    849      "text": [
    850       "The number 1 is the most common value in our list.\n",
    851       "It occurs 2 times.\n"
    852      ]
    853     }
    854    ],
    855    "source": [
    856     "print('The number {} is the most common value in our list.'.format(most_common[0]))\n",
    857     "print('It occurs {} times.'.format(most_common[1]))"
    858    ]
    859   },
    860   {
    861    "cell_type": "markdown",
    862    "metadata": {},
    863    "source": [
    864     "List indexing is also very important.  It can also do much more than what we did\n",
    865     "above."
    866    ]
    867   },
    868   {
    869    "cell_type": "code",
    870    "execution_count": 26,
    871    "metadata": {},
    872    "outputs": [
    873     {
    874      "name": "stdout",
    875      "output_type": "stream",
    876      "text": [
    877       "5\n",
    878       "[5, 7, 11]\n",
    879       "13\n",
    880       "[7, 11, 13]\n"
    881      ]
    882     }
    883    ],
    884    "source": [
    885     "print(primes[2])   # print the 3rd entry \n",
    886     "print(primes[2:5]) # print the 3rd to 5th entries\n",
    887     "print(primes[-1])  # print the last entry\n",
    888     "print(primes[-3:]) # print the last three entries"
    889    ]
    890   },
    891   {
    892    "cell_type": "markdown",
    893    "metadata": {},
    894    "source": [
    895     "Other types of slices and indexing can be done as well.  I leave it to you to\n",
    896     "look this up as you need it.  It is a **very** useful thing to know."
    897    ]
    898   },
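          {
           "cell_type": "markdown",
           "metadata": {},
           "source": [
            "For reference, here is a brief sketch of a few other common slice forms.  The\n",
            "list `letters` is purely illustrative and is not used elsewhere in this notebook."
           ]
          },
          {
           "cell_type": "code",
           "execution_count": null,
           "metadata": {},
           "outputs": [],
           "source": [
            "letters = ['a', 'b', 'c', 'd', 'e', 'f']\n",
            "print(letters[:2])    # first two entries: ['a', 'b']\n",
            "print(letters[::2])   # every second entry: ['a', 'c', 'e']\n",
            "print(letters[::-1])  # reversed copy: ['f', 'e', 'd', 'c', 'b', 'a']\n",
            "print(letters[1:-1])  # drop the first and last entries: ['b', 'c', 'd', 'e']"
           ]
          },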
    899   {
    900    "cell_type": "markdown",
    901    "metadata": {},
    902    "source": [
    903     "Two convenient built-in functions are `enumerate` and `zip`.  You may find\n",
    904     "various uses for them.\n",
    905     "\n",
    906     "* `enumerate` gives a representation of a list of tuples with each tuple of the\n",
    907     "  form `(index, value)`.  This provides an easy way to access the `index` of the\n",
    908     "  value in the `list`.\n",
    909     "* `zip` takes elements from each list and puts them together into a\n",
    910     "  representation of a list of tuples.  This provides a nice way to aggregate\n",
    911     "  lists."
    912    ]
    913   },
    914   {
    915    "cell_type": "markdown",
    916    "metadata": {},
    917    "source": [
    918     "We'll make two lists for the following examples:"
    919    ]
    920   },
    921   {
    922    "cell_type": "code",
    923    "execution_count": 27,
    924    "metadata": {
    925     "collapsed": true
    926    },
    927    "outputs": [],
    928    "source": [
    929     "species = ['H2', 'O2', 'OH', 'H2O', 'H2O2']\n",
    930     "species_names = ['Hydrogen', 'Oxygen', 'Hydroxyl', 'Water', 'Hydrogen Peroxide']"
    931    ]
    932   },
    933   {
    934    "cell_type": "markdown",
    935    "metadata": {},
    936    "source": [
    937     "#### `enumerate` example"
    938    ]
    939   },
    940   {
    941    "cell_type": "code",
    942    "execution_count": 28,
    943    "metadata": {},
    944    "outputs": [
    945     {
    946      "name": "stdout",
    947      "output_type": "stream",
    948      "text": [
    949       "<enumerate object at 0x10adecb40>\n"
    950      ]
    951     }
    952    ],
    953    "source": [
    954     "print(enumerate(species)) "
    955    ]
    956   },
    957   {
    958    "cell_type": "markdown",
    959    "metadata": {},
    960    "source": [
    961     "Notice that `enumerate()` just returns an iterator object.  To actually see\n",
    962     "what's in the iterator object, we need to convert the iterator object to a list:"
    963    ]
    964   },
    965   {
    966    "cell_type": "code",
    967    "execution_count": 29,
    968    "metadata": {},
    969    "outputs": [
    970     {
    971      "name": "stdout",
    972      "output_type": "stream",
    973      "text": [
    974       "[(0, 'H2'), (1, 'O2'), (2, 'OH'), (3, 'H2O'), (4, 'H2O2')]\n"
    975      ]
    976     }
    977    ],
    978    "source": [
    979     "print(list(enumerate(species)))"
    980    ]
    981   },
    982   {
    983    "cell_type": "markdown",
    984    "metadata": {},
    985    "source": [
    986     "We see that we have a list of tuples (in the form `(index, value)` where `index`\n",
    987     "starts from 0).  Here's just one way that this might be used:"
    988    ]
    989   },
    990   {
    991    "cell_type": "code",
    992    "execution_count": 30,
    993    "metadata": {},
    994    "outputs": [
    995     {
    996      "name": "stdout",
    997      "output_type": "stream",
    998      "text": [
    999       "H2 is species 1\n",
   1000       "O2 is species 2\n",
   1001       "OH is species 3\n",
   1002       "H2O is species 4\n",
   1003       "H2O2 is species 5\n"
   1004      ]
   1005     }
   1006    ],
   1007    "source": [
   1008     "for i, s in enumerate(species):\n",
   1009     "    print('{species} is species {ind}'.format(species=s, ind=i+1))"
   1010    ]
   1011   },
   1012   {
   1013    "cell_type": "markdown",
   1014    "metadata": {},
   1015    "source": [
   1016     "What happened is that the `for` loop iterated over the iterable (here the\n",
   1017     "`enumerate` object), which yields one `(index, value)` tuple per iteration.\n",
   1018     "The first loop variable (`i`) is bound to the first entry of each tuple and the\n",
   1019     "second loop variable (`s`) is bound to the second entry."
   1020    ]
   1021   },
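The tuple unpacking described above can be made explicit with a small sketch. The names `pair`, `without_unpacking`, and `with_unpacking` are illustrative only and do not appear in the lecture; the `species` list is repeated so the block runs on its own.

```python
species = ['H2', 'O2', 'OH', 'H2O', 'H2O2']

# Without unpacking: each item produced by enumerate is one (index, value) tuple.
without_unpacking = []
for pair in enumerate(species):
    without_unpacking.append((pair[0], pair[1]))

# With unpacking: the for loop splits each tuple into two loop variables.
with_unpacking = []
for i, s in enumerate(species):
    with_unpacking.append((i, s))

# Both loops visit exactly the same (index, value) pairs.
print(without_unpacking == with_unpacking)
```

Writing `for i, s in ...` is simply a shorthand for pulling the two entries out of each tuple by hand.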
   1022   {
   1023    "cell_type": "markdown",
   1024    "metadata": {},
   1025    "source": [
   1026     "#### `zip` example"
   1027    ]
   1028   },
   1029   {
   1030    "cell_type": "markdown",
   1031    "metadata": {},
   1032    "source": [
   1033     "Let's see how `zip` works.  We'll aggregate the `species` and `species_names`\n",
   1034     "lists."
   1035    ]
   1036   },
   1037   {
   1038    "cell_type": "code",
   1039    "execution_count": 31,
   1040    "metadata": {},
   1041    "outputs": [
   1042     {
   1043      "name": "stdout",
   1044      "output_type": "stream",
   1045      "text": [
   1046       "<zip object at 0x10adfb108>\n",
   1047       "[('H2', 'Hydrogen'), ('O2', 'Oxygen'), ('OH', 'Hydroxyl'), ('H2O', 'Water'), ('H2O2', 'Hydrogen Peroxide')]\n"
   1048      ]
   1049     }
   1050    ],
   1051    "source": [
   1052     "print(zip(species, species_names))\n",
   1053     "print(list(zip(species, species_names)))"
   1054    ]
   1055   },
   1056   {
   1057    "cell_type": "code",
   1058    "execution_count": 32,
   1059    "metadata": {},
   1060    "outputs": [
   1061     {
   1062      "name": "stdout",
   1063      "output_type": "stream",
   1064      "text": [
   1065       "H2 is called Hydrogen\n",
   1066       "O2 is called Oxygen\n",
   1067       "OH is called Hydroxyl\n",
   1068       "H2O is called Water\n",
   1069       "H2O2 is called Hydrogen Peroxide\n"
   1070      ]
   1071     }
   1072    ],
   1073    "source": [
   1074     "for s, name in zip(species, species_names):\n",
   1075     "    print('{specie} is called {name}'.format(specie=s, name=name))"
   1076    ]
   1077   },
   1078   {
   1079    "cell_type": "markdown",
   1080    "metadata": {},
   1081    "source": [
   1082     "`zip` works much like `enumerate`: it yields one tuple per iteration, which the `for` loop unpacks into `s` and `name`."
   1083    ]
   1084   },
   1085   {
   1086    "cell_type": "markdown",
   1087    "metadata": {},
   1088    "source": [
   1089     "Finally, you will sometimes see `enumerate` and `zip` used together."
   1090    ]
   1091   },
   1092   {
   1093    "cell_type": "code",
   1094    "execution_count": 33,
   1095    "metadata": {},
   1096    "outputs": [
   1097     {
   1098      "name": "stdout",
   1099      "output_type": "stream",
   1100      "text": [
   1101       "Species 1 is H2 and it is called Hydrogen.\n",
   1102       "Species 2 is O2 and it is called Oxygen.\n",
   1103       "Species 3 is OH and it is called Hydroxyl.\n",
   1104       "Species 4 is H2O and it is called Water.\n",
   1105       "Species 5 is H2O2 and it is called Hydrogen Peroxide.\n"
   1106      ]
   1107     }
   1108    ],
   1109    "source": [
   1110     "for n, (s, name) in enumerate(zip(species, species_names), 1):\n",
   1111     "    print('Species {ind} is {specie} and it is called {name}.'.format(ind=n, specie=s, name=name))"
   1112    ]
   1113   },
   1114   {
   1115    "cell_type": "markdown",
   1116    "metadata": {},
   1117    "source": [
   1118     "### Opening Files"
   1119    ]
   1120   },
   1121   {
   1122    "cell_type": "markdown",
   1123    "metadata": {},
   1124    "source": [
   1125     "There are a variety of ways to open files in Python.  We'll see a bunch as the\n",
   1126     "semester progresses.  Today, we'll focus on opening and reading text files."
   1127    ]
   1128   },
   1129   {
   1130    "cell_type": "code",
   1131    "execution_count": 34,
   1132    "metadata": {
   1133     "collapsed": true
   1134    },
   1135    "outputs": [],
   1136    "source": [
   1137     "species_file = open(\"species.txt\")     # Open the file\n",
   1138     "species_text = species_file.read()     # Read the whole file into one string\n",
   1139     "species_tokens = species_text.split()  # Split the string on whitespace into a list\n",
   1140     "species_file.close()                   # Close the file!"
   1141    ]
   1142   },
   1143   {
   1144    "cell_type": "code",
   1145    "execution_count": 35,
   1146    "metadata": {},
   1147    "outputs": [
   1148     {
   1149      "name": "stdout",
   1150      "output_type": "stream",
   1151      "text": [
   1152       "['H2', 'O2', 'OH', 'H2O', 'H2O2']\n",
   1153       "<class 'list'>\n"
   1154      ]
   1155     }
   1156    ],
   1157    "source": [
   1158     "print(species_tokens)\n",
   1159     "print(type(species_tokens))"
   1160    ]
   1161   },
   1162   {
   1163    "cell_type": "markdown",
   1164    "metadata": {},
   1165    "source": [
   1166     "Notice that we get a list of strings."
   1167    ]
   1168   },
   1169   {
   1170    "cell_type": "markdown",
   1171    "metadata": {},
   1172    "source": [
   1173     "Here's a better way to open a file: the `with` statement.  The `close` operation\n",
   1174     "is handled automatically for us when the block ends, even if an error occurs inside it."
   1175    ]
   1176   },
   1177   {
   1178    "cell_type": "code",
   1179    "execution_count": 36,
   1180    "metadata": {
   1181     "collapsed": true
   1182    },
   1183    "outputs": [],
   1184    "source": [
   1185     "with open('species.txt') as species_file:\n",
   1186     "    species_text = species_file.read()\n",
   1187     "    species_tokens = species_text.split()"
   1188    ]
   1189   },
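To check that the `with` statement really closes the file, here is a minimal sketch. It assumes nothing about the real `species.txt`: it writes a small stand-in file (the hypothetical `species_demo.txt`) into a temporary directory so that it runs anywhere, and the variable names `open_inside` and `closed_after` are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Assumption: a stand-in for species.txt with whitespace-separated species.
    path = os.path.join(tmp_dir, 'species_demo.txt')
    with open(path, 'w') as out_file:
        out_file.write('H2 O2 OH\nH2O H2O2\n')

    with open(path) as species_file:
        open_inside = not species_file.closed    # the file is open inside the block
        species_tokens = species_file.read().split()

    closed_after = species_file.closed           # the file is closed once the block ends

print(open_inside, closed_after)
print(species_tokens)
```

The file object's `closed` attribute flips to `True` as soon as the block is left, without any explicit call to `close`.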
   1190   {
   1191    "cell_type": "markdown",
   1192    "metadata": {},
   1193    "source": [
   1194     "### Dictionaries"
   1195    ]
   1196   },
   1197   {
   1198    "cell_type": "markdown",
   1199    "metadata": {},
   1200    "source": [
   1201     "Dictionaries are extremely important in Python.  For particular details on\n",
   1202     "dictionaries refer to\n",
   1203     "[Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).\n",
   1204     "From that tutorial we have a few comments on dictionaries:\n",
   1205     "\n",
   1206     "> Unlike sequences, which are indexed by a range of numbers, dictionaries are\n",
   1207     "> indexed by keys, which can be any immutable type; strings and numbers can\n",
   1208     "> always be keys.\n",
   1209     ">\n",
   1210     "> It is best to think of a dictionary as an unordered set of key: value pairs,\n",
   1211     "> with the requirement that the keys are unique (within one dictionary). A pair\n",
   1212     "> of braces creates an empty dictionary: {}. Placing a comma-separated list of\n",
   1213     "> key:value pairs within the braces adds initial key:value pairs to the\n",
   1214     "> dictionary; this is also the way dictionaries are written on output.\n",
   1215     ">\n",
   1216     "> The main operations on a dictionary are storing a value with some key and\n",
   1217     "> extracting the value given the key."
   1218    ]
   1219   },
   1220   {
   1221    "cell_type": "markdown",
   1222    "metadata": {},
   1223    "source": [
   1224     "Let's create a chemical species dictionary."
   1225    ]
   1226   },
   1227   {
   1228    "cell_type": "code",
   1229    "execution_count": 37,
   1230    "metadata": {},
   1231    "outputs": [
   1232     {
   1233      "name": "stdout",
   1234      "output_type": "stream",
   1235      "text": [
   1236       "{'H2': 'Hydrogen', 'O2': 'Oxygen', 'OH': 'Hydroxyl', 'H2O': 'Water', 'H2O2': 'Hydrogen Peroxide'}\n"
   1237      ]
   1238     }
   1239    ],
   1240    "source": [
   1241     "species_dict = {'H2':'Hydrogen', 'O2':'Oxygen', 'OH':'Hydroxyl', 'H2O':'Water', 'H2O2':'Hydrogen Peroxide'}\n",
   1242     "print(species_dict)"
   1243    ]
   1244   },
   1245   {
   1246    "cell_type": "markdown",
   1247    "metadata": {},
   1248    "source": [
   1249     "The entries to the left of each colon are the keys and the entries to the right\n",
   1250     "of each colon are the values.  To access a value, we index the dictionary with its key."
   1251    ]
   1252   },
   1253   {
   1254    "cell_type": "code",
   1255    "execution_count": 38,
   1256    "metadata": {},
   1257    "outputs": [
   1258     {
   1259      "name": "stdout",
   1260      "output_type": "stream",
   1261      "text": [
   1262       "Hydrogen\n"
   1263      ]
   1264     }
   1265    ],
   1266    "source": [
   1267     "print(species_dict['H2'])"
   1268    ]
   1269   },
   1270   {
   1271    "cell_type": "markdown",
   1272    "metadata": {},
   1273    "source": [
   1274     "Pretty cool!\n",
   1275     "\n",
   1276     "Suppose we want to add another species to our dictionary.  No problem!"
   1277    ]
   1278   },
   1279   {
   1280    "cell_type": "code",
   1281    "execution_count": 39,
   1282    "metadata": {},
   1283    "outputs": [
   1284     {
   1285      "name": "stdout",
   1286      "output_type": "stream",
   1287      "text": [
   1288       "{'H2': 'Hydrogen', 'O2': 'Oxygen', 'OH': 'Hydroxyl', 'H2O': 'Water', 'H2O2': 'Hydrogen Peroxide', 'H': 'Atomic Hydrogen'}\n",
   1289       "Atomic Hydrogen\n"
   1290      ]
   1291     }
   1292    ],
   1293    "source": [
   1294     "species_dict['H'] = 'Atomic Hydrogen'\n",
   1295     "print(species_dict)\n",
   1296     "print(species_dict['H'])"
   1297    ]
   1298   },
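The `zip` function from earlier offers another way to build such a dictionary. This is only a sketch of one option, not the approach used in the cells above; the lists are repeated so the block is self-contained, and the loop variable `formula` is an illustrative name.

```python
species = ['H2', 'O2', 'OH', 'H2O', 'H2O2']
species_names = ['Hydrogen', 'Oxygen', 'Hydroxyl', 'Water', 'Hydrogen Peroxide']

# dict() accepts an iterable of (key, value) pairs, which is what zip produces.
species_dict = dict(zip(species, species_names))

# New entries are added exactly as before.
species_dict['H'] = 'Atomic Hydrogen'

# items() yields (key, value) tuples, unpacked here just like with enumerate and zip.
for formula, name in species_dict.items():
    print('{formula} is called {name}'.format(formula=formula, name=name))
```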
   1299   {
   1300    "cell_type": "markdown",
   1301    "metadata": {},
   1302    "source": [
   1303     "Why should we use dictionaries at all?  Clearly they're very convenient.  But\n",
   1304     "they're also fast: a key lookup takes roughly constant time on average.  See\n",
   1305     "[How to Think Like a Computer Scientist: Learning with Python 3, Chapter 20:\n",
   1306     "Dictionaries](http://openbookproject.net/thinkcs/python/english3e/dictionaries.html)\n",
   1307     "for a decent explanation."
   1308    ]
   1309   }
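The speed claim can be checked with a rough timing sketch. The data here is made up (100,000 hypothetical labels, not real species), and the absolute timings will vary by machine; the point is only the relative difference between searching a list and looking up a dictionary key.

```python
import timeit

# Hypothetical data: 100,000 made-up labels, not from the lecture.
labels = ['X{}'.format(i) for i in range(100000)]
label_list = labels
label_dict = {label: i for i, label in enumerate(labels)}

# Worst case for the list: the target is the last element.
target = labels[-1]

# A list membership test scans the elements one by one;
# a dictionary membership test hashes the key instead.
list_time = timeit.timeit(lambda: target in label_list, number=100)
dict_time = timeit.timeit(lambda: target in label_dict, number=100)

print('list lookup: {:.6f} s'.format(list_time))
print('dict lookup: {:.6f} s'.format(dict_time))
```

On typical machines the dictionary lookup should come out far faster, and the gap grows as the collection gets larger.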
   1310  ],
   1311  "metadata": {
   1312   "jupytext": {
   1313    "formats": "ipynb,md"
   1314   },
   1315   "kernelspec": {
   1316    "display_name": "Python 3",
   1317    "language": "python",
   1318    "name": "python3"
   1319   },
   1320   "language_info": {
   1321    "codemirror_mode": {
   1322     "name": "ipython",
   1323     "version": 3
   1324    },
   1325    "file_extension": ".py",
   1326    "mimetype": "text/x-python",
   1327    "name": "python",
   1328    "nbconvert_exporter": "python",
   1329    "pygments_lexer": "ipython3",
   1330    "version": "3.6.5"
   1331   }
   1332  },
   1333  "nbformat": 4,
   1334  "nbformat_minor": 2
   1335 }