Having taken some of the best pieces out of MATLAB, the Python community turned its attention to R, the other behemoth language of data science. Key to the functionality of R is its concept of the data frame, and the Python package pandas emerged to challenge R in this arena. The pandas data frame has proven extremely adept at data ingestion and manipulation, especially of time series data, and has now been integrated into multiple packages, making for an easy end-to-end data analytics and machine learning experience.
It is in machine learning that Python has really separated itself from the rest of the pack. Taking a leaf out of R's book, the scikit-learn module was built to mimic the functionality of the R package caret. Scikit-learn offers a plethora of algorithms and data manipulation features which make many of the routine tasks of data science simple and intuitive. It is a fantastic example of how powerful the Pythonic approach to building libraries can be.
2.3 Anaconda Python
2.3.1 Why Use Anaconda?
When you first pick up this book, it may be tempting to run off and download Python to start playing with some examples (your machine may even have Python preinstalled). However, this is unlikely to be a good move in the long term. Many core Python libraries are highly interdependent and can require a good deal of setting up, which can be a skill in itself. The process also differs between operating systems (Windows installations can be particularly tricky for the uninitiated), and you can easily find yourself spending a good deal of time just installing packages, which is not why you picked up this book in the first place, is it?
Anaconda Python offers an alternative. It provides one-click (or one-command) installation of Python packages, including all of their dependencies. For those of you who do not like the command line at all, it even has a graphical user interface (GUI) for controlling the installation and updating of packages. For the time being, I will not go down that route; instead, I will assume that you have a basic understanding of the command line interface.
2.3.2 Downloading and Installing Anaconda Python
Detailed installation instructions are available on the Anaconda website (https://conda.io/docs/user-guide/install/index.html). For the rest of this chapter, I will assume that you are using macOS; if you are not, do not worry, as other operating systems are covered on the website as well.
The first step is to download the installer from the Anaconda website (https://www.anaconda.com/download/#macos).
When you go to the website, you will see that there are two options: Anaconda and Miniconda. Miniconda is a bare-bones installation containing Python, conda, and only the small set of packages they depend on. This can be useful if you want a very lean installation (for example, if you are building a Docker image, or your computer does not have much space for programmes), but for now we will assume that space is not a problem and use the full Anaconda installation, which comes with many packages preinstalled.
You can select the Python 2 or Python 3 version. If you are running a lot of older code, you might want to use the Python 2 version, as Python 2 and Python 3 code do not always play well together. If you are working from a clean slate, however, I recommend the Python 3 installation, as this “future-proofs” you somewhat against libraries which make the switch and no longer support Python 2 (the reverse is much rarer now).
So long as you have chosen the Anaconda version (not Miniconda), you can just double-click the .pkg file and the installation will begin. Once installation has finished (unless you have specific reasons, accept the defaults during installation), you should be able to run:
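As a small illustration of why Python 2 and Python 3 code can clash, the same expression can give different answers. The sketch below runs under Python 3; the comments note the Python 2 behaviour, where dividing two integers with `/` truncated the result (so `7 / 2` gave `3`). The helper name `division_results` is purely illustrative.

```python
import sys

def division_results(a, b):
    """Return (true division, floor division) for two numbers."""
    # In Python 3, "/" is always true division.
    # In Python 2, int / int truncated, so old code may rely on that behaviour.
    return a / b, a // b

# Stop early under Python 2, where "/" on integers would silently mean floor division.
if sys.version_info[0] < 3:
    raise RuntimeError("This example assumes Python 3")

true_div, floor_div = division_results(7, 2)
print(true_div, floor_div)
```

Using `//` whenever floor division is intended keeps code behaving the same way in both versions.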
$> conda list
If the installation is successful, a list of installed packages will be printed to screen.
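`conda list` reports every package in the active environment, including non-Python libraries. If you want a similar check from inside Python, the standard library's `importlib.metadata` (Python 3.8 and later) can list the Python distributions that the running interpreter can see. This is a rough analogue rather than a replacement: it will not show conda-only packages such as compiled system libraries, and the helper name `installed_packages` is hypothetical.

```python
import sys
from importlib import metadata

def installed_packages():
    """Return a sorted list of (name, version) pairs for visible Python distributions."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with broken metadata
            found[name.lower()] = version
    return sorted(found.items())

# Show which interpreter is running, then the first few packages it can import.
print("Interpreter:", sys.executable)
for name, version in installed_packages()[:10]:
    print(f"{name:30} {version}")
```

If the interpreter path printed here is not inside your Anaconda directory, you are probably running a different Python installation.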
But I already have Python installed on my computer; do I need to uninstall it? No. Anaconda can run alongside any other versions of Python, including any installed by the system. To make sure that Anaconda is the one being used, the system needs to know where to find it, which is controlled by the PATH environment variable. To see whether Anaconda is in your path, run the following command in a Terminal:
$> echo $PATH
To check that Anaconda is set as the default Python, run:
$> which python
NB: the PATH variable should be set by the Anaconda installer, so there is normally no need to do anything.
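The same checks can be made from Python itself. The sketch below splits `PATH` into its directories in search order, reports which `python` executable the shell would find first, and flags entries whose path mentions "conda". Matching on the directory name is a heuristic assumption: it catches default locations such as `~/anaconda3/bin` or `~/miniconda3/bin`, but not an installation placed in a custom-named folder. The helper names are illustrative.

```python
import os
import shutil
import sys

def path_entries():
    """Split the PATH environment variable into its directories, in search order."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]

def conda_like_entries(entries):
    """Heuristic: keep the entries whose path mentions 'conda' (anaconda, miniconda)."""
    return [p for p in entries if "conda" in p.lower()]

entries = path_entries()
print("First python on PATH:", shutil.which("python"))  # like `which python`
print("Running interpreter: ", sys.executable)
for p in conda_like_entries(entries):
    print("conda-like PATH entry:", p)
```

Because the shell searches `PATH` from left to right, an Anaconda directory near the front of the list is what makes its Python the default.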
From here, installing packages is easy. First, search for your package on Anaconda Cloud (https://anaconda.org/), where you can choose the package you want. For example, scikit-learn's page is at https://anaconda.org/anaconda/scikit-learn. Each page gives the command for installing the package; for scikit-learn, it looks like this:
$> conda install -c anaconda scikit-learn
Here, the -c flag specifies a channel for the conda installer to search when locating the package binaries to install. Usefully, the package page also shows all the operating systems for which the package has been built, so you can be sure that a binary exists for your system.
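If you script the setup of your environment, the channel is just another argument to the command. The sketch below builds a `conda install` command with an optional `-c` channel and only runs it with `subprocess` when asked to. The helpers `conda_install_cmd` and `conda_install` are hypothetical; `-y` (proceed without asking for confirmation) is a standard conda option, but check `conda install --help` for your version.

```python
import shutil
import subprocess

def conda_install_cmd(package, channel=None, assume_yes=True):
    """Build the argument list for `conda install [-c CHANNEL] [-y] PACKAGE`."""
    cmd = ["conda", "install"]
    if channel:
        cmd += ["-c", channel]  # search this channel for the package binaries
    if assume_yes:
        cmd.append("-y")  # do not stop to ask for confirmation
    cmd.append(package)
    return cmd

def conda_install(package, channel=None, dry_run=True):
    """Print the command (dry run, or conda not found) or actually run it."""
    cmd = conda_install_cmd(package, channel)
    if dry_run or shutil.which("conda") is None:
        print("Would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)  # raises if conda reports an error

conda_install("scikit-learn", channel="anaconda")
```

Passing the command as a list rather than a single string avoids shell-quoting problems with package names and channels.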
Task: Install Anaconda and use the instructions below to install TensorFlow on your system.
2.3.2.1 Installing TensorFlow
2.3.2.1.1 Without GPU
Neural network training can be significantly accelerated through the use of graphics processing units (GPUs); however, they are not strictly necessary. When using smaller architectures and/or working with small amounts of data, a typical central processing unit (CPU) will be sufficient. As such, GPU acceleration will not be required for many of the tasks in this book. Installing TensorFlow without GPU support involves a single conda install command:
$> conda install -c conda-forge tensorflow
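Once the command finishes, a quick check confirms that TensorFlow imports and can run a small computation on the CPU. This sketch assumes TensorFlow 2.x with its default eager execution; the function name `tf_smoke_test` is illustrative, and it returns `None` rather than failing if TensorFlow is not installed in the current environment.

```python
import importlib.util

def tf_smoke_test():
    """Import TensorFlow and multiply two small matrices; return None if TF is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    print("TensorFlow version:", tf.__version__)
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0], [1.0]])
    return tf.matmul(a, b).numpy()  # eager tensor converted to a NumPy array

result = tf_smoke_test()
print(result)
```

With TensorFlow installed, `result` should be a 2×1 array holding the row sums 3 and 7.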
2.3.2.1.2 With GPU
To make use of TensorFlow's GPU acceleration, you will need to ensure that you have a compute unified device architecture (CUDA)-capable GPU and all of the required drivers installed on your system. More information on setting up your system for GPU support can be found here: https://www.tensorflow.org/install/gpu
If you are using Linux, you can greatly simplify the configuration process by using the TensorFlow Docker image with GPU support: https://www.tensorflow.org/install/docker
Once you have the prerequisites installed, you can install TensorFlow via:
$> pip install tensorflow
We recommend sticking to conda install commands to ensure package compatibility with your conda environment. Note also that a few earlier examples made use of TensorFlow 1's low-level application programming interface (API) to illustrate lower-level concepts. For compatibility, this earlier low-level API can be used by including the following at the top of your script:
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()
This has been included in the code examples wherever necessary.
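For readers who have not used the TensorFlow 1 style, here is a minimal sketch of what this compatibility mode looks like: operations only build a graph, and nothing is computed until a `Session` runs it with concrete values fed into a placeholder. It assumes TensorFlow 2.x is installed and uses the `tensorflow.compat.v1` import described above; the function name `v1_matmul` is illustrative, and the check for a missing TensorFlow is only there so the snippet fails gracefully.

```python
import importlib.util

HAVE_TF = importlib.util.find_spec("tensorflow") is not None

if HAVE_TF:
    # Compatibility lines, placed before any graph is built.
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()  # switch to TensorFlow 1-style graph mode

def v1_matmul(rows):
    """Feed `rows` (a list of [x1, x2] pairs) through a graph computing x @ w."""
    if not HAVE_TF:
        return None
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, shape=(None, 2), name="x")
        w = tf.constant([[1.0], [2.0]], name="w")
        y = tf.matmul(x, w)  # adds a node to the graph; nothing is computed yet
    with tf.Session(graph=graph) as sess:
        return sess.run(y, feed_dict={x: rows})  # the computation happens here

print(v1_matmul([[3.0, 4.0], [1.0, 1.0]]))
```

The separation between building the graph and running it in a session is the main conceptual difference from TensorFlow 2's default eager mode, where `tf.matmul` returns a result immediately.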
Note: with recent versions of TensorFlow, the pip install tensorflow command above installs TensorFlow with both CPU and GPU (if available) support.
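To confirm whether the installed TensorFlow can actually see a GPU, you can list the physical devices it has detected. This sketch uses `tf.config.list_physical_devices`, available in TensorFlow 2.1 and later; an empty list means TensorFlow will run on the CPU only, which is fine for many of the tasks in this book. The helper name `visible_gpus` is illustrative.

```python
import importlib.util

def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    return [d.name for d in tf.config.list_physical_devices("GPU")]

gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPUs available:", gpus)
else:
    print("No GPU found; TensorFlow will run on the CPU")
```

If you expected a GPU and the list is empty, the usual suspects are missing CUDA drivers or a CPU-only build of TensorFlow.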
Jupyter notebooks provide a way to easily create interactive documents capable of hosting and running Python code. This is a great way to work for many scientific applications, allowing you to incorporate descriptive text alongside executable code and helping others to understand and reproduce your work. In this way, they are an excellent tool for collaboration, and can be used to build living documents in which code can be updated and visualisations easily rerun with new data. To use Jupyter notebooks, you will first need to install Jupyter by running: