Learning Biothings

A bioinformatic journey

Introduction to Deep Learning

The Artificial Neural Network

An artificial neural network (ANN) is a computing system made up of a number of simple, highly interconnected processing elements (neurons), which process information by their dynamic state response to external inputs [2]. A neuron receives a set of inputs. Each input is multiplied by the corresponding weight in that neuron, and the products are summed and passed through an activation function to provide the output of the neuron.
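The computation performed by a single neuron can be sketched in a few lines of plain Python. This is only an illustration: the sigmoid activation, the bias term and the numeric values are assumptions chosen for the example, not something fixed by the definition above.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    # The bias term is an assumption of this sketch (set to 0 by default).
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))

# Three inputs and three illustrative weights: 0.5 - 0.5 + 0.0 = 0.0
output = neuron_output([1.0, 2.0, 3.0], [0.5, -0.25, 0.0])
print(output)  # sigmoid(0.0) = 0.5
```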

ANNs are typically organized in layers of neurons. Data is received through the input layer, which communicates with the hidden layers. At the end, the output layer produces the results. Each neuron computes a unique function (thanks to its trainable weights) on its input data. Neurons from the input layer compute a function directly on the raw data. Neurons from the first hidden layer compute a function on the output of the input layer, and so on. When all neurons of a layer receive the input from all the neurons of the previous layer, that layer is known as a fully connected layer.

ANN Training

The goal of ANN training is to adjust the values of each neuron’s weights, such that the output of the ANN corresponds with the ground truth of the training data.
An ANN learns example by example, adjusting the weights based on the results obtained for each input data point, typically using the backpropagation algorithm. In this algorithm, the ANN processes an input (e.g., an image) through its various layers, until an output is produced (Forward phase). The output is then compared with the known label of the input (the ground truth), and the amount of error made by the ANN is computed using a loss function (e.g., cross-entropy, mean-squared error, etc.). Once the error is known, it is propagated back through the ANN, so that every neuron in the ANN ends up with an associated error value which roughly corresponds to its contribution to the output, and thus to the error of the prediction (Backpropagation phase). Using these values, each neuron calculates the gradient with respect to its weights (Weight update phase). This gradient is then used by an optimization method (e.g., stochastic gradient descent (SGD)) to obtain the updated weights for the neuron.
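To make the three phases concrete, here is a minimal sketch for the simplest possible case: a single neuron with one weight and no activation function, trained with a squared-error loss. The learning rate, the input and the target are illustrative assumptions, and a real ANN repeats the same idea for every weight of every layer.

```python
def train_step(w, x, target, learning_rate=0.1):
    """One forward / backward / update cycle for a neuron y = w * x."""
    prediction = w * x                      # forward phase
    error = prediction - target
    loss = error ** 2                       # squared-error loss
    gradient = 2 * error * x                # d(loss)/d(w), backward phase
    new_w = w - learning_rate * gradient    # weight update (gradient descent)
    return new_w, loss

w = 0.0
losses = []
for _ in range(20):
    w, loss = train_step(w, x=1.0, target=2.0)
    losses.append(loss)

# The weight approaches 2.0 and the loss shrinks at every step.
print(w, losses[0], losses[-1])
```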

ANN for Classification

To use an ANN to solve a classification problem, we can simply add a function to the output which transforms the output of the previous layer into probabilities of the various classes. The most popular function of that type is the softmax, which transforms a vector of arbitrary real values into a vector of the same length whose values are in the range (0, 1) and add up to 1.
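A minimal softmax can be written in plain Python as follows. Subtracting the maximum before exponentiating is a common numerical precaution that does not change the result; the input values are arbitrary.

```python
import math

def softmax(values):
    """Map arbitrary real values to probabilities that add up to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]   # shift avoids overflow
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)  # the largest input gets the largest probability
```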

There are many other applications of ANNs, which can be obtained just by modifying the last layers of the network (regression, segmentation, image reconstruction, etc.).

ANN Training Parameters

Instead of processing all input data at once to compute the loss and update the weights, it’s common to use a subset (i.e., a batch). Using batches produces less accurate estimates of the gradients, but it requires less memory and it is faster. Since ANNs require datasets with many instances, processing the data in batches is in most cases a necessity. The parameter batch_size specifies the size of the batch.

Since the training process is a slow convergence of weights, the same data can be used several times for adjusting the weights without producing overfitting. Each pass of the whole dataset through the ANN is called an epoch. The optimal number of epochs is the one at which the error converges. There are techniques to find it, such as early stopping.
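The relation between batches and epochs can be sketched with a plain Python loop. The dataset of ten dummy examples and the helper name iterate_batches are assumptions made for the illustration; the point is only to count how many weight updates take place.

```python
def iterate_batches(dataset, batch_size):
    """Yield consecutive slices of `dataset` with at most `batch_size` items."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

dataset = list(range(10))   # ten dummy training examples
epochs = 3
updates = 0
for epoch in range(epochs):
    for batch in iterate_batches(dataset, batch_size=4):
        updates += 1        # one weight update would happen here per batch

# 3 batches per epoch (4 + 4 + 2 examples) times 3 epochs
print(updates)
```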

Limitations of fully connected networks

Number of parameters (weights)

Fully connected layers contain a lot of parameters (one for every connection between neurons).

When the number of neurons per layer and the number of layers increase, the number of parameters explodes. This increases the computational cost of training fully connected ANNs, and the time required for convergence. It also makes it easier to end up overfitting.

E.g., given an input image of 1000×1000 pixels (10^6 inputs) and 1 million hidden units, the total number of parameters is 10^6 × 10^6 = 10^12.
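The arithmetic behind that figure is simply one weight per connection, as this short check shows (biases are ignored, as in the example above):

```python
inputs = 1000 * 1000      # one value per pixel of a 1000x1000 image
hidden_units = 1000000    # 1 million neurons in the hidden layer

# Every hidden unit is connected to every input.
fully_connected_weights = inputs * hidden_units
print(fully_connected_weights)
```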

Two dimensional Data

Certain data is naturally structured in 2D, using both horizontal and vertical axes. Examples of that are images, board games (e.g., chess, go), etc. Fully connected ANNs may need lots of data to find the complex non-linearities that characterize these domains. However, 2D data has certain properties which can be included in the design of the ANN, to guide and speed up the process.


The concept of convolution was created to exploit known properties of 2D data, so that more efficient ANNs could be designed and trained.

Basically, in 2D data we know that the vertical and horizontal dimensions are relevant for the characterization of the data (two consecutive pixels are more related than two non-consecutive ones). Hence, we could define neurons that look only at contiguous parts of the image (e.g., 3×3 squares). These filters would learn to recognize subpatterns of the whole input, instead of trying to find a pattern in all the input as a fully connected neuron does. Since the output of a neuron is one value, the output of the convolutional neuron will be the sum of the data within the 3×3 window, weighted by the weights learnt by the neuron. This allows us to define and use neurons which have very few parameters (i.e., weights) when compared to fully connected neurons, while also exploiting the 2D structure of the data.

At the same time, we also know that a 2D pattern that occurs in a certain part of the data may occur in another part with a similar meaning. Hence, we can use the same neuron that looks at 3×3 squares to look all over the image. That is known as weight sharing. In essence, it allows us to use the same neuron with a fixed input size (e.g., 3×3) to process the whole input. Reusing neurons this way allows us to define fewer neurons per layer.
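The idea of sliding one small set of shared weights over the whole input can be sketched in plain Python. This is a simplified version for a single channel and without padding; the 4×4 input and the hand-picked 3×3 filter are assumptions for the illustration (in a CNN the filter weights are learnt).

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding), reusing the same weights."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the k x k window whose top-left corner is (i, j).
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        output.append(row)
    return output

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]   # 9 weights in total; this filter copies the window centre

print(convolve2d(image, kernel))  # [[6, 7], [10, 11]]
```

Note that the filter has only 9 weights regardless of the size of the image.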


For a given data point, we may have more information than just one value. For example, in color images we have three channels, R, G and B. Convolution filters work in full depth, by considering all channels.

Hence, filters have weights for all depth layers. These are usually known as channels.

A given convolutional neuron of the first layer, for example with weights 3×3×3, produces a 2D representation of the data, which corresponds to what the neuron sees on all three channels. Padding is added to complete the border data points. Each neuron produces data which then becomes a channel for the next layer.


To gain invariance to small changes in location, and also to reduce the size of the network and increase its efficiency, a common approach is to perform pooling between convolutional layers. Pooling simply reduces each neighborhood of values to a single value (typically using a max or average function).

This process reduces the height and width of the data, but leaves the same number of channels.
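As an illustration, a 2×2 max pooling over one channel can be sketched as follows. The window size, the non-overlapping stride and the input values are assumptions of the example; in a real network the same operation is applied to every channel independently.

```python
def max_pool_2x2(channel):
    """Replace every non-overlapping 2x2 block by its maximum."""
    return [[max(channel[i][j], channel[i][j + 1],
                 channel[i + 1][j], channel[i + 1][j + 1])
             for j in range(0, len(channel[0]) - 1, 2)]
            for i in range(0, len(channel) - 1, 2)]

channel = [[1, 3, 2, 0],
           [4, 2, 1, 5],
           [0, 1, 9, 8],
           [2, 6, 7, 3]]
pooled = max_pool_2x2(channel)
print(pooled)  # [[4, 5], [6, 9]]: height and width are halved
```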

Typical Architectures

Most CNN architectures use the same basic scheme:
– convolution layer
– pooling layer
– convolution layer
– …
– fully connected layer
– softmax

This scheme has provided the best results in image recognition challenges (ImageNet), reaching accuracy above human level. Beyond that, there are many possible modifications which allow CNNs to adapt to different problems, such as:
– Removing the fully connected layers of the end, to avoid size restrictions on input (fully convolutional)
– Adding a deconvolution network at the end, for image generation
– Fixing parameters (transfer learning for fine tuning), or correlations (style transfer)

What do CNN filters learn to recognize?

Each filter learns weights which allow it to identify a given pattern in the input. The filters of a CNN go from the simple to the abstract through the various layers of the network, also thanks to pooling. Thus, the filters learnt vary in complexity depending on their location within the network.


[1] https://en.wikipedia.org/wiki/Artificial_neuron
[2] “Neural Network Primer: Part I” by Maureen Caudill, AI Expert, Feb. 1989
[3] http://colah.github.io/posts/2014-07-Understanding-Convolutions/





Tensorflow is an open source software library for numerical computation. It’s designed to work with Deep Learning (DL) and Machine Learning (ML) problems. However, the system is general enough to deal with problems from other domains as well.

It provides High Level APIs and Low Level APIs. The High Level APIs are useful to try already built models, but it is difficult to customize such models. On the other hand, by using the Low Level API the user has the freedom to easily build, train and run their own models.

Tensorflow Main Concepts

The main component of Tensorflow is the tensor. Everything in Tensorflow works around tensors.

A tensor consists of a set of primitive values shaped into an array of any number of dimensions. It has two main properties:

  • rank: number of dimensions
  • shape: tuple of integers specifying the array’s length along each dimension
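Both properties can be illustrated without Tensorflow by looking at nested Python lists. The helper shape_of is a hypothetical function written for this example and assumes regular, non-empty nesting.

```python
def shape_of(values):
    """Return the shape of a regularly nested list as a tuple of lengths."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0]
    return tuple(shape)

scalar = 3.0
vector = [1.0, 2.0, 3.0]
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

for tensor in (scalar, vector, matrix):
    shape = shape_of(tensor)
    # The rank is the number of dimensions, i.e. the length of the shape.
    print("rank:", len(shape), "shape:", shape)
```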

In a nutshell, any Tensorflow program consists of a series of operations that consume and produce tensors.

Tensorflow workflow is divided into two main sections/parts:

  1. Building a computational graph
  2. Running the computational graph


The graph is just a series of operations that work over tensors, arranged into a graph. The graph is composed of two types of objects:

  • Operations: Nodes of the graph. Consume and produce tensors
  • Tensors: Edges of the graph


There are different types of tensors:

  • tf.Variable: This is the only tensor that is not immutable (i.e., its value can change)
  • tf.constant: constant tensor
  • tf.placeholder: placeholder for a tensor that must always be fed. It will produce an error if it is evaluated without a value. It must be fed using the feed_dict optional argument when evaluated.
  • tf.SparseTensor: sparse tensor represented as three separate dense tensors: indices, values and dense_shape


There is a wide range of operations provided by Tensorflow:

  • add
  • subtract
  • multiply
  • negative
  • abs
  • conv2d
  • relu
  • max_pool

Although we can apply custom functions to the tensors, it is recommended to use the operations provided by the Tensorflow library. Tensorflow operations are implemented efficiently and take advantage of how the framework works, so that when we execute the graph the operations are arranged and run efficiently.

Building the graph

When building the graph, what we do is design what our model is going to do, which calculations it will perform, etc. It’s like a blueprint: we are just designing the workflow of our model, not running it.

Thus, we don’t need values at designing time.

As an example we could design a graph that computes the equation y = 2x² + 3.

The tensorflow code to build that graph is:

import tensorflow as tf

two = tf.constant(2,tf.float32,name='2')
three = tf.constant(3,tf.float32,name='3')
x = tf.placeholder(tf.float32,shape=(),name='x')
x_x = tf.multiply(x,x,name='xx')
x_x_2 = tf.multiply(x_x,two,name='2xx')
y = tf.add(x_x_2,three,name='y')

If we want to see the resulting graph, Tensorflow provides a tool called Tensorboard, which allows us to visualize the designed graph.

This graph is just a design of what we want to compute but nothing has been computed yet. Tensorflow will arrange efficiently all the computations when we evaluate the graph.

Once we have built the graph that describes our model, it’s time to execute/evaluate it.

Executing the graph

In order to evaluate the tensors we need to instantiate a Tensorflow Session (tf.Session). If we look at the code above we can see that we have different tensors and ops. To evaluate them we can pass to the session the desired tensor/operation to evaluate. The Tensorflow inference engine will manage how to make the computations internally.

We create the session:

sess = tf.Session()

For example if we want to evaluate the tensor two we will write:

print(two)
result = sess.run(two)
print('result: ', result)

That will output:

Tensor("2:0", shape=(), dtype=float32)
('result: ', 2.0)

Here we can see that two is a Tensor and its value when evaluated is the float 2.0. Moreover if we evaluate y:

print(y)
result = sess.run(y)
print('result: ', result)

The output is:

Tensor("y:0", shape=(), dtype=float32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run

We can see that y is a tensor. However, when we evaluate y an exception is raised. That is because y depends on a tf.placeholder that has not been fed. We need to give it a value with the optional argument feed_dict.

result = sess.run(y,feed_dict={x:3})
print('result: ', result)

Now that we gave the value of 3 to the x tensor, the result of the equation is printed:

('result: ', 21.0)
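The separation between building and executing can be mimicked with a toy example in plain Python. This is not how Tensorflow is implemented; Node and run are hypothetical names written only to illustrate that the graph is described first and values flow through it later.

```python
class Node:
    """A graph node: records an operation, its inputs and, for constants, a value."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

def run(node, feed):
    """Evaluate `node`; placeholder values are looked up in the `feed` dict."""
    if node.op == "placeholder":
        return feed[node]              # KeyError if the placeholder was not fed
    if node.op == "constant":
        return node.value
    left, right = [run(n, feed) for n in node.inputs]
    return left * right if node.op == "multiply" else left + right

# Building phase: we only describe y = 2x^2 + 3, nothing is computed.
x = Node("placeholder")
two = Node("constant", value=2.0)
three = Node("constant", value=3.0)
x_x = Node("multiply", (x, x))
x_x_2 = Node("multiply", (x_x, two))
y = Node("add", (x_x_2, three))

# Execution phase: the placeholder is fed and the graph is evaluated.
result = run(y, {x: 3.0})
print(result)  # 21.0
```

As with the unfed placeholder above, calling run(y, {}) fails because x has no value.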

Deep Learning Approach

In order to build and train Deep Neural Networks (DNN) the steps to follow are essentially the same as the ones listed above. The only difference is that the design of the graph and the tensors and operations to use are more complex. But the main concepts remain the same!

In this hands-on we are going to provide encapsulated functions that will make the DNN building code easier, and we will focus on the feeding phase, that is, how we build the input pipeline to feed the DNN.

Once we have built a DNN we need to feed it in order to train it. DNNs are built for a variety of tasks. Taking as an example a Convolutional Neural Network (CNN) that classifies images, it learns during training from the examples in our dataset. Tensorflow provides the Dataset API to build our Dataset and feed the NN with its examples.

Dataset API

In the Tensorflow context a Dataset is composed of a series of elements with the same structure. An element is composed of one or more tensors called components, each of them having its own type and shape.

  1. In order to define an input pipeline to our model (the Dataset) we need to start by defining a source from which to build the Dataset. The nature of the source could be different:
    • data in memory
    • TFRecord format files on disk
    • csv files
    • txt files
  2. We can then create a new Dataset by applying one or more transformations, possibly involving different datasets (e.g., join two datasets, …).
  3. Finally, in order to consume the values of the Dataset to be fed into the model, an Iterator must be used.

We are going to see two main cases:

  • We can load the whole dataset in memory
  • Our dataset does not fit in memory

Dataset fits in memory

If our dataset fits in memory, we will have all the elements of the dataset in some sort of structure:

  • List of images: [[128,128,3],[128,128,3],…]
  • Dict: {'name': ['Pep','Maria',…],'age': [14,29,…],'height': [166,170,…]}

Let’s say that we have our features in a variable called features (it does not matter if it is a list or a dictionary) and the corresponding labels of those features in a variable called labels. Notice that their first dimension must be the same (if we have features for 10 examples we need to have 10 labels as well).

Then to build our Dataset we just need to use the function from_tensor_slices:

dataset = tf.data.Dataset.from_tensor_slices((features,labels))

Now we have our dataset source. If we have two datasets dataset_1 and dataset_2 containing instances with the same structure we can build a new dataset by applying a transformation:

new_dataset = tf.data.Dataset.zip((dataset_1,dataset_2))

Tensorflow provides a wide set of transformations to apply to Dataset instances.

Dataset does not fit in memory

Usually in DL the bigger the data the better the training of the DNN. Thus, often it’s not possible to load the whole dataset into memory. We will focus on that case.

In those cases we need to have our files stored on disk in some sort of format. Tensorflow recommends using its own binary format called TFRecord. Another common case is that we have the raw images stored in our filesystem and a txt or csv file describing those images (name, label, etc.).

We assume that we have a csv file with a header line, where each of the following lines describes one instance of our dataset (image name, image path and label).


Tensorflow provides a subclass of Dataset called TextLineDataset to deal with this kind of input. Thus, the code will look like:

dataset = tf.data.TextLineDataset(csv_path).skip(1)

At this point we have just built a Dataset object in which each line of the csv file is an element. Notice that we skipped the first line because it is just the header. However, we want a Dataset where the elements are composed of just two components: the image and the label. To accomplish that we can apply a transformation to the current Dataset. The Dataset API provides a map function in order to apply a specific function to each of its elements. In our case we will apply a parsing function.

dataset = dataset.map(_parse_line)

So the function _parse_line will be something like:

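def _parse_line(line):
    # Decode the line into its fields; record_defaults gives one default value
    # per column, which also fixes the column type (string, string, int32)
    image_name, image_path, label = tf.decode_csv(line, record_defaults=[[''], [''], [0]])
    # Read the image file and decode it into a 3-channel tensor
    image = tf.decode_image(tf.read_file(image_path), channels=3)
    return image, label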

Thus, by applying _parse_line each element of the resulting dataset will be composed of an image and the corresponding label. You may think that if we do that to every image our Dataset will not fit in memory. The key point here is how we consume the data from the Dataset.

As commented before, to consume data from the Dataset we need to use Iterators.


Tensorflow provides 4 types of iterators:

  • one-shot: suited for most cases
  • initializable: when we have parameters involved in the creation of the Dataset
  • reinitializable: can be used with multiple Datasets
  • feedable: allows more parametrization

We will focus on the one-shot iterator since it is suitable for most cases. To create the iterator we just write:

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

Above we have created the iterator and an operation that, when evaluated in the current Session, will pop the next element. Thus, when we evaluate next_element we will obtain the image and the label. However, when we train a DNN we usually use batches of elements instead of a single element.

The Dataset API provides a series of functionalities to deal with that situation. We are going to see some of them:

  • shuffle: shuffles the dataset in such a way that the order in which the DNN sees the elements is randomized
  • repeat: allows us to walk through the Dataset multiple times
  • batch: batches the Dataset in such a way that when we evaluate next_element we will obtain a batch instead of a single instance

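For example, if we want to train our network for 10 epochs (it sees the whole Dataset 10 times), with the Dataset shuffled and with batches of 32 images, the code will look like:

# shuffle needs a buffer size; 1000 is an arbitrary value chosen for illustration
dataset = dataset.shuffle(buffer_size=1000)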
dataset = dataset.batch(32)
dataset = dataset.repeat(10)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
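What the iterator ends up delivering can be imitated in plain Python. This is a simplified stand-in for the pipeline, not the Dataset API itself: it shuffles the whole dataset at every epoch (Tensorflow shuffles through a buffer), and batches_for_training, the 100 dummy elements and the seed are assumptions of the example.

```python
import random

def batches_for_training(elements, batch_size, epochs, seed=0):
    """Yield shuffled batches, walking through `elements` `epochs` times."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(elements)
        rng.shuffle(order)                       # new random order every epoch
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]

all_batches = list(batches_for_training(range(100), batch_size=32, epochs=10))
print(len(all_batches))        # 4 batches per epoch, 10 epochs
print(len(all_batches[3]))     # last batch of the first epoch is smaller
```

When the size of the dataset is not a multiple of the batch size, the last batch of every epoch is smaller than the rest, which matters if the model expects a fixed batch dimension.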

That will do the job. The only remaining thing is to take care of how we design the input of our DNN.

In the earlier example, where we designed a graph that computes y = 2x² + 3, the input placeholder was just a float number, so its shape was simple. Now the input is a tensor with 32 images ([32,128,128,3]) and their 32 labels ([32]). Thus we have to prepare the placeholders for that data:

images_placeholder = tf.placeholder(tf.float32, shape=(32,128,128,3))
labels_placeholder = tf.placeholder(tf.int32, shape=(32,))

And feed the training with the feed_dict argument:

# we evaluate the next element
images, labels = sess.run(next_element)
# we build the dict that we will use to feed the input
feed_dict = {images_placeholder: images, labels_placeholder: labels}
# we will have a Tensorflow operation for the training called train_op
sess.run(train_op, feed_dict=feed_dict)

We will run this code as many times as needed until we consume the whole Dataset (10 times in this case). We will know that by catching the tf.errors.OutOfRangeError exception that is raised when the Dataset is consumed.





In order to visualize the graph that we designed through Tensorflow code we just have to add the following line:


Notice that we need to have initialized a Session (sess in this case). It is important to put this line below the code where we design our computational graph. That will produce a file called eventsXXXX in the specified folder.
After that, we open a terminal, navigate to the specified folder and run tensorboard:

cd /path/to/folder
tensorboard --logdir ./

Then we open a browser and navigate to localhost:6006.





Here we provide the reader with some code snippets to play around with Tensorflow. These snippets lack some parts of the code, which have to be filled in by the reader.

Those parts are the ones related to the building phase of the Dataset and to how the NN has to be fed. Although we provide a pre-defined NN to work with, there is also a blank space where the reader can define their own NN designs.

Files to Download


Dummy Dataset Creation

In order to build a dataset we provide dataset_creation.py, which generates an artificial Dataset with the desired number of instances. This code generates images containing a square or a triangle.

In order to use the snippet we call it giving four parameters:

  • shape: {square,triangle} to define the shape to draw
  • num_images: how many images we want to generate
  • output_directory: where to store the images
  • size: size of the images (squared images)


python dataset_creation.py --shape square --num_images 10000 --output_directory ./squares --size 128

That will create a folder in the current directory called squares containing 10000 images of 128x128x3 of squares. Moreover, a csv describing the generated images is stored as well.
We do the same for the triangles. We join the csv files:

cat squares/info.csv triangles/info.csv > dataset.csv

Thus, we will have the csv file describing our dataset.
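One detail to keep in mind: if each info.csv starts with its own header line (an assumption, since the pipeline above skips one header line), plain concatenation leaves a second header in the middle of dataset.csv. A minimal sketch of a merge that keeps only the first header could look like this; merge_csv, the file names and the column names are hypothetical and only serve the demonstration.

```python
import os
import tempfile

def merge_csv(paths, output_path):
    """Concatenate csv files, keeping the header of the first file only."""
    with open(output_path, "w") as out:
        for index, path in enumerate(paths):
            with open(path) as source:
                header = source.readline()
                if index == 0:
                    out.write(header)
                out.writelines(source)   # remaining lines, header excluded

# Self-contained demonstration with two tiny temporary files.
folder = tempfile.mkdtemp()
first = os.path.join(folder, "a.csv")
second = os.path.join(folder, "b.csv")
merged = os.path.join(folder, "dataset.csv")
with open(first, "w") as f:
    f.write("name,path,label\nsq0,squares/sq0.png,0\n")
with open(second, "w") as f:
    f.write("name,path,label\ntr0,triangles/tr0.png,1\n")

merge_csv([first, second], merged)
with open(merged) as f:
    merged_lines = f.read().splitlines()
print(merged_lines)
```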

Provided Code

We have 5 files:

  • dlho2018.py: provides some functions needed to facilitate the work of the hands-on
  • dlho2018_nns.py: file where to put the different NN designs. It contains one pre-defined
  • tf_layers.py: file containing the code of the layers
  • training.py: file with the code to train a NN (NEEDS TO BE COMPLETED)
  • evaluating.py: file with the code to evaluate the trained NN (NEEDS TO BE COMPLETED)

training.py and evaluating.py are partially implemented. They need to be completed with some of the code explained before (Tensorflow tab). Moreover, dlho2018_nns.py contains an already defined NN, but we provide some abstraction to let the reader build their own models. The layers provided (implemented in tf_layers.py) are:

  • j_fc_layer: fully connected layer
  • j_conv_layer: convolutional layer
  • j_dropout_layer: dropout layer
  • j_logits: logits layer
  • j_flatten_layer: flatten layer

Let’s check the existing NN:

flat = j_flatten_layer(images,'flat')
fc1 = j_fc_layer(flat,128,'fc1')
fc2 = j_fc_layer(fc1,32,'fc2')
fc3 = j_fc_layer(fc2,64,'fc3')
logits = j_logits(fc3,2,'logits')

Notice that by using this code we just need to write the layers to add, taking into account that the input of each of them is the output of the previous one. The input of the first layer is the images given to the function, and the last layer has to be logits, which is the one used to compute the classification output.
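As a rough check of the size of this network, we can count its parameters. The sketch assumes 128×128×3 input images (the size used for the dummy dataset) and that j_fc_layer and j_logits are plain fully connected layers with one bias per output; the actual implementation in tf_layers.py may differ.

```python
def dense_parameters(n_inputs, n_outputs):
    """Weights plus one bias per output of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

# Output sizes of flat, fc1, fc2, fc3 and logits, in that order.
layer_sizes = [128 * 128 * 3, 128, 32, 64, 2]
total = sum(dense_parameters(n_in, n_out)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 6297954, almost all of them in fc1
```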





In order to set up the environment it is recommended to use a virtual environment. We create it with the command virtualenv:

virtualenv env
source env/bin/activate

Now we are inside the environment that we just created. From now on, everything that we install will not affect our system; it will be installed just inside the virtual environment.

We install tensorflow and Pillow.

pip install tensorflow
pip install Pillow

Notice that we need to run the provided code from the terminal where we activated the virtual environment.


