Running fast.ai lesson notebooks in Colab + Google Drive


#1

I’ll be posting roughly the same content on my personal blog, but I decided to share it with the Machine Learning Tokyo community first.

Background

fast.ai is without a doubt one of the best resources out there for anyone interested in getting into Deep Learning with a top-down, hands-on approach.

They provide an interesting platform: a video lesson that goes through the contents and a Jupyter Notebook where you can actually write code. To get you all set up for the coding part, they provide detailed installation steps for a few cloud providers (AWS, Crestle, Paperspace) and also have a Kaggle kernel that you can just clone.

However, many of us don’t feel like paying for a cloud provider and for some reason might not want to use the Kaggle platform. Or maybe, like me, you’re just adventurous, love Colab, and feel compelled to see if you can actually make it work. So let’s get into the actual work.

This will be divided into basically 2 steps:

  1. Getting the files into Google Drive (should be the simple part)
  2. Setting up the Notebook to work with Colab and Google Drive (not so simple)

1. Google Drive

There are a few ways of doing this but mainly what you have to do is:

  • Clone the git repository https://github.com/fastai/fastai.git
  • Upload the folder to Google Drive
  • Upload any other data necessary for the lessons to an accessible place (dogs and cats in the case of the first lesson)

In my case, I wanted to make it easier to sync the repository, so I just mounted Google Drive on my machine (you can get the software here) and cloned the repository into the mounted folder. Then I created a folder in there for the cats and dogs data and unzipped it all there. Boom, just let Drive sync everything automatically (some 30,000 files) and you’re done. Alternatively, you can clone the repository, extract the data, and upload everything through the web interface by dragging and dropping the folders.

2. Setting up the Colab Notebook

After you upload all the files to Google Drive you can just go to fastai/courses/dl1 and open the notebook called lesson1.ipynb

To make things less confusing, below is an image of what the fully set-up notebook looks like:

And now to the details.

Reading data from Google Drive (Mounting the drive)

First you’ll need to enable Colab to read your Drive data. I found many different ways of doing this, involving downloading packages and an OCaml driver, but after opening the “Code Snippets” tab in Colab itself out of curiosity, I found the correct answer:

from google.colab import drive
drive.mount('drive')

This will mount your Google Drive to a folder called drive and it will appear on the left-side panel, in the “Files” section.
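
If you would rather confirm the mount from a cell instead of the side panel, a small check like the one below might help. It is only a sketch: `list_mount` is a hypothetical helper name, and it assumes the mount point is the relative folder `drive` used in the snippet above.

```python
import os

def list_mount(mount_point="drive"):
    """Return the sorted entries directly under the mount point, or [] if it is absent."""
    if not os.path.isdir(mount_point):
        return []
    return sorted(os.listdir(mount_point))

# An empty list here suggests the mount did not happen (or the folder name differs).
print(list_mount("drive"))
```

Whatever name shows up in that list is the folder your Drive content lives under, which matters later when you set the data path.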

Installing the required libraries - Part 1

There’s a bunch of libraries that are required to run the notebook. I won’t go too deep into this; you’re always welcome to Google the package names to learn what they are about.

!pip3 install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp36-cp36m-linux_x86_64.whl
!pip3 install torchvision
!pip install bcolz
!pip install graphviz
!pip install sklearn_pandas
!pip install isoweek
!pip install pandas_summary
!pip install ipywidgets

Installing the required libraries - Part 2

These could all be put together, but I separated them for the sake of better understanding (and remembering) that the order of these installs actually matters.

!pip install fastai==0.7
!pip install torchtext==0.2.3
!pip install Pillow==4.0.0

When you pip install a specific version of a library, pip also installs or upgrades other libraries according to its dependencies. The problem is that sometimes this leads to conflicts.

  • fastai has to be version 0.7 as this is the latest stable release. The present course is based on this version. Not setting a version number will cause pip to install version 1.x
  • fastai requires version 0.2.3 of torchtext or it will throw errors when trying to import fastai libraries
  • When you install the libraries above (I believe it is pytorch, but I’m not sure), Pillow ends up being upgraded from the preinstalled version 4 to version 5. That causes an error inside Colab, and the easiest solution is to just downgrade it back to version 4 after everything else is done.
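
Since the pins above are easy to break by accident, it may be worth checking what actually ended up installed. The following is a minimal sketch, not part of the original setup: `installed_version` and `find_mismatches` are hypothetical helper names, and it assumes a Python runtime with `importlib.metadata` (3.8 or newer; an older runtime would need something like `pkg_resources` instead).

```python
from importlib import metadata  # assumption: Python 3.8+ runtime

# Pins copied from the install cells above.
EXPECTED = {"fastai": "0.7", "torchtext": "0.2.3", "Pillow": "4.0.0"}

def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

def find_mismatches(expected, lookup=installed_version):
    """Map each package that does not match its pin to the version found (or None)."""
    mismatches = {}
    for name, pin in expected.items():
        found = lookup(name)
        # Accept an exact match or a longer form of the pin, e.g. "0.7" vs "0.7.0".
        if found is None or not (found == pin or found.startswith(pin + ".")):
            mismatches[name] = found
    return mismatches

# An empty dict means every pin is satisfied.
print(find_mismatches(EXPECTED))
```

If Pillow shows up in the result, re-running the last install cell is probably the first thing to try.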

Ok! You should be good to go now!

Bonus: Setting the path for the Dogs and Cats task

For those of you who got stuck on how to specify the path to your image folder, here is a tip :slight_smile:

When you mount it using external libraries, your Drive content will be directly under the drive folder. Using the “official” way, however, I found out that it creates another folder, “My Drive”, in between. I haven’t tested it, but this name might change depending on your locale. The best way to check is to click on the drive folder on the left pane and see for yourself. I extracted the picture data into a folder called data inside the dl1 folder, so my path pointed there.
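
Because the intermediate folder name may differ between setups, one option is to search for the data folder instead of hard-coding the full path. This is a sketch under assumptions: `find_data_dir` is a hypothetical helper, and the relative path `fastai/courses/dl1/data/dogscats` is a guess at the layout (the `dogscats` folder name in particular is an assumption), not necessarily the exact path used in this post.

```python
from pathlib import Path

def find_data_dir(mount_point, relative):
    """Return the first <mount_point>/<top-level entry>/<relative> directory, else None."""
    root = Path(mount_point)
    if not root.is_dir():
        return None
    # Try every top-level entry so the locale-dependent folder name does not matter.
    for top in sorted(root.iterdir()):
        candidate = top / relative
        if candidate.is_dir():
            return candidate
    return None

# Hypothetical layout; adjust the relative part to wherever you unzipped the images.
PATH = find_data_dir("drive", "fastai/courses/dl1/data/dogscats")
print(PATH)
```

If the notebook builds sub-paths by string concatenation, you would likely want to pass it `str(PATH) + "/"` rather than the `Path` object itself.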

Ok, now you’re all set for real. You can now enjoy the fast.ai lessons inside Colab from anywhere in the world, without having to pay for a cloud platform.

This is no expert work. This was based on a lot of trial and error. Feel free to comment with better ideas or correct me if I made any mistakes. I might come back to this article and edit a few things as I have more time to figure things out and understand them better.

I’d like to give credit to this article, which was the basis for most of my work. It was really just a lot of tweaking of the contents here (in Japanese).


#2

Thank you for the post Francisco!

Also, the link to your blog is pointing to a bad url: dallarosa.tumblr instead of dallarosa.tumblr.com :wink:


#3

Thanks! Fixed it to the new link: blog.dallarosa.me :slight_smile: