Building a dev environment for Deep Neural Networks
Despite astonishing growth in recent years, both in business applications and technical capabilities, deep learning is still an experimental field.
Many applications, such as object detection or recognition, are already covered by managed services such as Amazon Rekognition, which provide accurate enough predictions for generic use cases.
Some non-trivial applications, however, benefit from re-training a model on custom datasets, which improves accuracy and allows continuous training and fine-tuning.
Moreover, there are many use cases where model code is already available and needs to be taken into production, or where existing Jupyter Notebooks can save the day. This is where solutions such as Amazon SageMaker come in: it is very effective at managing the model lifecycle, from a working model in a pre-built repo to a fully scalable inference endpoint (now even better with elastic GPU for inference and SageMaker Neo).
But what happens before having «a working notebook»?
Trial and error model development
Before model retraining with a customer dataset, before hyperparameter fine-tuning, even before optimized data loading from S3... there was a notebook!
In many experimental fields (did someone say Kaggle competition?) a built-in model is not enough: you have to start from scratch, diving into your dataset with the help of a Jupyter Notebook interface.
Such notebooks are pretty much a testing environment where data is downloaded, adjusted and re-packaged just to begin.
Right after that, our deep neural network architecture is built, probably starting from an existing ResNet or VGG and then removing and adding layers.
At each stage of this process the network should be tested, checking validation loss and metrics to understand whether you are overfitting or have chosen a poor learning rate.
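As an illustration of the kind of check described above, here is a minimal sketch that labels a training run from its per-epoch losses. The function name diagnose, the patience parameter and the thresholds are hypothetical choices made for this example, not rules taken from any framework.

```python
from typing import List


def diagnose(train_loss: List[float], val_loss: List[float], patience: int = 3) -> str:
    """Return a rough label for a training run based on per-epoch losses.

    Hypothetical heuristic for illustration only.
    """
    if len(train_loss) != len(val_loss) or len(train_loss) < patience + 1:
        return "not enough data"

    # Training loss that ends higher than it started usually means the
    # learning rate is too high and the optimizer is diverging.
    if train_loss[-1] > train_loss[0]:
        return "diverging: try a lower learning rate"

    # Overfitting signature: validation loss rose on each of the last
    # `patience` epochs while training loss kept falling.
    recent_val = val_loss[-(patience + 1):]
    recent_train = train_loss[-(patience + 1):]
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    if val_rising and train_falling:
        return "overfitting: stop early or add regularization"

    return "ok: keep training"


if __name__ == "__main__":
    train = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
    val = [1.05, 0.80, 0.65, 0.68, 0.74, 0.81]
    print(diagnose(train, val))
```

In a notebook you would feed it the loss history your framework already records after each epoch. A real project would probably rely on the early-stopping callbacks of its framework instead.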
Model building instances in SageMaker do not fit this case, because you need GPUs on board and ml.p2 instances of any size are very expensive compared to EC2 instances running DeepLearning AMIs. Moreover, you cannot drop the GPU once you have finished building your network architecture and move on to engineering model serving with SageMaker.
GPU instances for model building are a requirement only for deep neural networks; there are plenty of use cases where standard ml.t2 instances are more than enough.
Here at Neosperience we chose to add a deep learning model building environment «before» the model building stage in SageMaker (which we renamed the «engineering» stage to avoid misunderstandings). Such an environment must follow the SageMaker model as closely as possible (click, fire up your env and be productive), so we adopted some opinionated configurations that I want to share here. The first and foremost of them is of course...
Running Jupyter Notebook as service for automated IDE startup
This should be as easy as starting up a new AMI in EC2. Unfortunately, Ubuntu configuration is not as easy as everyone would like, and we ended up having to tune some of the configurations you can find online.
Let’s start with a fresh EC2 instance launched from AWS Marketplace with Ubuntu and a p2.xlarge size.
New P3 instances are far more powerful, but we found them more expensive than the “old” P2. Since we do not forecast heavy workloads when developing a new model, the standard xlarge tier will be more than enough. However, the following configs are suitable for every DeepLearning AMI with Ubuntu OS. We also assume you are able to log into this instance using SSH with the generated PEM file.
Step 00 — Create a DeepLearning AMI instance
The first task is to set up a new EC2 instance from a DeepLearning AMI, Ubuntu flavour. Choosing the right instance is not trivial; we usually stick with the p2.xlarge, which is the smallest size in the P2 family and has one NVIDIA GPU on board.
Going on with the configuration, remember to attach a sufficiently large disk (we usually opt for 500GB) and open port 8888 in the new security group. We could also set up our instance to be reachable on port 80, which is the browser’s default, but some improvements we’ll discuss in a future article require port 80 to stay available for hosting REST endpoints.
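If you prefer scripting to the console, opening port 8888 can be sketched with boto3 as follows. The helper names, the group id, the region and the CIDR range are all placeholders invented for this example. Restricting the rule to a single address is our assumption, not a requirement of the setup above.

```python
from typing import Any, Dict, List


def jupyter_ingress_rule(cidr: str, port: int = 8888) -> List[Dict[str, Any]]:
    """Build the IpPermissions list that allows TCP traffic to `port` from `cidr`."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": "Jupyter Notebook"}],
        }
    ]


def open_jupyter_port(group_id: str, cidr: str, region: str) -> None:
    """Apply the rule to an existing security group (needs AWS credentials)."""
    import boto3  # imported lazily so the rule builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=jupyter_ingress_rule(cidr)
    )


if __name__ == "__main__":
    # Placeholder address from the documentation range; use your own IP.
    print(jupyter_ingress_rule("203.0.113.7/32"))
```

Calling open_jupyter_port("sg-...", "203.0.113.7/32", "eu-west-1") with your real group id would apply the rule. Using 0.0.0.0/0 as the CIDR matches the "reachable from anywhere" setup of this article, at the price of exposing the login page to everyone.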
Step 01 — Locate Jupyter executable
It usually lives somewhere under your home folder, but on a DeepLearning AMI it is better to check: run which jupyter from a bash shell and note the path it prints.
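The same lookup can be done from Python, which is handy if you later want to script the whole setup. This is a minimal sketch: find_jupyter is a hypothetical helper name, and the fallback to the interpreter's own bin folder is our assumption about how conda-style installs are laid out.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def find_jupyter() -> Optional[str]:
    """Return the path of the `jupyter` executable, or None if not found."""
    # First look on PATH, like `which jupyter` would.
    found = shutil.which("jupyter")
    if found:
        return found
    # Fall back to the bin directory of the running interpreter, which is
    # where conda and virtualenv installs usually place the launcher.
    candidate = Path(sys.executable).parent / "jupyter"
    return str(candidate) if candidate.exists() else None


if __name__ == "__main__":
    print(find_jupyter() or "jupyter not found")
```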
Step 02 — Generate a sha encoded password
Before jumping into editing madness, take a second to generate a new password hash from a Python shell.
python3 # enter interactive shell
from notebook.auth import passwd; passwd()
Enter a password twice (choose carefully, because the notebook is going to be exposed to the whole world) and note the hash that is printed.
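To make clear what the noted string contains, here is a sketch of a salted hash in the algorithm:salt:digest shape. It uses only the standard library and reflects our understanding of the format, not the notebook library's actual code. The names make_hash and check_hash are hypothetical, and newer notebook releases may default to a different algorithm than sha1.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase: str, algorithm: str = "sha1", salt_len: int = 12) -> str:
    """Return 'algorithm:salt:hexdigest' for the given passphrase."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hex characters
    digest = hashlib.new(algorithm)
    digest.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, digest.hexdigest()))


def check_hash(hashed: str, passphrase: str) -> bool:
    """Verify a passphrase against a string produced by make_hash."""
    try:
        algorithm, salt, expected = hashed.split(":", 2)
    except ValueError:
        return False  # not in algorithm:salt:digest form
    digest = hashlib.new(algorithm)
    digest.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest.hexdigest(), expected)


if __name__ == "__main__":
    hashed = make_hash("correct horse battery staple")
    print(hashed)
    print(check_hash(hashed, "correct horse battery staple"))
```

The point to take away is that the config file stores only the salted digest, never the password itself. For the real setup, keep using passwd() so the hash matches what your installed notebook version expects.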
Step 03 — Configure Jupyter Notebook to run from anywhere
Now we have to configure the Jupyter Notebook environment. Run the following command from a bash shell.
jupyter notebook --generate-config
It will ask you about overwriting an existing config; once you approve, it writes a new config file to /home/ubuntu/.jupyter/jupyter_notebook_config.py
Now edit .jupyter/jupyter_notebook_config.py and look for the properties listed below, which need to be set accordingly. Remember to replace YOUR_PASSWORD_HERE with the hash generated in the previous step. The string printed by passwd() already carries its algorithm prefix (sha1: in our case), so take care not to duplicate it.
# uncomment and set the following properties
c.NotebookApp.allow_origin = '*'
c.NotebookApp.allow_root = False
# This enables your notebook to be reached from outside
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = True
c.NotebookApp.password = 'sha1:YOUR_PASSWORD_HERE'
c.NotebookApp.port = 8888
Step 04 — Write a service config file
Fire up your preferred text editor (I suggest vi: it is more complex than nano, but more powerful) and create the following file named jupyter.service.
[Service]
ExecStart=/home/ubuntu/anaconda3/bin/jupyter notebook --config=/home/ubuntu/.jupyter/jupyter_notebook_config.py
Please be sure to replace /home/ubuntu/anaconda3/bin/jupyter with the path you found in step 01.
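A fuller unit file usually also declares who runs the service and what happens on failure. The sketch below renders one from the path found in step 01. Every directive apart from ExecStart (the ubuntu user, the restart policy, the multi-user.target install target) is our assumption about a sensible default, and render_unit is a hypothetical helper name.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=Jupyter Notebook
After=network.target

[Service]
Type=simple
User={user}
WorkingDirectory={workdir}
ExecStart={jupyter} notebook --config={config}
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
"""


def render_unit(jupyter: str, user: str = "ubuntu") -> str:
    """Fill the template with the Jupyter path and the account that runs it."""
    home = f"/home/{user}"
    return UNIT_TEMPLATE.format(
        user=user,
        workdir=home,
        jupyter=jupyter,
        config=f"{home}/.jupyter/jupyter_notebook_config.py",
    )


if __name__ == "__main__":
    # Redirect the output into jupyter.service to create the file.
    print(render_unit("/home/ubuntu/anaconda3/bin/jupyter"))
```

With an [Install] section in place, sudo systemctl enable jupyter.service registers the service to start at boot. Running it as the ubuntu user rather than root is consistent with allow_root = False in the notebook config.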
Step 05 — Start daemon
Now that our service configuration is done, we must move it to the systemd folder, change its owner and start the service. Remember to force a daemon reload, which makes Ubuntu aware of the service configuration we created. If anything goes wrong, just run sudo systemctl status jupyter.service to get a decent log of what happened to our daemon.
sudo mv jupyter.service /lib/systemd/system/
sudo chown root:root /lib/systemd/system/jupyter.service
sudo systemctl daemon-reload
sudo systemctl restart jupyter.service
Then finally, if you have attached an Elastic IP to your instance, the Jupyter login page should answer on port 8888 of that address every time the EC2 instance is started.
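A quick way to confirm the service is listening, without opening a browser, is a plain TCP connection test. This is a minimal sketch using only the standard library. The helper name port_is_open and the three-second timeout are arbitrary choices for the example.

```python
import socket


def port_is_open(host: str, port: int = 8888, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1 checks from the instance itself; replace it with the
    # Elastic IP to check from your workstation.
    print(port_is_open("127.0.0.1"))
```

If the check succeeds on the instance but fails from your workstation, the security group rule for port 8888 is the first thing to look at.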
Providing Jupyter Notebook as a service on GPU instances is the first step towards a working development pipeline for deep learning models. Further customization is possible starting from AWS DeepLearning AMIs, such as automatic switch-off, S3 auto-mount and more fine-tuning.
We’ll discuss some of them in a future article.