About this project
This book has one goal — to help developers, researchers, and students just like yourself become experts in deep learning for image recognition and classification.
Whether this is the first time you've worked with machine learning and neural networks or you're already a seasoned deep learning practitioner, Deep Learning for Computer Vision with Python is engineered from the ground up to help you reach expert status.
Inside this book you'll find:
- Super practical walkthroughs that present solutions to actual, real-world image classification problems, challenges, and competitions.
- Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision, but their implementations as well.
- A no-bullshit teaching style that is guaranteed to cut through all the cruft and help you master deep learning for image understanding and visual recognition.
To learn more about this book and how it can help you on your deep learning journey, just keep reading.
Are you just getting started in deep learning? Don't worry, you won't get bogged down by tons of theory and complex equations. We'll start off with the basics of machine learning and neural networks. You'll learn in a fun, practical way with lots of code. You'll be a neural network ninja in no time, and be able to graduate to the more advanced content.
Are you already a seasoned deep learning pro? This book isn't just for beginners; there's advanced content in here too. You'll discover how to train your own custom object detectors using deep learning. You'll build a custom framework that can be used to train very deep architectures on the challenging ImageNet dataset from scratch. I'll even show you the personal blueprint I use to determine which deep learning techniques to apply when confronted with a new problem. Best of all, these solutions and tactics can be directly applied to your current job and research.
Regardless of your experience level, you'll find tremendous value inside Deep Learning for Computer Vision with Python. I guarantee it.
Deep Learning for Computer Vision with Python will make you an expert in deep learning for computer vision and visual recognition tasks.
Inside the book we will focus on:
- Neural Networks and Machine Learning
- Convolutional Neural Networks (CNNs)
- Object detection/localization with deep learning
- Training large-scale (ImageNet-level) networks
- Hands-on implementation using the Python programming language, the Keras library (which is compatible with either TensorFlow or Theano), and the mxnet library
Since this is a huge amount of content to cover, I've decided to break the book down into three volumes called "bundles". A bundle includes the eBook, video tutorials, and source code for a given volume.
Each bundle builds on top of the others and includes all content from the lower volumes. You should choose a bundle based on how in-depth you want to study deep learning and computer vision.
I've included a quick breakdown of the three bundles below — the full Table of Contents for each bundle can be found later on this page:
- Starter Bundle: A great fit for those taking their first steps towards deep learning for image classification mastery. You'll learn the basics of (1) machine learning, (2) neural networks, (3) Convolutional Neural Networks, and (4) how to work with your own custom datasets.
- Practitioner Bundle: Perfect for readers who are ready to study deep learning in-depth, understand advanced techniques, and discover common best practices and rules of thumb.
- ImageNet Bundle: The complete deep learning for computer vision experience. In this bundle I demonstrate how to train large-scale neural networks on the massive ImageNet dataset. You just can't beat this bundle.
This book is for developers, researchers, and students who have at least some programming/scripting experience and want to become proficient in deep learning for computer vision & visual recognition. This book is for you if you:
- Are a computer vision developer who utilizes OpenCV (among other image processing libraries) and are eager to level-up your skills.
- Have experience with machine learning and want to break into neural networks/deep learning for image understanding.
- Are a college student and want more than your university offers (or want to get ahead of your class).
- Are a scientist looking to apply deep learning + computer vision algorithms to your research.
- Utilize computer vision algorithms in your own projects but have yet to try deep learning.
- Have used deep learning in projects before, but never in the context of visual recognition and image understanding.
- Write Python/machine learning code at your day job and are motivated to stand out from your coworkers.
- Are a "machine learning hobbyist" who knows how to program and wants to understand what this "deep learning" thing is all about.
If any of these descriptions fit you, rest assured: you're the target student. I am writing this book for you.
Do I need prior programming/machine learning experience to get value out of this book?
Deep Learning for Computer Vision with Python assumes you have prior programming experience (e.g., you know what a variable, function, loop, etc. are). It does not make any assumptions about your previous experience with computer vision, machine learning, or deep learning.
That said, having some experience in both machine learning and computer vision is very helpful when working through this material.
If you have little-to-no experience with machine learning or computer vision, don't worry — this book is still for you...but I would highly recommend that you back a Kickstarter reward level that includes either my book, Practical Python and OpenCV, or my PyImageSearch Gurus course, so you can quickly level-up your current skill set before diving into deep learning.
We'll be utilizing the Python programming language for all examples in this book. Python is an extremely easy language to learn. It has an intuitive syntax and is super powerful — it is the best way to work with deep learning algorithms.
The primary deep learning library we'll be using is Keras (which can use either TensorFlow or Theano as a backend). The Keras library is developed and maintained by the brilliant François Chollet, a deep learning researcher and engineer at Google. I have been using Keras for years and it's hands-down my favorite deep learning package.
The secondary deep learning library we'll be using inside the book is mxnet, which is lightweight, portable, and flexible. The mxnet library provides bindings to the Python programming language and specializes in distributed, multi-machine learning — the ability to parallelize training across GPUs/devices/nodes is critical when training deep neural network architectures on massive datasets (such as ImageNet).
Finally, we'll also be using a few computer vision, image processing, and machine learning libraries such as OpenCV, scikit-image, scikit-learn, etc.
Python, Keras, and mxnet are all well-built tools that, when combined together, create a powerful deep learning development environment that you can use to master deep learning for computer vision and visual recognition.
Hey, I'm Adrian Rosebrock, a Ph.D. and entrepreneur who has spent his entire adult life studying computer vision and machine learning.
Over the past 3 years alone I have:
- Started the PyImageSearch.com blog and published over 175 tutorials and articles aimed at teaching computer vision, image processing, and machine learning.
- Authored a book, Practical Python and OpenCV, which has been featured on the official OpenCV.org website.
- Created PyImageSearch Gurus, an actionable, real-world course on computer vision and OpenCV. This course is the most comprehensive computer vision education online today, covering 13 modules broken out into 168 lessons with over 2,161 pages of content.
- Answered 10,000 emails and helped thousands of developers, researchers, and students learn the ropes of computer vision and machine learning.
As you can see, teaching computer vision, machine learning, and deep learning is my passion. And I want to pass that passion on to you.
If studying deep learning and visual recognition sounds interesting to you, I hope you'll consider helping me bring this book to life. You'll learn a ton about deep learning and computer vision in a practical, hands-on way. And you'll have fun doing it.
See you on the other side!
P.S. It's also worth mentioning that students have loved my books and courses. Here are just a few quotes from past students of mine:
"Adrian possesses a very rare talent of making complex concepts easy to grasp." — Jean-Francois Parent
"I'm constantly recommending your [PyImageSearch.com] site to people I know at Georgia Tech and Udacity. While I consider Udacity the gold standard, I would rate your material at the same level. Keep up the good work." — Andrew Baker
"Adrian's book and blog have been integral in both computer vision skills and confidence building. Adrian delivers!" — MJ Woodward-Greene
The main reward for this Kickstarter is the Deep Learning for Computer Vision with Python eBook, offered at a discount from the price it will carry once the book is released to the public. These rates are exclusive to the Kickstarter campaign and will not be available once Deep Learning for Computer Vision with Python officially launches.
I have broken down the book into three volumes called "bundles" so you can decide which bundle is most appropriate for you based on:
- How in-depth you want to study deep learning, computer vision, and visual recognition.
- Your particular budget.
- Whether you would like a copy of my book, Practical Python and OpenCV, or access to my PyImageSearch Gurus course, to help you get up to speed with computer vision and machine learning before you start studying deep learning (I highly recommend this if you're new to the world of computer vision, image processing, or machine learning).
Each bundle will include:
- The eBook files in PDF, .mobi, and .epub format.
- Video tutorials and walkthroughs for each chapter in the book.
- All source code listings so you can run the examples from the book out-of-the-box.
- A downloadable, pre-configured Ubuntu VirtualBox virtual machine that ships with all the Python + deep learning libraries you will need pre-installed.
- Access to the Deep Learning for Computer Vision with Python companion website, so you can further your knowledge, even when you're done reading the book.
- A hardcopy edition of the complete book delivered to your doorstep (ImageNet Bundle only).
A high-level overview of what's inside each bundle is included below. Each bundle builds on top of the others and includes all content from lower tiers. The complete Table of Contents for each bundle is listed in the next section.
The Starter Bundle is appropriate if (1) you are brand new to the world of machine learning/neural networks or (2) you are on a budget. It begins with a gentle introduction to the world of computer vision and machine learning, builds to neural networks, and then moves full steam ahead into deep learning and Convolutional Neural Networks. You'll even solve fun and interesting real-world problems using deep learning along the way.
While this is the lowest tier bundle, you'll still be getting a complete education. That said, for a more in-depth treatment of deep learning for computer vision, I would recommend either the Practitioner Bundle or ImageNet Bundle.
Bottom Line: The Starter Bundle is a great first step towards deep learning for image classification mastery. If your reason for going with this bundle is that you're new to the world of machine learning or computer vision, then you should absolutely look at the Practical Python and OpenCV + PyImageSearch Gurus add-ons below — both of these can be used to level-up your skills quickly.
The Practitioner Bundle is sure to be the most popular of the bunch and is geared towards readers who want an in-depth study of deep learning for computer vision.
Here I cover more advanced techniques and algorithms, including:
- Transfer learning
- Fine tuning
- Networks as feature extractors
- Cropping techniques
- Network ensembles
- HDF5 + working with large datasets
- Deeper CNNs
- Object detection/localization.
I also demonstrate how to train networks to compete in popular image classification challenges, specifically:
- How to hand-code the AlexNet architecture and train it on the Kaggle Dogs vs. Cats challenge, claiming a position in the top-25 leaderboard.
- How to utilize a variant of VGGNet to compete in Stanford's cs231n Tiny ImageNet classification challenge...and take home the #1 position.
The Practitioner Bundle gives you the best bang for your buck. If you're even remotely serious about studying deep learning, you should go with this bundle.
Bottom Line: You should choose the Practitioner Bundle if you want to study deep learning for computer vision in-depth, but cannot afford the ImageNet Bundle. Again, be sure to take a look at the Practical Python and OpenCV + PyImageSearch Gurus add-ons below to help level-up your prior computer vision or machine learning skills.
The ImageNet Bundle is the most in-depth bundle and is for readers who want to train large-scale deep neural networks.
This bundle is also the only bundle that includes a hardcopy edition of the complete Deep Learning for Computer Vision with Python book mailed to your doorstep.
Inside this bundle I demonstrate how to construct an entire Python framework to train network architectures such as AlexNet, VGGNet, and SqueezeNet from scratch on the challenging ImageNet dataset.
Using the training techniques I outline in this bundle, you'll be able to reproduce the results you see in popular deep learning papers and publications — this is an absolute must for anyone doing research and development in the deep learning space.
I also provide a number of case studies, including how to train a network to predict the age/gender of a person based on a photo and how to predict a car model type from an image.
Bottom Line: You should choose the ImageNet Bundle if you want the complete deep learning for computer vision experience and intend to train neural networks from scratch. When it comes to studying deep learning, you can't beat this bundle! Be sure to take a look at the computer vision book + course add-ons below to freshen up before you jump in!
After choosing a bundle, you should decide whether you would like any add-ons to help level-up your computer vision/OpenCV education before you begin your deep learning journey.
I highly recommend that you choose at least one of these add-ons to make the most out of Deep Learning for Computer Vision with Python as the add-ons will help you level-up any existing knowledge and be better prepared for the deep learning book.
Along with the Deep Learning for Computer Vision with Python book bundles, I'm also offering a Kickstarter-exclusive printing of my book, Practical Python and OpenCV. I will be individually numbering and hand-signing each copy, just for you.
Bottom Line: You should choose the Practical Python and OpenCV reward add-on if you have zero (or minimal) experience in computer vision or OpenCV and want to learn the basics in less than a weekend. Please see this page for more details on my book.
PyImageSearch Gurus is an online course and community I have meticulously designed to take you from computer vision beginner to expert. Guaranteed.
Inside PyImageSearch Gurus, you'll find:
- An actionable, real-world course on OpenCV and computer vision.
- The most comprehensive computer vision education online today, covering 13 modules, broken out into 168 lessons with over 2,161 pages of content.
- A community of like-minded developers, researchers, and students just like you, who are eager to learn computer vision and level-up their skills.
If you're a regular reader of the PyImageSearch blog, you'll know that I never discount the Gurus course (normally a one-time payment of $995).
This course is super in-depth, so know that you'll be getting a HUGE deal by going with any reward that includes this course (the cost of Deep Learning for Computer Vision with Python is practically free once you build in the price of the Gurus course). You can learn more about PyImageSearch Gurus here.
Bottom Line: You should choose the PyImageSearch Gurus add-on if you want to study computer vision in the same level of depth that you'll be studying deep learning once Deep Learning for Computer Vision with Python is released.
Below is the list of chapters I have already planned out for each bundle. More topics will be covered based on what you and other Kickstarter backers want to learn.
If you see a topic that is not on the list that you want me to cover, just send me a message or leave a comment. Remember, this is your book and I want to tune it to what you want to learn.
The Starter Bundle includes the following topics:
Take the first step:
- Learn how to set up and configure your development environment to study deep learning.
- Understand image basics, including coordinate systems; width, height, depth; and aspect ratios.
- Review popular image datasets used to benchmark machine learning, deep learning, and Convolutional Neural Network algorithms.
Form a solid understanding of machine learning basics, including:
- The simple k-NN classifier.
- Parameterized learning (i.e., "learning from data")
- Data and feature vectors.
- Understanding scoring functions.
- How loss functions work.
- Defining weight matrices and bias vectors (and how they facilitate learning).
Study basic optimization methods (i.e., how "learning" is actually done) via:
- Gradient Descent
- Stochastic Gradient Descent
- Batched Stochastic Gradient Descent
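To give a flavor of the optimization methods listed above, here is a minimal sketch of batched (mini-batch) stochastic gradient descent. It is not code from the book: the function name `minibatch_sgd`, the toy line-fitting problem, and every hyperparameter value are assumptions chosen purely for illustration, and it uses only the Python standard library.

```python
import random


def minibatch_sgd(data, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y = w*x + b by mini-batch SGD on mean squared error.

    Hypothetical helper for illustration; all defaults are assumptions.
    """
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not touch the caller's list
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a new order every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Step against the gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data drawn from y = 3x + 1 (noise-free, for illustration only).
points = [(x / 10, 3 * (x / 10) + 1) for x in range(-10, 11)]
w, b = minibatch_sgd(points)
print(f"w={w:.3f}, b={b:.3f}")
```

Setting `batch_size` to the full dataset size gives plain gradient descent, and setting it to 1 gives the classic one-sample-at-a-time stochastic variant.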
Discover feedforward network architectures:
- Implement the classic Perceptron algorithm by hand.
- Use the Perceptron algorithm to learn actual functions (and understand the limitations of the Perceptron algorithm).
- Take an in-depth dive into the Backpropagation algorithm.
- Implement Backpropagation by hand using Python + NumPy.
- Utilize a worksheet to help you practice the Backpropagation algorithm.
- Grasp multi-layer networks (and train them from scratch).
- Implement neural networks both by hand and with the Keras library.
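As a rough preview of the Perceptron topics above (again, a sketch rather than the book's implementation), the snippet below trains a single perceptron on the OR function and then on XOR. The function names, learning rate, and epoch count are assumptions. OR is linearly separable, so the perceptron learns it; XOR is not, which is the limitation the outline refers to.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Train a single perceptron with a step activation.

    samples: list of ((x1, x2), label) pairs with labels 0 or 1.
    Returns the learned weights [w1, w2, bias].
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + w[2] > 0 else 0
            error = target - pred
            # Perceptron rule: the weights change only when the prediction is wrong.
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            w[2] += lr * error
    return w


def predict(w, x1, x2):
    """Apply the learned weights with the same step activation."""
    return 1 if w[0] * x1 + w[1] * x2 + w[2] > 0 else 0


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_or = train_perceptron(OR_DATA)
or_preds = [predict(w_or, x1, x2) for (x1, x2), _ in OR_DATA]

w_xor = train_perceptron(XOR_DATA)
xor_preds = [predict(w_xor, x1, x2) for (x1, x2), _ in XOR_DATA]
xor_correct = sum(p == t for p, (_, t) in zip(xor_preds, XOR_DATA))

print("OR predictions:", or_preds)
print("XOR samples classified correctly:", xor_correct, "of 4")
```

No amount of extra training fixes the XOR case, because a single perceptron can only draw one straight decision boundary. Multi-layer networks trained with Backpropagation remove that restriction.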
Start with the basics of convolutions:
- Understand convolutions (and why they are so much easier to grasp than they seem).
- Study Convolutional Neural Networks (what they are used for, why they work so well for image classification, etc.).
- Train your first Convolutional Neural Network from scratch.
Review the building blocks of Convolutional Neural Networks, including:
- Convolutional layers
- Activation layers
- Pooling layers
- Batch Normalization
Uncover common architectures and training patterns:
- Discover common network architecture patterns you can use to design architectures of your own with minimal frustration and headaches.
- Utilize out-of-the-box CNNs for classification that are pre-trained and ready to be applied to your own images/image datasets (VGG16, VGG19, ResNet50, etc.).
- Save and load your own network models from disk.
- Checkpoint your models to spot high-performing epochs and restart training.
- Learn how to spot underfitting and overfitting, allowing you to correct for them and improve your classification accuracy.
- Utilize decay and learning rate schedulers.
- Train the classic LeNet architecture from scratch to recognize handwritten digits.
Working with your custom datasets + deep learning is easy:
- Learn how to gather your own training images.
- Discover how to annotate and label your dataset.
- Train a Convolutional Neural Network from scratch on top of your dataset.
- Evaluate the accuracy of your model.
- ...all of this explained by demonstrating how to gather, annotate, and train a CNN to break image captchas.
The Practitioner Bundle includes everything in the Starter Bundle. It also includes the following topics.
Discover how to use transfer learning to:
- Treat pre-trained networks as feature extractors to obtain high classification accuracy with little effort.
- Utilize fine-tuning to boost the accuracy of pre-trained networks.
- Apply data augmentation to increase network classification accuracy without gathering more training data.
Work with deeper network architectures:
- Code the seminal AlexNet architecture.
- Implement the VGGNet architecture (and variants of it).
Explore more advanced optimization algorithms, including:
- ...and how to fine-tune SGD parameters.
Uncover common techniques & best practices to improve classification accuracy:
- Understand rank-1 and rank-5 accuracy (and how we use them to measure the classification power of a given network).
- Utilize image cropping for an easy way to boost accuracy on your test set.
- Explore how network ensembles can be used to increase classification accuracy simply by training multiple networks.
- Discover my optimal pathway for applying deep learning techniques to maximize classification accuracy (and which order to apply these techniques in to achieve greatest effectiveness).
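For readers unfamiliar with the rank-1 and rank-5 terminology above, the following sketch shows one common way to compute it: a prediction counts as a rank-k hit when the true label is among the k highest-scoring classes. The function name `rank_k_accuracy` and the scores are made up for illustration and are not results from any real network.

```python
def rank_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: one list of per-class scores per sample.
    labels: the true class index for each sample.
    """
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for 3 samples over 6 classes (illustration only).
scores = [
    [0.05, 0.60, 0.10, 0.10, 0.10, 0.05],  # true class 1: ranked first
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 2: ranked third
    [0.40, 0.25, 0.15, 0.10, 0.08, 0.02],  # true class 5: ranked last
]
labels = [1, 2, 5]

rank1 = rank_k_accuracy(scores, labels, k=1)
rank5 = rank_k_accuracy(scores, labels, k=5)
print(f"rank-1: {rank1:.2f}, rank-5: {rank5:.2f}")
```

Here only the first sample is a rank-1 hit, while the first two are rank-5 hits, so rank-5 accuracy is always at least as high as rank-1 accuracy.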
Work with datasets too large to fit into memory:
- Learn how to convert an image dataset from raw images on disk to HDF5 format, making networks easier (and faster) to train.
- Compress large image datasets into efficiently packed record files.
Compete in deep learning challenges and competitions:
- Compete in Stanford’s cs231n Tiny ImageNet classification challenge...and take home the #1 position.
- Train a network on the Kaggle Dogs vs. Cats challenge and claim a position in the top-25 leaderboard with minimal effort.
Detect objects in images using deep learning by:
- Utilizing naive image pyramids and sliding windows for object detection.
- Training your own YOLO detector for recognizing objects in images/video streams in real-time.
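The "naive" image pyramid and sliding window approach mentioned above can be sketched without any image library at all. The snippet below only enumerates the geometry: which pyramid layers exist and where each window would be cropped. The function names, the 1.5 scale factor, the 32x32 window, and the 16-pixel step are all assumptions for illustration. In a real detector each crop would be passed to a classifier, which is not shown here, and this sketch says nothing about how YOLO works.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=(32, 32)):
    """Yield (width, height) for each pyramid layer, shrinking by `scale`."""
    while width >= min_size[0] and height >= min_size[1]:
        yield width, height
        width, height = int(width / scale), int(height / scale)


def sliding_windows(width, height, window=(32, 32), step=16):
    """Yield (x, y) top-left corners of every window that fits in the image."""
    win_w, win_h = window
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield x, y


# Count the crops a classifier would have to score for a 128x96 image.
layers = list(pyramid_sizes(128, 96))
total = sum(len(list(sliding_windows(w, h))) for w, h in layers)
print("pyramid layers:", layers)
print("total windows to classify:", total)
```

Even this tiny image produces dozens of crops, which hints at why the exhaustive approach is slow and why single-pass detectors are attractive for real-time use.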
The ImageNet Bundle includes everything in the Starter Bundle and Practitioner Bundle. It also includes the following additional topics:
Train state-of-the-art networks on the ImageNet dataset:
- Discover what the massive ImageNet (1,000 category) dataset is and why it’s considered the de facto challenge to benchmark image classification algorithms.
- Obtain the ImageNet dataset.
- Convert ImageNet into a format suitable for training.
- Learn how to utilize multiple GPUs to train your network in parallel, greatly reducing training time.
- Train AlexNet on ImageNet from scratch.
- Train VGGNet from the ground up on ImageNet.
- Apply the SqueezeNet architecture to ImageNet to obtain a high-accuracy model, fully deployable to smaller devices, such as the Raspberry Pi.
Unlock the same techniques deep learning pros use on ImageNet:
- Save weeks (and even months) of training time by discovering learning rate schedules that actually work.
- Spot overfitting on ImageNet and catch it before you waste hours (or days) watching your validation accuracy stall.
- Learn how to restart training from saved epochs, lower learning rates, and boost accuracy.
- Uncover methods to quickly tune hyperparameters for massive networks.
Discover how to solve real-world deep learning problems, including:
- Train a network to predict the gender and age of people in images using deep learning techniques.
- Automatically classify car type using Convolutional Neural Networks.
- Determine (and correct) image orientation using CNNs.
Have a suggestion for a topic you want to see covered?
Be sure to leave a comment on this Kickstarter or message me.
I've constructed the following flowchart to help you decide which bundle and reward tier is most appropriate for you:
If this Kickstarter campaign is successfully funded, I intend to have the Starter Bundle 100% completed and released by September 2017. The Practitioner Bundle will be published by October 2017. And finally, the ImageNet Bundle will be released in November 2017. Hardcopy editions included in the ImageNet Bundle will ship shortly thereafter.
I have included a timeline of important events below:
Given my experience authoring blog posts, tutorials, books, and online courses, I am extremely comfortable with my writing abilities. I am confident that I can deliver these three bundles by the proposed deadlines in Autumn 2017.
EC2 GPU Instances.
I've already put my money where my mouth is: I invested $15,000 of my own funds in my deep learning rig back in June 2016. This machine has helped me write code + gather results for approximately 60% of the chapters in Deep Learning for Computer Vision with Python thus far.
However, at the end of the day, it's still only one machine and I need more horsepower. In order to turbo-charge the result gathering process for the remaining 40% of the chapters, I intend to use (and am currently using) Amazon EC2 GPU instances.
Smaller GPU instances in the EC2 ecosystem start off at $0.65/hour (g2.2xlarge) and $0.90/hour (p2.xlarge). For the larger instances, prices skyrocket to $7.20/hour (p2.8xlarge) and $14.40/hour (p2.16xlarge).
I am already using the more expensive p2 instances on a daily basis to enable swifter result gathering and a bumped-up publish date, but as I mentioned above, these machines are not cheap.
Furthermore, the more funds this Kickstarter campaign raises, the more GPU instances I can have running in parallel, potentially allowing me to finish Deep Learning for Computer Vision with Python well before the November 2017 deadline.
Hiring an Editor.
A portion of the funds raised by this Kickstarter campaign will be used to hire an editor to ensure the final product is just as polished as the hood of your father's old '68 Ford Mustang convertible.
Risks and challenges
Like many Kickstarter campaigns, this project is already in the "alpha" phase. I have gathered results for approximately 60% of the chapters, and I have a clear path for gathering the remaining 40% of results to create a publishable work.
Once I have all necessary results, I can crank out chapter after chapter without any problem. Again, given my experience and expertise in the area, I believe that many of the risks and challenges are already mitigated.
Launch timing: With any project there are always potential risks and unforeseen circumstances that cause delays in launch. That said, I don't expect any significant hiccups along the way and am confident that I can deliver on schedule, starting with the Starter Bundle in September 2017. If there is any type of delay in the publishing process, I'll keep you in the loop at all times.
Experience: Over the past 3 years alone I have authored over 175 blog posts/tutorials, a 275-page book, and a 2,161-page course. I have no doubt that I'll be able to deliver a high-quality book that will take you from deep learning and visual recognition novice to expert.