The authors of Dive into Deep Learning, an open-source, interactive book that teaches the ideas, mathematical theory, and code that power deep learning models, recently announced updates to the book. The updates include a Google JAX implementation and three new chapters in volume 2. In addition, the authors say a hardcover version of volume 1 is forthcoming from Cambridge University Press.
The interactive book, which has been adopted by more than 400 universities worldwide, added an implementation of its code in Google JAX, a Python library for high-performance numerical computing whose application programming interface (API) is based on NumPy, a widely used Python library for scientific computing. The authors say that, to the best of their knowledge, Dive into Deep Learning is the first deep-learning book to offer a JAX implementation.
“We have noticed a growing number of projects that have been implemented in JAX in both open-source projects and research papers,” says Aston Zhang, an Amazon senior scientist, and one of the book’s original coauthors. “So that’s why we decided to quickly add the JAX implementation to our book. If members of the JAX community wish to learn deep learning they can come to D2L.ai and learn using their favorite language.”
The book has also added three new chapters to volume 2, on reinforcement learning, Gaussian processes, and hyperparameter optimization. Each chapter is written by Amazon scientists or by a scientist affiliated with Amazon.
Zhang cites the new content related to reinforcement learning as an example of the project leaders’ desire to keep content current. The research community’s recent excitement about OpenAI’s ChatGPT is an opportunity to help explain one of the key technologies behind the generative AI model.
“Reinforcement learning is one of the key technologies underlying ChatGPT,” Zhang said, “and we know there’s a tremendous amount of interest right now in wanting to understand the technologies behind it. So while reinforcement learning has obviously been around for a long time, we are excited to release this new chapter just in time for people who want to learn more about this topic.”
Every section of the book, in new and existing chapters alike, links to a discussion page where readers can discuss the topic and learn more about it from peers in the community.
The chapter on reinforcement learning is authored by Pratik Chaudhari, an Amazon Visiting Academic and a University of Pennsylvania assistant professor of electrical and systems engineering and computer and information sciences, together with Amazon scientists Rasool Fakoor and Kavosh Asadi.
In introducing the new content, the authors say, “In this chapter, we will develop the fundamentals of reinforcement learning and obtain hands-on experience in implementing some popular reinforcement learning methods. We will first develop a concept called a Markov Decision Process (MDP) which allows us to think of such sequential decision-making problems. An algorithm called Value Iteration will be our first insight into solving reinforcement learning problems under the assumption that we know how the uncontrolled variables in an MDP (in RL, these uncontrolled variables are called the environment) typically behave.
“Using the more general version of Value Iteration, an algorithm called Q-Learning, we will be able to take appropriate actions even when we do not necessarily have full knowledge of the environment. We will then study how to use deep networks for reinforcement learning problems by imitating the actions of an expert. And finally, we will develop a reinforcement learning method that uses a deep network to take actions in unknown environments. These techniques form the basis of more advanced RL algorithms that are used today in a variety of real-world applications, some of which we will point to in the chapter.”
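To make the contrast between the two algorithms concrete, here is a minimal sketch in plain Python. It is not taken from the book: the four-state chain environment, the reward of 1 for reaching the last state, and all names (`step`, `value_iteration`, `q_learning`) are assumptions made for illustration. Value Iteration is allowed to query the environment model directly, while Q-Learning only sees the transitions it samples.

```python
import random

# Hypothetical toy MDP: states 0..3 on a line, state 3 is the terminal goal.
N_STATES, GOAL, GAMMA = 4, 3, 0.9
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right

def step(state, action):
    """Deterministic environment: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def value_iteration(n_sweeps=50):
    """Known environment: apply the Bellman optimality backup to every state."""
    V = [0.0] * N_STATES
    for _ in range(n_sweeps):
        for s in range(GOAL):  # the terminal state keeps value 0
            outcomes = (step(s, a) for a in ACTIONS)
            V[s] = max(r + GAMMA * V[s2] for s2, r in outcomes)
    return V

def q_learning(n_episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Unknown environment: learn Q(s, a) from sampled transitions only."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(n_episodes):
        s = 0
        while s != GOAL:
            if rng.random() < epsilon:      # explore
                a = rng.randrange(2)
            else:                           # exploit, breaking ties at random
                best = max(Q[s])
                a = rng.choice([i for i in range(2) if Q[s][i] == best])
            s2, r = step(s, ACTIONS[a])
            target = r + GAMMA * max(Q[s2])  # bootstrapped one-step target
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

V = value_iteration()
Q = q_learning()
print("V:", [round(v, 3) for v in V])
print("greedy Q:", [round(max(q), 3) for q in Q])
```

On this toy problem both routes should agree on the value of each non-terminal state, even though only the first one was given the transition model.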
The new chapter on Gaussian processes is authored by Andrew Gordon Wilson, an Amazon Visiting Academic who leads the machine learning group at New York University (NYU) and teaches classes on Bayesian machine learning and information theory.
In introducing the new chapter, Wilson says: “Learning about Gaussian processes is important for three reasons: (1) they provide a function space perspective of modelling, which makes understanding a variety of model classes, including deep neural networks, much more approachable; (2) they have an extraordinary range of applications where they are state-of-the-art, including active learning, hyperparameter learning, auto-ML, and spatiotemporal regression; (3) over the last few years, algorithmic advances have made Gaussian processes increasingly scalable and relevant, harmonizing with deep learning through frameworks such as GPyTorch.
“Indeed,” Wilson adds, “GPs and deep neural networks are not competing approaches, but highly complementary, and can be combined to great effect. These algorithmic advances are not just relevant to Gaussian processes, but provide a foundation in numerical methods that is broadly useful in deep learning.”
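As a rough illustration of the function-space view Wilson describes, the sketch below computes the posterior mean and variance of a zero-mean Gaussian process at a single test input, using only the Python standard library. It is not the chapter's code and does not use GPyTorch: the RBF kernel, the fixed length scale, the small noise term, and the helper names (`rbf_kernel`, `solve`, `gp_posterior`) are all assumptions chosen to keep the example self-contained.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at one test input."""
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k_star = [rbf_kernel(a, x_test) for a in x_train]
    alpha = solve(K, y_train)   # K^{-1} y
    v = solve(K, k_star)        # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf_kernel(x_test, x_test) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

# Made-up training data: five noiseless samples of sin(x).
x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]
for x in (1.0, 1.5, 10.0):
    mean, var = gp_posterior(x_train, y_train, x)
    print(f"x={x:5.1f}  mean={mean:+.4f}  variance={var:.4f}")
```

The predictive variance is close to zero at observed inputs and returns to the prior variance far from the data, which is the uncertainty behavior that makes Gaussian processes attractive for applications such as active learning.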
In introducing the hyperparameter optimization chapter, its authors write: “The performance of every machine learning model depends on its hyperparameters. They control the learning algorithm or the structure of the underlying statistical model. However, there is no general way to choose hyperparameters in practice. Instead, hyperparameters are often set in a trial-and-error manner or sometimes left to their default values by practitioners, leading to suboptimal generalization.
“Hyperparameter optimization provides a systematic approach to this problem, by casting it as an optimization problem: a good set of hyperparameters should (at least) minimize a validation error. Compared to most other optimization problems arising in machine learning, hyperparameter optimization is a nested one, where each iteration requires training and validating a machine learning model.
“In this chapter, we will first introduce the basics of hyperparameter optimization. We will also present some recent advancements that improve the overall efficiency of hyperparameter optimization by exploiting cheap-to-evaluate proxies of the original objective function. At the end of this chapter, you should be able to apply state-of-the-art hyperparameter optimization techniques to optimize the hyperparameter of your own machine learning algorithm.”
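The nested structure the authors describe can be sketched with random search, one of the simplest hyperparameter optimization baselines. This is an illustrative example rather than the chapter's code: the toy regression data, the choice of learning rate as the only hyperparameter, the log-uniform search range, and every function name here are assumptions.

```python
import random

def make_data(n, rng):
    """Noisy samples from y = 2x + 1 (a made-up toy problem)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.1)) for x in xs]

def train(data, lr, epochs=20):
    """Inner problem: fit y = w*x + b by gradient descent on squared error."""
    w = b = 0.0
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * gw, b - lr * gb
    return w, b

def mse(params, data):
    """Mean squared error of the fitted line on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def random_search(train_set, val_set, n_trials=20, seed=0):
    """Outer problem: sample learning rates, keep the best validation error."""
    rng = random.Random(seed)
    best_lr, best_err = None, float("inf")
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, 0)   # log-uniform over [1e-4, 1]
        err = mse(train(train_set, lr), val_set)  # one full train + validate
        if err < best_err:
            best_lr, best_err = lr, err
    return best_lr, best_err

rng = random.Random(42)
train_set, val_set = make_data(40, rng), make_data(20, rng)
best_lr, best_err = random_search(train_set, val_set)
print(f"best learning rate {best_lr:.4f}, validation MSE {best_err:.4f}")
```

Every trial in the outer loop pays for a complete training run, which is why cheaper proxies of the objective, such as training for fewer epochs, can make the search more efficient.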
In other news related to the book, the original volume 1 will be published in 2023 by Cambridge University Press. Its authors are Aston Zhang; Zach Lipton, a Carnegie Mellon University assistant professor and Amazon Visiting Academic; Mu Li, Amazon senior principal scientist; and Alex Smola, Amazon vice president and distinguished scientist.