{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: Introduction to `Trainer` class\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mathLab/PINA/blob/master/tutorials/tutorial11/tutorial.ipynb)\n", "\n", "In this tutorial, we will delve deeper into the functionality of the `Trainer` class, which serves as the cornerstone for training **PINA** [Solvers](https://mathlab.github.io/PINA/_rst/_code.html#solvers).\n", "\n", "The `Trainer` class offers a plethora of features aimed at improving model accuracy, reducing training time and memory usage, facilitating logging and visualization, and more, thanks to the amazing work done by the PyTorch Lightning team!\n", "\n", "Our leading example is a simple regression problem in which we approximate the following function with a neural network model $\\mathcal{M}_{\\theta}$:\n", "$$y = x^3$$\n", "using only a set of $20$ noisy observations $\\{x_i, y_i\\}_{i=1}^{20}$, with $x_i \\sim\\mathcal{U}[-3, 3]\\;\\;\\forall i\\in\\{1,\\dots,20\\}$.\n", "\n", "Let's start by importing useful modules!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", "    import google.colab\n", "\n", "    IN_COLAB = True\n", "except ImportError:\n", "    IN_COLAB = False\n", "if IN_COLAB:\n", "    !pip install \"pina-mathlab[tutorial]\"\n", "\n", "import torch\n", "import warnings\n", "\n", "from pina import Trainer\n", "from pina.solver import SupervisedSolver\n", "from pina.model import FeedForward\n", "from pina.problem.zoo import SupervisedProblem\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the problem and the solver."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# define the problem: noisy samples of y = x^3\n", "x_train = torch.empty((20, 1)).uniform_(-3, 3)\n", "y_train = x_train.pow(3) + 3 * torch.randn_like(x_train)\n", "\n", "problem = SupervisedProblem(x_train, y_train)\n", "\n", "# build the model\n", "model = FeedForward(\n", "    layers=[10, 10],\n", "    func=torch.nn.Tanh,\n", "    output_dimensions=1,\n", "    input_dimensions=1,\n", ")\n", "\n", "# create the SupervisedSolver object\n", "solver = SupervisedSolver(problem, model, use_lt=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far we have followed exactly the same steps as in the previous tutorials. The `Trainer` object\n", "can be initialized by simply passing the `SupervisedSolver` solver." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: True\n", "TPU available: False, using: 0 TPU cores\n", "HPU available: False, using: 0 HPUs\n" ] } ], "source": [ "trainer = Trainer(solver=solver)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trainer Accelerator\n", "\n", "When creating the `Trainer`, **by default** the best-performing `accelerator` available on your system is chosen for training, ranked as follows:\n", "1. [TPU](https://cloud.google.com/tpu/docs/intro-to-tpu)\n", "2. [IPU](https://www.graphcore.ai/products/ipu)\n", "3. [HPU](https://habana.ai/)\n", "4. [GPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#:~:text=What%20does%20GPU%20stand%20for,video%20editing%2C%20and%20gaming%20applications) or [MPS](https://developer.apple.com/metal/pytorch/)\n", "5. 
CPU\n", "\n", "To set the `accelerator` manually, use:\n", "\n", "* `accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'tpu', 'ipu'}` sets the accelerator to a specific one" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: False\n", "TPU available: False, using: 0 TPU cores\n", "HPU available: False, using: 0 HPUs\n" ] } ], "source": [ "trainer = Trainer(solver=solver, accelerator=\"cpu\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, even if a `GPU` is available on the system, it is not used since we set `accelerator='cpu'`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trainer Logging\n", "\n", "In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTracker` class from `pina.callbacks`, as seen in the [*Introduction to Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.\n", "\n", "However, the loggers in `lightning.pytorch.loggers` can be more convenient, especially when we need to train several times and average the loss across runs. Here we will use `TensorBoardLogger` (see the Lightning [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) documentation), but you can choose any logger you prefer (or write your own).\n", "\n", "We will now import `TensorBoardLogger`, do three runs of training, and then visualize the results. Notice we set `enable_model_summary=False` to suppress the model summary (e.g. the number of parameters); set it to `True` if needed."
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: False\n", "TPU available: False, using: 0 TPU cores\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "HPU available: False, using: 0 HPUs\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "775a2d088e304b2589631b176c9e99e2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Training: | | 0/? [00:00\n", "\\\"Logging\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, by default, **PINA** logs the losses shown in the progress bar, as well as the number of epochs. You can always log additional quantities by either defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or subclassing the solver and overriding its **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks)).\n", "\n", "## Trainer Callbacks\n", "\n", "Whenever we need to access certain steps of the training for logging, perform static modifications (i.e. not changing the `Solver`), or update `Problem` hyperparameters (static variables), we can use **Callbacks**. **Callbacks** allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface lets you design programs that encapsulate a full set of functionality. This decouples functionality that does not need to live in **PINA** `Solver`s.\n", "\n", "Lightning has a callback system that executes them when needed. **Callbacks** should capture NON-ESSENTIAL logic that is NOT required for your Lightning module to run.\n", "\n", "The following are best practices when using/designing callbacks:\n", "\n", "* Callbacks should be isolated in their functionality.\n", "* Your callback should not rely on the behavior of other callbacks in order to work properly.\n", "* Do not manually call methods from the callback.\n", "* Directly calling methods (e.g., `on_validation_end`) is strongly discouraged.\n", "* Whenever possible, your callbacks should not depend on the order in which they are executed.\n", "\n", "We will now implement a naive version of `MetricTracker` to show how callbacks work. 
Note that this is a very simple application of callbacks; fortunately, **PINA** already provides more advanced callbacks in `pina.callbacks`." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "from lightning.pytorch.callbacks import Callback\n", "from lightning.pytorch.callbacks import EarlyStopping\n", "import torch\n", "\n", "\n", "# define a simple callback\n", "class NaiveMetricTracker(Callback):\n", "    def __init__(self):\n", "        self.saved_metrics = []\n", "\n", "    def on_train_epoch_end(\n", "        self, trainer, __\n", "    ):  # function called at the end of each epoch\n", "        self.saved_metrics.append(\n", "            {key: value for key, value in trainer.logged_metrics.items()}\n", "        )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see the results when applied to the problem. You can pass **callbacks** when initializing the `Trainer` through the `callbacks` argument, which expects a list of callbacks.\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: False\n", "TPU available: False, using: 0 TPU cores\n", "HPU available: False, using: 0 HPUs\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f38442d749ad4702a0c99715ecf08c59", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Training: | | 0/? [00:00