diff --git a/docs/source/_rst/_tutorial.rst b/docs/source/_rst/_tutorial.rst index ad9a986..020dbe2 100644 --- a/docs/source/_rst/_tutorial.rst +++ b/docs/source/_rst/_tutorial.rst @@ -10,6 +10,7 @@ Getting started with PINA :titlesonly: Introduction to PINA for Physics Informed Neural Networks training + PINA and PyTorch Lightning, training tips and visualizations Building custom geometries with PINA Location class Physics Informed Neural Networks diff --git a/docs/source/_rst/tutorials/tutorial11/logging.png b/docs/source/_rst/tutorials/tutorial11/logging.png new file mode 100644 index 0000000..c4b421e Binary files /dev/null and b/docs/source/_rst/tutorials/tutorial11/logging.png differ diff --git a/docs/source/_rst/tutorials/tutorial11/tutorial.rst b/docs/source/_rst/tutorials/tutorial11/tutorial.rst new file mode 100644 index 0000000..d9c95a1 --- /dev/null +++ b/docs/source/_rst/tutorials/tutorial11/tutorial.rst @@ -0,0 +1,536 @@ +Tutorial: PINA and PyTorch Lightning, training tips and visualizations +====================================================================== + +In this tutorial, we will delve deeper into the functionality of the +``Trainer`` class, which serves as the cornerstone for training **PINA** +`Solvers `__. +The ``Trainer`` class offers a plethora of features aimed at improving +model accuracy, reducing training time and memory usage, facilitating +logging visualization, and more thanks to the amazing job done by the PyTorch Lightning team! +Our leading example will revolve around solving the ``SimpleODE`` +problem, as outlined in the `Introduction to PINA for Physics Informed +Neural Networks +training `__. +If you haven’t already explored it, we highly recommend doing so before +diving into this tutorial. +Let’s start by importing useful modules, define the ``SimpleODE`` +problem and the ``PINN`` solver. + +.. code:: ipython3 + + import torch + + from pina import Condition, Trainer + from pina.solvers import PINN + from pina.model import FeedForward + from pina.problem import SpatialProblem + from pina.operators import grad + from pina.geometry import CartesianDomain + from pina.equation import Equation, FixedValue + + class SimpleODE(SpatialProblem): + + output_variables = ['u'] + spatial_domain = CartesianDomain({'x': [0, 1]}) + + # defining the ode equation + def ode_equation(input_, output_): + u_x = grad(output_, input_, components=['u'], d=['x']) + u = output_.extract(['u']) + return u_x - u + + # conditions to hold + conditions = { + 'x0': Condition(location=CartesianDomain({'x': 0.}), equation=FixedValue(1)), # We fix initial condition to value 1 + 'D': Condition(location=CartesianDomain({'x': [0, 1]}), equation=Equation(ode_equation)), # We wrap the python equation using Equation + } + + # defining the true solution + def truth_solution(self, pts): + return torch.exp(pts.extract(['x'])) + + + # sampling for training + problem = SimpleODE() + problem.discretise_domain(1, 'random', locations=['x0']) + problem.discretise_domain(20, 'lh', locations=['D']) + + # build the model + model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) + + # create the PINN object + pinn = PINN(problem, model) + +Till now we just followed the extact step of the previous tutorials. The +``Trainer`` object can be initialized by simiply passing the ``PINN`` +solver + +.. code:: ipython3 + + trainer = Trainer(solver=pinn) + + +.. 
parsed-literal::

    GPU available: True (mps), used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs


Trainer Accelerator
-------------------

When creating the trainer, **by default** the ``Trainer`` will choose
the best performing ``accelerator`` for training available on your
system, ranked as follows:

1. `TPU <https://cloud.google.com/tpu/docs/intro-to-tpu>`__
2. `IPU <https://www.graphcore.ai/products/ipu>`__
3. `HPU <https://habana.ai/>`__
4. `GPU <https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html>`__ or `MPS <https://developer.apple.com/metal/pytorch/>`__
5. CPU

To set the ``accelerator`` manually, run:

- ``accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'ipu'}`` sets the
  accelerator to a specific one

.. code:: ipython3

    trainer = Trainer(solver=pinn,
                      accelerator='cpu')


.. parsed-literal::

    GPU available: True (mps), used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs


As you can see, even though a ``GPU`` is available on this system, it is
not used, since we set ``accelerator='cpu'``.

Trainer Logging
---------------

In **PINA** you can log metrics in different ways. The simplest approach
is to use the ``MetricTracker`` class from ``pina.callbacks``, as seen in
the `Introduction to PINA for Physics Informed Neural Networks
training <https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb>`__
tutorial.

However, especially when we need to train multiple times to get an
average of the loss across multiple runs, ``pytorch_lightning.loggers``
might be useful. Here we will use ``TensorBoardLogger`` (more on
`logging <https://lightning.ai/docs/pytorch/stable/extensions/logging.html>`__
here), but you can choose the one you prefer (or write your own).

We will now import ``TensorBoardLogger``, do three runs of training and
then visualize the results. Notice that we set ``enable_model_summary=False``
to avoid printing the model summary (e.g. the number of parameters); set
it to ``True`` if needed.

.. code:: ipython3

    from pytorch_lightning.loggers import TensorBoardLogger

    # three runs of training, by default each one trains for 1000 epochs
    # we reinitialize the model each time, otherwise the same parameters would be optimized
    for _ in range(3):
        model = FeedForward(
            layers=[10, 10],
            func=torch.nn.Tanh,
            output_dimensions=len(problem.output_variables),
            input_dimensions=len(problem.input_variables)
        )
        pinn = PINN(problem, model)
        trainer = Trainer(solver=pinn,
                          accelerator='cpu',
                          logger=TensorBoardLogger(save_dir='simpleode'),
                          enable_model_summary=False)
        trainer.train()


.. parsed-literal::

    GPU available: True (mps), used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs

    `Trainer.fit` stopped: `max_epochs=1000` reached.
    Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 133.46it/s, v_num=6, x0_loss=1.48e-5, D_loss=0.000655, mean_loss=0.000335]


.. parsed-literal::

    GPU available: True (mps), used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs

    `Trainer.fit` stopped: `max_epochs=1000` reached.
    Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 154.49it/s, v_num=7, x0_loss=6.21e-6, D_loss=0.000221, mean_loss=0.000114]


.. parsed-literal::

    GPU available: True (mps), used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs

    `Trainer.fit` stopped: `max_epochs=1000` reached.
    Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 62.60it/s, v_num=8, x0_loss=1.44e-5, D_loss=0.000572, mean_loss=0.000293]


We can now visualize the logs by simply running
``tensorboard --logdir=simpleode/`` in the terminal; you should obtain a
webpage like the one shown below:

.. image:: logging.png

As you can see, by default **PINA** logs the losses which are shown in
the progress bar, as well as the number of epochs. You can always log
more quantities, either by defining a **callback** (`more on
callbacks <https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html>`__)
or by inheriting from the solver and modifying the training flow through
its **hooks** (`more on
hooks <https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks>`__).

Trainer Callbacks
-----------------

Whenever we need to access certain steps of the training for logging,
perform static modifications (i.e. not changing the ``Solver``), or
update ``Problem`` hyperparameters (static variables), we can use
``Callbacks``. Notice that ``Callbacks`` allow you to add arbitrary
self-contained programs to your training. At specific points during the
flow of execution (hooks), the Callback interface allows you to design
programs that encapsulate a full set of functionality. It de-couples
functionality that does not need to be in **PINA** ``Solver``\ s.
Lightning has a callback system to execute them when needed. Callbacks
should capture NON-ESSENTIAL logic that is NOT required for your
lightning module to run.

The following are best practices when using/designing callbacks.

- Callbacks should be isolated in their functionality.
- Your callback should not rely on the behavior of other callbacks in
  order to work properly.
- Do not manually call methods from the callback.
- Directly calling methods (eg. on_validation_end) is strongly
  discouraged.
- Whenever possible, your callbacks should not depend on the order in
  which they are executed.

We will now try to implement a naive version of ``MetricTracker`` to show
how callbacks work. Notice that this is a very simple application of
callbacks; **PINA** already provides more advanced callbacks in
``pina.callbacks``.

.. code:: ipython3

    from pytorch_lightning.callbacks import Callback
    import torch

    # define a simple callback
    class NaiveMetricTracker(Callback):
        def __init__(self):
            self.saved_metrics = []

        def on_train_epoch_end(self, trainer, __): # function called at the end of each epoch
            self.saved_metrics.append(
                {key: value for key, value in trainer.logged_metrics.items()}
            )

Let's see the results when applied to the ``SimpleODE`` problem. You can
define callbacks when initializing the ``Trainer`` through the ``callbacks``
argument, which expects a list of callbacks.

.. code:: ipython3

    model = FeedForward(
        layers=[10, 10],
        func=torch.nn.Tanh,
        output_dimensions=len(problem.output_variables),
        input_dimensions=len(problem.input_variables)
    )
    pinn = PINN(problem, model)
    trainer = Trainer(solver=pinn,
                      accelerator='cpu',
                      enable_model_summary=False,
                      callbacks=[NaiveMetricTracker()]) # adding the callback
    trainer.train()


.. parsed-literal::

    GPU available: True (mps), used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs

    `Trainer.fit` stopped: `max_epochs=1000` reached.
    Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 149.27it/s, v_num=1, x0_loss=7.27e-5, D_loss=0.0016, mean_loss=0.000838]


We can easily access the data by calling
``trainer.callbacks[0].saved_metrics`` (notice the zero, which selects the
first callback in the list given at initialization).

.. code:: ipython3

    trainer.callbacks[0].saved_metrics[:3] # only the first three epochs




.. parsed-literal::

    [{'x0_loss': tensor(0.9141),
      'D_loss': tensor(0.0304),
      'mean_loss': tensor(0.4722)},
     {'x0_loss': tensor(0.8906),
      'D_loss': tensor(0.0287),
      'mean_loss': tensor(0.4596)},
     {'x0_loss': tensor(0.8674),
      'D_loss': tensor(0.0274),
      'mean_loss': tensor(0.4474)}]



PyTorch Lightning also has some built-in ``Callbacks`` which can be used
in **PINA**; `here is an extensive
list <https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html#built-in-callbacks>`__.

We can for example try the ``EarlyStopping`` routine, which
automatically stops the training when a monitored metric stops improving
(here the ``mean_loss``). To let the training run indefinitely, set
``max_epochs=-1``.

.. code:: ipython3

    # ~2 mins
    from pytorch_lightning.callbacks import EarlyStopping

    model = FeedForward(
        layers=[10, 10],
        func=torch.nn.Tanh,
        output_dimensions=len(problem.output_variables),
        input_dimensions=len(problem.input_variables)
    )
    pinn = PINN(problem, model)
    trainer = Trainer(solver=pinn,
                      accelerator='cpu',
                      max_epochs=-1,
                      enable_model_summary=False,
                      callbacks=[EarlyStopping('mean_loss')]) # adding the callback
    trainer.train()


.. parsed-literal::

    GPU available: True (mps), used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs

    Epoch 6157: 100%|██████████| 1/1 [00:00<00:00, 139.84it/s, v_num=9, x0_loss=4.21e-9, D_loss=9.93e-6, mean_loss=4.97e-6]


As we can see, the training stops automatically once the logged metric
stops improving!

Trainer Tips to Boost Accuracy, Save Memory and Speed Up Training
------------------------------------------------------------------

Until now we have seen how to choose the right ``accelerator``, how to
log and visualize the results, and how to hook into the training flow at
specific points using ``callbacks``.
Now, we will focus on how to boost your training by saving memory and
speeding it up, while maintaining the same or an even better degree of
accuracy!

There are several built-in methods developed in PyTorch Lightning which
can be applied straightforwardly in **PINA**; here we report some:

- `Stochastic Weight
  Averaging <https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/>`__
  to boost accuracy
- `Gradient
  Clipping <https://deepgram.com/ai-glossary/gradient-clipping>`__ to
  reduce computational time (and improve accuracy)
- `Gradient
  Accumulation <https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3>`__
  to reduce memory consumption
- `Mixed Precision
  Training <https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3>`__
  to reduce memory consumption

We will demonstrate only the first two, and compare the results with a
standard training. We use the
`Timer <https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.Timer.html#lightning.pytorch.callbacks.Timer>`__
callback from ``pytorch_lightning.callbacks`` to measure the training
time. Let's start by training a simple model without any optimization
(train for 2000 epochs).

.. 
code:: ipython3 + + from pytorch_lightning.callbacks import Timer + from pytorch_lightning import seed_everything + + # setting the seed for reproducibility + seed_everything(42, workers=True) + + model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) + + pinn = PINN(problem, model) + trainer = Trainer(solver=pinn, + accelerator='cpu', + deterministic=True, # setting deterministic=True ensure reproducibility when a seed is imposed + max_epochs = 2000, + enable_model_summary=False, + callbacks=[Timer()]) # adding a callbacks + trainer.train() + print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s') + + +.. parsed-literal:: + + Seed set to 42 + GPU available: True (mps), used: False + TPU available: False, using: 0 TPU cores + IPU available: False, using: 0 IPUs + HPU available: False, using: 0 HPUs + + + `Trainer.fit` stopped: `max_epochs=2000` reached. + Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 163.58it/s, v_num=31, x0_loss=1.12e-6, D_loss=0.000127, mean_loss=6.4e-5] + Total training time 17.36381 s + + +Now we do the same but with StochasticWeightAveraging + +.. code:: ipython3 + + from pytorch_lightning.callbacks import StochasticWeightAveraging + + # setting the seed for reproducibility + seed_everything(42, workers=True) + + model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) + pinn = PINN(problem, model) + trainer = Trainer(solver=pinn, + accelerator='cpu', + deterministic=True, + max_epochs = 2000, + enable_model_summary=False, + callbacks=[Timer(), + StochasticWeightAveraging(swa_lrs=0.005)]) # adding StochasticWeightAveraging callbacks + trainer.train() + print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s') + + +.. parsed-literal:: + + Seed set to 42 + GPU available: True (mps), used: False + TPU available: False, using: 0 TPU cores + IPU available: False, using: 0 IPUs + HPU available: False, using: 0 HPUs + + + Epoch 1598: 100%|██████████| 1/1 [00:00<00:00, 210.04it/s, v_num=47, x0_loss=4.17e-6, D_loss=0.000204, mean_loss=0.000104] + Swapping scheduler `ConstantLR` for `SWALR` + `Trainer.fit` stopped: `max_epochs=2000` reached. + Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 120.85it/s, v_num=47, x0_loss=1.56e-7, D_loss=7.49e-5, mean_loss=3.75e-5] + Total training time 17.10627 s + + +As you can see, the training time does not change at all! Notice that +around epoch ``1600`` the scheduler is switched from the defalut one +``ConstantLR`` to the Stochastic Weight Average Learning Rate +(``SWALR``). This is because by default ``StochasticWeightAveraging`` +will be activated after ``int(swa_epoch_start * max_epochs)`` with +``swa_epoch_start=0.7`` by default. Finally, the final ``mean_loss`` is +lower when ``StochasticWeightAveraging`` is used. + +We will now now do the same but clippling the gradient to be relatively +small. + +.. 
code:: ipython3 + + # setting the seed for reproducibility + seed_everything(42, workers=True) + + model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) + pinn = PINN(problem, model) + trainer = Trainer(solver=pinn, + accelerator='cpu', + max_epochs = 2000, + enable_model_summary=False, + gradient_clip_val=0.1, # clipping the gradient + callbacks=[Timer(), + StochasticWeightAveraging(swa_lrs=0.005)]) + trainer.train() + print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s') + + +.. parsed-literal:: + + Seed set to 42 + GPU available: True (mps), used: False + TPU available: False, using: 0 TPU cores + IPU available: False, using: 0 IPUs + HPU available: False, using: 0 HPUs + + Epoch 1598: 100%|██████████| 1/1 [00:00<00:00, 261.80it/s, v_num=46, x0_loss=9e-8, D_loss=2.39e-5, mean_loss=1.2e-5] + Swapping scheduler `ConstantLR` for `SWALR` + `Trainer.fit` stopped: `max_epochs=2000` reached. + Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 148.99it/s, v_num=46, x0_loss=7.08e-7, D_loss=1.77e-5, mean_loss=9.19e-6] + Total training time 17.01149 s + + +As we can see we by applying gradient clipping we were able to even +obtain lower error! + +What’s next? +------------ + +Now you know how to use efficiently the ``Trainer`` class **PINA**! +There are multiple directions you can go now: + +1. Explore training times on different devices (e.g.) ``TPU`` + +2. Try to reduce memory cost by mixed precision training and gradient + accumulation (especially useful when training Neural Operators) + +3. Benchmark ``Trainer`` speed for different precisions. diff --git a/pina/label_tensor.py b/pina/label_tensor.py index c92dda9..6731a98 100644 --- a/pina/label_tensor.py +++ b/pina/label_tensor.py @@ -1,6 +1,6 @@ """ Module for LabelTensor """ -from typing import Any +from copy import deepcopy import torch from torch import Tensor @@ -79,6 +79,21 @@ class LabelTensor(torch.Tensor): ) self._labels = labels + def __deepcopy__(self, __): + """ + Implements deepcopy for label tensor. By default it stores the + current labels and use the :meth:`~torch._tensor.Tensor.__deepcopy__` + method for creating a new :class:`pina.label_tensor.LabelTensor`. + + :param __: Placeholder parameter. + :type __: None + :return: The deep copy of the :class:`pina.label_tensor.LabelTensor`. + :rtype: LabelTensor + """ + labels = self.labels + copy_tensor = deepcopy(self.tensor) + return LabelTensor(copy_tensor, labels) + @property def labels(self): """Property decorator for labels diff --git a/pina/problem/abstract_problem.py b/pina/problem/abstract_problem.py index a368b40..6ebba6c 100644 --- a/pina/problem/abstract_problem.py +++ b/pina/problem/abstract_problem.py @@ -2,6 +2,7 @@ from abc import ABCMeta, abstractmethod from ..utils import merge_tensors, check_consistency +from copy import deepcopy import torch @@ -29,6 +30,23 @@ class AbstractProblem(metaclass=ABCMeta): # put in self.input_pts all the points that we don't need to sample self._span_condition_points() + def __deepcopy__(self, memo): + """ + Implements deepcopy for the + :class:`~pina.problem.abstract_problem.AbstractProblem` class. 
+ + :param dict memo: Memory dictionary, to avoid excess copy + :return: The deep copy of the + :class:`~pina.problem.abstract_problem.AbstractProblem` class + :rtype: AbstractProblem + """ + cls = self.__class__ + result = cls.__new__(cls) + memo[id(self)] = result + for k, v in self.__dict__.items(): + setattr(result, k, deepcopy(v, memo)) + return result + @property def input_variables(self): """ diff --git a/tutorials/README.md b/tutorials/README.md index d0abcbd..10ba0c6 100644 --- a/tutorials/README.md +++ b/tutorials/README.md @@ -7,6 +7,7 @@ In this folder we collect useful tutorials in order to understand the principles | Description | Tutorial | |---------------|-----------| Introduction to PINA for Physics Informed Neural Networks training|[[.ipynb](tutorial1/tutorial.ipynb), [.py](tutorial1/tutorial.py), [.html](http://mathlab.github.io/PINA/_rst/tutorials/tutorial1/tutorial.html)]| +PINA and PyTorch Lightning, training tips and visualizations|[[.ipynb](tutorial11/tutorial.ipynb), [.py](tutorial11/tutorial.py), [.html](http://mathlab.github.io/PINA/_rst/tutorials/tutorial11/tutorial.html)]| Building custom geometries with PINA `Location` class|[[.ipynb](tutorial6/tutorial.ipynb), [.py](tutorial6/tutorial.py), [.html](http://mathlab.github.io/PINA/_rst/tutorials/tutorial6/tutorial.html)]| diff --git a/tutorials/tutorial11/logging.png b/tutorials/tutorial11/logging.png new file mode 100644 index 0000000..c4b421e Binary files /dev/null and b/tutorials/tutorial11/logging.png differ diff --git a/tutorials/tutorial11/tutorial.ipynb b/tutorials/tutorial11/tutorial.ipynb new file mode 100644 index 0000000..68d1f5e --- /dev/null +++ b/tutorials/tutorial11/tutorial.ipynb @@ -0,0 +1,821 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tutorial: PINA and PyTorch Lightning, training tips and visualizations \n", + "\n", + "\n", + "In this tutorial, we will delve deeper into the functionality of the `Trainer` class, which serves as the cornerstone for training **PINA** [Solvers](https://mathlab.github.io/PINA/_rst/_code.html#solvers). \n", + "\n", + "The `Trainer` class offers a plethora of features aimed at improving model accuracy, reducing training time and memory usage, facilitating logging visualization, and more thanks to the amazing job done by the PyTorch Lightning team!\n", + "\n", + "Our leading example will revolve around solving the `SimpleODE` problem, as outlined in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb). If you haven't already explored it, we highly recommend doing so before diving into this tutorial.\n", + "\n", + "Let's start by importing useful modules, define the `SimpleODE` problem and the `PINN` solver." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "\n", + "from pina import Condition, Trainer\n", + "from pina.solvers import PINN\n", + "from pina.model import FeedForward\n", + "from pina.problem import SpatialProblem\n", + "from pina.operators import grad\n", + "from pina.geometry import CartesianDomain\n", + "from pina.equation import Equation, FixedValue\n", + "\n", + "class SimpleODE(SpatialProblem):\n", + "\n", + " output_variables = ['u']\n", + " spatial_domain = CartesianDomain({'x': [0, 1]})\n", + "\n", + " # defining the ode equation\n", + " def ode_equation(input_, output_):\n", + " u_x = grad(output_, input_, components=['u'], d=['x'])\n", + " u = output_.extract(['u'])\n", + " return u_x - u\n", + "\n", + " # conditions to hold\n", + " conditions = {\n", + " 'x0': Condition(location=CartesianDomain({'x': 0.}), equation=FixedValue(1)), # We fix initial condition to value 1\n", + " 'D': Condition(location=CartesianDomain({'x': [0, 1]}), equation=Equation(ode_equation)), # We wrap the python equation using Equation\n", + " }\n", + "\n", + " # defining the true solution\n", + " def truth_solution(self, pts):\n", + " return torch.exp(pts.extract(['x']))\n", + " \n", + "\n", + "# sampling for training\n", + "problem = SimpleODE()\n", + "problem.discretise_domain(1, 'random', locations=['x0'])\n", + "problem.discretise_domain(20, 'lh', locations=['D'])\n", + "\n", + "# build the model\n", + "model = FeedForward(\n", + " layers=[10, 10],\n", + " func=torch.nn.Tanh,\n", + " output_dimensions=len(problem.output_variables),\n", + " input_dimensions=len(problem.input_variables)\n", + ")\n", + "\n", + "# create the PINN object\n", + "pinn = PINN(problem, model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Till now we just followed the extact step of the previous tutorials. The `Trainer` object\n", + "can be initialized by simiply passing the `PINN` solver" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "GPU available: True (mps), used: True\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + } + ], + "source": [ + "trainer = Trainer(solver=pinn)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Trainer Accelerator\n", + "\n", + "When creating the trainer, **by defualt** the `Trainer` will choose the most performing `accelerator` for training which is available in your system, ranked as follow:\n", + "1. [TPU](https://cloud.google.com/tpu/docs/intro-to-tpu)\n", + "2. [IPU](https://www.graphcore.ai/products/ipu)\n", + "3. [HPU](https://habana.ai/)\n", + "4. [GPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#:~:text=What%20does%20GPU%20stand%20for,video%20editing%2C%20and%20gaming%20applications) or [MPS](https://developer.apple.com/metal/pytorch/)\n", + "5. 
CPU" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For setting manually the `accelerator` run:\n", + "\n", + "* `accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'cpu', 'ipu'}` sets the accelerator to a specific one" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "GPU available: True (mps), used: False\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + } + ], + "source": [ + "trainer = Trainer(solver=pinn,\n", + " accelerator='cpu')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "as you can see, even if in the used system `GPU` is available, it is not used since we set `accelerator='cpu'`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Trainer Logging\n", + "\n", + "In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTraker` class from `pina.callbacks` as seen in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.\n", + "\n", + "However, expecially when we need to train multiple times to get an average of the loss across multiple runs, `pytorch_lightning.loggers` might be useful. Here we will use `TensorBoardLogger` (more on [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) here), but you can choose the one you prefer (or make your own one).\n", + "\n", + "We will now import `TensorBoardLogger`, do three runs of training and then visualize the results. Notice we set `enable_model_summary=False` to avoid model summary specifications (e.g. 
number of parameters), set it to true if needed.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "GPU available: True (mps), used: False\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 8: 100%|██████████| 1/1 [00:00<00:00, 232.78it/s, v_num=6, x0_loss=0.436, D_loss=0.129, mean_loss=0.283] " + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 222.52it/s, v_num=6, x0_loss=1.48e-5, D_loss=0.000655, mean_loss=0.000335]" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`Trainer.fit` stopped: `max_epochs=1000` reached.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 133.46it/s, v_num=6, x0_loss=1.48e-5, D_loss=0.000655, mean_loss=0.000335]\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "GPU available: True (mps), used: False\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 274.80it/s, v_num=7, x0_loss=6.21e-6, D_loss=0.000221, mean_loss=0.000114]" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`Trainer.fit` stopped: `max_epochs=1000` reached.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 154.49it/s, v_num=7, x0_loss=6.21e-6, D_loss=0.000221, mean_loss=0.000114]\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "GPU available: True (mps), used: False\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 78.56it/s, v_num=8, x0_loss=1.44e-5, D_loss=0.000572, mean_loss=0.000293] " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`Trainer.fit` stopped: `max_epochs=1000` reached.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 62.60it/s, v_num=8, x0_loss=1.44e-5, D_loss=0.000572, mean_loss=0.000293]\n" + ] + } + ], + "source": [ + "from pytorch_lightning.loggers import TensorBoardLogger\n", + "\n", + "# three run of training, by default it trains for 1000 epochs\n", + "# we reinitialize the model each time otherwise the same parameters will be optimized\n", + "for _ in range(3):\n", + " model = FeedForward(\n", + " layers=[10, 10],\n", + " func=torch.nn.Tanh,\n", + " output_dimensions=len(problem.output_variables),\n", + " input_dimensions=len(problem.input_variables)\n", + " )\n", + " pinn = PINN(problem, model)\n", + " trainer = Trainer(solver=pinn,\n", + " accelerator='cpu',\n", + " logger=TensorBoardLogger(save_dir='simpleode'),\n", + " enable_model_summary=False)\n", + " trainer.train()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now visualize the logs by simply running `tensorboard --logdir=simpleode/` on terminal, you should obtain a webpage as 
the one shown below:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

<img src=\"logging.png\" alt=\"Logging\"/>
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "as you can see, by default, **PINA** logs the losses which are shown in the progress bar, as well as the number of epochs. You can always insert more loggings by either defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or inheriting the solver and modify the programs with different **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks))." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Trainer Callbacks" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Whenever we need to access certain steps of the training for logging, do static modifications (i.e. not changing the `Solver`) or updating `Problem` hyperparameters (static variables), we can use `Callabacks`. Notice that `Callbacks` allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in **PINA** `Solver`s.\n", + "Lightning has a callback system to execute them when needed. Callbacks should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.\n", + "\n", + "The following are best practices when using/designing callbacks.\n", + "\n", + "* Callbacks should be isolated in their functionality.\n", + "* Your callback should not rely on the behavior of other callbacks in order to work properly.\n", + "* Do not manually call methods from the callback.\n", + "* Directly calling methods (eg. on_validation_end) is strongly discouraged.\n", + "* Whenever possible, your callbacks should not depend on the order in which they are executed.\n", + "\n", + "We will try now to implement a naive version of `MetricTraker` to show how callbacks work. Notice that this is a very easy application of callbacks, fortunately in **PINA** we already provide more advanced callbacks in `pina.callbacks`.\n", + "\n", + "" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from pytorch_lightning.callbacks import Callback\n", + "import torch\n", + "\n", + "# define a simple callback\n", + "class NaiveMetricTracker(Callback):\n", + " def __init__(self):\n", + " self.saved_metrics = []\n", + "\n", + " def on_train_epoch_end(self, trainer, __): # function called at the end of each epoch\n", + " self.saved_metrics.append(\n", + " {key: value for key, value in trainer.logged_metrics.items()}\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's see the results when applyed to the `SimpleODE` problem. You can define callbacks when initializing the `Trainer` by the `callbacks` argument, which expects a list of callbacks. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "GPU available: True (mps), used: False\n", + "TPU available: False, using: 0 TPU cores\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 241.30it/s, v_num=1, x0_loss=7.27e-5, D_loss=0.0016, mean_loss=0.000838] " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`Trainer.fit` stopped: `max_epochs=1000` reached.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 149.27it/s, v_num=1, x0_loss=7.27e-5, D_loss=0.0016, mean_loss=0.000838]\n" + ] + } + ], + "source": [ + "model = FeedForward(\n", + " layers=[10, 10],\n", + " func=torch.nn.Tanh,\n", + " output_dimensions=len(problem.output_variables),\n", + " input_dimensions=len(problem.input_variables)\n", + " )\n", + "pinn = PINN(problem, model)\n", + "trainer = Trainer(solver=pinn,\n", + " accelerator='cpu',\n", + " enable_model_summary=False,\n", + " callbacks=[NaiveMetricTracker()]) # adding a callbacks\n", + "trainer.train()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can easily access the data by calling `trainer.callbacks[0].saved_metrics` (notice the zero representing the first callback in the list given at initialization)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'x0_loss': tensor(0.9141),\n", + " 'D_loss': tensor(0.0304),\n", + " 'mean_loss': tensor(0.4722)},\n", + " {'x0_loss': tensor(0.8906),\n", + " 'D_loss': tensor(0.0287),\n", + " 'mean_loss': tensor(0.4596)},\n", + " {'x0_loss': tensor(0.8674),\n", + " 'D_loss': tensor(0.0274),\n", + " 'mean_loss': tensor(0.4474)}]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trainer.callbacks[0].saved_metrics[:3] # only the first three epochs" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "PyTorch Lightning also has some built in `Callbacks` which can be used in **PINA**, [here an extensive list](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html#built-in-callbacks). \n", + "\n", + "We can for example try the `EarlyStopping` routine, which automatically stops the training when a specific metric converged (here the `mean_loss`). In order to let the training keep going forever set `max_epochs=-1`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "GPU available: True (mps), used: False\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 4: 100%|██████████| 1/1 [00:00<00:00, 255.67it/s, v_num=9, x0_loss=0.876, D_loss=0.00542, mean_loss=0.441]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 6157: 100%|██████████| 1/1 [00:00<00:00, 139.84it/s, v_num=9, x0_loss=4.21e-9, D_loss=9.93e-6, mean_loss=4.97e-6] \n" + ] + } + ], + "source": [ + "# ~2 mins\n", + "from pytorch_lightning.callbacks import EarlyStopping\n", + "\n", + "model = FeedForward(\n", + " layers=[10, 10],\n", + " func=torch.nn.Tanh,\n", + " output_dimensions=len(problem.output_variables),\n", + " input_dimensions=len(problem.input_variables)\n", + " )\n", + "pinn = PINN(problem, model)\n", + "trainer = Trainer(solver=pinn,\n", + " accelerator='cpu',\n", + " max_epochs = -1,\n", + " enable_model_summary=False,\n", + " callbacks=[EarlyStopping('mean_loss')]) # adding a callbacks\n", + "trainer.train()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As we can see the model automatically stop when the logging metric stopped improving!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Trainer Tips to Boost Accuracy, Save Memory and Speed Up Training\n", + "\n", + "Untill now we have seen how to choose the right `accelerator`, how to log and visualize the results, and how to interface with the program in order to add specific parts of code at specific points by `callbacks`.\n", + "Now, we well focus on how boost your training by saving memory and speeding it up, while mantaining the same or even better degree of accuracy!\n", + "\n", + "\n", + "There are several built in methods developed in PyTorch Lightning which can be applied straight forward in **PINA**, here we report some:\n", + "\n", + "* [Stochastic Weight Averaging](https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/) to boost accuracy\n", + "* [Gradient Clippling](https://deepgram.com/ai-glossary/gradient-clipping) to reduce computational time (and improve accuracy)\n", + "* [Gradient Accumulation](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption \n", + "* [Mixed Precision Training](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption \n", + "\n", + "We will just demonstrate how to use the first two, and see the results compared to a standard training.\n", + "We use the [`Timer`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.Timer.html#lightning.pytorch.callbacks.Timer) callback from `pytorch_lightning.callbacks` to take the times. Let's start by training a simple model without any optimization (train for 2000 epochs)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Seed set to 42\n", + "GPU available: True (mps), used: False\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 275.87it/s, v_num=31, x0_loss=1.12e-6, D_loss=0.000127, mean_loss=6.4e-5] " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`Trainer.fit` stopped: `max_epochs=2000` reached.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 163.58it/s, v_num=31, x0_loss=1.12e-6, D_loss=0.000127, mean_loss=6.4e-5]\n", + "Total training time 17.36381 s\n" + ] + } + ], + "source": [ + "from pytorch_lightning.callbacks import Timer\n", + "from pytorch_lightning import seed_everything\n", + "\n", + "# setting the seed for reproducibility\n", + "seed_everything(42, workers=True)\n", + "\n", + "model = FeedForward(\n", + " layers=[10, 10],\n", + " func=torch.nn.Tanh,\n", + " output_dimensions=len(problem.output_variables),\n", + " input_dimensions=len(problem.input_variables)\n", + " )\n", + "\n", + "pinn = PINN(problem, model)\n", + "trainer = Trainer(solver=pinn,\n", + " accelerator='cpu',\n", + " deterministic=True, # setting deterministic=True ensure reproducibility when a seed is imposed\n", + " max_epochs = 2000,\n", + " enable_model_summary=False,\n", + " callbacks=[Timer()]) # adding a callbacks\n", + "trainer.train()\n", + "print(f'Total training time {trainer.callbacks[0].time_elapsed(\"train\"):.5f} s')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we do the same but with StochasticWeightAveraging" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Seed set to 42\n", + "GPU available: True (mps), used: False\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1598: 100%|██████████| 1/1 [00:00<00:00, 210.04it/s, v_num=47, x0_loss=4.17e-6, D_loss=0.000204, mean_loss=0.000104]" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Swapping scheduler `ConstantLR` for `SWALR`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 259.39it/s, v_num=47, x0_loss=1.56e-7, D_loss=7.49e-5, mean_loss=3.75e-5] " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`Trainer.fit` stopped: `max_epochs=2000` reached.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 120.85it/s, v_num=47, x0_loss=1.56e-7, D_loss=7.49e-5, mean_loss=3.75e-5]\n", + "Total training time 17.10627 s\n" + ] + } + ], + "source": [ + "from pytorch_lightning.callbacks import StochasticWeightAveraging\n", + "\n", + "# setting the seed for reproducibility\n", + "seed_everything(42, workers=True)\n", + "\n", + "model = FeedForward(\n", + " layers=[10, 10],\n", + " func=torch.nn.Tanh,\n", + " output_dimensions=len(problem.output_variables),\n", + " 
input_dimensions=len(problem.input_variables)\n", + " )\n", + "pinn = PINN(problem, model)\n", + "trainer = Trainer(solver=pinn,\n", + " accelerator='cpu',\n", + " deterministic=True,\n", + " max_epochs = 2000,\n", + " enable_model_summary=False,\n", + " callbacks=[Timer(),\n", + " StochasticWeightAveraging(swa_lrs=0.005)]) # adding StochasticWeightAveraging callbacks\n", + "trainer.train()\n", + "print(f'Total training time {trainer.callbacks[0].time_elapsed(\"train\"):.5f} s')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, the training time does not change at all! Notice that around epoch `1600`\n", + "the scheduler is switched from the defalut one `ConstantLR` to the Stochastic Weight Average Learning Rate (`SWALR`).\n", + "This is because by default `StochasticWeightAveraging` will be activated after `int(swa_epoch_start * max_epochs)` with `swa_epoch_start=0.7` by default. Finally, the final `mean_loss` is lower when `StochasticWeightAveraging` is used.\n", + "\n", + "We will now now do the same but clippling the gradient to be relatively small." + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Seed set to 42\n", + "GPU available: True (mps), used: False\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1598: 100%|██████████| 1/1 [00:00<00:00, 261.80it/s, v_num=46, x0_loss=9e-8, D_loss=2.39e-5, mean_loss=1.2e-5] " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Swapping scheduler `ConstantLR` for `SWALR`\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 261.78it/s, v_num=46, x0_loss=7.08e-7, D_loss=1.77e-5, mean_loss=9.19e-6] " + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`Trainer.fit` stopped: `max_epochs=2000` reached.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 148.99it/s, v_num=46, x0_loss=7.08e-7, D_loss=1.77e-5, mean_loss=9.19e-6]\n", + "Total training time 17.01149 s\n" + ] + } + ], + "source": [ + "# setting the seed for reproducibility\n", + "seed_everything(42, workers=True)\n", + "\n", + "model = FeedForward(\n", + " layers=[10, 10],\n", + " func=torch.nn.Tanh,\n", + " output_dimensions=len(problem.output_variables),\n", + " input_dimensions=len(problem.input_variables)\n", + " )\n", + "pinn = PINN(problem, model)\n", + "trainer = Trainer(solver=pinn,\n", + " accelerator='cpu',\n", + " max_epochs = 2000,\n", + " enable_model_summary=False,\n", + " gradient_clip_val=0.1, # clipping the gradient\n", + " callbacks=[Timer(),\n", + " StochasticWeightAveraging(swa_lrs=0.005)])\n", + "trainer.train()\n", + "print(f'Total training time {trainer.callbacks[0].time_elapsed(\"train\"):.5f} s')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As we can see we by applying gradient clipping we were able to even obtain lower error!\n", + "\n", + "## What's next?\n", + "\n", + "Now you know how to use efficiently the `Trainer` class **PINA**! There are multiple directions you can go now:\n", + "\n", + "1. Explore training times on different devices (e.g.) `TPU` \n", + "\n", + "2. 
Try to reduce memory cost by mixed precision training and gradient accumulation (especially useful when training Neural Operators)\n", + "\n", + "3. Benchmark `Trainer` speed for different precisions." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pina", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/tutorials/tutorial11/tutorial.py b/tutorials/tutorial11/tutorial.py new file mode 100644 index 0000000..1f51f4a --- /dev/null +++ b/tutorials/tutorial11/tutorial.py @@ -0,0 +1,336 @@ +#!/usr/bin/env python +# coding: utf-8 + +# # Tutorial: PINA and PyTorch Lightning, training tips and visualizations +# +# +# In this tutorial, we will delve deeper into the functionality of the `Trainer` class, which serves as the cornerstone for training **PINA** [Solvers](https://mathlab.github.io/PINA/_rst/_code.html#solvers). +# +# The `Trainer` class offers a plethora of features aimed at improving model accuracy, reducing training time and memory usage, facilitating logging visualization, and more. +# +# Our leading example will revolve around solving the `SimpleODE` problem, as outlined in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb). If you haven't already explored it, we highly recommend doing so before diving into this tutorial. +# +# Let's start by importing useful modules, define the `SimpleODE` problem and the `PINN` solver. + +# In[18]: + + +import torch + +from pina import Condition, Trainer +from pina.solvers import PINN +from pina.model import FeedForward +from pina.problem import SpatialProblem +from pina.operators import grad +from pina.geometry import CartesianDomain +from pina.equation import Equation, FixedValue + +class SimpleODE(SpatialProblem): + + output_variables = ['u'] + spatial_domain = CartesianDomain({'x': [0, 1]}) + + # defining the ode equation + def ode_equation(input_, output_): + u_x = grad(output_, input_, components=['u'], d=['x']) + u = output_.extract(['u']) + return u_x - u + + # conditions to hold + conditions = { + 'x0': Condition(location=CartesianDomain({'x': 0.}), equation=FixedValue(1)), # We fix initial condition to value 1 + 'D': Condition(location=CartesianDomain({'x': [0, 1]}), equation=Equation(ode_equation)), # We wrap the python equation using Equation + } + + # defining the true solution + def truth_solution(self, pts): + return torch.exp(pts.extract(['x'])) + + +# sampling for training +problem = SimpleODE() +problem.discretise_domain(1, 'random', locations=['x0']) +problem.discretise_domain(20, 'lh', locations=['D']) + +# build the model +model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) +) + +# create the PINN object +pinn = PINN(problem, model) + + +# Till now we just followed the extact step of the previous tutorials. 
The `Trainer` object +# can be initialized by simiply passing the `PINN` solver + +# In[3]: + + +trainer = Trainer(solver=pinn) + + +# ## Trainer Accelerator +# +# When creating the trainer, **by defualt** the `Trainer` will choose the most performing `accelerator` for training which is available in your system, ranked as follow: +# 1. [TPU](https://cloud.google.com/tpu/docs/intro-to-tpu) +# 2. [IPU](https://www.graphcore.ai/products/ipu) +# 3. [HPU](https://habana.ai/) +# 4. [GPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#:~:text=What%20does%20GPU%20stand%20for,video%20editing%2C%20and%20gaming%20applications) or [MPS](https://developer.apple.com/metal/pytorch/) +# 5. CPU + +# For setting manually the `accelerator` run: +# +# * `accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'cpu', 'ipu'}` sets the accelerator to a specific one + +# In[5]: + + +trainer = Trainer(solver=pinn, + accelerator='cpu') + + +# as you can see, even if in the used system `GPU` is available, it is not used since we set `accelerator='cpu'`. + +# ## Trainer Logging +# +# In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTraker` class from `pina.callbacks` as seen in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial. +# +# However, expecially when we need to train multiple times to get an average of the loss across multiple runs, `pytorch_lightning.loggers` might be useful. Here we will use `TensorBoardLogger` (more on [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) here), but you can choose the one you prefer (or make your own one) thanks to the amazing job done by the PyTorch Lightning team! +# +# We will now import `TensorBoardLogger`, do three runs of training and then visualize the results. Notice we set `enable_model_summary=False` to avoid model summary specifications (e.g. number of parameters), set it to true if needed. +# + +# In[7]: + + +from pytorch_lightning.loggers import TensorBoardLogger + +# three run of training, by default it trains for 1000 epochs +# we reinitialize the model each time otherwise the same parameters will be optimized +for _ in range(3): + model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) + pinn = PINN(problem, model) + trainer = Trainer(solver=pinn, + accelerator='cpu', + logger=TensorBoardLogger(save_dir='simpleode'), + enable_model_summary=False) + trainer.train() + + +# We can now visualize the logs by simply running `tensorboard --logdir=simpleode/` on terminal, you should obtain a webpage as the one shown below: + +#

# <img src="logging.png" alt="Logging"/>
+ +# as you can see, by default, **PINA** logs the losses which are shown in the progress bar, as well as the number of epochs. You can always insert more loggings by either defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or inheriting the solver and modify the programs with different **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks)). + +# ## Trainer Callbacks + +# Whenever we need to access certain steps of the training for logging, do static modifications (i.e. not changing the `Solver`) or updating `Problem` hyperparameters (static variables), we can use `Callabacks`. Notice that `Callbacks` allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in **PINA** `Solver`s. +# Lightning has a callback system to execute them when needed. Callbacks should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run. +# +# The following are best practices when using/designing callbacks. +# +# * Callbacks should be isolated in their functionality. +# * Your callback should not rely on the behavior of other callbacks in order to work properly. +# * Do not manually call methods from the callback. +# * Directly calling methods (eg. on_validation_end) is strongly discouraged. +# * Whenever possible, your callbacks should not depend on the order in which they are executed. +# +# We will try now to implement a naive version of `MetricTraker` to show how callbacks work. Notice that this is a very easy application of callbacks, fortunately in **PINA** we already provide more advanced callbacks in `pina.callbacks`. +# +# + +# In[8]: + + +from pytorch_lightning.callbacks import Callback +import torch + +# define a simple callback +class NaiveMetricTracker(Callback): + def __init__(self): + self.saved_metrics = [] + + def on_train_epoch_end(self, trainer, __): # function called at the end of each epoch + self.saved_metrics.append( + {key: value for key, value in trainer.logged_metrics.items()} + ) + + +# Let's see the results when applyed to the `SimpleODE` problem. You can define callbacks when initializing the `Trainer` by the `callbacks` argument, which expects a list of callbacks. + +# In[10]: + + +model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) +pinn = PINN(problem, model) +trainer = Trainer(solver=pinn, + accelerator='cpu', + enable_model_summary=False, + callbacks=[NaiveMetricTracker()]) # adding a callbacks +trainer.train() + + +# We can easily access the data by calling `trainer.callbacks[0].saved_metrics` (notice the zero representing the first callback in the list given at initialization). + +# In[9]: + + +trainer.callbacks[0].saved_metrics[:3] # only the first three epochs + + +# PyTorch Lightning also has some built in `Callbacks` which can be used in **PINA**, [here an extensive list](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html#built-in-callbacks). +# +# We can for example try the `EarlyStopping` routine, which automatically stops the training when a specific metric converged (here the `mean_loss`). In order to let the training keep going forever set `max_epochs=-1`. 
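# As a side note, the standard PyTorch Lightning `EarlyStopping` signature also exposes
# `patience`, `min_delta` and `mode`; the values below are purely illustrative (a hedged
# sketch of a more customized configuration, not the one used in the next cell).

from pytorch_lightning.callbacks import EarlyStopping

# stop only after 50 epochs without an improvement of at least 1e-6 in the monitored loss
custom_early_stop = EarlyStopping(monitor='mean_loss', patience=50, min_delta=1e-6, mode='min')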
+ +# In[7]: + + +# ~2 mins +from pytorch_lightning.callbacks import EarlyStopping + +model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) +pinn = PINN(problem, model) +trainer = Trainer(solver=pinn, + accelerator='cpu', + max_epochs = -1, + enable_model_summary=False, + callbacks=[EarlyStopping('mean_loss')]) # adding a callbacks +trainer.train() + + +# As we can see the model automatically stop when the logging metric stopped improving! + +# ## Trainer Tips to Boost Accuracy, Save Memory and Speed Up Training +# +# Untill now we have seen how to choose the right `accelerator`, how to log and visualize the results, and how to interface with the program in order to add specific parts of code at specific points by `callbacks`. +# Now, we well focus on how boost your training by saving memory and speeding it up, while mantaining the same or even better degree of accuracy! +# +# +# There are several built in methods developed in PyTorch Lightning which can be applied straight forward in **PINA**, here we report some: +# +# * [Stochastic Weight Averaging](https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/) to boost accuracy +# * [Gradient Clippling](https://deepgram.com/ai-glossary/gradient-clipping) to reduce computational time (and improve accuracy) +# * [Gradient Accumulation](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption +# * [Mixed Precision Training](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption +# +# We will just demonstrate how to use the first two, and see the results compared to a standard training. +# We use the [`Timer`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.Timer.html#lightning.pytorch.callbacks.Timer) callback from `pytorch_lightning.callbacks` to take the times. Let's start by training a simple model without any optimization (train for 2000 epochs). 
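# As a quick aside before the baseline run: the Lightning `Timer` callback can also act as a
# stopping condition through its `duration` argument (a `DD:HH:MM:SS` string in the standard
# Lightning API). The 30-minute budget below is purely illustrative, a hedged sketch rather
# than something used in this tutorial.

from pytorch_lightning.callbacks import Timer

# interrupt training once a 30-minute wall-clock budget is exhausted
timed_stop = Timer(duration="00:00:30:00")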
+ +# In[19]: + + +from pytorch_lightning.callbacks import Timer +from pytorch_lightning import seed_everything + +# setting the seed for reproducibility +seed_everything(42, workers=True) + +model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) + +pinn = PINN(problem, model) +trainer = Trainer(solver=pinn, + accelerator='cpu', + deterministic=True, # setting deterministic=True ensure reproducibility when a seed is imposed + max_epochs = 2000, + enable_model_summary=False, + callbacks=[Timer()]) # adding a callbacks +trainer.train() +print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s') + + +# Now we do the same but with StochasticWeightAveraging + +# In[36]: + + +from pytorch_lightning.callbacks import StochasticWeightAveraging + +# setting the seed for reproducibility +seed_everything(42, workers=True) + +model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) +pinn = PINN(problem, model) +trainer = Trainer(solver=pinn, + accelerator='cpu', + deterministic=True, + max_epochs = 2000, + enable_model_summary=False, + callbacks=[Timer(), + StochasticWeightAveraging(swa_lrs=0.005)]) # adding StochasticWeightAveraging callbacks +trainer.train() +print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s') + + +# As you can see, the training time does not change at all! Notice that around epoch `1600` +# the scheduler is switched from the defalut one `ConstantLR` to the Stochastic Weight Average Learning Rate (`SWALR`). +# This is because by default `StochasticWeightAveraging` will be activated after `int(swa_epoch_start * max_epochs)` with `swa_epoch_start=0.7` by default. Finally, the final `mean_loss` is lower when `StochasticWeightAveraging` is used. +# +# We will now now do the same but clippling the gradient to be relatively small. + +# In[35]: + + +# setting the seed for reproducibility +seed_everything(42, workers=True) + +model = FeedForward( + layers=[10, 10], + func=torch.nn.Tanh, + output_dimensions=len(problem.output_variables), + input_dimensions=len(problem.input_variables) + ) +pinn = PINN(problem, model) +trainer = Trainer(solver=pinn, + accelerator='cpu', + max_epochs = 2000, + enable_model_summary=False, + gradient_clip_val=0.1, # clipping the gradient + callbacks=[Timer(), + StochasticWeightAveraging(swa_lrs=0.005)]) +trainer.train() +print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s') + + +# As we can see we by applying gradient clipping we were able to even obtain lower error! +# +# ## What's next? +# +# Now you know how to use efficiently the `Trainer` class **PINA**! There are multiple directions you can go now: +# +# 1. Explore training times on different devices (e.g.) `TPU` +# +# 2. Try to reduce memory cost by mixed precision training and gradient accumulation (especially useful when training Neural Operators) +# +# 3. Benchmark `Trainer` speed for different precisions.
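
# As a hedged starting point for directions 2 and 3: `precision` and `accumulate_grad_batches`
# are standard Lightning trainer arguments (the precision strings below follow the Lightning
# >= 2.0 convention), and we assume here that the PINA `Trainer` forwards them to the
# underlying Lightning trainer, exactly as it does for `gradient_clip_val` above. Uncomment to
# experiment; the values are illustrative only.

# from pytorch_lightning.callbacks import Timer
#
# for precision in ('64-true', '32-true', 'bf16-mixed'):  # benchmark different precisions
#     seed_everything(42, workers=True)
#     model = FeedForward(layers=[10, 10],
#                         func=torch.nn.Tanh,
#                         output_dimensions=len(problem.output_variables),
#                         input_dimensions=len(problem.input_variables))
#     pinn = PINN(problem, model)
#     trainer = Trainer(solver=pinn,
#                       accelerator='cpu',
#                       max_epochs=2000,
#                       enable_model_summary=False,
#                       precision=precision,            # lower/mixed precision to save memory
#                       accumulate_grad_batches=4,      # gradient accumulation to save memory
#                       callbacks=[Timer()])
#     trainer.train()
#     print(f'{precision}: {trainer.callbacks[0].time_elapsed("train"):.5f} s')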