{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: PINA and PyTorch Lightning, training tips and visualizations\n", "\n", "[Open in Colab](https://colab.research.google.com/github/mathLab/PINA/blob/master/tutorials/tutorial11/tutorial.ipynb)\n", "\n", "In this tutorial, we will delve deeper into the functionality of the `Trainer` class, which serves as the cornerstone for training **PINA** [Solvers](https://mathlab.github.io/PINA/_rst/_code.html#solvers).\n", "\n", "The `Trainer` class offers a plethora of features aimed at improving model accuracy, reducing training time and memory usage, facilitating logging and visualization, and more, thanks to the amazing job done by the PyTorch Lightning team!\n", "\n", "Our leading example will revolve around solving the `SimpleODE` problem, as outlined in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb). If you haven't already explored it, we highly recommend doing so before diving into this tutorial.\n", "\n", "Let's start by importing useful modules and defining the `SimpleODE` problem and the `PINN` solver." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "## routine needed to run the notebook on Google Colab\n", "try:\n", "    import google.colab\n", "    IN_COLAB = True\n", "except ImportError:\n", "    IN_COLAB = False\n", "if IN_COLAB:\n", "    !pip install \"pina-mathlab\"\n", "\n", "import torch\n", "\n", "from pina import Condition, Trainer\n", "from pina.solvers import PINN\n", "from pina.model import FeedForward\n", "from pina.problem import SpatialProblem\n", "from pina.operators import grad\n", "from pina.geometry import CartesianDomain\n", "from pina.equation import Equation, FixedValue\n", "\n", "class SimpleODE(SpatialProblem):\n", "\n", "    output_variables = ['u']\n", "    spatial_domain = CartesianDomain({'x': [0, 1]})\n", "\n", "    # defining the ode equation\n", "    def ode_equation(input_, output_):\n", "        u_x = grad(output_, input_, components=['u'], d=['x'])\n", "        u = output_.extract(['u'])\n", "        return u_x - u\n", "\n", "    # conditions to hold\n", "    conditions = {\n", "        'x0': Condition(location=CartesianDomain({'x': 0.}), equation=FixedValue(1)), # We fix initial condition to value 1\n", "        'D': Condition(location=CartesianDomain({'x': [0, 1]}), equation=Equation(ode_equation)), # We wrap the python equation using Equation\n", "    }\n", "\n", "    # defining the true solution\n", "    def truth_solution(self, pts):\n", "        return torch.exp(pts.extract(['x']))\n", "\n", "\n", "# sampling for training\n", "problem = SimpleODE()\n", "problem.discretise_domain(1, 'random', locations=['x0'])\n", "problem.discretise_domain(20, 'lh', locations=['D'])\n", "\n", "# build the model\n", "model = FeedForward(\n", "    layers=[10, 10],\n", "    func=torch.nn.Tanh,\n", "    output_dimensions=len(problem.output_variables),\n", "    input_dimensions=len(problem.input_variables)\n", ")\n", "\n", "# create the PINN object\n", "pinn = PINN(problem, model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far we have just followed the exact steps of the previous 
tutorials. The `Trainer` object\n", "can be initialized by simply passing the `PINN` solver:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: True\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] } ], "source": [ "trainer = Trainer(solver=pinn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trainer Accelerator\n", "\n", "When creating the trainer, **by default** the `Trainer` will choose the best-performing `accelerator` available on your system, ranked as follows:\n", "1. [TPU](https://cloud.google.com/tpu/docs/intro-to-tpu)\n", "2. [IPU](https://www.graphcore.ai/products/ipu)\n", "3. [HPU](https://habana.ai/)\n", "4. [GPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#:~:text=What%20does%20GPU%20stand%20for,video%20editing%2C%20and%20gaming%20applications) or [MPS](https://developer.apple.com/metal/pytorch/)\n", "5. CPU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To set the `accelerator` manually, use:\n", "\n", "* `accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'tpu', 'ipu'}` selects a specific accelerator" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] } ], "source": [ "trainer = Trainer(solver=pinn,\n", "                  accelerator='cpu')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, even though a `GPU` is available on this system, it is not used because we set `accelerator='cpu'`." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trainer Logging\n", "\n", "In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTracker` class from `pina.callbacks`, as seen in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.\n", "\n", "However, especially when we need to train several times to average the loss across runs, `pytorch_lightning.loggers` can be useful. Here we will use `TensorBoardLogger` (see the Lightning [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) documentation for more), but you can choose the one you prefer (or write your own).\n", "\n", "We will now import `TensorBoardLogger`, perform three training runs, and then visualize the results. Notice that we set `enable_model_summary=False` to suppress the model summary (e.g. the number of parameters); set it to `True` if needed.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 8: 100%|██████████| 1/1 [00:00<00:00, 232.78it/s, v_num=6, x0_loss=0.436, D_loss=0.129, mean_loss=0.283] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 222.52it/s, v_num=6, x0_loss=1.48e-5, D_loss=0.000655, mean_loss=0.000335]" ] }, { "name": "stderr", "output_type": "stream", "text": [ "`Trainer.fit` stopped: `max_epochs=1000` reached.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 133.46it/s, v_num=6, x0_loss=1.48e-5, D_loss=0.000655, mean_loss=0.000335]\n" ] }, { "name": "stderr", 
"output_type": "stream", "text": [ "GPU available: True (mps), used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 274.80it/s, v_num=7, x0_loss=6.21e-6, D_loss=0.000221, mean_loss=0.000114]" ] }, { "name": "stderr", "output_type": "stream", "text": [ "`Trainer.fit` stopped: `max_epochs=1000` reached.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 154.49it/s, v_num=7, x0_loss=6.21e-6, D_loss=0.000221, mean_loss=0.000114]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 78.56it/s, v_num=8, x0_loss=1.44e-5, D_loss=0.000572, mean_loss=0.000293] " ] }, { "name": "stderr", "output_type": "stream", "text": [ "`Trainer.fit` stopped: `max_epochs=1000` reached.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 62.60it/s, v_num=8, x0_loss=1.44e-5, D_loss=0.000572, mean_loss=0.000293]\n" ] } ], "source": [ "from pytorch_lightning.loggers import TensorBoardLogger\n", "\n", "# three training runs; by default each run trains for 1000 epochs\n", "# we reinitialize the model each time; otherwise the same parameters would keep being optimized\n", "for _ in range(3):\n", " model = FeedForward(\n", " layers=[10, 10],\n", " func=torch.nn.Tanh,\n", " output_dimensions=len(problem.output_variables),\n", " input_dimensions=len(problem.input_variables)\n", " )\n", " pinn = PINN(problem, model)\n", " trainer = Trainer(solver=pinn,\n", " accelerator='cpu',\n", " 
logger=TensorBoardLogger(save_dir='simpleode'),\n", " enable_model_summary=False)\n", " trainer.train()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now visualize the logs by running `tensorboard --logdir=simpleode/` in a terminal. You should obtain a webpage like the one shown below:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n",
"
\n",
"