Update Tutorials (#544)

* update tutorials
* tutorial guidelines
* doc
Dario Coscia
2025-04-23 16:19:07 +02:00
parent 7e403acf58
commit 29b14ee9b6
45 changed files with 6279 additions and 6726 deletions


@@ -4,17 +4,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial: PINA and PyTorch Lightning, training tips and visualizations \n",
"\n",
"# Tutorial: Introduction to `Trainer` class\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mathLab/PINA/blob/master/tutorials/tutorial11/tutorial.ipynb)\n",
"\n",
"In this tutorial, we will delve deeper into the functionality of the `Trainer` class, which serves as the cornerstone for training **PINA** [Solvers](https://mathlab.github.io/PINA/_rst/_code.html#solvers). \n",
"\n",
"The `Trainer` class offers a plethora of features aimed at improving model accuracy, reducing training time and memory usage, facilitating logging visualization, and more thanks to the amazing job done by the PyTorch Lightning team!\n",
"\n",
"Our leading example will revolve around solving the `SimpleODE` problem, as outlined in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb). If you haven't already explored it, we highly recommend doing so before diving into this tutorial.\n",
"Our leading example will revolve around solving a simple regression problem where we want to approximate the following function with a Neural Net model $\\mathcal{M}_{\\theta}$:\n",
"$$y = x^3$$\n",
"by having only a set of $20$ observations $\\{x_i, y_i\\}_{i=1}^{20}$, with $x_i \\sim\\mathcal{U}[-3, 3]\\;\\;\\forall i\\in(1,\\dots,20)$.\n",
"\n",
"Let's start by importing useful modules, define the `SimpleODE` problem and the `PINN` solver."
"Let's start by importing useful modules!"
]
},
{
@@ -30,18 +31,15 @@
"except:\n",
" IN_COLAB = False\n",
"if IN_COLAB:\n",
" !pip install \"pina-mathlab\"\n",
" !pip install \"pina-mathlab[tutorial]\"\n",
"\n",
"import torch\n",
"import warnings\n",
"\n",
"from pina import Condition, Trainer\n",
"from pina.solver import PINN\n",
"from pina import Trainer\n",
"from pina.solver import SupervisedSolver\n",
"from pina.model import FeedForward\n",
"from pina.problem import SpatialProblem\n",
"from pina.operator import grad\n",
"from pina.domain import CartesianDomain\n",
"from pina.equation import Equation, FixedValue\n",
"from pina.problem.zoo import SupervisedProblem\n",
"\n",
"warnings.filterwarnings(\"ignore\")"
]
@@ -59,55 +57,22 @@
"metadata": {},
"outputs": [],
"source": [
"# defining the ode equation\n",
"def ode_equation(input_, output_):\n",
"# defining the problem\n",
"x_train = torch.empty((20, 1)).uniform_(-3, 3)\n",
"y_train = x_train.pow(3) + 3 * torch.randn_like(x_train)\n",
"\n",
" # computing the derivative\n",
" u_x = grad(output_, input_, components=[\"u\"], d=[\"x\"])\n",
"\n",
" # extracting the u input variable\n",
" u = output_.extract([\"u\"])\n",
"\n",
" # calculate the residual and return it\n",
" return u_x - u\n",
"\n",
"\n",
"class SimpleODE(SpatialProblem):\n",
"\n",
" output_variables = [\"u\"]\n",
" spatial_domain = CartesianDomain({\"x\": [0, 1]})\n",
"\n",
" domains = {\n",
" \"x0\": CartesianDomain({\"x\": 0.0}),\n",
" \"D\": CartesianDomain({\"x\": [0, 1]}),\n",
" }\n",
"\n",
" # conditions to hold\n",
" conditions = {\n",
" \"bound_cond\": Condition(domain=\"x0\", equation=FixedValue(1.0)),\n",
" \"phys_cond\": Condition(domain=\"D\", equation=Equation(ode_equation)),\n",
" }\n",
"\n",
" # defining the true solution\n",
" def solution(self, pts):\n",
" return torch.exp(pts.extract([\"x\"]))\n",
"\n",
"\n",
"# sampling for training\n",
"problem = SimpleODE()\n",
"problem.discretise_domain(1, \"random\", domains=[\"x0\"])\n",
"problem.discretise_domain(20, \"lh\", domains=[\"D\"])\n",
"problem = SupervisedProblem(x_train, y_train)\n",
"\n",
"# build the model\n",
"model = FeedForward(\n",
" layers=[10, 10],\n",
" func=torch.nn.Tanh,\n",
" output_dimensions=len(problem.output_variables),\n",
" input_dimensions=len(problem.input_variables),\n",
" output_dimensions=1,\n",
" input_dimensions=1,\n",
")\n",
"\n",
"# create the PINN object\n",
"pinn = PINN(problem, model)"
"# create the SupervisedSolver object\n",
"solver = SupervisedSolver(problem, model, use_lt=False)"
]
},
{
@@ -115,7 +80,7 @@
"metadata": {},
"source": [
"Till now we just followed the extact step of the previous tutorials. The `Trainer` object\n",
"can be initialized by simiply passing the `PINN` solver"
"can be initialized by simiply passing the `SupervisedSolver` solver"
]
},
{
@@ -134,7 +99,7 @@
}
],
"source": [
"trainer = Trainer(solver=pinn)"
"trainer = Trainer(solver=solver)"
]
},
{
@@ -143,18 +108,13 @@
"source": [
"## Trainer Accelerator\n",
"\n",
"When creating the trainer, **by defualt** the `Trainer` will choose the most performing `accelerator` for training which is available in your system, ranked as follow:\n",
"When creating the `Trainer`, **by default** the most performing `accelerator` for training which is available in your system will be chosen, ranked as follows:\n",
"1. [TPU](https://cloud.google.com/tpu/docs/intro-to-tpu)\n",
"2. [IPU](https://www.graphcore.ai/products/ipu)\n",
"3. [HPU](https://habana.ai/)\n",
"4. [GPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#:~:text=What%20does%20GPU%20stand%20for,video%20editing%2C%20and%20gaming%20applications) or [MPS](https://developer.apple.com/metal/pytorch/)\n",
"5. CPU"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"5. CPU\n",
"\n",
"For setting manually the `accelerator` run:\n",
"\n",
"* `accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'cpu', 'ipu'}` sets the accelerator to a specific one"
@@ -162,7 +122,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 15,
"metadata": {},
"outputs": [
{
@@ -176,14 +136,14 @@
}
],
"source": [
"trainer = Trainer(solver=pinn, accelerator=\"cpu\")"
"trainer = Trainer(solver=solver, accelerator=\"cpu\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"as you can see, even if in the used system `GPU` is available, it is not used since we set `accelerator='cpu'`."
"As you can see, even if a `GPU` is available on the system, it is not used since we set `accelerator='cpu'`."
]
},
{
@@ -192,16 +152,16 @@
"source": [
"## Trainer Logging\n",
"\n",
"In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTraker` class from `pina.callbacks` as seen in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.\n",
"In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTracker` class from `pina.callbacks`, as seen in the [*Introduction to Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.\n",
"\n",
"However, expecially when we need to train multiple times to get an average of the loss across multiple runs, `pytorch_lightning.loggers` might be useful. Here we will use `TensorBoardLogger` (more on [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) here), but you can choose the one you prefer (or make your own one).\n",
"However, especially when we need to train multiple times to get an average of the loss across multiple runs, `lightning.pytorch.loggers` might be useful. Here we will use `TensorBoardLogger` (more on [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) here), but you can choose the one you prefer (or make your own one).\n",
"\n",
"We will now import `TensorBoardLogger`, do three runs of training and then visualize the results. Notice we set `enable_model_summary=False` to avoid model summary specifications (e.g. number of parameters), set it to true if needed.\n"
"We will now import `TensorBoardLogger`, do three runs of training, and then visualize the results. Notice we set `enable_model_summary=False` to avoid model summary specifications (e.g. number of parameters); set it to `True` if needed."
]
},
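Before that, for reference, a minimal sketch of the `MetricTracker` route mentioned above might look like the following. It assumes the `pina.callbacks.MetricTracker` import path referenced in the text, which may differ across PINA versions.

```python
# Minimal sketch of the MetricTracker approach (assumption: import path as
# referenced in the text above; it may differ between PINA versions).
from pina.callbacks import MetricTracker

trainer = Trainer(
    solver=solver,
    accelerator="cpu",
    callbacks=[MetricTracker()],
    max_epochs=10,
)
trainer.train()
# the tracked metrics can then be inspected from the callback object,
# e.g. via trainer.callbacks[0]
```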
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 17,
"metadata": {},
"outputs": [
{
@@ -209,113 +169,108 @@
"output_type": "stream",
"text": [
"GPU available: True (mps), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"TPU available: False, using: 0 TPU cores\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"HPU available: False, using: 0 HPUs\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 233.15it/s, v_num=0, bound_cond_loss=1.22e-5, phys_cond_loss=0.000517, train_loss=0.000529]"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"`Trainer.fit` stopped: `max_epochs=1000` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 137.95it/s, v_num=0, bound_cond_loss=1.22e-5, phys_cond_loss=0.000517, train_loss=0.000529]\n"
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "775a2d088e304b2589631b176c9e99e2",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: | | 0/? [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"`Trainer.fit` stopped: `max_epochs=100` reached.\n",
"GPU available: True (mps), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 248.63it/s, v_num=1, bound_cond_loss=2.29e-5, phys_cond_loss=0.00106, train_loss=0.00108] "
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"`Trainer.fit` stopped: `max_epochs=1000` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 149.06it/s, v_num=1, bound_cond_loss=2.29e-5, phys_cond_loss=0.00106, train_loss=0.00108]\n"
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d858dc0a31214f5f86aae78823525b56",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: | | 0/? [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"`Trainer.fit` stopped: `max_epochs=100` reached.\n",
"GPU available: True (mps), used: False\n",
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 254.65it/s, v_num=2, bound_cond_loss=0.00029, phys_cond_loss=0.00253, train_loss=0.00282] "
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "739bf2009f7a48a1b59b7df695276672",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: | | 0/? [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"`Trainer.fit` stopped: `max_epochs=1000` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 150.72it/s, v_num=2, bound_cond_loss=0.00029, phys_cond_loss=0.00253, train_loss=0.00282]\n"
"`Trainer.fit` stopped: `max_epochs=100` reached.\n"
]
}
],
"source": [
"from lightning.pytorch.loggers import TensorBoardLogger\n",
"\n",
"# three run of training, by default it trains for 1000 epochs\n",
"# three run of training, by default it trains for 1000 epochs, we set the max to 100\n",
"# we reinitialize the model each time otherwise the same parameters will be optimized\n",
"for _ in range(3):\n",
" model = FeedForward(\n",
" layers=[10, 10],\n",
" func=torch.nn.Tanh,\n",
" output_dimensions=len(problem.output_variables),\n",
" input_dimensions=len(problem.input_variables),\n",
" output_dimensions=1,\n",
" input_dimensions=1,\n",
" )\n",
" pinn = PINN(problem, model)\n",
" solver = SupervisedSolver(problem, model, use_lt=False)\n",
" trainer = Trainer(\n",
" solver=pinn,\n",
" solver=solver,\n",
" accelerator=\"cpu\",\n",
" logger=TensorBoardLogger(save_dir=\"training_log\"),\n",
" enable_model_summary=False,\n",
" train_size=1.0,\n",
" val_size=0.0,\n",
" test_size=0.0,\n",
" max_epochs=100\n",
" )\n",
" trainer.train()"
]
@@ -324,7 +279,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now visualize the logs by simply running `tensorboard --logdir=training_log/` on terminal, you should obtain a webpage as the one shown below:"
"We can now visualize the logs by simply running `tensorboard --logdir=training_log/` in the terminal. You should obtain a webpage similar to the one shown below if running for 1000 epochs:"
]
},
{
@@ -332,7 +287,7 @@
"metadata": {},
"source": [
"<p align=\\\"center\\\">\n",
"<img src=\"logging.png\" alt=\\\"Logging API\\\" width=\\\"400\\\"/>\n",
"<img src=\"../static/logging.png\" alt=\\\"Logging API\\\" width=\\\"400\\\"/>\n",
"</p>"
]
},
@@ -340,39 +295,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"as you can see, by default, **PINA** logs the losses which are shown in the progress bar, as well as the number of epochs. You can always insert more loggings by either defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or inheriting the solver and modify the programs with different **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks))."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Trainer Callbacks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whenever we need to access certain steps of the training for logging, do static modifications (i.e. not changing the `Solver`) or updating `Problem` hyperparameters (static variables), we can use `Callabacks`. Notice that `Callbacks` allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in **PINA** `Solver`s.\n",
"Lightning has a callback system to execute them when needed. Callbacks should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.\n",
"As you can see, by default, **PINA** logs the losses which are shown in the progress bar, as well as the number of epochs. You can always insert more loggings by either defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or inheriting the solver and modifying the programs with different **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks)).\n",
"\n",
"The following are best practices when using/designing callbacks.\n",
"## Trainer Callbacks\n",
"\n",
"Whenever we need to access certain steps of the training for logging, perform static modifications (i.e. not changing the `Solver`), or update `Problem` hyperparameters (static variables), we can use **Callbacks**. Notice that **Callbacks** allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in **PINA** `Solver`s.\n",
"\n",
"Lightning has a callback system to execute them when needed. **Callbacks** should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.\n",
"\n",
"The following are best practices when using/designing callbacks:\n",
"\n",
"* Callbacks should be isolated in their functionality.\n",
"* Your callback should not rely on the behavior of other callbacks in order to work properly.\n",
"* Do not manually call methods from the callback.\n",
"* Directly calling methods (eg. on_validation_end) is strongly discouraged.\n",
"* Directly calling methods (e.g., on_validation_end) is strongly discouraged.\n",
"* Whenever possible, your callbacks should not depend on the order in which they are executed.\n",
"\n",
"We will try now to implement a naive version of `MetricTraker` to show how callbacks work. Notice that this is a very easy application of callbacks, fortunately in **PINA** we already provide more advanced callbacks in `pina.callbacks`.\n",
"\n",
"<!-- Suppose we want to log the accuracy on some validation poit -->"
"We will try now to implement a naive version of `MetricTraker` to show how callbacks work. Notice that this is a very easy application of callbacks, fortunately in **PINA** we already provide more advanced callbacks in `pina.callbacks`."
]
},
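Before that, as a brief aside, the "inheriting the solver" route mentioned earlier could look roughly like the sketch below. It only assumes that `SupervisedSolver` is a Lightning module and therefore exposes the standard PyTorch Lightning hooks.

```python
# Sketch only: adding behavior through a hook by subclassing the solver.
# Assumption: SupervisedSolver exposes the standard PyTorch Lightning hooks.
from pina.solver import SupervisedSolver


class VerboseSupervisedSolver(SupervisedSolver):
    def on_train_epoch_end(self):
        # called by Lightning at the end of every training epoch
        super().on_train_epoch_end()
        print(f"Finished epoch {self.trainer.current_epoch}")


# used exactly like the plain solver defined earlier
verbose_solver = VerboseSupervisedSolver(problem, model, use_lt=False)
```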
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
@@ -398,12 +342,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see the results when applyed to the `SimpleODE` problem. You can define callbacks when initializing the `Trainer` by the `callbacks` argument, which expects a list of callbacks. "
"Let's see the results when applied to the problem. You can define **callbacks** when initializing the `Trainer` by using the `callbacks` argument, which expects a list of callbacks.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 19,
"metadata": {},
"outputs": [
{
@@ -416,24 +360,24 @@
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 278.93it/s, v_num=0, bound_cond_loss=6.94e-5, phys_cond_loss=0.00116, train_loss=0.00123] "
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f38442d749ad4702a0c99715ecf08c59",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: | | 0/? [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"`Trainer.fit` stopped: `max_epochs=1000` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 140.62it/s, v_num=0, bound_cond_loss=6.94e-5, phys_cond_loss=0.00116, train_loss=0.00123]\n"
"`Trainer.fit` stopped: `max_epochs=10` reached.\n"
]
}
],
@@ -441,12 +385,12 @@
"model = FeedForward(\n",
" layers=[10, 10],\n",
" func=torch.nn.Tanh,\n",
" output_dimensions=len(problem.output_variables),\n",
" input_dimensions=len(problem.input_variables),\n",
" output_dimensions=1,\n",
" input_dimensions=1,\n",
")\n",
"pinn = PINN(problem, model)\n",
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
"trainer = Trainer(\n",
" solver=pinn,\n",
" solver=solver,\n",
" accelerator=\"cpu\",\n",
" logger=True,\n",
" callbacks=[NaiveMetricTracker()], # adding a callbacks\n",
@@ -454,6 +398,7 @@
" train_size=1.0,\n",
" val_size=0.0,\n",
" test_size=0.0,\n",
" max_epochs=10, # training only for 10 epochs\n",
")\n",
"trainer.train()"
]
@@ -467,24 +412,18 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'bound_cond_loss': tensor(0.9935),\n",
" 'phys_cond_loss': tensor(0.0303),\n",
" 'train_loss': tensor(1.0239)},\n",
" {'bound_cond_loss': tensor(0.9875),\n",
" 'phys_cond_loss': tensor(0.0293),\n",
" 'train_loss': tensor(1.0169)},\n",
" {'bound_cond_loss': tensor(0.9815),\n",
" 'phys_cond_loss': tensor(0.0284),\n",
" 'train_loss': tensor(1.0099)}]"
"[{'data_loss': tensor(126.2887), 'train_loss': tensor(126.2887)},\n",
" {'data_loss': tensor(126.2346), 'train_loss': tensor(126.2346)},\n",
" {'data_loss': tensor(126.1805), 'train_loss': tensor(126.1805)}]"
]
},
"execution_count": 8,
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
@@ -497,14 +436,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch Lightning also has some built in `Callbacks` which can be used in **PINA**, [here an extensive list](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html#built-in-callbacks). \n",
"PyTorch Lightning also has some built-in `Callbacks` which can be used in **PINA**, [here is an extensive list](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html#built-in-callbacks). \n",
"\n",
"We can for example try the `EarlyStopping` routine, which automatically stops the training when a specific metric converged (here the `train_loss`). In order to let the training keep going forever set `max_epochs=-1`."
"We can, for example, try the `EarlyStopping` routine, which automatically stops the training when a specific metric converges (here the `train_loss`). In order to let the training keep going forever, set `max_epochs=-1`."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 22,
"metadata": {},
"outputs": [
{
@@ -515,25 +454,18 @@
"TPU available: False, using: 0 TPU cores\n",
"HPU available: False, using: 0 HPUs\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 2343: 100%|██████████| 1/1 [00:00<00:00, 64.24it/s, v_num=1, val_loss=4.79e-6, bound_cond_loss=1.15e-7, phys_cond_loss=2.33e-5, train_loss=2.34e-5] \n"
]
}
],
"source": [
"model = FeedForward(\n",
" layers=[10, 10],\n",
" func=torch.nn.Tanh,\n",
" output_dimensions=len(problem.output_variables),\n",
" input_dimensions=len(problem.input_variables),\n",
" output_dimensions=1,\n",
" input_dimensions=1,\n",
")\n",
"pinn = PINN(problem, model)\n",
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
"trainer = Trainer(\n",
" solver=pinn,\n",
" solver=solver,\n",
" accelerator=\"cpu\",\n",
" max_epochs=-1,\n",
" enable_model_summary=False,\n",
@@ -559,24 +491,23 @@
"source": [
"## Trainer Tips to Boost Accuracy, Save Memory and Speed Up Training\n",
"\n",
"Untill now we have seen how to choose the right `accelerator`, how to log and visualize the results, and how to interface with the program in order to add specific parts of code at specific points by `callbacks`.\n",
"Now, we well focus on how boost your training by saving memory and speeding it up, while mantaining the same or even better degree of accuracy!\n",
"Until now we have seen how to choose the right `accelerator`, how to log and visualize the results, and how to interface with the program in order to add specific parts of code at specific points via `callbacks`.\n",
"Now, we will focus on how to boost your training by saving memory and speeding it up, while maintaining the same or even better degree of accuracy!\n",
"\n",
"\n",
"There are several built in methods developed in PyTorch Lightning which can be applied straight forward in **PINA**, here we report some:\n",
"There are several built-in methods developed in PyTorch Lightning which can be applied straightforward in **PINA**. Here we report some:\n",
"\n",
"* [Stochastic Weight Averaging](https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/) to boost accuracy\n",
"* [Gradient Clippling](https://deepgram.com/ai-glossary/gradient-clipping) to reduce computational time (and improve accuracy)\n",
"* [Gradient Accumulation](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption \n",
"* [Mixed Precision Training](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption \n",
"* [Gradient Clipping](https://deepgram.com/ai-glossary/gradient-clipping) to reduce computational time (and improve accuracy)\n",
"* [Gradient Accumulation](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption\n",
"* [Mixed Precision Training](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption\n",
"\n",
"We will just demonstrate how to use the first two, and see the results compared to a standard training.\n",
"We use the [`Timer`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.Timer.html#lightning.pytorch.callbacks.Timer) callback from `pytorch_lightning.callbacks` to take the times. Let's start by training a simple model without any optimization (train for 2000 epochs)."
"We will just demonstrate how to use the first two and see the results compared to standard training.\n",
"We use the [`Timer`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.Timer.html#lightning.pytorch.callbacks.Timer) callback from `pytorch_lightning.callbacks` to track the times. Let's start by training a simple model without any optimization (train for 500 epochs)."
]
},
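As referenced above, here is a brief sketch of the two options we will not demonstrate, gradient accumulation and mixed precision. It assumes PINA's `Trainer` forwards the standard PyTorch Lightning keyword arguments `accumulate_grad_batches` and `precision` unchanged.

```python
# Sketch only (assumption: PINA's Trainer accepts the standard Lightning keyword
# arguments below, since it builds on lightning.pytorch.Trainer).
trainer = Trainer(
    solver=solver,
    accelerator="cpu",
    max_epochs=500,
    accumulate_grad_batches=4,   # accumulate gradients over 4 batches before each optimizer step
    precision="bf16-mixed",      # mixed precision (bf16 works on CPU; use "16-mixed" on GPU)
)
trainer.train()
```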
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 23,
"metadata": {},
"outputs": [
{
@@ -590,25 +521,31 @@
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 156.69it/s, v_num=2, bound_cond_loss=1.53e-6, phys_cond_loss=0.000169, train_loss=0.000171]"
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "822b8c60e73f49a486d3d702d413d6ff",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: | | 0/? [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"`Trainer.fit` stopped: `max_epochs=2000` reached.\n"
"`Trainer.fit` stopped: `max_epochs=500` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 108.75it/s, v_num=2, bound_cond_loss=1.53e-6, phys_cond_loss=0.000169, train_loss=0.000171]\n",
"Total training time 15.36648 s\n"
"Total training time 15.49781 s\n"
]
}
],
@@ -622,16 +559,16 @@
"model = FeedForward(\n",
" layers=[10, 10],\n",
" func=torch.nn.Tanh,\n",
" output_dimensions=len(problem.output_variables),\n",
" input_dimensions=len(problem.input_variables),\n",
" output_dimensions=1,\n",
" input_dimensions=1,\n",
")\n",
"\n",
"pinn = PINN(problem, model)\n",
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
"trainer = Trainer(\n",
" solver=pinn,\n",
" solver=solver,\n",
" accelerator=\"cpu\",\n",
" deterministic=True, # setting deterministic=True ensure reproducibility when a seed is imposed\n",
" max_epochs=2000,\n",
" max_epochs=500,\n",
" enable_model_summary=False,\n",
" callbacks=[Timer()],\n",
") # adding a callbacks\n",
@@ -643,12 +580,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we do the same but with StochasticWeightAveraging"
"Now we do the same but with `StochasticWeightAveraging` enabled"
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 24,
"metadata": {},
"outputs": [
{
@@ -662,39 +599,32 @@
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1598: 100%|██████████| 1/1 [00:00<00:00, 224.16it/s, v_num=3, bound_cond_loss=5.7e-6, phys_cond_loss=0.000257, train_loss=0.000263] "
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "dc5f3b47abff4facae7a60d0871f3bfe",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: | | 0/? [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Swapping scheduler `ConstantLR` for `SWALR`\n"
"Swapping scheduler `ConstantLR` for `SWALR`\n",
"`Trainer.fit` stopped: `max_epochs=500` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 261.43it/s, v_num=3, bound_cond_loss=2.58e-7, phys_cond_loss=9.4e-5, train_loss=9.43e-5] "
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"`Trainer.fit` stopped: `max_epochs=2000` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 145.96it/s, v_num=3, bound_cond_loss=2.58e-7, phys_cond_loss=9.4e-5, train_loss=9.43e-5]\n",
"Total training time 17.78182 s\n"
"Total training time 15.52474 s\n"
]
}
],
@@ -707,15 +637,15 @@
"model = FeedForward(\n",
" layers=[10, 10],\n",
" func=torch.nn.Tanh,\n",
" output_dimensions=len(problem.output_variables),\n",
" input_dimensions=len(problem.input_variables),\n",
" output_dimensions=1,\n",
" input_dimensions=1,\n",
")\n",
"pinn = PINN(problem, model)\n",
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
"trainer = Trainer(\n",
" solver=pinn,\n",
" solver=solver,\n",
" accelerator=\"cpu\",\n",
" deterministic=True,\n",
" max_epochs=2000,\n",
" max_epochs=500,\n",
" enable_model_summary=False,\n",
" callbacks=[Timer(), StochasticWeightAveraging(swa_lrs=0.005)],\n",
") # adding StochasticWeightAveraging callbacks\n",
@@ -727,16 +657,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the training time does not change at all! Notice that around epoch `1600`\n",
"As you can see, the training time does not change at all! Notice that around epoch 350\n",
"the scheduler is switched from the defalut one `ConstantLR` to the Stochastic Weight Average Learning Rate (`SWALR`).\n",
"This is because by default `StochasticWeightAveraging` will be activated after `int(swa_epoch_start * max_epochs)` with `swa_epoch_start=0.7` by default. Finally, the final `mean_loss` is lower when `StochasticWeightAveraging` is used.\n",
"This is because by default `StochasticWeightAveraging` will be activated after `int(swa_epoch_start * max_epochs)` with `swa_epoch_start=0.7` by default. Finally, the final `train_loss` is lower when `StochasticWeightAveraging` is used.\n",
"\n",
"We will now now do the same but clippling the gradient to be relatively small."
"We will now do the same but clippling the gradient to be relatively small."
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 25,
"metadata": {},
"outputs": [
{
@@ -750,39 +680,32 @@
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1598: 100%|██████████| 1/1 [00:00<00:00, 251.76it/s, v_num=4, bound_cond_loss=5.98e-8, phys_cond_loss=3.88e-5, train_loss=3.88e-5] "
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d475613ad7f34fe6abd182eed8907004",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: | | 0/? [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Swapping scheduler `ConstantLR` for `SWALR`\n"
"Swapping scheduler `ConstantLR` for `SWALR`\n",
"`Trainer.fit` stopped: `max_epochs=500` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 239.11it/s, v_num=4, bound_cond_loss=0.000333, phys_cond_loss=0.000676, train_loss=0.00101] "
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"`Trainer.fit` stopped: `max_epochs=2000` reached.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 127.88it/s, v_num=4, bound_cond_loss=0.000333, phys_cond_loss=0.000676, train_loss=0.00101]\n",
"Total training time 15.12576 s\n"
"Total training time 15.94719 s\n"
]
}
],
@@ -793,14 +716,14 @@
"model = FeedForward(\n",
" layers=[10, 10],\n",
" func=torch.nn.Tanh,\n",
" output_dimensions=len(problem.output_variables),\n",
" input_dimensions=len(problem.input_variables),\n",
" output_dimensions=1,\n",
" input_dimensions=1,\n",
")\n",
"pinn = PINN(problem, model)\n",
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
"trainer = Trainer(\n",
" solver=pinn,\n",
" solver=solver,\n",
" accelerator=\"cpu\",\n",
" max_epochs=2000,\n",
" max_epochs=500,\n",
" enable_model_summary=False,\n",
" gradient_clip_val=0.1, # clipping the gradient\n",
" callbacks=[Timer(), StochasticWeightAveraging(swa_lrs=0.005)],\n",
@@ -813,17 +736,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see we by applying gradient clipping we were able to even obtain lower error!\n",
"As we can see, by applying gradient clipping, we were able to achieve even lower error!\n",
"\n",
"## What's next?\n",
"## What's Next?\n",
"\n",
"Now you know how to use efficiently the `Trainer` class **PINA**! There are multiple directions you can go now:\n",
"Now you know how to use the `Trainer` class efficiently in **PINA**! There are several directions you can explore next:\n",
"\n",
"1. Explore training times on different devices (e.g.) `TPU` \n",
"1. **Explore Training on Different Devices**: Test training times on various devices (e.g., `TPU`) to compare performance.\n",
"\n",
"2. Try to reduce memory cost by mixed precision training and gradient accumulation (especially useful when training Neural Operators)\n",
"2. **Reduce Memory Costs**: Experiment with mixed precision training and gradient accumulation to optimize memory usage, especially when training Neural Operators.\n",
"\n",
"3. Benchmark `Trainer` speed for different precisions."
"3. **Benchmark `Trainer` Speed**: Benchmark the training speed of the `Trainer` class for different precisions to identify potential optimizations.\n",
"\n",
"4. **...and many more!**: Consider expanding to **multi-GPU** setups or other advanced configurations for large-scale training.\n",
"\n",
"For more resources and tutorials, check out the [PINA Documentation](https://mathlab.github.io/PINA/).\n"
]
}
],
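For the multi-GPU direction mentioned in point 4, a minimal sketch could look like the following. It assumes PINA's `Trainer` forwards the standard PyTorch Lightning `devices` and `strategy` arguments and that at least two GPUs are available.

```python
# Sketch only (assumption: the standard Lightning arguments for device selection
# and distributed strategy are forwarded by PINA's Trainer, and two GPUs exist).
trainer = Trainer(
    solver=solver,
    accelerator="gpu",
    devices=2,          # train on two GPUs
    strategy="ddp",     # distributed data parallel
    max_epochs=500,
)
trainer.train()
```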


@@ -1,388 +0,0 @@
#!/usr/bin/env python
# coding: utf-8
# # Tutorial: PINA and PyTorch Lightning, training tips and visualizations
#
# [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mathLab/PINA/blob/master/tutorials/tutorial11/tutorial.ipynb)
#
# In this tutorial, we will delve deeper into the functionality of the `Trainer` class, which serves as the cornerstone for training **PINA** [Solvers](https://mathlab.github.io/PINA/_rst/_code.html#solvers).
#
# The `Trainer` class offers a plethora of features aimed at improving model accuracy, reducing training time and memory usage, facilitating logging visualization, and more thanks to the amazing job done by the PyTorch Lightning team!
#
# Our leading example will revolve around solving the `SimpleODE` problem, as outlined in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb). If you haven't already explored it, we highly recommend doing so before diving into this tutorial.
#
# Let's start by importing useful modules, define the `SimpleODE` problem and the `PINN` solver.
# In[ ]:
try:
import google.colab
IN_COLAB = True
except:
IN_COLAB = False
if IN_COLAB:
get_ipython().system('pip install "pina-mathlab"')
import torch
import warnings
from pina import Condition, Trainer
from pina.solver import PINN
from pina.model import FeedForward
from pina.problem import SpatialProblem
from pina.operator import grad
from pina.domain import CartesianDomain
from pina.equation import Equation, FixedValue
warnings.filterwarnings("ignore")
# Define problem and solver.
# In[2]:
# defining the ode equation
def ode_equation(input_, output_):
# computing the derivative
u_x = grad(output_, input_, components=["u"], d=["x"])
# extracting the u input variable
u = output_.extract(["u"])
# calculate the residual and return it
return u_x - u
class SimpleODE(SpatialProblem):
output_variables = ["u"]
spatial_domain = CartesianDomain({"x": [0, 1]})
domains = {
"x0": CartesianDomain({"x": 0.0}),
"D": CartesianDomain({"x": [0, 1]}),
}
# conditions to hold
conditions = {
"bound_cond": Condition(domain="x0", equation=FixedValue(1.0)),
"phys_cond": Condition(domain="D", equation=Equation(ode_equation)),
}
# defining the true solution
def solution(self, pts):
return torch.exp(pts.extract(["x"]))
# sampling for training
problem = SimpleODE()
problem.discretise_domain(1, "random", domains=["x0"])
problem.discretise_domain(20, "lh", domains=["D"])
# build the model
model = FeedForward(
layers=[10, 10],
func=torch.nn.Tanh,
output_dimensions=len(problem.output_variables),
input_dimensions=len(problem.input_variables),
)
# create the PINN object
pinn = PINN(problem, model)
# Till now we just followed the extact step of the previous tutorials. The `Trainer` object
# can be initialized by simiply passing the `PINN` solver
# In[3]:
trainer = Trainer(solver=pinn)
# ## Trainer Accelerator
#
# When creating the trainer, **by defualt** the `Trainer` will choose the most performing `accelerator` for training which is available in your system, ranked as follow:
# 1. [TPU](https://cloud.google.com/tpu/docs/intro-to-tpu)
# 2. [IPU](https://www.graphcore.ai/products/ipu)
# 3. [HPU](https://habana.ai/)
# 4. [GPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#:~:text=What%20does%20GPU%20stand%20for,video%20editing%2C%20and%20gaming%20applications) or [MPS](https://developer.apple.com/metal/pytorch/)
# 5. CPU
# For setting manually the `accelerator` run:
#
# * `accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'cpu', 'ipu'}` sets the accelerator to a specific one
# In[4]:
trainer = Trainer(solver=pinn, accelerator="cpu")
# as you can see, even if in the used system `GPU` is available, it is not used since we set `accelerator='cpu'`.
# ## Trainer Logging
#
# In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTraker` class from `pina.callbacks` as seen in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.
#
# However, expecially when we need to train multiple times to get an average of the loss across multiple runs, `pytorch_lightning.loggers` might be useful. Here we will use `TensorBoardLogger` (more on [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) here), but you can choose the one you prefer (or make your own one).
#
# We will now import `TensorBoardLogger`, do three runs of training and then visualize the results. Notice we set `enable_model_summary=False` to avoid model summary specifications (e.g. number of parameters), set it to true if needed.
#
# In[5]:
from lightning.pytorch.loggers import TensorBoardLogger
# three run of training, by default it trains for 1000 epochs
# we reinitialize the model each time otherwise the same parameters will be optimized
for _ in range(3):
model = FeedForward(
layers=[10, 10],
func=torch.nn.Tanh,
output_dimensions=len(problem.output_variables),
input_dimensions=len(problem.input_variables),
)
pinn = PINN(problem, model)
trainer = Trainer(
solver=pinn,
accelerator="cpu",
logger=TensorBoardLogger(save_dir="training_log"),
enable_model_summary=False,
train_size=1.0,
val_size=0.0,
test_size=0.0,
)
trainer.train()
# We can now visualize the logs by simply running `tensorboard --logdir=training_log/` on terminal, you should obtain a webpage as the one shown below:
# <p align=\"center\">
# <img src="logging.png" alt=\"Logging API\" width=\"400\"/>
# </p>
# as you can see, by default, **PINA** logs the losses which are shown in the progress bar, as well as the number of epochs. You can always insert more loggings by either defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or inheriting the solver and modify the programs with different **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks)).
# ## Trainer Callbacks
# Whenever we need to access certain steps of the training for logging, do static modifications (i.e. not changing the `Solver`) or updating `Problem` hyperparameters (static variables), we can use `Callabacks`. Notice that `Callbacks` allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in **PINA** `Solver`s.
# Lightning has a callback system to execute them when needed. Callbacks should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.
#
# The following are best practices when using/designing callbacks.
#
# * Callbacks should be isolated in their functionality.
# * Your callback should not rely on the behavior of other callbacks in order to work properly.
# * Do not manually call methods from the callback.
# * Directly calling methods (eg. on_validation_end) is strongly discouraged.
# * Whenever possible, your callbacks should not depend on the order in which they are executed.
#
# We will try now to implement a naive version of `MetricTraker` to show how callbacks work. Notice that this is a very easy application of callbacks, fortunately in **PINA** we already provide more advanced callbacks in `pina.callbacks`.
#
# <!-- Suppose we want to log the accuracy on some validation poit -->
# In[6]:
from lightning.pytorch.callbacks import Callback
from lightning.pytorch.callbacks import EarlyStopping
import torch
# define a simple callback
class NaiveMetricTracker(Callback):
def __init__(self):
self.saved_metrics = []
def on_train_epoch_end(
self, trainer, __
): # function called at the end of each epoch
self.saved_metrics.append(
{key: value for key, value in trainer.logged_metrics.items()}
)
# Let's see the results when applyed to the `SimpleODE` problem. You can define callbacks when initializing the `Trainer` by the `callbacks` argument, which expects a list of callbacks.
# In[7]:
model = FeedForward(
layers=[10, 10],
func=torch.nn.Tanh,
output_dimensions=len(problem.output_variables),
input_dimensions=len(problem.input_variables),
)
pinn = PINN(problem, model)
trainer = Trainer(
solver=pinn,
accelerator="cpu",
logger=True,
callbacks=[NaiveMetricTracker()], # adding a callbacks
enable_model_summary=False,
train_size=1.0,
val_size=0.0,
test_size=0.0,
)
trainer.train()
# We can easily access the data by calling `trainer.callbacks[0].saved_metrics` (notice the zero representing the first callback in the list given at initialization).
# In[8]:
trainer.callbacks[0].saved_metrics[:3] # only the first three epochs
# PyTorch Lightning also has some built in `Callbacks` which can be used in **PINA**, [here an extensive list](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html#built-in-callbacks).
#
# We can for example try the `EarlyStopping` routine, which automatically stops the training when a specific metric converged (here the `train_loss`). In order to let the training keep going forever set `max_epochs=-1`.
# In[ ]:
model = FeedForward(
layers=[10, 10],
func=torch.nn.Tanh,
output_dimensions=len(problem.output_variables),
input_dimensions=len(problem.input_variables),
)
pinn = PINN(problem, model)
trainer = Trainer(
solver=pinn,
accelerator="cpu",
max_epochs=-1,
enable_model_summary=False,
enable_progress_bar=False,
val_size=0.2,
train_size=0.8,
test_size=0.0,
callbacks=[EarlyStopping("val_loss")],
) # adding a callbacks
trainer.train()
# As we can see the model automatically stop when the logging metric stopped improving!
# ## Trainer Tips to Boost Accuracy, Save Memory and Speed Up Training
#
# Untill now we have seen how to choose the right `accelerator`, how to log and visualize the results, and how to interface with the program in order to add specific parts of code at specific points by `callbacks`.
# Now, we well focus on how boost your training by saving memory and speeding it up, while mantaining the same or even better degree of accuracy!
#
#
# There are several built in methods developed in PyTorch Lightning which can be applied straight forward in **PINA**, here we report some:
#
# * [Stochastic Weight Averaging](https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/) to boost accuracy
# * [Gradient Clippling](https://deepgram.com/ai-glossary/gradient-clipping) to reduce computational time (and improve accuracy)
# * [Gradient Accumulation](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption
# * [Mixed Precision Training](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption
#
# We will just demonstrate how to use the first two, and see the results compared to a standard training.
# We use the [`Timer`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.Timer.html#lightning.pytorch.callbacks.Timer) callback from `pytorch_lightning.callbacks` to take the times. Let's start by training a simple model without any optimization (train for 2000 epochs).
# In[10]:
from lightning.pytorch.callbacks import Timer
from lightning.pytorch import seed_everything
# setting the seed for reproducibility
seed_everything(42, workers=True)
model = FeedForward(
layers=[10, 10],
func=torch.nn.Tanh,
output_dimensions=len(problem.output_variables),
input_dimensions=len(problem.input_variables),
)
pinn = PINN(problem, model)
trainer = Trainer(
solver=pinn,
accelerator="cpu",
deterministic=True, # setting deterministic=True ensure reproducibility when a seed is imposed
max_epochs=2000,
enable_model_summary=False,
callbacks=[Timer()],
) # adding a callbacks
trainer.train()
print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s')
# Now we do the same but with StochasticWeightAveraging
# In[11]:
from lightning.pytorch.callbacks import StochasticWeightAveraging
# setting the seed for reproducibility
seed_everything(42, workers=True)
model = FeedForward(
layers=[10, 10],
func=torch.nn.Tanh,
output_dimensions=len(problem.output_variables),
input_dimensions=len(problem.input_variables),
)
pinn = PINN(problem, model)
trainer = Trainer(
solver=pinn,
accelerator="cpu",
deterministic=True,
max_epochs=2000,
enable_model_summary=False,
callbacks=[Timer(), StochasticWeightAveraging(swa_lrs=0.005)],
) # adding StochasticWeightAveraging callbacks
trainer.train()
print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s')
# As you can see, the training time does not change at all! Notice that around epoch `1600`
# the scheduler is switched from the defalut one `ConstantLR` to the Stochastic Weight Average Learning Rate (`SWALR`).
# This is because by default `StochasticWeightAveraging` will be activated after `int(swa_epoch_start * max_epochs)` with `swa_epoch_start=0.7` by default. Finally, the final `mean_loss` is lower when `StochasticWeightAveraging` is used.
#
# We will now now do the same but clippling the gradient to be relatively small.
# In[12]:
# setting the seed for reproducibility
seed_everything(42, workers=True)
model = FeedForward(
layers=[10, 10],
func=torch.nn.Tanh,
output_dimensions=len(problem.output_variables),
input_dimensions=len(problem.input_variables),
)
pinn = PINN(problem, model)
trainer = Trainer(
solver=pinn,
accelerator="cpu",
max_epochs=2000,
enable_model_summary=False,
gradient_clip_val=0.1, # clipping the gradient
callbacks=[Timer(), StochasticWeightAveraging(swa_lrs=0.005)],
)
trainer.train()
print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s')
# As we can see we by applying gradient clipping we were able to even obtain lower error!
#
# ## What's next?
#
# Now you know how to use efficiently the `Trainer` class **PINA**! There are multiple directions you can go now:
#
# 1. Explore training times on different devices (e.g.) `TPU`
#
# 2. Try to reduce memory cost by mixed precision training and gradient accumulation (especially useful when training Neural Operators)
#
# 3. Benchmark `Trainer` speed for different precisions.