BIN tutorials/tutorial11/logging.png vendored
Binary file not shown. | Before Width: | Height: | Size: 204 KiB |
493 tutorials/tutorial11/tutorial.ipynb vendored
@@ -4,17 +4,18 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tutorial: PINA and PyTorch Lightning, training tips and visualizations \n",
|
||||
"\n",
|
||||
"# Tutorial: Introduction to `Trainer` class\n",
|
||||
"[](https://colab.research.google.com/github/mathLab/PINA/blob/master/tutorials/tutorial11/tutorial.ipynb)\n",
|
||||
"\n",
|
||||
"In this tutorial, we will delve deeper into the functionality of the `Trainer` class, which serves as the cornerstone for training **PINA** [Solvers](https://mathlab.github.io/PINA/_rst/_code.html#solvers). \n",
|
||||
"\n",
|
||||
"The `Trainer` class offers a plethora of features aimed at improving model accuracy, reducing training time and memory usage, facilitating logging visualization, and more thanks to the amazing job done by the PyTorch Lightning team!\n",
|
||||
"\n",
|
||||
"Our leading example will revolve around solving the `SimpleODE` problem, as outlined in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb). If you haven't already explored it, we highly recommend doing so before diving into this tutorial.\n",
|
||||
"Our leading example will revolve around solving a simple regression problem where we want to approximate the following function with a Neural Net model $\\mathcal{M}_{\\theta}$:\n",
|
||||
"$$y = x^3$$\n",
|
||||
"by having only a set of $20$ observations $\\{x_i, y_i\\}_{i=1}^{20}$, with $x_i \\sim\\mathcal{U}[-3, 3]\\;\\;\\forall i\\in(1,\\dots,20)$.\n",
|
||||
"\n",
|
||||
"Let's start by importing useful modules, define the `SimpleODE` problem and the `PINN` solver."
|
||||
"Let's start by importing useful modules!"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -30,18 +31,15 @@
|
||||
"except:\n",
|
||||
" IN_COLAB = False\n",
|
||||
"if IN_COLAB:\n",
|
||||
" !pip install \"pina-mathlab\"\n",
|
||||
" !pip install \"pina-mathlab[tutorial]\"\n",
|
||||
"\n",
|
||||
"import torch\n",
|
||||
"import warnings\n",
|
||||
"\n",
|
||||
"from pina import Condition, Trainer\n",
|
||||
"from pina.solver import PINN\n",
|
||||
"from pina import Trainer\n",
|
||||
"from pina.solver import SupervisedSolver\n",
|
||||
"from pina.model import FeedForward\n",
|
||||
"from pina.problem import SpatialProblem\n",
|
||||
"from pina.operator import grad\n",
|
||||
"from pina.domain import CartesianDomain\n",
|
||||
"from pina.equation import Equation, FixedValue\n",
|
||||
"from pina.problem.zoo import SupervisedProblem\n",
|
||||
"\n",
|
||||
"warnings.filterwarnings(\"ignore\")"
|
||||
]
|
||||
@@ -59,55 +57,22 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# defining the ode equation\n",
|
||||
"def ode_equation(input_, output_):\n",
|
||||
"# defining the problem\n",
|
||||
"x_train = torch.empty((20, 1)).uniform_(-3, 3)\n",
|
||||
"y_train = x_train.pow(3) + 3 * torch.randn_like(x_train)\n",
|
||||
"\n",
|
||||
" # computing the derivative\n",
|
||||
" u_x = grad(output_, input_, components=[\"u\"], d=[\"x\"])\n",
|
||||
"\n",
|
||||
" # extracting the u input variable\n",
|
||||
" u = output_.extract([\"u\"])\n",
|
||||
"\n",
|
||||
" # calculate the residual and return it\n",
|
||||
" return u_x - u\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class SimpleODE(SpatialProblem):\n",
|
||||
"\n",
|
||||
" output_variables = [\"u\"]\n",
|
||||
" spatial_domain = CartesianDomain({\"x\": [0, 1]})\n",
|
||||
"\n",
|
||||
" domains = {\n",
|
||||
" \"x0\": CartesianDomain({\"x\": 0.0}),\n",
|
||||
" \"D\": CartesianDomain({\"x\": [0, 1]}),\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" # conditions to hold\n",
|
||||
" conditions = {\n",
|
||||
" \"bound_cond\": Condition(domain=\"x0\", equation=FixedValue(1.0)),\n",
|
||||
" \"phys_cond\": Condition(domain=\"D\", equation=Equation(ode_equation)),\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" # defining the true solution\n",
|
||||
" def solution(self, pts):\n",
|
||||
" return torch.exp(pts.extract([\"x\"]))\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# sampling for training\n",
|
||||
"problem = SimpleODE()\n",
|
||||
"problem.discretise_domain(1, \"random\", domains=[\"x0\"])\n",
|
||||
"problem.discretise_domain(20, \"lh\", domains=[\"D\"])\n",
|
||||
"problem = SupervisedProblem(x_train, y_train)\n",
|
||||
"\n",
|
||||
"# build the model\n",
|
||||
"model = FeedForward(\n",
|
||||
" layers=[10, 10],\n",
|
||||
" func=torch.nn.Tanh,\n",
|
||||
" output_dimensions=len(problem.output_variables),\n",
|
||||
" input_dimensions=len(problem.input_variables),\n",
|
||||
" output_dimensions=1,\n",
|
||||
" input_dimensions=1,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# create the PINN object\n",
|
||||
"pinn = PINN(problem, model)"
|
||||
"# create the SupervisedSolver object\n",
|
||||
"solver = SupervisedSolver(problem, model, use_lt=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -115,7 +80,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Till now we just followed the extact step of the previous tutorials. The `Trainer` object\n",
|
||||
"can be initialized by simiply passing the `PINN` solver"
|
||||
"can be initialized by simiply passing the `SupervisedSolver` solver"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -134,7 +99,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"trainer = Trainer(solver=pinn)"
|
||||
"trainer = Trainer(solver=solver)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -143,18 +108,13 @@
|
||||
"source": [
|
||||
"## Trainer Accelerator\n",
|
||||
"\n",
|
||||
"When creating the trainer, **by defualt** the `Trainer` will choose the most performing `accelerator` for training which is available in your system, ranked as follow:\n",
|
||||
"When creating the `Trainer`, **by default** the most performing `accelerator` for training which is available in your system will be chosen, ranked as follows:\n",
|
||||
"1. [TPU](https://cloud.google.com/tpu/docs/intro-to-tpu)\n",
|
||||
"2. [IPU](https://www.graphcore.ai/products/ipu)\n",
|
||||
"3. [HPU](https://habana.ai/)\n",
|
||||
"4. [GPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#:~:text=What%20does%20GPU%20stand%20for,video%20editing%2C%20and%20gaming%20applications) or [MPS](https://developer.apple.com/metal/pytorch/)\n",
|
||||
"5. CPU"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"5. CPU\n",
|
||||
"\n",
|
||||
"For setting manually the `accelerator` run:\n",
|
||||
"\n",
|
||||
"* `accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'cpu', 'ipu'}` sets the accelerator to a specific one"
|
||||
@@ -162,7 +122,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -176,14 +136,14 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"trainer = Trainer(solver=pinn, accelerator=\"cpu\")"
|
||||
"trainer = Trainer(solver=solver, accelerator=\"cpu\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"as you can see, even if in the used system `GPU` is available, it is not used since we set `accelerator='cpu'`."
|
||||
"As you can see, even if a `GPU` is available on the system, it is not used since we set `accelerator='cpu'`."
|
||||
]
|
||||
},
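If you want to double-check which accelerators are actually visible on your machine before choosing, a quick query with plain PyTorch is enough. This is not PINA-specific and assumes a reasonably recent PyTorch version:

```python
import torch

# check which accelerators PyTorch can see on this machine
print("CUDA GPU available:", torch.cuda.is_available())
print("Apple MPS available:", torch.backends.mps.is_available())
# when nothing faster is found, the Trainer falls back to "cpu"
```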
|
||||
{
|
||||
@@ -192,16 +152,16 @@
|
||||
"source": [
|
||||
"## Trainer Logging\n",
|
||||
"\n",
|
||||
"In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTraker` class from `pina.callbacks` as seen in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.\n",
|
||||
"In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTracker` class from `pina.callbacks`, as seen in the [*Introduction to Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.\n",
|
||||
"\n",
|
||||
"However, expecially when we need to train multiple times to get an average of the loss across multiple runs, `pytorch_lightning.loggers` might be useful. Here we will use `TensorBoardLogger` (more on [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) here), but you can choose the one you prefer (or make your own one).\n",
|
||||
"However, especially when we need to train multiple times to get an average of the loss across multiple runs, `lightning.pytorch.loggers` might be useful. Here we will use `TensorBoardLogger` (more on [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) here), but you can choose the one you prefer (or make your own one).\n",
|
||||
"\n",
|
||||
"We will now import `TensorBoardLogger`, do three runs of training and then visualize the results. Notice we set `enable_model_summary=False` to avoid model summary specifications (e.g. number of parameters), set it to true if needed.\n"
|
||||
"We will now import `TensorBoardLogger`, do three runs of training, and then visualize the results. Notice we set `enable_model_summary=False` to avoid model summary specifications (e.g. number of parameters); set it to `True` if needed."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -209,113 +169,108 @@
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"GPU available: True (mps), used: False\n",
|
||||
"TPU available: False, using: 0 TPU cores\n",
|
||||
"TPU available: False, using: 0 TPU cores\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"HPU available: False, using: 0 HPUs\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 233.15it/s, v_num=0, bound_cond_loss=1.22e-5, phys_cond_loss=0.000517, train_loss=0.000529]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"`Trainer.fit` stopped: `max_epochs=1000` reached.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 137.95it/s, v_num=0, bound_cond_loss=1.22e-5, phys_cond_loss=0.000517, train_loss=0.000529]\n"
|
||||
]
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "775a2d088e304b2589631b176c9e99e2",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Training: | | 0/? [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"`Trainer.fit` stopped: `max_epochs=100` reached.\n",
|
||||
"GPU available: True (mps), used: False\n",
|
||||
"TPU available: False, using: 0 TPU cores\n",
|
||||
"HPU available: False, using: 0 HPUs\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 248.63it/s, v_num=1, bound_cond_loss=2.29e-5, phys_cond_loss=0.00106, train_loss=0.00108] "
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"`Trainer.fit` stopped: `max_epochs=1000` reached.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 149.06it/s, v_num=1, bound_cond_loss=2.29e-5, phys_cond_loss=0.00106, train_loss=0.00108]\n"
|
||||
]
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "d858dc0a31214f5f86aae78823525b56",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Training: | | 0/? [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"`Trainer.fit` stopped: `max_epochs=100` reached.\n",
|
||||
"GPU available: True (mps), used: False\n",
|
||||
"TPU available: False, using: 0 TPU cores\n",
|
||||
"HPU available: False, using: 0 HPUs\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 254.65it/s, v_num=2, bound_cond_loss=0.00029, phys_cond_loss=0.00253, train_loss=0.00282] "
|
||||
]
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "739bf2009f7a48a1b59b7df695276672",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Training: | | 0/? [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"`Trainer.fit` stopped: `max_epochs=1000` reached.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 150.72it/s, v_num=2, bound_cond_loss=0.00029, phys_cond_loss=0.00253, train_loss=0.00282]\n"
|
||||
"`Trainer.fit` stopped: `max_epochs=100` reached.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from lightning.pytorch.loggers import TensorBoardLogger\n",
|
||||
"\n",
|
||||
"# three run of training, by default it trains for 1000 epochs\n",
|
||||
"# three run of training, by default it trains for 1000 epochs, we set the max to 100\n",
|
||||
"# we reinitialize the model each time otherwise the same parameters will be optimized\n",
|
||||
"for _ in range(3):\n",
|
||||
" model = FeedForward(\n",
|
||||
" layers=[10, 10],\n",
|
||||
" func=torch.nn.Tanh,\n",
|
||||
" output_dimensions=len(problem.output_variables),\n",
|
||||
" input_dimensions=len(problem.input_variables),\n",
|
||||
" output_dimensions=1,\n",
|
||||
" input_dimensions=1,\n",
|
||||
" )\n",
|
||||
" pinn = PINN(problem, model)\n",
|
||||
" solver = SupervisedSolver(problem, model, use_lt=False)\n",
|
||||
" trainer = Trainer(\n",
|
||||
" solver=pinn,\n",
|
||||
" solver=solver,\n",
|
||||
" accelerator=\"cpu\",\n",
|
||||
" logger=TensorBoardLogger(save_dir=\"training_log\"),\n",
|
||||
" enable_model_summary=False,\n",
|
||||
" train_size=1.0,\n",
|
||||
" val_size=0.0,\n",
|
||||
" test_size=0.0,\n",
|
||||
" max_epochs=100\n",
|
||||
" )\n",
|
||||
" trainer.train()"
|
||||
]
|
||||
@@ -324,7 +279,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can now visualize the logs by simply running `tensorboard --logdir=training_log/` on terminal, you should obtain a webpage as the one shown below:"
|
||||
"We can now visualize the logs by simply running `tensorboard --logdir=training_log/` in the terminal. You should obtain a webpage similar to the one shown below if running for 1000 epochs:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -332,7 +287,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<p align=\\\"center\\\">\n",
|
||||
"<img src=\"logging.png\" alt=\\\"Logging API\\\" width=\\\"400\\\"/>\n",
|
||||
"<img src=\"../static/logging.png\" alt=\\\"Logging API\\\" width=\\\"400\\\"/>\n",
|
||||
"</p>"
|
||||
]
|
||||
},
|
||||
@@ -340,39 +295,28 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"as you can see, by default, **PINA** logs the losses which are shown in the progress bar, as well as the number of epochs. You can always insert more loggings by either defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or inheriting the solver and modify the programs with different **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks))."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Trainer Callbacks"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Whenever we need to access certain steps of the training for logging, do static modifications (i.e. not changing the `Solver`) or updating `Problem` hyperparameters (static variables), we can use `Callabacks`. Notice that `Callbacks` allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in **PINA** `Solver`s.\n",
|
||||
"Lightning has a callback system to execute them when needed. Callbacks should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.\n",
|
||||
"As you can see, by default, **PINA** logs the losses which are shown in the progress bar, as well as the number of epochs. You can always insert more loggings by either defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or inheriting the solver and modifying the programs with different **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks)).\n",
|
||||
"\n",
|
||||
"The following are best practices when using/designing callbacks.\n",
|
||||
"## Trainer Callbacks\n",
|
||||
"\n",
|
||||
"Whenever we need to access certain steps of the training for logging, perform static modifications (i.e. not changing the `Solver`), or update `Problem` hyperparameters (static variables), we can use **Callbacks**. Notice that **Callbacks** allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in **PINA** `Solver`s.\n",
|
||||
"\n",
|
||||
"Lightning has a callback system to execute them when needed. **Callbacks** should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.\n",
|
||||
"\n",
|
||||
"The following are best practices when using/designing callbacks:\n",
|
||||
"\n",
|
||||
"* Callbacks should be isolated in their functionality.\n",
|
||||
"* Your callback should not rely on the behavior of other callbacks in order to work properly.\n",
|
||||
"* Do not manually call methods from the callback.\n",
|
||||
"* Directly calling methods (eg. on_validation_end) is strongly discouraged.\n",
|
||||
"* Directly calling methods (e.g., on_validation_end) is strongly discouraged.\n",
|
||||
"* Whenever possible, your callbacks should not depend on the order in which they are executed.\n",
|
||||
"\n",
|
||||
"We will try now to implement a naive version of `MetricTraker` to show how callbacks work. Notice that this is a very easy application of callbacks, fortunately in **PINA** we already provide more advanced callbacks in `pina.callbacks`.\n",
|
||||
"\n",
|
||||
"<!-- Suppose we want to log the accuracy on some validation poit -->"
|
||||
"We will try now to implement a naive version of `MetricTraker` to show how callbacks work. Notice that this is a very easy application of callbacks, fortunately in **PINA** we already provide more advanced callbacks in `pina.callbacks`."
|
||||
]
|
||||
},
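The `NaiveMetricTracker` definition lives in the next code cell, which is unchanged by this diff and therefore not shown. For convenience, here is a minimal sketch of what such a tracker looks like, mirroring the original tutorial script and using the standard Lightning `Callback` hook signature:

```python
from lightning.pytorch.callbacks import Callback


# a minimal callback that stores the logged metrics at the end of every epoch
class NaiveMetricTracker(Callback):
    def __init__(self):
        self.saved_metrics = []

    def on_train_epoch_end(self, trainer, pl_module):
        # trainer.logged_metrics holds everything logged so far (e.g. the losses)
        self.saved_metrics.append(
            {key: value for key, value in trainer.logged_metrics.items()}
        )
```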
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -398,12 +342,12 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's see the results when applyed to the `SimpleODE` problem. You can define callbacks when initializing the `Trainer` by the `callbacks` argument, which expects a list of callbacks. "
|
||||
"Let's see the results when applied to the problem. You can define **callbacks** when initializing the `Trainer` by using the `callbacks` argument, which expects a list of callbacks.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -416,24 +360,24 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 278.93it/s, v_num=0, bound_cond_loss=6.94e-5, phys_cond_loss=0.00116, train_loss=0.00123] "
|
||||
]
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "f38442d749ad4702a0c99715ecf08c59",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Training: | | 0/? [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"`Trainer.fit` stopped: `max_epochs=1000` reached.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 140.62it/s, v_num=0, bound_cond_loss=6.94e-5, phys_cond_loss=0.00116, train_loss=0.00123]\n"
|
||||
"`Trainer.fit` stopped: `max_epochs=10` reached.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -441,12 +385,12 @@
|
||||
"model = FeedForward(\n",
|
||||
" layers=[10, 10],\n",
|
||||
" func=torch.nn.Tanh,\n",
|
||||
" output_dimensions=len(problem.output_variables),\n",
|
||||
" input_dimensions=len(problem.input_variables),\n",
|
||||
" output_dimensions=1,\n",
|
||||
" input_dimensions=1,\n",
|
||||
")\n",
|
||||
"pinn = PINN(problem, model)\n",
|
||||
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
|
||||
"trainer = Trainer(\n",
|
||||
" solver=pinn,\n",
|
||||
" solver=solver,\n",
|
||||
" accelerator=\"cpu\",\n",
|
||||
" logger=True,\n",
|
||||
" callbacks=[NaiveMetricTracker()], # adding a callbacks\n",
|
||||
@@ -454,6 +398,7 @@
|
||||
" train_size=1.0,\n",
|
||||
" val_size=0.0,\n",
|
||||
" test_size=0.0,\n",
|
||||
" max_epochs=10, # training only for 10 epochs\n",
|
||||
")\n",
|
||||
"trainer.train()"
|
||||
]
|
||||
@@ -467,24 +412,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'bound_cond_loss': tensor(0.9935),\n",
|
||||
" 'phys_cond_loss': tensor(0.0303),\n",
|
||||
" 'train_loss': tensor(1.0239)},\n",
|
||||
" {'bound_cond_loss': tensor(0.9875),\n",
|
||||
" 'phys_cond_loss': tensor(0.0293),\n",
|
||||
" 'train_loss': tensor(1.0169)},\n",
|
||||
" {'bound_cond_loss': tensor(0.9815),\n",
|
||||
" 'phys_cond_loss': tensor(0.0284),\n",
|
||||
" 'train_loss': tensor(1.0099)}]"
|
||||
"[{'data_loss': tensor(126.2887), 'train_loss': tensor(126.2887)},\n",
|
||||
" {'data_loss': tensor(126.2346), 'train_loss': tensor(126.2346)},\n",
|
||||
" {'data_loss': tensor(126.1805), 'train_loss': tensor(126.1805)}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -497,14 +436,14 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"PyTorch Lightning also has some built in `Callbacks` which can be used in **PINA**, [here an extensive list](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html#built-in-callbacks). \n",
|
||||
"PyTorch Lightning also has some built-in `Callbacks` which can be used in **PINA**, [here is an extensive list](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html#built-in-callbacks). \n",
|
||||
"\n",
|
||||
"We can for example try the `EarlyStopping` routine, which automatically stops the training when a specific metric converged (here the `train_loss`). In order to let the training keep going forever set `max_epochs=-1`."
|
||||
"We can, for example, try the `EarlyStopping` routine, which automatically stops the training when a specific metric converges (here the `train_loss`). In order to let the training keep going forever, set `max_epochs=-1`."
|
||||
]
|
||||
},
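The part of the cell that actually attaches the callback falls outside the visible hunk below. Based on the original tutorial script, it is configured roughly as in this hedged sketch, monitoring `val_loss`, which requires a non-zero `val_size`:

```python
from lightning.pytorch.callbacks import EarlyStopping

# stop automatically once the monitored metric stops improving
trainer = Trainer(
    solver=solver,
    accelerator="cpu",
    max_epochs=-1,                 # let EarlyStopping decide when to stop
    enable_model_summary=False,
    enable_progress_bar=False,
    train_size=0.8,
    val_size=0.2,                  # a validation split is needed to monitor val_loss
    test_size=0.0,
    callbacks=[EarlyStopping("val_loss")],
)
trainer.train()
```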
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -515,25 +454,18 @@
|
||||
"TPU available: False, using: 0 TPU cores\n",
|
||||
"HPU available: False, using: 0 HPUs\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 2343: 100%|██████████| 1/1 [00:00<00:00, 64.24it/s, v_num=1, val_loss=4.79e-6, bound_cond_loss=1.15e-7, phys_cond_loss=2.33e-5, train_loss=2.34e-5] \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model = FeedForward(\n",
|
||||
" layers=[10, 10],\n",
|
||||
" func=torch.nn.Tanh,\n",
|
||||
" output_dimensions=len(problem.output_variables),\n",
|
||||
" input_dimensions=len(problem.input_variables),\n",
|
||||
" output_dimensions=1,\n",
|
||||
" input_dimensions=1,\n",
|
||||
")\n",
|
||||
"pinn = PINN(problem, model)\n",
|
||||
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
|
||||
"trainer = Trainer(\n",
|
||||
" solver=pinn,\n",
|
||||
" solver=solver,\n",
|
||||
" accelerator=\"cpu\",\n",
|
||||
" max_epochs=-1,\n",
|
||||
" enable_model_summary=False,\n",
|
||||
@@ -559,24 +491,23 @@
|
||||
"source": [
|
||||
"## Trainer Tips to Boost Accuracy, Save Memory and Speed Up Training\n",
|
||||
"\n",
|
||||
"Untill now we have seen how to choose the right `accelerator`, how to log and visualize the results, and how to interface with the program in order to add specific parts of code at specific points by `callbacks`.\n",
|
||||
"Now, we well focus on how boost your training by saving memory and speeding it up, while mantaining the same or even better degree of accuracy!\n",
|
||||
"Until now we have seen how to choose the right `accelerator`, how to log and visualize the results, and how to interface with the program in order to add specific parts of code at specific points via `callbacks`.\n",
|
||||
"Now, we will focus on how to boost your training by saving memory and speeding it up, while maintaining the same or even better degree of accuracy!\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"There are several built in methods developed in PyTorch Lightning which can be applied straight forward in **PINA**, here we report some:\n",
|
||||
"There are several built-in methods developed in PyTorch Lightning which can be applied straightforward in **PINA**. Here we report some:\n",
|
||||
"\n",
|
||||
"* [Stochastic Weight Averaging](https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/) to boost accuracy\n",
|
||||
"* [Gradient Clippling](https://deepgram.com/ai-glossary/gradient-clipping) to reduce computational time (and improve accuracy)\n",
|
||||
"* [Gradient Accumulation](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption \n",
|
||||
"* [Mixed Precision Training](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption \n",
|
||||
"* [Gradient Clipping](https://deepgram.com/ai-glossary/gradient-clipping) to reduce computational time (and improve accuracy)\n",
|
||||
"* [Gradient Accumulation](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption\n",
|
||||
"* [Mixed Precision Training](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption\n",
|
||||
"\n",
|
||||
"We will just demonstrate how to use the first two, and see the results compared to a standard training.\n",
|
||||
"We use the [`Timer`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.Timer.html#lightning.pytorch.callbacks.Timer) callback from `pytorch_lightning.callbacks` to take the times. Let's start by training a simple model without any optimization (train for 2000 epochs)."
|
||||
"We will just demonstrate how to use the first two and see the results compared to standard training.\n",
|
||||
"We use the [`Timer`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.Timer.html#lightning.pytorch.callbacks.Timer) callback from `pytorch_lightning.callbacks` to track the times. Let's start by training a simple model without any optimization (train for 500 epochs)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -590,25 +521,31 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 156.69it/s, v_num=2, bound_cond_loss=1.53e-6, phys_cond_loss=0.000169, train_loss=0.000171]"
|
||||
]
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "822b8c60e73f49a486d3d702d413d6ff",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Training: | | 0/? [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"`Trainer.fit` stopped: `max_epochs=2000` reached.\n"
|
||||
"`Trainer.fit` stopped: `max_epochs=500` reached.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 108.75it/s, v_num=2, bound_cond_loss=1.53e-6, phys_cond_loss=0.000169, train_loss=0.000171]\n",
|
||||
"Total training time 15.36648 s\n"
|
||||
"Total training time 15.49781 s\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -622,16 +559,16 @@
|
||||
"model = FeedForward(\n",
|
||||
" layers=[10, 10],\n",
|
||||
" func=torch.nn.Tanh,\n",
|
||||
" output_dimensions=len(problem.output_variables),\n",
|
||||
" input_dimensions=len(problem.input_variables),\n",
|
||||
" output_dimensions=1,\n",
|
||||
" input_dimensions=1,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"pinn = PINN(problem, model)\n",
|
||||
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
|
||||
"trainer = Trainer(\n",
|
||||
" solver=pinn,\n",
|
||||
" solver=solver,\n",
|
||||
" accelerator=\"cpu\",\n",
|
||||
" deterministic=True, # setting deterministic=True ensure reproducibility when a seed is imposed\n",
|
||||
" max_epochs=2000,\n",
|
||||
" max_epochs=500,\n",
|
||||
" enable_model_summary=False,\n",
|
||||
" callbacks=[Timer()],\n",
|
||||
") # adding a callbacks\n",
|
||||
@@ -643,12 +580,12 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we do the same but with StochasticWeightAveraging"
|
||||
"Now we do the same but with `StochasticWeightAveraging` enabled"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 24,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -662,39 +599,32 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 1598: 100%|██████████| 1/1 [00:00<00:00, 224.16it/s, v_num=3, bound_cond_loss=5.7e-6, phys_cond_loss=0.000257, train_loss=0.000263] "
|
||||
]
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "dc5f3b47abff4facae7a60d0871f3bfe",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Training: | | 0/? [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Swapping scheduler `ConstantLR` for `SWALR`\n"
|
||||
"Swapping scheduler `ConstantLR` for `SWALR`\n",
|
||||
"`Trainer.fit` stopped: `max_epochs=500` reached.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 261.43it/s, v_num=3, bound_cond_loss=2.58e-7, phys_cond_loss=9.4e-5, train_loss=9.43e-5] "
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"`Trainer.fit` stopped: `max_epochs=2000` reached.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 145.96it/s, v_num=3, bound_cond_loss=2.58e-7, phys_cond_loss=9.4e-5, train_loss=9.43e-5]\n",
|
||||
"Total training time 17.78182 s\n"
|
||||
"Total training time 15.52474 s\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -707,15 +637,15 @@
|
||||
"model = FeedForward(\n",
|
||||
" layers=[10, 10],\n",
|
||||
" func=torch.nn.Tanh,\n",
|
||||
" output_dimensions=len(problem.output_variables),\n",
|
||||
" input_dimensions=len(problem.input_variables),\n",
|
||||
" output_dimensions=1,\n",
|
||||
" input_dimensions=1,\n",
|
||||
")\n",
|
||||
"pinn = PINN(problem, model)\n",
|
||||
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
|
||||
"trainer = Trainer(\n",
|
||||
" solver=pinn,\n",
|
||||
" solver=solver,\n",
|
||||
" accelerator=\"cpu\",\n",
|
||||
" deterministic=True,\n",
|
||||
" max_epochs=2000,\n",
|
||||
" max_epochs=500,\n",
|
||||
" enable_model_summary=False,\n",
|
||||
" callbacks=[Timer(), StochasticWeightAveraging(swa_lrs=0.005)],\n",
|
||||
") # adding StochasticWeightAveraging callbacks\n",
|
||||
@@ -727,16 +657,16 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As you can see, the training time does not change at all! Notice that around epoch `1600`\n",
|
||||
"As you can see, the training time does not change at all! Notice that around epoch 350\n",
|
||||
"the scheduler is switched from the defalut one `ConstantLR` to the Stochastic Weight Average Learning Rate (`SWALR`).\n",
|
||||
"This is because by default `StochasticWeightAveraging` will be activated after `int(swa_epoch_start * max_epochs)` with `swa_epoch_start=0.7` by default. Finally, the final `mean_loss` is lower when `StochasticWeightAveraging` is used.\n",
|
||||
"This is because by default `StochasticWeightAveraging` will be activated after `int(swa_epoch_start * max_epochs)` with `swa_epoch_start=0.7` by default. Finally, the final `train_loss` is lower when `StochasticWeightAveraging` is used.\n",
|
||||
"\n",
|
||||
"We will now now do the same but clippling the gradient to be relatively small."
|
||||
"We will now do the same but clippling the gradient to be relatively small."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 25,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -750,39 +680,32 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 1598: 100%|██████████| 1/1 [00:00<00:00, 251.76it/s, v_num=4, bound_cond_loss=5.98e-8, phys_cond_loss=3.88e-5, train_loss=3.88e-5] "
|
||||
]
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "d475613ad7f34fe6abd182eed8907004",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Training: | | 0/? [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Swapping scheduler `ConstantLR` for `SWALR`\n"
|
||||
"Swapping scheduler `ConstantLR` for `SWALR`\n",
|
||||
"`Trainer.fit` stopped: `max_epochs=500` reached.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 239.11it/s, v_num=4, bound_cond_loss=0.000333, phys_cond_loss=0.000676, train_loss=0.00101] "
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"`Trainer.fit` stopped: `max_epochs=2000` reached.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Epoch 1999: 100%|██████████| 1/1 [00:00<00:00, 127.88it/s, v_num=4, bound_cond_loss=0.000333, phys_cond_loss=0.000676, train_loss=0.00101]\n",
|
||||
"Total training time 15.12576 s\n"
|
||||
"Total training time 15.94719 s\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -793,14 +716,14 @@
|
||||
"model = FeedForward(\n",
|
||||
" layers=[10, 10],\n",
|
||||
" func=torch.nn.Tanh,\n",
|
||||
" output_dimensions=len(problem.output_variables),\n",
|
||||
" input_dimensions=len(problem.input_variables),\n",
|
||||
" output_dimensions=1,\n",
|
||||
" input_dimensions=1,\n",
|
||||
")\n",
|
||||
"pinn = PINN(problem, model)\n",
|
||||
"solver = SupervisedSolver(problem, model, use_lt=False)\n",
|
||||
"trainer = Trainer(\n",
|
||||
" solver=pinn,\n",
|
||||
" solver=solver,\n",
|
||||
" accelerator=\"cpu\",\n",
|
||||
" max_epochs=2000,\n",
|
||||
" max_epochs=500,\n",
|
||||
" enable_model_summary=False,\n",
|
||||
" gradient_clip_val=0.1, # clipping the gradient\n",
|
||||
" callbacks=[Timer(), StochasticWeightAveraging(swa_lrs=0.005)],\n",
|
||||
@@ -813,17 +736,21 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see we by applying gradient clipping we were able to even obtain lower error!\n",
|
||||
"As we can see, by applying gradient clipping, we were able to achieve even lower error!\n",
|
||||
"\n",
|
||||
"## What's next?\n",
|
||||
"## What's Next?\n",
|
||||
"\n",
|
||||
"Now you know how to use efficiently the `Trainer` class **PINA**! There are multiple directions you can go now:\n",
|
||||
"Now you know how to use the `Trainer` class efficiently in **PINA**! There are several directions you can explore next:\n",
|
||||
"\n",
|
||||
"1. Explore training times on different devices (e.g.) `TPU` \n",
|
||||
"1. **Explore Training on Different Devices**: Test training times on various devices (e.g., `TPU`) to compare performance.\n",
|
||||
"\n",
|
||||
"2. Try to reduce memory cost by mixed precision training and gradient accumulation (especially useful when training Neural Operators)\n",
|
||||
"2. **Reduce Memory Costs**: Experiment with mixed precision training and gradient accumulation to optimize memory usage, especially when training Neural Operators.\n",
|
||||
"\n",
|
||||
"3. Benchmark `Trainer` speed for different precisions."
|
||||
"3. **Benchmark `Trainer` Speed**: Benchmark the training speed of the `Trainer` class for different precisions to identify potential optimizations.\n",
|
||||
"\n",
|
||||
"4. **...and many more!**: Consider expanding to **multi-GPU** setups or other advanced configurations for large-scale training.\n",
|
||||
"\n",
|
||||
"For more resources and tutorials, check out the [PINA Documentation](https://mathlab.github.io/PINA/).\n"
|
||||
]
|
||||
}
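As a pointer for item 2 above, the sketch below shows how mixed precision training and gradient accumulation could be enabled. This assumes, as the other examples in this tutorial do, that PINA's `Trainer` forwards these keyword arguments to the underlying Lightning trainer (`precision` and `accumulate_grad_batches` are standard Lightning options); treat it as a starting point rather than a definitive recipe:

```python
# hedged sketch: mixed precision + gradient accumulation
# (assumes PINA's Trainer forwards these kwargs to lightning.Trainer)
trainer = Trainer(
    solver=solver,
    accelerator="gpu",          # mixed precision pays off mostly on GPU
    precision="16-mixed",       # Lightning mixed precision training
    accumulate_grad_batches=4,  # accumulate gradients over 4 batches before stepping
    max_epochs=500,
    enable_model_summary=False,
)
trainer.train()
```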
|
||||
],
|
||||
|
||||
388 tutorials/tutorial11/tutorial.py vendored
@@ -1,388 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
# coding: utf-8
|
||||
|
||||
# # Tutorial: PINA and PyTorch Lightning, training tips and visualizations
|
||||
#
|
||||
# [](https://colab.research.google.com/github/mathLab/PINA/blob/master/tutorials/tutorial11/tutorial.ipynb)
|
||||
#
|
||||
# In this tutorial, we will delve deeper into the functionality of the `Trainer` class, which serves as the cornerstone for training **PINA** [Solvers](https://mathlab.github.io/PINA/_rst/_code.html#solvers).
|
||||
#
|
||||
# The `Trainer` class offers a plethora of features aimed at improving model accuracy, reducing training time and memory usage, facilitating logging visualization, and more thanks to the amazing job done by the PyTorch Lightning team!
|
||||
#
|
||||
# Our leading example will revolve around solving the `SimpleODE` problem, as outlined in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb). If you haven't already explored it, we highly recommend doing so before diving into this tutorial.
|
||||
#
|
||||
# Let's start by importing useful modules, define the `SimpleODE` problem and the `PINN` solver.
|
||||
|
||||
# In[ ]:
|
||||
|
||||
|
||||
try:
|
||||
import google.colab
|
||||
|
||||
IN_COLAB = True
|
||||
except:
|
||||
IN_COLAB = False
|
||||
if IN_COLAB:
|
||||
get_ipython().system('pip install "pina-mathlab"')
|
||||
|
||||
import torch
|
||||
import warnings
|
||||
|
||||
from pina import Condition, Trainer
|
||||
from pina.solver import PINN
|
||||
from pina.model import FeedForward
|
||||
from pina.problem import SpatialProblem
|
||||
from pina.operator import grad
|
||||
from pina.domain import CartesianDomain
|
||||
from pina.equation import Equation, FixedValue
|
||||
|
||||
warnings.filterwarnings("ignore")
|
||||
|
||||
|
||||
# Define problem and solver.
|
||||
|
||||
# In[2]:
|
||||
|
||||
|
||||
# defining the ode equation
|
||||
def ode_equation(input_, output_):
|
||||
|
||||
# computing the derivative
|
||||
u_x = grad(output_, input_, components=["u"], d=["x"])
|
||||
|
||||
# extracting the u input variable
|
||||
u = output_.extract(["u"])
|
||||
|
||||
# calculate the residual and return it
|
||||
return u_x - u
|
||||
|
||||
|
||||
class SimpleODE(SpatialProblem):
|
||||
|
||||
output_variables = ["u"]
|
||||
spatial_domain = CartesianDomain({"x": [0, 1]})
|
||||
|
||||
domains = {
|
||||
"x0": CartesianDomain({"x": 0.0}),
|
||||
"D": CartesianDomain({"x": [0, 1]}),
|
||||
}
|
||||
|
||||
# conditions to hold
|
||||
conditions = {
|
||||
"bound_cond": Condition(domain="x0", equation=FixedValue(1.0)),
|
||||
"phys_cond": Condition(domain="D", equation=Equation(ode_equation)),
|
||||
}
|
||||
|
||||
# defining the true solution
|
||||
def solution(self, pts):
|
||||
return torch.exp(pts.extract(["x"]))
|
||||
|
||||
|
||||
# sampling for training
|
||||
problem = SimpleODE()
|
||||
problem.discretise_domain(1, "random", domains=["x0"])
|
||||
problem.discretise_domain(20, "lh", domains=["D"])
|
||||
|
||||
# build the model
|
||||
model = FeedForward(
|
||||
layers=[10, 10],
|
||||
func=torch.nn.Tanh,
|
||||
output_dimensions=len(problem.output_variables),
|
||||
input_dimensions=len(problem.input_variables),
|
||||
)
|
||||
|
||||
# create the PINN object
|
||||
pinn = PINN(problem, model)
|
||||
|
||||
|
||||
# Till now we just followed the extact step of the previous tutorials. The `Trainer` object
|
||||
# can be initialized by simiply passing the `PINN` solver
|
||||
|
||||
# In[3]:
|
||||
|
||||
|
||||
trainer = Trainer(solver=pinn)
|
||||
|
||||
|
||||
# ## Trainer Accelerator
|
||||
#
|
||||
# When creating the trainer, **by defualt** the `Trainer` will choose the most performing `accelerator` for training which is available in your system, ranked as follow:
|
||||
# 1. [TPU](https://cloud.google.com/tpu/docs/intro-to-tpu)
|
||||
# 2. [IPU](https://www.graphcore.ai/products/ipu)
|
||||
# 3. [HPU](https://habana.ai/)
|
||||
# 4. [GPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#:~:text=What%20does%20GPU%20stand%20for,video%20editing%2C%20and%20gaming%20applications) or [MPS](https://developer.apple.com/metal/pytorch/)
|
||||
# 5. CPU
|
||||
|
||||
# For setting manually the `accelerator` run:
|
||||
#
|
||||
# * `accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'cpu', 'ipu'}` sets the accelerator to a specific one
|
||||
|
||||
# In[4]:
|
||||
|
||||
|
||||
trainer = Trainer(solver=pinn, accelerator="cpu")
|
||||
|
||||
|
||||
# as you can see, even if in the used system `GPU` is available, it is not used since we set `accelerator='cpu'`.
|
||||
|
||||
# ## Trainer Logging
|
||||
#
|
||||
# In **PINA** you can log metrics in different ways. The simplest approach is to use the `MetricTraker` class from `pina.callbacks` as seen in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.
|
||||
#
|
||||
# However, expecially when we need to train multiple times to get an average of the loss across multiple runs, `pytorch_lightning.loggers` might be useful. Here we will use `TensorBoardLogger` (more on [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) here), but you can choose the one you prefer (or make your own one).
|
||||
#
|
||||
# We will now import `TensorBoardLogger`, do three runs of training and then visualize the results. Notice we set `enable_model_summary=False` to avoid model summary specifications (e.g. number of parameters), set it to true if needed.
|
||||
#
|
||||
|
||||
# In[5]:
|
||||
|
||||
|
||||
from lightning.pytorch.loggers import TensorBoardLogger
|
||||
|
||||
# three run of training, by default it trains for 1000 epochs
|
||||
# we reinitialize the model each time otherwise the same parameters will be optimized
|
||||
for _ in range(3):
|
||||
model = FeedForward(
|
||||
layers=[10, 10],
|
||||
func=torch.nn.Tanh,
|
||||
output_dimensions=len(problem.output_variables),
|
||||
input_dimensions=len(problem.input_variables),
|
||||
)
|
||||
pinn = PINN(problem, model)
|
||||
trainer = Trainer(
|
||||
solver=pinn,
|
||||
accelerator="cpu",
|
||||
logger=TensorBoardLogger(save_dir="training_log"),
|
||||
enable_model_summary=False,
|
||||
train_size=1.0,
|
||||
val_size=0.0,
|
||||
test_size=0.0,
|
||||
)
|
||||
trainer.train()
|
||||
|
||||
|
||||
# We can now visualize the logs by simply running `tensorboard --logdir=training_log/` on terminal, you should obtain a webpage as the one shown below:
|
||||
|
||||
# <p align=\"center\">
|
||||
# <img src="logging.png" alt=\"Logging API\" width=\"400\"/>
|
||||
# </p>
|
||||
|
||||
# as you can see, by default, **PINA** logs the losses which are shown in the progress bar, as well as the number of epochs. You can always insert more loggings by either defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or inheriting the solver and modify the programs with different **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks)).
|
||||
|
||||
# ## Trainer Callbacks
|
||||
|
||||
# Whenever we need to access certain steps of the training for logging, do static modifications (i.e. not changing the `Solver`) or updating `Problem` hyperparameters (static variables), we can use `Callabacks`. Notice that `Callbacks` allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in **PINA** `Solver`s.
|
||||
# Lightning has a callback system to execute them when needed. Callbacks should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.
|
||||
#
|
||||
# The following are best practices when using/designing callbacks.
|
||||
#
|
||||
# * Callbacks should be isolated in their functionality.
|
||||
# * Your callback should not rely on the behavior of other callbacks in order to work properly.
|
||||
# * Do not manually call methods from the callback.
|
||||
# * Directly calling methods (eg. on_validation_end) is strongly discouraged.
|
||||
# * Whenever possible, your callbacks should not depend on the order in which they are executed.
|
||||
#
|
||||
# We will try now to implement a naive version of `MetricTraker` to show how callbacks work. Notice that this is a very easy application of callbacks, fortunately in **PINA** we already provide more advanced callbacks in `pina.callbacks`.
|
||||
#
|
||||
# <!-- Suppose we want to log the accuracy on some validation poit -->
|
||||
|
||||
# In[6]:
|
||||
|
||||
|
||||
from lightning.pytorch.callbacks import Callback
|
||||
from lightning.pytorch.callbacks import EarlyStopping
|
||||
import torch
|
||||
|
||||
|
||||
# define a simple callback
|
||||
class NaiveMetricTracker(Callback):
|
||||
def __init__(self):
|
||||
self.saved_metrics = []
|
||||
|
||||
def on_train_epoch_end(
|
||||
self, trainer, __
|
||||
): # function called at the end of each epoch
|
||||
self.saved_metrics.append(
|
||||
{key: value for key, value in trainer.logged_metrics.items()}
|
||||
)
|
||||
|
||||
|
||||
# Let's see the results when applyed to the `SimpleODE` problem. You can define callbacks when initializing the `Trainer` by the `callbacks` argument, which expects a list of callbacks.
|
||||
|
||||
# In[7]:
|
||||
|
||||
|
||||
model = FeedForward(
|
||||
layers=[10, 10],
|
||||
func=torch.nn.Tanh,
|
||||
output_dimensions=len(problem.output_variables),
|
||||
input_dimensions=len(problem.input_variables),
|
||||
)
|
||||
pinn = PINN(problem, model)
|
||||
trainer = Trainer(
|
||||
solver=pinn,
|
||||
accelerator="cpu",
|
||||
logger=True,
|
||||
callbacks=[NaiveMetricTracker()], # adding a callbacks
|
||||
enable_model_summary=False,
|
||||
train_size=1.0,
|
||||
val_size=0.0,
|
||||
test_size=0.0,
|
||||
)
|
||||
trainer.train()
|
||||
|
||||
|
||||
# We can easily access the data by calling `trainer.callbacks[0].saved_metrics` (notice the zero representing the first callback in the list given at initialization).
|
||||
|
||||
# In[8]:
|
||||
|
||||
|
||||
trainer.callbacks[0].saved_metrics[:3] # only the first three epochs
|
||||
|
||||
|
||||
# PyTorch Lightning also has some built in `Callbacks` which can be used in **PINA**, [here an extensive list](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html#built-in-callbacks).
|
||||
#
|
||||
# We can for example try the `EarlyStopping` routine, which automatically stops the training when a specific metric converged (here the `train_loss`). In order to let the training keep going forever set `max_epochs=-1`.
|
||||
|
||||
# In[ ]:
|
||||
|
||||
|
||||
model = FeedForward(
|
||||
layers=[10, 10],
|
||||
func=torch.nn.Tanh,
|
||||
output_dimensions=len(problem.output_variables),
|
||||
input_dimensions=len(problem.input_variables),
|
||||
)
|
||||
pinn = PINN(problem, model)
|
||||
trainer = Trainer(
|
||||
solver=pinn,
|
||||
accelerator="cpu",
|
||||
max_epochs=-1,
|
||||
enable_model_summary=False,
|
||||
enable_progress_bar=False,
|
||||
val_size=0.2,
|
||||
train_size=0.8,
|
||||
test_size=0.0,
|
||||
callbacks=[EarlyStopping("val_loss")],
|
||||
) # adding a callbacks
|
||||
trainer.train()
|
||||
|
||||
|
||||
# As we can see the model automatically stop when the logging metric stopped improving!
|
||||
|
||||
# ## Trainer Tips to Boost Accuracy, Save Memory and Speed Up Training
|
||||
#
|
||||
# Untill now we have seen how to choose the right `accelerator`, how to log and visualize the results, and how to interface with the program in order to add specific parts of code at specific points by `callbacks`.
|
||||
# Now, we well focus on how boost your training by saving memory and speeding it up, while mantaining the same or even better degree of accuracy!
|
||||
#
|
||||
#
|
||||
# There are several built in methods developed in PyTorch Lightning which can be applied straight forward in **PINA**, here we report some:
|
||||
#
|
||||
# * [Stochastic Weight Averaging](https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/) to boost accuracy
|
||||
# * [Gradient Clippling](https://deepgram.com/ai-glossary/gradient-clipping) to reduce computational time (and improve accuracy)
|
||||
# * [Gradient Accumulation](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption
|
||||
# * [Mixed Precision Training](https://lightning.ai/docs/pytorch/stable/common/optimization.html#id3) to save memory consumption
|
||||
#
|
||||
# We will just demonstrate how to use the first two, and see the results compared to a standard training.
|
||||
# We use the [`Timer`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.Timer.html#lightning.pytorch.callbacks.Timer) callback from `pytorch_lightning.callbacks` to take the times. Let's start by training a simple model without any optimization (train for 2000 epochs).
|
||||
|
||||
# In[10]:
|
||||
|
||||
|
||||
from lightning.pytorch.callbacks import Timer
|
||||
from lightning.pytorch import seed_everything
|
||||
|
||||
# setting the seed for reproducibility
|
||||
seed_everything(42, workers=True)
|
||||
|
||||
model = FeedForward(
|
||||
layers=[10, 10],
|
||||
func=torch.nn.Tanh,
|
||||
output_dimensions=len(problem.output_variables),
|
||||
input_dimensions=len(problem.input_variables),
|
||||
)
|
||||
|
||||
pinn = PINN(problem, model)
|
||||
trainer = Trainer(
|
||||
solver=pinn,
|
||||
accelerator="cpu",
|
||||
deterministic=True, # setting deterministic=True ensure reproducibility when a seed is imposed
|
||||
max_epochs=2000,
|
||||
enable_model_summary=False,
|
||||
callbacks=[Timer()],
|
||||
) # adding a callbacks
|
||||
trainer.train()
|
||||
print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s')
|
||||
|
||||
|
||||
# Now we do the same but with StochasticWeightAveraging
|
||||
|
||||
# In[11]:
|
||||
|
||||
|
||||
from lightning.pytorch.callbacks import StochasticWeightAveraging
|
||||
|
||||
# setting the seed for reproducibility
|
||||
seed_everything(42, workers=True)
|
||||
|
||||
model = FeedForward(
|
||||
layers=[10, 10],
|
||||
func=torch.nn.Tanh,
|
||||
output_dimensions=len(problem.output_variables),
|
||||
input_dimensions=len(problem.input_variables),
|
||||
)
|
||||
pinn = PINN(problem, model)
|
||||
trainer = Trainer(
|
||||
solver=pinn,
|
||||
accelerator="cpu",
|
||||
deterministic=True,
|
||||
max_epochs=2000,
|
||||
enable_model_summary=False,
|
||||
callbacks=[Timer(), StochasticWeightAveraging(swa_lrs=0.005)],
|
||||
) # adding StochasticWeightAveraging callbacks
|
||||
trainer.train()
|
||||
print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s')
|
||||
|
||||
|
||||
# As you can see, the training time does not change at all! Notice that around epoch `1600`
|
||||
# the scheduler is switched from the defalut one `ConstantLR` to the Stochastic Weight Average Learning Rate (`SWALR`).
|
||||
# This is because by default `StochasticWeightAveraging` will be activated after `int(swa_epoch_start * max_epochs)` with `swa_epoch_start=0.7` by default. Finally, the final `mean_loss` is lower when `StochasticWeightAveraging` is used.
|
||||
#
|
||||
# We will now now do the same but clippling the gradient to be relatively small.
|
||||
|
||||
# In[12]:
|
||||
|
||||
|
||||
# setting the seed for reproducibility
|
||||
seed_everything(42, workers=True)
|
||||
|
||||
model = FeedForward(
|
||||
layers=[10, 10],
|
||||
func=torch.nn.Tanh,
|
||||
output_dimensions=len(problem.output_variables),
|
||||
input_dimensions=len(problem.input_variables),
|
||||
)
|
||||
pinn = PINN(problem, model)
|
||||
trainer = Trainer(
|
||||
solver=pinn,
|
||||
accelerator="cpu",
|
||||
max_epochs=2000,
|
||||
enable_model_summary=False,
|
||||
gradient_clip_val=0.1, # clipping the gradient
|
||||
callbacks=[Timer(), StochasticWeightAveraging(swa_lrs=0.005)],
|
||||
)
|
||||
trainer.train()
|
||||
print(f'Total training time {trainer.callbacks[0].time_elapsed("train"):.5f} s')
|
||||
|
||||
|
||||
# As we can see we by applying gradient clipping we were able to even obtain lower error!
|
||||
#
|
||||
# ## What's next?
|
||||
#
|
||||
# Now you know how to use efficiently the `Trainer` class **PINA**! There are multiple directions you can go now:
|
||||
#
|
||||
# 1. Explore training times on different devices (e.g.) `TPU`
|
||||
#
|
||||
# 2. Try to reduce memory cost by mixed precision training and gradient accumulation (especially useful when training Neural Operators)
|
||||
#
|
||||
# 3. Benchmark `Trainer` speed for different precisions.
|
||||