Solving MDPs using Value Iteration
To download the notebook and other relevant files, please visit the GitLab repository:
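As a quick refresher on the method the notebook covers, here is a minimal value-iteration sketch on a toy 3-state, 2-action MDP; the transition matrix and rewards below are made up for illustration and are not taken from the notebook.

```python
import numpy as np

# Toy 3-state, 2-action MDP; transitions and rewards are made up for
# illustration and are NOT taken from the notebook.
gamma, theta = 0.9, 1e-8

# P[a][s, s'] = transition probability; R[s, a] = expected immediate reward
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # action 1
])
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])

V = np.zeros(3)
while True:
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < theta:
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy with respect to the converged values
```

Iterating the Bellman optimality backup until the value change falls below a tolerance, then acting greedily on the converged values, is the core loop of value iteration.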
> Application examples
Paper Title: Graph-based reinforcement learning for discrete cross-section optimization of planar steel frames
Year: 2022
Author(s): Kazuki Hayashi, Makoto Ohsaki
Link: https://www.sciencedirect.com/science/article/pii/S1474034621002603
ML Tags
Reinforcement Learning
Value iteration
Q-learning
Topic Tags
Structural Optimization
Graph Embedding
Summary
This paper proposes a method that combines graph embedding and reinforcement learning to optimize the cross-sections of planar (2D) steel frames. The cross-sections are chosen from a given set of standard dimensions, and constraints related to the structural performance of the frame are integrated into the problem. The cross-section size of each member changes monotonically along its length, which keeps training time practical.
The graph-embedding process is adapted to extract member features, which are then transformed into action values. Q-learning is used to formulate the loss function to be minimized. Despite the large computational cost and the complexity of the training process, the agent showed strong performance and produced reasonable solutions for planar profiles. However, the method cannot be applied to 3D shapes or frames with irregular cross-sections.
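The tabular update underlying the Q-learning loss mentioned above can be sketched as follows; the environment here is a random placeholder, not the steel-frame optimization setup from the paper.

```python
import numpy as np

# Tabular Q-learning sketch; the dynamics below are a dummy placeholder,
# not the frame-optimization environment from the paper.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))

def step(state, action):
    # dummy transition: random next state, reward 1 only in the last state
    next_state = int(rng.integers(n_states))
    return next_state, float(next_state == n_states - 1)

s = 0
for _ in range(1000):
    # epsilon-greedy action selection
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
    s_next, r = step(s, a)
    # TD error against the target r + gamma * max_a' Q(s', a'); its square is
    # the loss that gets minimized when Q is approximated by a neural network
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    s = s_next
```

In the paper's deep variant, the table `Q` is replaced by a network over the graph-embedded member features, and the squared TD error above becomes the training loss.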
Paper Title: Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm
Year: 2019
Author(s): William Valladares, Marco Galindo, Jorge Gutiérrez, Wu-Chieh Wu, Kuo-Kai Liao, Jen-Chung Liao, Kuang-Chin Lu, Chi-Chuan Wang
ML Tags
Reinforcement Learning
Q-learning
Double Q-learning
Topic Tags
Building performance optimization
Thermal comfort
Indoor air quality
HVAC control
Software & Plug-ins Used
- EnergyPlus for energy simulation (in combination with SketchUp Make 2017 Edition and OpenStudio v1.12.4)
- BCVTB as the open-source co-simulation framework
- Python as the programming interface for the DRL agent
Summary
This paper proposes an RL-based method that preserves sufficient levels of thermal comfort and air quality in indoor environments while minimizing the energy consumed by mechanical systems such as air conditioning and ventilation fans. The framework is based on Q-learning and, in particular, uses double Q-learning, which mitigates the overestimation bias of standard Q-learning and copes better with the complex interactions involved.
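The double Q-learning target construction can be illustrated with a tabular sketch; the environment below is a stand-in, and the paper's agent is a deep (neural-network) variant of this update.

```python
import numpy as np

# Tabular double Q-learning sketch; the environment is a stand-in for the
# building-simulation setup, not the paper's actual co-simulation.
rng = np.random.default_rng(1)
n_states, n_actions = 4, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q1 = np.zeros((n_states, n_actions))
Q2 = np.zeros((n_states, n_actions))

def step(state, action):
    next_state = int(rng.integers(n_states))
    return next_state, float(next_state == 0)  # dummy reward signal

s = 0
for _ in range(2000):
    # explore epsilon-greedily on the sum of both value tables
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int((Q1[s] + Q2[s]).argmax())
    s_next, r = step(s, a)
    if rng.random() < 0.5:
        # Q1 selects the best next action, Q2 evaluates it: decoupling
        # selection from evaluation reduces the overestimation bias
        a_star = int(Q1[s_next].argmax())
        Q1[s, a] += alpha * (r + gamma * Q2[s_next, a_star] - Q1[s, a])
    else:
        a_star = int(Q2[s_next].argmax())
        Q2[s, a] += alpha * (r + gamma * Q1[s_next, a_star] - Q2[s, a])
    s = s_next
```

Each update randomly picks one table to update, using the other table to evaluate the selected action; this is what distinguishes double Q-learning from the plain update.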
Three main stages are defined: pre-training, training, and action control. During the pre-training phase, initial experience data are generated and stored so that they can be leveraged during training. Once the experience data have been aggregated, the learning process and the control loop are initialized. The algorithm is applied to a classroom and a laboratory case study. In both environments, the agent manages to meet the given requirements, maintaining satisfactory PMV (Predicted Mean Vote) and air-quality values.
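The three stages described above can be sketched schematically; every interface and number below is an illustrative placeholder, not the paper's EnergyPlus/BCVTB implementation.

```python
import random
from collections import deque

random.seed(0)

# Experience buffer filled during pre-training and reused during training
buffer = deque(maxlen=10_000)

def env_step(state, action):
    # placeholder for one step of the building co-simulation: returns
    # (next state, reward), with a comfort-style penalty as the reward
    return (state + action) % 10, -abs(state - 5)

# 1) Pre-training: generate and store initial experience with random actions
state = 0
for _ in range(500):
    action = random.choice([0, 1, 2])
    next_state, reward = env_step(state, action)
    buffer.append((state, action, reward, next_state))
    state = next_state

# 2) Training: sample minibatches of stored experience for the learner
batch = random.sample(list(buffer), 32)

# 3) Control: apply the learned policy online (trivial stand-in here)
policy = lambda s: 1  # placeholder for the trained DRL policy
state, reward = env_step(state, policy(state))
```

The key design choice is that the expensive simulation is queried up front to seed a replay buffer, so the learner can train on stored transitions before taking over control.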