Solving MDPs using Value Iteration
To download the notebook and other relevant files, please visit the GitLab repository:
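As a quick refresher on the method the notebook covers, here is a minimal value-iteration sketch on a toy 3-state, 2-action MDP; the transition matrix and rewards below are made up for illustration and are not taken from the notebook.

```python
import numpy as np

# Toy 3-state, 2-action MDP; transitions and rewards are made up for
# illustration and are NOT taken from the notebook.
gamma, theta = 0.9, 1e-8

# P[a][s, s'] = transition probability; R[s, a] = expected immediate reward
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # action 1
])
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])

V = np.zeros(3)
while True:
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < theta:
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy with respect to the converged values
```

Iterating the Bellman optimality backup until the value change falls below a tolerance, then acting greedily on the converged values, is the core loop of value iteration.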
> Application examples
Paper Title: Graph-based reinforcement learning for discrete cross-section optimization of planar steel frames
Year: 2022
Author(s): Kazuki Hayashi, Makoto Ohsaki
Link: https://www.sciencedirect.com/science/article/pii/S1474034621002603
ML Tags
Reinforcement Learning
Value iteration
Q-learning
Topic Tags
Structural Optimization
Graph Embedding
Summary
This paper proposes a method that combines graph embedding and reinforcement learning to optimize the cross-sections of planar (2D) steel frames. The cross-sections are chosen from a given set of standard dimensions, and constraints related to the structural performance of the frame are integrated into the problem. The cross-section size of each member changes monotonically along its length, which keeps training time practical.
The graph-embedding process is adapted to extract member features, which are then transformed into action values. Q-learning is used to formulate the loss function to be minimized. Despite the large computational cost and the complexity of the training process, the agent showed strong performance and produced reasonable solutions for planar profiles. However, the method cannot be applied to 3D shapes or frames with irregular cross-sections.
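The tabular update underlying the Q-learning loss mentioned above can be sketched as follows; the environment here is a random placeholder, not the steel-frame optimization setup from the paper.

```python
import numpy as np

# Tabular Q-learning sketch; the dynamics below are a dummy placeholder,
# not the frame-optimization environment from the paper.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))

def step(state, action):
    # dummy transition: random next state, reward 1 only in the last state
    next_state = int(rng.integers(n_states))
    return next_state, float(next_state == n_states - 1)

s = 0
for _ in range(1000):
    # epsilon-greedy action selection
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
    s_next, r = step(s, a)
    # TD error against the target r + gamma * max_a' Q(s', a'); its square is
    # the loss that gets minimized when Q is approximated by a neural network
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    s = s_next
```

In the paper's deep variant, the table `Q` is replaced by a network over the graph-embedded member features, and the squared TD error above becomes the training loss.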
Paper Title: Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm
Year: 2019
Author(s): William Valladares, Marco Galindo, Jorge Gutiérrez, Wu-Chieh Wu, Kuo-Kai Liao, Jen-Chung Liao, Kuang-Chin Lu, Chi-Chuan Wang
ML Tags
Reinforcement Learning
Q-learning
Double Q-learning
Topic Tags
Building performance optimization
Thermal comfort
Indoor air quality
HVAC control
Software & Plug-ins Used
- EnergyPlus for energy simulation (in combination with SketchUp Make 2017 Edition and OpenStudio v1.12.4)
- BCVTB as the open-source co-simulation framework
- Python as the programming interface for the DRL agent
Summary
This paper proposes an RL-based method that preserves sufficient levels of thermal comfort and air quality in indoor environments while minimizing the energy consumed by mechanical systems such as air conditioning and ventilation fans. The framework is based on Q-learning and, in particular, uses double Q-learning, which mitigates the overestimation bias of standard Q-learning and copes better with the complex interactions involved.
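The double Q-learning target construction can be illustrated with a tabular sketch; the environment below is a stand-in, and the paper's agent is a deep (neural-network) variant of this update.

```python
import numpy as np

# Tabular double Q-learning sketch; the environment is a stand-in for the
# building-simulation setup, not the paper's actual co-simulation.
rng = np.random.default_rng(1)
n_states, n_actions = 4, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q1 = np.zeros((n_states, n_actions))
Q2 = np.zeros((n_states, n_actions))

def step(state, action):
    next_state = int(rng.integers(n_states))
    return next_state, float(next_state == 0)  # dummy reward signal

s = 0
for _ in range(2000):
    # explore epsilon-greedily on the sum of both value tables
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int((Q1[s] + Q2[s]).argmax())
    s_next, r = step(s, a)
    if rng.random() < 0.5:
        # Q1 selects the best next action, Q2 evaluates it: decoupling
        # selection from evaluation reduces the overestimation bias
        a_star = int(Q1[s_next].argmax())
        Q1[s, a] += alpha * (r + gamma * Q2[s_next, a_star] - Q1[s, a])
    else:
        a_star = int(Q2[s_next].argmax())
        Q2[s, a] += alpha * (r + gamma * Q1[s_next, a_star] - Q2[s, a])
    s = s_next
```

Each update randomly picks one table to update, using the other table to evaluate the selected action; this is what distinguishes double Q-learning from the plain update.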
Three main stages are defined: pre-training, training, and action control. During the pre-training phase, initial experience data are generated and stored so that they can be leveraged during training. Once the experience data have been aggregated, the learning process and the control loop are initialized. The algorithm is applied to a classroom and a laboratory case study. In both environments, the agent manages to meet the given requirements, maintaining satisfactory PMV (Predicted Mean Vote) and air-quality values.
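The three stages described above can be sketched schematically; every interface and number below is an illustrative placeholder, not the paper's EnergyPlus/BCVTB implementation.

```python
import random
from collections import deque

random.seed(0)

# Experience buffer filled during pre-training and reused during training
buffer = deque(maxlen=10_000)

def env_step(state, action):
    # placeholder for one step of the building co-simulation: returns
    # (next state, reward), with a comfort-style penalty as the reward
    return (state + action) % 10, -abs(state - 5)

# 1) Pre-training: generate and store initial experience with random actions
state = 0
for _ in range(500):
    action = random.choice([0, 1, 2])
    next_state, reward = env_step(state, action)
    buffer.append((state, action, reward, next_state))
    state = next_state

# 2) Training: sample minibatches of stored experience for the learner
batch = random.sample(list(buffer), 32)

# 3) Control: apply the learned policy online (trivial stand-in here)
policy = lambda s: 1  # placeholder for the trained DRL policy
state, reward = env_step(state, policy(state))
```

The key design choice is that the expensive simulation is queried up front to seed a replay buffer, so the learner can train on stored transitions before taking over control.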