Solving MDPs using Policy Iteration
To download the notebook and other relevant files please visit the Gitlab repository:
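As background for the notebook's topic, here is a minimal sketch of policy iteration on a toy four-state chain MDP. The MDP, its rewards, and all parameters below are illustrative assumptions, not taken from the notebook:

```python
import numpy as np

# Toy chain MDP: move right toward a terminal state that pays reward 1 on entry.
n_states, n_actions, gamma = 4, 2, 0.9
TERMINAL = n_states - 1

# P[s][a] = list of (probability, next_state, reward); action 0 = left, 1 = right.
P = {}
for s in range(n_states):
    if s == TERMINAL:
        P[s] = {a: [(1.0, s, 0.0)] for a in range(n_actions)}  # absorbing state
    else:
        P[s] = {
            0: [(1.0, max(s - 1, 0), 0.0)],
            1: [(1.0, s + 1, 1.0 if s + 1 == TERMINAL else 0.0)],
        }

def policy_evaluation(policy, tol=1e-8):
    """Iteratively evaluate V^pi until the Bellman backup converges."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = np.zeros(n_states, dtype=int)
    while True:
        V = policy_evaluation(policy)
        stable = True
        for s in range(n_states):
            best = max(range(n_actions),
                       key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V

policy, V = policy_iteration()
print(policy)  # states before the terminal all choose action 1 (move right)
```

Because every improvement step is greedy with respect to an exactly evaluated value function, the loop terminates at an optimal policy for this finite MDP.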
> Application examples
Paper Title: Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy
Year: 2019
Author(s): Bingqing Chen, Zicheng Cai, Mario Bergés
ML Tags: Deep Reinforcement Learning, Policy Iteration
Topic Tags: HVAC Control, Model Predictive Control (MPC) Policy, Policy Gradient Algorithm
Software & Plug-ins Used
- EnergyPlus simulation engine to train and evaluate the agent (OpenAI Gym wrapper for EnergyPlus)
- PyTorch for the RL implementation
- PI DataLink to access real-time observations from the BAS
- Dark Sky API for predictive weather information
Summary
The paper proposes Gnu-RL, a method that enables practical deployment of RL strategies for HVAC control. The method adopts a differentiable Model Predictive Control (MPC) policy and leverages historical data from existing HVAC systems to pre-train the agent. When interacting with the environment, the agent uses a policy gradient algorithm to keep improving its policy end-to-end.
The proposed method was evaluated in both a simulated and a real-world testbed. In both cases Gnu-RL improved on its baselines: published RL results for the same environment, and data from the existing controllers, respectively. Lastly, probabilistic occupancy modeling was suggested as a direction for further development, since occupancy information is usually unavailable.
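The policy gradient step mentioned above can be sketched as a REINFORCE-style update for a tabular softmax policy. Gnu-RL itself differentiates through an MPC policy in PyTorch; the NumPy toy below only stands in for that, and the trajectory, learning rate, and sizes are made-up illustrations:

```python
import numpy as np

n_states, n_actions, alpha, gamma = 3, 2, 0.1, 0.99
theta = np.zeros((n_states, n_actions))  # per-state policy logits

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def update(trajectory):
    """REINFORCE update from one episode of (state, action, reward) tuples."""
    G = 0.0
    for s, a, r in reversed(trajectory):
        G = r + gamma * G                 # discounted return-to-go
        probs = softmax(theta[s])
        grad = -probs                     # d log pi(a|s) / d logits ...
        grad[a] += 1.0                    # ... is one-hot(a) minus probs
        theta[s] += alpha * G * grad      # ascend the policy gradient

# Hypothetical episode: action 1 in state 0 earned a reward, so its
# probability under the updated policy should increase.
before = softmax(theta[0])[1]
update([(0, 1, 1.0), (1, 0, 0.0)])
after = softmax(theta[0])[1]
print(before, "->", after)
```

The same gradient direction is what an end-to-end differentiable MPC policy follows, with the logits replaced by the MPC policy's parameters.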
Paper Title: Generative Design by Reinforcement Learning: Enhancing the Diversity of Topology Optimization Designs
Year: 2022
Author(s): Seowoo Jang, Soyoung Yoo, Namwoo Kang
ML Tags: Reinforcement Learning, Policy Iteration, Proximal Policy Optimization, Variational Autoencoders
Topic Tags: Generative Design, Generative Deep Learning, Topology Optimization, Data Augmentation
Software & Plug-ins Used
- TopOpNet (topology optimization)
Summary
Within the framework of generative design, the paper proposes an RL-based method to enhance the design diversity of topology optimization outcomes. Specifically, the problem is formulated as sequentially selecting optimal design-parameter combinations with respect to a given initial design.
The RL algorithm used is Proximal Policy Optimization; to strengthen its feature-extraction capability and accelerate training, a Variational Autoencoder regularizer is added to it. Design variation is incorporated into the reward function through two metrics: pixel difference and structural dissimilarity. Comparing the two, the authors conclude that pixel difference is the more suitable reward metric for this problem.
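The pixel-difference reward can be illustrated with a minimal sketch: treat two binary topology designs as images and reward the fraction of pixels that differ from the reference design. The normalization and the tiny example designs below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def pixel_diff_reward(design, reference):
    """Fraction of pixels that differ between two binary design images."""
    design, reference = np.asarray(design), np.asarray(reference)
    return float(np.mean(design != reference))

# Hypothetical 3x3 binary designs (1 = material, 0 = void).
reference = np.array([[1, 1, 0],
                      [1, 0, 0],
                      [1, 0, 0]])
candidate = np.array([[1, 0, 0],
                      [1, 1, 0],
                      [1, 0, 1]])
reward = pixel_diff_reward(candidate, reference)
print(reward)  # 3 of 9 pixels differ -> 1/3
```

A larger reward then encourages the agent to propose designs farther from the reference, which is the diversity signal the paper builds its reward around.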