Solving MDPs using Policy Iteration
To download the notebook and other relevant files please visit the Gitlab repository:
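As background for the notebook's topic, here is a minimal sketch of policy iteration on a toy four-state chain MDP. The MDP, its rewards, and all parameters below are illustrative assumptions, not taken from the notebook:

```python
import numpy as np

# Toy chain MDP: move right toward a terminal state that pays reward 1 on entry.
n_states, n_actions, gamma = 4, 2, 0.9
TERMINAL = n_states - 1

# P[s][a] = list of (probability, next_state, reward); action 0 = left, 1 = right.
P = {}
for s in range(n_states):
    if s == TERMINAL:
        P[s] = {a: [(1.0, s, 0.0)] for a in range(n_actions)}  # absorbing state
    else:
        P[s] = {
            0: [(1.0, max(s - 1, 0), 0.0)],
            1: [(1.0, s + 1, 1.0 if s + 1 == TERMINAL else 0.0)],
        }

def policy_evaluation(policy, tol=1e-8):
    """Iteratively evaluate V^pi until the Bellman backup converges."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = np.zeros(n_states, dtype=int)
    while True:
        V = policy_evaluation(policy)
        stable = True
        for s in range(n_states):
            best = max(range(n_actions),
                       key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V

policy, V = policy_iteration()
print(policy)  # states before the terminal all choose action 1 (move right)
```

Because every improvement step is greedy with respect to an exactly evaluated value function, the loop terminates at an optimal policy for this finite MDP.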
> Application examples
Paper Title: Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy
Year: 2019
Author(s): Bingqing Chen, Zicheng Cai, Mario Bergés
ML Tags: Deep Reinforcement Learning, Policy Iteration
Topic Tags: HVAC Control, Model Predictive Control (MPC) Policy, Policy Gradient Algorithm
Software & Plug-ins Used
- EnergyPlus simulation engine to train and evaluate the agent (OpenAI Gym wrapper for EnergyPlus)
- PyTorch for the RL implementation
- PI DataLink to access real-time observations from the BAS
- Dark Sky API for predictive weather information
Summary
The paper proposes Gnu-RL, a method that enables practical deployment of RL strategies for HVAC control. The method adopts a differentiable Model Predictive Control (MPC) policy and leverages historical data from existing HVAC systems to pre-train the agent. When interacting with the environment, the agent uses a policy gradient algorithm to keep improving its policy end-to-end.
The proposed method was evaluated in both a simulated and a real-world testbed. In both cases Gnu-RL improved on its baselines: published RL results for the same environment, and data from the existing controllers, respectively. Lastly, probabilistic occupancy modeling was suggested as a direction for further development, since occupancy information is usually unavailable.
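The policy gradient step mentioned above can be sketched as a REINFORCE-style update for a tabular softmax policy. Gnu-RL itself differentiates through an MPC policy in PyTorch; the NumPy toy below only stands in for that, and the trajectory, learning rate, and sizes are made-up illustrations:

```python
import numpy as np

n_states, n_actions, alpha, gamma = 3, 2, 0.1, 0.99
theta = np.zeros((n_states, n_actions))  # per-state policy logits

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def update(trajectory):
    """REINFORCE update from one episode of (state, action, reward) tuples."""
    G = 0.0
    for s, a, r in reversed(trajectory):
        G = r + gamma * G                 # discounted return-to-go
        probs = softmax(theta[s])
        grad = -probs                     # d log pi(a|s) / d logits ...
        grad[a] += 1.0                    # ... is one-hot(a) minus probs
        theta[s] += alpha * G * grad      # ascend the policy gradient

# Hypothetical episode: action 1 in state 0 earned a reward, so its
# probability under the updated policy should increase.
before = softmax(theta[0])[1]
update([(0, 1, 1.0), (1, 0, 0.0)])
after = softmax(theta[0])[1]
print(before, "->", after)
```

The same gradient direction is what an end-to-end differentiable MPC policy follows, with the logits replaced by the MPC policy's parameters.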
Paper Title: Generative Design by Reinforcement Learning: Enhancing the Diversity of Topology Optimization Designs
Year: 2022
Author(s): Seowoo Jang, Soyoung Yoo, Namwoo Kang
ML Tags: Reinforcement Learning, Policy Iteration, Proximal Policy Optimization, Variational Autoencoders
Topic Tags: Generative Design, Generative Deep Learning, Topology Optimization, Data Augmentation
Software & Plug-ins Used
- TopOpNet (topology optimization)
Summary
Within the framework of generative design, the paper proposes an RL-based method to enhance the design diversity of topology optimization outcomes. Specifically, the problem is formulated as sequentially selecting optimal design-parameter combinations with respect to a given initial design.
The RL algorithm used is Proximal Policy Optimization; to strengthen its feature-extraction capability and accelerate training, a Variational Autoencoder regularizer is added to it. Design variation is incorporated into the reward function through two metrics: pixel difference and structural dissimilarity. Comparing the two, the authors conclude that pixel difference is the more suitable reward metric for this problem.
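The pixel-difference reward can be illustrated with a minimal sketch: treat two binary topology designs as images and reward the fraction of pixels that differ from the reference design. The normalization and the tiny example designs below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def pixel_diff_reward(design, reference):
    """Fraction of pixels that differ between two binary design images."""
    design, reference = np.asarray(design), np.asarray(reference)
    return float(np.mean(design != reference))

# Hypothetical 3x3 binary designs (1 = material, 0 = void).
reference = np.array([[1, 1, 0],
                      [1, 0, 0],
                      [1, 0, 0]])
candidate = np.array([[1, 0, 0],
                      [1, 1, 0],
                      [1, 0, 1]])
reward = pixel_diff_reward(candidate, reference)
print(reward)  # 3 of 9 pixels differ -> 1/3
```

A larger reward then encourages the agent to propose designs farther from the reference, which is the diversity signal the paper builds its reward around.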