
Solving MDPs using Policy iteration

To download the notebook and other relevant files, please visit the GitLab repository:

https://gitlab.tudelft.nl/pbhustali/mdp_tutorials

> Application examples

Paper Title    Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy

Year              2019

Author(s)      Bingqing Chen, Zicheng Cai, Mario Bergés  

Link              https://dl.acm.org/doi/pdf/10.1145/3360322.3360849

ML Tags

Deep Reinforcement Learning

Policy iteration

Topic Tags

HVAC control

Model Predictive Control (MPC) Policy

Policy Gradient algorithm

Software & Plug-ins Used

  • EnergyPlus simulation engine to train and evaluate the agent (via an OpenAI Gym wrapper for EnergyPlus)

  • PyTorch for the RL implementation

  • PI DataLink to access real-time observations from the building automation system (BAS)

  • Dark Sky API for weather forecast information

Summary

The paper proposes a method (Gnu-RL) that makes RL strategies practical to deploy for HVAC control. The method adopts a Differentiable Model Predictive Control (MPC) policy and uses historical data from existing HVAC systems to pre-train the agent. While interacting with the environment, the agent uses a policy gradient algorithm to keep improving its policy end-to-end.

The method was applied to both a virtual and a physical example. In the virtual case, Gnu-RL improved on published RL results for the same environment; in the physical case, it improved on data from the existing controller. Finally, the authors suggest probabilistic occupancy as a direction for further development, since occupancy information is usually not available.
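The two-stage idea in the summary above (pre-train on historical data, then keep improving with a policy gradient) can be illustrated with a minimal sketch. This is not the Gnu-RL implementation: a linear Gaussian policy stands in for the Differentiable MPC policy, the historical log is synthetic, and the names `pretrain`, `reinforce_step` and `toy_reward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical log: a legacy controller acting roughly as a = -0.5 * s.
hist_states = rng.normal(size=(200, 1))
hist_actions = -0.5 * hist_states + 0.01 * rng.normal(size=(200, 1))


def pretrain(states, actions):
    """Imitation step: least-squares fit of a linear policy mean a = s @ w."""
    w, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return w


def reinforce_step(w, state, reward_fn, sigma=0.1, lr=0.001):
    """One REINFORCE update for a Gaussian policy with mean state @ w."""
    mean = state @ w
    action = mean + sigma * rng.normal(size=mean.shape)
    reward = reward_fn(state, action)
    # Gradient of log N(action | mean, sigma^2) with respect to w.
    grad_log_prob = state.T @ (action - mean) / sigma**2
    return w + lr * reward * grad_log_prob


def toy_reward(state, action):
    """Hypothetical reward: penalize the squared magnitude of state + action."""
    return -float(np.sum((state + action) ** 2))


w = pretrain(hist_states, hist_actions)  # start from the imitated controller
for _ in range(100):
    s = rng.normal(size=(1, 1))
    w = reinforce_step(w, s, toy_reward)  # online policy gradient refinement
```

Starting the online phase from the imitated weights means the agent behaves like the existing controller from its first step, which is the practical motivation the summary attributes to pre-training.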

Paper Title    Generative Design by Reinforcement Learning: Enhancing the Diversity of Topology Optimization Designs

Year              2022

Author(s)      Seowoo Jang, Soyoung Yoo, Namwoo Kang  

Link              https://www.sciencedirect.com/science/article/pii/S0010448522000239

ML Tags

Reinforcement Learning

Policy Iteration

Proximal Policy Optimization

Variational Autoencoders 

Topic Tags

Generative design

Generative Deep Learning

Topology Optimization

Data Augmentation

Software & Plug-ins Used

  • TopOpNet (topology optimization)

Summary

Within the framework of generative design, the paper proposes an RL-based method to enhance the design diversity of topology optimization outcomes. Specifically, the problem is formulated as sequentially determining the optimal design parameter combinations with reference to a given initial design.

The RL algorithm used is Proximal Policy Optimization (PPO); a Variational Autoencoder regularizer is added to it to enhance its feature-extraction capability and accelerate training. Design variation enters the reward function through two metrics, pixel difference and structure dissimilarity. Comparing the two, the paper concludes that pixel difference is the more suitable reward metric for the given problem.
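To make the two reward metrics concrete, here is a minimal sketch of how they might be computed for two designs stored as density images in [0, 1]. The exact definitions used in the paper are not reproduced here: the mean absolute difference and the single-window SSIM-based dissimilarity below are assumptions, and the function names are hypothetical.

```python
import numpy as np


def pixel_difference(design, reference):
    """Mean absolute per-pixel difference between two density images."""
    design = np.asarray(design, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(design - reference)))


def structure_dissimilarity(design, reference, data_range=1.0):
    """DSSIM = (1 - SSIM) / 2, with SSIM computed over the whole image at once.

    Simplification (assumption): one global window instead of the usual
    sliding local windows.
    """
    x = np.asarray(design, dtype=float)
    y = np.asarray(reference, dtype=float)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    )
    return float((1.0 - ssim) / 2.0)


# Toy 4x4 designs: an empty domain and a fully solid one.
empty = np.zeros((4, 4))
solid = np.ones((4, 4))
diversity_reward = pixel_difference(solid, empty)
```

Either function could serve as the diversity term of a reward: a larger value means the generated design departs further from the reference.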
