Description:
Traditionally, the operation of the battery is optimised with offline optimisation techniques using 24 hours of forecasted load demand and renewable generation data, where the battery actions (charge/discharge/idle) are determined before the start of the day. Reinforcement Learning (RL), a machine learning technique, has recently been suggested as an alternative to these traditional methods because of its ability to learn an optimal policy online using real data. Two RL approaches have been suggested in the literature, namely offline and online. In offline RL, the agent learns the optimal policy using predicted generation and load data; once convergence is achieved, battery commands are dispatched in real time. This approach is similar to the traditional methods because it relies on forecasted data. In online RL, on the other hand, the agent learns the optimal policy by interacting with the system in real time using real data. However, the effectiveness of online RL relative to the offline approach as the prediction error increases has not previously been investigated. Another novel aspect of this thesis is the use of a full year of data to compare offline and online RL.
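To make the distinction concrete, the minimal Python sketch below shows the same tabular Q-learning loop driven by two different data sources: a forecast-based environment for the offline approach and the real system for the online approach. The environment classes (ForecastEnv, RealEnv), the state/action sizes, and the hyperparameter values are illustrative assumptions, not the implementation used in this thesis.

```python
import numpy as np

N_STATES, N_ACTIONS = 120, 3            # illustrative discretised states; actions: charge/idle/discharge
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # illustrative values, not the tuned values reported in this thesis

def q_learning(env, episodes, q=None):
    """Generic tabular Q-learning loop; env.reset() returns a state index and
    env.step(action) returns (next_state, reward, done)."""
    q = np.zeros((N_STATES, N_ACTIONS)) if q is None else q
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < EPSILON:
                action = np.random.randint(N_ACTIONS)
            else:
                action = int(np.argmax(q[state]))
            next_state, reward, done = env.step(action)
            # standard Q-learning temporal-difference update
            q[state, action] += ALPHA * (reward + GAMMA * np.max(q[next_state]) - q[state, action])
            state = next_state
    return q

# Offline RL (hypothetical usage): learn on an environment driven by 24h forecasted
# load/PV profiles, then dispatch the converged greedy policy on the real microgrid.
#   q_offline = q_learning(ForecastEnv(forecast_profile), episodes=5000)
# Online RL (hypothetical usage): the same update rule, but the agent interacts with
# the real system, so the Q-table keeps adapting as measured data arrives.
#   q_online = q_learning(RealEnv(measurement_stream), episodes=365)
```

Under this framing, the only difference between the two approaches is the data source driving the environment, which is what makes a comparison under increasing prediction error meaningful.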
Motivated by this gap in the existing literature, this thesis provides a comprehensive comparison between the two approaches. The results show that when the prediction error is between 0% and 1.5%, offline RL performs 5% better than online RL in terms of cost. In contrast, when the difference between real and predicted data is greater than 1.6%, online RL produces better results, with cost savings of at least 3%. While online RL produces better results, it takes a relatively long time to converge, during which its performance is suboptimal. Therefore, online RL handles forecast error better but can take longer to converge. This challenge is addressed by proposing and implementing a novel dual-layer Q-learning strategy, which not only decreases the overall operating cost of the microgrid compared to online RL but also reduces the convergence time. Finally, this thesis performs a sensitivity analysis investigating different discrete system spaces, such as the action space and state space, as well as the selection of hyperparameters, for the control of a battery in a grid-connected microgrid. The tuning yields best hyperparameter values of 0.85, 0.9, and 0.8. This thesis found that higher discretization levels (e.g. 8 versus 5) take longer to converge but save more money in the long run, about 1% per day on average.
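As an illustration of how such a sensitivity analysis could be organised, the sketch below sweeps candidate discretization levels and Q-learning hyperparameters and selects the configuration with the lowest cost. The candidate grids and the evaluate() stub are assumptions made purely for illustration; the thesis reports tuned values of 0.85, 0.9, and 0.8, but the mapping of those values onto specific hyperparameters is not restated here.

```python
import itertools
import random

# Hypothetical grid search over discretization levels and Q-learning hyperparameters
# (learning rate, discount factor, exploration rate). All grids are placeholders.
soc_levels        = [5, 8]             # discretization levels compared in the thesis
learning_rates    = [0.7, 0.85, 0.95]  # illustrative candidate grids
discount_factors  = [0.8, 0.9, 0.99]
exploration_rates = [0.6, 0.8, 1.0]

def evaluate(levels, alpha, gamma, epsilon):
    """Stub: a real evaluation would train the Q-learning agent with this configuration
    and return the average daily operating cost of the microgrid."""
    return random.random()  # placeholder cost for illustration only

results = {
    cfg: evaluate(*cfg)
    for cfg in itertools.product(soc_levels, learning_rates,
                                 discount_factors, exploration_rates)
}
best_cfg = min(results, key=results.get)  # configuration with the lowest (simulated) cost
print("best configuration:", best_cfg)
```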