Shunt capacitor banks are widely used in distribution systems for reactive power compensation, reduction of power and energy losses, and improvement of the voltage profile. This study presents a method for reactive power optimization in distribution systems based on a Reinforcement Learning (RL) approach combined with heuristic strategies. The approach consists of determining the sizes and locations of the capacitor banks, together with the optimal tap position of an Under Load Tap Changer (ULTC) transformer, subject to voltage and current constraints over the entire load duration curve. The optimization problem is solved so that both the loss at load demand and the system energy loss are minimized. A new method based on double-agent Q-Learning is proposed for this problem, and the results are compared with those of similar studies.
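For readers unfamiliar with Q-Learning, the following minimal sketch shows the standard single-agent tabular update on which such methods build. It is not the double-agent scheme proposed in this study. The names `q_update`, `epsilon_greedy`, and `q_table`, the learning rate `alpha`, the discount factor `gamma`, and the toy sizes (3 states, 2 actions) are all illustrative assumptions. The reward of -2.0 stands in for a negative loss value, which is also an assumption.

```python
import random


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place and return the table."""
    # Target: immediate reward plus discounted value of the best next action.
    td_target = reward + gamma * max(q_table[next_state])
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table


def epsilon_greedy(q_table, state, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_table[state]))
    row = q_table[state]
    # Ties are broken in favour of the lowest action index.
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical toy problem: 3 states, 2 actions, all values start at zero.
q_table = [[0.0, 0.0] for _ in range(3)]

# One transition with an assumed reward equal to the negative of a loss value.
q_update(q_table, state=0, action=1, reward=-2.0, next_state=1)

# With epsilon = 0 the choice is purely greedy.
greedy_action = epsilon_greedy(q_table, 0, 0.0, random.Random(0))
```

In a capacitor placement setting, a state might encode a candidate configuration and an action a change of bank size or tap position, but that mapping is an assumption here and is not taken from the study.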