
Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

2019-11-16

Abstract

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. We demonstrate the usefulness of the proposed method, named active policy iteration (API), through simulations with a batting robot.
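The key observation in the abstract is that LSPI reduces value-function approximation to linear regression, so standard active-learning criteria can score candidate sampling policies before any costly rewards are collected. Below is a minimal sketch of that view, assuming NumPy, caller-supplied feature matrices, and a generic A-optimality variance criterion; the paper uses its own generalization-error estimator, and the function names here (lstdq, sampling_policy_score) are illustrative, not taken from the paper.

```python
import numpy as np

def lstdq(phi_sa, rewards, phi_next, gamma=0.95, ridge=1e-6):
    """LSTD-Q: fit Q(s, a) ~= phi(s, a) @ w by least squares.

    phi_sa   -- (n, d) features of visited (state, action) pairs
    rewards  -- (n,)   observed immediate rewards
    phi_next -- (n, d) features of (next state, greedy next action)
    """
    # Solve A w = b; the ridge term keeps A invertible when samples are scarce.
    A = phi_sa.T @ (phi_sa - gamma * phi_next)
    b = phi_sa.T @ rewards
    return np.linalg.solve(A + ridge * np.eye(A.shape[1]), b)

def sampling_policy_score(phi_candidate, ridge=1e-6):
    """A-optimality score for the batch of (s, a) features a candidate
    sampling policy would visit: a smaller trace of the inverse Gram
    matrix means lower variance of the least-squares weights, i.e. a
    more informative policy to sample with."""
    d = phi_candidate.shape[1]
    G = phi_candidate.T @ phi_candidate + ridge * np.eye(d)
    return np.trace(np.linalg.inv(G))
```

Under a tight reward budget, one would roll out each candidate sampling policy in simulation (the score needs only the visited features, not the rewards), evaluate sampling_policy_score on the resulting feature matrix, collect real rewards only under the best-scoring policy, and then run lstdq followed by a greedy policy-improvement step as in ordinary LSPI.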

