stochastic_value_gradient Implementation of Learning Continuous Control Policies by Stochastic Value Gradients