A reinforcement learning agent has been developed to determine optimal operating policies in a multi-part serial line. The agent interacts with a discrete event simulation model of a stochastic production facility. This study identifies issues important to the simulation developer who wishes to optimise a complex simulation or develop a robust operating policy. Critical parameters pertinent to 'tuning' an agent quickly and enabling it to rapidly learn the system were investigated.