Statistics for Uncovering the Reliability and Consistency of AI Language Models: A Systematic Study