Statistics for What Do You Mean? Using Large Language Models for Semantic Evaluation of NL2SQL Queries