Statistics for Querying Large Collections of Semistructured Data