.. _SectionScoresObservations:

*****************************
Observations and Skill Scores
*****************************

Simulation quality is assessed through comparisons with regionally representative observations centered on the ARM ENA main facility and is quantified using skill scores. The skill scores monotonically increase with improved skill from 0 to 1, where 0 indicates no skill and 1 indicates perfect agreement in terms of the metric. Users can use these skill scores to ascertain general model behavior for their application of interest when sorting through the available simulations. More detailed comparisons with observations are left to users to perform based on their own research needs. 

The observations and skill scores are described below.

This section of the documentation focuses on how the skill scores and diagnostics are calculated. The skill scores and their plots are accessible via the `LASSO-ENA Bundle Browser <https://lasso-ena.arm.gov>`_. Specifics on where in the Browser to go are described in the :ref:`Bundle Browser section <SectionBundleBrowser>`.


Cloud Fraction
======================================================

The time evolution of the simulated stratocumulus cloud fields within the ENA region is first assessed using multiple definitions of cloud fraction (CF). One reference set is taken from the cloud mask products derived from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on board the Meteosat satellites. A second reference for CF is taken from the standard products available from ARM’s Total Sky Imager (TSI). In both cases, the CF is calculated as the average cloud fraction over all observations that occurred within an hour block. The horizontal domain for those averages differs between the two sensors. For SEVIRI, the domain is the 4° latitude/longitude box centered on the ENA main facility. For the TSI estimates, the standard ARM files report hemispheric sky cover within a 160° field of view (FOV) :cite:p:`{e.g.,}Wu2014`, which for lower cloud scenes with higher cloud fraction may represent an effective horizontal extent relative to zenith of O(10 km).
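
The hourly averaging described above can be sketched as follows. This is a minimal illustration, not the LASSO-ENA processing code: the function name ``hourly_mean_cf``, the sample values, and the choice to define an hour block by truncating each timestamp to the top of the hour are all assumptions made for the example.

.. code-block:: python

    from collections import defaultdict
    from datetime import datetime


    def hourly_mean_cf(samples):
        """Average cloud-fraction samples that fall within the same hour block.

        samples: iterable of (datetime, cloud_fraction) pairs, with CF in [0, 1].
        Returns a dict mapping the start of each hour to the mean CF in that hour.
        """
        buckets = defaultdict(list)
        for time, cf in samples:
            # Assumption: an hour block starts at the top of the hour.
            hour_start = time.replace(minute=0, second=0, microsecond=0)
            buckets[hour_start].append(cf)
        return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


    # Hypothetical samples, for illustration only.
    example = [
        (datetime(2018, 2, 7, 10, 5), 0.2),
        (datetime(2018, 2, 7, 10, 35), 0.4),
        (datetime(2018, 2, 7, 11, 10), 1.0),
    ]
    hourly = hourly_mean_cf(example)

Here the two samples in the 10:00 block are averaged together, while the 11:00 block contains a single sample and keeps its value.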

Model performance for the hourly CF time series over the associated model domains is quantified using a methodology that was successfully applied to time series data in the LASSO shallow convection scenario (see Section 4.2 of :cite:t:`Gustafson2020a`). Performance is quantified using two skill scores: one characterizes the agreement of the variation/shape of the time series and the other characterizes its mean. The Taylor Skill score (Equation 4 in :cite:t:`Taylor2001`), S\ :sub:`T`, is used for the variation/shape of the time series:

.. math::
    :label: EqTaylor

    S_T = \frac
      {4(1+R)}
      {\left| \left( \sigma _r + \frac{1}{\sigma _r} \right)^2 (1+R_0) \right| }

where :math:`\sigma _r` is the normalized standard deviation given by model root mean square (RMS) divided by the observed RMS, R is the correlation coefficient, and R\ :sub:`0` is the maximum correlation attainable, which we set to 1. Thus, if the correlation coefficient and normalized standard deviation are 1, the Taylor Skill is 1. When applying the Taylor Skill score to a specific variable, we add the variable name, as done for the other skill scores, e.g., S\ :sub:`T`\ (CF).
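
As a rough sketch of :eq:`EqTaylor`, the function below computes the Taylor Skill for two equal-length time series. It assumes that the normalized standard deviation is the ratio of the model and observed standard deviations about their respective means, and the function name ``taylor_skill`` is chosen here for illustration only.

.. code-block:: python

    import math


    def taylor_skill(model, obs, r0=1.0):
        """Taylor Skill score S_T for two equal-length time series."""
        n = len(obs)
        if n != len(model) or n < 2:
            raise ValueError("model and obs must have the same length (>= 2)")
        mean_m = sum(model) / n
        mean_o = sum(obs) / n
        # Assumption: population variances of the departures from each mean.
        var_m = sum((m - mean_m) ** 2 for m in model) / n
        var_o = sum((o - mean_o) ** 2 for o in obs) / n
        cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs)) / n
        if var_m == 0.0 or var_o == 0.0:
            raise ValueError("both series must vary in time")
        sigma_r = math.sqrt(var_m / var_o)     # normalized standard deviation
        corr = cov / math.sqrt(var_m * var_o)  # correlation coefficient R
        return 4.0 * (1.0 + corr) / ((sigma_r + 1.0 / sigma_r) ** 2 * (1.0 + r0))

With this definition, a model series identical to the observations scores 1, while a perfectly correlated series with twice the observed variability scores 0.64, which shows how the score penalizes an amplitude error even when the timing is correct.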


.. figure:: images/TaylorCFTSI.jpg
    :name: FigTaylorCFTSI
    :align: center
    :alt: Taylor diagram for model performance on CF estimates as compared to TSI
    :figclass: align-center

    Illustration of a Taylor diagram for CF estimates from the ARM TSI versus LASSO-ENA model performance. 


However, the Taylor Skill alone cannot characterize the time series performance because it does not include information regarding the mean. This information is included when using a skill score for the relative mean, S\ :sub:`RM`. 


The relative mean skill score is patterned after the Frequency Bias (FB) :cite:p:`Gilbert1884,Schaefer1990,Mesinger1992`, which is defined as

.. math:: FB = \frac{N_H + N_{FA}}{N_H+N_M}
    :label: EqFB

where

* :math:`N_H` = number of hits (events that were forecast and observed)
* :math:`N_{FA}` = number of false alarms (events that were forecast but not observed)
* :math:`N_M` = number of misses (events that were observed but not forecast)

The range of FB is [0, :math:`\infty`), with perfect agreement at 1. To recast FB onto the range [0, 1], we define the Frequency Bias Skill score, S\ :sub:`FB`\ , as

.. math::
    :label: EqFBS

    S_{FB} = \begin{cases}
      FB & \text{if } FB \leq 1 \\
      \dfrac{1}{FB} & \text{if } FB > 1
    \end{cases}

For our LASSO application, the most important property of this score is how close FB is to 1, not the details of how the skill degrades far from 1 (e.g., as FB approaches infinity).
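
The two expressions :eq:`EqFB` and :eq:`EqFBS` can be sketched in a few lines. The function names and the example counts are hypothetical and serve only to illustrate how FB is folded onto the range [0, 1].

.. code-block:: python

    def frequency_bias(n_hits, n_false_alarms, n_misses):
        """Frequency Bias: number of forecast events over number of observed events."""
        observed = n_hits + n_misses
        if observed == 0:
            raise ValueError("FB is undefined when no events were observed")
        return (n_hits + n_false_alarms) / observed


    def frequency_bias_skill(fb):
        """Fold FB onto [0, 1]; a value of 1 means perfect agreement."""
        if fb < 0.0:
            raise ValueError("FB cannot be negative")
        # Under-forecasting keeps FB as is; over-forecasting uses its reciprocal.
        return fb if fb <= 1.0 else 1.0 / fb


    # Hypothetical counts, for illustration only.
    fb_low = frequency_bias(n_hits=30, n_false_alarms=10, n_misses=20)   # 40 / 50
    fb_high = frequency_bias(n_hits=30, n_false_alarms=30, n_misses=10)  # 60 / 40

In this example ``fb_low`` is 0.8 and maps to a skill of 0.8, while ``fb_high`` is 1.5 and maps to a skill of about 0.67.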

Instead of the standard FB formulation above, we replace FB with the ratio of the model mean to the observed mean. The resulting skill score, denoted S\ :sub:`RM`\ (CF), has the range [0, 1] and is symmetric about a ratio of 1 in a multiplicative sense: it yields the same value whether the model underestimates or overestimates the observed mean by the same factor. For example, relative means of 0.5 and 2.0, which differ from the observations by a factor of 2 on the low and high sides, both receive a skill score of 0.5, implying comparable performance relative to 1.
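
Written out, one reading of this definition is the following, where the symbol :math:`r` and the overbar notation for the time mean are introduced here for illustration and both means are assumed to be positive:

.. math::

    r = \frac{\overline{CF}_{\mathrm{mod}}}{\overline{CF}_{\mathrm{obs}}},
    \qquad
    S_{RM}(CF) = \min\left(r, \frac{1}{r}\right)

This is the piecewise form of :eq:`EqFBS` with :math:`r` in place of FB. For :math:`r = 0.5` the minimum is :math:`r` itself, and for :math:`r = 2.0` the minimum is :math:`1/r = 0.5`, so both cases give the same score.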

Finally, we find it useful to combine the S\ :sub:`T`\ (CF) and S\ :sub:`RM`\ (CF) scores into a single Net CF Skill Score, S\ :sub:`Net`\ (CF). It is computed using the expression:


.. math:: S_{Net}(CF) = \left(S_{T}(CF) \cdot S_{RM}(CF) \right)^{1/2}
    :label: EqNetTb
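
The combination in :eq:`EqNetTb` is a geometric mean of the two scores. The sketch below illustrates it together with the relative mean skill, assuming positive model and observed means; the function names and input values are hypothetical.

.. code-block:: python

    import math


    def relative_mean_skill(model_mean, obs_mean):
        """Relative mean skill: ratio of means folded onto [0, 1]."""
        if model_mean <= 0.0 or obs_mean <= 0.0:
            raise ValueError("this sketch assumes positive means")
        ratio = model_mean / obs_mean
        return ratio if ratio <= 1.0 else 1.0 / ratio


    def net_skill(s_taylor, s_relative_mean):
        """Net skill: geometric mean of the Taylor and relative mean skills."""
        return math.sqrt(s_taylor * s_relative_mean)


    # Hypothetical values, for illustration only.
    s_rm = relative_mean_skill(model_mean=0.4, obs_mean=0.8)  # ratio of 0.5
    s_net = net_skill(0.64, s_rm)

With a Taylor Skill of 0.64 and a relative mean skill of 0.5, the net skill is about 0.57. Because the scores are multiplied, a value of zero in either one gives a net skill of zero.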
 

.. figure:: images/NetskillCFTSI.jpg
    :name: FigNetskillCFTSI
    :align: center
    :alt: Net Skill Score diagram for CF estimates from the TSI versus model performance
    :figclass: align-center

    Illustration of a Net Skill Score diagram for CF estimates from the ARM TSI versus LASSO-ENA model performance. 


Liquid Water Path
===============================================

The LES performance is also evaluated against hourly averages of liquid water path (LWP) estimates. The LWP estimates are primarily drawn from the Tropospheric Remotely Observed Profiling via Optimal Estimation approach :cite:p:`{TROPoe,}Turner2019,Turner2014`. This ARM value-added product (VAP) became available at the ENA site in 2016. It uses atmospheric emitted radiance interferometer (AERI) and microwave radiometer (MWR) observations simultaneously and assumes single-layer clouds. For select scenario cases in 2015, and whenever TROPoe products are missing, the LWP retrievals are taken from the ARM Microwave Radiometers :cite:p:`{MWR,}Cadeddu2021,Morris2019`. The MWR retrievals are from the “MWRRETv2” VAP :cite:p:`{e.g.,}Turner2007`. Overall, LWP estimates obtained from ARM products are typically reported to carry uncertainties of O(10 g m\ :sup:`-2`\ ), though retrievals may overestimate LWP in the presence of larger drizzle hydrometeors.

The model performance skill for hourly LWP estimates follows the same Taylor Skill and Net Skill score formulations described above for CF.