Hopes: Selection
================

Roadmap
-------

- [x] Confidence Interval estimation using Bootstrap
- [x] Confidence Interval estimation using t-test

Introduction
------------

Running an Off-Policy Evaluation (OPE) experiment and then a selection of the best policies with Hopes is simple.

Example with a synthetic, random, dataset.

.. code-block:: python

    # create the behavior policy
    behavior_policy = ClassificationBasedPolicy(
        obs=obs, act=act, classification_model="logistic"
    )
    behavior_policy.fit()

    # create the target policies
    target_policy_1 = RandomPolicy(num_actions=num_actions).with_name("p1")
    target_policy_2 = RandomPolicy(num_actions=num_actions).with_name("p2")
    target_policy_3 = ClassificationBasedPolicy(
        obs=obs, act=act, classification_model="random_forest"
    ).with_name("p3")
    target_policy_3.fit()

    # initialize the estimators
    estimators = [
        InverseProbabilityWeighting(),
        SelfNormalizedInverseProbabilityWeighting(),
    ]

    # run the off-policy evaluation
    ope = OffPolicyEvaluation(
        obs=obs,
        rewards=rew,
        behavior_policy=behavior_policy,
        estimators=estimators,
        fail_fast=True,
        ci_method="t-test",
        ci_significance_level=0.1,
    )

    results = [
        ope.evaluate(target_policy)
        for target_policy in [target_policy_1, target_policy_2, target_policy_3]
    ]

    # select the top k policies based on lower bound (confidence interval +-90%)
    top_k_results = OffPolicySelection.select_top_k(results, metric="lower_bound", top_k=1)
    print(top_k_results[0])

This should produce an output similar to:

.. code-block:: python

    Policy: p2
    Confidence interval: +- 90.0%
    =====  ========  ==========  =============  =============
    ..         mean         std    lower_bound    upper_bound
    =====  ========  ==========  =============  =============
    IPW    0.510251  0.00788465       0.497324       0.522907
    SNIPW  0.499158  0.00523288       0.490235       0.507513
    =====  ========  ==========  =============  =============

Note that confidence interval (CI) calculation can be based on several methods:

- `bootstrap` (default)
- `t-test`

The documentation of the CI calculation can be found in `BaseEstimator.estimate_policy_value_with_confidence_interval`. See
implementation details for more information.

Implementation details
----------------------

.. autoclass:: hopes.ope.evaluation.OffPolicyEvaluation
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: hopes.ope.selection.OffPolicySelection
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: hopes.ope.estimators.BaseEstimator
    :no-index:
    :members: estimate_policy_value_with_confidence_interval