OK, RL's end results, like protein design or Go, are impressive, but does an RL approach exist that solves the benchmark problem?