rlportfolio.environment.portfolio_optimization_env module

class PortfolioOptimizationEnv

Bases: Env

A portfolio allocation environment for Gymnasium.

This environment simulates the interactions between an agent and the financial market based on data provided by a dataframe. The dataframe contains the time series of features defined by the user (such as closing, high and low prices) and must have a time column and a tic column holding datetimes and ticker symbols, respectively. An example dataframe is shown below:

    date        high            low             close           tic
0   2020-12-23  0.157414        0.127420        0.136394        ADA-USD
1   2020-12-23  34.381519       30.074295       31.097898       BNB-USD
2   2020-12-23  24024.490234    22802.646484    23241.345703    BTC-USD
3   2020-12-23  0.004735        0.003640        0.003768        DOGE-USD
4   2020-12-23  637.122803      560.364258      583.714600      ETH-USD
... ...         ...             ...             ...             ...

Based on this dataframe, the environment will create an observation space that can be a Dict or a Box. The Box observation space is a three-dimensional array of shape (f, n, t), where f is the number of features, n is the number of stocks in the portfolio and t is the user-defined time window. If the environment is created with the parameter return_last_action set to True, the observation space is a Dict with the following keys:

{
"state": three-dimensional Box (f, n, t) representing the time series,
"last_action": one-dimensional Box (n+1,) representing the portfolio weights
}

Note that the action space of this environment is a one-dimensional Box of size n + 1, because the portfolio weights must contain the weights of all the stocks in the portfolio plus the weight of the remaining cash.
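To make the (f, n, t) layout concrete, the following sketch arranges long-format rows into nested lists of that shape and builds a weight vector of size n + 1. It is an illustration only, not the environment's implementation: build_state, the tickers AAA and BBB and all values are hypothetical, the window is assumed to hold the most recent t time steps, and the weights are assumed to sum to 1 with cash taking one of the n + 1 entries.

```python
from collections import defaultdict


def build_state(records, features, tickers, time_window):
    """Arrange long-format rows into nested lists of shape (f, n, t).

    Hypothetical helper for illustration; it is not part of the library.
    """
    # Group rows by ticker, keeping them in chronological order
    # (ISO-formatted date strings sort chronologically).
    by_ticker = defaultdict(list)
    for row in sorted(records, key=lambda r: r["date"]):
        by_ticker[row["tic"]].append(row)
    # state[i][j][k] is feature i of ticker j at step k of the window.
    return [
        [[row[feat] for row in by_ticker[tic][-time_window:]] for tic in tickers]
        for feat in features
    ]


# Synthetic long-format data: one row per (date, ticker) pair.
dates = ["2021-01-01", "2021-01-02", "2021-01-03"]
records = [
    {"date": d, "tic": tic, "close": base + i, "high": base + i + 0.5}
    for i, d in enumerate(dates)
    for tic, base in (("AAA", 10.0), ("BBB", 20.0))
]

features = ["close", "high"]   # f = 2
tickers = ["AAA", "BBB"]       # n = 2
state = build_state(records, features, tickers, time_window=2)  # t = 2
shape = (len(state), len(state[0]), len(state[0][0]))

# An action has n + 1 entries: one per stock plus one for cash.
# Uniform weights summing to 1 are an assumption made for this sketch.
weights = [1.0 / (len(tickers) + 1)] * (len(tickers) + 1)
```

Here shape evaluates to (2, 2, 2) and weights has three entries, matching the (f, n, t) and (n+1,) shapes described above.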

action_space

Action space.

observation_space

Observation space.

episode_length

Number of timesteps of an episode.

portfolio_size

Number of stocks in the portfolio.

__init__(df: DataFrame, initial_amount: float, order_df: bool = True, return_last_action: bool = False, data_normalization: str | None = None, state_normalization: str | None = None, reward_scaling: float = 1, comission_fee_model: str = 'trf', comission_fee_pct: float = 0, features: list[str] = ['close', 'high', 'low'], valuation_feature: str = 'close', time_column: str = 'date', time_format: str | None = None, tic_column: str = 'tic', tics_in_portfolio: str | list[str] = 'all', time_window: int = 50, print_metrics: bool = True, plot_graphs: bool = True, cwd: str = './') -> PortfolioOptimizationEnv

Initializes the environment instance.

Parameters:
  • df – Dataframe with market information over a period of time.

  • initial_amount – Initial amount of cash available to be invested.

  • order_df – If True, the input dataframe is sorted by time.

  • return_last_action – If True, observations also return the last performed action. Note that, in that case, the observation space is a Dict.

  • data_normalization – Defines the normalization method applied to input dataframe. Possible values are “by_previous_time”, “by_COLUMN_NAME” (where COLUMN_NAME must be changed to a real column name) and a custom function. If None, no normalization is done.

  • state_normalization – Defines the normalization method applied to the state output during simulation. Possible values are “by_initial_value”, “by_last_value”, “by_initial_FEATURE_NAME”, “by_last_FEATURE_NAME” (where FEATURE_NAME must be changed to the name of the feature used as normalizer) and a custom function. If None, no normalization is done.

  • reward_scaling – A scaling factor to multiply the reward function. This factor can help training.

  • comission_fee_model – Model used to simulate the commission fee. Possible values are “trf” (for the transaction remainder factor model), “trf_approx” (for a faster approximate version of “trf”) and “wvm” (for the weights vector modifier model). If None, commission fees are not considered.

  • comission_fee_pct – Commission fee rate, expressed as a fraction. It must be a value between 0 and 1.

  • features – List of features to be considered in the observation space. The items of the list must be names of columns of the input dataframe.

  • valuation_feature – Feature to be considered in the portfolio value calculation.

  • time_column – Name of the dataframe column that contains the datetimes that index the dataframe.

  • time_format – Formatting string of time column (if format string is invalid, an error will be raised). If None, time column will not be transformed to datetime.

  • tic_column – Name of the dataframe column that contains the ticker symbols.

  • tics_in_portfolio – List of ticker symbols to be considered as part of the portfolio. If “all”, all tickers of input data are considered.

  • time_window – Size of time window.

  • print_metrics – If True, performance metrics will be printed at the end of episode.

  • plot_graphs – If True, graphs will be plotted and saved in the specified folder.

  • cwd – Local directory in which the resulting graphs will be saved.
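As a rough illustration of how these parameters fit together, the sketch below collects a set of keyword arguments that mirror the documented signature and checks the two constraints stated above (the allowed commission fee models and the 0 to 1 range of the fee). The helper check_env_kwargs and the chosen values are hypothetical. The constructor call is left as a comment because it needs an installed package and a real dataframe.

```python
# Keyword arguments mirroring the documented __init__ signature.
# The identifiers keep the library's spelling ("comission").
# All values are illustrative choices, not recommendations.
env_kwargs = {
    "initial_amount": 100_000.0,
    "return_last_action": True,
    "state_normalization": "by_last_value",
    "comission_fee_model": "trf",
    "comission_fee_pct": 0.0025,
    "features": ["close", "high", "low"],
    "valuation_feature": "close",
    "time_column": "date",
    "tic_column": "tic",
    "tics_in_portfolio": "all",
    "time_window": 50,
    "print_metrics": False,
    "plot_graphs": False,
}

# Values listed in the parameter description above.
ALLOWED_FEE_MODELS = {"trf", "trf_approx", "wvm", None}


def check_env_kwargs(kwargs):
    """Return a list of problems found (hypothetical helper, not library code)."""
    problems = []
    if kwargs.get("comission_fee_model", "trf") not in ALLOWED_FEE_MODELS:
        problems.append("unknown comission_fee_model")
    if not 0 <= kwargs.get("comission_fee_pct", 0) <= 1:
        problems.append("comission_fee_pct must be between 0 and 1")
    return problems


problems = check_env_kwargs(env_kwargs)

# With the package installed and a dataframe `df` in the documented format:
# env = PortfolioOptimizationEnv(df, **env_kwargs)
```

Keeping the arguments in a dictionary makes it easy to reuse the same configuration for separate training and evaluation environments.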

enumerate_portfolio() -> None

Enumerates the current portfolio by showing the ticker symbols of all the investments considered in the portfolio.

metadata: dict[str, Any] = {'render_fps': 1, 'render_modes': ['human']}

render(mode: str = 'human') -> ndarray | dict[str, ndarray]

Renders the environment.

Returns:

Observation of current simulation step.

reset(seed: int | None = None, options: dict[str, Any] | None = None) -> tuple[ndarray | dict[str, ndarray], dict[str, Any]]

Resets the environment and returns it to its initial state (the first date of the dataframe).

Parameters:
  • seed – A seeding number to configure random values.

  • options – A dictionary with reset options. It’s not used.

Note

If the environment was created with “return_last_action” set to True, the initial observation returned will be a Dict. If it’s set to False, the initial observation will be a Box. You can check the observation space through the attribute “observation_space”.

Returns:

A tuple (observation, info) with the initial observation and the initial info dictionary.

step(action: list[float] | ndarray) -> tuple[ndarray | dict[str, ndarray], float, bool, bool, dict[str, Any]]

Performs a simulation step.

Parameters:

action – A unidimensional array containing portfolio weights to be considered in the simulation.

Note

If the environment was created with “return_last_action” set to True, the observation returned will be a Dict. If it’s set to False, the observation will be a Box. You can check the observation space through the attribute “observation_space”.

Returns:

A tuple containing, respectively, the current simulation observation (a numpy array, or a dict if “return_last_action” is True), a float (the reward related to the last performed action), a boolean (if True, the environment is in a terminal state), another boolean (currently always False, since the simulation has no time limit) and a dictionary (information about the last simulation step).
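The reset and step signatures above follow the usual Gymnasium interaction loop, which can be sketched as follows. StubEnv is a hypothetical stand-in that only mimics the documented return tuples so the loop can run without market data. Its observations and rewards are placeholders, and the uniform weights (assumed to sum to 1 over n + 1 entries) are just one simple choice of action.

```python
class StubEnv:
    """Hypothetical stand-in mimicking the documented reset/step return values."""

    def __init__(self, portfolio_size=3, episode_length=5):
        self.portfolio_size = portfolio_size
        self.episode_length = episode_length
        self._t = 0

    def reset(self, seed=None, options=None):
        self._t = 0
        return [0.0], {}  # (observation, info)

    def step(self, action):
        self._t += 1
        terminated = self._t >= self.episode_length
        # (observation, reward, terminated, truncated, info);
        # truncated is always False, as documented for the real environment.
        return [float(self._t)], 0.0, terminated, False, {}


def uniform_weights(portfolio_size):
    """Equal weights over the n stocks plus the cash position (n + 1 entries)."""
    n = portfolio_size + 1
    return [1.0 / n] * n


def run_episode(env):
    """Step through one episode and return (number of steps, total reward)."""
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = uniform_weights(env.portfolio_size)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
    return steps, total_reward


steps, total_reward = run_episode(StubEnv())
```

The same run_episode function should work with any object that exposes reset, step and portfolio_size in the documented form, with the uniform policy replaced by a trained agent.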