rlportfolio.environment.portfolio_optimization_env module

class PortfolioOptimizationEnv

Bases: Env

A portfolio allocation environment for Gymnasium.

This environment simulates the interactions between an agent and the financial market based on data provided by a dataframe. The dataframe contains the time series of features defined by the user (such as closing, high and low prices) and must have a time column and a tic column containing datetimes and ticker symbols, respectively. An example dataframe is shown below:

    date        high          low           close         tic
    2020-12-23  0.157414      0.127420      0.136394      ADA-USD
    2020-12-23  34.381519     30.074295     31.097898     BNB-USD
    2020-12-23  24024.490234  22802.646484  23241.345703  BTC-USD
    2020-12-23  0.004735      0.003640      0.003768      DOGE-USD
    2020-12-23  637.122803    560.364258    583.714600    ETH-USD
    ...         ...           ...           ...           ...
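
As a minimal sketch of this long format (one row per date and ticker), the rows of the example can be built with pandas. The completeness check at the end is only an illustrative sanity check; the source does not state that the environment requires it, and the variable names are hypothetical.

```python
import pandas as pd

# Rows copied from the example table: one row per (date, ticker) pair.
rows = [
    ("2020-12-23", 0.157414, 0.127420, 0.136394, "ADA-USD"),
    ("2020-12-23", 34.381519, 30.074295, 31.097898, "BNB-USD"),
    ("2020-12-23", 24024.490234, 22802.646484, 23241.345703, "BTC-USD"),
    ("2020-12-23", 0.004735, 0.003640, 0.003768, "DOGE-USD"),
    ("2020-12-23", 637.122803, 560.364258, 583.714600, "ETH-USD"),
]
df = pd.DataFrame(rows, columns=["date", "high", "low", "close", "tic"])

# Illustrative sanity check (an assumption, not a documented requirement):
# every date should list every ticker exactly once.
tickers_per_date = df.groupby("date")["tic"].nunique()
n_tickers = df["tic"].nunique()
complete = bool((tickers_per_date == n_tickers).all())
```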

Based on this dataframe, the environment will create an observation space that can be a Dict or a Box. The Box observation space is a three-dimensional array of shape (f, n, t), where f is the number of features, n is the number of stocks in the portfolio and t is the user-defined time window. If the environment is created with the parameter return_last_action set to True, the observation space is a Dict with the following keys:

{
"state": three-dimensional Box (f, n, t) representing the time series,
"last_action": one-dimensional Box (n+1,) representing the portfolio weights
}

Note that the action space of this environment is a one-dimensional Box of size n + 1 because the portfolio weights must contain the weights of all the stocks in the portfolio plus the weight of the remaining cash.
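
The shapes described above can be illustrated with plain NumPy arrays. This is only a sketch: the values of f, n and t are arbitrary, and the assumption that the weights sum to 1 is not stated in this section.

```python
import numpy as np

f, n, t = 3, 5, 50  # features, stocks, time window (illustrative values)

# Placeholder array with the shape of a Box observation.
state = np.zeros((f, n, t), dtype=np.float32)

# One weight per stock plus one for the remaining cash.
# ASSUMPTION: the weights sum to 1 (uniform allocation used here).
action = np.full(n + 1, 1.0 / (n + 1), dtype=np.float32)

# Layout of a Dict observation when return_last_action is True.
observation = {"state": state, "last_action": action}
```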

action_space: Action space.

observation_space: Observation space.

episode_length: Number of timesteps of an episode.

portfolio_size: Number of stocks in the portfolio.

__init__(df: DataFrame, initial_amount: float, order_df: bool = True, return_last_action: bool = False, data_normalization: str | None = None, state_normalization: str | None = None, reward_scaling: float = 1, comission_fee_model: str = 'trf', comission_fee_pct: float = 0, features: list[str] = ['close', 'high', 'low'], valuation_feature: str = 'close', time_column: str = 'date', time_format: str | None = None, tic_column: str = 'tic', tics_in_portfolio: str | list[str] = 'all', time_window: int = 50, print_metrics: bool = True, plot_graphs: bool = True, cwd: str = './') → PortfolioOptimizationEnv

Initializes the environment instance.

Parameters:

df – Dataframe with market information over a period of time.
initial_amount – Initial amount of cash available to be invested.
order_df – If True, the input dataframe is ordered by time.
return_last_action – If True, observations also return the last performed action. Note that, in that case, the observation space is a Dict.
data_normalization – Defines the normalization method applied to input dataframe. Possible values are “by_previous_time”, “by_COLUMN_NAME” (where COLUMN_NAME must be changed to a real column name) and a custom function. If None, no normalization is done.
state_normalization – Defines the normalization method applied to the state output during simulation. Possible values are “by_initial_value”, “by_last_value”, “by_initial_FEATURE_NAME”, “by_last_FEATURE_NAME” (where FEATURE_NAME must be changed to the name of the feature used as normalizer) and a custom function. If None, no normalization is done.
reward_scaling – A scaling factor to multiply the reward function. This factor can help training.
comission_fee_model – Model used to simulate commission fees. Possible values are “trf” (for transaction remainder factor model), “trf_approx” (for a faster approximate version of “trf”) and “wvm” (for weights vector modifier model). If None, commission fees are not considered.
comission_fee_pct – Percentage to be used as the commission fee. It must be a value between 0 and 1.
features – List of features to be considered in the observation space. The items of the list must be names of columns of the input dataframe.
valuation_feature – Feature to be considered in the portfolio value calculation.
time_column – Name of the dataframe’s column that contains the datetimes that index the dataframe.
time_format – Formatting string of time column (if format string is invalid, an error will be raised). If None, time column will not be transformed to datetime.
tic_column – Name of the dataframe’s column that contains the ticker symbols.
tics_in_portfolio – List of ticker symbols to be considered as part of the portfolio. If “all”, all tickers of input data are considered.
time_window – Size of time window.
print_metrics – If True, performance metrics will be printed at the end of episode.
plot_graphs – If True, graphs will be plotted and saved in the specified folder.
cwd – Local directory in which the resulting graphs will be saved.
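
A hedged sketch of how these parameters might be passed to the constructor. The keyword names and the import path come from the signature and module name above; the chosen values and the make_env helper are hypothetical. The import is deferred so that the snippet can be read and run without the package installed.

```python
# Illustrative values only; keyword names follow the __init__ signature.
env_kwargs = dict(
    initial_amount=100_000,
    comission_fee_model="trf",      # parameter name spelled as in the signature
    comission_fee_pct=0.0025,       # must lie between 0 and 1
    features=["close", "high", "low"],
    valuation_feature="close",
    time_column="date",
    tic_column="tic",
    time_window=50,
    return_last_action=False,       # False gives a Box observation space
    print_metrics=False,
    plot_graphs=False,
)


def make_env(df):
    """Hypothetical helper: build the environment from a dataframe."""
    # Deferred import so the rest of the snippet runs without the package.
    from rlportfolio.environment.portfolio_optimization_env import (
        PortfolioOptimizationEnv,
    )
    return PortfolioOptimizationEnv(df, **env_kwargs)
```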

enumerate_portfolio() → None: Enumerates the current portfolio by showing the ticker symbols of all the investments considered in the portfolio.

metadata: dict[str, Any] = {'render_fps': 1, 'render_modes': ['human']}

render(mode: str = 'human') → ndarray | dict[str, ndarray]

Renders the environment.

Returns: Observation of current simulation step.

reset(seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ndarray | dict[str, ndarray], dict[str, Any]]

Resets the environment and returns it to its initial state (the first date of the dataframe).

Parameters:

seed – A seeding number to configure random values.
options – A dictionary with reset options. It’s not used.

Note

If the environment was created with “return_last_action” set to True, the initial observation returned will be a Dict. If it’s set to False, the initial observation will be a Box. You can check the observation space through the attribute “observation_space”.

Returns: A tuple (observation, info) with the initial observation and the initial info dictionary.
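
Because the observation can be either an array or a dictionary, downstream code may want to handle both cases. A minimal sketch, assuming only the "state" and "last_action" keys described earlier; split_observation is a hypothetical helper, not part of the library.

```python
import numpy as np


def split_observation(obs):
    """Return (state, last_action); last_action is None for Box observations."""
    if isinstance(obs, dict):
        # Dict observation: environment created with return_last_action=True.
        return obs["state"], obs["last_action"]
    # Box observation: the array itself is the state.
    return obs, None


# Usage with placeholder arrays shaped (f, n, t) = (3, 2, 4) and (n + 1,).
box_obs = np.zeros((3, 2, 4))
dict_obs = {"state": box_obs, "last_action": np.full(3, 1.0 / 3)}

state_a, last_a = split_observation(box_obs)
state_b, last_b = split_observation(dict_obs)
```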

step(action: list[float] | ndarray) → tuple[ndarray | dict[str, ndarray], float, bool, bool, dict[str, Any]]

Performs a simulation step.

Parameters: action – A unidimensional array containing portfolio weights to be considered in the simulation.

Note

If the environment was created with “return_last_action” set to True, the observation returned will be a Dict. If it’s set to False, the observation will be a Box. You can check the observation space through the attribute “observation_space”.

Returns: A tuple containing, respectively, a numpy array (the current simulation observation), a float number (the reward related to the last performed action), a boolean (if True, denotes that the environment is in a terminal state), another boolean (currently always False, since the simulation has no time limit) and a dictionary (information about the last simulation step).
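
The reset and step signatures above suggest the usual Gymnasium interaction loop. The sketch below applies a constant uniform allocation until the episode ends. ToyEnv is a hypothetical stand-in that only mimics the documented return shapes so the loop can run on its own; it is not the real environment, and the uniform-weight policy is an arbitrary choice.

```python
import numpy as np


def run_episode(env):
    """Run one episode with constant uniform weights; return (reward sum, steps)."""
    n = env.portfolio_size
    action = np.full(n + 1, 1.0 / (n + 1))  # stocks plus cash
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


class ToyEnv:
    """Hypothetical stand-in mimicking the reset/step return signatures."""

    def __init__(self, portfolio_size=2, length=3):
        self.portfolio_size = portfolio_size
        self.length = length
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return np.zeros((1, self.portfolio_size, 1)), {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        # Fixed dummy reward; truncated is always False, as documented.
        return np.zeros((1, self.portfolio_size, 1)), 0.5, terminated, False, {}


total_reward, steps = run_episode(ToyEnv())
```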