rlportfolio.environment.portfolio_optimization_env module
- class PortfolioOptimizationEnv
Bases: Env
A portfolio allocation environment for Gymnasium.
This environment simulates the interactions between an agent and the financial market, based on data provided by a dataframe. The dataframe contains the time series of user-defined features (such as closing, high and low prices) and must have a time column and a tic column containing datetimes and ticker symbols, respectively. An example dataframe is shown below:
   date        high          low           close         tic
0  2020-12-23  0.157414      0.127420      0.136394      ADA-USD
1  2020-12-23  34.381519     30.074295     31.097898     BNB-USD
2  2020-12-23  24024.490234  22802.646484  23241.345703  BTC-USD
3  2020-12-23  0.004735      0.003640      0.003768      DOGE-USD
4  2020-12-23  637.122803    560.364258    583.714600    ETH-USD
...
Based on this dataframe, the environment will create an observation space that can be a Dict or a Box. The Box observation space is a three-dimensional array of shape (f, n, t), where f is the number of features, n is the number of stocks in the portfolio and t is the user-defined time window. If the environment is created with the parameter return_last_action set to True, the observation space is a Dict with the following keys:
{
    "state": three-dimensional Box (f, n, t) representing the time series,
    "last_action": one-dimensional Box (n+1,) representing the portfolio weights
}
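The layout of the Box observation can be illustrated with a short NumPy sketch. It assumes the market data has already been split into one array per ticker and feature (a hypothetical `data` dictionary rather than the dataframe itself), and `build_state` is a hypothetical helper, not part of rlportfolio. It only shows how f features, n tickers and a window of t steps map onto the (f, n, t) axes, and how the Dict observation wraps that array.

```python
from __future__ import annotations

import numpy as np


def build_state(
    data: dict[str, dict[str, np.ndarray]],
    features: list[str],
    tics: list[str],
    end: int,
    time_window: int,
) -> np.ndarray:
    """Stack the last `time_window` values (ending at index `end`, inclusive)
    of every feature for every ticker into an array of shape (f, n, t)."""
    start = end - time_window + 1
    if start < 0:
        raise ValueError("not enough history for the requested time window")
    state = np.empty((len(features), len(tics), time_window))
    for i, feat in enumerate(features):      # axis 0: features (f)
        for j, tic in enumerate(tics):       # axis 1: tickers (n)
            state[i, j, :] = data[tic][feat][start:end + 1]  # axis 2: time (t)
    return state


# Synthetic series (NOT real market data): 3 tickers, 100 time steps each.
rng = np.random.default_rng(0)
tics = ["ADA-USD", "BTC-USD", "ETH-USD"]
features = ["close", "high", "low"]
data = {tic: {feat: rng.uniform(1.0, 2.0, size=100) for feat in features} for tic in tics}

state = build_state(data, features, tics, end=99, time_window=50)
print(state.shape)  # (3, 3, 50)

# Dict form used when return_last_action=True; uniform weights over the
# n stocks plus cash serve here as a placeholder "last action".
n = len(tics)
obs = {"state": state, "last_action": np.full(n + 1, 1.0 / (n + 1))}
```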
Note that the action space of this environment is a one-dimensional Box of size n + 1, because the portfolio weights must contain the weights of all the stocks in the portfolio plus the weight of the remaining cash.
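An agent therefore has to output n + 1 non-negative weights that sum to 1. One common way to obtain such a vector from unconstrained model outputs is a softmax, sketched below. This is an assumption about how an agent might produce actions: the documentation here does not say whether the environment itself re-normalizes or rejects invalid weights, nor where the cash weight sits in the vector. `to_portfolio_weights` is a hypothetical helper.

```python
import numpy as np


def to_portfolio_weights(scores: np.ndarray) -> np.ndarray:
    """Map n + 1 real-valued scores to non-negative weights summing to 1
    with a numerically stable softmax."""
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())  # shift by the max to avoid overflow
    return exp / exp.sum()


n = 5  # number of stocks in the portfolio
raw_scores = np.random.default_rng(1).normal(size=n + 1)  # e.g. a policy network's raw output
action = to_portfolio_weights(raw_scores)
print(action.shape)                   # (6,)
print(np.isclose(action.sum(), 1.0))  # True
```

Subtracting the maximum score before exponentiating leaves the result unchanged mathematically but keeps `np.exp` from overflowing on large outputs.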
- action_space
Action space.
- observation_space
Observation space.
- episode_length
Number of timesteps of an episode.
- portfolio_size
Number of stocks in the portfolio.
- __init__(df: DataFrame, initial_amount: float, order_df: bool = True, return_last_action: bool = False, data_normalization: str | None = None, state_normalization: str | None = None, reward_scaling: float = 1, comission_fee_model: str = 'trf', comission_fee_pct: float = 0, features: list[str] = ['close', 'high', 'low'], valuation_feature: str = 'close', time_column: str = 'date', time_format: str | None = None, tic_column: str = 'tic', tics_in_portfolio: str | list[str] = 'all', time_window: int = 50, print_metrics: bool = True, plot_graphs: bool = True, cwd: str = './') → PortfolioOptimizationEnv
Initializes the environment instance.
- Parameters:
df – Dataframe with market information over a period of time.
initial_amount – Initial amount of cash available to be invested.
order_df – If True, the input dataframe is sorted by time.
return_last_action – If True, observations also return the last performed action. Note that, in that case, the observation space is a Dict.
data_normalization – Defines the normalization method applied to the input dataframe. Possible values are “by_previous_time”, “by_COLUMN_NAME” (where COLUMN_NAME must be replaced by a real column name) and a custom function. If None, no normalization is done.
state_normalization – Defines the normalization method applied to the state output during simulation. Possible values are “by_initial_value”, “by_last_value”, “by_initial_FEATURE_NAME”, “by_last_FEATURE_NAME” (where FEATURE_NAME must be replaced by the name of the feature used as normalizer) and a custom function. If None, no normalization is done.
reward_scaling – Scaling factor by which the reward is multiplied. Scaling the reward can help training.
comission_fee_model – Model used to simulate the commission fee. Possible values are “trf” (transaction remainder factor model), “trf_approx” (a faster, approximate version of “trf”) and “wvm” (weights vector modifier model). If None, commission fees are not considered.
comission_fee_pct – Percentage charged as commission fee, given as a value between 0 and 1 (e.g. 0.0025 for 0.25%).
features – List of features to be considered in the observation space. The items of the list must be names of columns of the input dataframe.
valuation_feature – Feature to be considered in the portfolio value calculation.
time_column – Name of the dataframe column that contains the datetimes indexing the dataframe.
time_format – Format string of the time column (an error is raised if the format string is invalid). If None, the time column is not converted to datetime.
tic_column – Name of the dataframe column that contains the ticker symbols.
tics_in_portfolio – List of ticker symbols to be considered as part of the portfolio. If “all”, all tickers of input data are considered.
time_window – Size of the time window (t in the observation shape), i.e. the number of time steps included in each observation.
print_metrics – If True, performance metrics are printed at the end of each episode.
plot_graphs – If True, graphs are plotted and saved in the folder specified by cwd.
cwd – Local directory in which the resulting graphs will be saved.
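The normalization options are easiest to picture on a concrete array. The sketch below is a hypothetical interpretation inferred only from the option names: “by_initial_value” and “by_last_value” divide each (feature, ticker) series by its first or last value in the window, “by_initial_FEATURE_NAME” and “by_last_FEATURE_NAME” divide all features of a ticker by the first or last value of the chosen feature, and “by_previous_time” divides each value by the one before it. The helpers `normalize_state` and `normalize_by_previous_time` are made up for illustration; the library's actual behavior may differ.

```python
from __future__ import annotations

import numpy as np


def normalize_state(state: np.ndarray, method: str, features: list[str]) -> np.ndarray:
    """Hypothetical reading of some `state_normalization` options for a state
    of shape (f, n, t); not the library's own implementation."""
    if method == "by_initial_value":
        # every (feature, ticker) series divided by its first value in the window
        return state / state[:, :, :1]
    if method == "by_last_value":
        # every (feature, ticker) series divided by its last value in the window
        return state / state[:, :, -1:]
    for prefix, pos in (("by_initial_", 0), ("by_last_", -1)):
        if method.startswith(prefix):
            idx = features.index(method[len(prefix):])
            # one divisor per ticker, taken from the chosen feature: shape (1, n)
            divisor = state[idx:idx + 1, :, pos]
            return state / divisor[:, :, None]  # broadcast over f and t
    raise ValueError(f"unknown normalization method: {method}")


def normalize_by_previous_time(series: np.ndarray) -> np.ndarray:
    """Divide each value by the previous value along the last (time) axis.
    The first step has no predecessor and is set to 1 here (an assumption)."""
    series = np.asarray(series, dtype=float)
    out = np.ones_like(series)
    out[..., 1:] = series[..., 1:] / series[..., :-1]
    return out


state = np.arange(1.0, 25.0).reshape(2, 3, 4)  # toy (f=2, n=3, t=4) state
print(normalize_state(state, "by_last_close", ["close", "high"])[0, :, -1])  # [1. 1. 1.]
print(normalize_by_previous_time(np.array([1.0, 2.0, 4.0, 8.0])))           # [1. 2. 2. 2.]
```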
- enumerate_portfolio() → None
Enumerates the current portfolio by showing the ticker symbols of all the investments considered in the portfolio.
- metadata: dict[str, Any] = {'render_fps': 1, 'render_modes': ['human']}
- render(mode: str = 'human') → ndarray | dict[str, ndarray]
Renders the environment.
- Returns:
Observation of current simulation step.
- reset(seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ndarray | dict[str, ndarray], dict[str, Any]]
Resets the environment, returning it to its initial state (the first date of the dataframe).
- Parameters:
seed – A seed used to configure random number generation.
options – A dictionary with reset options. It’s not used.
Note
If the environment was created with “return_last_action” set to True, the initial observation returned will be a Dict. If it’s set to False, the initial observation will be a Box. You can check the observation space through the attribute “observation_space”.
- Returns:
A tuple (observation, info) with the initial observation and the initial info dictionary.
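Because the observation returned by reset (and by step) is either a Box array or a Dict, agent code meant to work in both configurations can unpack it with a small helper such as the one below. `split_observation` is hypothetical and relies only on the “state” and “last_action” keys documented for the Dict observation space.

```python
from __future__ import annotations

import numpy as np


def split_observation(
    obs: np.ndarray | dict[str, np.ndarray],
) -> tuple[np.ndarray, np.ndarray | None]:
    """Return (state, last_action); last_action is None for a Box observation."""
    if isinstance(obs, dict):          # return_last_action=True -> Dict
        return obs["state"], obs["last_action"]
    return obs, None                   # return_last_action=False -> Box


box_obs = np.zeros((3, 5, 50))                                  # (f, n, t)
dict_obs = {"state": box_obs, "last_action": np.full(6, 1 / 6)}  # n + 1 = 6 weights
state, last_action = split_observation(dict_obs)
print(state.shape, last_action.shape)  # (3, 5, 50) (6,)
print(split_observation(box_obs)[1])   # None
```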
- step(action: list[float] | ndarray) → tuple[ndarray | dict[str, ndarray], float, bool, bool, dict[str, Any]]
Performs a simulation step.
- Parameters:
action – A one-dimensional array of size n + 1 containing the portfolio weights to be used in the simulation.
Note
If the environment was created with “return_last_action” set to True, the observation returned will be a Dict. If it’s set to False, the observation will be a Box. You can check the observation space through the attribute “observation_space”.
- Returns:
A tuple containing, respectively: the current observation (a numpy array, or a Dict if “return_last_action” is True), a float (the reward for the last performed action), a boolean (if True, the environment is in a terminal state), another boolean (currently always False, since the simulation has no time limit) and a dictionary (information about the last simulation step).
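A typical episode alternates reset and step until the terminal flag is set. Since running PortfolioOptimizationEnv requires the package and a market dataframe, the sketch below drives a hypothetical `ToyPortfolioEnv` that only mimics the documented return values: (observation, info) from reset and a five-element tuple from step, with the truncation flag always False. The random prices, the cash-first weight ordering and the log-growth reward are assumptions of the toy, not properties of the real environment; the loop itself only relies on the documented interface.

```python
from __future__ import annotations

from typing import Any

import numpy as np


class ToyPortfolioEnv:
    """Hypothetical stand-in that follows the reset()/step() return contract
    documented above. It is NOT PortfolioOptimizationEnv: prices are random
    and the reward is only illustrative."""

    def __init__(self, n_stocks: int = 3, episode_length: int = 10, time_window: int = 5):
        self.n = n_stocks
        self.episode_length = episode_length
        self.time_window = time_window
        self.rng = np.random.default_rng()
        self.t = 0
        self.value = 1.0

    def _observation(self) -> np.ndarray:
        # Box observation with a single feature: shape (f=1, n, t)
        return self.rng.uniform(0.9, 1.1, size=(1, self.n, self.time_window))

    def reset(self, seed: int | None = None, options: dict[str, Any] | None = None):
        self.rng = np.random.default_rng(seed)
        self.t, self.value = 0, 1.0
        return self._observation(), {"portfolio_value": self.value}

    def step(self, action):
        weights = np.asarray(action, dtype=float)
        # Price relatives; index 0 is cash (constant price), an assumption of this toy.
        relatives = np.concatenate(([1.0], self.rng.uniform(0.95, 1.05, size=self.n)))
        growth = float(weights @ relatives)
        self.value *= growth
        self.t += 1
        terminated = self.t >= self.episode_length
        truncated = False  # as documented: no time limit, so always False
        reward = float(np.log(growth))  # illustrative reward only
        return self._observation(), reward, terminated, truncated, {"portfolio_value": self.value}


env = ToyPortfolioEnv()
obs, info = env.reset(seed=42)
terminated = truncated = False
steps = 0
while not (terminated or truncated):
    action = np.full(env.n + 1, 1.0 / (env.n + 1))  # equal weights, cash included
    obs, reward, terminated, truncated, info = env.step(action)
    steps += 1
print(steps, info["portfolio_value"])
```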