
Applied Soft Computing Journal 90 (2020) 106187

journal homepage: www.elsevier.com/locate/asoc

Ensemble of machine learning algorithms for cryptocurrency investment with different data resampling methods

Tomé Almeida Borges 1, Rui Ferreira Neves ∗,1

Instituto de Telecomunicações, Instituto Superior Técnico, Torre Norte, Av. Rovisco Pais, 1, Lisboa 1049-001, Portugal

Article info

Article history:
Received 3 July 2019
Received in revised form 10 January 2020
Accepted 19 February 2020
Available online 26 February 2020

Keywords:
Financial markets
Cryptocurrencies
Technical analysis
Machine learning
Ensemble classification
Financial data resampling

Abstract

This work proposes a system based on machine learning aimed at creating an investment strategy capable of trading on the cryptocurrency exchange markets. Additionally, with the goal of generating investments with higher returns and lower risk, rather than investing on predictions based on time sampled financial series, a novel method for resampling financial series was developed and employed in this work. For this purpose, the originally time sampled financial series are resampled according to a closing value threshold, thus creating a series prone to obtaining higher returns and lower risk than the original series. From these resampled series, as well as the original, technical indicators are calculated and fed as inputs to four machine learning algorithms: Logistic Regression, Random Forest, Support Vector Classifier, and Gradient Tree Boosting. Each of these algorithms is responsible for generating a transaction signal. Afterwards, a fifth transaction signal is generated by calculating the unweighted average of the four trading signals outputted by the previous algorithms, in order to improve on their results. In the end, the investment results obtained with the resampled series are compared to the commonly utilized fixed time interval sampling. This work demonstrates that, regardless of whether a resampling method is used, all learning algorithms outperform the Buy and Hold (B&H) strategy in the overwhelming majority of the 100 markets tested. Nevertheless, out of the learning algorithms, the unweighted average obtains the best overall results, namely accuracies up to 59.26% for time resampled series. Most importantly, it is concluded that both alternative resampling methods tested are capable of generating far greater returns with lower risk relative to time resampled data.

© 2020 Elsevier B.V. All rights reserved.

1. Introduction

As a result of the tremendous growth and the remarkably high market capitalizations reached, cryptocurrencies are now emerging as a new financial instrument, and their popularity has recently skyrocketed. Despite being relatively recent, with the rise of Bitcoin the cryptocurrency exchange market has become a global phenomenon, particularly known for its volatility and diversity, attracting the attention of many new and old investors [1].

Financial time series forecasting is a challenging task, as these series are characterized by non-stationarity, heteroscedasticity, discontinuities, outliers and high-frequency multi-polynomial components, making the prediction of market movements quite complex [2]. The complex characteristics of financial time series, and the immense volumes of data that must be analysed to successfully accomplish the task of forecasting them, have driven the adoption of more sophisticated methods, models and simulation techniques. Lately, machine learning and data mining techniques, widely applied in forecasting financial markets, have been offering improved results relative to simple technical or fundamental analysis strategies. Machine learning methodologies are capable of uncovering patterns and predicting future trends in order to identify the best entry and exit points in a financial time series, with the intention of achieving the highest returns with the lowest risk [3].

A major objective of this work is using technical indicators as input data to forecast, better than random guessing, a dichotomous event: will a specific currency pair be bullish or otherwise (bearish or sideways) in the next instant of a time series? To achieve this goal, several supervised machine learning classification approaches are suggested. This concept has been widely used in diverse financial markets, such as stocks, bonds or Foreign Exchange markets [4]; however, this work focuses on the cryptocurrency exchange market.

The main contributions of this paper are: the development of a framework consisting of several supervised machine learning procedures to trade in a relatively new market, the cryptocurrency market; a comparison of the performance of 5 different methods for forecasting trading signals, amongst themselves and against a B&H strategy as baseline; and lastly, as the main contribution, the development of an innovative procedure for resampling financial time series with the intention of obtaining improved results. In this paper, financial series resampled according to a parameter derived from trading activity, in particular a percentage variation as well as a fixed variation, are compared in terms of profitability, risk and predictability against the commonly used time sampled financial series as baseline.

This paper is organized as follows: in Section 2 the fundamental concepts and related work are discussed; Section 3 documents the entire proposed system architecture; in Section 4 the case studies and results are presented and analysed; Section 5 provides the conclusions to the work developed.

∗ Corresponding author.
E-mail addresses: [email protected], [email protected], [email protected] (R.F. Neves).
1 All authors contributed equally to all sections.

https://doi.org/10.1016/j.asoc.2020.106187
1568-4946/© 2020 Elsevier B.V. All rights reserved.

2. Background & State-of-the-Art

In 2008, a white paper entitled ''Bitcoin: A Peer-to-Peer Electronic Cash System'' [5], self-published by the pseudonymous Satoshi Nakamoto, first described Bitcoin and introduced the concept of decentralized cryptocurrency. Bitcoin and the remaining cryptocurrencies filled an important niche by providing a decentralized virtual currency system that supports user-to-user transactions without any trusted parties and without pre-assumed identities among the participants.

Cryptocurrencies can be purchased, sold and exchanged for other currencies at specialized currency exchanges. On the cryptocurrency trading market, much like the underlying principle the Foreign Exchange market was built on, traders are essentially exchanging a cryptocurrency for another cryptocurrency or for fiat currency [6]. The cryptocurrency market is known for its large fluctuations in price; in other words, it is known for its volatility [7].

To study the change of rates in multiple cryptocurrency pairs, a time series can be built by sampling the market at a fixed time rate using historical data from a specific cryptocurrency exchange platform. Using a simple application programming interface (API), all historic data used in this work was retrieved from one single exchange: Binance.2

2 The python package ''Binance official API'' (https://github.com/binance-exchange/binance-official-api-docs/) was utilized to retrieve the used historical data from Binance's API (https://api.binance.com).

There are several tools to analyse different markets, but the two major categories are Fundamental and Technical analysis [8]. These two approaches to analysing and forecasting the market are not exclusive: they may be applied together, and both attempt to determine the direction prices are likely to move. Since a typical fundamental analysis cannot be executed for the cryptocurrency market, and because technical analysis is more suited for short-term trading [9], only technical analysis was employed in this work. A more elaborate description of technical analysis and each technical indicator can be found in [8].

2.1. Time series forecasting

Analysing financial time series is extremely challenging due to the dynamic, non-linear, non-stationary, noisy and chaotic nature of any financial market [10]. This analysis is carried out as an attempt to find the best entry (buy) and exit (sell) points to gain advantage over the market, increasing gains and minimizing the risk.

Machine learning procedures are capable of analysing large amounts of seemingly noisy and uncorrelated data in order to detect patterns and predict future data. Moreover, this approach provides a reaction time much faster than any human investor could deliver [3].

In this work, a total of four multivariate learning methods were used. Two of them, the Logistic Regression and the Support Vector Machine methods, are linear, and the other two, the Random Forest and the Decision Tree Gradient Boosting methods, are non-linear. In the end, a combined solution, an ensemble of these 4 algorithms, is calculated.

A brief summary of each classification model utilized in this study follows.

2.1.1. Logistic Regression (LR)
A binomial logistic regression [11] is used to model a binary dependent variable. In this type of learning algorithm, a single outcome variable Y_i follows a Bernoulli probability function that takes the outcome of interest, usually represented with the class label Y = 1, with probability p, while the unwanted outcome, usually represented with the class label Y = 0, has probability 1 − p.

The odds in favour of a particular outcome can be represented as p/(1 − p), where p stands for the probability of the wanted outcome. The logit function is capable of transforming input values in the range of 0 to 1 into values over the entire real number range, which can be used to express a linear relationship between feature values and the logarithm of the odds, also known as the log-odds, as such [12]:

logit(P(Y = 1|x)) = β_0 + Σ_i β_i x_i,  (1)

where P(Y = 1|x) is the conditional probability of Y belonging to class label 1 given x_i, the feature values, β_0 is the intercept (the point where the function intercepts the Y axis) and β_i corresponds to the coefficient associated with each respective feature.

Joining the log-odds and Eq. (1), the conditional probability can be represented as:

P(Y = 1|x) = 1 / (1 + exp(−β_0 − Σ_j β_j x_ij)).  (2)

Eq. (2) is called the logistic or sigmoid function (due to its S-shape). From this function it can be seen that P(Y = 1|x) varies between 0 (as x approaches −∞) and 1 (as x approaches +∞). Thus, the logistic function is able to transform any real input into the range 0–1; this is, in fact, how the class probabilities are obtained.

In this work, the optimization problem utilized to obtain the coefficients and intercept minimizes the following cost function [11]:

min_{β,β_0}  (β^T β)/2 + C Σ_{i=1}^{n} log(exp(−Y_i (x_i^T β + β_0)) + 1),  (3)

where (β^T β)/2 is the L2 regularization penalty, C is a parameter inverse to the regularization strength, and Σ_{i=1}^{n} log(exp(−Y_i (x_i^T β + β_0)) + 1) corresponds to the negative log-likelihood [13]. In the negative log-likelihood term, β represents the vector of coefficients associated with the respective feature values, x_i. A method of finding a good bias–variance trade-off for a model is tuning its complexity via the regularization strength parameter, C.
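For illustration, a minimal scikit-learn sketch of such a classifier follows; the placeholder X_train and y_train stand in for the standardized technical-indicator features and binary targets described in Section 3, and the settings shown are illustrative, not the grid actually used (see Table 3):

# Minimal sketch (not the paper's exact setup): an L2-regularized logistic
# regression solving Eq. (3); C is the inverse regularization strength.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.random.randn(500, 12)      # placeholder feature matrix
y_train = np.random.randint(0, 2, 500)  # placeholder binary targets

lr = LogisticRegression(penalty="l2", C=1.0, solver="liblinear")
lr.fit(X_train, y_train)

# Eq. (2): class probabilities from the sigmoid of the linear score.
proba_up = lr.predict_proba(X_train)[:, 1]  # P(Y = 1 | x)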
2.1.2. Random Forest (RF)
A Random Forest [14] is a method of ensemble learning where multiple classifiers are generated and their results are aggregated. RF is an enhancement of the bootstrap aggregating (bagging) method applied to classification trees.

Before all else, a classification or decision tree is a simple model that takes into account the whole dataset and all available features. These trees tend to have high variance and overfit on training data, leading to a poor generalization ability on unseen data [13].

In the bagging method, however, multiple decision trees are independently constructed using random samples drawn with replacement (known as bootstrap samples) from the dataset, in order to reduce the variance. Bagging is capable of reducing overfitting while increasing the accuracy of unstable models [15]. Random Forests improve the variance reduction of bagging by reducing the correlation between trees [16]. In order to do so, an additional layer of randomness is added to the bagging procedure: instead of using all the available features, only a random subset of features is used to grow each tree of the forest. This strategy turns out to be robust against overfitting [17]. Reducing the amount of features reduces the correlation between any pair of trees in the ensemble; hence, the variance of the model is reduced [14].

Let N be the number of data points in the original dataset. Briefly, each tree in the random forest algorithm is created as follows [16]:

1. Draw a random sample of size N with replacement from the original data (a bootstrap sample);
2. Grow a random forest tree on the bootstrapped data. Until the minimum node size is reached, recursively repeat the following steps for each terminal node of the tree:
(a) Select a fixed amount of variables at random from the whole set;
(b) Split the node using the feature that provides the best split according to the objective function;
(c) Split the node into two daughter nodes.

Note that the size of the bootstrap sample is typically chosen equal to the number of samples in the original dataset, as this provides a good bias–variance trade-off [12]. Each time the previous steps are repeated, a new tree is added to the ensemble; repeating this process multiple times outputs an ensemble with as many trees as there were repetitions. The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of all trees in the forest.
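A minimal sketch of this algorithm with scikit-learn, reusing the placeholder X_train and y_train from the LR sketch (the library grows the bootstrapped, feature-subsampled trees internally; parameter values are illustrative only):

# Illustrative only: a random forest whose per-tree procedure matches the
# steps above (bootstrap sample + random feature subset at each split).
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    max_features="sqrt",  # random subset of features tried at each split
    bootstrap=True,       # draw N samples with replacement per tree
    random_state=0,
)
rf.fit(X_train, y_train)
proba_up = rf.predict_proba(X_train)[:, 1]  # mean of per-tree probabilities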
2.1.3. Gradient Decision Tree Boosting (GTB)
Gradient Decision Tree Boosting is utilized in this work through the XGBoost framework, a scalable machine learning system for tree boosting [18].

Boosting is a general method for improving the accuracy of any given learning algorithm [19]. It is the process of combining many weak classifiers3 with limited predictive ability [16] into a single more robust classifier capable of producing better predictions of a target [20]. Boosting is an ensemble method very resistant to overfitting that creates each individual member sequentially [13].

Gradient boosting replaces the potentially difficult optimization problem existent in boosting: it represents the learning problem as a gradient descent on some arbitrary differentiable loss function that measures the performance of the model on the training set [21].

In this paper the objective function for this model is applied as follows:

Obj = Σ_i L(y_i, ŷ_i) + Σ_k Ω(f_k).  (4)

In Eq. (4), the first term, L(y_i, ŷ_i), can be any convex differentiable loss function that measures the difference between the predicted label ŷ_i and its respective true label y_i for a given instance. In this work's proposed system the log-likelihood loss is used as loss function, which enables the calculation of probability estimates: combining the principles of decision trees and logistic regression, the conditional probability of Y given x can be obtained [22].

The second term of Eq. (4), Ω(f_k), measures the complexity of a tree f_k and is defined as:

Ω(f_k) = γT + λ∥w∥²/2,  (5)

where T is the number of leaves of tree f_k and w holds the leaf weights (i.e. the predicted values stored at the leaf nodes). Including Eq. (5) in the objective function of Eq. (4) forces the optimization towards a less complex tree, which assists in reducing overfitting. The term λ∥w∥²/2 corresponds to the L2 regularization utilized previously in LR, λ being the L2 regularization strength, while γT provides a constant penalty for each additional tree leaf [23].

3 A classifier whose error rate is only slightly better than random guessing.
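A hedged sketch of this model with the xgboost package, again on the placeholder data (hyper-parameter values are illustrative, not those of Table 3):

# Illustrative only: XGBoost classifier minimizing log-loss (Eq. (4)) with
# the regularization terms of Eq. (5); values are not the paper's grid.
from xgboost import XGBClassifier

gtb = XGBClassifier(
    objective="binary:logistic",  # log-likelihood loss -> probabilities
    n_estimators=100,             # boosting rounds (trees added sequentially)
    reg_lambda=1.0,               # lambda: L2 penalty on leaf weights w
    gamma=0.0,                    # gamma: constant penalty per extra leaf
)
gtb.fit(X_train, y_train)
proba_up = gtb.predict_proba(X_train)[:, 1]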
2.1.4. Support Vector Machine (SVM)
A Support Vector Machine [24] is a classifier algorithm whose primary objective is maximizing the margin of a separating hyperplane (decision boundary) in an n-dimensional space, where n coincides with the number of features used. The margin corresponds to the distance between the separating hyperplane and the training samples that are closest to this hyperplane, the support vectors.

The hyperplane is supposed to separate the different classes; that is, in a binary classification problem, the samples of the first class should stay on one side of the surface and the samples of the second class should stay on the other side.

Decision boundaries with large margins tend to have a lower generalization error, whereas models with small margins are more prone to overfitting; hence, it is important to maximize the margins [12]. With the purpose of achieving a better generalization ability, a slack variable, ξ, indicating the proportional amount by which a prediction is misclassified on the wrong side of its margin, is introduced. This formulation, called soft-margin SVM [24], enables controlling the width of the margin and consequently can be used to tune the bias–variance trade-off. With this soft-margin formulation, data points on the incorrect side of the decision boundary have a penalty that increases with the distance from the margin.

In order to maximize the margin, the hyperplane has to be oriented as far from the support vectors as possible. Through simple vector geometry this margin is equal to 1/∥w∥ [16]; hence, maximizing this margin is equivalent to finding the minimum ∥w∥. In turn, minimizing ∥w∥ is equivalent to minimizing ∥w∥²/2, the L2 regularization penalty term previously utilized in LR [16]. Therefore, in order to reduce the number of misclassifications, the objective function may be written as follows:

min_{w,b,ξ}  ∥w∥²/2 + C Σ_{i=1}^{n} ξ_i   s.t.  Y_i (w · x_i + b) − 1 + ξ_i ≥ 0  ∀i,  (6)

where ∥w∥² is the squared norm of the normal vector w, and C is the regularization parameter, responsible for controlling the trade-off between the slack variable penalty, ξ, and the size of the margin, consequently tuning the bias–variance trade-off. In order to calculate probability estimates, Platt scaling is employed [25].

It is likely that a non-linear margined SVM could provide better results; nonetheless, a linear SVM is utilized in this work, for the simple reason that the available computational power was limited throughout its development, and Platt scaling by itself is already a complex procedure that results in a significantly slower execution. Hence, in order to avoid increasing computational complexity even further, only a linear SVM was employed. Kernel methods, though, can deal with linearly inseparable data by creating non-linear combinations of the original features to project them onto a higher dimensional space via a mapping function [12].
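A sketch of this classifier in scikit-learn, again on the placeholder data; probability=True triggers the Platt scaling mentioned above, which is also what makes the fit noticeably slower:

# Illustrative only: linear soft-margin SVM (Eq. (6)); probability=True
# enables Platt scaling, at the cost of a significantly slower fit.
from sklearn.svm import SVC

svm = SVC(kernel="linear", C=1.0, probability=True, random_state=0)
svm.fit(X_train, y_train)
proba_up = svm.predict_proba(X_train)[:, 1]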
2.1.5. Ensemble Voting (EV)
The goal behind ensemble voting is to combine different classifiers into a meta-classifier with better generalization performance than each individual classifier alone. The weaknesses of one method can be balanced by the strengths of others, achieving a systematic effect [26].

In this work, a heterogeneous ensemble4 was combined in a linear manner: the probability estimates from each individual classifier were combined according to a simple unweighted average, giving equal weight to each individual output. This process, also known as soft majority voting [26], is the reason why all previous learning algorithms ought to yield a probability estimate for each class label, often at the cost of additional computational complexity.

4 An ensemble containing different learning techniques.
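Assuming the four fitted classifiers from the previous sketches, the soft majority voting reduces to an unweighted average of their probability estimates:

# Illustrative only: soft majority voting as an unweighted average of the
# class probabilities produced by the four individual classifiers.
import numpy as np

def ensemble_vote(models, X):
    probas = [m.predict_proba(X)[:, 1] for m in models]  # P(Y=1|x) per model
    return np.mean(probas, axis=0)                       # unweighted average

p_ensemble = ensemble_vote([lr, rf, gtb, svm], X_train)
signal = (p_ensemble > 0.5).astype(int)  # 1 = bullish, 0 = bearish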
2.2. State-of-the-art

Having described the essential background for this work, a literature review on works dedicated to forecasting financial markets using technical analysis now follows. Table 1 summarizes some of the most relevant studies applied to financial market analysis that were considered throughout the development of this paper's approach.

Table 1
Summary of the most relevant works covered in the state-of-the-art.

Ref. | Year | Financial market (data frequency) | Dataset time period | Used methodologies | Evaluation function | System performance | B&H performance
[27] | 2017 | 177 credit default swap markets (daily data) | 1/12/2007–1/12/2016 | Genetic algorithm | ROI | 87.84% (ROI) | NA
[28] | 2016 | 12 most-volumed cryptocurrencies exchange data (30-min data) | 27/08/2015–27/08/2016 | Model-less convolutional neural network | Portfolio value maximization | 16.3% (ROI portfolio value) | 0.876% (ROI portfolio value)
[29] | 2018 | Bitcoin/USD exchange data (15-min data) | 31/07/2016–24/01/2018 | ANN (only w/ long positions implemented) | ROI | 6.68% (ROI) | 2.28% (ROI)
[30] | 2016 | Bitcoin/USD exchange data (daily data) | 19/08/2013–19/07/2016 | Long short term memory network | Accuracy, root mean square error | 52.7% (accuracy) | NA
[31] | 2015 | Bitcoin/USD exchange data (hourly data) | 01/02/2012–01/04/2013 | SVM, LR, NN | Accuracy | 53.7% for SVM, 54.3% for LR and 55.1% for NN (accuracy) | NA
[33] | 2015 | Bitcoin/USD exchange data (15-min data) | 09/01/2015–02/02/2015 | BOX-SVM, VW-SVM | Accuracy | 10.58% for BOX-SVM, 33.52% for VW-SVM (ROI) | 4.86% (ROI)
[34] | 2019 | Bitcoin/USD exchange data (daily data) | 01/04/2013–01/04/2017 | SVM and ensemble (RNN and tree classifier) | Accuracy and mean square error | 59.4% and 62.9% (accuracy) | NA
[35] | 2018 | 12 most liquid cryptocurrencies exchange data (15-min data) | 10/8/2017–23/6/2018 | RF (best performing model) | Accuracy | 53% (accuracy) | NA

De Prado in [4], rather than using fixed time interval bars (minute, hour, day, week, etc.), as is customary, proposes forming bars as a subordinated process of trading activity. Fixed time interval bars often oversample low-activity periods and undersample high-activity periods, and exhibit poor statistical properties, such as serial correlation, heteroscedasticity and non-normality of returns. According to the author, alternative bars, relative to the commonly used time bars, achieve better statistical properties, are more intuitive (particularly when the analysis involves significant price fluctuations) and lead to more robust conclusions. As the author mentions, the concept of resampled interval bars is not yet common in the literature; as a matter of fact, throughout the literature no actual experimentation was found that could validate De Prado's claims.

Cardoso and Neves [27] proposed a system based on genetic algorithms to create an investment strategy intended to be applied on the Credit Default Swaps (CDS) market. This market, similarly to the cryptocurrency market, is still growing, is subject to speculation and is quite volatile. The employed strategy utilized several instances of genetic algorithms with the objective of increasing profitability. The obtained results suggest that it is possible to create a profitable strategy using only technical analysis as input data, commonly reaching a return on investment (ROI) over 50% in the CDS market.

Jiang and Liang [28] present a model-less convolutional neural network trained using a deterministic deep reinforcement method in order to manage a portfolio of cryptocurrency pairs. Historic data sampled every 30 min from the 12 most-volumed cryptocurrencies of a given exchange is the only input of their system. The authors obtained an increase of 16.3% in their portfolio value, a Sharpe ratio of 0.036 and a maximum drawdown of 29.6%, while the B&H method, used as benchmark, ended with a final portfolio value of 0.87, a Sharpe ratio of −1.54 and a 38.2% maximum drawdown.

Nakano et al. [29] utilize a seven-layered artificial neural network (ANN) to create trading signals in the Bitcoin exchange market. Only technical indicators derived from Bitcoin/USD 15-minute return data were used as input. The authors defined three strategies for generating a trading signal: two of them entered both long and short positions, while one used only long positions (similarly to this work's strategy). The strategy using only long positions, without considering the bid–ask spread, yielded a final ROI of 12.14%, while the simple B&H strategy obtained only 2.28%. The two remaining strategies, as expected, performed substantially better, exceeding a 50% ROI in both cases. Finally, the authors noted that increasing the number of technical indicators (from 2 to 5) originated better results.

McNally et al. [30] try to predict the price direction of Bitcoin. For this, a Bayesian optimized recurrent neural network (RNN), a Long Short Term Memory (LSTM) network and ARIMA were explored and compared. The daily historical prices combined with two daily blockchain related features were the only inputs. The LSTM achieved the highest classification accuracy of 52.7%, followed by the RNN with 50.2% and ARIMA with 50.05%. The authors concluded that Deep Learning models require significant amounts of data, and 1-minute precision data would have been used if available while developing the work.

Greaves and Au [31] tried to predict whether Bitcoin's price increased or decreased in the next hour, using accuracy as the classification metric. In this paper, LR, SVM, a Neural Network (NN) and a baseline (percentage of the average price increase) were used. However, the distinctive aspect of this paper is the inputs used: apart from the ''current Bitcoin price'', the remaining features were all blockchain network features (e.g. ''mined bitcoin in the last hour'' or ''number of transactions made by new addresses in a given hour''). They obtained accuracies of 53.4% for the baseline; the SVM followed with 53.7%, then LR with 54.3%, and finally the NN was the best with 55.1%. The authors concluded that using input data from the blockchain alone offers limited predictability, as price is dictated by exchanges and these fall outside the scope of blockchains. Finally, it is presumed that price related features obtained from cryptocurrency exchanges are the most informative in regards to future price prediction. Similarly, Tan and Yao [32] concluded that a series with technical indicators yields better results in terms of returns, relative to time series with weekly data, in forecasting foreign exchange rates on a weekly basis with an NN model.

Żbikowski in [33] used a set of 10 technical indicators calculated from Bitcoin's historical price (with 15-min precision) as input to investigate the application of SVM with Box Theory and Volume Weighting in forecasting price direction in the Bitcoin market, with the purpose of creating trading strategies. A simple B&H strategy used as baseline, which obtained a ROI of 4.86%, was outperformed by the BOX-SVM, with 10.6% ROI, and the VW-SVM, with 33.5% ROI. Mallqui and Fernandes [34] similarly attempt to predict the price direction of Bitcoin, but on a daily basis. The authors, besides the OHLC values and volume, experimented with adding several blockchain indicators, as well as a few ''external'' indicators (such as crude oil and gold future prices, S&P500 futures, etc.). Several attribute selection techniques were employed, and all consistently considered the OHLC values and volume the most relevant attributes. Several ensemble and individual learning methods were experimented with in this work; nonetheless, the best performers were the SVM by itself and an ensemble of a Recurrent Neural Network and a Tree Classifier. The SVM obtained a final accuracy of 59.4% (utilizing 80% of the original dataset for training) and the ensemble an accuracy of 62.9% (utilizing 75% of the original dataset for training).

Akyildirim et al. [35] predict in the twelve most liquid cryptocurrencies, utilizing data with sampling frequencies ranging from the daily to the minute level. The authors utilized the methodologies SVM, LR, RF and ANN, with historical prices and technical indicators as inputs; the objective is predicting, in binary form, the price direction in the next time step. The ANN performed the worst, with an accuracy slightly under 55%; it was concluded that no significant gain was acquired from using an ANN, although a larger sample size should be experimented with. LR obtained accuracies averaging 55%, the SVM averaged slightly above 56% and, finally, RF obtained the best accuracies at around 59%.

3. Implementation

In order to validate that utilizing resampled data rather than time sampled data is in fact advantageous, a single trading system capable of forecasting financial movements on both types of data was developed. The forecasting results obtained from the various resampled datasets are presented and compared against each other in Section 4.
3.1. System's architecture

This trading system contains 5 different forecasting methodologies used to detect the best entry and exit points in the cryptocurrency market. In order to predict these best entry and exit points in a financial market, the direction of price, rather than price levels, is forecast in this work. This method has been proven effective and profitable by many authors in the literature, as can be seen in the state-of-the-art results mentioned in Section 2. Simply put, this work attempts to solve a binary classification problem.

It is expected beforehand that the predictions made by the ensemble will live up to expectations by exceeding the performance of each individual learner's predictions. In order to create a good ensemble, it is generally believed that the base learners should be as accurate and diverse as possible [26]. Thus, in this proposed system, a set of individual learners with these characteristics was chosen.

LR is one of the most widely used linear statistical models in binary classification situations [12]. It is a simple and easily implementable model that offers a reasonable performance, as was seen in Table 1. The linear SVM and the non-linear RF methods are used due to their well above average performances: throughout the supervised learning literature [36], and as was seen in Table 1, these two methods generally achieve top performances when compared to other learning methods, and are recommended by the authors. GTB is a non-linear method employed through the extreme gradient boosting (XGBoost) framework, an efficient and scalable implementation of gradient boosting known for winning many machine learning competitions. These algorithms were considered preferable over Artificial Neural Network related models, which, due to their multi-layered process, do not provide an idea of the relationship between inputs and outputs [16].

In this system, as shown in Fig. 1, the common systematic procedure to predict time series using machine learning algorithms was put into practice by following these modules in this specific order: Data module; Technical Rules module; Machine Learning module; and Investment Simulator module. In this same figure, the input and output of each module is also represented.

Fig. 1. Simplified overview of the proposed system's modular architecture.

3.2. Data preparation module

The starting point for this work's proposed system is a collection of homogeneously sampled time series that will be processed in this module in order to acquire informative features. In this work, several alternative bar representations attained through data resampling are to be tested and compared to the common time sampling; thus, the original time series dataset, containing columns for Open time, Volume and Open, High, Low and Close prices, is now to be resampled. The original datasets are as detailed as one can get from Binance: 1 sample per minute. Contrarily to the customary time representation, the resampled data is intended to place more importance on high-frequency intervals by overrepresenting their constituting individual samples, when compared to low-frequency intervals.

The reasoning behind the resampling procedures is that, from a machine learning perspective, if both high and low-frequency intervals are equally represented, a mistaken prediction yields an equivalent penalization in both cases. Now, assuming high-frequency intervals are overrepresented relatively to low-frequency intervals, if a high-frequency interval with many samples (all consisting of small time intervals) were to be erroneously predicted, a penalization would be appointed for each single misclassification, resulting in a collectively large penalization. On the other hand, an erroneous prediction on a low-frequency interval would also be penalized, but not as heavily, since it contains fewer samples to classify when compared to a high-frequency interval. Additionally, from a financial perspective, it is more important to successfully forecast high-frequency periods, as larger returns or losses can be obtained when compared to more stable low-frequency periods.

This work's resampling is intended to sample data according to a fixed variation threshold rather than a fixed time period. To exemplify the resampling procedure, a detailed description of the resampling mechanism for a fixed threshold consisting of a fixed percentual variation now follows.

Firstly, the percentual first order differencing of the closing values is calculated. Secondly, the sets of consecutive samples whose individual variations, aggregated together, reach or exceed a pre-defined threshold of absolute percentual variation must be identified. For this purpose, a customized cumulative sum of the percentual first order differencing is responsible for defining the boundaries of each group, by assigning different numerical identifiers to each sample. To do so, starting on a specific sample, the total cumulative absolute variation of that sample as well as the consecutively posterior samples are added up until the variation sum reaches or exceeds the fixed threshold. This occurrence dictates the starting and ending boundary of each group: whenever the threshold is crossed, a group ends on the current sample and a new one begins on the next sample, with the next numerical identifier and with the cumulative sum reset back to zero.

In the end, all samples of the original dataset with the same identifier are grouped into a single sample of the final dataset. To accomplish this, the Open and Close values of the new resampled data point are, respectively, the open value of the first entry and the close value of the last entry in the set of samples that make part of the respective group; the new High value is the highest of the high values out of all the entries in the group, and the new Low value is the lowest; lastly, the new Volume corresponds to the sum of the volumes of all data points in the group. This process is done in an orderly manner that iterates through all samples (in descending order, where the oldest entry is at the top and the most recent is at the bottom); therefore, only consecutive samples can be grouped together. This process is illustrated in Fig. 2: as can be observed, whenever the value of the seventh column (cumulative sum with restart when the threshold is reached) is equal to or larger than the fixed threshold, in this case 2%, the group being numbered with the identifier 'i' is closed, the value of the cumulative sum is restarted, and a new group with identifier 'i + 1' is initiated on the next sample.

Fig. 2. Regrouping process illustration for a percentage of 2%.
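The following pandas sketch is one possible reading of this grouping mechanism (it assumes a DataFrame df with Open, High, Low, Close and Volume columns, oldest row first, and a 2% threshold); it is an interpretation of the description above, not the authors' released code:

import pandas as pd

def resample_by_percentage(df, threshold=0.02):
    """Group consecutive 1-min bars until the summed absolute percentual
    close-to-close variation reaches the threshold (an interpretation of
    the procedure described above, not the authors' code)."""
    pct = df["Close"].pct_change().abs().fillna(0.0)

    group_ids, gid, cum = [], 0, 0.0
    for var in pct:              # oldest row first, consecutive samples only
        group_ids.append(gid)
        cum += var               # customized cumulative sum
        if cum >= threshold:     # threshold crossed: close group 'i' ...
            gid += 1             # ... and start group 'i + 1' on next sample
            cum = 0.0            # restart the cumulative sum

    grouped = df.groupby(pd.Series(group_ids, index=df.index))
    return grouped.agg(Open=("Open", "first"),    # open of first entry
                       High=("High", "max"),      # highest high in the group
                       Low=("Low", "min"),        # lowest low in the group
                       Close=("Close", "last"),   # close of last entry
                       Volume=("Volume", "sum"))  # summed volume

The amount-based variant would accumulate absolute price differences (df["Close"].diff()) instead of percentual ones, with the threshold expressed in price units.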
It is worth adding that financial markets generate a discrete time series, making it impossible to achieve a perfectly even-sized resampled dataset with data from any cryptocurrency market. Nevertheless, utilizing finer-grained original data (i.e., with the smallest possible time period between data points) is more likely to produce a regularly and consistently sized resampled dataset. Similarly, if the threshold is increased, unevenness in the resampled data becomes less noticeable, as differences between resampled data points become percentually more insignificant, resulting in a seemingly more consistent dataset. The values considered for the threshold should be larger than the paid fees, so that each profitable transaction covers the fees completely, but not too large, otherwise too many data points would be grouped, resulting in loss of information and, consequently, loss of profit.

To resample the data according to different parameters, only the first order difference series and a fixed threshold must be defined according to the chosen resampling parameter. In this work, besides time resampling, two alternative resampling processes were tested. Nevertheless, the process of resampling is analogous for either method; thus, only the percentage resampling procedure is described in full detail.

In spite of the original dataset being already sampled according to time, due to the lack of rearrangement it contains plenty more data points than each resulting resampled dataset of the previous rearrangements. To make comparisons between the different resampling methods fair, a simple time grouping was implemented. This type of resampling consists of simply grouping consecutive data points. In the end, both percentage and time resampled datasets ought to have a similar amount of samples in order to generate valid comparisons.

3.3. Technical indicators module

This module is responsible for computing and outputting the respective technical indicators for each data sample of the resampled time series outputted from the Data Module. Using technical analysis simplifies the problem of forecasting future market movements to a pattern recognition problem (also referred to as a classification problem), where the inputs are technical indicators derived from historical prices and the outputs are an estimate of the price or its trend [37].

The remainder of this section is dedicated to explaining in further detail the technical indicators applied on the historical data of cryptocurrency markets, as well as the motivations for their choice. A more elaborate description of each technical indicator can be found in [8,38,39].
• Exponential Moving Average (EMA): The EMA is used to dampen the effects of short term oscillations through a smooth line representing the successive average. In an EMA, the body of data to be averaged moves forward with each new trading period: old data is dropped as new data becomes available, which causes the average to move along the time scale. By definition, this indicator is based on past prices; it is a trend following indicator, unable to anticipate, only to react. Thus, the moving average is a lagging indicator: it is able to follow a market and announce that a trend has begun, but only after the fact.
The major difference between a regular moving average and an EMA is that the latter assigns a greater weight to recent data, having the ability to react faster to recent price variations. Because in this work the volatile cryptocurrency market is being forecast, the EMA is employed. Its formula is defined as follows:

EMA_p(n) = EMA_{p−1} + (2/(n + 1)) × [Close_p − EMA_{p−1}].  (7)

In Eq. (7), n refers to the number of time periods in the body of data to be averaged and p refers to the current period. Note that the first value coincides with the closing value. In this work, six different EMA signals with time periods corresponding to n = {5, 10, 20, 50, 100, 200} were implemented.

• Moving Average Convergence–Divergence (MACD): The MACD is a simple momentum oscillator technique calculated using the difference between two exponential moving averages. To calculate the MACD line, traditionally, a 26-period EMA of the price is subtracted from a 12-period EMA, also of the price [8]. The MACD line is usually plotted at the bottom of a price chart along with the signal line. The signal line is an EMA of the MACD; commonly a 9-period EMA is used [8]. Finally, the difference between the two former lines composes the MACD histogram. In Eq. (8) the three components of the MACD indicator are presented:

MACD = EMA_12 − EMA_26,  (8a)
Signal Line = EMA_9(MACD),  (8b)
MACD_histogram = MACD − Signal Line.  (8c)

The real value of the histogram is spotting whether the difference between the MACD and signal lines is widening or narrowing. When the histogram is positive but starts to fall toward the zero line, the uptrend is weakening. Conversely, when the histogram is negative but starts to move upward towards the zero line, the downtrend is losing its momentum. Although no actual buy or sell signal ought to be given until the histogram crosses its zero line, the histogram turns provide earlier warnings that the current trend is losing momentum. The actual buy and sell signals are given when the MACD line and signal line cross, that is, when the histogram is zero: a crossing by the MACD line above the signal line can be translated into a buy signal, and the opposite would be a sell signal. Histogram turns are best used for spotting early exit signals from existing positions.
• Relative Strength Index (RSI): The RSI is a momentum oscillator that measures the speed and change of price movements. This technical indicator is used to evaluate whether a market is overbought or oversold. The formula used in its calculation is:

RSI(n) = 100 − 100/(1 + RS(n)),  with  RS(n) = AverageGains / AverageLosses.  (9)

In Eq. (9), n refers to the number of time periods being analysed (traditionally 14 time periods are used [8]), AverageGains refers to the average gain of up periods during the last n periods and AverageLosses refers to the average loss of down periods during the last n periods. The RSI varies between a low of 0 (indicating no up periods) and a high of 100 (indicating exclusively up periods). Traditionally, movements above 70 are considered overbought, while an oversold condition would be a move under 30. An RSI divergence with price is a warning of trend reversal.

• Rate Of Change (ROC): The ROC is a simple momentum oscillator used for measuring the percentual amount that prices have changed over a given number of past periods. Traditionally, 10 time periods are used [8]. A high ROC value indicates an overbought market, while a low value indicates an oversold market. The formula for calculating this indicator is as follows:

ROC(n) = 100 × (Close_p − Close_{p−n}) / Close_{p−n}.  (10)

In Eq. (10), n refers to the number of time periods being analysed, and p corresponds to the current period.

• Stochastic Oscillator: The Stochastic Oscillator's intent is to determine where the most recent closing price is in relation to the price range of a given time period. Three lines are used in this indicator: the %K line, the fast %D line and the slow %D line. The %K line, the most sensitive of the three, simply measures, percentually, where the closing price is in relation to the total price range for a selected time period, typically of 14 periods [8]. The second line, the fast %D, is a simple moving average of the %K line, usually of 3 periods [8]. The %K line compared with this three-period simple moving average of itself, the fast %D line, corresponds to the fast stochastic: when the %K line is above the %D line, an upward trend is indicated, and the opposite indicates a downward trend; if the lines cross, the trend is losing momentum and a reversal is indicated. However, the %K line is very sensitive to price changes and, due to the erratic volatility of the fast %D line, many false signals occur with rapidly fluctuating prices. To combat this problem, the slow stochastic was created: it consists of comparing the original fast %D line with a 3-period simple moving average smoothed version of this same line, called the slow %D line [8]. In other words, the slow %D line is a doubly smoothed moving average of the %K line. The formulae for the %K and both %D lines are as follows:

%K_n = 100 × (Close_t − min(Low)_n) / (max(High)_n − min(Low)_n);  (11a)
Fast %D = SMA_p(%K_n);  (11b)
Slow %D = SMA_p(Fast %D).  (11c)

In Eq. (11), Close_t corresponds to the current close, min(Low)_n refers to the minimum Low of the previous n periods, and max(High)_n refers to the maximum High of the previous n periods. SMA_p corresponds to a simple moving average of p periods.
In addition to these 3 signals, the histogram method from the MACD indicator was replicated for the stochastic oscillators to indicate trend reversals (when a sample of this indicator crosses zero), as well as whether the trend is upwards or downwards.

• Commodity Channel Index (CCI): The CCI is an oscillator used to measure the variation of a price from its statistical mean. A high CCI value indicates that prices are unusually high compared to the average price, meaning the market is overbought, whereas a low CCI value indicates that prices are unusually low, meaning it is oversold. Traditionally, high and low CCI values correspond, respectively, to over 100 and under −100. The formula for calculating this indicator is as follows:

CCI(n) = (1/0.015) × (TP_p − SMA_n(TP_p)) / σ_n(TP_p),  with  TP_p = (High_p + Low_p + Close_p) / 3,  (12)

where TP_p is referred to as the typical price, and High_p, Low_p and Close_p represent the respective prices for the time period p. The item SMA_n(TP_p) is the simple moving average of the typical price for the previous n time periods under consideration, and σ_n(TP_p) corresponds to the mean deviation of the SMA during the previous n periods. Commonly, 14 periods are used for the SMA in this indicator [8]. Lambert, the creator of this indicator, set the constant 0.015 for scaling purposes, to ensure that approximately 70 to 80 percent of CCI values would fall between −100 and +100.

• On Balance Volume (OBV): The OBV indicator is a running total of volume. It relates volume to price change in order to measure whether volume is flowing into or out of a market, assuming that volume changes precede price changes.
The total volume for each day is assigned a plus or minus sign depending on whether the price closes higher or lower than the previous close: a higher close causes the volume for that day to be given a plus value, while a lower close counts as negative volume. A running cumulative total is then maintained by adding or subtracting each day's volume based on the direction of the market close. The formula used in its calculation is:

OBV_p = OBV_{p−1} + Volume_p,  if Close_p > Close_{p−1}
OBV_p = OBV_{p−1} − Volume_p,  if Close_p < Close_{p−1}  (13)
OBV_p = OBV_{p−1},  if Close_p = Close_{p−1}

In Eq. (13), p refers to the current period and p − 1 refers to the previous period. The OBV line should follow the same direction as the price trend: if prices show a series of higher peaks and troughs (an uptrend), the OBV line should do the same, and if prices are trending lower, so should the OBV line. It is when the OBV line fails to move in the same direction as prices that a divergence exists and warns of a possible trend reversal. It is the trend of the OBV line that is relevant, not the actual numbers themselves.

• Average True Range (ATR): The ATR indicator is used as a measurement of price volatility. Strong movements, in either direction, are often accompanied by large ranges (or large True Ranges), whereas weak movements are accompanied by relatively narrow ranges. This way, the ATR can be used to validate the enthusiasm behind a move or breakout: a bullish reversal with an increase in ATR would show strong buying pressure and reinforce the reversal, while a bearish support break with an increase in ATR would show strong selling pressure and reinforce the breaking of support.
To calculate the ATR, the True Range (TR) must first be calculated:

TR_p = max{High_p − Low_p; |High_p − Close_{p−1}|; |Low_p − Close_{p−1}|}.  (14)

In Eq. (14), p refers to the current period and p − 1 refers to the previous period. Having calculated the True Range, the next step is calculating the Average True Range. The ATR is a simple average of the previous n (traditionally, 14 time periods are used [8]) True Range values:

ATR_p(n) = (ATR_{p−1} × (n − 1) + TR_p) / n.  (15)

In Eq. (15), p refers to the current period, p − 1 refers to the previous period and n refers to the number of time periods to be analysed.

To summarize this section, Table 2 lists the used technical indicators. An additional column contains the respective time period parameter (previously denoted as n) for every indicator that possesses one; for the indicators that do not require an adjustable time period parameter, a '–' is placed instead. In this work, the time period corresponds to the set of older instants to be considered for each specific calculation.

Table 2
List of all indicators and (if applicable) their respective parameters, fed as input to the machine learning algorithms of this work's system.

Technical indicator | Time period value
Exponential moving average | {5, 10, 20, 50, 100, 200} periods
Moving average convergence divergence histogram | [signal EMA 9, fast EMA 12, slow EMA 26] periods
Relative strength index | 14 periods
Rate of change | 10 periods
Stochastic oscillator %K line | 14 periods
Stochastic oscillator fast %D line | 3 periods
Stochastic oscillator slow %D line | 3 periods
Histogram between %K line & fast %D line | –
Histogram between %K line & slow %D line | –
Commodity channel index | 14 periods
On balance volume | –
Average true range | 14 periods
positive or a negative variation in the next instant. In the rare
3.4. Machine learning module cases where the variation is precisely null, the outcome of the
previous instant is duplicated into the current one.
This module is the most complex and the main core of the A vector called vector y (represented in Eq. (17)) contains the
system, as so it was divided into 3 main components, each with target classification for each sample in the dataset. Each entry
a different responsibility. In this system, the resampled data follows a binomial probability distribution, i.e., y ∈ {0, 1}, where
is firstly scaled by standardization, then the target prediction 1 symbolizes a positive or bullish signal variation and 0 symbol-
vector is defined and lastly the actual machine learning training izes a negative or bearish signal variation. For a given currency
and forecasting procedures are executed. Fig. 3 contains a brief pair, being closet the closing price for a given time period t, and
illustration of this module’s main steps. closet −1 the closing price for the previous time period, t − 1, the
target for the given time period, yt , is defined in a probabilistic
3.4.1. Feature scaling way to employ the four learning algorithms as follows:
Feature scaling is a data preprocessing step, applied on the
⎨0, if Closet −1 × (1 + Fee) < Closet × (1 − Fee)

technical indicator series outputted from the previous module
through a technique called standardization. Standardization is the yt = 1, if Closet −1 × (1 + Fee) > Closet × (1 − Fee) , (17)
yt −1 , if Closet −1 × (1 + Fee) = Closet × (1 − Fee)

process of centring features by subtracting the features’ mean and
scaling by dividing the result by the features’ standard deviation.
Given that the utilized data was retrieved from Binance, to be
The standardized value, of a given sample, x, is calculated as
truly faithful to a real case scenario, only long positions can be
follows:
adopted by this system, short positions are disabled.
x−µ
xstandardized = , (16)
σ 3.4.3. Classifiers fitting and predicting
where µ is the mean of the training samples and σ is the standard This is the last component of the Machine Learning module.
deviation of the training samples. The objective of this component is, out of a dataset received as
Feature scaling is not only important if we are comparing input, creating a model with the learned data by training each of
measurements that have different units, but it is also a general the four classification algorithms and, in the end, creating a clas-
requirement for many machine learning algorithms. When the sifier ensemble with the output of each algorithm by averaging
input dataset contains features with completely different units each sample. The four individual learning algorithms can be fit in
of measurement, which is the case in this system, using feature whatever order.
scaling avoids the problem in which some features come to Before the actual training procedure begins, both the features
dominate solely because they tend to have larger values than (matrix X) and the target vector (vector y), must be split into
others [12]. train dataset and test dataset. All training procedures, including
cross-validation and grid searching, occur on the training datasets
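A short sketch of this step; the essential detail is that µ and σ are learned from the training samples only, so no information from the test set leaks into the scaling (X_train and the assumed X_test are the feature matrices of the split described in Section 3.4.3):

# Illustrative only: standardization (Eq. (16)) fit on training data alone.
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learns mu and sigma on train
X_test_std = scaler.transform(X_test)        # reuses them on unseen data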
3.4.2. Target formulation
The objective of the second component of this module is defining the target discrete values to be predicted, the class label, for each data point. This process is a requirement when using methods of supervised learning.

Throughout the literature, the bid–ask spread is an issue that is often overlooked. While its impact may be negligible in specific cases, it may also be responsible for harming the system's investments in a real situation if unaccounted for. In less liquid markets, specifically in the most ''exotic'' cryptocurrency pairs (whose financial worth is typically in the USD cents), the percentual difference between the best buyer and the best seller may become large enough that the common assumption of utilizing the closing value as both bid and ask values becomes quite far-fetched, leading to a disparity with reality that may no longer be acceptable. However, to the authors' knowledge, enquiring specific exchanges or sources for the spread at the current instant is the only method of acquiring this data free of charge; regardless, in order to assemble a decent amount of data, years of data gathering would be required. In this work, due to the unavailability of bid–ask spread values, only transaction fees are considered. Nevertheless, it should be pointed out that overlooking the importance of bid–ask spreads is not ideal, and prior to applying this system in a real scenario, a database containing the spread values should be built and taken into account to fully validate this strategy.

The buy and sell trading fees were taken into account when determining the ideal long position start and ending points. The trading fees applied in this system are Binance's general fee of 0.1%.

In order to decide whether the signal in the next instant has one of the two possible outcomes, a deterministic binary classification is proposed: the cryptocurrency market either has a positive or a negative variation in the next instant. In the rare cases where the variation is precisely null, the outcome of the previous instant is duplicated into the current one.

A vector y (represented in Eq. (17)) contains the target classification for each sample in the dataset. Each entry follows a binomial probability distribution, i.e., y ∈ {0, 1}, where 1 symbolizes a positive or bullish signal variation and 0 symbolizes a negative or bearish signal variation. For a given currency pair, with Close_t the closing price for a given time period t, and Close_{t−1} the closing price for the previous time period, t − 1, the target for the given time period, y_t, is defined, for use by the four learning algorithms, as follows:

y_t = 1,        if Close_{t−1} × (1 + Fee) < Close_t × (1 − Fee)
y_t = 0,        if Close_{t−1} × (1 + Fee) > Close_t × (1 − Fee)  (17)
y_t = y_{t−1},  if Close_{t−1} × (1 + Fee) = Close_t × (1 − Fee)

Given that the utilized data was retrieved from Binance, to be truly faithful to a real case scenario only long positions can be adopted by this system; short positions are disabled.
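A sketch of Eq. (17) in pandas, assuming Binance's 0.1% fee and the resampled Close series; the helper name make_target is ours, not the authors':

# Illustrative only: fee-aware binary target of Eq. (17); 1 = a bullish move
# large enough to cover both the buy and the sell fee, 0 = bearish.
import pandas as pd

FEE = 0.001  # Binance's general 0.1% fee

def make_target(close):
    buy_cost = close.shift(1) * (1 + FEE)  # Close_{t-1} plus the buy fee
    sell_gain = close * (1 - FEE)          # Close_t minus the sell fee
    y = pd.Series(index=close.index, dtype="float64")
    y[buy_cost < sell_gain] = 1.0          # profitable after fees: bullish
    y[buy_cost > sell_gain] = 0.0          # not profitable: bearish
    return y.ffill()  # exact ties inherit y_{t-1}; the first sample,
                      # having no previous close, stays undefined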
3.4.3. Classifiers fitting and predicting
This is the last component of the Machine Learning module. Its objective is, out of a dataset received as input, to create a model of the learned data by training each of the four classification algorithms and, in the end, to create a classifier ensemble by averaging the outputs of each algorithm for each sample. The four individual learning algorithms can be fit in any order.

Before the actual training procedure begins, both the features (matrix X) and the target vector (vector y) must be split into a train dataset and a test dataset. All training procedures, including cross-validation and grid searching, occur on the training datasets, while the test datasets are kept isolated throughout the procedures mentioned in the remainder of this section. In this work, 4 splits divide the time series into 5 equal sized intervals; that is, each interval contains the same number of samples. It is worth noting that, because the splits divide the number of samples, only the time rearranged series will be split into 5 time intervals with the same time duration. While the test interval size is consistent for all iterations, the train set successively increases in size with every iteration; it may be regarded as a superset of the train sets from the previous iterations. Ultimately, when using this method, no future data leaking occurs, and forecasting from a much earlier time point is enabled.

As mentioned in Section 2, each learning algorithm requires the initialization of a set of parameters before the actual learning process begins. These parameters are called hyper-parameters [13].

In order to tune the hyper-parameters of each learning algorithm, a simple grid search was carried out during a specific cross-validation procedure: time series cross-validation [40], a variation of the commonly used k-fold cross-validation destined specifically for time series data samples. In the kth split, the first k folds are returned as train set and the (k + 1)th fold as validation set. Grid-searching is a simple exhaustive search through a manually specified subspace of values with the purpose of finding the best values for each hyper-parameter [12]. During the model training step, in each fold, various instances of each learning algorithm are trained with all the possible hyper-parameter combinations and tested on the respective validation set. This way, the performance according to the negative log-loss metric is obtained on unseen data for each hyper-parameter combination, and the values that generate on average the best negative log-losses are chosen. Table 3 contains the grid employed for each hyper-parameter of each learning algorithm. Not too many parameters were grid searched, due to computational and time limitations.

It is worth adding that the negative log-loss classification metric was employed in this system because investment strategies profit from predicting the right label with high confidence: contrarily to other classification metrics, the negative log-loss takes into account not only whether each prediction is correct or not, but also the probability of each prediction [4].

The results for each hyper-parameter combination are averaged through all validation sets. The instance of the model containing the average highest cross-validated scoring parameters is then refit on the whole training dataset to measure its performance; this best instance is also later used as the final model to issue the final predictions on the test dataset.

In this system, even though the grid search is not too extensive, each test dataset utilizes a set of hyper-parameters computed during its respective training procedure on the training dataset; thus, no single optimal case was used for all datasets. Besides not compromising the partitioning of future data whatsoever, this method was employed because there are countless cryptocurrencies with entirely different behaviours and characteristics: a single model could be accurate for a specific cryptocurrency but disastrous for another.

This employed cross-validation is identical to the previously explained process of splitting train and test data; the only differences are that rather than only 4 splits, 10 splits are done, and train and validation data is being split instead of train and test data. This is a simple notation difference. In this work, a simple grid search was carried out in this proposed system in order to find the best hyper-parameters according to the negative log-loss metric.
Table 3 contains the grid employed for each hyper-parameter of At this point the second part of supervised learning, the fore-
each learning algorithm. It is worth adding that not too many casting, commences. For each specific training dataset, all learn-
parameters were grid searched due to computational and time ing algorithms apply the fit model on the respective testing
limitations. dataset. All 4 algorithms generate a probability estimate of the
It is worth adding that the negative log-loss classification respective class labels for each sample of the test dataset. The
metric was employed in this system because investment strate- classification odds for each test sample of the 4 models, are now
gies profit from predicting the right label with high confidence: combined through soft majority voting into an ensemble. In soft
contrarily to other classification metrics, the negative log-loss majority voting, the probabilities for the prediction of each class
takes into account not only whether each predictions is correct label of the test dataset previously calculated are unweightedly
or not, but also the probability of each prediction [4]. averaged, designated as Ensemble Voting (EV).
The results for each hyper-parameter combination are av- At last, this module outputs five different trading signals gen-
eraged through all validation sets. The instance of the model erated containing the forecasting data. Four trading signals are
containing the average highest cross-validated scoring parame- originated from the individual learning algorithms while the last
ters is then refit on the whole training dataset to measure its is originated from the unweighted average of these four. All the
performance on the whole training dataset continuously. This trading signals have the same time frame, and will be simulated
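A compact sketch of this unweighted soft-voting step is given below; it assumes scikit-learn-style classifiers exposing predict_proba and is illustrative rather than the authors' exact code:

```python
# Sketch of the Ensemble Voting (EV) combination: unweighted average of the
# class-1 probabilities of the four fitted models (illustrative names).
import numpy as np

def ensemble_vote(models, X_test):
    """Soft majority voting: average probabilities, then take the likeliest class."""
    proba = np.mean([m.predict_proba(X_test)[:, 1] for m in models], axis=0)
    labels = (proba >= 0.5).astype(int)  # 1 = buy/hold signal, 0 = sell/out signal
    return proba, labels

# proba, labels = ensemble_vote([lr, rf, gtb, svc], X_test)
```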

Table 3
Respective hyper-parameters utilized for each learning algorithm and their values (or sets of values, if grid searched).

Logistic regression
  C: [0.1, 0.01, 0.001, 0.0001]
  Penalty: L2
  Solver: Stochastic average gradient [41]

Random forest
  Number of trees: 400
  Min. num. of samples required to be leaf node: 9
  Min. num. of samples required to split node: 9
  Splitting criteria: Gini
  Minimum impurity threshold: 10^-7
  Amount of features considered: √(num. of features)

Gradient tree boosting
  Number of trees: 100
  Max tree depth: 3
  Min. loss reduction required to make partition: 1
  Minimum sum of instance weight: [1, 2]
  L2 regularization strength: [0, 1]
  Step size shrinkage: 0.01

Support vector classifier
  C: 1
  Kernel: linear
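Wiring one of these grids into the time-series-aware search described above might look like the following sketch (assuming scikit-learn; shown for the logistic regression row of Table 3 only):

```python
# Sketch of the per-dataset hyper-parameter search: grid search scored by
# negative log-loss over 10 chronological train/validation splits.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

search = GridSearchCV(
    LogisticRegression(penalty="l2", solver="sag"),  # fixed values from Table 3
    param_grid={"C": [0.1, 0.01, 0.001, 0.0001]},    # grid-searched values
    scoring="neg_log_loss",           # rewards confident, correct probabilities
    cv=TimeSeriesSplit(n_splits=10),  # time series cross-validation [40]
)
# search.fit(X_train, y_train)
# `search.best_estimator_` is refit on the whole training set and is the
# model used to issue the final predictions on the respective test set.
```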

3.5. Investment simulator module

The investment simulator module is responsible for translating the five trading signals, obtained as input from the predicted data in the Machine Learning module, into market orders, with the purpose of simulating the forecasts' performance in a real market environment.
The trading signal received as input consists of a series containing the probability of each sample being classified with the label 1. This series containing probabilities is easily converted into a dichotomous class label series according to the largest likelihood. After being converted into a discrete classification series, each entry of this series is interpreted as follows: a class label of 1 means the currency is forecast to be bullish in the next instant, representing a buy or hold signal; a class label of 0 means the currency is forecast to be bearish in the next instant, representing a sell or out signal. Binance trading fees are taken into consideration in the trading simulator with the purpose of reproducing a real market. When a class label 1 signals buy, all available capital is applied in opening a long position. Similarly, a class label 0 orders the system to exit the market and retrieve the available capital.
This module starts with a fixed capital for investment (by default, one unit of quote currency) and invests that capital according to the previously mentioned interpretation of the class label series.
To simulate the market orders, backtest trading is employed in this work. Because this backtest process utilizes historical data, the following two assumptions must be imposed for this proposed system:
1. Market liquidity: The markets have enough liquidity to conclude each trade placed by the system instantaneously and at the current price of their placement.
2. Capital impact: The capital invested by the algorithm has no influence on the market, as it is relatively insignificant.
Lastly, as a method to control risk on a trade-by-trade basis, stop-loss orders were implemented in this system.
In the end, a series of metrics, revealed and explained in Section 4.1, are calculated, plotted and stored into memory.

4. Results

This section starts by describing the financial data, the evaluation metrics and an additional strategy utilized as comparison baseline for this work's system. Afterwards, the overall results obtained are reported to evince that this system is valid, and subsequently the results for each different type of resampling are presented and analysed in the form of case studies. Lastly, follows a discussion and comparison of the results obtained with the time and alternative resampling procedures.
Regarding the utilized financial data, instead of using the nearly 400 currently available pairs in Binance, only the 100 pairs with the most traded volume in USD were selected. This selection filters out many pairs that have been recently listed and do not have a large enough dataset. Moreover, it becomes more likely that the selected markets are in conformity with the two backtest trading assumptions.
In order to use the maximum amount of available data, no common starting date was chosen for all currency pairs. Each currency pair's data begins at the moment Binance started trading and recording the specific currency pair. The ending date, on the other hand, is fixed at 30th October 2018 at 00h00. Out of the used cryptocurrency pairs, the largest pairs originally contain 676 827 trading periods while the smallest pairs contain 68 671 trading periods. Each trading period, as previously mentioned, has the duration of 1 min. The starting date varies from 14th July 2017 (Binance's official launch date) for the oldest pairs, up to 12th October 2018 for the most recently introduced pair.
The original data for the 100 selected currency pairs, prior to its usage in forecasting, must be resampled. As was explained in Section 3.4, to carry out the multiple resampling procedures, a fixed percentual variation threshold must be picked to define the approximate final size of each candle in the final dataset. In the subsequent results, a fixed percentage of 10% was chosen for the resampling procedures.
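As a rough illustration of such threshold-based resampling, a sketch follows, under the assumption that candles are aggregated until the close drifts a fixed percentage away from the aggregate candle's opening price; this is illustrative only, not the authors' exact procedure:

```python
# Sketch of percentage-based resampling: 1-min candles are merged until the
# close has moved `threshold` (e.g., 10%) away from the aggregate candle's open.
import pandas as pd

def percentage_resample(df: pd.DataFrame, threshold: float = 0.10) -> pd.DataFrame:
    candles, start = [], 0
    open_ = df["open"].iloc[0]
    for i, close in enumerate(df["close"]):
        if abs(close / open_ - 1) >= threshold:
            chunk = df.iloc[start:i + 1]
            candles.append({"open": open_,
                            "high": chunk["high"].max(),
                            "low": chunk["low"].min(),
                            "close": close,
                            "volume": chunk["volume"].sum()})
            start = i + 1
            open_ = close  # next aggregate candle opens at the last close
    return pd.DataFrame(candles)
```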

4.1. Evaluation metrics

As was mentioned before, the main goal of this proposed system is maximizing the negative logarithmic loss and the returns while minimizing the associated risk of this work's trading strategy. With this goal in mind, the following metrics of market performance were additionally calculated for each currency pair with the intention of providing a better analysis of the obtained results:

4.1.1. Return on Investment (ROI)
The return on investment measures the amount of return gained or lost in an investment relative to the initially invested amount. The simple standard formula of this metric is represented as follows:

ROI = (FinalCapital − InitialCapital) / InitialCapital × 100%,  (18)

where FinalCapital corresponds to the capital obtained from the investment bought with InitialCapital.

4.1.2. Maximum Drawdown (MDD)
The maximum drawdown measures the maximum decline, from a peak to a trough, before a new peak is attained. It is used to assess the relative downside risk of a given strategy [42]. This metric is calculated as follows:

MDD = max_{T ∈ (StartDate, EndDate)} [ max_{t ∈ (StartDate, T)} (ROI_t) − ROI_T ],  (19)

where ROI corresponds to the return on investment at the subscript's point in time and max_{t ∈ (StartDate, T)} (ROI_t) corresponds to the highest peak from the starting point until the instant T. In this work, as is customary, MDD is quoted as a percentage of the peak value.

4.1.3. Sharpe ratio
The Sharpe ratio is a method for calculating the risk-adjusted return. This ratio describes the excess return received for holding a given investment with a specific risk. The Sharpe ratio is calculated as follows:

SharpeRatio = (ROI − R_f) / σ,  (20)

where ROI corresponds to the return on investment, R_f is the current risk-free rate, 3.5%, and σ is the standard deviation of the investment's excess return.

4.1.4. Sortino ratio
The Sortino ratio is a modification of the Sharpe ratio metric. In contrast to the Sharpe ratio, the Sortino ratio includes only negative variations. The formula for calculating this metric is identical to the formula represented in Eq. (20); however, the denominator corresponds only to the standard deviation of the values observed during periods of negative performance.

4.1.5. Additional performance parameters
Besides the four presented metrics used to evaluate the performance of an investor, the following parameters are also used in the classification of this proposed system:
• Percentage of periods in market: percentage of time periods where a long position was in effect, out of all the available time periods of the testing set;
• Percentage of profitable positions: calculated by dividing the number of trades that generated a profit (with fees included) by the total number of trades; this probability is complementary to the percentage of non-profitable positions;
• Average profit per position: average percentual profit or loss per position;
• Largest percentual gain: most profitable position;
• Largest percentual loss: greatest loss.

With the purpose of validating this system, besides testing with real market data through backtest trading and analysing the results with the just introduced metrics, a previously mentioned investment strategy, the Buy and Hold (B&H), is applied. According to the Efficient Markets theory [43], prices are independent of each other, hence no profit can be made from information-based trading. In conformity with this theory, the best strategy is employing a B&H strategy, regardless of market fluctuations. Due to the limited number of existent solutions for trading in multiple cryptocurrency pair markets, and to put the Efficient Markets theory to the test, B&H was defined as a benchmark strategy intended to be an additional term of comparison for this work's proposed system.

4.2. Case studies

In this section, the case studies and the main results obtained through the application of the described strategies are presented.

4.2.1. General overview
First of all, a general idea of the overall performance obtained with each different methodology, independently of the resampling method utilized, is provided. To achieve this, whilst utilizing the B&H strategy as benchmark, the five trading signals generated by the four individual learning algorithms and the ensemble voting method are individually averaged and subsequently compared against each other. The results are represented in Table 4.
Through analysis of Table 4, it is clear that the B&H strategy yields the worst results. Out of all individual learning algorithms, the SVC generates the highest ROI and is the least risky; as such, it can be considered the best individual learner. LR follows in terms of the metric ROI but is quite risky as a forecasting method. Both RF and GTB performed averagely and both yielded modest ROIs. Nonetheless, EV is by far the most robust alternative. As can be seen, the trading signal obtained from EV on average outperforms the remaining ones according to the majority of metrics.
The obtained accuracies are on par with or, in the case of EV, exceed the accuracy of most papers regarding cryptocurrency exchange market forecasting throughout the state-of-the-art (Table 1). The risk measures and returns on investment are also superior on average.
In conclusion, there is no clear performance hierarchy for each of the 4 individual learning algorithms. Nonetheless, the trading signals generated by the EV methodology obtain the top performance out of all tested methods. In any case, all five methodologies clearly outperform the plain B&H strategy.

4.2.2. Time resampling
Time resampling is the most widely used method of resampling; therefore, it is the first sampling method analysed, to be considered as the comparison baseline.
Firstly, a temporal graph showing the evolution of the average ROI per market for each trading signal is represented in Fig. 4. In other words, this figure contains, for each time instant, the respective average ROI of all 100 analysed markets. In this figure it can be observed that from May until September 2018 the analysed markets' prices drop considerably, as can be confirmed by B&H's signal. EV and SVC were the methodologies that suffered the smallest losses in this period. Overall, most methodologies are quite conservative when the market suddenly rises, hence a B&H strategy earns more in these periods. On the other hand, this conservative behaviour is also responsible for minimizing losses in sudden price drops, contrarily to the B&H strategy.
Secondly, Table 5 contains the general statistics obtained for the time resampling method. In this table, a better method cannot be clearly defined. Relatively to EV, SVC achieved a worse predictive power and is slightly more risky, yet obtained nearly twice the profits. Nonetheless, EV stands out due to the better accuracies and NLL, as well as due to the remarkably small percentage of periods in market and top values in the risk metrics, thus suggesting that this is clearly the least risky alternative.
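Before the per-market case studies, the link between the simulator of Section 3.5 and the metrics of Section 4.1 can be made concrete with a brief sketch; the fee value and names below are illustrative assumptions, not the authors' implementation:

```python
# Sketch: turn a class-label series (1 = long, 0 = out) into an equity curve
# with a per-trade fee, then compute Eq. (18) and Eq. (19) from it.
import pandas as pd

def simulate(close: pd.Series, labels: pd.Series, fee: float = 0.001) -> pd.Series:
    """Equity of a long/flat strategy starting with 1 unit of quote currency."""
    returns = close.pct_change().fillna(0)
    position = labels.shift(1).fillna(0)         # act on the following candle
    trades = position.diff().abs().fillna(0)     # entries and exits pay the fee
    return ((1 + returns * position) * (1 - fee * trades)).cumprod()

def roi(equity: pd.Series) -> float:
    return (equity.iloc[-1] - equity.iloc[0]) / equity.iloc[0] * 100   # Eq. (18)

def max_drawdown(equity: pd.Series) -> float:
    peak = equity.cummax()
    return ((peak - equity) / peak).max() * 100                        # Eq. (19)
```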

Table 4
Comparison between the Buy & Hold strategy and each of the five methodologies employed.
Parameter B&H LR RF GTB SVC EV
Average obtained results (for all markets and resampling methods)
Final ROI −10.5% 519% 295% 335% 538% 615%
Accuracy 40.20% 53.50% 53.62% 53.51% 53.38% 56.28%
Negative log-loss −20.6 −0.7031 −0.6992 −0.6918 −0.6975 −0.6829
Periods in market 100% 56.0% 52.9% 55.0% 50.4% 39.9%
Profitable positions 19.5% 60.2% 56.2% 58.3% 58.8% 57.6%
Profit per position −10.6% 0.57% 0.31% 0.33% 0.61% 0.69%
Largest gain 35.6% 17.6% 17.7% 17.3% 18.4% 15.0%
Largest loss −46.0% −14.4% −15.0% −15.1% −14.5% −13.2%
Max drawdown 79.9% 57.6% 60.6% 62.0% 54.7% 49.3%
Annual sharpe ratio −0.164 0.769 0.413 0.312 0.848 0.945
Annual sortino ratio 0.169 2.374 1.665 1.407 2.568 2.821

Table 5
Average obtained results for the Buy & Hold and each of the five methodologies employed for time resampling.
Parameter B&H LR RF GTB SVC EV
Average obtained results (all markets are considered)
Final ROI −27.9% 25.6% 3.92% 3.30% 39.5% 18.7%
Accuracy 37.40% 54.77% 55.58% 54.84% 54.70% 59.26%
Negative log-loss −21.6 −0.6931 −0.6896 −0.6890 −0.6757 −0.6746
Periods in market 100% 51.4% 44.4% 48.9% 44.9% 27.7%
Profitable positions 15.0% 57.6% 50.8% 53.4% 56.0% 55.0%
Profit per position −27.9% 0.05% 0.01% −0.02% 0.09% 0.06%
Largest gain 20.8% 18.1% 20.8% 19.6% 18.4% 15.3%
Largest loss −48.7% −15.2% −14.5% −14.9% −16.2% −12.3%
Max drawdown 77.0% 55.6% 56.8% 59.6% 52.5% 42.5%
Annual sharpe ratio −0.352 0.075 −0.135 −0.327 0.228 0.288
Annual sortino ratio −0.137 0.608 0.330 −0.074 0.985 1.102

Fig. 4. Average accumulated ROI [%] for each instant from the starting until the last test point with time resampled data.

Thirdly, two specific markets and their return on investment evolution are represented in Figs. 5 and 6, and their statistics are shown in Table 6. In these figures, the top subfigure contains a candlestick representation of the utilized resampled data for each cryptocurrency pair. As can be seen, the background is divided into five different coloured periods. Each coloured period represents a different sub-dataset of the train–test splitting procedure. This splitting procedure divides the data in five intervals with the same amount of samples. Additionally, it is also visible that the ROI line only begins at the start of the second dataset, which is in accordance with the fact that the first sub-dataset is only used during the training procedure. Note that, for the trading signals where stop-loss orders are active (all except the ones originated by B&H) throughout this whole chapter, the largest loss peaks at approximately −20%, the stop-loss activation percentage.
Fig. 5 contains the example of a market where this system performed remarkably well. From this figure and the upper half of Table 6 it is clear that the trading signal from the LR outperformed the remaining signals. EV obtained a clear second place with mostly above average results. Fig. 6, on the other hand, contains a market with one of the worst performances obtained with this system. From this figure and the lower half of Table 6 it can be concluded that EV yielded the best results with the lowest risk. However, if the trading signals from which it is derived do not perform well, there is only so much EV can do. In any case, the B&H strategy was outperformed by the remaining trading signals.

4.2.3. Percentage resampling
The results from this type of resampling were only narrowly outperformed by amount resampling in terms of predictive power and final returns. Nevertheless, on average these results are noticeably less risky. Due to its low risk and only slightly lower profits and predictive power, this resampling method originates a more desirable outcome; hence, it is worth carrying out a more in depth analysis of the obtained results.
Firstly, a temporal graph showing the instantaneous average ROI of the 100 analysed markets for each of the methodologies is presented in Fig. 7. In this figure, although not as significant as in the previous resampling method, the drop in the markets from May until September 2018 is also visible. Once again, the signal from B&H was clearly surpassed by all remaining trading signals in terms of profits.
Secondly, a table containing the general statistics obtained for the percentage resampling method follows in Table 7. In this table, even though on average EV's returns are slightly inferior to SVC's, the former distinctly obtains the highest predictive power and the lowest risk out of all trading signals; thus it may be concluded that the EV method clearly generates the trading signals with the top results. Once again, the B&H strategy clearly obtained the worst average performance.
Thirdly, the results for two specific markets and their return on investment evolution are represented in Figs. 8 and 9, and their statistics are shown in Table 8. These figures follow the same structure and contain the same markets as the figures presented for time resampling.

Fig. 5. ROI variations for currency pair POEETH (Po.et/Ethereum) with time resampling applied.

Fig. 6. ROI variations for currency pair ADABTC (Cardano/Bitcoin) with time resampling applied.

Fig. 7. Average accumulated ROI [%] for each instant from the starting until the last test point with percentage resampled data.

Fig. 8. ROI variations for currency pair POEETH (Po.et/Ethereum) with percentage resampling applied.

Table 6
Results obtained with time resampling for the POEETH and ADABTC markets.
Parameter B&H LR RF GTB SVC EV
POEETH market with time resampling
Final ROI −28.0% 180% −15.6% 51.8% 39.7% 71.3%
Accuracy 39.02% 55.51% 58.75% 58.36% 50.99% 60.06%
Negative log-loss −21.06 −0.6851 −0.6732 −0.6840 −0.6660 −0.6730
Periods in market 100% 60.9% 35.2% 41.4% 73.3% 31.0%
Profitable positions 0% 63.5% 52.2% 54.4% 55.3% 55.7%
Profit per position −28.0% 0.205% −0.02% 0.060% 0.048% 0.102%
Largest gain NA 3.55% 21.3% 16.7% 25.9% 16.7%
Largest loss −28.0% −20.6% −20.3% −7.7% −21.0% −20.5%
Max drawdown 70.5% 27.14% 63.51% 19.16% 57.27% 26.93%
Annual sharpe ratio 0.222 2.190 −0.057 1.174 0.900 1.471
Annual sortino ratio 0.368 3.405 −0.152 1.869 1.383 2.130
ADABTC market with time resampling
Final ROI −76.3% −72.6% −50.3% −44.4% −67.1% −34.1%
Accuracy 35.54% 45.31% 55.21% 52.78% 43.41% 57.22%
Negative log-loss −22.26 −0.7053 −0.6905 −0.6925 −0.6632 −0.6826
Periods in market 100% 80.0% 50.7% 54.2% 84.7% 44.5%
Profitable positions 0% 48.3% 44.8% 39.1% 49.1% 51.2%
Profit per position −76.3% −0.19% −0.13% −0.13% −0.21% −0.08%
Largest gain NA 8.12% 10.8% 21.8% 19.4% 10.8%
Largest loss −76.3% −20.9% −14.4% −20.4% −15.4% −15.2%
Max drawdown 79.30% 74.20% 57.61% 53.87% 71.34% 45.58%
Annual sharpe ratio −2.057 −2.350 −1.541 −1.051 −1.842 −0.872
Annual sortino ratio −3.005 −3.372 −2.370 −1.719 −2.743 −1.380

Table 7
Average obtained results for the Buy & Hold and each of the five methodologies employed for percentage resampling.
Parameter B&H LR RF GTB SVC EV
Average obtained results (all markets are considered)
Final ROI 16.4% 792% 494% 692% 1063% 923%
Accuracy 41.38% 53.03% 53.00% 53.06% 52.64% 55.23%
Negative log-loss −20.2 −0.7112 −0.7028 −0.6928 −0.7187 −0.6866
Periods in market 100% 56.7% 55.8% 58.0% 52.3% 43.3%
Profitable positions 25.0% 61.9% 55.2% 58.0% 59.7% 59.0%
Profit per position 16.4% 0.82% 0.51% 0.67% 1.17% 0.97%
Largest gain 56.3% 20.9% 15.0% 16.3% 19.1% 13.6%
Largest loss −39.9% −13.2% −14.2% −14.2% −13.9% −12.7%
Max drawdown 80.5% 56.6% 59.9% 60.5% 55.4% 50.1%
Annual sharpe ratio 0.066 1.297 0.868 0.824 1.360 1.496
Annual sortino ratio 0.605 3.907 2.911 2.689 3.765 4.325

Fig. 9. ROI variations for currency pair ADABTC (Cardano/Bitcoin) with percentage resampling applied.

Fig. 10. Entry and exit points for the TRXBTC (Tron/Bitcoin) pair during 17 and 18 January 2018 for percentage resampling.

Fig. 11. Entry and exit points for the IOTABTC (Internet of Things Application/Bitcoin) pair from mid September 26 until mid October 2 2018 for percentage
resampling.

Table 8
Results obtained with percentage resampling for the POEETH and ADABTC markets.
Parameter B&H LR RF GTB SVC EV
POEETH market with percentage resampling
Final ROI 64.2% 1436% 850% 3145% 5747% 4416%
Accuracy 42.59% 56.60% 52.33% 56.71% 53.92% 56.54%
Negative log-loss −19.83 −0.6860 −0.7058 −0.6894 −0.6824 −0.6859
Periods in market 100% 46.5% 64.4% 43.8% 59.5% 44.9%
Profitable positions 100% 60.9% 54.1% 61.0% 62.3% 61.3%
Profit per position 64.2% 1.6% 1.0% 3.7% 2.1% 4.9%
Largest gain 64.2% 2.6% 2.5% 4.8% 6.4% 7.3%
Largest loss NA −2.2% −8.9% −3.7% −8.2% −4.3%
Max drawdown 89.98% 16.60% 54.60% 20.03% 70.94% 23.92%
Annual sharpe ratio 1.041 3.969 2.661 3.868 3.404 3.970
Annual sortino ratio 2.510 9.326 6.125 10.192 10.829 11.927
ADABTC market with percentage resampling
Final ROI −71.1% −56.8% −45.8% −85.7% −29.1% 10.1%
Accuracy 40.0% 52.04% 54.88% 50.24% 55.97% 59.02%
Negative log-loss −20.72 −0.6921 −0.6903 −0.6941 −0.674 −0.6837
Periods in market 100% 58.4% 52.5% 61.9% 23.6% 29.4%
Profitable positions 0% 58.6% 47.7% 48.4% 47.1% 53.8%
Profit per position −17.1% −0.14% −0.08% −0.20% −0.19% 0.03%
Largest gain NA 9.5% 9.4% 14.4% 12.5% 9.2%
Largest loss −71.1% −20.3% −11.9% −24.2% −8.8% −11.8%
Max drawdown 88.64% 75.77% 57.4% 87.7% 32.7% 35.7%
Annual sharpe ratio −1.483 −1.190 −1.028 −3.168 −1.013 0.471
Annual sortino ratio −2.303 −1.812 −1.637 −4.061 −1.682 0.662

Fig. 8 contains the example of a market where this system performed well. From this figure and Table 8 it can be concluded that the signal from EV only obtained top results for the risk ratios. Results regarding ROI and especially predictive power are slightly inferior to the top values. In this example the top values are dispersed throughout the different trading signals, which goes to show that when utilizing varied learning algorithms, one's weakness can be overcome by another learning algorithm and, ultimately, only the best traits of these algorithms are hopefully noticeable in the EV; in this case, that did not happen.
Fig. 9, on the other hand, contains the same market that underperformed in the previous case study. From this figure and Table 8 it can be concluded that EV obtained top predictive results with a low risk, and even achieved a positive ROI, contrarily to the remaining trading signals. In fact, almost the opposite of what happened in the previous market is verified here. The trading signal from EV outdid the remaining trading signals in most metrics. This occurrence does not happen every time, but it is one potentiality of the EV. Obviously the contrary, where the trading signal produced by EV is the worst of all, does happen as well, but it is a much rarer occurrence in this system, as can be confirmed by the tables containing the average overall results for any resampling method, where the EV trading signal on average excels the remaining.
By now, it is evident that this resampling method is able to generate on average much higher ROIs than time resampling. However, in order to achieve a more thorough analysis of this work's system, specific entry and exit points from periods with different characteristics are represented in Figs. 10 and 11. Fig. 10 contains a volatile period of the cryptocurrency pair TRXBTC. Here it is visible that the system is mostly inside the market, as the overall trend is bullish. Fig. 11, on the other hand, contains a relatively calm period of the cryptocurrency pair IOTABTC. By comparing the two figures, it is clear that the system is more inactive during calm intervals, which is in accordance with the intention of taking advantage of highly volatile situations.

4.2.4. Amount resampling
This case study contains an analysis of the results obtained when the system utilizes amount resampled data. This type of resampling procedure generates on average the best performance in terms of the ROI metric for the trading signals generated by EV. Hence, similarly to the previous case study, a more in depth analysis will be carried out. Firstly, a temporal graph showing the instantaneous average ROI of the 100 analysed markets for each of the methodologies is represented in Fig. 12. In this figure, similarly to percentage resampling, a slight drop is still visible from around May until September 2018. This resampling method generates an overall higher ROI relatively to the percentage resampling method. Once again, the B&H signal was clearly surpassed by all remaining trading signals in terms of returns.
Secondly, a table containing the general statistics obtained for the amount resampling method follows in Table 9. In this table, yet again, the trading signal produced by EV clearly outperforms the remaining methodologies. This method obtains the highest profits, the best predictive power and the lowest risk out of all available trading signals. The B&H strategy visibly performed the worst, as usual. Relatively to the previous case studies, it may be concluded that on average this resampling method generates the highest profits, but is riskier than both logarithmic amount and percentage resampling.
Thirdly, the results for two specific markets are represented in Figs. 13 and 14 and their statistics are shown in Table 10. Likewise, these figures follow the same structure and contain the same markets as the figures presented in the previous case studies.

Fig. 12. Average accumulated ROI [%] for each instant in the test dataset with amount resampled data.

Table 9
Average obtained results for the Buy & Hold and each of the five methodologies employed with amount resampled
data.
Parameter B&H LR RF GTB SVC Ensemble voting
Average obtained results (all markets are considered)
Final ROI −46.92% 859.1% 354.2% 215.5% 617.8% 1100.3%
Accuracy 40.84% 53.24% 53.12% 53.20% 53.58% 55.61%
Negative log-loss −20.4 −0.7021 −0.7012 −0.6921 −0.6868 −0.6839
Periods in market 100% 57.75% 55.28% 55.56% 50.96% 44.41%
Profitable positions 13.00% 59.72% 52.80% 55.49% 57.42% 57.10%
Profit per position −46.92% 0.919% 0.357% 0.210% 0.671% 1.22%
Largest gain 8.66% 18.26% 16.08% 18.94% 19.12% 16.44%
Largest loss −55.6% −16.28% −16.31% −17.00% −14.48% −14.10%
Max drawdown 81.6% 61.0% 63.7% 65.3% 55.5% 53.7%
Annual sharpe ratio −0.440 0.531 0.276 0.114 0.786 0.816
Annual sortino ratio −0.411 1.500 1.410 0.744 2.479 2.565

Fig. 13 and Table 10 contain the example of a market where this system performed notably well. Once more, the trading signal from the EV overcomes the remaining signals in all aspects. Nonetheless, it is observable that with this resampling method worse results were obtained for this market relatively to when percentage resampled data was used. The fact that lower values were obtained can be, up to some degree, blamed on the lower predictive performance and the higher riskiness, as can be seen by comparing the metrics from this table with Table 8. Fig. 14, on the other hand, contains the market that underperformed. From these figures and their respective values in Table 10, a conclusion similar to the one taken in the previous case study may be drawn. The SVC's signal performance, one more time, exceeded the remaining trading signals. However, in this case, similarly to the logarithmic amount case study, the EV was unable to excel both the SVC and the remaining trading signals.
For comparison purposes, the entry and exit points of the same periods and markets used in the previous case study, but with data resampled according to an amount, are represented in Figs. 15 and 16. Fig. 15 contains a highly volatile period of the cryptocurrency pair TRXBTC. Relatively to the percentage case study (Fig. 10), it is clear that with amount resampled data the trading signal typically does not keep the same long position active for as much time. With this type of resampling, relatively to percentage resampling, the triggered long positions are of shorter durations. Therefore, this strategy is prone to sustain heavier losses, namely due to transfer fees, which explains the larger risk verified. Fig. 16, on the other hand, contains a relatively calm period of the cryptocurrency pair IOTABTC. Note that all candles have durations of above 2 h due to the reduced price variation. The largest variation seen in this figure is of around 5% in the course of 12 h; earlier in 2018, this same pair regularly had this same percentual variation in a matter of minutes. Once again, the judgement that this system is clearly more active in volatile moments is upheld.

4.2.5. Results and discussion
First of all, like all works mentioned in the state-of-the-art, the results obtained in this work were obtained through backtest simulations, hence all the characteristic limitations of this technique also apply to this work [4]. Nevertheless, the system developed in this work attempts to reproduce as closely as possible a realistic environment in order to extrapolate its results into an actual active trading system.
The obtained accuracies in this work are on par with or, in the case of ensemble voting, exceed the accuracy of most papers regarding cryptocurrency exchange market forecasting throughout the state-of-the-art (Section 2 and Table 1). The risk measures and returns on investment are also superior on average. Only Mallqui and Fernandes [34] obtained an absolutely superior accuracy. However, contrarily to testing only one specific market, this system was tested against 100 different markets. The levels of accuracy seen in Mallqui and Fernandes's work were exceeded by a few markets tested in this work (as can be discerned by observing Table 5).
In relation to the different resampling procedures, Table 11 contains a comparison showing how frequently the results from the two alternative resampling procedures exceed the results obtained with time resampling. It is worth noting that the values displayed in this table were obtained by comparing the same given market and respective machine learning algorithm among the different resampling procedures. All markets and machine learning algorithms were taken into account to compute the values shown in this table.
Through analysis of the results obtained and Table 11, the following observations can be made:
• Relatively to the results provided by the time resampled procedure, it can be concluded that the developed strategy is not flawlessly suited for this type of resampling. Neither B&H nor any of the five remaining models achieved absolutely brilliant results. In any case, it may be concluded that the EV methodology is most definitely the superior one. It is worth praising the high predictive performances, as can be perceived by the high accuracies and NLLs, particularly with the trading signals provided by EV, implying that this system has predictive potential for this resampling method. This suggests that if this same machine learning formula were to be combined with a more fine-tuned strategy, there is potential to achieve more impressive results, risk and return-wise.

Fig. 13. ROI variations for currency pair POEETH (Po.et/Ethereum) with amount resampling applied.

Fig. 14. ROI variations for currency pair ADABTC (Cardano/Bitcoin) with amount resampling applied.

Table 10
Comparison between the Buy & Hold strategy and each of the five methodologies employed for the amount
resampling in POEETH and ADABTC markets.
Parameter B&H LR RF GTB SVC Ensemble voting
POEETH market with amount resampling
Final ROI −39.35% 67.34% 367.7% 904.0% 1070.0% 1930.8%
Accuracy 42.31% 52.96% 52.41% 56.33% 54.90% 57.18%
Negative log-loss −19.92 −0.6881 −0.6970 −0.6870 −0.6793 −0.6817
Periods in market 100% 63.4% 67.3% 49.4% 52.9% 45.9%
Profitable positions 0% 56.9% 53.4% 57.7% 58.6% 59.2%
Profit per position −39.35% 0.060% 0.39% 0.98% 1.2% 2.0%
Largest gain NA 4.11% 2.4% 4.1% 3.7% 11.6%
Largest loss −39.35% −16.7% −7.7% −0.84% −0.66% −1.7%
Max drawdown 90.3% 75.3% 65.0% 37.1% 40.4% 24.9%
Annual sharpe ratio 0.2394 0.9534 1.855 2.051 2.799 3.092
Annual sortino ratio 0.4224 1.861 4.854 7.718 7.222 11.979
ADABTC market with amount resampling
Final ROI −85.87% −84.06% −86.45% −79.24% −45.78% −58.85%
Accuracy 39.37% 51.28% 49.51% 51.23% 52.96% 54.66%
Negative log-loss −20.94 −0.6916 −0.7176 −0.6935 −0.6732 −0.6882
Periods in market 100% 70.0% 71.1% 60.9% 64.3% 59.6%
Profitable positions 0% 52.1% 46.6% 49.3% 50.4% 50.7%
Profit per position −85.87% −0.135% −0.213% −0.200% −0.084% −0.113%
Largest gain NA 12.2% 13.5% 10.2% 16.4% 15.4%
Largest loss −85.87% −12.9% −21.6% −11.5% −8.4% −9.7%
Max drawdown 88.58% 85.5% 87.5% 79.9% 50.6% 61.9%
Annual sharpe ratio −2.224 −2.802 −3.018 −2.475 −0.960 −1.394
Annual sortino ratio −3.197 −3.845 −4.025 −3.341 −1.645 −2.015

• Looking at the results obtained with the percentage resampled data, it may be concluded that, return and risk-wise, it is clearly preferable relatively to the time resampled procedure. Nonetheless, even though the returns were plainly superior, the predictive power is inferior.

Fig. 15. Entry and exit points for the TRXBTC (Tron/Bitcoin) pair during most of 17 and 18 January 2018 for amount resampling.

Fig. 16. Entry and exit points for the IOTABTC (Internet of Things Application/Bitcoin) pair from mid September 26 until mid October 2 2018 for amount resampling.

Table 11
Proportion of markets resampled according to alternative methods that surpass the results of time resampling (out of 500 market–machine learning algorithm combinations).
Alternative resampling: Percentage Amount
Markets that obtained ROI higher than time resampling 69.6% 61.4%
Markets that obtained ROI at least 10x higher than time resampling 12.2% 55.4%
Markets that obtained Sharpe ratio higher than time resampling 72.4% 63.8%
Markets that obtained Sortino ratio higher than time resampling 72.8% 63.8%

• Finally, the results generated by the amount resampled data, once again, clearly outperform the time procedure return and risk-wise, but the predictive power is certainly inferior, as with the percentage procedure. Relatively to percentage resampled data, this resampling procedure generally yields a better ROI; in fact, almost all markets that acquired a higher profit relatively to time resampling got a profit at least 10 times higher. This type of resampling also got a slightly superior predictive power, yet it clearly has a higher risk associated when compared to percentage resampling.
• Regarding the fact that time rearrangement on average holds the lowest ROI, but contradictorily also obtains the best accuracies and negative log-losses out of all the resampling methods employed, a possible explanation is that most investors utilize similar investment strategies and prediction algorithms based on similarly time sampled data. Furthermore, market manipulation is entirely permitted in cryptocurrency exchange markets. Because the algorithms and strategies utilized by investors are related, and due to their collective influence, the current ongoing time series is constantly being heavily impacted. Hence, when an algorithm similar to the ones that collectively crafted, and whose influence is deeply embedded in, a given time series intends to backtest trade with this same time series, it seems plausible that its overall predictive power is enhanced. Nevertheless, to assign this phenomenon as the unambiguous source of this issue would require further investigation.

Independently of the resampling method, this system is clearly more active in volatile periods than in calm periods, as was intended. Throughout these case studies, it was verified that specific markets, such as ADABTC, will invariably yield losses independently of the resampling procedure. A portfolio management layer could possibly withdraw these markets from the list of tradeable markets in order to prevent these types of losses. On the upside, it was verified that if this work's system was able to turn a profit with a time resampled financial series, most likely the same given market attains far larger returns with any of the two alternative resampling procedures. One important takeaway worth noting is that, independently of the utilized strategy, higher accuracies or NLL do not inevitably translate into a higher ROI or a lower risk. To conclude, as a rule, both alternative resampling procedures consistently provide a higher ROI relatively to time resampling, suggesting that financial time series resampled in accordance with an alternative metric, rather than time, do in fact have the potential of creating a signal more prone to earning larger profits with less risk involved.

5. Conclusion

In this work a system combining several machine learning algorithms with the goal of maximizing predictive performance was described, analysed and compared to a B&H strategy. All aspects of this system, namely the target formulation, were designed with the objectives of maximizing returns and reducing risks always in mind. To validate the robustness of this system's performance, it was tested on 100 cryptocurrency exchange markets through backtest trading.
Based upon this work, it may be concluded that all four distinct learning algorithms consistently bore positive results, better than random choice or the B&H strategy. Nonetheless, the trading signal generated by the ensemble voting method produced by far the best results.
Independently of the utilized learning algorithm, the outcome of utilizing data resampled according to alternative metrics, namely data resampled according to a fixed percentage as well as a fixed amount, proved to generate significantly higher returns than the commonly used time sampled data. The overall results were in accordance with what was anticipated: rearranging financial time series according to an alternative metric does in fact construct a series prone to generating larger returns, which is in accordance with one of the main intentions of this work.
It is worth noting that the return on investment obtained in this system with the alternative procedures is in some markets excessive. This fact is attributed to the absence of bid–ask spread data, as was mentioned in Section 3. Putting it differently, this work shows that the three alternative methods offer more profitable ROIs relatively to time resampling; however, it is unlikely that this system would do as well, return-wise, in an actual real scenario in face of bid–ask spreads.

In future work, among other improvements, the following suggestions seem to be the most promising: incorporation of bid–ask spreads, to make the system more realistic; dynamically adjusting the resampling thresholds as a function of specific indicators, instead of the static thresholds applied in this work; experimenting with an alternative set of technical indicators or other indicators (such as fundamental indicators or data extracted from social media); and perfecting the utilized set of machine learning algorithms with the aid of statistical testing, while additional machine learning procedures, namely evolutionary algorithms, should be tested in an attempt to address the shortcomings of the ones applied.

Declaration of competing interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2020.106187.

Acknowledgement

This work is funded by FCT/MCTES through national funds and when applicable co-funded EU funds under the project UIDB/EEA/50008/2020.

References

[1] G. Hileman, M. Rauchs, Global Cryptocurrency Benchmarking Study, Cambridge Centre for Alternative Finance, Vol. 33, 2017.
[2] E. Hadavandi, H. Shavandi, A. Ghanbari, Integration of genetic fuzzy systems and artificial neural networks for stock price forecasting, Knowl.-Based Syst. 23 (8) (2010) 800–808.
[3] J. Nadkarni, R.F. Neves, Combining neuroevolution and principal component analysis to trade in the financial markets, Expert Syst. Appl. 103 (2018) 184–195.
[4] M.L. De Prado, Advances in Financial Machine Learning, John Wiley & Sons, 2018.
[5] S. Nakamoto, et al., Bitcoin: A Peer-to-Peer Electronic Cash System, Working Paper, 2008.
[6] A. Narayanan, J. Bonneau, E. Felten, A. Miller, S. Goldfeder, Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction, Princeton University Press, 2016.
[7] T. Klein, H.P. Thu, T. Walther, Bitcoin is not the New Gold–A comparison of volatility, correlation, and portfolio performance, Int. Rev. Financ. Anal. 59 (2018) 105–116.
[8] J.J. Murphy, Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications, Penguin, 1999.
[9] R.C. Cavalcante, R.C. Brasileiro, V.L. Souza, J.P. Nobrega, A.L. Oliveira, Computational intelligence and financial markets: A survey and future directions, Expert Syst. Appl. 55 (2016) 194–211.
[10] Y.S. Abu-Mostafa, A.F. Atiya, Introduction to financial forecasting, Appl. Intell. 6 (3) (1996) 205–213.
[11] D.W. Hosmer Jr., S. Lemeshow, R.X. Sturdivant, Applied Logistic Regression, Vol. 398, John Wiley & Sons, 2013.
[12] S. Raschka, Python Machine Learning, Packt Publishing Ltd, 2015.
[13] K.P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
[14] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[15] L. Breiman, Bagging predictors, Mach. Learn. 24 (2) (1996) 123–140.
[16] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics, Springer New York, 2009.
[17] G. Louppe, Understanding random forests: From theory to practice, 2014, arXiv preprint arXiv:1407.7502.
[18] T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016, pp. 785–794.
[19] Y. Freund, R.E. Schapire, et al., Experiments with a new boosting algorithm, in: ICML, Vol. 96, Citeseer, 1996, pp. 148–156.
[20] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55 (1) (1997) 119–139.
[21] J.H. Friedman, Stochastic gradient boosting, Comput. Stat. Data Anal. 38 (4) (2002) 367–378.
[22] J.H. Friedman, Greedy function approximation: a gradient boosting machine, Ann. Stat. (2001) 1189–1232.
[23] R. Mitchell, E. Frank, Accelerating the XGBoost algorithm using GPU computing, PeerJ Comput. Sci. 3 (2017) e127.
[24] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297.
[25] H.-T. Lin, C.-J. Lin, R.C. Weng, A note on Platt's probabilistic outputs for support vector machines, Mach. Learn. 68 (3) (2007) 267–276.
[26] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms, Chapman and Hall/CRC, 2012.
[27] J.M.R. Cardoso, R. Neves, Investing in credit default swaps using technical analysis optimized by genetic algorithms, 2017.
[28] Z. Jiang, J. Liang, Cryptocurrency portfolio management with deep reinforcement learning, in: 2017 Intelligent Systems Conference (IntelliSys), IEEE, 2017, pp. 905–913.
[29] M. Nakano, A. Takahashi, S. Takahashi, Bitcoin technical trading with artificial neural network, Physica A 510 (2018) 587–609.
[30] S. McNally, J. Roche, S. Caton, Predicting the price of Bitcoin using machine learning, in: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), IEEE, 2018, pp. 339–343.
[31] A. Greaves, B. Au, Using the Bitcoin transaction graph to predict the price of Bitcoin, 2015.
[32] J. Yao, C.L. Tan, A case study on using neural networks to perform technical forecasting of forex, Neurocomputing 34 (1–4) (2000) 79–98.
[33] K. Żbikowski, Using volume weighted support vector machines with walk forward testing and feature selection for the purpose of creating stock trading strategy, Expert Syst. Appl. 42 (4) (2015) 1797–1805.
[34] D.C. Mallqui, R.A. Fernandes, Predicting the direction, maximum, minimum and closing prices of daily Bitcoin exchange rate using machine learning techniques, Appl. Soft Comput. 75 (2019) 596–606.
[35] E. Akyildirim, A. Goncu, A. Sensoy, Prediction of cryptocurrency returns using machine learning, 2018.
[36] R. Caruana, A. Niculescu-Mizil, An empirical comparison of supervised learning algorithms, in: Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, pp. 161–168.
[37] L.A. Teixeira, A.L.I. De Oliveira, A method for automatic stock trading combining technical analysis and nearest neighbor classification, Expert Syst. Appl. 37 (10) (2010) 6885–6890.
[38] C.D. Kirkpatrick II, J.A. Dahlquist, Technical Analysis: The Complete Resource for Financial Market Technicians, FT Press, 2010.
[39] S.B. Achelis, Technical Analysis from A to Z, McGraw Hill, New York, 2001.
[40] R.J. Hyndman, G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, 2018.
[41] A. Defazio, F. Bach, S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, in: Advances in Neural Information Processing Systems, 2014, pp. 1646–1654.
[42] M. Magdon-Ismail, A.F. Atiya, An Analysis of the Maximum Drawdown Risk Measure, Citeseer, 2015.
[43] B.G. Malkiel, E.F. Fama, Efficient capital markets: A review of theory and empirical work, J. Finance 25 (2) (1970) 383–417.
