13-How Good Is Your Data
by Sunny Harris
TRADING TECHNIQUES
Liberté! Fraternité!
Is all data equal? If truth be told, I never gave it much thought. I have been using one vendor nearly exclusively for about 20 years. My fills are good enough. My closing prices seem to match what I see on television or find online. As long as the profits roll in, there has been no reason to question the data.
But then I was told by another vendor that my vendor's data is off by just enough to generate a side income, through the slippage from actual price to the price I am presented. My curiosity was piqued, and so I decided to investigate. First, I set up a spreadsheet and compared the two vendors. To keep it simple, I considered only the past five years of data. My data experiment ran from June 30, 2005, to June 29, 2010.
[...] file. The instrument I chose was the Russell 2000 index, which has different symbols in different software, like RUT, $RUT, and RU2000. I selected the Russell 2000 because of its high liquidity and ease of use, and because it is something little guys like us can trade.
Figure 1 shows the beginning of the spreadsheet, with the data of the two vendors (T and M) in the columns. At first glance it appeared that everything was in order, with small discrepancies here and there. The differences in the data, where there are any, seem to be out in the hundredths place, like 600.01 vs. 600.02. That wouldn't make much difference over time, with some errors to the positive and the negative. It seems like it should be a wash.
Next, I put columns in the spreadsheet to calculate the differences between the open, high, low, and close (OHLC) of each
Copyright Technical Analysis Inc.
Stocks & Commodities V. 29:2 (42-47): How Good Is Your Data? by Sunny Harris
Februar
vendor. Part of that spreadsheet is shown in Figure 2. At the top of each column, in the first row of data, is the result of calculating the sum of all the differences between the two vendors' OHLC data. I wouldn't have
You can do all the testing in the world, but when it comes to entering real trades, the markets will hand
FIGURE 1: DATA COMPARISON, VENDOR T VS. VENDOR M. The differences in the data seem to be in the hundredths. Will deviations this small affect your profits?
FIGURE 2: TOTAL DIFFERENCES, T VS. M. In the first row of data you see a sum of the differences in the open, high, low, and close between the two vendors. Now it gets interesting. The closes are 52 points lower, the opens are 40 points higher, the highs are 65 points lower, and the lows are 48 points higher.
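The difference columns and the summed first row described for Figure 2 can be sketched outside a spreadsheet. The following is a minimal Python illustration, not the author's actual workbook: the sample bars, the day labels, and the helper names `bar` and `summed_differences` are all hypothetical, and the prices are made up so that they differ only in the hundredths place.

```python
from decimal import Decimal

FIELDS = ("open", "high", "low", "close")


def bar(o, h, l, c):
    """Build one OHLC bar from price strings, keeping exact decimal values."""
    return tuple(Decimal(x) for x in (o, h, l, c))


# Hypothetical sample bars keyed by a day label; these are NOT the article's data.
vendor_t = {
    "day 1": bar("600.01", "605.10", "598.00", "602.50"),
    "day 2": bar("602.50", "607.00", "601.25", "606.75"),
    "day 3": bar("606.80", "610.00", "604.00", "609.10"),
}
vendor_m = {
    "day 1": bar("600.02", "605.10", "597.99", "602.50"),
    "day 2": bar("602.50", "607.02", "601.25", "606.74"),
    "day 3": bar("606.79", "610.00", "604.00", "609.12"),
}


def summed_differences(a, b):
    """Sum the signed differences (a minus b) per OHLC field over shared days."""
    totals = {field: Decimal("0") for field in FIELDS}
    for day in sorted(a.keys() & b.keys()):
        for i, field in enumerate(FIELDS):
            totals[field] += a[day][i] - b[day][i]
    return totals


totals = summed_differences(vendor_t, vendor_m)
for field in FIELDS:
    print(field, totals[field])
```

`Decimal` is used here instead of floats so that hundredths-place differences add up exactly; with binary floats, sums of values like 0.01 can drift by tiny amounts that would muddy exactly the kind of comparison the article is making.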
I found small discrepancies that led to large numbers when summed over time (Figure 4). On its own, an error of 0.01 doesn't seem like much. But when you add that up over five years of data, it is 1,257 trading days and an accumulated error of $12.57. Remember, each point is worth $100 on the RUT.
This is where it starts looking scary. Multiplying $12.57 x 100 gives you $1,257. That's over $1,000 out of the trader's pocket. It isn't huge, but if you are the vendor and you have 20,000 clients at $1,000 each, that comes to $20 million. That is $20 million over five years. Now I was beginning to understand what that vendor was talking about.
Still, I couldn't go anywhere with this bit of information. This situation was akin to having a clock shop where each clock tells time a bit off from every other clock in the shop. There's no way to tell what time it really is. Which clock is telling the right time?
This situation demands that I compare the data from vendor G to vendor T and also to vendor M. I'm not sure what I would find out if none of it matched, but if one matched one other, then I'll know something about the veracity of the vendor's data that didn't match.
Here's the spreadsheet I have for three vendors' data so far (see Figure 5). Back to the differences spreadsheet, I inserted columns for calculating the new spreads: vendors T versus G; T versus M; and G versus M. That setup will be compared against the other and maybe I'll get some clarity. The differences section of the spreadsheet can be seen in Figure 6.
Aha! Look at the zeroes in columns BD through BG. Reading the description in row B over those columns (shaded green), I see that the zeroes show up when comparing vendors M to G. Still, looking at the numbers over the header "Differences M v G," we see that despite all the zeroes there are discrepancies along the way, giving us (9.17) among the closing values.
As I scanned the columns of this spreadsheet comparison, I found that on September 17, 2008, there was a difference of (8.83) between the close of M and the close of G. That was where most of the error comes in.
How could these vendors have such differences among their data? Isn't the close the close, no matter who vends it?
Next, I called the Russell 2000 exchange and got the data
FIGURE 4: TOTAL DIFFERENCES, T VS. G. Although the discrepancies are small, when summed over five years of data it could accumulate to $12.57 per point on the Russell 2000.
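The accumulated-error arithmetic in the text can be checked step by step. The sketch below uses only the figures the article states (0.01 per day, 1,257 trading days, $100 per point, 20,000 clients at a rounded $1,000 each). It assumes the 12.57 is best read as index points that are then converted to dollars, and that the error always falls in the same direction; both are interpretations for illustration, and the variable names are hypothetical.

```python
from decimal import Decimal

ERROR_PER_DAY = Decimal("0.01")   # hundredths-place discrepancy per trading day
TRADING_DAYS = 1257               # five years of data, per the article
DOLLARS_PER_POINT = 100           # the article's stated point value for the RUT
CLIENTS = 20_000                  # the article's hypothetical client count
ROUNDED_PER_CLIENT = 1_000        # the article rounds $1,257 down to $1,000

# Assumption: every day's error has the same sign, so the errors accumulate.
accumulated_points = ERROR_PER_DAY * TRADING_DAYS

# Convert the accumulated points to dollars for one trader.
dollars_per_trader = accumulated_points * DOLLARS_PER_POINT

# The article's round-number estimate across all clients.
vendor_total_rounded = CLIENTS * ROUNDED_PER_CLIENT

# The same estimate without rounding the per-trader figure down.
vendor_total_unrounded = CLIENTS * dollars_per_trader

print(accumulated_points, dollars_per_trader)
print(vendor_total_rounded, vendor_total_unrounded)
```

Without the rounding, the same assumptions give a somewhat larger total than the article's round $20 million, which is why the $1,000 figure is kept as a separate named constant here.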
FIGURE 5: THREE VENDORS' DATA. Comparing data from three vendors will say something about the veracity of the data.
FIGURE 7: RUSSELL 2000 EXCHANGE DATA VS. T VS. M VS. G. If you look at the data of the close carefully, you will note there are slight discrepancies.
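The three-way comparison described in the text (T versus G, T versus M, G versus M, then scanning for the day where most of the error sits) can be sketched as follows. This is one possible arrangement, not the author's spreadsheet: the closing prices and day labels are made up, and `pairwise_close_report` is a hypothetical helper name. The sample is built so that two vendors agree except on a single day, loosely mirroring the pattern the article reports for M and G.

```python
from decimal import Decimal
from itertools import combinations

# Hypothetical closing prices per vendor; these are NOT the article's data.
closes = {
    "T": {"day 1": "600.01", "day 2": "610.00", "day 3": "605.52"},
    "M": {"day 1": "600.00", "day 2": "608.50", "day 3": "605.50"},
    "G": {"day 1": "600.00", "day 2": "610.00", "day 3": "605.50"},
}


def pairwise_close_report(series):
    """For each vendor pair, return (summed difference, worst day, worst difference)."""
    report = {}
    for a, b in combinations(sorted(series), 2):
        shared = sorted(series[a].keys() & series[b].keys())
        if not shared:
            continue  # no overlapping days, nothing to compare
        diffs = {d: Decimal(series[a][d]) - Decimal(series[b][d]) for d in shared}
        # The day with the largest absolute gap is where most of the error sits.
        worst_day = max(diffs, key=lambda d: abs(diffs[d]))
        report[(a, b)] = (sum(diffs.values()), worst_day, diffs[worst_day])
    return report


report = pairwise_close_report(closes)
for pair, (total, day, gap) in report.items():
    print(pair, total, day, gap)
```

A pair whose summed difference is dominated by one day's gap points to a single bad print rather than a systematic offset, which is a different kind of problem from the steady hundredths-place drift.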