Data Management for Forex Trading
“Garbage in, garbage out!” When we have inaccurate data, our tests and outputs will be corrupt. This could cause massive losses for the user. Hence, before we test and optimise our Robots, we need to make sure that our data is clean.
Data Management and Cleaning is a large field with many in-depth areas of study. For the purpose of this course, we are going to focus on Data Cleaning with regard to market data.
What is dirty market data?
Essentially, dirty market data is data that inaccurately reflects the current state of the market.
Types of problems:
1) Duplicate Entries
2) Missing Entries
3) Wrong Entries
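To make these problem types concrete, here is a minimal sketch in Python with pandas of what dirty hourly candle data can look like; the timestamps and prices are invented purely for illustration.

import pandas as pd

candles = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-08 10:00", "2024-01-08 11:00",
        "2024-01-08 11:00",                       # duplicate entry
        "2024-01-08 13:00",                       # the 12:00 candle is missing
        "2024-01-08 14:00",
    ]),
    "open":  [1.0951, 1.0948, 1.0948, 1.0955, -1.0953],  # negative price = wrong entry
    "close": [1.0948, 1.0950, 1.0950, 1.0953,  1.0957],
})
print(candles)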
The data cleaning process consists of the following steps:
1) Data Check
Before we check, we need to know what our acceptable level of cleanliness is. E.g. if we are trading intraweek on Open and Close prices, we can afford to ignore errors due to Sunday candles and those related to the High, Low and Volume.
Next, we should have an understanding of what clean data looks like. E.g. a stock price cannot be negative, but a futures spread derivative can.
Start by eyeballing the chart data to get a feel for the big picture (you may not be able to do that if you are dealing with big data; fortunately, we are not). Next, use statistical tools like MATLAB, R, Maple, FORTRAN or Excel to check the data.
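A rough sketch of such checks, assuming the hourly candles are held in a pandas DataFrame with a time column and OHLC price columns (as in the earlier sketch); the function name check_candles is just a placeholder.

import pandas as pd

def check_candles(candles: pd.DataFrame) -> None:
    candles = candles.sort_values("time")

    # 1) Duplicate entries: the same timestamp appearing more than once.
    dupes = candles["time"].duplicated(keep=False)
    print("duplicate timestamps:", int(dupes.sum()))

    # 2) Missing entries: gaps larger than one hour between consecutive candles.
    #    (Weekend gaps are normal for forex and would be whitelisted in practice.)
    gaps = candles["time"].diff().dropna()
    print("gaps larger than 1 hour:", int((gaps > pd.Timedelta(hours=1)).sum()))

    # 3) Wrong entries: prices that cannot occur, e.g. non-positive prices
    #    or a High below the Low.
    price_cols = [c for c in ("open", "high", "low", "close") if c in candles.columns]
    print("non-positive prices:", int((candles[price_cols] <= 0).any(axis=1).sum()))
    if {"high", "low"} <= set(candles.columns):
        print("high below low:", int((candles["high"] < candles["low"]).sum()))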
2) Data Cleaning/Cleansing/Scrubbing
After identifying the problem, we have four methods to solve or mitigate it.
a) Ignore the error
b) Delete the entry
c) Replace the entry
d) Modify the entry
Depending on the purpose of the data, we select one of the above options.
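As an illustration, here is a hedged sketch of options (b) and (c) on the same hypothetical hourly candles; which option is appropriate depends entirely on the strategy and the purpose of the data.

import pandas as pd

def clean_candles(candles: pd.DataFrame) -> pd.DataFrame:
    candles = candles.sort_values("time")

    # (b) Delete the entry: drop duplicate timestamps, keeping the first occurrence.
    candles = candles.drop_duplicates(subset="time", keep="first")

    # (c) Replace the entry: rebuild the hourly index and fill missing candles
    #     with the last known values. Note that for forex this also inserts
    #     weekend hours, which may then need to be dropped again.
    candles = (
        candles.set_index("time")
               .asfreq("1h")    # re-insert missing hours as NaN rows
               .ffill()         # replace them with the previous candle's values
               .reset_index()
    )
    return candles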
Modifying the entry may involve “massaging” the entire data set. This means we process the entire data set such that the error is almost irrelevant. However, our results can only be used as an estimate of the original. E.g. instead of EURUSD hourly data, we run our test on EURUSD SMA(5) data. This smooths out irregularities in the data.
Other ways to modify the data include creating a model to predict the next entry. A sketch of both approaches follows below.
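A rough sketch of both ideas, assuming hypothetical hourly candles with a close column: an SMA(5) of the close as the “massaged” series, and simple linear interpolation standing in for a predictive model.

import pandas as pd

def massage(candles: pd.DataFrame) -> pd.DataFrame:
    out = candles.copy()

    # Run tests on an SMA(5) of the close instead of the raw close: single bad
    # candles matter far less, but results are only an estimate of the original.
    out["close_sma5"] = out["close"].rolling(window=5).mean()

    # A very simple stand-in for a predictive model: fill a missing close by
    # linear interpolation between its neighbouring candles.
    out["close_filled"] = out["close"].interpolate(method="linear")
    return out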
3) Data Validation
Repeat Step 1.
Run trading backtests and optimisation using the data. Check if the output is as expected or out of line.
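A rough validation sketch, reusing the hypothetical check_candles() from the Step 1 sketch: re-run the checks on the cleaned data and compare basic statistics before and after cleaning, since large shifts would suggest the cleaning itself distorted the data.

import pandas as pd

def validate(raw: pd.DataFrame, cleaned: pd.DataFrame) -> None:
    check_candles(cleaned)   # the Step 1 checks should now come back clean

    # Compare summary statistics of the close before and after cleaning.
    summary = pd.concat(
        [raw["close"].describe(), cleaned["close"].describe()],
        axis=1, keys=["raw", "cleaned"],
    )
    print(summary)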
4) Post-Processing and Monitoring
Monitor the results to ensure correctness. A severe difference between live trading performance and testing could indicate data inaccuracy. Ensure that new data does not mix with the old, cleansed data.
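One way to keep new data from mixing with the old cleansed data, sketched here under the same assumptions and reusing the hypothetical clean_candles() from the Step 2 sketch: clean each new batch first, then append and de-duplicate on the timestamp.

import pandas as pd

def append_new_data(cleansed: pd.DataFrame, new_raw: pd.DataFrame) -> pd.DataFrame:
    new_clean = clean_candles(new_raw)   # never append uncleaned candles
    combined = pd.concat([cleansed, new_clean], ignore_index=True)
    combined = combined.drop_duplicates(subset="time", keep="first")
    return combined.sort_values("time").reset_index(drop=True)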
Challenges to Data Cleaning
1) Expensive
The bigger and more complicated our data, the more time, manpower (and money) it takes to ensure cleanliness. There is a positive relationship between the cleanliness of data and the computational cost of achieving it.
2) Inaccurate/Ineffective Cleaning Methods
Poor cleaning methods not only fail to clean our data, they may pollute it further. This may cause additional problems during testing. If these errors go unchecked, they can potentially cause our trading systems to fail.
3) Difficulty in maintaining Clean Data
Whenever new data is available, we need to restart the cleaning process. Hence, data cleaning is a
continuous process and requires much effort.