New Post has been published on TRADE MASTER TEAM
New Post has been published on https://trademasterteam.com/forex-strategies/how-to-build-a-winning-machine-learning-forex-strategy-in-python-creating-the-feature-space-2/30184
How to Build a Winning Machine Learning FOREX Strategy in Python: Creating the Feature Space (2)
For the rest of them, we need two more loops. The first loop goes through the key list. Remember, once we hit momentum, we need to loop through the keys associated with momentum, so it's for j in keylist[i]. Because we're already looping through the i's, we're good. I know it's kind of complicated: we're looping through the i-th element of the key list, which corresponds to one of these upper-level entries, like momentum or stochastic.

Once we're in there, we want to loop through the columns of that dictionary for the key we're on. For each key, that is, each period, we calculate multiple columns of data, so we need to loop through those columns. We do that with for k in list(dictlist[i][j]). So what are we doing here? In dictlist[i][j], i selects the actual element, such as stochastic or Williams, and j is a key inside that one. The keys correspond to the periods, so if we're on Williams, this gives us all of the columns inside the Williams dictionary for the j-th key. I know that's kind of complicated, but just stare at it for a while and you'll understand.

Now we create the column ID for this element. The column ID is the column feature name for i, plus str(j), where j is whichever key (period) we're looking at, plus k, which is the name of the column. For instance, when we go inside the Bollinger dictionary, we get the name bollinger, then 15, which is the period we're computing it for, and then k. Since we have three Bollinger bands for each period, k can be upper, mid or lower, so this creates three different column IDs. Then we set the master frame column with that ID equal
to dictlist[i][j][k]. That's how it's indexed, and it took me a long time to figure out how to do this. Well, that is it: at the end of all this looping, we should have a completely populated master frame.

Now that we have the master frame, we need to address a couple of issues. One of them is that we will have NaN values in our data frame, and for machine learning we do not want NaN values. So I'm going to introduce something called the threshold, which I'll set to 70% of the length of the data frame. We compute it by rounding 0.7 times the length of the master frame, which gives a whole number of rows that is approximately 70% of the data frame. Later, if a column does not have at least that many non-NaN values, we're going to get rid of that column completely. Basically, the threshold says: if we don't have at least this much clean data, we drop it.

Next, I'm going to set the master frame to also have the price data in it. We add new columns for open, high, low and close (I originally included the ask volume as well, but we don't need it, so get rid of that one) and set them equal to the same columns of prices.

Another issue we need to address is that the Heikin-Ashi data is resampled, which means it has empty data in between. Remember, we resampled it to 15 hours, so it only lines up with our other data once every 15 hours.
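To see where those gaps come from, here is a small sketch with made-up hourly closes rather than the series' data; the column name ha15close is hypothetical. Resampling to 15-hour bars and joining the result back onto the hourly index leaves NaN on every row that isn't a bar timestamp, and backfilling fills each gap with the next bar's value. The video calls fillna(method='bfill'); recent pandas versions deprecate the method= argument in favour of .bfill(), which does the same thing.

```python
import numpy as np
import pandas as pd

# Made-up hourly closes standing in for the real price data (assumption).
hourly_index = pd.date_range("2024-01-01", periods=48, freq="h")
prices = pd.DataFrame({"close": np.arange(48, dtype=float)}, index=hourly_index)

# Resample to 15-hour bars, the way the Heikin-Ashi frame was resampled.
# "ha15close" is a hypothetical column name used only for this sketch.
ha15 = prices["close"].resample("15h").last().to_frame("ha15close")

# Join onto the hourly index: only rows whose timestamp matches a 15-hour
# bar label get a value; every other row is NaN.
master = prices.join(ha15)
gaps_before = int(master["ha15close"].isna().sum())

# Backfill: each NaN takes the next non-NaN value below it.
master["ha15close"] = master["ha15close"].bfill()
gaps_after = int(master["ha15close"].isna().sum())

print(gaps_before, gaps_after)
```

With 48 hourly rows this prints 44 2. The two rows after the last bar's timestamp have nothing below them to backfill from, which is the kind of trailing NaN row the final row-wise dropna removes. One design point worth checking on your own data: bfill copies each bar's value onto the hours before its timestamp, and resample labels bars by their left edge by default, so earlier rows can end up holding prices from later hours.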
For the 14 hours in between each Heikin-Ashi candle, though, the value is going to be NaN, so we want to backfill all those NaN values. We take the column we named through the loop, which for the open will be the Heikin-Ashi 15 open column, and set it equal to that same column with .fillna() applied, using the backfill method. We do the same for all the other Heikin-Ashi columns: high, low and close.

Now that we have all of those, we drop the columns that don't meet the threshold condition, which means dropping columns where more than about 30% of the data is NaN, since 30% is the complement of our 70% threshold. First we create a cleaned master frame as a copy of the master frame, so that we have two different master frames and don't lose anything if something goes wrong. Then we set the cleaned master frame equal to itself with .dropna(axis=1, thresh=threshold) applied. Axis 1 is the column axis, and thresh is the minimum number of non-NaN values a column needs in order to be kept, so this drops the columns that don't meet the threshold requirement, which is what we want.

There's also going to be NaN data in some number of rows at the top and at the bottom, so we want to drop those as well: the cleaned master frame becomes itself with .dropna(axis=0) applied, which drops the rows we don't care about.

Now that all of that is complete, let's write it to a CSV file so that we have access to it later. We call .to_csv() on the cleaned master frame, put the file in the data folder and call it masterFrame.csv. And let's just print a message saying we've completed the
feature calculations. All right, if all goes well, I'm going to go ahead and run it. Let's say a prayer to the rain gods that it works.

Okay, looks like we just had a simple error. Here's what I did wrong: up here I wrote data.columns, when what I meant to do was set data equal to data indexed by this list of columns, so that we effectively drop the date column. That was a small error; hopefully there are no more.

Here's yet another small error I found, and this one's actually really dumb. In this loop, the error says can only concatenate list (not "str") to list. That's because in here I accidentally used the whole list of column features, when what we really want is column feature [i]. So that's two dumb errors and counting. Let's run it again.

Well, it just finished, but real quick I'll let you know that there was another error, which brings our error counter to three. That's par, so we're still on track. The error had to do with the index of the Heikin-Ashi candle data frame. In the feature functions, when we resample the data (right here), we had created a new column called symbol, and when we grouped by it, we got a multi-index data frame, which we didn't want. If you remember, many videos ago when I created the Heikin-Ashi candle data frame, I put a line in there and said we won't need it until later, and that line is this one right here. We had originally put it in and commented it out. It's the droplevel call, because we want to drop the zeroth level, which is the symbol index level. After you add this line (or uncomment it, whatever you need to do) in the Heikin-Ashi function file, it should work properly. As you can see in my output, it went through and it
completed all the calculations. I'll just open it up to show you guys what it looks like. It says the file exceeds the limit, but that's because it's a huge data frame that we just created. This is the result of the loop we made, which creates column names like momentum 8 close, the 8-period momentum of the close, then the 9-period momentum close, and so on. If you go through all these columns, I believe there are around 73, or maybe sixty-something; I can't remember exactly, but there's a lot of data that we just generated. So this is what our feature frame looks like.

It is a lot of data to handle, and if you run this yourself you'll notice that the feature collection takes a long time. If the number of data points goes up, say two or three years of data or even more, we end up spending 30 or 40 minutes just collecting these features. So in an upcoming video, maybe the next one or a couple of videos down the road, I'm going to show you how to leverage the multiprocessing library, creating a pool of worker processes so that we can use multiple CPU cores and complete the process in about a fifth of the time.

I hope you guys enjoyed this video. In the next video, I will show you how to backtest, basically simulating trading with these features. If you have any questions or comments, go ahead and ask me below, and let me know how many errors you get; maybe you can beat my three errors for this file, which is pretty good. I hope you guys are having a good day, and I'll see you next time.
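For reference, the nested loop described near the start of this section can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the series' actual code: dict_list, key_list and column_feature are hypothetical stand-ins for the video's dictionary list, key list and column feature names, and the values are made up. It also reflects the fix for the second error, indexing column_feature[i] rather than concatenating the whole list.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2024-01-01", periods=5, freq="h")

def fake_columns(names):
    """Return a small DataFrame of made-up values with the given column names."""
    return pd.DataFrame({n: np.arange(5, dtype=float) for n in names}, index=index)

# Hypothetical structures (names are assumptions, not the series' code):
#   dict_list[i]      -> {period: DataFrame of that indicator's columns}
#   key_list[i]       -> the periods (keys) available in dict_list[i]
#   column_feature[i] -> the base name used when building column IDs
dict_list = [
    {8: fake_columns(["close"]), 9: fake_columns(["close"])},  # momentum
    {15: fake_columns(["upper", "mid", "lower"])},             # bollinger
]
key_list = [list(d.keys()) for d in dict_list]
column_feature = ["momentum", "bollinger"]

master_frame = pd.DataFrame(index=index)
for i in range(len(dict_list)):
    for j in key_list[i]:                # each period for this indicator
        for k in list(dict_list[i][j]):  # each column computed for that period
            # column_feature[i] is a str; using the whole list here would raise
            # TypeError: can only concatenate list (not "str") to list
            col_id = column_feature[i] + str(j) + k
            master_frame[col_id] = dict_list[i][j][k]

print(list(master_frame.columns))
```

Running it prints ['momentum8close', 'momentum9close', 'bollinger15upper', 'bollinger15mid', 'bollinger15lower'], the same name-plus-period-plus-column pattern as the momentum 8 close and Bollinger 15 upper/mid/lower columns described above.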

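The cleaning steps described earlier (the 70% threshold, a column-wise dropna with thresh, a row-wise dropna, then to_csv) can be sketched on a tiny made-up frame. The column names dense, sparse and full are hypothetical, and an in-memory io.StringIO buffer stands in for the CSV file in the data folder so the sketch runs anywhere. Keep in mind that thresh is the minimum number of non-NaN values a column must have to survive, so with 10 rows the threshold is 7.

```python
import io

import numpy as np
import pandas as pd

# Made-up feature frame (assumption): one sparse column, plus NaN rows at the
# top and bottom like indicator warm-up periods.
frame = pd.DataFrame({
    "dense": [np.nan, 1, 2, 3, 4, 5, 6, 7, 8, np.nan],
    "sparse": [np.nan] * 5 + [1.0, 2.0, 3.0, 4.0, 5.0],
    "full": np.arange(10, dtype=float),
})

# Keep a column only if at least ~70% of its values are non-NaN.
threshold = round(0.7 * len(frame))  # 7 for 10 rows

cleaned = frame.copy()                               # leave the original untouched
cleaned = cleaned.dropna(axis=1, thresh=threshold)   # drop sparse columns first
cleaned = cleaned.dropna(axis=0)                     # then drop rows with any NaN left

buffer = io.StringIO()  # stands in for a CSV file in the data folder
cleaned.to_csv(buffer)
print(list(cleaned.columns), len(cleaned))
```

This prints ['dense', 'full'] 8: sparse has only 5 non-NaN values and is dropped, and the first and last rows then go because dense is NaN there. The order matters; running the row-wise dropna first would have thrown away six of the ten rows because of the one sparse column.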

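The third error, the unwanted multi-index, is easy to reproduce. The sketch below uses a made-up single-symbol hourly frame; the series' own Heikin-Ashi function isn't shown here, so this approximates the situation rather than reproducing its code. Grouping by symbol before resampling puts the symbol into the index as the first level, and droplevel(0) removes it so the result is indexed by timestamp alone.

```python
import numpy as np
import pandas as pd

# Made-up hourly candles for one symbol (assumption, not the series' data).
index = pd.date_range("2024-01-01", periods=30, freq="h")
candles = pd.DataFrame({"symbol": "EURUSD", "close": np.arange(30, dtype=float)}, index=index)

# Grouping by symbol before resampling puts the symbol into the index,
# giving a two-level MultiIndex: (symbol, timestamp).
resampled = candles.groupby("symbol")["close"].resample("15h").last()
levels_before = resampled.index.nlevels

# Drop the zeroth level (symbol) so the index is plain timestamps again.
resampled = resampled.droplevel(0)

print(levels_before, resampled.index.nlevels)
```

After the droplevel call, the 15-hour values are keyed by plain timestamps, which is what lets them line up with the other hourly features in the master frame.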












