Exploring the Essential Features of “QuantInsti – Unsupervised Learning in Trading”
Unsupervised Learning in Trading
Enhance your trading with unsupervised learning algorithms, using concepts such as PCA, Euclidean distance, WCSS, the elbow curve and dimensionality. Create, backtest and paper trade strategies using clustering algorithms such as k-means and DBSCAN. Perfect for ML engineers, Python programmers, students, quant traders, quant researchers and risk managers.
LIVE TRADING
- Creating, backtesting and paper trading a trading strategy using a clustering algorithm
- Creating a list of pairs that are suitable for a pairs trading strategy
- Describing, intuitively and mathematically, how principal component analysis (PCA) works
- Explaining in detail the difference between supervised and unsupervised learning algorithms
- Describing and implementing the k-means and DBSCAN clustering algorithms, and listing the differences in how they work
- Understanding concepts such as Euclidean distance and WCSS, and interpreting the elbow curve
- Understanding the curse of dimensionality and various ways to overcome it
- Applications of unsupervised learning
SKILLS COVERED
Python
Numpy
Pandas
Sklearn
Matplotlib
Itertools
Machine Learning
K-Means Clustering
DBSCAN Clustering
Curse of dimensionality
Principal Components Analysis
Hyperparameter selection
Statistics and Maths
Euclidean distance
WCSS or inertia
Eigenvalues and eigenvectors
Cointegration
Skewness & hit ratio
COURSE FEATURES
- Interactive Coding Practice
- Capstone Project Using Real Market Data
- Trade and Learn Together
PREREQUISITES
A general understanding of trading in the financial markets, such as how to place orders to buy and sell, is helpful. Basic knowledge of the pandas DataFrame and matplotlib would make it easier to work with the code and trading strategies covered in this course. To learn how to use Python, check out our free course “Python for Machine Learning in Finance”.
SYLLABUS
Introduction to the Course
Unsupervised learning can uncover hidden patterns in a dataset and provide unique insights into your data. In this section, you will acquaint yourself with the course structure and the various teaching tools used in the course: videos, quizzes, and strategy codes. The interactive methods help you not only to understand the concepts, but also to implement the strategies.
Course Introduction 4m 40s
Course Structure 3m 39s
Course Structure Flow Diagram 10m
Quantra Features 4m 10s
Introduction to Unsupervised Learning
Unsupervised learning can be applied when you are not sure about the end outcome that you are looking for. It can be used to divide the data into smaller groups when labels are not available. This section gives you an overview of what unsupervised learning is, along with an insight into one of its applications, dimensionality reduction.
Introduction to Unsupervised Learning 3m
Need for Unsupervised Learning I 2m
Need for Unsupervised Learning II 2m
An Application of Unsupervised Learning 2m 55s
Dimensionality Reduction 2m
Features after Dimensionality Reduction 2m
Need for Dimensionality Reduction 2m
Facts about Unsupervised Learning 2m
Clustering
Clustering is a technique to divide the data into smaller groups with similar data points called clusters. This section explains how the outputs are generated in clustering. You will also understand how clustering is different from classification.
Clustering 2m 48s
What is Clustering? 2m
Criteria for Grouping 2m
Number of Data Points in a Cluster 2m
Output of Clustering Algorithm 2m
Properties of a Cluster 1m 45s
Optimal Cluster 2m
Properties of a Cluster 2m
Classification Vs Clustering
Labels in the Dataset 2m
Aim of Clustering 2m
Select the Algorithm I 2m
Select the Algorithm II 2m
Limitations of Clustering 2m
K-Means Clustering
You will be introduced to your very first clustering algorithm, k-means. You will take a deep dive into how k-means works by developing an intuitive as well as a mathematical understanding of how the algorithm finds hidden patterns in the data.
What is K-Means Clustering? 3m 30s
Optimised Centroid 2m
Points in a Cluster 2m
Mathematics behind K-Means Clustering 5m 22s
Euclidean Distance 2m
Choosing the Cluster 2m
Mean Distance 2m
Additional Reading 10m
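The assign-and-update loop that this section describes can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own code: the function name `kmeans`, the `init_centroids` argument and the toy data are assumptions made for the example.

```python
import numpy as np

def kmeans(points, init_centroids, n_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, until nothing changes."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Euclidean distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New centroid = mean of the cluster's points (unchanged if the cluster is empty).
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

# Toy data (hypothetical): two well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# Start from one point of each blob so the example is reproducible.
labels, centroids = kmeans(points, init_centroids=points[[0, -1]])
```

Library implementations typically add smarter initialisation and several restarts; the sketch keeps only the two steps the lessons name, choosing the cluster by Euclidean distance and recomputing the mean.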
K-Means for Financial Data
This section will introduce you to the application of k-means to financial data. You will find hidden patterns in the price series for Apple stock using the relative strength index and the average directional index. The implementation will be carried out in Python.
K-Means on Financial Data 2m 31s
Number of Clusters 2m
Features 2m
Wrong Clusters 2m
Distance from Centroid 2m
Using Jupyter Notebook 1m 54s
Applying K-Means to Create Clusters 10m
Calculate Percentage Change 5m
Calculate Volatility 5m
Initialise K-Means 5m
Fit and Predict the Model 5m
Additional Reading 10m
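The percentage-change and volatility steps listed in this section might look like the following pandas sketch. The prices and the three-period window are made-up values for illustration, and rolling standard deviation of returns is only one common way to define volatility.

```python
import pandas as pd

# Hypothetical closing prices; a real notebook would load market data instead.
prices = pd.Series([100.0, 110.0, 99.0, 108.9, 108.9])

# Percentage change from one bar to the next (the first value has no predecessor, so it is NaN).
returns = prices.pct_change()

# Volatility as the rolling standard deviation of returns over a 3-bar window.
volatility = returns.rolling(window=3).std()

features = pd.DataFrame({"returns": returns, "volatility": volatility}).dropna()
```

Rows with missing values are dropped at the end because a clustering model cannot take NaN inputs.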
Scaling the Data
You want to introduce multiple features to your model, say for instance, RSI, ADX and volatility. Can you directly pass the features to the model? The model won’t appreciate that! This section will introduce you to the importance of scaling your data before passing it to the k-means model.
Scaling the Data 2m 33s
Distance Problem 2m
Scaling Requirement 2m
Range for Min-Max Scaling 2m
Subtraction in Min-Max Scaling 2m
Min-Max Scaling Calculation 2m
Scaled Clusters 2m
Scaling Technique 5m
Scaling the Data 10m
Min-Max Scaler 5m
Additional Reading 10m
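Min-max scaling maps each feature to the range 0 to 1 by subtracting the column minimum and dividing by the column range. A minimal NumPy sketch, using invented RSI, ADX and volatility values:

```python
import numpy as np

# Hypothetical feature rows: columns are RSI, ADX and daily volatility.
features = np.array([
    [30.0, 15.0, 0.010],
    [50.0, 25.0, 0.020],
    [70.0, 40.0, 0.015],
    [45.0, 20.0, 0.030],
])

# Unscaled, the distance between two rows is dominated by RSI and ADX,
# because volatility differences are orders of magnitude smaller.
raw_distance = np.linalg.norm(features[0] - features[1])

# Min-max scaling: (x - min) / (max - min), computed column by column.
col_min = features.min(axis=0)
col_max = features.max(axis=0)
scaled = (features - col_min) / (col_max - col_min)

scaled_distance = np.linalg.norm(scaled[0] - scaled[1])
```

After scaling, every column contributes on the same 0 to 1 footing, so no single feature dominates the Euclidean distance that k-means relies on.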
Feature Selection
Remember the time you used simple moving averages, Bollinger Bands, MACD and five other indicators to take that trade? This section will guide you in selecting your input features and explain what makes a feature good.
Feature Selection 3m 6s
Selecting Features 2m
Appropriate Features 2m
Stationary Series 2m
Correlated Series 2m
Correlated Features 2m
Choosing Threshold Value 2m
Discarding Features 2m
Feature Selection 10m
Calculate Test Statistic 5m
ADF test 2m
Create a Correlation Matrix 5m
Pairs Above the Threshold Correlation Value 5m
Drop Columns 5m
Additional Reading 10m
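The correlation-based part of this workflow (build a correlation matrix, find pairs above a threshold, drop one column of each pair) can be sketched with pandas as follows. The column names, the synthetic data and the 0.7 threshold are assumptions for illustration; the stationarity check with the ADF test is not shown here.

```python
import numpy as np
import pandas as pd

# Synthetic features: f2 is almost a copy of f1, f3 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
features = pd.DataFrame({
    "f1": base,
    "f2": base + rng.normal(scale=0.05, size=200),
    "f3": rng.normal(size=200),
})

threshold = 0.7  # hypothetical cut-off for "too correlated"

# Absolute correlations; keep only the upper triangle so each pair appears once.
corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later column of every pair whose correlation exceeds the threshold.
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = features.drop(columns=to_drop)
```

Which column of a correlated pair to discard is a judgment call; this sketch simply keeps the one that appears first.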
Selecting Clusters for K-Means
K-means asks you to pass the number of groups you would like to discover, each with its own hidden pattern. You can ask for eight groups, ten groups or any other number. This section addresses whether there is an optimal number of groups to create.
Choosing the Number of Clusters 4m 45s
Calculate WCSS for 2 Clusters 2m
Penalising in WCSS 2m
Minimise WCSS – I 2m
Minimise WCSS – II 2m
Change in WCSS 2m
Optimum Clusters 2m
Steps in WCSS 2m
Choosing the Number of Clusters 10m
Calculate WCSS 5m
WCSS for Multiple Cluster Numbers 5m
Additional Reading 10m
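WCSS (within-cluster sum of squares, also called inertia) is the sum of squared Euclidean distances from each point to its cluster centroid. The sketch below computes it for several cluster counts on toy data; the helper `kmeans_wcss`, its evenly spaced starting points and the three-blob dataset are assumptions made so the example is reproducible.

```python
import numpy as np

def kmeans_wcss(points, k, n_iter=100):
    """Run a basic k-means and return the within-cluster sum of squares."""
    # Deterministic, evenly spaced starting points (an illustrative choice).
    centroids = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        sq_dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # WCSS: squared distance of each point to its nearest centroid, summed.
    sq_dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.min(axis=1).sum()

# Toy data (hypothetical): three separated blobs of 50 points each.
rng = np.random.default_rng(7)
centres = [[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]]
points = np.vstack([rng.normal(loc=c, scale=0.4, size=(50, 2)) for c in centres])

wcss = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, value in wcss.items():
    print(k, round(value, 1))
```

Plotting these values against k gives the elbow curve. On this toy data the drop from two to three clusters is large and later drops are small, so the elbow sits at three clusters.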
Analysing Clusters: Hit Ratio
You have identified the hidden patterns and created the clusters. What do you do next? You generate trading signals! You will learn to create the very first trading strategy using the k-means algorithm and perform a comprehensive backtest.
Cluster Analysis with Hit Ratio 4m 35s
Trading Signals 2m
Average Future Returns 2m
Future Returns 2m
Threshold Value for Hit Ratio 2m
Train-Test Split 2m
Cluster Analysis with Hit Ratio 10m
Strategy Analytics for Hit Ratio 10m
Creating Train and Test Dataset 5m
Analytics for a Single Cluster 5m
Calculating Hit Ratio 5m
Map Clusters to Trading Signals 5m
Strategy Returns 5m
Forecast the Cluster 2m
Map the Trading Signal 2m
Additional Reading 10m
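One way to turn clusters into signals is to measure, for each cluster, the fraction of observations followed by a positive return (the hit ratio) and compare it with a threshold. The sketch below assumes that definition; the labels, returns and the 0.6 / 0.4 thresholds are invented, and the course's exact mapping rule may differ.

```python
import numpy as np

# Hypothetical training data: a cluster label and the next-period return for each bar.
cluster_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
future_returns = np.array([0.010, 0.020, -0.005, 0.015,
                           -0.010, -0.020, 0.003, -0.004,
                           0.010, -0.010, 0.002, -0.003])

upper, lower = 0.6, 0.4  # assumed thresholds for long and short signals

hit_ratio = {}
signal = {}
for c in np.unique(cluster_labels):
    in_cluster = future_returns[cluster_labels == c]
    # Hit ratio: share of observations in the cluster with a positive future return.
    ratio = float((in_cluster > 0).mean())
    hit_ratio[int(c)] = ratio
    if ratio >= upper:
        signal[int(c)] = 1      # mostly followed by gains: go long
    elif ratio <= lower:
        signal[int(c)] = -1     # mostly followed by losses: go short
    else:
        signal[int(c)] = 0      # no clear edge: stay out

# On unseen data, forecast the cluster first, then look up its signal.
new_clusters = np.array([0, 2, 1])
new_signals = np.array([signal[int(c)] for c in new_clusters])
```

The hit ratios and the mapping are estimated on the training set only, then applied unchanged to the test set, which is why the train-test split comes before this step.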
Analysing Clusters: Skewness
Could there be an alternative approach to creating a trading strategy with k-means? This section will walk you through your second strategy using skewness. The strategy will aim to capture the tail events in Apple.
Cluster Analysis with Skewness 4m 40s
Why Skewness? 2m
Calculate Skewness 2m
Direction Based on Skewness 2m
Skewed Distributions 2m
Cluster Analysis with Skewness 10m
Calculating Skewness with Python 5m
Plot Returns Distribution 5m
Trade Direction 2m
Additional Reading 10m
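Skewness is the third standardised moment of a distribution: positive when the right tail is longer, negative when the left tail is longer. A minimal NumPy sketch follows; the rule mapping the sign of skewness to a trade direction is an assumption for illustration, not necessarily the rule taught in the course.

```python
import numpy as np

def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed standard deviation."""
    values = np.asarray(values, dtype=float)
    deviations = values - values.mean()
    return (deviations ** 3).mean() / values.std() ** 3

def direction_from_skew(skew):
    # Assumed rule: trade in the direction of the longer tail.
    if skew > 0:
        return 1
    if skew < 0:
        return -1
    return 0

symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]     # no tail on either side
right_tailed = [0.0, 0.0, 0.0, 0.0, 10.0]   # one large positive outlier

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
```

For cluster analysis, the same function would be applied to the future returns of each cluster in the training set.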
Putting It All Together
This section will put together all of your learnings from the previous sections. You will have a workflow that you can follow to develop any trading strategy using k-means for any asset class.
Strategising with K-Means 1m 59s
Sequence of Steps 2m
Putting It All Together 10m
Additional Reading 10m
Curse of Dimensionality
Does adding more data equal more information and ultimately offer more insights? In this section, you will realise that this isn’t entirely true. You will also go about learning different ways to reduce the dataset features effectively.
Impact of Adding Features to Clustering 2m 29s
Definition of Curse of Dimensionality 2m
Relation Between Features and Distance Between Points 2m
Issue With Cluster Equal to Data Points 2m
Overcoming Curse of Dimensionality 2m 5s
Criteria for Eliminating Features 2m
Comparing Features for Elimination 2m
Relation Between Features 2m
Additional Reading 10m
Introduction to Principal Component Analysis
Reducing features while minimising information loss is the ace up the sleeve of principal component analysis (PCA). You will see how PCA reduces dimensions and still keeps most of the information content.
Principal Component Analysis 3m 4s
Reducing Dimensions by Finding Best Line 2m
Maximum Limit of PCA Algorithm 2m
Choosing the Best Line 2m
PCA and Information Loss 2m
Mathematical Explanation of PCA 2m 22s
Matrix Multiplication 2m
Preservation of Maximum Information 2m
Features and Eigenvalues 2m
Variance and Covariance 10m
Interpretation of Variance 2m
Calculate the Covariance Matrix 5m
Variance of ADX 2m
Covariance of ADX and EMA 2m
High Covariance 2m
Additional Reading 10m
Maths Behind Principal Component Analysis
In this section, you will get under the hood of principal component analysis and see how it uses eigenvalues and eigenvectors to reduce dimensions effectively. You will also work on a data sample and see it in action.
Basics of Matrices and Eigenvalues 3m 14s
Definition of Variance Matrix 2m
Properties of an Identity Matrix 2m
Lambda and Eigenvectors 2m
Determinant of a Matrix 2m
Significance of Eigenvalues 2m
Dimensions and Eigenvalues 2m
Eigenvectors 2m 53s
Maximum Variance and Eigenvalues 2m
Selection of Eigenvalues 2m
Eigenvalues and Explained Variance 2m
Selecting the Correct Eigenvalue 2m
Transforming Dimensions Using Eigenvectors 2m
Maths Behind PCA 10m
First Principal Component 2m
Calculate the Eigenvectors 5m
PCA Example 10m
Project the Data-points in 1 Dimension 5m
Calculate the Principal Components 5m
Additional Reading 10m
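The steps in this section (covariance matrix, eigenvalues and eigenvectors, projection onto the first principal component) can be sketched with NumPy. The two-feature dataset is synthetic and chosen so that most of the variance lies along one line.

```python
import numpy as np

# Synthetic data: the second feature is roughly twice the first, plus a little noise.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
data = np.column_stack([x, 2.0 * x + rng.normal(scale=0.1, size=300)])

# 1. Centre the data and compute the covariance matrix of the features.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# 2. Eigen-decomposition (eigh is intended for symmetric matrices such as cov).
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 3. Sort from largest to smallest eigenvalue; columns of eigenvectors follow.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 4. Share of total variance explained by each component.
explained_ratio = eigenvalues / eigenvalues.sum()

# 5. Project the data onto the first principal component (2 dimensions to 1).
first_pc = centered @ eigenvectors[:, 0]
```

The variance of the projected data equals the largest eigenvalue, which is why the eigenvector with the largest eigenvalue is the direction that preserves the most information.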
Principal Component Analysis
Principal components help us in dimensionality reduction. You will work on a dataset and figure out how you can choose the number of principal components, depending on the threshold of the information contained in the dataset you would like to keep.
Choosing Number of Principal Components 4m 17s
Percentage of Principal Components 2m
Optimum Number of Principal Components 2m
Choosing the Number of Features in PCA 10m
Explain 80% of the Variance 5m
Explained Variance Vs Number of Features 2m
Additional Reading 10m
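Choosing the number of components for a variance threshold comes down to a cumulative sum over the sorted eigenvalues. A short sketch with made-up eigenvalues and the 80% threshold mentioned in the lesson titles:

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, sorted from largest to smallest.
eigenvalues = np.array([4.0, 2.0, 1.0, 0.5, 0.5])

# Fraction of total variance explained by each component, then the running total.
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

threshold = 0.80
# First position where the cumulative share reaches the threshold, as a count.
n_components = int(np.searchsorted(cumulative, threshold) + 1)
```

Here the first two components explain 75% of the variance and the first three explain 87.5%, so three components are needed to keep at least 80%.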
Application of Unsupervised Learning for Pairs Trading
In simple terms, pairs trading means that you select a pair of stocks, go long on one and short the other. This is a market-neutral trading strategy. In this section, you will see how unsupervised learning can help you identify pairs from a dataset containing a large number of stocks.
Application of Unsupervised Learning for Pairs Trading 10m
Selection of Pairs From Cluster 2m
Steps to Find Pairs 2m
Definition of Pairs Trading 2m
Test of Stationarity 2m
Pairs Using Unsupervised Learning 2m
Feature Selection for Pairs 2m
Feature Engineering for Pairs Trading 10m
Good Features 2m
Need to Apply PCA 2m
Need to Standardise the Data 2m
Standardise the Data 5m
Creating Clusters 10m
DBSCAN
DBSCAN is a density-based clustering technique that deals with the noise in the dataset. You will create and visualise clusters on a toy dataset using the DBSCAN algorithm in this section. You will also see how this technique is different from k-means clustering.
Intuition of Density-Based Clustering 3m 54s
Parameters of DBSCAN 2m
Identify the Core Point 2m
Identify the Noise 2m
Number of Boundary Points 2m
Number of Points in Neighbourhood 2m
Clusters using DBSCAN 2m
Limitations of DBSCAN 2m
K-Means Vs DBSCAN 10m
Create Clusters Using DBSCAN 5m
Selecting the Parameters for DBSCAN 10m
Additional Reading 10m
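The core, border and noise points discussed in this section can be made concrete with a compact from-scratch version of DBSCAN. This is an illustrative sketch, not a library implementation; the function name `dbscan`, its parameter names and the toy grid data are assumptions, and a point is counted as its own neighbour.

```python
import numpy as np

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Neighbourhood of each point: all points within eps (including itself).
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    # Core points have at least min_samples points in their neighbourhood.
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outwards from an unvisited core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join the cluster but do not extend it
            for nb in neighbours[j]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    stack.append(nb)
        cluster += 1
    return labels

# Toy data (hypothetical): two tight 3x3 grids and one isolated point.
grid = np.array([[i * 0.1, j * 0.1] for i in range(3) for j in range(3)])
points = np.vstack([grid, grid + 5.0, [[10.0, 0.0]]])

labels = dbscan(points, eps=0.15, min_samples=3)
```

Unlike k-means, the number of clusters is not passed in: it emerges from `eps` and `min_samples`, and the isolated point is left unassigned as noise rather than forced into a cluster.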
Pairs Trading using Clustering Algorithms
In this section, you will continue to work on creating pairs of stocks for a pairs trading strategy. You will apply DBSCAN to the reduced data to create clusters. From the clusters, you will select pairs of stocks and test them for cointegration.
Create Pairs using DBSCAN 10m
Reduce Data for Visualisation Using TSNE 5m
The Grey Points 2m
Select and View Stocks in a Cluster 5m
Number of Pairs 2m
Form Pairs from a Cluster 5m
Cointegration Test 10m
Create the Portfolio 5m
Test for Cointegration 5m
ADF Test 2m
Additional Reading 10m
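Forming candidate pairs from cluster output is a combinatorial step that `itertools` handles directly. In this sketch the tickers and labels are placeholders, and a label of -1 is assumed to mark noise points that belong to no cluster; the cointegration test that would follow is not shown.

```python
from itertools import combinations

# Hypothetical clustering output: one label per ticker, -1 meaning noise.
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]
labels = [0, 0, 0, 1, 1, -1]

# Group tickers by cluster, skipping noise points.
clusters = {}
for ticker, label in zip(tickers, labels):
    if label == -1:
        continue
    clusters.setdefault(label, []).append(ticker)

# Every unordered pair within a cluster is a candidate for the cointegration test.
pairs = [pair for members in clusters.values() for pair in combinations(members, 2)]
```

A cluster with n members yields n(n-1)/2 candidate pairs, so restricting the search to pairs within a cluster keeps the number of cointegration tests far below what testing every pair in the full universe would require.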
Run Codes Locally on Your Machine
Learn to install the Python environment on your local machine.
Python Installation Overview 2m 18s
Flow Diagram 10m
Install Anaconda on Windows 10m
Install Anaconda on Mac 10m
Know your Current Environment 2m
Troubleshooting Anaconda Installation Problems 10m
Creating a Python Environment 10m
Changing Environments 2m
Quantra Environment 2m
Troubleshooting Tips For Setting Up Environment 10m
How to Run Files in Downloadable Section? 10m
Troubleshooting For Running Files in Downloadable Section 10m
Capstone Project
In this section, you will undertake a capstone project where you apply the k-means algorithm to a stock of your choice. This project helps you practise and apply the concepts learnt in this course.
Capstone Project: Getting Started 10m
Problem Statement 10m
Frequently Asked Questions 10m
Code Template and Data Files 2m
Capstone Project Model Solution 10m
Capstone Solution Downloadable 2m
Automate Trading Strategy Using IBridgePy
This section provides a ready-made template that can be used for live trading after you adjust the parameters at your discretion.
Additional Reading 10m
Sample Strategy to Run on Interactive Brokers 2m
Course Summary
In this section, you will go through the different concepts you learnt throughout the course. You will also be able to download all the strategy notebooks as a zip file. You can use these notebooks and modify their contents to create your own unique strategy.
Course Summary 3m 50s
Python Data and Codes 2m
ABOUT AUTHOR
QuantInsti®
QuantInsti is the world’s leading algorithmic and quantitative trading research & training institute with registered users in 190+ countries and territories. An initiative by founders of iRage, one of India’s top HFT firms, QuantInsti has been helping its users grow in this domain through its learning & financial applications based ecosystem for 10+ years.
WHY QUANTRA®?
- Gain more in less time
- Get taught by practitioners
- Learn at your own pace
- Get data & strategy models to practice on your own