ETF Screening Process and Key Points Overview
Retrieve ETF list: Use get_all_securities(['etf']) to get all ETFs on the market, then keep those established before January 1, 2023 (start_date < 2023-01-01) so that each has sufficient historical data.
Exclude low-liquidity ETFs: Manually remove specific ETFs with very low average trading volume (e.g., 159003.XSHE China Merchants Fast Track ETF, 159005.XSHE Harvest Fund Quick Money ETF, etc., average volume ≤ 2.92k).
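The initial filter described above can be sketched with pandas. This is a minimal illustration, not the author's code: the `etfs` table is a hypothetical stand-in for the result of get_all_securities(['etf']) (assumed to be indexed by ETF code with a `start_date` column), and the codes, dates, and the `initial_filter` helper are invented for the example.

```python
import pandas as pd

# Hypothetical stand-in for the ETF table: index = ETF code, plus a start_date column.
# All rows are illustrative, not real listings.
etfs = pd.DataFrame(
    {
        "display_name": ["ETF A", "ETF B", "ETF C", "ETF D"],
        "start_date": pd.to_datetime(
            ["2012-05-28", "2019-03-01", "2023-06-15", "2014-10-20"]
        ),
    },
    index=["AAA.XSHG", "BBB.XSHE", "CCC.XSHG", "DDD.XSHE"],
)


def initial_filter(etfs, cutoff, excluded_codes):
    """Keep ETFs listed before `cutoff` that are not on the manual exclusion list."""
    old_enough = etfs["start_date"] < pd.Timestamp(cutoff)
    not_excluded = ~etfs.index.isin(excluded_codes)
    return etfs[old_enough & not_excluded]


# DDD.XSHE plays the role of a manually excluded low-liquidity ETF.
pool = initial_filter(etfs, "2023-01-01", excluded_codes=["DDD.XSHE"])
print(pool.index.tolist())
```

Keeping the exclusion list as an explicit argument makes the manual step easy to audit and to change later.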
Data Range: Obtain closing prices for the most recent 240 trading days up to today.
Return Processing: Calculate daily returns (pchg = close.pct_change()), forming an ETF return matrix (named prices; rows = trading days, columns = ETF codes).
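As a rough sketch of the return calculation, the snippet below builds a synthetic closing-price table and applies pct_change(). The prices, dates, and ETF codes are made up for illustration. Real closing prices would come from the data platform instead.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices for three hypothetical ETF codes over 240 business days.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=240)
close = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(240, 3)), axis=0)),
    index=dates,
    columns=["AAA.XSHG", "BBB.XSHE", "CCC.XSHG"],
)

# Daily percentage change; the first row has no previous close, so it is dropped.
pchg = close.pct_change().dropna(how="all")
print(pchg.shape)
```

Rows are trading days and columns are ETF codes, which matches the layout described above.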
Clustering Goal: Group ETFs with similar trends to reduce duplicates.
Parameters: Set the number of clusters to n_clusters=30 (too few clusters may merge dissimilar ETFs) and use the KMeans algorithm with random_state=42.
Within-Cluster Selection: Keep only the earliest established ETF in each cluster, because:
Calculate silhouette score: approximately 0.45 (moderate level, indicating decent compactness and separation, but room for improvement).
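The clustering, within-cluster selection, and silhouette steps might look like the following sketch using scikit-learn. It runs on synthetic returns with three artificial groups, so n_clusters is set to 3 here rather than 30. The column names, listing dates, and the choice of n_init=10 are assumptions for the example, and the silhouette value it prints will not match the figure quoted above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
n_days, n_groups, per_group = 240, 3, 4

# Synthetic returns: each group of ETFs follows a shared factor plus small noise.
factors = rng.normal(0, 0.01, size=(n_days, n_groups))
columns, data = [], []
for g in range(n_groups):
    for i in range(per_group):
        columns.append(f"G{g}_{i}")
        data.append(factors[:, g] + rng.normal(0, 0.002, size=n_days))
pchg = pd.DataFrame(np.array(data).T, columns=columns)

# Hypothetical listing dates: within each group, a lower i means listed earlier.
start_date = pd.Series(
    pd.to_datetime(
        [f"201{i}-01-01" for g in range(n_groups) for i in range(per_group)]
    ),
    index=columns,
)

# One sample per ETF, one feature per trading day.
X = pchg.T.values
km = KMeans(n_clusters=n_groups, random_state=42, n_init=10).fit(X)
labels = pd.Series(km.labels_, index=columns)

# Keep the earliest-listed ETF in each cluster.
kept = start_date.groupby(labels).idxmin().tolist()
score = silhouette_score(X, km.labels_)
print(sorted(kept), round(score, 2))
```

Transposing the return matrix matters: KMeans clusters rows, so each ETF has to be a row and each trading day a feature.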
Correlation matrix: Compute correlation of ETF returns (corr = prices[df.code].corr()).
Handling highly correlated pairs: For pairs with correlation > 0.85, keep only the ETF established earlier, remove the other (e.g., remove 159922.XSHE, 512100.XSHG, etc.).
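One way to express the pairwise rule is sketched below: for every pair whose return correlation exceeds the threshold, the later-established ETF is dropped. The `drop_correlated` helper, the codes, and the listing dates are hypothetical, and the returns are synthetic.

```python
import numpy as np
import pandas as pd


def drop_correlated(returns, start_date, threshold=0.85):
    """For each pair with correlation above `threshold`, drop the later-listed ETF."""
    corr = returns.corr()
    codes = list(corr.columns)
    to_drop = set()
    for i, a in enumerate(codes):
        for b in codes[i + 1:]:
            if corr.loc[a, b] > threshold:
                # On equal listing dates the second ETF of the pair is dropped.
                to_drop.add(a if start_date[a] > start_date[b] else b)
    return [c for c in codes if c not in to_drop]


# Synthetic returns: BBB tracks AAA closely, CCC is independent.
rng = np.random.default_rng(1)
base = rng.normal(0, 0.01, size=240)
returns = pd.DataFrame(
    {
        "AAA.XSHG": base,
        "BBB.XSHE": base + rng.normal(0, 0.001, size=240),
        "CCC.XSHG": rng.normal(0, 0.01, size=240),
    }
)
# Hypothetical listing dates: BBB is older than AAA.
start_date = pd.Series(
    pd.to_datetime(["2015-01-01", "2012-01-01", "2018-01-01"]),
    index=returns.columns,
)

kept = drop_correlated(returns, start_date)
print(kept)
```

Note that this literal pairwise rule can remove a chain of ETFs (A removes B, B removes C) even when A and C are weakly correlated. Whether that is desirable depends on how aggressive the deduplication should be.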
Set threshold: Remove ETFs established after 2020 (e.g., 513060.XSHG Hang Seng Healthcare, 515790.XSHG Photovoltaic ETF), to ensure remaining ETFs have richer historical data (useful for model training).
Special handling for government bond ETFs: If the pool is used for modeling, exclude the 511010.XSHG government bond ETF. Its trend is nearly linear (similar to Yu'ebao) with minimal volatility, which can interfere with the model's learning of volatility features and offers no predictive value.
Handling declining ETFs: The results may include long-term declining ETFs (e.g., healthcare ETF, real estate ETF). Whether to exclude depends on strategy goals:
Visualization validation: Plot remaining ETFs’ price charts (e.g., since 2017) to manually verify if correlations and distributions meet expectations (low correlation, reasonable spread).
Final filtering logic summary:
The pipeline "initial filtering → clustering deduplication → secondary correlation filtering → (optional) establishment time filtering" yields a pool of ETFs with good liquidity, low trend correlation, and ample historical data. The core goal is to provide diverse, high-quality underlying assets for strategies or models.