Using the code presented in Section 8.6:
(a) Generate a dataset (X, y) of 1E6 observations, where 5 features are informative, 5 are redundant and 10 are noise.
(b) Split (X, y) into 10 datasets each of 1E5 observations.
(c) Compute the parallelized feature importance (Section 8.5) on each of the 10 datasets.
(d) Compute the stacked feature importance on the combined dataset (X, y).
(e) What causes the discrepancy between the two? Which one is more reliable?
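
One possible starting point for steps (a) through (d) is sketched below. It does not reproduce the Section 8.6 code: it assumes scikit-learn's `make_classification` as the data generator and uses random-forest MDI as a stand-in importance measure. The helper names `make_data` and `mdi_importance`, the forest settings, and the reduced sample size (1E4 rather than 1E6) are illustrative assumptions chosen to keep the run short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# The exercise asks for 1E6 rows; a smaller size is assumed here for speed.
N_SAMPLES, N_SPLITS = 10_000, 10


def make_data(n_samples, seed=0):
    # Columns 0-4 informative, 5-9 redundant, 10-19 noise
    # (shuffle=False keeps the columns in this order).
    X, y = make_classification(
        n_samples=n_samples, n_features=20, n_informative=5, n_redundant=5,
        n_repeated=0, shuffle=False, random_state=seed,
    )
    # shuffle=False also leaves rows largely grouped by class,
    # so permute the rows before splitting into subsets.
    perm = np.random.default_rng(seed).permutation(n_samples)
    return X[perm], y[perm]


def mdi_importance(X, y, seed=0):
    # Assumed stand-in importance: MDI from a random forest with max_features=1.
    clf = RandomForestClassifier(n_estimators=50, max_features=1, random_state=seed)
    return clf.fit(X, y).feature_importances_


X, y = make_data(N_SAMPLES)
chunks = np.array_split(np.arange(N_SAMPLES), N_SPLITS)

# (c) one importance vector per subset, summarised across subsets
per_subset = np.vstack(
    [mdi_importance(X[idx], y[idx], seed=i) for i, idx in enumerate(chunks)]
)
parallel_mean = per_subset.mean(axis=0)
parallel_std = per_subset.std(axis=0)

# (d) a single fit on the combined dataset
stacked = mdi_importance(X, y)

groups = {"informative": slice(0, 5), "redundant": slice(5, 10), "noise": slice(10, 20)}
for name, cols in groups.items():
    print(f"{name:12s} parallel={parallel_mean[cols].mean():.4f} "
          f"(std {parallel_std[cols].mean():.4f})  stacked={stacked[cols].mean():.4f}")
```

Comparing the per-subset dispersion (`parallel_std`) with the gap between `parallel_mean` and `stacked` gives a concrete basis for answering (e). Swapping `mdi_importance` for the importance method actually used in Section 8.6 should leave the rest of the scaffold unchanged.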