Random Forest

Tags

Modeling

Source

Battle of the Ensemble - Random Forest vs Gradient Boosting

If you have spent some time in the world of machine learning, you would have undoubtedly heard of a concept called the bias-variance tradeoff. It is one of the most important concepts any machine learning practitioner should learn and be aware of.

https://towardsdatascience.com/battle-of-the-ensemble-random-forest-vs-gradient-boosting-6fbfed14cb7

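Random Forest applies the Bagging algorithm to Decision Trees, with one addition. Multiple trees are trained on Bootstrap samples (rows drawn with replacement), and their predictions are then aggregated: a majority vote for classification, an average for regression. If every tree considered all variables at each split, most trees would keep splitting on the same few dominant variables. The trees would become highly correlated, and averaging them would do much less to reduce variance (and therefore overfitting). To prevent this, only a randomly selected subset of variables is considered at each split (Column Subsampling). Paradoxically, because no single tree relies on every variable, the forest as a whole ends up taking all variables into account.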
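The procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation. The function names (`gini`, `build_tree`, `fit_forest`, `predict_forest`) are hypothetical. The base learner is a small CART-style classification tree using Gini impurity. The depth limit `max_depth=5` and the `max_features = sqrt(p)` default are assumed settings, and the toy dataset is made up. Following Breiman's original formulation, a fresh random feature subset is drawn at every split.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, max_features, rng, depth=0, max_depth=5, min_size=2):
    """Grow a classification tree. At every split only a random subset of
    `max_features` columns is considered (column subsampling).
    Leaves are plain labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return majority  # leaf: store the majority label
    candidates = rng.sample(range(len(X[0])), max_features)
    best = None  # (weighted impurity, feature, threshold)
    for f in candidates:
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority  # no valid split among the sampled features
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left_node = build_tree([X[i] for i in li], [y[i] for i in li],
                           max_features, rng, depth + 1, max_depth, min_size)
    right_node = build_tree([X[i] for i in ri], [y[i] for i in ri],
                            max_features, rng, depth + 1, max_depth, min_size)
    return (f, t, left_node, right_node)

def predict_tree(node, row):
    """Walk down the tree until a leaf label is reached (labels must not be tuples)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    rng = random.Random(seed)
    if max_features is None:
        # Assumed default: sqrt(number of features), common for classification.
        max_features = max(1, int(len(X[0]) ** 0.5))
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n rows with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_features, rng))
    return forest

def predict_forest(forest, row):
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]  # aggregate the trees by majority vote

# Toy data (made up for illustration): the label depends only on the first
# two of four features; the other two are pure noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(4)] for _ in range(160)]
y = [1 if row[0] + row[1] > 1.0 else 0 for row in X]
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]

forest = fit_forest(X_train, y_train, n_trees=15, seed=1)
acc = sum(predict_forest(forest, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(f"test accuracy: {acc:.2f}")
```

Sharing one seeded `random.Random` for both the bootstrap draws and the per-split feature sampling keeps the whole forest reproducible from a single `seed`.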

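Trains faster than Boosting (though prediction can be slower)

Lower risk of overfitting than Boosting

Few hyperparameters, and performs reasonably well even without tuning

On complex problems, performance tends to be lower than Boosting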
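The lower overfitting risk comes from variance reduction through averaging, and the same calculation shows why column subsampling matters. As a simplifying assumption, suppose each of the $B$ trees' predictions $\hat f_b(x)$ has the same variance $\sigma^2$ and every pair of trees has correlation $\rho$. The variance of the forest's averaged prediction is then a standard result:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)
  = \frac{1}{B^2}\Big(B\,\sigma^2 + B(B-1)\,\rho\,\sigma^2\Big)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

As $B$ grows, the second term vanishes but $\rho\sigma^2$ remains. Adding more trees therefore cannot push the variance below $\rho\sigma^2$. Lowering the correlation $\rho$ between trees, which is what column subsampling does, is what reduces variance further.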

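cf.) Because Bagging trains its individual models independently of one another, they can be trained in parallel. This makes it a good fit for setups with multiple servers or, more generally, any environment that supports parallel training.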
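The note above can be illustrated with a small sketch. Each bagged model depends only on its own bootstrap sample, so the models can be trained concurrently. Giving each model its own seed also makes the result identical to training them one after another. This is an assumption-laden illustration:

* The helper names (`fit_stump`, `fit_bagging`, `predict`) are hypothetical.
* Depth-1 decision stumps stand in for full trees to keep the code short.
* `ThreadPoolExecutor` is used only for portability. Because of CPython's GIL, threads do not speed up pure-Python CPU-bound work. A real speedup would need processes (e.g. `ProcessPoolExecutor`) or separate machines.

In scikit-learn, `RandomForestClassifier` exposes this kind of parallelism through its `n_jobs` parameter.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def fit_stump(X, y, seed):
    """Train one bagged base learner (a depth-1 decision stump) on its own
    bootstrap sample. All randomness comes from `seed`, so the result does not
    depend on which worker runs it or in what order."""
    rng = random.Random(seed)
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in Xb)):
            left = [lab for row, lab in zip(Xb, yb) if row[f] <= t]
            right = [lab for row, lab in zip(Xb, yb) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every feature is constant in this sample
        lab = Counter(yb).most_common(1)[0][0]
        return (0, float("inf"), lab, lab)  # always predicts the majority label
    return best[1:]

def fit_bagging(X, y, n_models=20, seed=0, max_workers=4):
    seeds = [seed + i for i in range(n_models)]
    # The models are independent, so they can be trained concurrently.
    # Threads are used here for portability only; under CPython's GIL they give
    # no real speedup for pure-Python code (use processes or machines for that).
    # pool.map returns results in input order, so the ensemble is deterministic.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: fit_stump(X, y, s), seeds))

def predict(models, row):
    votes = Counter(l if row[f] <= t else r for f, t, l, r in models)
    return votes.most_common(1)[0][0]  # majority vote over the stumps

# Toy data (made up for illustration): the label depends only on feature 0.
data_rng = random.Random(7)
X = [[data_rng.random() for _ in range(3)] for _ in range(100)]
y = [1 if row[0] > 0.5 else 0 for row in X]

models = fit_bagging(X, y, n_models=10, seed=3)
acc = sum(predict(models, r) == t for r, t in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.2f}")
```

Seeding each model separately, instead of sharing one random generator across workers, is what makes the parallel result independent of scheduling.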