This is the official code repository for the paper "Reweighting Improves Conditional Risk Bounds" (link to paper), accepted to Transactions on Machine Learning Research (TMLR), 2024.
Authors: Yikai Zhang, Jiahe Lin, Fengpei Li, Songzhu Zheng, Anant Raj, Anderson Schneider, Yuriy Nevmyvaka.
In this work, we study the weighted empirical risk minimization (weighted ERM) schema, in which an additional data-dependent weight function is incorporated when the empirical risk function is being minimized. We show that under a general "balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions over the one obtained from standard ERM, and the superiority manifests itself through a data-dependent constant term in the error bound. These sub-regions correspond to large-margin ones in classification settings and low-variance ones in heteroscedastic regression settings, respectively. Our findings are supported by evidence from synthetic data experiments.
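To make the weighted ERM idea concrete, here is a minimal, self-contained sketch for a one-parameter heteroscedastic regression. It is not the estimator or weight function from the paper: the data generator, the two-region noise model, and the inverse-variance weights (which use the true noise level, something a real data-dependent weight would have to estimate) are all assumptions chosen for illustration.

```python
import random


def weighted_erm_slope(xs, ys, weights):
    """Closed-form minimizer of sum_i w_i * (y_i - theta * x_i)^2 over theta."""
    num = sum(w * x * y for x, y, w in zip(xs, ys, weights))
    den = sum(w * x * x for x, w in zip(xs, weights))
    return num / den


def make_heteroscedastic_data(n, theta_true=2.0, seed=0):
    """Hypothetical toy data: low noise for x < 0, high noise for x >= 0."""
    rng = random.Random(seed)
    xs, ys, sigmas = [], [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        sigma = 0.1 if x < 0 else 2.0  # assumed noise levels
        xs.append(x)
        sigmas.append(sigma)
        ys.append(theta_true * x + rng.gauss(0.0, sigma))
    return xs, ys, sigmas


xs, ys, sigmas = make_heteroscedastic_data(2000)

# Standard ERM corresponds to uniform weights.
theta_erm = weighted_erm_slope(xs, ys, [1.0] * len(xs))

# Weighted ERM with inverse-variance weights (oracle noise levels, an assumption).
theta_werm = weighted_erm_slope(xs, ys, [1.0 / s ** 2 for s in sigmas])

print(f"standard ERM slope: {theta_erm:.4f}")
print(f"weighted ERM slope: {theta_werm:.4f}")
```

Setting all weights to the same constant recovers ordinary least squares, so the only difference between the two fits is how much each sample contributes to the empirical risk.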
The following installs all the dependencies:
pip install -e .
This section provides instructions on how to set up and run the experiments reported in Section 5 of the manuscript.
- Data generation:
  - classification setting:
    ./bin/prep-clsf-data --ds-str=ds_clsf --view-dataset
  - regression setting:
    ./bin/prep-regr-data --ds-str=ds_regr --view-dataset
- Run experiments on a specific synthetic dataset using a neural network:
./bin/train-sim --ds-str=ds_regr --cuda=0 --n-replica=1 --train-size=20000
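If several runs are needed, the training command can be assembled programmatically. The sketch below only builds and prints command lines using the flags shown above; the helper name `build_train_cmd`, the list of training sizes, and the use of `ds_clsf` with `train-sim` are assumptions for illustration, not part of the repository.

```python
import shlex


def build_train_cmd(ds_str, cuda=0, n_replica=1, train_size=20000):
    """Assemble the train-sim invocation as an argument list (hypothetical helper)."""
    return [
        "./bin/train-sim",
        f"--ds-str={ds_str}",
        f"--cuda={cuda}",
        f"--n-replica={n_replica}",
        f"--train-size={train_size}",
    ]


# Hypothetical sweep values; adjust to the experiments you want to run.
TRAIN_SIZES = [5000, 10000, 20000]

for ds_str in ("ds_clsf", "ds_regr"):
    for train_size in TRAIN_SIZES:
        cmd = build_train_cmd(ds_str, train_size=train_size)
        # Dry run: print the command instead of executing it.
        print(shlex.join(cmd))
```

To execute a command rather than print it, the argument list can be passed to `subprocess.run(cmd, check=True)` from the repository root.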
@article{zhang2024reweighting,
title={Reweighting Improves Conditional Risk Bounds},
author={Zhang, Yikai and Lin, Jiahe and Li, Fengpei and Zheng, Songzhu and Schneider, Anderson and Nevmyvaka, Yuriy and Raj, Anant},
journal={Transactions on Machine Learning Research},
year={2024},
url={https://openreview.net/forum?id=MvYddudHuE},
}
All source files in this repository, unless explicitly mentioned otherwise, are released under the Apache 2.0 license, the text of which can be found in the LICENSE file.
Authors: yikai.zhang@morganstanley.com; jiahe.lin@morganstanley.com
Morgan Stanley Machine Learning Research: msml-qa@morganstanley.com