Title: Preserving data distribution in sampling and instance selection with Renyi’s divergence
Authors: Hadi Sadoghi Yazdi, Soheila Ashkezari-Toussi, Abolfazl Ramezanzadeh Yazdi
Abstract
This paper introduces a novel method for sampling based on a distribution function. By utilizing Renyi’s divergence criterion, a recursive formulation is derived directly from the difference between the original and estimated distributions. This recursive equation is solved using the gradient descent algorithm, from which new samples are then generated. Because this method relies on the original data’s distribution, it effectively preserves the data distribution in the sampled dataset. When the original distribution is unknown, kernel density estimation is used for approximation. Experimental results show that the proposed method successfully maintains the data distribution and concept integrity, while also preserving the model’s predictability on selected instances. Specifically, this method has managed to reduce the dataset size by approximately 70%, without compromising the accuracy of the learning algorithm.
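As an illustration of the quantities named in the abstract, the following is a minimal sketch, not the authors' recursive gradient-descent procedure. It assumes one-dimensional data, a Gaussian kernel with a hand-picked bandwidth, and Renyi's divergence of order alpha = 2 evaluated on a uniform grid. The function names `gaussian_kde_1d` and `renyi_divergence`, the synthetic two-mode dataset, and the 30% subset size are all hypothetical choices made for this example. It only measures how far the density of a reduced sample drifts from the density of the full data, which is the kind of criterion the abstract describes minimizing.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def renyi_divergence(p, q, dx, alpha=2.0, eps=1e-12):
    """Renyi divergence D_alpha(p || q) for densities tabulated on a uniform grid.

    Uses D_alpha = log(integral of p^alpha * q^(1 - alpha)) / (alpha - 1),
    with the integral approximated by a Riemann sum. Densities are clipped
    at `eps` to avoid division by zero (an assumption of this sketch).
    """
    p = np.maximum(p, eps)
    q = np.maximum(q, eps)
    integral = np.sum(p**alpha * q**(1.0 - alpha)) * dx
    return np.log(integral) / (alpha - 1.0)


# Hypothetical synthetic data: a mixture of two Gaussian modes.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(1.0, 1.0, 500)])

grid = np.linspace(data.min() - 1.0, data.max() + 1.0, 400)
dx = grid[1] - grid[0]
h = 0.3  # bandwidth chosen by hand for this example

p_full = gaussian_kde_1d(data, grid, h)

# Keep 30% of the instances in two ways: uniformly at random, and by taking
# the 300 smallest values (which discards the right-hand mode entirely).
random_subset = rng.choice(data, size=300, replace=False)
biased_subset = np.sort(data)[:300]

d_self = renyi_divergence(p_full, p_full, dx)
d_random = renyi_divergence(p_full, gaussian_kde_1d(random_subset, grid, h), dx)
d_biased = renyi_divergence(p_full, gaussian_kde_1d(biased_subset, grid, h), dx)

print("self:", d_self, "random subset:", d_random, "biased subset:", d_biased)
```

In this sketch a subset that drops one mode of the data yields a much larger divergence than a uniformly drawn subset of the same size, which is why a divergence of this kind can serve as a score for how well a reduced dataset preserves the original distribution.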
Keywords
Instance selection · Sampling methods · Renyi’s divergence · Kernel density estimation · Synthetic datasets · Real-world datasets
@article{paperid:1101330,
author = {Sadoghi Yazdi, Hadi and Ashkezari-Toussi, Soheila and Ramezanzadeh Yazdi, Abolfazl},
title = {Preserving data distribution in sampling and instance selection with Renyi’s divergence},
journal = {Knowledge and Information Systems},
year = {2024},
volume = {67},
number = {1},
month = {December},
issn = {0219-1377},
pages = {549--578},
numpages = {30},
keywords = {Instance selection · Sampling methods · Renyi’s divergence · Kernel density
estimation · Synthetic datasets · Real-world datasets},
}
%0 Journal Article
%T Preserving data distribution in sampling and instance selection with Renyi’s divergence
%A Sadoghi Yazdi, Hadi
%A Ashkezari-Toussi, Soheila
%A Ramezanzadeh Yazdi, Abolfazl
%J Knowledge and Information Systems
%@ 0219-1377
%D 2024