Describe the bug
Hi @thinkall,
I think I may have found an issue in FLAML, although it's possible it was a deliberate choice by the developers. If we use the holdout strategy for classification tasks, we find that:
len(input_data) < len(training_data) + len(test_data).
This occurs even when I set auto_augment=False, so up-sampling of the data is not the issue here.
Steps to reproduce
If I run a classification task against the Iris dataset, my input dataset has 150 rows.
If I then analyse the AutoML state afterwards, I can see that we have 135 rows in automl._state.X_train and 18 rows in automl._state.X_val, making 153 rows in total - 3 more rows than we started with. The code to reproduce this is pasted below:
from flaml import AutoML
from sklearn import datasets
import numpy as np

dic_data = datasets.load_iris(as_frame=True)  # sklearn Bunch containing pandas objects
iris_data = dic_data["frame"]  # pandas DataFrame: data + target
rng = np.random.default_rng(42)
iris_data["cluster"] = rng.integers(low=0, high=5, size=iris_data.shape[0])
print(iris_data["cluster"])
print('shape at start', iris_data.shape)

automl = AutoML()
automl_settings = {
    "max_iter": 5,
    "metric": 'accuracy',
    "task": 'classification',
    "log_file_name": "holdout_test.log",
    "log_type": "all",
    "estimator_list": ['lgbm'],
    "eval_method": "holdout",
    "split_type": "stratified",
    "keep_search_state": True,
    "retrain_full": True,
    "auto_augment": False,
}

x_train = iris_data[
    ["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]
].to_numpy()
y_train = iris_data['target']
automl.fit(x_train, y_train, **automl_settings)

print(len(automl._state.X_train), len(automl._state.X_train_all), len(automl._state.X_val))
print(len(automl._state.y_train), len(automl._state.y_train_all), len(automl._state.y_val))
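For reference, here is a quick (hypothetical) follow-up check that the extra rows really are shared between the two splits. It relies on the same private automl._state attributes as above, and since Iris contains a couple of naturally duplicated rows the count is an upper bound on the overlap introduced by the split:

import numpy as np

X_tr = np.asarray(automl._state.X_train)
X_va = np.asarray(automl._state.X_val)

# Count validation rows that also appear verbatim in the training split.
overlap = sum(
    any(np.array_equal(val_row, train_row) for train_row in X_tr)
    for val_row in X_va
)
print(f"{overlap} of {len(X_va)} validation rows also appear in the training split")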
My colleague @drwillcharles has identified the cause of this issue, which is in the prepare_data method in flaml/automl/task/generic_task.py:
X_train, X_val, y_train, y_val = self._train_test_split(
    state, X_rest, y_rest, first, rest, split_ratio, stratify
)
X_train = concat(X_first, X_train)
y_train = concat(label_set, y_train) if data_is_df else np.concatenate([label_set, y_train])
X_val = concat(X_first, X_val)
y_val = concat(label_set, y_val) if data_is_df else np.concatenate([label_set, y_val])
Here the first row of each class is extracted from the original training dataset (into X_first), and once _train_test_split has been used to split the remaining data, these rows are added back to both the training and validation sets.
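In isolation, the effect looks like the following simplified sketch (plain pandas/sklearn, not the actual FLAML code; the 10% split ratio is assumed to match the default holdout ratio):

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

df = load_iris(as_frame=True)["frame"]

# Take the first row of each class out, as prepare_data does with X_first.
first = df.groupby("target").head(1)   # 3 rows, one per class
rest = df.drop(first.index)            # 147 rows

train_rest, val_rest = train_test_split(
    rest, test_size=0.1, stratify=rest["target"], random_state=42
)

# The held-out rows are then added back to BOTH splits.
train = pd.concat([first, train_rest])  # 132 + 3 = 135 rows
val = pd.concat([first, val_rest])      # 15 + 3 = 18 rows

print(len(df), len(train) + len(val))   # 150 vs 153

This reproduces the 135/18 split sizes reported above.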
I'm not sure whether this was an error or a deliberate choice by the original developers.
The advantage of this code is that it guarantees both the training and validation sets contain at least one instance of every class. The disadvantage is that the training and validation data overlap, which leaks training rows into validation and biases the models' evaluation.
Possible Solution
Could I please ask your thoughts on this? Personally, if this code is to be kept, I'd rather it were applied only when it is actually required, not in every case.
Perhaps we could run _train_test_split on the entire dataset (including X_first) and then duplicate X_first only when necessary (i.e. when the training or validation set is missing one or more of the classes)?
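A rough sketch of what I have in mind (hypothetical, written against plain pandas/sklearn rather than the FLAML internals):

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def top_up_missing_classes(split, full, label="target"):
    # Duplicate one representative row per class into `split`,
    # but only for classes that are actually absent from it.
    missing = set(full[label].unique()) - set(split[label].unique())
    if not missing:
        return split
    extras = full[full[label].isin(missing)].groupby(label).head(1)
    return pd.concat([split, extras])

df = load_iris(as_frame=True)["frame"]
train, val = train_test_split(
    df, test_size=0.1, stratify=df["target"], random_state=42
)
train = top_up_missing_classes(train, df)
val = top_up_missing_classes(val, df)

# With a stratified split on Iris no class is missing, so nothing is
# duplicated and the row counts add up exactly.
print(len(df), len(train) + len(val))  # 150 150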
Please let me know if I'm misunderstanding anything - I welcome your thoughts.
Thanks!
Model Used
Originally observed with Random Forest, but this error has been present for all models tested (the reproduction above uses 'lgbm').
Expected Behavior
No response
Screenshots and logs
No response
Additional Information
Python 3.10.3
FLAML 2.3.2