Data Preparation

This section describes the steps taken to prepare the data for the Store Sales - Deep Learning Solution project.

1. Raw Data Collection

Obtain the raw data files (sales, stores, oil, holidays, etc.) and place them in the project's raw data directory (typically data/raw/, alongside data/interim/) before running the pipeline.

2. Data Processing Pipeline

The data processing pipeline is responsible for:

  • Loading raw data files (sales, stores, oil, holidays, etc.).
  • Merging datasets to create a unified view.
  • Handling missing values and correcting data types.
  • Saving interim datasets to data/interim/ for further processing.
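The sketch below illustrates these steps with pandas. It is a minimal illustration, not the project's actual implementation: the file names (train.csv, stores.csv, oil.csv, holidays_events.csv), column names (store_nbr, dcoilwtico), and the data/raw/ layout are assumptions rather than details taken from store-sales-DL/dataset.py.

```python
# Minimal sketch of the pipeline steps; file, column, and directory
# names are assumptions, not taken from store-sales-DL/dataset.py.
from pathlib import Path

import pandas as pd

RAW_DIR = Path("data/raw")
INTERIM_DIR = Path("data/interim")


def build_interim_dataset() -> pd.DataFrame:
    # Load raw data files.
    sales = pd.read_csv(RAW_DIR / "train.csv", parse_dates=["date"])
    stores = pd.read_csv(RAW_DIR / "stores.csv")
    oil = pd.read_csv(RAW_DIR / "oil.csv", parse_dates=["date"])
    holidays = pd.read_csv(RAW_DIR / "holidays_events.csv", parse_dates=["date"])

    # Merge the datasets into a unified view.
    df = sales.merge(stores, on="store_nbr", how="left")
    df = df.merge(oil, on="date", how="left")
    df = df.merge(holidays, on="date", how="left", suffixes=("", "_holiday"))

    # Handle missing values and correct data types (example choices).
    df["dcoilwtico"] = df["dcoilwtico"].ffill()       # fill oil-price gaps
    df["store_nbr"] = df["store_nbr"].astype("int16")

    # Save the interim dataset for downstream steps.
    INTERIM_DIR.mkdir(parents=True, exist_ok=True)
    df.to_csv(INTERIM_DIR / "merged.csv", index=False)
    return df


if __name__ == "__main__":
    build_interim_dataset()
```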

You can run the data processing pipeline with:

python store-sales-DL/dataset.py

or, using the Makefile:

make dataset

3. Output

  • The processed interim datasets are saved in the data/interim/ directory.
  • These datasets are used as input for the feature engineering step; a sketch of loading them follows this list.
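As a small example, downstream code might load the interim output like this (the merged.csv name is an assumption carried over from the pipeline sketch above):

```python
# Hypothetical loading step; the interim file name is an assumption
# carried over from the pipeline sketch above.
import pandas as pd

interim = pd.read_csv("data/interim/merged.csv", parse_dates=["date"])
print(interim.shape)
```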

4. Notes

  • Ensure all raw data files are present and named correctly before running the pipeline; a quick pre-flight check is sketched below.
  • Review the logs and console output for any warnings or errors during processing.
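A simple pre-flight check along these lines can catch missing files early. The expected file names are assumptions carried over from the pipeline sketch; adjust them to match your actual data:

```python
# Pre-flight check for raw inputs; the names in EXPECTED are
# assumed, not taken from the project source.
from pathlib import Path

EXPECTED = ["train.csv", "stores.csv", "oil.csv", "holidays_events.csv"]

missing = [name for name in EXPECTED if not (Path("data/raw") / name).is_file()]
if missing:
    raise FileNotFoundError(f"Missing raw data files: {missing}")
```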

For more details on feature engineering, see the Feature Engineering section.