This project explores the generation of a synthetic Moroccan Darija dataset with billions of tokens, with the aim of replicating Claude 3.
# Clone the repository
git clone https://github.com/Alabouchsalaheddine/moroccan_darija_dataset_generator.git
# Create a conda environment
conda create --name mddg_env python=3.10
# Activate the created conda environment
conda activate mddg_env
# If conda activate doesn't work, try: source activate mddg_env
# Install requirements
pip install -r requirements.txt
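
Before running the project, it can help to confirm that the environment created above is the one in use. The following is a minimal sketch, not part of the repository: the helper name `is_python_310` is hypothetical, and it assumes `python` on your `PATH` is the interpreter from `mddg_env`.

```shell
# Hypothetical helper (not part of this repository):
# succeeds when the given version string reports Python 3.10.x.
is_python_310() {
  case "$1" in
    "Python 3.10"|"Python 3.10."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the interpreter currently on PATH with the version
# requested when the environment was created (python=3.10).
if is_python_310 "$(python --version 2>&1)"; then
  echo "Python 3.10 is active"
else
  echo "Expected Python 3.10; check that mddg_env is activated"
fi
```

If the second message appears, re-run the activation step and try again.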