Introduction
CleanML is an MLOps tool for data-centric AI. It is built to help ML teams manage the lifecycle of their Named Entity Recognition projects.
CleanML's features include:
- Curate and annotate data
- Edit data and add custom data
- Present insights about the annotated data
- Export annotated data in multiple data formats
- Show the distribution of annotated entities in both the training data and the data used to evaluate trained algorithms (evaluation data or test data)
- Display suggestions and previous classifications
- Auto-label data with the best-performing algorithm trained on that data
- Configure a custom API-based or web-based dictionary
- Experiment with multiple algorithms with multiple libraries
- Connect an algorithm directly to its code's Git repository
- Train from a custom branch, which makes experimentation easy to integrate into the development cycle
- Train in a remote Docker container, on either the CPU or the GPU*
- Write algorithms independently of the data format. CleanML converts the data to a supported format before sending it for training
- Track data changes and code changes between experiments
- Compare two different training runs of the same algorithm. CleanML tracks the data changes between them, which makes debugging easier
- Track the progress of data annotation
- Track overfitting or underfitting on specific entities, based on reports from periodic training and data classification
- Compare the model deployed in production with a freshly trained model
- Tag and upload data encountered in production and, once it is annotated, analyze the accuracy of the model deployed in production
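
To make the entity-distribution feature in the list above more concrete, here is a minimal Python sketch of the underlying idea: count the labels of annotated spans in a training split and an evaluation split, then compare their shares. This is not CleanML's API. The `(text, spans)` annotation format, the sample sentences, and the `entity_distribution` helper are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical annotation format: each example is a (text, spans) pair, where
# every span is (start, end, label). This is an assumption for illustration,
# not CleanML's storage or export format.
train_data = [
    ("Alice joined Acme in Paris", [(0, 5, "PERSON"), (13, 17, "ORG"), (21, 26, "LOC")]),
    ("Bob moved to Berlin", [(0, 3, "PERSON"), (13, 19, "LOC")]),
]
eval_data = [
    ("Carol works at Initech", [(0, 5, "PERSON"), (15, 22, "ORG")]),
]


def entity_distribution(examples):
    """Return the share of each entity label among all annotated spans."""
    counts = Counter(label for _, spans in examples for _, _, label in spans)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


train_dist = entity_distribution(train_data)
eval_dist = entity_distribution(eval_data)

# Print the two distributions side by side so that labels missing from, or
# under-represented in, one split stand out.
for label in sorted(set(train_dist) | set(eval_dist)):
    print(f"{label:8s} train={train_dist.get(label, 0.0):.2f} eval={eval_dist.get(label, 0.0):.2f}")
```

In this toy data the `LOC` label appears in the training split but not in the evaluation split, which is the kind of mismatch a side-by-side distribution view is meant to surface.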
Note: If anything in this doc is unclear or buggy, or you couldn't find what you were looking for, please file an issue on GitHub!