A Data-Centric MLOps suite for Named Entity Recognition

Data-Centric dashboard

Advanced Workbench

In-built data versioning

Train, Test, Compare, Repeat

Auto labeling suggestions

Support for multiple data formats

How CleanML helps

Project Managers

Create multiple projects and track their progress independently

Easily experiment with a new model/algorithm by training it in CleanML and comparing its performance with the other models in your project.

Train multiple algorithms and compare them
Track progress of data annotation
Train custom word-embeddings for domain-specific applications
Analyze over-fitting and under-fitting per entity based on training output
Include the data from production and analyze the accuracy of the model in production
Compare the model in production with a model freshly developed and trained
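
The per-entity over-fitting and under-fitting analysis mentioned in the list above can be illustrated with a small sketch. This is not CleanML's implementation: the span representation, the function names `f1_per_entity` and `fit_report`, and the `gap` and `floor` thresholds are all assumptions chosen for illustration. The idea is to compute F1 per entity type on the training set and on the test set, then compare the two.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical span format: (record_id, start_token, end_token, label)
Span = Tuple[int, int, int, str]


def f1_per_entity(gold: Set[Span], pred: Set[Span]) -> Dict[str, float]:
    """Exact-match F1 for each entity label."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for span in pred:
        if span in gold:
            tp[span[3]] += 1
        else:
            fp[span[3]] += 1
    for span in gold:
        if span not in pred:
            fn[span[3]] += 1
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def fit_report(train_f1: Dict[str, float], test_f1: Dict[str, float],
               gap: float = 0.2, floor: float = 0.5) -> Dict[str, str]:
    """Label each entity type using assumed thresholds.

    - train F1 below `floor`: the model has not learned the entity (under-fitting)
    - train F1 exceeds test F1 by more than `gap`: likely over-fitting
    """
    report = {}
    for label in set(train_f1) | set(test_f1):
        train = train_f1.get(label, 0.0)
        test = test_f1.get(label, 0.0)
        if train < floor:
            report[label] = "under-fitting"
        elif train - test > gap:
            report[label] = "over-fitting"
        else:
            report[label] = "ok"
    return report


if __name__ == "__main__":
    train_scores = {"PER": 0.95, "LOC": 0.40, "ORG": 0.90}
    test_scores = {"PER": 0.60, "LOC": 0.35, "ORG": 0.85}
    for label, status in sorted(fit_report(train_scores, test_scores).items()):
        print(label, status)
```

Sensible thresholds depend on the dataset and the entity type, so the defaults here are placeholders rather than recommendations.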

Data Scientists

Gain insights about training & test data, distribution of annotated entities, and decide how to curate more data for better accuracy

Analyze your data annotations, identify missing, incorrect, and multiple classifications, and improve the quality of your dataset

Update training and test data based on training results, and experiment with selected records for training
Edit data content, upload custom data, and export annotated data in multiple data formats
Improve your dataset using insights from training, with record-level and entity-level accuracy drill-down for each algorithm trained, then modify and curate your data accordingly

Annotators

Speed up and improve the annotation process with CleanML's helpful features, all from a single window

Features for annotators include

Annotation suggestions based on previous annotations
Show previous classifications of a word across different records
Suggest annotations based on training runs, even if the training is done on partially annotated data
Ability to configure third-party custom dictionaries (e.g. a medical terminology dictionary) to help with similar words
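
As an illustration of suggestions based on previous annotations, the sketch below proposes, for a given word, the label it most often received in earlier records. It is a simplified stand-in, not CleanML's actual suggestion logic: the `Record` format and the functions `build_label_index` and `suggest` are hypothetical names introduced for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record format: a list of (token, label) pairs, "O" = no entity
Record = List[Tuple[str, str]]


def build_label_index(records: List[Record]) -> Dict[str, Counter]:
    """Count how often each (lower-cased) token received each label."""
    index: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        for token, label in record:
            if label != "O":
                index[token.lower()][label] += 1
    return index


def suggest(token: str, index: Dict[str, Counter]) -> Optional[str]:
    """Return the most frequent previous label for the token, if any."""
    counts = index.get(token.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    annotated = [
        [("Aspirin", "DRUG"), ("helps", "O")],
        [("aspirin", "DRUG"), ("daily", "O")],
        [("Aspirin", "BRAND")],
    ]
    index = build_label_index(annotated)
    print(suggest("ASPIRIN", index))
    print(suggest("daily", index))
```

The per-label counts kept in the index could also back a view of a word's previous classifications across records, since they record every label the word has received.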

Developers

Experiment with multiple algorithms using different libraries, regardless of whether they run on GPU, CPU, on-prem, or in the cloud

Algorithms are easy to scale and replicate via configuration files, and they run in Docker containers

Connect your model's Git repo and train from it. Training against a development branch makes experimenting easy during the dev cycle
Train a custom word-embedding for domain-specific data
Cache outputs of training and word-embeddings, saving both time and cost
Train wherever is convenient: locally, on-prem, on CPU or GPU, or on a remote Docker system
Write code independent of the data format; CleanML converts the training data to the supported format you specify before the data is sent for training
Compare two training runs for both code and data changes, for faster diagnosis of a drop in accuracy

Features

Data-Centric dashboard

Identify and fix data & data-classification issues, and perform drill-down analytics on the dataset. Gain insights about data classified across multiple categories/classes, missed classifications and anomalies in classifications.

Advanced Workbench

The workbench provides useful features, including text annotation, entity renaming across records, in-place content editing, tag suggestions, auto-labeling suggestions, previous classifications, and the ability to add a custom dictionary.

In-built data versioning

CleanML versions your data by default, which helps make training reproducible. CleanML also lets you compare a training run with a later version of the same model, with a model that uses a different algorithm, and even with a model deployed in production.
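
To show why data versioning helps reproducibility, here is a minimal sketch of one common approach: deriving a version identifier from the dataset's content. It does not describe how CleanML stores versions; the function `dataset_version` and the record layout are assumptions for illustration only.

```python
import hashlib
import json
from typing import Any, Dict, List


def dataset_version(records: List[Dict[str, Any]]) -> str:
    """Return a short content hash that changes whenever any record changes.

    Records are serialized with sorted keys and then sorted as strings,
    so neither key order nor record order affects the result.
    """
    lines = sorted(
        json.dumps(record, sort_keys=True, ensure_ascii=False)
        for record in records
    )
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return digest[:12]


if __name__ == "__main__":
    v1 = [{"text": "John lives in Paris", "labels": [[0, 4, "PER"], [14, 19, "LOC"]]}]
    v2 = [{"text": "John lives in Paris", "labels": [[0, 4, "PER"], [14, 19, "GPE"]]}]
    print(dataset_version(v1))
    print(dataset_version(v2))
```

Storing such an identifier alongside each training run makes it possible to tell whether two runs saw the same data, which is the precondition for comparing them fairly.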

Train, Test, Compare, Repeat

Train and compare models of different algorithms on the same dataset. CleanML versions every training run and helps you compare versions of both training and data. Comparing models and data at the record level makes it faster to see exactly what changed between runs.

Auto labeling suggestions

Get labeling suggestions from the trained algorithms to assist annotators and speed up the annotation of new data.

Support for multiple data formats

Import data in CoNLL-2003, IOB (IOB1/2, BILOU, IOBES), JSONL, and txt formats. Import data from the UI, the API, the command line, and Singer Taps. You can also export annotated data to multiple data formats via the command line.
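
The tagging schemes listed above encode the same entity spans in different ways. As a rough illustration of what converting between them involves, the sketch below maps IOB2 tags to BILOU tags. It is a generic example, not CleanML's converter, and the function name `iob2_to_bilou` is introduced here for illustration.

```python
from typing import List


def iob2_to_bilou(tags: List[str]) -> List[str]:
    """Convert a well-formed IOB2 tag sequence to BILOU.

    B = begin, I = inside, L = last, O = outside, U = unit (single token).
    """
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append("O")
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            bilou.append(f"B-{label}" if continues else f"U-{label}")
        else:  # prefix == "I"
            bilou.append(f"I-{label}" if continues else f"L-{label}")
    return bilou


if __name__ == "__main__":
    tokens = ["John", "Smith", "visited", "Paris"]
    iob2 = ["B-PER", "I-PER", "O", "B-LOC"]
    for token, tag in zip(tokens, iob2_to_bilou(iob2)):
        print(token, tag)
```

For this input the tags become `B-PER`, `L-PER`, `O`, `U-LOC`: the two-token person name gets an explicit last-token marker, and the single-token location becomes a unit tag.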