End-to-end deep learning (E2EDL) is the only technology capable of producing a best-in-class speech-to-text (STT) solution. This approach is far more flexible and easier to optimize than traditional STT, because a single model replaces the separate acoustic, pronunciation, and language models: you do not need to reconnect and re-optimize multiple models every time you want to make a change. The speech model can also be retrained and enhanced without starting from scratch. With transfer learning, new speech or language models can be developed much faster, and speech models customized for a specific use case can be created in weeks instead of months.
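To make the transfer learning idea concrete, here is a minimal toy sketch in plain Python. It is not a speech model and not any vendor's implementation. It only illustrates the principle of reusing a "pretrained" base and training a small new part on new data. Every name in it (`encode`, `fine_tune_head`, `base_weights`, the synthetic data) is a hypothetical stand-in chosen for illustration.

```python
import random

def encode(x, base_weights):
    """Frozen 'pretrained' encoder: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in base_weights]

def predict(x, base_weights, head):
    """Full model: frozen encoder followed by a trainable linear head."""
    feats = encode(x, base_weights)
    return sum(h * f for h, f in zip(head, feats))

def mean_squared_error(data, base_weights, head):
    return sum((predict(x, base_weights, head) - y) ** 2 for x, y in data) / len(data)

def fine_tune_head(data, base_weights, head, lr=0.05, epochs=200):
    """Plain SGD on the head only; base_weights are read but never modified."""
    head = list(head)
    for _ in range(epochs):
        for x, y in data:
            feats = encode(x, base_weights)
            err = sum(h * f for h, f in zip(head, feats)) - y
            head = [h - lr * err * f for h, f in zip(head, feats)]
    return head

rng = random.Random(0)

# Assumption: these weights stand in for a model already trained on general data.
base_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frozen_copy = [row[:] for row in base_weights]

# Assumption: synthetic "new use case" data with target = 2*x0 + 3*x1.
points = [(rng.random(), rng.random()) for _ in range(20)]
data = [((a, b), 2 * a + 3 * b) for a, b in points]

head_before = [0.0, 0.0, 0.0]
loss_before = mean_squared_error(data, base_weights, head_before)
head_after = fine_tune_head(data, base_weights, head_before)
loss_after = mean_squared_error(data, base_weights, head_after)

print(f"loss before fine-tuning: {loss_before:.4f}")
print(f"loss after fine-tuning:  {loss_after:.6f}")
```

Only the three head parameters are updated here, while the base stays untouched. In a real E2EDL system the base would be a large pretrained speech network, and how many layers to freeze or retrain is a design choice that depends on how much new data is available.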