Show HN: Port of OpenAI's Whisper model in C/C++
14 by ggerganov | 2 comments on Hacker News.
Hi HN, OpenAI recently released a model for automatic speech recognition called Whisper [0]. I decided to reimplement the inference of the model from scratch using C/C++. To achieve this I implemented a minimalistic tensor library in C and ported the high-level architecture of the model in C++. The entire code is less than 8000 lines of code and is contained in just 2 source files without any third-party dependencies. The Github project is here: https://ift.tt/T8ficS4 With this implementation I can very easily build and run the model - “make base.en” . It also allows me to run it on a wide range of devices. For example, I have provided examples of running the model on an iPhone, Raspberry Pi 4 and even in a web page via WebAssembly! The implementation runs fully on the CPU and utilizes FP16, AVX intrinsics on x86 architectures and NEON + Accelerate framework on Apple Silicon. The latter is especially efficient and I observe that the inference is about 2-3 times faster compared to the current PyTorch implementation provided by OpenAI when running it on my MacBook M1 Pro. The WASM port utilizes SIMD 128-bit intrinsics - a feature supported in some modern web browsers [1]. I am very happy with the performance that I observe on Apple Silicon devices. I didn’t expect that the Accelerate framework [2] (i.e. CBLAS) offers such a dramatic performance boost for matrix multiplications so I was very pleasantly surprised! To enable the framework in your C/C++ projects, all you have to do is add `-framework Accelerate` to your clang command-line flags. This entire exercise of implementing the Whisper model was very interesting to me and helped me understand a lot about how the transformer architecture works. I also got a lot of positive feedback from people finding and using my project. We brainstormed on a lot of interesting tools that can potentially be created with this library (such as speech-to-text plugin for Vim, RPi4 voice assistant, WASM chat bot, etc). If interested, checkout the “Examples” section and the “Show and tell” discussions for some ideas! Would love to know what you think about this project and about your experience with using the Accelerate framework in any of your projects. Cheers! [0] https://ift.tt/Sfgah8o [1] https://ift.tt/ZkEn3tO [2] https://ift.tt/hVOSrkB
No comments:
Post a Comment