NEWS

audio.whisper 0.4.1

Added function predict.whisper_transcription which allows to assign a transcription segment to either a left/right channel based on a Voice Activity Detection

Allow to pass on multiple offset/durations
Allow to give sections in the audio (e.g. detected with a voice activity detector) to filter out these (voiced) data, make the transcription and make sure to add the amount of time which was cut out to the from/to timestamps such that the resulting timepoints in from/to are aligned to the original audio file
The data element of the predict.whisper now includes a column called segment_offset indicating the offset of the provided sections or offsets

Fixes of typos in documentation of functions
Add stereo.wav file
Allow to do diarization for audio with 2 channels by comparing the energy of the signal in each channel for each segment

Documentation of arguments in predict.whisper
Add option to download quantised models
- tiny-q5_1, tiny.en-q5_1
- base-q5_1, base.en-q5_1
- small-q5_1, small.en-q5_1
- medium-q5_0, medium.en-q5_0
- large-v2-q5_0 and large-v3-q5_0
Allow to disable printing the transcription evolution during the prediction with the trace argument
Enable O3 optimisations by default
Allow speedup of transcriptions by compiling with cuBLAS against CUDA on Linux
- specify Sys.setenv(WHISPER_CUBLAS = "1") before installing the package if you have a GPU with CUDA

Makevars
- Added detection of AVX512F for adding compilation flags to PKG_CFLAGS/PKG_CPPFLAGS
- Enable Metal for speeding up transcriptions on the GPU on Mac
- Enable compiling with OpenBLAS to speed up the transcriptions
Add whisper_languages to get a data.frame of all languages the Whisper model can handle
whisper_download_model
- change default timeout to 10 minutes if no timeout is set by the user + change output element in the list to 'download_success' instead of 'download_failed'
- model_dir now defaults to the directory set in the environment variable WHISPER_MODEL_DIR and if this is not set, the current working directory
whisper
- Add option use_gpu to be able to run the prediction on a GPU (e.g. Metal)
predict.whisper
- Add option to pass on initial prompt
- Output of predict.whisper adds the audio duration of the wav file in seconds in the params list element
- Gains an extra argument indicating to transcribe or translate

Upgrade to whisper.cpp version v1.5.4
whisper_download_model allows to download 'large-v1', 'large-v2', 'large-v3' while model 'large' should no longer be used

Add option to pass on float entropy_thold (similar to compression_ratio_threshold), logprob_thold, beam_size, best_of, split_on_word, max_context when doing the prediction
Output of the predict.whisper function now includes an element called timing indicating how long it took to do the transcription
whisper gains 2 arguments: the model_dir/overwrite which is passed directly to whisper_download_model
whisper_download_model
- gains an argument version which defaults to models for whisper.cpp version 1.2.1
- gets the models now from https://huggingface.co/ggerganov/whisper.cpp/resolve/80da2d8bfee42b0e836fc3a9890373e5defc00a6 instead of https://huggingface.co/ggerganov/whisper.cpp/resolve/main

whisper_download_model now Deprecates downloading from https://ggml.ggerganov.com and changed the URL's to download models from huggingface (Issue #18)

Add option to compile with own PKG_CFLAGS by setting environment variable WHISPER_CFLAGS
Add option to compile with extra PKG_CPPFLAGS by setting environment variable WHISPER_CPPFLAGS

Ongoing work on improving compilation instructions to speed up transcribing while still being CRAN compliant
Add whisper_benchmark

Incorporate whisper.cpp release 1.0.4: https://github.com/ggerganov/whisper.cpp/releases/tag/1.0.4 and up to commit 99da1e5cc853f7cdd61d2f259c8d770ea9279d29
predict.whisper now uses 'auto' as default language
predict.whisper now sets resulting text with UTF-8 encoding

Incorporate https://github.com/ggerganov/whisper.cpp/pull/257 (Remove C++20 requirement)

Initial version based on
- https://github.com/ggerganov/whisper.cpp commit 85c9ac18b59125b988cda40f40d8687e1ba88a7a
- https://github.com/mackron/dr_libs commit dd762b861ecadf5ddd5fb03e9ca1db6707b54fbb
Added whisper
- whisper_download_model, whisper and predict.whisper