nikoel Posted January 3 (edited)

OpenAI Whisper for VoiceAttack - WhisperAttack

This repository provides a single-server approach for using OpenAI Whisper locally with VoiceAttack, replacing Windows Speech Recognition with a fully offline, GPU-accelerated recognition engine via NVIDIA CUDA. (For those of you without CUDA, this is still possible using CPU compute: the code automatically falls back to the CPU and will work on any processor.)

This is a fork for further integration of KneeboardWhisper by the amazing creator @bojote. A special thank you goes to hradec, whose original script used Google Voice Recognition; @SeaTechNerd83 for helping combine the two approaches; and finally @sleighzy for the VAICOM implementation and a list of bug fixes and enhancements long enough to fill this page.

In short, @SeaTechNerd83 and I combined the two scripts to run voice commands through Whisper using Bojote's code and then push them into VoiceAttack using hradec's code. To speed this up, I unified the codebase into one file and made it run a server to send commands to VoiceAttack. Current average compute for me is 0.2-0.3s, which can be sped up even further with smaller models (like base.en or tiny.en) by modifying one string. And it can actually understand you: in two days of testing I have yet to get it to mis-transcribe anything I have said.

Features:
Pushes transcribed text to the clipboard (perfect for voice-to-text DCS chat...)
Bojote's original vision preserved: DCS kneeboard integration out of the gate
About 10x faster than the original script (from ~2.5s to ~0.25s on a 4090), and up to 2000% faster if the smaller AI models also included with the release are utilised (depending on compute available)
Has an aviation dictionary and bias out of the gate
Has phonetic alphabet integration and bias

Instructions and release download can be found here: https://github.com/nikoelt/WhisperAttack

TLDR: No more saying "Gear Down" and having Windows Speech interpret it as "Eject, Eject, Eject"

VAICOM Support is currently being tested and will be coming shortly! VAICOM is now working. A massive thank you to @sleighzy. Additionally, Bailey's Apache profile, fixed by Sleighzy, is now fully incorporated and working with WhisperAttack: https://github.com/nikoelt/WhisperAttack/blob/add-vaicom-integration-instructions/VAICOM PRO/VAICOM_INTEGRATION.md

If you want to test, contribute, troubleshoot etc. and see the latest happenings, development was done in the VR4DCS Discord: https://discord.com/channels/610534461456777257/809527129422430218

Edited January 12 by nikoel
nikoel (Author) Posted January 10 (edited) The new version will soon feature DCS airport recognition and correction using FuzzyWuzzy, direct text replacement, and AI biasing/prompting. Here is a small preview. Edited January 10 by nikoel
buur Posted January 10 Do you plan this feature also for the callsigns? Whisper has some problems understanding me or my accent.
speed-of-heat Posted January 10 On 1/3/2025 at 11:03 AM, nikoel said: VAICOM Support is currently being tested and will be coming shortly! looking forward to it! SYSTEM SPECS: Hardware AMD 9800X3D, 64Gb RAM, 4090 FE, Virpil T50CM3 Throttle, WinWing Orion 2 & F-16EX + MFG Crosswinds V2, Varjo Aero SOFTWARE: Microsoft Windows 11, VoiceAttack & VAICOM PRO YOUTUBE CHANNEL: @speed-of-heat
markturner1960 Posted January 10 (edited) You coding guys blow me away with your skills and ability to make amazing things happen for us... thanks! Had a quick look on the GitHub page... looks a little "advanced" for normal users like myself... or am I being put off by the complicated install? Edited January 10 by markturner1960 System specs: PC1: Scan 3XS Ryzen 5900X, 64GB Corsair Veng DDR4 3600, EVGA GTX 3090, Win 10, Quest Pro, Samsung Odyssey G9 Neo monitor.
nikoel (Author) Posted January 10 (edited) 6 hours ago, buur said: Do you plan this feature also for the callsigns? Whisper has some problems understanding me or my accent

Can you please expand on that? I have so far used Ford and Uzi without issues.

But for your information, the next release will have two types of direct entries to steer voice recognition in the right direction. The first is something called FuzzyWuzzy, which uses an algorithm and a 0-100 threshold. That's just a fancy way of describing how sensitive or broad you want the algorithm to be when it changes what you said into what you think you meant. This is it here in action: "Gudalta Tower" becomes "Gudauta Tower". It ain't Pokémon, so it won't catch them all, but it should rescue a few commands that otherwise would have been misunderstood because of a "typo" or two. Set it too tight and it won't correct much; set it ultra loose and it will start being a hammer and seeing nails. Currently set around 70-90 (out of 100). Here is the code and the output.

Then it also uses direct replacements. This is 100% foolproof catchment, good if Whisper consistently misunderstands you. For instance, when I say "Enter" it interprets it as "Inter", so instead of trying to change this, I simply put it into this list. So now I have a direct replacement that looks like this. If there is a word that is constantly intruding, you can simply replace it with the one you want. Edited January 10 by nikoel
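The threshold idea described above can be sketched with Python's standard-library difflib rather than FuzzyWuzzy itself (both rank candidates by string similarity; the exact scoring differs). The callsign list and threshold value here are illustrative, not the script's actual configuration:

```python
import difflib

# Known-good terms the transcription should be corrected toward (illustrative).
CALLSIGNS = ["Gudauta Tower", "Kutaisi", "Batumi", "Enfield", "Springfield"]

def fuzzy_correct(phrase, choices, threshold=80):
    """Replace `phrase` with its closest known term if the similarity
    score (0-100 scale) clears the threshold; otherwise leave it alone."""
    matches = difflib.get_close_matches(phrase, choices, n=1, cutoff=threshold / 100)
    return matches[0] if matches else phrase

print(fuzzy_correct("Gudalta Tower", CALLSIGNS))  # -> "Gudauta Tower"
print(fuzzy_correct("Bingo Fuel", CALLSIGNS))     # unchanged: nothing scores above the threshold
```

Lowering `threshold` makes the correction more aggressive (the "hammer seeing nails" failure mode); raising it makes corrections rarer but safer.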
nikoel (Author) Posted January 10 (edited) 1 hour ago, markturner1960 said: You coding guys blow me away with your skills and ability to make amazing things happen for us... thanks! Had a quick look on the GitHub page... looks a little "advanced" for normal users like myself... or am I being put off by the complicated install?

It might look like that because of the terminal, but it isn't; there are only two copy-and-paste commands to run there. The readme on GitHub looks advanced because I spelled out every step, but it's actually simple once you get your head around it. All we are doing is installing Python and FFmpeg and adding them to PATH (you will see a checkbox to add them to PATH as part of the installer, so make sure to tick it). Then, in the same terminal where you installed FFmpeg, you enter the two lines of code and it will install the dependencies. Then you download the version you want from the releases page here: https://github.com/nikoelt/WhisperAttack/releases/tag/v0.2.2-beta. With a 3090, VRAM is not an issue, so small.en is a good version to start with.

That's it, it's working. The other steps are to make VoiceAttack start and stop voice recognition, which is again a simple task: when you press a button it sends a "Start" command, and when you release it sends a "Stop" command to let Whisper know when to start/stop transcribing. If you take your time, you will have it done in 15 minutes. Edited January 11 by nikoel
skypickle Posted January 11 Why does this need to run as a server in the background? The kneeboardWhisper project seems not to require this overhead. 4930K @ 4.5, 32g ram, TitanPascal
nikoel (Author) Posted January 11 (edited) 8 minutes ago, skypickle said: Why does this need to run as a server in the background? The kneeboardWhisper project seems not to require this overhead.

This project builds on what Bojote has developed. It's not overhead: the server-based approach provides a 10-20x speedup for the script, because the AI model doesn't need to reload every time a command is sent. It minimizes stuttering and hitching for those not using the latest or most powerful hardware, and allows us to integrate it with VAICOM and VoiceAttack profiles without the lengthy wait. Edited January 11 by nikoel
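The "load once, serve many commands" pattern described above can be sketched like this. The model stub, port number, and command strings are illustrative assumptions, not the script's actual protocol; the point is only that the expensive load happens a single time at startup:

```python
import socket

def load_model():
    # Stand-in for the expensive one-time load (the real script would call
    # something like whisper.load_model("small.en"), which takes seconds and
    # holds GPU memory -- the whole reason to keep the server process alive).
    return object()

MODEL = load_model()  # loaded once at startup, reused for every command

def handle_command(cmd: str) -> str:
    # Dispatch the short command strings a client such as VoiceAttack sends.
    # The command names here are illustrative.
    if cmd == "start":
        return "recording started"
    if cmd == "stop":
        return "transcription queued"
    return "unknown command"

def serve(host="127.0.0.1", port=65432):
    # Accept one short command per connection and answer it; MODEL stays
    # resident between requests instead of reloading per command.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, port))
        srv.listen()
        while True:
            conn, _ = srv.accept()
            with conn:
                cmd = conn.recv(1024).decode().strip().lower()
                conn.sendall(handle_command(cmd).encode())

# serve()  # uncomment to run the loop; a client connects once per command
```

Without the resident server, every push-to-talk press would pay the multi-second model load again, which is where the original script's ~2.5s latency came from.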
buur Posted January 11 13 hours ago, nikoel said: Can you please expand on that? I have so far used Ford and Uzi without issues

Did some simple tests with Whisper to see the potential, but also where problems could be. Since callsigns are very important, I tested them. Here are the results: X-Men, Dark Knight, Warrior, Pointer, A-Ball, Moonbeam, The Plash, Finger, Pinpoint, Ferret, Shaba, Playboy, Hammer, Yegoir, Testar and Will, Firefly, Mantis. These are the JTAC callsigns from the Hoggit list. You see, some are well recognized, others are catastrophic :-). Also did some first tests with fuzzy logic to help in this regard, but it's at a very, very early stage.
nikoel (Author) Posted January 11 (edited) 12 minutes ago, buur said: Did some simple tests with Whisper to see the potential, but also where problems could be. Since callsigns are very important, I tested them. Here are the results: X-Men, Dark Knight, Warrior, Pointer, A-Ball, Moonbeam, The Plash, Finger, Pinpoint, Ferret, Shaba, Playboy, Hammer, Yegoir, Testar and Will, Firefly, Mantis. These are the JTAC callsigns from the Hoggit list. You see, some are well recognized, others are catastrophic :-). Also did some first tests with fuzzy logic to help in this regard, but it's at a very, very early stage.

Can you give me all the callsigns you want included, please (and anything additional that is left of center that needs to be recognised)? I am testing a new build with these custom callsigns now; it should be out soon. [EDIT] Disregard, sorry, I did not see that "Hoggit" was an actual link. Got them now. Edited January 11 by nikoel
nikoel (Author) Posted January 11 (edited) 20 hours ago, buur said: Did some simple tests with Whisper to see the potential, but also where problems could be. [...]

Can you please test this pre-release. I will delete it when the release goes live so we don't have double-ups. Adjust how trigger-happy you want FuzzyWuzzy to be via these two values. It's a pre-release: I have no idea if 70/80 is a good number and need more data. The lower the number, the more FuzzyWuzzy will correct. Set it too loose and it will start changing things it really should not; too tight and it won't do its job. Don't get carried away, as these values will live in an external file for friendly editing by the end user.

Now for things that Whisper gets *consistently* wrong: for instance, if "Axeman" always comes out as "X-Men", I have included a text replacement feature. Use this instead of trying to somehow change the pronunciation or AI prompting. See below for examples. Edited January 12 by nikoel (Deleted Pre-Release)
buur Posted January 11 ok, going through the different word lists, with very good results. Found problems with:
Best/Batumi
Uzi/Kutaisi
Watziani/Vatziani
UC/Uzi
Code/Colt
Senaki/Sioux
Uniform/Bear
Also there is a problem when replacing two words: "Army Air", for example, is doubled. Here is the complete list:

dcs_airports: Anapa, Best, Beslan, Galencik, Gudauta, Kobuleti, Krasnodar, Krymsk, Uzi, Maykop, Mineralnye Vody, Mozdok, Nalchik, Novorossiysk, Kolkhi, Senaki, Sochi, Sukhumi, Tbilisi, Watsiani
# Generic: Enfield, Springfield, UC, Code, Dodge, Ford, Chevy, Pontiac
# AH-64 callsigns: Army Air, Army Air, Apache, Crow, Senaki, Gatling, Gunslinger, Hammerhead, Bootleg, Palehorse, Carnivor, Saber
# A-10 callsigns: Hawg, Uniform, Pig, Tusk
# F-16 callsigns: Viper, Venom, Lobo, Cowboy, Python, Rattler, Panther, Wolf, Weasel, Wild, Ninja, Jedi
# F-18 callsigns: Hornet, Squid, Ragin, Roman, Sting, Jury, Joker, Ram, Hawk, Devil, Check, Snake
# F-15E callsigns: Dude, Squid, Gunny, Trek, Sniper, Sled, Best, Jazz, Rage, Tahoe
# B-1B callsigns: Bone, Dark, Vader
# B-52 callsigns: Buff, Dump, Kenworth
# Transport (C-47, C-130, C-17) callsigns: Heavy, Trash, Cargo, Ascot
# AWACS callsigns: Overlord, Magic, Wizard, Focus, Darkstar
# Tanker callsigns: Texaco, Arco, Shell
# JTAC callsigns: Axeman, Darknight, Warrior, Pointer, Eyeball, Moonbeam, Whiplash, Finger, Pinpoint, Ferret, Shaba, Playboy, Hammer, Jaguar, Deathstar, Anvil, Firefly, Mantis, Badger
phonetic_alphabet: Alpha, Bravo, Charlie, Delta, Echo, Foxtrot, Golf, Hotel, India, Juliet, Kilo, Lima, Mike, November, Oscar, Papa, Quebec, Romeo, Sierra, Tango, Uniform, Victor, Wild, Xray, Yankee, Zulu
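The "Army Air is doubled" report above matches a common failure mode of naive word replacement: a plain substring replace also fires when the transcript already contains the multi-word target. The mappings below are hypothetical, and this is only one plausible cause of the bug, not a diagnosis of the actual script:

```python
import re

# Hypothetical word_mappings.txt-style entries: misheard -> intended.
MAPPINGS = {"Army": "Army Air", "Inter": "Enter"}

def replace_naive(text, mappings):
    # Plain str.replace also matches inside an already-correct transcript:
    # "Army Air ..." -> "Army Air Air ..." (the doubling buur observed).
    for wrong, right in mappings.items():
        text = text.replace(wrong, right)
    return text

def replace_whole_phrase(text, mappings):
    # Skip a mapping when its full replacement is already present, and match
    # on word boundaries so substrings of other words are left alone.
    for wrong, right in mappings.items():
        if right in text:
            continue
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text)
    return text

print(replace_naive("Army Air one one", MAPPINGS))         # doubled: "Army Air Air one one"
print(replace_whole_phrase("Army Air one one", MAPPINGS))  # left intact
print(replace_whole_phrase("Army one one", MAPPINGS))      # corrected to "Army Air one one"
```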
MAXsenna Posted January 11 Are there any ways to train the Speech Recognition engine, like Microsoft's?
nikoel (Author) Posted January 12 (edited) 9 hours ago, MAXsenna said: Are there any ways to train the Speech Recognition engine, like Microsoft's?

Since you asked: if you guys really want to help fine-tune this model specifically for DCS, it involves work. Simple but boring work. Basically, we need a sound file (i.e. a guy speaking lots of DCS commands, talking to ATC and wingmen, VAICOM etc...) and a transcript for that audio - this is usually a VTT file. I then need your permission to train a new fine-tuned Whisper model on these files. Basically it will be a file of you speaking a bunch of words and frequently used phrases, like: 'Ragnarok 1-1 ready pre-contact', 'Sochi Tower, this is Enfield 1-1 in company with Enfield 1-1, inbound' etc... Then we will take this data and push it into the Whisper validation model and let it transcribe it into a VTT file. Then we will open that VTT file in a word editor and make changes where Whisper made mistakes. Then we feed that data back into the model.

10 hours ago, buur said: ok, going through the different word lists, with very good results. Found problems with Best/Batumi, Uzi/Kutaisi, Watziani/Vatziani, UC/Uzi, Code/Colt, Senaki/Sioux, Uniform/Bear. Also there is a problem when replacing two words: "Army Air", for example, is doubled.
Here the complete list (quoted in full in the post above).

Glad it's better for you; however, you're not really using or testing the model correctly. Whisper is different from a simple voice transcriber. By feeding it just single words you're denying it its biggest advantage over dumb V2T apps like Windows Voice Recognition. The model takes into account what you have spoken and uses the meaning of the sentence to retroactively correct words. By saying "Sochi Tower, this is Uzi 1-1, inbound" rather than just "Uzi" you will see a better transcription, because the model can derive what you're trying to say. See below for an example of what is happening behind the scenes. Make sure you're sitting down. Edited January 12 by nikoel
nikoel (Author) Posted January 12 (edited)

VAICOM Pro integration and instructions by @sleighzy are now live: https://github.com/nikoelt/WhisperAttack/blob/add-vaicom-integration-instructions/VAICOM PRO/VAICOM_INTEGRATION.md

New version is now up - WhisperAttack Changelog

VAICOM Support
- Thanks to @sleighzy. For integration instructions, see: VAICOM PRO Integration Guide

Script will now auto-install missing dependencies
- A mechanism has been introduced to automatically check for and install any missing Python packages required by the script, ensuring seamless setup and execution. Just double-click and you're good to go. Buuut you still need Python and FFmpeg!

Support for external word matching and replacement
- Added configuration to dynamically load fuzzy matching terms (fuzzy_words.txt) and word mappings (word_mappings.txt) for direct word replacement - thank you @sleighzy for the external file handling.
- Enhanced flexibility for text correction and fuzzy matching, allowing dynamic updates without modifying the script itself. Feel free to edit either file with more keywords! We have pre-populated examples.

Fuzzy matching
- Implemented fuzzy matching of DCS callsigns and phonetic alphabet terms.
- Integrated RapidFuzz for more robust and accurate text correction, using configurable thresholds for both phonetic terms and DCS callsigns.
- Introduced weighting adjustments to control the level of correction interference:
  High thresholds: minimal correction, preserves user input closely.
  Low thresholds: aggressive correction, may be "trigger-happy".
- Configuration example:
  dcs_threshold = 85
  phonetic_threshold = 80

Improved clipboard and kneeboard handling
- Revised logic to distinguish text destined for the clipboard versus text forwarded to VoiceAttack.
- To transcribe speech directly into the DCS kneeboard and clipboard, users now need to say "Copy" followed by the text they want in the clipboard/kneeboard. This bypasses VoiceAttack entirely for faster processing and will only copy to the clipboard and DCS kneeboard! End users can change this key phrase to whatever they like within the code.
- For standard VoiceAttack commands, the script no longer copies text to the clipboard or DCS kneeboard, improving overall performance.

Enhanced AI initial prompt
- Whisper now has an optimized initial prompt for voice recognition, significantly improving its handling of DCS-specific callsigns like Deathstar and Enfield. This ensures more accurate transcription from the start.

Bug fixes
- Coordinates starting with 0 will now populate VoiceAttack commands - thank you @sleighzy
- Many more code revisions to keep the code tidy

Edited January 12 by nikoel
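The "Copy" routing described in the changelog can be sketched as a small dispatch function. The trigger word and the destination labels are illustrative; the actual script's internals may differ:

```python
TRIGGER = "copy"  # the changelog notes this key phrase is changeable in the code

def route(transcript: str):
    # If the transcription starts with the trigger word, strip it and send the
    # remainder to the clipboard/kneeboard path; otherwise forward the whole
    # text to VoiceAttack as a command.
    words = transcript.strip().split(maxsplit=1)
    if words and words[0].lower() == TRIGGER:
        return ("clipboard+kneeboard", words[1] if len(words) > 1 else "")
    return ("voiceattack", transcript.strip())

print(route("Copy bullseye 045 for 20"))  # ('clipboard+kneeboard', 'bullseye 045 for 20')
print(route("gear down"))                 # ('voiceattack', 'gear down')
```

Skipping the clipboard write on the VoiceAttack path is what the "improving overall performance" bullet refers to: plain commands no longer pay for clipboard and kneeboard I/O they don't need.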
buur Posted January 12 10 hours ago, nikoel said: Glad it's better for you; however, you're not really using or testing the model correctly. Whisper is different from a simple voice transcriber. [...] By saying "Sochi Tower, this is Uzi 1-1, inbound" rather than just "Uzi" you will see a better transcription, because the model can derive what you're trying to say.

Ok, I will try it today with some example sentences.
buur Posted January 12 10 hours ago, nikoel said: Since you asked: if you guys really want to help fine-tune this model specifically for DCS, it involves work. Simple but boring work. [...]

Do you have an idea how many voice samples are necessary for a good result? How many people would have to deliver how many hours of such samples? For me, the idea of a community DCS model sounds very nice.
nikoel (Author) Posted January 12 (edited) 58 minutes ago, buur said: Do you have an idea how many voice samples are necessary for a good result? How many people would have to deliver how many hours of such samples? For me, the idea of a community DCS model sounds very nice.

It's difficult to tell, as it's a black box. However, 10 hours would be a good starting point from what I have read, especially if one were to use LoRA to nudge the model in the right direction. Hilariously, the way it works is Whisper eating its own dog food: we take the voice file and use Whisper to transcribe it. Then we edit the parts it got wrong and feed them back into the model to create a new one. Edited January 12 by nikoel
sleighzy Posted January 14 (edited) For VAICOM you need to be using VSPX, and hence don't use the sender in the sentence, just the recipient and the command. VoiceAttack does not support wildcard (*) syntax in profile key words when used with Whisper. This is not a Whisper issue per se; VoiceAttack simply does not support matching on wildcards when executed with a passed-in command text, which is how this Whisper integration passes the transcribed text. The VoiceAttack folk have added this to their feature suggestion list (fingers crossed for a release). The key word syntax used by VAICOM's VSPX mode better matches what is expected, and I have made updates for the ones that don't (these are available in my provided profile). I have yet to document what updates are needed, but you could refer to (or directly use, which may be easier) my profile for the base set and then just update it with any missing recipients. I'll grab a complete set and update with any missing ones anyway. If a recipient is missing from the "When I say" key word collection in VoiceAttack then it will also not match; just add the recipients to the list of strings in that VoiceAttack key words collection. Recipients containing dashes (e.g. I see "X-Men" was mentioned) will need to have those changed, as I believe the dashes are replaced with space characters, e.g. this may become "X Men". We do replacements before that, so you can add "X-Men=Xmen" in the word_mappings.txt file; granted, I haven't tested this, so I'm not even sure what Whisper is transcribing that one to, and will need to test. May need work in that area. I'll look at the VAICOM source code again to see what handling it has around that. Numbers are converted to their numerical value by Whisper, so my recipients list has been updated accordingly, e.g. your wingman supports "winger, bozo, two, 2".
This requires the VAICOM key words database to have that "2" also added as an alias for that wingman, as far as I'm aware (I have the alias but need to 100% confirm it works with it and doesn't work without it). (...still a work in progress, so still ironing out the kinks; apologies for grey areas/confusion and anything I may have missed at this stage) EDIT: Maybe I'd misunderstood X-Men. Was this supposed to be "Axeman"? Then yeah, just add X-men=Axeman to the word_mappings.txt file. Edited January 14 by sleighzy AMD 7800x3D, 4080Super, 64Gb DDR5 RAM, 4Tb NVMe M.2, Quest 2
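The "X-men=Axeman" entries discussed above suggest a simple one-mapping-per-line file format. As a sketch of how such a word_mappings.txt might be parsed and applied (the exact format and any comment syntax are assumptions, not the file's confirmed schema):

```python
def load_word_mappings(lines):
    """Parse word_mappings.txt-style lines of the form 'wrong=right',
    skipping blanks, comments, and malformed lines."""
    mappings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        wrong, right = line.split("=", 1)
        mappings[wrong.strip()] = right.strip()
    return mappings

def apply_mappings(text, mappings):
    # Straight substring replacement, as described for the direct-replacement feature.
    for wrong, right in mappings.items():
        text = text.replace(wrong, right)
    return text

sample = ["# JTAC fixes", "X-men=Axeman", "Inter=Enter"]
m = load_word_mappings(sample)
print(apply_mappings("X-men one one", m))  # -> "Axeman one one"
```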
nikoel (Author) Posted January 15 Something exciting might be coming soon. I bought access to a Whisper fine-tuning training repo and have been running experiments. For those who have offered, I will take you up on transcription and training of the model. We only need 10 minutes of training audio and 10 minutes of validation audio. I will transcribe it, then we will need to correct the transcription, and I will push to train a new model. I will then upload it to Hugging Face for everyone to use. As an example down below, you can see in the screenshot the WER (Word Error Rate) going from 55 down to 38 utilising LoRA.
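For anyone curious what the WER numbers in that screenshot actually measure: WER is the word-level edit distance between the reference transcript and the model's output, divided by the reference word count. A minimal implementation (the example sentence is made up for illustration):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance, computed over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution (or match)
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of eight: WER = 0.125
print(wer("sochi tower this is uzi one one inbound",
          "sochi tower this is uc one one inbound"))
```

A WER of 55 vs 38 on the same validation set therefore means the fine-tuned model gets roughly 17 more words per hundred right than the base model did.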
buur Posted January 15 Nice! Yes, I'm ready to train the model with my horrible German accent.
Seatechnerd83 Posted January 17 (edited)

New sub-version is up - WhisperAttack Changelog v0.3.5

No changes to whisper_server.py between v0.3 and v0.3.5.

Major changes: VoiceAttack Whisper Recording Plugin
- v0.3.5 introduces a native plugin for VoiceAttack called WASC (Whisper Attack Server Command).
- Allows VoiceAttack to trigger whisper_server.py itself instead of having to push commands through send_command.py.
- Soft-stops whisper_server.py upon closing VoiceAttack through the shutdown command, meaning that shutting down the Whisper system correctly takes only one click.
- Streamlines the entire chain of commands, stopping send_command.py from usurping window focus from other windows, and slightly reduces latency.

Edited January 17 by Seatechnerd83
nikoel (Author) Posted February 4

WhisperAttack - New Release! We're excited to bring you the latest version of Whisper Server with major improvements, enhanced accuracy, and better configurability!

What's New?

Improved installation & dependencies
- Replaced whisper with openai-whisper for better compatibility.
- torch now installs with CUDA support automatically for optimized GPU performance.
- Added text2digits to improve number recognition in transcriptions.

New configuration system!
- Added settings.cfg for easier customization! You can now define key settings (e.g., Whisper model, device selection, and VoiceAttack path) in a simple config file.

Faster & more accurate transcriptions
- New text processing enhancements: spoken numbers are automatically converted into actual digits (e.g., "five thousand two hundred" → 5200).
- Better word normalization: reduces errors and improves accuracy.
- More refined regex processing to clean up text input.
- Added the ability to drop in fine-tuned Whisper models in the future.

VoiceAttack integration enhancements
- VoiceAttack path is now configurable instead of hardcoded.
- If VoiceAttack is not found, a clear error message is displayed.
- Trigger phrase change: instead of saying "copy", now use "note" to send transcriptions to the DCS kneeboard.

Other fixes & improvements
- Fixed issues with hyphenated words in transcriptions.
- Improved phonetic alphabet handling (e.g., "X-ray" is now properly recognized).
- Enhanced logging for easier debugging.

This update brings smoother, faster, and more accurate voice recognition with better configurability. Let us know if you have any feedback! Happy flying and commanding. Massive thank you to @sleighzy, who did most of the heavy lifting for this release, and @Seatechnerd83 for the awesome new plugin!

Get it here: https://github.com/nikoelt/WhisperAttack/releases
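The settings.cfg mentioned in the release notes can be read with Python's standard configparser. The section and key names below are illustrative guesses based on the settings the post lists (model, device, VoiceAttack path), not the file's confirmed schema:

```python
import configparser

# Illustrative settings.cfg content; key names are assumptions.
SAMPLE = """\
[whisper]
model = small.en
device = cuda

[voiceattack]
path = C:\\Program Files\\VoiceAttack\\VoiceAttack.exe
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# fallback= supplies a sane default when a key is missing from the user's file,
# e.g. dropping back to CPU when no device is configured.
model = config.get("whisper", "model", fallback="base.en")
device = config.get("whisper", "device", fallback="cpu")
va_path = config.get("voiceattack", "path", fallback="")
print(model, device)  # small.en cuda
```

Keeping these values in a config file rather than hardcoded is what makes the "VoiceAttack path is now configurable" and "clear error message if not found" items possible without editing the script.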
markturner1960 Posted February 9 (edited) On 1/10/2025 at 11:59 PM, nikoel said: It might look like that because of the terminal, but it isn't; there are only two copy-and-paste commands to run there. [...] If you take your time, you will have it done in 15 minutes.

Thanks, trying to install this now... stuck at FFmpeg. I don't see a link for this? How is it installed, please? I looked in the unzipped WhisperAttack folder but was unsure what to do from there... Edited February 9 by markturner1960