Realtime syllable recognition using an FFT analyzer

Postby castpixel » Mon Jun 22, 2020 12:01 pm

I am a complete audio noob, but I have a very specific question.

I've made a proof-of-concept game where you shoot things with your voice. It's playable here:
https://castpixel.itch.io/zapsignal

How it works:
There is no speech recognition, as that would be too slow for a real-time game based on reflexes.
The game examines the mic input in real time, looking only at peak and RMS levels; when either exceeds a threshold, I send a "shoot" signal.
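The level-based trigger described above can be sketched as a plain function over one buffer of samples, independent of any audio framework (function name and threshold values here are illustrative, not from the game):

```javascript
// Given one analyser buffer of samples in [-1, 1], compute the peak
// and RMS level and fire when either exceeds its threshold.
// The thresholds are made-up starting points to tune by ear.
function detectShout(samples, peakThreshold = 0.5, rmsThreshold = 0.2) {
  let peak = 0;
  let sumSquares = 0;
  for (const s of samples) {
    const a = Math.abs(s);
    if (a > peak) peak = a;
    sumSquares += s * s;
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return peak > peakThreshold || rms > rmsThreshold;
}
```

Called once per animation frame on the latest waveform buffer, this gives the "any loud sound fires" behaviour the game currently has.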


What I need:
I'm trying to improve recognition of syllables. I found that a band-pass filter on the live input makes some frequency bins more "excited". For example, with a band-pass at 5000 Hz and Q = 10000 (!), the syllables "pon" and "chu" produce distinct bands in the FFT analyzer.


Pon
|
|____πΠπ______> frequency

Chu
|
|Ππ__________> frequency


But it's not accurate enough yet.
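One cheap feature that could separate the two shapes sketched above, without any extra filters, is the spectral centroid: the power-weighted average bin index of one FFT frame. In the sketches, "chu" piles its energy into the lowest bins of the filtered signal while "pon" sits higher, so a single tuned boundary on the centroid can tell them apart. This is a hypothetical sketch; the boundary bin and the assumption that the frame is in dB (as FFT analysers typically report) would need tuning against real captures:

```javascript
// Power-weighted average bin index of one FFT frame (values in dB).
function spectralCentroid(fftDb) {
  let weighted = 0, total = 0;
  for (let i = 0; i < fftDb.length; i++) {
    const power = Math.pow(10, fftDb[i] / 10); // dB -> linear power
    weighted += i * power;
    total += power;
  }
  return total > 0 ? weighted / total : 0;
}

// Classify by which side of a tuned boundary the centroid lands on.
// boundaryBin = 8 is an arbitrary placeholder.
function classify(fftDb, boundaryBin = 8) {
  return spectralCentroid(fftDb) < boundaryBin ? "chu" : "pon";
}
```

Gating this behind the existing level trigger (only classify when a shout is already detected) keeps it cheap enough for a per-frame game loop.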

Does anyone know of any filters that can further differentiate between these sounds?

I am constrained to Tone.js, a JavaScript audio framework. I can do filtering (band-pass, high/low-pass, etc.) and convolution. Would a convolution impulse response "push" the "ch" sounds to be more distinct? Am I thinking about it wrong?
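An alternative to shaping the signal with an impulse response, still in plain JavaScript, is to match the live FFT frame against stored template frames, one captured per syllable, and pick the closest by cosine similarity. Everything here is a hypothetical sketch: the templates would be recorded offline from real "pon"/"chu" utterances, and the frames are assumed to be non-negative linear magnitudes (dB frames would need converting first):

```javascript
// Cosine similarity between two equal-length spectra.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Return the name of the template most similar to the live frame.
// templates: e.g. { pon: Float32Array, chu: Float32Array }
function bestMatch(frame, templates) {
  let best = null, bestScore = -Infinity;
  for (const [name, t] of Object.entries(templates)) {
    const s = cosineSimilarity(frame, t);
    if (s > bestScore) { bestScore = s; best = name; }
  }
  return best;
}
```

This amounts to a crude matched filter in the frequency domain, which is roughly what a convolution impulse response per syllable would be doing in the time domain.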

Thank you in advance for your time and expertise.
castpixel
Posts: 1
Joined: Mon Jun 22, 2020 11:39 am