Forum: War Ensemble BBS

How to check whether audio bytes contain empty noise or actual voice/signal?

From marc nicole@mk1853387@gmail.com to comp.lang.python on Fri Oct 25 18:25:19 2024

From Newsgroup: comp.lang.python

Hello Python fellows,

I hope this question is not too far from the main topic of this list, but
I am having a hard time finding a way to check whether audio data samples contain empty noise or actual significant voice/noise.

I am using PyAudio to collect the sound through my PC mic as follows:

import pyaudio

FRAMES_PER_BUFFER = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 48000
RECORD_SECONDS = 2

audio = pyaudio.PyAudio()
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=FRAMES_PER_BUFFER,
                    input_device_index=2)
data = stream.read(FRAMES_PER_BUFFER)

I want to know whether data contains a voice signal or only empty sound.
Note that the variable always contains bytes (empty or sound) when I print
it.

Is there a straightforward "easy way" to check whether data is filled with empty noise or whether somebody has made a noise or spoken?

Thanks.
--- Synchronet 3.20a-Linux NewsLink 1.114

From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.python on Fri Oct 25 16:43:11 2024

From Newsgroup: comp.lang.python

marc nicole <mk1853387@gmail.com> wrote or quoted:

> I hope this question is not very far from the main topic of this list, but
> I have a hard time finding a way to check whether audio data samples are
> containing empty noise or actual significant voice/noise.

The Spectral Flatness Measure (SFM), also called Wiener entropy, can
separate the wheat from the chaff when it comes to how noise-like
a signal is. This measure runs the gamut from 0 to 1, where:
1 means you've hit pay dirt with perfect white noise (flat spectrum),
0 is as pure as a Napa Valley Chardonnay (single frequency).
(Everything in between is just different shades of gnarly.)

import numpy as np
from scipy.signal import welch

def noiseness(signal, fs):
    # signal: 1-D numeric array of samples (not raw bytes); fs: sample rate in Hz

    # Compute the power spectral density
    f, psd = welch(signal, fs, nperseg=min(len(signal), 256))

    # Compute geometric mean of PSD
    geometric_mean = np.exp(np.mean(np.log(psd + 1e-10)))

    # Compute arithmetic mean of PSD
    arithmetic_mean = np.mean(psd)

    # Calculate Spectral Flatness Measure
    sfm = geometric_mean / arithmetic_mean

    return sfm
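
The function above expects a numeric array, while stream.read() in the original post returns raw bytes. A minimal sketch of that conversion, assuming 16-bit mono samples as in the question. The helper names bytes_to_samples and spectral_flatness are made up for illustration, and the flatness here is taken from a plain NumPy periodogram rather than welch(), so the numbers will differ somewhat from noiseness(). Synthetic data stands in for the microphone.

```python
import numpy as np

def bytes_to_samples(data):
    """Decode 16-bit PCM bytes (as from PyAudio's paInt16) to floats in [-1, 1)."""
    return np.frombuffer(data, dtype=np.int16).astype(np.float64) / 32768.0

def spectral_flatness(samples, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a simple periodogram."""
    power = np.abs(np.fft.rfft(samples)) ** 2
    geometric_mean = np.exp(np.mean(np.log(power + eps)))
    arithmetic_mean = np.mean(power) + eps
    return float(geometric_mean / arithmetic_mean)

# Synthetic stand-ins for one 1024-frame buffer at 48 kHz (assumed values).
rng = np.random.default_rng(0)
n, rate = 1024, 48000
t = np.arange(n) / rate
noise_bytes = (rng.normal(0.0, 0.1, n).clip(-1, 1) * 32767).astype(np.int16).tobytes()
tone_bytes = (0.5 * np.sin(2 * np.pi * 750.0 * t) * 32767).astype(np.int16).tobytes()

noise_sfm = spectral_flatness(bytes_to_samples(noise_bytes))
tone_sfm = spectral_flatness(bytes_to_samples(tone_bytes))
print(f"white noise: {noise_sfm:.3f}, pure tone: {tone_sfm:.6f}")
```

With real microphone data the same call would be spectral_flatness(bytes_to_samples(data)). Where to put the cut-off between "noise-like" and "tonal" has to be found by experiment.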

--- Synchronet 3.20a-Linux NewsLink 1.114

From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.python on Fri Oct 25 17:00:14 2024

From Newsgroup: comp.lang.python

ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:

> marc nicole <mk1853387@gmail.com> wrote or quoted:
>
>> I hope this question is not very far from the main topic of this list, but
>> I have a hard time finding a way to check whether audio data samples are
>> containing empty noise or actual significant voice/noise.
>
> The Spectral Flatness Measure (SFM), also called Wiener entropy, can
> separate the wheat from the chaff when it comes to how noise-like
> a signal is.

You can also peep the envelope flatness (the flatness of the
volume). If you've got some white noise that's not bringing much to
the table, that envelope should be flatter than a pancake at IHOP.

import librosa
import numpy as np

def measure_volume_flatness(audio_path, sr=None):
    # Load the audio file
    y, sr = librosa.load(audio_path, sr=sr)

    # Calculate the root mean square (RMS) energy for each frame
    frame_length = 2048
    hop_length = 512
    rms = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)[0]

    # Calculate the dynamic range
    db_range = librosa.amplitude_to_db(np.max(rms)) - librosa.amplitude_to_db(np.min(rms))

    # Normalize the dynamic range to a 0-1 scale
    # Assuming a maximum possible dynamic range of 120 dB
    flatness = 1 - (db_range / 120)

    return flatness
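
librosa.load() reads from a file. For a buffer that is already in memory, the same envelope idea can be sketched with NumPy alone. This is only an illustration: the function names, the frame sizes and the floor value are assumptions, and the input is expected to be a float array at least one frame long.

```python
import numpy as np

def frame_rms(samples, frame_length=480, hop_length=240):
    """RMS level of successive overlapping frames of a 1-D float array."""
    samples = np.asarray(samples, dtype=np.float64)
    starts = range(0, len(samples) - frame_length + 1, hop_length)
    return np.array([np.sqrt(np.mean(samples[s:s + frame_length] ** 2)) for s in starts])

def envelope_range_db(samples, floor=1e-6, **kwargs):
    """Difference in dB between the loudest and the quietest frame."""
    rms = np.maximum(frame_rms(samples, **kwargs), floor)  # floor avoids log(0)
    return float(20.0 * np.log10(rms.max() / rms.min()))

# Synthetic test signals (assumed levels): steady hiss, and hiss with a louder event.
rng = np.random.default_rng(1)
rate = 48000
steady = rng.normal(0.0, 0.01, rate)
burst = steady.copy()
burst[20000:26000] += rng.normal(0.0, 0.2, 6000)

steady_range = envelope_range_db(steady)
burst_range = envelope_range_db(burst)
print(f"steady hiss: {steady_range:.1f} dB, with burst: {burst_range:.1f} dB")
```

A small range suggests a flat envelope (nothing happening), a large one suggests that something louder than the background occurred somewhere in the buffer.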

--- Synchronet 3.20a-Linux NewsLink 1.114

From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.python on Sat Oct 26 11:16:13 2024

From Newsgroup: comp.lang.python

marc nicole <mk1853387@gmail.com> wrote or quoted:

> I have a hard time finding a way to check whether audio data samples are
> containing empty noise or actual significant voice/noise.

Or, you could have a human do a quick listen to some audio files to
gauge the "empty-noise ratio," then use that number as the filename
as a float, and finally train up a neural net on this. E.g.,

0.99.wav # very empty
0.992.wav # very empty file #2
0.993.wav # very empty file #3

0.00.wav # very not empty file
0.002.wav # very not empty file #2

One possible approach:

import os
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
import librosa

## Data Preparation

# Function to extract audio features
def extract_features(file_path):
    audio, sr = librosa.load(file_path)
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    return np.mean(mfccs.T, axis=0)

# Load data from directory
directory = 'd' # for example
X = []
y = []

for filename in os.listdir(directory):
    if filename.endswith('.wav'):
        file_path = os.path.join(directory, filename)
        X.append(extract_features(file_path))
        y.append(float(filename[:-4]))  # Assuming filename is the p value

X = np.array(X)
y = np.array(y)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

## Neural Network Model

model = Sequential([
    Dense(64, activation='relu', input_shape=(13,)),
    Dense(32, activation='relu'),
    Dense(1)
])

model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')

## Training

model.fit(X_train_scaled, y_train, epochs=100, batch_size=32, validation_split=0.2, verbose=1)

## Evaluation

test_loss = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test Loss: {test_loss}")

## Prediction Function

def predict_p(audio_file):
    features = extract_features(audio_file)
    scaled_features = scaler.transform(features.reshape(1, -1))
    prediction = model.predict(scaled_features)
    return prediction[0][0]

# Example usage
new_audio_file = 'path/to/new/audio/file.wav'
predicted_p = predict_p(new_audio_file)
print(f"Predicted p value: {predicted_p}")

--- Synchronet 3.20a-Linux NewsLink 1.114

From MRAB@python@mrabarnett.plus.com to comp.lang.python on Sat Oct 26 16:35:47 2024

From Newsgroup: comp.lang.python

On 2024-10-25 17:25, marc nicole via Python-list wrote:

> Hello Python fellows,
>
> I hope this question is not very far from the main topic of this list, but
> I have a hard time finding a way to check whether audio data samples are
> containing empty noise or actual significant voice/noise.
>
> I am using PyAudio to collect the sound through my PC mic as follows:
>
> import pyaudio
>
> FRAMES_PER_BUFFER = 1024
> FORMAT = pyaudio.paInt16
> CHANNELS = 1
> RATE = 48000
> RECORD_SECONDS = 2
>
> audio = pyaudio.PyAudio()
> stream = audio.open(format=FORMAT,
>                     channels=CHANNELS,
>                     rate=RATE,
>                     input=True,
>                     frames_per_buffer=FRAMES_PER_BUFFER,
>                     input_device_index=2)
> data = stream.read(FRAMES_PER_BUFFER)
>
> I want to know whether or not data contains voice signals or empty sound.
> Note that the variable always contains bytes (empty or sound) if I print it.
>
> Is there a straightforward "easy way" to check whether data is filled with
> empty noise or that somebody has made noise/spoke?
>
> Thanks.

If you do a spectral analysis and find peaks at certain frequencies,
then there might be a "significant" sound.
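
One way to put a number on "peaks at certain frequencies" is to compare the strongest spectral bin with the median bin. The sketch below is only an illustration under assumptions: the function name peak_to_median_db is made up, the Hann window and the median as a reference level are design choices rather than anything prescribed in this thread, and the test signals are synthetic.

```python
import numpy as np

def peak_to_median_db(samples):
    """How far the strongest spectral bin stands above the median bin, in dB."""
    samples = np.asarray(samples, dtype=np.float64)
    windowed = (samples - samples.mean()) * np.hanning(len(samples))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    power = power[1:]  # ignore the DC bin
    return float(10.0 * np.log10(power.max() / (np.median(power) + 1e-20)))

# One 1024-sample buffer at 48 kHz: background hiss, and hiss plus a 1 kHz tone.
rng = np.random.default_rng(2)
rate, n = 48000, 1024
t = np.arange(n) / rate
hiss = rng.normal(0.0, 0.01, n)
whistle = hiss + 0.5 * np.sin(2 * np.pi * 1000.0 * t)

hiss_db = peak_to_median_db(hiss)
whistle_db = peak_to_median_db(whistle)
print(f"hiss: {hiss_db:.1f} dB, hiss + tone: {whistle_db:.1f} dB")
```

Plain noise still has a largest bin, so the value is never zero. A usable threshold has to be chosen from recordings made with the actual microphone.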

--- Synchronet 3.20a-Linux NewsLink 1.114

From Thomas Passin@list1@tompassin.net to comp.lang.python on Sat Oct 26 12:07:10 2024

From Newsgroup: comp.lang.python

On 10/25/2024 12:25 PM, marc nicole via Python-list wrote:

> Hello Python fellows,
>
> I hope this question is not very far from the main topic of this list, but
> I have a hard time finding a way to check whether audio data samples are
> containing empty noise or actual significant voice/noise.
>
> I am using PyAudio to collect the sound through my PC mic as follows:
>
> import pyaudio
>
> FRAMES_PER_BUFFER = 1024
> FORMAT = pyaudio.paInt16
> CHANNELS = 1
> RATE = 48000
> RECORD_SECONDS = 2
>
> audio = pyaudio.PyAudio()
> stream = audio.open(format=FORMAT,
>                     channels=CHANNELS,
>                     rate=RATE,
>                     input=True,
>                     frames_per_buffer=FRAMES_PER_BUFFER,
>                     input_device_index=2)
> data = stream.read(FRAMES_PER_BUFFER)
>
> I want to know whether or not data contains voice signals or empty sound.
> Note that the variable always contains bytes (empty or sound) if I print it.
>
> Is there a straightforward "easy way" to check whether data is filled with
> empty noise or that somebody has made noise/spoke?

It's not always so easy. The Fast Fourier Transform will be your
friend. The most straightforward way would be to do an autocorrelation
on the recorded interval, possibly with some pre-filtering to enhance
the typical vocal frequency range. If the data is only noise, the
autocorrelation will show a large value at lag 0 and only small,
obviously noisy numbers everywhere else. There are practical aspects
that make things less clear. For example, voices tend to be spiky and
erratic, so you need to use short intervals to have a better chance of
getting an interval with a good S/N ratio, but short intervals give
less averaging and therefore a lower signal-to-noise ratio in the result.

Human speech is produced with various statistical regularities and these
can sometimes be detected with various means, including the autocorrelation.
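
A minimal sketch of the autocorrelation test described above, computed through the FFT. The function names, the 75 to 400 Hz search range for voice pitch and the synthetic "voiced" signal are assumptions made for illustration, and no pre-filtering is applied.

```python
import numpy as np

def normalized_autocorrelation(samples):
    """Autocorrelation via the FFT, scaled so that lag 0 equals 1."""
    x = np.asarray(samples, dtype=np.float64)
    x = x - x.mean()
    n = len(x)
    spectrum = np.fft.rfft(x, 2 * n)  # zero-pad to avoid circular wrap-around
    acf = np.fft.irfft(np.abs(spectrum) ** 2)[:n]
    return acf / acf[0] if acf[0] > 0 else np.zeros(n)

def periodicity_score(samples, rate, fmin=75.0, fmax=400.0):
    """Largest autocorrelation value at lags corresponding to fmin..fmax Hz."""
    acf = normalized_autocorrelation(samples)
    lo = int(rate / fmax)
    hi = min(int(rate / fmin), len(acf) - 1)
    return float(acf[lo:hi + 1].max())

# 0.1 s at 48 kHz: plain noise, and noise plus harmonics of 150 Hz.
rng = np.random.default_rng(3)
rate, n = 48000, 4800
t = np.arange(n) / rate
noise = rng.normal(0.0, 0.05, n)
voiced = noise + sum(np.sin(2 * np.pi * 150.0 * k * t) / k for k in range(1, 6))

noise_score = periodicity_score(noise, rate)
voiced_score = periodicity_score(voiced, rate)
print(f"noise: {noise_score:.2f}, voiced stand-in: {voiced_score:.2f}")
```

For noise the score stays near 0, for a periodic signal it approaches 1. Real speech falls somewhere in between and changes from frame to frame, so the intervals have to be long enough to cover a few pitch periods.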

You also will need to test-record your entire signal chain because it
might be producing artifacts that could fool some tests. And background sounds could fool some tests as well.
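
Test-recording the signal chain also gives a simple energy gate: measure the level of a recording in which nobody speaks, then flag any buffer that is clearly louder. The sketch below assumes float samples in [-1, 1]. The function names, the 10 dB margin and the synthetic data are illustrative assumptions, not measured values.

```python
import numpy as np

def level_dbfs(samples, floor=1e-9):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    rms = np.sqrt(np.mean(np.square(np.asarray(samples, dtype=np.float64))))
    return float(20.0 * np.log10(max(rms, floor)))

def calibrate_threshold(quiet_samples, frame_length=1024, margin_db=10.0):
    """Level of the loudest frame of a known-quiet recording, plus a margin."""
    starts = range(0, len(quiet_samples) - frame_length + 1, frame_length)
    return max(level_dbfs(quiet_samples[s:s + frame_length]) for s in starts) + margin_db

rng = np.random.default_rng(5)
quiet = rng.normal(0.0, 0.005, 48000)  # stand-in for one second of the silent room
threshold = calibrate_threshold(quiet)

silent_frame = rng.normal(0.0, 0.005, 1024)
loud_frame = silent_frame + 0.1 * np.sin(2 * np.pi * 300.0 * np.arange(1024) / 48000)

for name, frame in (("silent", silent_frame), ("loud", loud_frame)):
    print(name, "->", "signal" if level_dbfs(frame) > threshold else "background")
```

This only detects "louder than the background", not voice as such, so a door slam would also pass the gate.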

Here are some Python libraries that could be very helpful:

librosa (I have not worked with this but it sounds right on target);
scipy.signal (I have used scipy but not specifically scipy.signal);
python-speech-features (another I haven't used):
https://python-speech-features.readthedocs.io/en/latest/

Other people will know of others.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Lars Liedtke@lal@solute.de to comp.lang.python on Mon Oct 28 09:57:09 2024

From Newsgroup: comp.lang.python

There are also the concepts of Cepstrum (https://en.wikipedia.org/wiki/Cepstrum) and Quefrency, which are derivatives of Spectrum and Frequency, with which you can even do speaker-recognition, but also detection of events.
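
A rough sketch of the idea: the real cepstrum is the inverse FFT of the log-magnitude spectrum, and a harmonic (voiced) sound shows up as a peak at the quefrency of its pitch period. The function names, the 75 to 400 Hz search range, the 16 kHz sample rate (chosen to keep the example small) and the synthetic signals are all assumptions for illustration.

```python
import numpy as np

def real_cepstrum(samples, eps=1e-12):
    """Inverse FFT of the log-magnitude spectrum of a Hann-windowed frame."""
    samples = np.asarray(samples, dtype=np.float64)
    spectrum = np.fft.rfft(samples * np.hanning(len(samples)))
    return np.fft.irfft(np.log(np.abs(spectrum) + eps), n=len(samples))

def cepstral_peak(samples, rate, fmin=75.0, fmax=400.0):
    """Largest cepstral value in the pitch range and its quefrency in seconds."""
    cepstrum = real_cepstrum(samples)
    lo, hi = int(rate / fmax), int(rate / fmin)
    index = lo + int(np.argmax(cepstrum[lo:hi + 1]))
    return float(cepstrum[index]), index / rate

rng = np.random.default_rng(4)
rate, n = 16000, 2048
t = np.arange(n) / rate
hiss = rng.normal(0.0, 0.01, n)
# Crude voiced stand-in: harmonics of 220 Hz with falling amplitude, plus hiss.
voiced = hiss + sum(np.cos(2 * np.pi * 220.0 * k * t) / k for k in range(1, 19))

hiss_peak, _ = cepstral_peak(hiss, rate)
voiced_peak, quefrency = cepstral_peak(voiced, rate)
print(f"hiss peak: {hiss_peak:.3f}, voiced peak: {voiced_peak:.3f}")
print(f"pitch estimate from the voiced frame: {1.0 / quefrency:.0f} Hz")
```

The frame must be longer than the longest pitch period searched for (rate / fmin samples), otherwise the slice of the cepstrum is cut short.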
--- Synchronet 3.20a-Linux NewsLink 1.114
