speechbrain.processing.multi_mic module
Multi-microphone components.
This library contains functions for multi-microphone signal processing.
Example
>>> import torch
>>>
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, SrpPhat, Music
>>> from speechbrain.processing.multi_mic import DelaySum, Mvdr, Gev
>>>
>>> xs_speech = read_audio(
... 'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise_diff = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise_diff = xs_noise_diff.unsqueeze(0)
>>> xs_noise_loc = read_audio('tests/samples/multi-mic/noise_0.70225_-0.70225_0.11704.flac')
>>> xs_noise_loc = xs_noise_loc.unsqueeze(0)
>>> fs = 16000 # sampling rate
>>> ss = xs_speech
>>> nn_diff = 0.05 * xs_noise_diff
>>> nn_loc = 0.05 * xs_noise_loc
>>> xs_diffused_noise = ss + nn_diff
>>> xs_localized_noise = ss + nn_loc
>>> # Delay-and-Sum Beamforming with GCC-PHAT localization
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> delaysum = DelaySum()
>>> istft = ISTFT(sample_rate=fs)
>>> Xs = stft(xs_diffused_noise)
>>> Ns = stft(nn_diff)
>>> XXs = cov(Xs)
>>> NNs = cov(Ns)
>>> tdoas = gccphat(XXs)
>>> Ys_ds = delaysum(Xs, tdoas)
>>> ys_ds = istft(Ys_ds)
>>> # Mvdr Beamforming with SRP-PHAT localization
>>> mvdr = Mvdr()
>>> mics = torch.zeros((4,3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> srpphat = SrpPhat(mics=mics)
>>> doas = srpphat(XXs)
>>> Ys_mvdr = mvdr(Xs, NNs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr = istft(Ys_mvdr)
>>> # Mvdr Beamforming with MUSIC localization
>>> music = Music(mics=mics)
>>> doas = music(XXs)
>>> Ys_mvdr2 = mvdr(Xs, NNs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr2 = istft(Ys_mvdr2)
>>> # GeV Beamforming
>>> gev = Gev()
>>> Xs = stft(xs_localized_noise)
>>> Ss = stft(ss)
>>> Ns = stft(nn_loc)
>>> SSs = cov(Ss)
>>> NNs = cov(Ns)
>>> Ys_gev = gev(Xs, SSs, NNs)
>>> ys_gev = istft(Ys_gev)
- Authors:
William Aris
Francois Grondin
Summary
Classes:
Computes the covariance matrices of the signals. |
|
Performs delay and sum beamforming by using the TDOAs and the first channel as a reference. |
|
Generalized Cross-Correlation with Phase Transform localization. |
|
Generalized EigenValue decomposition (GEV) Beamforming. |
|
Multiple Signal Classification (MUSIC) localization. |
|
Perform minimum variance distortionless response (MVDR) beamforming by using an input signal in the frequency domain, its covariance matrices and tdoas (to compute a steering vector). |
|
Steered-Response Power with Phase Transform Localization. |
Functions:
This function converts directions of arrival (xyz coordinates expressed in meters) in time differences of arrival (expressed in samples). |
|
This function generates cartesian coordinates (xyz) for a set of points forming a 3D sphere. |
|
This function computes a steering vector by using the time differences of arrival for each channel (in samples) and the number of bins (n_fft). |
|
This function selects the tdoas of each channel and put them in a tensor. |
Reference
- class speechbrain.processing.multi_mic.Covariance(average=True)[source]
Bases:
Module
Computes the covariance matrices of the signals.
Arguments:
- averagebool
Informs the module if it should return an average (computed on the time dimension) of the covariance matrices. The Default value is True.
Example
>>> import torch >>> from speechbrain.dataio.dataio import read_audio >>> from speechbrain.processing.features import STFT >>> from speechbrain.processing.multi_mic import Covariance >>> >>> xs_speech = read_audio( ... 'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac' ... ) >>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels] >>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac') >>> xs_noise = xs_noise.unsqueeze(0) >>> xs = xs_speech + 0.05 * xs_noise >>> fs = 16000
>>> stft = STFT(sample_rate=fs) >>> cov = Covariance() >>> >>> Xs = stft(xs) >>> XXs = cov(Xs) >>> XXs.shape torch.Size([1, 1001, 201, 2, 10])
- forward(Xs)[source]
This method uses the utility function _cov to compute covariance matrices. Therefore, the result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics + n_pairs).
The order on the last dimension corresponds to the triu_indices for a square matrix. For instance, if we have 4 channels, we get the following order: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, XXs[…, 0] corresponds to channels (0, 0) and XXs[…, 1] corresponds to channels (0, 1).
Arguments:
- Xstensor
A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics)
- class speechbrain.processing.multi_mic.DelaySum[source]
Bases:
Module
Performs delay and sum beamforming by using the TDOAs and the first channel as a reference.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio >>> from speechbrain.processing.features import STFT, ISTFT >>> from speechbrain.processing.multi_mic import Covariance >>> from speechbrain.processing.multi_mic import GccPhat, DelaySum >>> >>> xs_speech = read_audio( ... 'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac' ... ) >>> xs_speech = xs_speech. unsqueeze(0) # [batch, time, channel] >>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac') >>> xs_noise = xs_noise.unsqueeze(0) #[batch, time, channels] >>> fs = 16000 >>> xs = xs_speech + 0.05 * xs_noise >>> >>> stft = STFT(sample_rate=fs) >>> cov = Covariance() >>> gccphat = GccPhat() >>> delaysum = DelaySum() >>> istft = ISTFT(sample_rate=fs) >>> >>> Xs = stft(xs) >>> XXs = cov(Xs) >>> tdoas = gccphat(XXs) >>> Ys = delaysum(Xs, tdoas) >>> ys = istft(Ys)
- forward(Xs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)[source]
This method computes a steering vector by using the TDOAs/DOAs and then calls the utility function _delaysum to perform beamforming. The result has the following format: (batch, time_step, n_fft, 2, 1).
- Parameters
Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics)
localization_tensor (tensor) – A tensor containing either time differences of arrival (TDOAs) (in samples) for each timestamp or directions of arrival (DOAs) (xyz coordinates in meters). If localization_tensor represents TDOAs, then its format is (batch, time_steps, n_mics + n_pairs). If localization_tensor represents DOAs, then its format is (batch, time_steps, 3)
doa_mode (bool) – The user needs to set this parameter to True if localization_tensor represents DOAs instead of TDOAs. Its default value is set to False.
mics (tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the following format (n_mics, 3). This parameter is only mandatory when localization_tensor represents DOAs.
fs (int) – The sample rate in Hertz of the signals. This parameter is only mandatory when localization_tensor represents DOAs.
c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s. This parameter is only used when localization_tensor represents DOAs.
- class speechbrain.processing.multi_mic.Mvdr(eps=1e-20)[source]
Bases:
Module
Perform minimum variance distortionless response (MVDR) beamforming by using an input signal in the frequency domain, its covariance matrices and tdoas (to compute a steering vector).
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio >>> from speechbrain.processing.features import STFT, ISTFT >>> from speechbrain.processing.multi_mic import Covariance >>> from speechbrain.processing.multi_mic import GccPhat, DelaySum >>> >>> xs_speech = read_audio( ... 'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac' ... ) >>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channel] >>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac') >>> xs_noise = xs_noise.unsqueeze(0) #[batch, time, channels] >>> fs = 16000 >>> xs = xs_speech + 0.05 * xs_noise >>> >>> stft = STFT(sample_rate=fs) >>> cov = Covariance() >>> gccphat = GccPhat() >>> mvdr = Mvdr() >>> istft = ISTFT(sample_rate=fs) >>> >>> Xs = stft(xs) >>> Ns = stft(xs_noise) >>> XXs = cov(Xs) >>> NNs = cov(Ns) >>> tdoas = gccphat(XXs) >>> Ys = mvdr(Xs, NNs, tdoas) >>> ys = istft(Ys)
- forward(Xs, NNs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)[source]
This method computes a steering vector before using the utility function _mvdr to perform beamforming. The result has the following format: (batch, time_step, n_fft, 2, 1).
- Parameters
Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics)
NNs (tensor) – The covariance matrices of the noise signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs)
localization_tensor (tensor) – A tensor containing either time differences of arrival (TDOAs) (in samples) for each timestamp or directions of arrival (DOAs) (xyz coordinates in meters). If localization_tensor represents TDOAs, then its format is (batch, time_steps, n_mics + n_pairs). If localization_tensor represents DOAs, then its format is (batch, time_steps, 3)
doa_mode (bool) – The user needs to set this parameter to True if localization_tensor represents DOAs instead of TDOAs. Its default value is set to False.
mics (tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the following format (n_mics, 3). This parameter is only mandatory when localization_tensor represents DOAs.
fs (int) – The sample rate in Hertz of the signals. This parameter is only mandatory when localization_tensor represents DOAs.
c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s. This parameter is only used when localization_tensor represents DOAs.
- class speechbrain.processing.multi_mic.Gev[source]
Bases:
Module
Generalized EigenValue decomposition (GEV) Beamforming.
Example
>>> from speechbrain.dataio.dataio import read_audio >>> import torch >>> >>> from speechbrain.processing.features import STFT, ISTFT >>> from speechbrain.processing.multi_mic import Covariance >>> from speechbrain.processing.multi_mic import Gev >>> >>> xs_speech = read_audio( ... 'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac' ... ) >>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels] >>> xs_noise = read_audio('tests/samples/multi-mic/noise_0.70225_-0.70225_0.11704.flac') >>> xs_noise = xs_noise.unsqueeze(0) >>> fs = 16000 >>> ss = xs_speech >>> nn = 0.05 * xs_noise >>> xs = ss + nn >>> >>> stft = STFT(sample_rate=fs) >>> cov = Covariance() >>> gev = Gev() >>> istft = ISTFT(sample_rate=fs) >>> >>> Ss = stft(ss) >>> Nn = stft(nn) >>> Xs = stft(xs) >>> >>> SSs = cov(Ss) >>> NNs = cov(Nn) >>> >>> Ys = gev(Xs, SSs, NNs) >>> ys = istft(Ys)
- forward(Xs, SSs, NNs)[source]
This method uses the utility function _gev to perform generalized eigenvalue decomposition beamforming. Therefore, the result has the following format: (batch, time_step, n_fft, 2, 1).
- Parameters
Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
SSs (tensor) – The covariance matrices of the target signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
NNs (tensor) – The covariance matrices of the noise signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
- class speechbrain.processing.multi_mic.GccPhat(tdoa_max=None, eps=1e-20)[source]
Bases:
Module
Generalized Cross-Correlation with Phase Transform localization.
- Parameters
tdoa_max (int) – Specifies a range to search for delays. For example, if tdoa_max = 10, the method will restrict its search for delays between -10 and 10 samples. This parameter is optional and its default value is None. When tdoa_max is None, the method will search for delays between -n_fft/2 and n_fft/2 (full range).
eps (float) – A small value to avoid divisions by 0 with the phase transformation. The default value is 1e-20.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio >>> from speechbrain.processing.features import STFT, ISTFT >>> from speechbrain.processing.multi_mic import Covariance >>> from speechbrain.processing.multi_mic import GccPhat, DelaySum >>> >>> xs_speech = read_audio( ... 'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac' ... ) >>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channel] >>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac') >>> xs_noise = xs_noise.unsqueeze(0) #[batch, time, channels] >>> fs = 16000 >>> xs = xs_speech + 0.05 * xs_noise >>> >>> stft = STFT(sample_rate=fs) >>> cov = Covariance() >>> gccphat = GccPhat() >>> Xs = stft(xs) >>> XXs = cov(Xs) >>> tdoas = gccphat(XXs)
- forward(XXs)[source]
Perform generalized cross-correlation with phase transform localization by using the utility function _gcc_phat and by extracting the delays (in samples) before performing a quadratic interpolation to improve the accuracy. The result has the format: (batch, time_steps, n_mics + n_pairs).
The order on the last dimension corresponds to the triu_indices for a square matrix. For instance, if we have 4 channels, we get the following order: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, delays[…, 0] corresponds to channels (0, 0) and delays[…, 1] corresponds to channels (0, 1).
Arguments:
- XXstensor
The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
- class speechbrain.processing.multi_mic.SrpPhat(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20)[source]
Bases:
Module
Steered-Response Power with Phase Transform Localization.
- Parameters
mics (tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the following format (n_mics, 3).
space (string) – If this parameter is set to ‘sphere’, the localization will be done in 3D by searching in a sphere of possible doas. If it set to ‘circle’, the search will be done in 2D by searching in a circle. By default, this parameter is set to ‘sphere’. Note: The ‘circle’ option isn’t implemented yet.
sample_rate (int) – The sample rate in Hertz of the signals to perform SRP-PHAT on. By default, this parameter is set to 16000 Hz.
speed_sound (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.
eps (float) – A small value to avoid errors like division by 0. The default value of this parameter is 1e-20.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio >>> from speechbrain.processing.features import STFT >>> from speechbrain.processing.multi_mic import Covariance >>> from speechbrain.processing.multi_mic import SrpPhat
>>> xs_speech = read_audio('tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac') >>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac') >>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels] >>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech >>> ns1 = 0.05 * xs_noise >>> xs1 = ss1 + ns1
>>> ss2 = xs_speech >>> ns2 = 0.20 * xs_noise >>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1,ss2), dim=0) >>> ns = torch.cat((ns1,ns2), dim=0) >>> xs = torch.cat((xs1,xs2), dim=0)
>>> mics = torch.zeros((4,3), dtype=torch.float) >>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00]) >>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00]) >>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00]) >>> mics[3,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> stft = STFT(sample_rate=fs) >>> cov = Covariance() >>> srpphat = SrpPhat(mics=mics)
>>> Xs = stft(xs) >>> XXs = cov(Xs) >>> doas = srpphat(XXs)
- forward(XXs)[source]
Perform SRP-PHAT localization on a signal by computing a steering vector and then by using the utility function _srp_phat to extract the doas. The result is a tensor containing the directions of arrival (xyz coordinates (in meters) in the direction of the sound source). The output tensor has the format (batch, time_steps, 3).
This localization method uses Global Coherence Field (GCF): https://www.researchgate.net/publication/221491705_Speaker_localization_based_on_oriented_global_coherence_field
- Parameters
XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
- class speechbrain.processing.multi_mic.Music(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20, n_sig=1)[source]
Bases:
Module
Multiple Signal Classification (MUSIC) localization.
- Parameters
mics (tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the following format (n_mics, 3).
space (string) – If this parameter is set to ‘sphere’, the localization will be done in 3D by searching in a sphere of possible doas. If it set to ‘circle’, the search will be done in 2D by searching in a circle. By default, this parameter is set to ‘sphere’. Note: The ‘circle’ option isn’t implemented yet.
sample_rate (int) – The sample rate in Hertz of the signals to perform SRP-PHAT on. By default, this parameter is set to 16000 Hz.
speed_sound (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.
eps (float) – A small value to avoid errors like division by 0. The default value of this parameter is 1e-20.
n_sig (int) – An estimation of the number of sound sources. The default value is set to one source.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio >>> from speechbrain.processing.features import STFT >>> from speechbrain.processing.multi_mic import Covariance >>> from speechbrain.processing.multi_mic import SrpPhat
>>> xs_speech = read_audio('tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac') >>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac') >>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels] >>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech >>> ns1 = 0.05 * xs_noise >>> xs1 = ss1 + ns1
>>> ss2 = xs_speech >>> ns2 = 0.20 * xs_noise >>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1,ss2), dim=0) >>> ns = torch.cat((ns1,ns2), dim=0) >>> xs = torch.cat((xs1,xs2), dim=0)
>>> mics = torch.zeros((4,3), dtype=torch.float) >>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00]) >>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00]) >>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00]) >>> mics[3,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> stft = STFT(sample_rate=fs) >>> cov = Covariance() >>> music = Music(mics=mics)
>>> Xs = stft(xs) >>> XXs = cov(Xs) >>> doas = music(XXs)
- forward(XXs)[source]
Perform MUSIC localization on a signal by computing a steering vector and then by using the utility function _music to extract the doas. The result is a tensor containing the directions of arrival (xyz coordinates (in meters) in the direction of the sound source). The output tensor has the format (batch, time_steps, 3).
- Parameters
XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
- speechbrain.processing.multi_mic.doas2taus(doas, mics, fs, c=343.0)[source]
This function converts directions of arrival (xyz coordinates expressed in meters) in time differences of arrival (expressed in samples). The result has the following format: (batch, time_steps, n_mics).
- Parameters
doas (tensor) – The directions of arrival expressed with cartesian coordinates (xyz) in meters. The tensor must have the following format: (batch, time_steps, 3).
mics (tensor) – The cartesian position (xyz) in meters of each microphone. The tensor must have the following format (n_mics, 3).
fs (int) – The sample rate in Hertz of the signals.
c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio >>> from speechbrain.processing.multi_mic import sphere, doas2taus
>>> xs = read_audio('tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac') >>> xs = xs.unsqueeze(0) # [batch, time, channels] >>> fs = 16000 >>> mics = torch.zeros((4,3), dtype=torch.float) >>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00]) >>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00]) >>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00]) >>> mics[3,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> doas = sphere() >>> taus = doas2taus(doas, mics, fs)
- speechbrain.processing.multi_mic.tdoas2taus(tdoas)[source]
This function selects the tdoas of each channel and put them in a tensor. The result has the following format: (batch, time_steps, n_mics).
Arguments:
- tdoastensor
The time difference of arrival (TDOA) (in samples) for each timestamp. The tensor has the format (batch, time_steps, n_mics + n_pairs).
Example
>>> import torch >>> from speechbrain.dataio.dataio import read_audio >>> from speechbrain.processing.features import STFT >>> from speechbrain.processing.multi_mic import Covariance >>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus >>> >>> xs_speech = read_audio( ... 'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac' ... ) >>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac') >>> xs = xs_speech + 0.05 * xs_noise >>> xs = xs.unsqueeze(0) >>> fs = 16000 >>> >>> stft = STFT(sample_rate=fs) >>> cov = Covariance() >>> gccphat = GccPhat() >>> >>> Xs = stft(xs) >>> XXs = cov(Xs) >>> tdoas = gccphat(XXs) >>> taus = tdoas2taus(tdoas)
- speechbrain.processing.multi_mic.steering(taus, n_fft)[source]
This function computes a steering vector by using the time differences of arrival for each channel (in samples) and the number of bins (n_fft). The result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
Arguments:
- taustensor
The time differences of arrival for each channel. The tensor must have the following format: (batch, time_steps, n_mics).
- n_fftint
The number of bins resulting of the STFT. It is assumed that the argument “onesided” was set to True for the STFT.
Example: ——–f >>> import torch >>> from speechbrain.dataio.dataio import read_audio >>> from speechbrain.processing.features import STFT >>> from speechbrain.processing.multi_mic import Covariance >>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus, steering >>> >>> xs_speech = read_audio( … ‘tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac’ … ) >>> xs_noise = read_audio(‘tests/samples/multi-mic/noise_diffuse.flac’) >>> xs = xs_speech + 0.05 * xs_noise >>> xs = xs.unsqueeze(0) # [batch, time, channels] >>> fs = 16000
>>> stft = STFT(sample_rate=fs) >>> cov = Covariance() >>> gccphat = GccPhat() >>> >>> Xs = stft(xs) >>> n_fft = Xs.shape[2] >>> XXs = cov(Xs) >>> tdoas = gccphat(XXs) >>> taus = tdoas2taus(tdoas) >>> As = steering(taus, n_fft)
- speechbrain.processing.multi_mic.sphere(levels_count=4)[source]
This function generates cartesian coordinates (xyz) for a set of points forming a 3D sphere. The coordinates are expressed in meters and can be used as doas. The result has the format: (n_points, 3).
- Parameters
levels_count (int) –
A number proportional to the number of points that the user wants to generate.
If levels_count = 1, then the sphere will have 42 points
If levels_count = 2, then the sphere will have 162 points
If levels_count = 3, then the sphere will have 642 points
If levels_count = 4, then the sphere will have 2562 points
If levels_count = 5, then the sphere will have 10242 points
…
By default, levels_count is set to 4.
Example
>>> import torch >>> from speechbrain.processing.multi_mic import sphere >>> doas = sphere()