speechbrain.processing.NMF module

Non-negative matrix factorization

Authors
  • Cem Subakan

Summary

Functions:

NMF_separate_spectra

This function separates the mixture signals, given NMF template matrices.

reconstruct_results

This function reconstructs the separated spectra into waveforms.

spectral_phase

Returns the phase of a complex spectrogram.

Reference

speechbrain.processing.NMF.spectral_phase(stft, power=2, log=False)

Returns the phase of a complex spectrogram.

Parameters

stft (torch.Tensor) – The complex spectrogram returned by the STFT function. As in the example, the last dimension has size 2 and represents the complex values.

Example

>>> BS, nfft, T = 10, 20, 300
>>> X_stft = torch.randn(BS, nfft//2 + 1, T, 2)
>>> phase_mix = spectral_phase(X_stft)
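
The phase of a spectrogram stored in this layout can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the last dimension holds the real part at index 0 and the imaginary part at index 1, and the name `phase_sketch` is hypothetical. The `power` and `log` arguments from the signature are not modelled.

```python
import math

import torch


def phase_sketch(stft):
    """Phase angle in radians of a spectrogram stored as [..., 2].

    Assumption: index 0 of the last dimension is the real part and
    index 1 is the imaginary part.
    """
    real, imag = stft[..., 0], stft[..., 1]
    return torch.atan2(imag, real)


# Two bins: 1 + 0j (phase 0) and 0 + 1j (phase pi / 2).
demo = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
demo_phase = phase_sketch(demo)

# Same layout as the example above: the trailing dimension is dropped.
BS, nfft, T = 10, 20, 300
batch_phase = phase_sketch(torch.randn(BS, nfft // 2 + 1, T, 2))
```

The result has the shape of the input without its trailing dimension, here [BS, nfft//2 + 1, T].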

speechbrain.processing.NMF.NMF_separate_spectra(Whats, Xmix)

This function separates the mixture signals, given NMF template matrices.

Parameters
  • Whats (list) – The list [W1, W2], where W1 and W2 are the NMF template matrices for source 1 and source 2, respectively. Each matrix has size [nfft//2 + 1, K], where nfft is the FFT size used for the STFT and K is the number of template vectors (columns) in that matrix.

  • Xmix (torch.Tensor) – The magnitude spectra of the mixtures, of size [BS, T, nfft//2 + 1], where BS is the batch size, T is the number of time steps in the spectra, and nfft is the FFT size.

Returns

  • X1hat – Separated spectrum for source 1, of size [BS, nfft//2 + 1, T], where BS is the batch size, nfft is the FFT size, and T is the number of time steps in the spectra.

  • X2hat – Separated spectrum for source 2. The size definitions are the same as for X1hat.

Example

>>> BS, nfft, T = 4, 20, 400
>>> K1, K2 = 10, 10
>>> W1hat = torch.randn(nfft//2 + 1, K1)
>>> W2hat = torch.randn(nfft//2 + 1, K2)
>>> Whats = [W1hat, W2hat]
>>> Xmix = torch.randn(BS, T, nfft//2 + 1)
>>> X1hat, X2hat = NMF_separate_spectra(Whats, Xmix)
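
To make the idea of separating with fixed templates concrete, here is a minimal sketch. It is not the library's implementation, and the page does not say which update rule the library uses. The sketch assumes the KL-divergence multiplicative update, one common choice for NMF, and works on a single non-negative matrix of size [nfft//2 + 1, N] instead of the batched layout described above. The name `separate_sketch` and its arguments are hypothetical.

```python
import torch


def separate_sketch(W1, W2, X, n_iter=200, eps=1e-10):
    """Estimate activations for fixed templates, then rebuild each source.

    W1: [F, K1], W2: [F, K2], X: [F, N]; all entries non-negative.
    Returns (W1 @ H1, W2 @ H2), each of size [F, N].
    """
    W = torch.cat([W1, W2], dim=1)                     # [F, K1 + K2]
    K1 = W1.shape[1]
    # Constant positive initialization keeps the sketch deterministic.
    H = torch.full((W.shape[1], X.shape[1]), 0.5)
    col_sums = W.sum(dim=0).unsqueeze(1) + eps         # [K1 + K2, 1]
    for _ in range(n_iter):
        ratio = X / (W @ H + eps)                      # [F, N]
        # Multiplicative update for H under the KL divergence; W stays fixed.
        H = H * (W.t() @ ratio) / col_sums
    return W1 @ H[:K1], W2 @ H[K1:]


# Synthetic mixture built from known non-negative templates.
torch.manual_seed(0)
F, K1, K2, N = 11, 3, 3, 40
W1 = torch.rand(F, K1)
W2 = torch.rand(F, K2)
X = W1 @ torch.rand(K1, N) + W2 @ torch.rand(K2, N)
S1, S2 = separate_sketch(W1, W2, X)
```

The sketch draws its inputs with torch.rand so that every entry is non-negative, which this update rule requires.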

speechbrain.processing.NMF.reconstruct_results(X1hat, X2hat, X_stft, sample_rate, win_length, hop_length)

This function reconstructs the separated spectra into waveforms.

Parameters
  • X1hat (torch.Tensor) – The separated spectrum for source 1, of size [BS, nfft//2 + 1, T], where BS is the batch size, nfft is the FFT size, and T is the number of time steps in the spectra.

  • X2hat (torch.Tensor) – The separated spectrum for source 2, of size [BS, nfft//2 + 1, T]. The size definitions are the same as for X1hat.

  • X_stft (torch.Tensor) – The STFT of the mixtures, of size [BS, nfft//2 + 1, T, 2], where BS is the batch size, nfft is the FFT size, and T is the number of time steps in the spectra. The last dimension represents the complex numbers.

  • sample_rate (int) – The sampling rate (in Hz) at which the results are saved.

  • win_length (int) – The length of the STFT windows (in ms).

  • hop_length (int) – The shift between consecutive STFT windows (in ms).

Returns

  • x1hats (list) – List of waveforms for source 1.

  • x2hats (list) – List of waveforms for source 2.

Example

>>> BS, nfft, T = 10, 512, 16000
>>> sample_rate, win_length, hop_length = 16000, 25, 10
>>> X1hat = torch.randn(BS, nfft//2 + 1, T)
>>> X2hat = torch.randn(BS, nfft//2 + 1, T)
>>> X_stft = torch.randn(BS, nfft//2 + 1, T, 2)
>>> x1hats, x2hats = reconstruct_results(X1hat, X2hat, X_stft, sample_rate, win_length, hop_length)
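
One way to turn separated magnitude spectra back into waveforms is sketched below. It is an illustration under stated assumptions, not the library's implementation: it builds ratio masks from the two estimates, applies them to the complex mixture STFT (so the mixture phase is reused), and inverts with torch.istft. The Hann window, the millisecond-to-sample rounding, and the name `reconstruct_sketch` are assumptions. The sketch returns batched tensors, not the lists documented above.

```python
import torch


def reconstruct_sketch(X1hat, X2hat, X_stft, sample_rate,
                       win_length_ms, hop_length_ms, eps=1e-10):
    """Mask the mixture STFT with each source estimate and invert it.

    X1hat, X2hat: [BS, F, T] non-negative spectra.
    X_stft: [BS, F, T, 2] mixture STFT with (real, imag) in the last dimension.
    Returns two waveform tensors of size [BS, num_samples].
    """
    # Convert window and hop lengths from milliseconds to samples.
    win = int(round(sample_rate * win_length_ms / 1000))
    hop = int(round(sample_rate * hop_length_ms / 1000))
    n_fft = 2 * (X_stft.shape[1] - 1)
    window = torch.hann_window(win)        # assumed window type

    mix = torch.view_as_complex(X_stft.contiguous())   # [BS, F, T]
    total = X1hat + X2hat + eps
    waves = []
    for Xhat in (X1hat, X2hat):
        masked = (Xhat / total) * mix      # ratio mask, mixture phase kept
        waves.append(torch.istft(masked, n_fft=n_fft, hop_length=hop,
                                 win_length=win, window=window))
    return waves[0], waves[1]


# Round trip: give all the energy to source 1 and recover the input waveform.
torch.manual_seed(0)
sample_rate, win_length, hop_length = 16000, 25, 10
wave = torch.randn(2, 16000)
mix_c = torch.stft(wave, n_fft=512, hop_length=160, win_length=400,
                   window=torch.hann_window(400), return_complex=True)
X_stft = torch.view_as_real(mix_c)
x1, x2 = reconstruct_sketch(mix_c.abs(), torch.zeros_like(mix_c.abs()),
                            X_stft, sample_rate, win_length, hop_length)
```

With a 16 kHz sampling rate, the 25 ms window and 10 ms hop from the example above correspond to 400 and 160 samples.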