Standard SDIF Types

SDIF Types Version: Ircam_1.5

SDIF Types CVS Revision: $Id: types-doc.html,v 1.1 2008/09/10 16:14:49 diemo Exp $



Supporting Types

Matrix 1NVT NameValueTable

Status:
standard
Description:
Name-value table: a list of keys (names) and their associated values. See the 1NVT frame. The format is a list of entries of the form name\tvalue\n. A name must contain only alphanumeric characters; a value must not contain \n or the zero byte.
Rows:
Characters
Columns:
  1. NVTText
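
As a minimal sketch of the name\tvalue\n layout described above, the following Python helpers serialise and parse NVT text. The function names nvt_encode and nvt_decode are hypothetical and not part of any SDIF library, and the name check applies the "alphanumeric characters" rule literally using Python's own notion of alphanumeric.

```python
def nvt_encode(table):
    """Serialise a dict into 1NVT text: one name<TAB>value<NEWLINE> entry per key."""
    entries = []
    for name, value in table.items():
        value = str(value)
        if not name.isalnum():
            raise ValueError(f"name must be alphanumeric: {name!r}")
        if "\n" in value or "\0" in value:
            raise ValueError(f"value must not contain newline or zero byte: {value!r}")
        entries.append(f"{name}\t{value}\n")
    return "".join(entries)


def nvt_decode(text):
    """Parse 1NVT text back into a dict; all values are kept as strings."""
    table = {}
    for line in text.split("\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        name, _, value = line.partition("\t")
        table[name] = value
    return table
```

Numerical values are written with str() here, so they come back as strings after decoding; converting them is left to the reader of the table.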


Matrix 1TYP TypeDefinitions

Status:
standard
Description:
Description type definition. See the 1TYP frame.
Rows:
Characters
Columns:
  1. TYPText


Matrix 1IDS StreamInfo

Status:
standard
Description:
Stream routing and synthesizer patch information. See the 1IDS frame.
Rows:
Characters
Columns:
  1. IDSText


Frame 1NVT NameValueTable

Status:
standard
Description:
The 1NVT name-value table frame stores ASCII or numerical header information (value) for arbitrary keys (name). This is the place to store the name and version of the program that wrote the file, the date, the host, the user, the command-line arguments, and other parameters.
Matrices:


Frame 1TYP TypeDefinitions

Status:
standard
Description:
Definition of SDIF description types (frame and matrix types) local to the file. The definitions of any non-standard types used on generation of the SDIF file are copied to this frame, so that the file will work at other sites that don't know about the non-standard types.
Matrices:


Frame 1IDS StreamInfo

Status:
standard
Description:
Stream routing and synthesizer patch information for the Chant library, used e.g. in Diphone.
Matrices:



Types used in different contexts

Matrix 1GAI Gain

Description:
A linear gain factor for the data to which it is attached, such as an autoregressive filter (e.g. 1ARA).
Rows: number=1
Columns:
  1. Gain linear gain value


Frame 1GAI Gain

Description:
See matrix Gain
Matrices:


Matrix IWIN WindowInfo

Description:
Window information matrix. Windows are used e.g. in spectral estimation: a portion of the signal is selected by multiplying it by the window function. The WindowIdentifier is a (usually positive) number which identifies the type of window as specified in window-enumeration.text.
The negative number -1 means a not-yet-specified window, which must then appear explicitly as sampled values in a following 1WIN matrix. In that case the WindowSize should be the same as the number of rows in the following 1WIN.
Rows: number=1
Columns:
  1. WindowIdentifier A number which identifies the type of window.
  2. WindowSize in samples


Matrix 1WIN Window

Description:
Windows are used e.g. in spectral estimation: a portion of the signal is selected by multiplying it by the window function, the samples of which appear in the matrix. It has to be preceded by an IWIN matrix.
Rows:
As many as samples describing the window function.
Columns:
  1. Samples Samples
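
The pairing of IWIN and 1WIN for an explicitly sampled window can be sketched as follows. This is only an illustration: matrices are represented as plain lists of rows, the helper names are hypothetical, and the Hann window is an arbitrary example shape that the type definitions above do not prescribe.

```python
import math


def hann(size):
    """Example window shape (an assumption; any sampled window would do)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]


def explicit_window_matrices(samples):
    """Return (iwin, win) as lists of rows for an explicitly sampled window.

    iwin has one row: WindowIdentifier = -1 (not-yet-specified window) and
    WindowSize = number of rows of the 1WIN matrix that follows it.
    win has one column (Samples) and one row per window sample.
    """
    iwin = [[-1.0, float(len(samples))]]
    win = [[float(s)] for s in samples]
    return iwin, win
```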


Frame 1WIN Window

Description:
See matrix IWIN
Matrices:


Matrix 1CHA Channels

Description:
Gain factors to distribute signals among several channels with corresponding gains (simple panning).
Rows:
as many as signals to pan
Columns:
  1. Channel1
  2. Channel2 and so on optionally...
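
One possible reading of this matrix, sketched in Python: each row holds the gains of one signal, each column corresponds to one output channel, and each channel is the gain-weighted sum of all signals. The function name pan and the list-based data layout are assumptions made for illustration.

```python
def pan(signals, gains):
    """Distribute signals among channels using a 1CHA-style gain matrix.

    signals: list of equal-length sample lists, one per signal to pan
    gains:   one row per signal, one column per output channel
    Returns one sample list per output channel.
    """
    if len(signals) != len(gains):
        raise ValueError("need exactly one gain row per signal")
    n_channels = len(gains[0])
    n_samples = len(signals[0])
    out = [[0.0] * n_samples for _ in range(n_channels)]
    for signal, row in zip(signals, gains):
        for channel, gain in enumerate(row):
            for n, sample in enumerate(signal):
                out[channel][n] += gain * sample
    return out
```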



Fundamental Frequency Estimation

Matrix 1FQ0 FundamentalFrequencyEstimate

Description:
The fundamental frequency f0 of a signal is the inverse of the most likely (according to the estimation method used) periodicity at which the signal locally (on a short window) repeats nearly identically. For synthesis, the synthesized signal is supposed to locally have the given periodicity (fundamental frequency) at the instant of the frame time. Even though f0 is often very close to the pitch at which the sound is perceived, it is not a psychoacoustic notion but a statistical one. The psychoacoustic notion should go into a separate matrix type such as 1PCH. Columns 2 to 4 (Confidence, Score, RealAmplitude) are not yet well defined. They are used by the f0 program at Ircam with the 'long output, -L' option.
Rows:
Columns:
  1. Frequency
  2. Confidence
  3. Score
  4. RealAmplitude


Frame 1FQ0 FundamentalFrequencyEstimate

Description:
Fundamental Frequency f0 (see matrix 1FQ0)
Matrices:



Sinusoidal Modeling

Matrix 1PIC PickedPeaks

Status:
standard
Description:
These are the peaks in a short-time frequency estimate such as the Short Time Fourier Transform. Frequency, Amplitude and Phase should be that of a constant sinusoid producing such a peak in the frequency estimate. Phase (and Frequency and Amplitude in case of nearly linearly time varying parameters) is relative to the center of the window.
Rows:
as many as peaks estimated in the frequency estimate
Columns:
  1. Frequency
  2. Amplitude
  3. Phase
  4. Confidence


Matrix 1TRC SinusoidalTracks

Status:
standard
Description:
Sinusoidal tracks are sinusoids with time-varying Frequency, Amplitude and Phase. Obviously, giving these values only at discrete times (the frame times) is imprecise unless an interpolation formula is provided. This is generally not done yet, but it can easily be added in the name-value table 1NVT. One track is described by the rows with the same index in successive adjacent frames; this is what the indices are for, row numbers being arbitrary. When such an index is absent from a frame, it means that the track vanishes at the instant of that frame. What happens after the last or before the first occurrence of that index remains to be specified. The solution adopted at Ircam is to make sure that the first and the last occurrence always have zero amplitude.
Rows:
as many as sinusoidal tracks.
Columns:
  1. Index
  2. Frequency
  3. Amplitude
  4. Phase
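
The role of the Index column can be sketched as follows: rows from successive frames are grouped by index, regardless of their row position. The data layout (a list of (time, rows) pairs) and the function names are assumptions for illustration. For brevity the sketch does not split a track when an index disappears and later reappears.

```python
from collections import defaultdict


def collect_tracks(frames):
    """Group 1TRC rows by their Index column.

    frames: list of (time, rows); each row is (index, frequency, amplitude, phase)
    Returns {index: [(time, frequency, amplitude, phase), ...]} in frame order.
    """
    tracks = defaultdict(list)
    for time, rows in frames:
        for index, frequency, amplitude, phase in rows:
            tracks[int(index)].append((time, frequency, amplitude, phase))
    return dict(tracks)


def has_zero_endpoints(track):
    """Check the Ircam convention: first and last occurrence have zero amplitude."""
    return track[0][2] == 0.0 and track[-1][2] == 0.0
```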


Matrix 1HRM HarmonicPartials

Status:
standard
Description:
Harmonic partials are sinusoids with time-varying Frequency, Amplitude and Phase, such that each one has a frequency which is close, or at least related, to the integer ith multiple (harmonic) of a common fundamental frequency (see 1FQ0). This integer number i, which is usually called the harmonic number, is named 'Index' here and appears in the first column. In a given 1HRM matrix, an index value should not appear more than once. Some indices may be absent in certain matrices, meaning that the corresponding harmonic partial vanishes or has not been detected. See matrix 1TRC for other properties.
Rows:
A row describes one harmonic partial
Columns:
  1. Index harmonic number of the partial
  2. Frequency [Hz] partial frequency
  3. Amplitude partial amplitude
  4. Phase partial phase, between -pi and pi
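
The constraints stated for 1HRM can be turned into a small consistency check. The sketch below is illustrative only: the function name is hypothetical, and the tolerance (a fraction of f0) used to decide whether a partial is "close" to i times f0 is an arbitrary assumption, since the description gives no numerical bound.

```python
import math


def check_harmonic_partials(rows, f0, tolerance=0.05):
    """Return a list of problems found in one 1HRM matrix.

    rows: (index, frequency_hz, amplitude, phase) tuples
    f0:   fundamental frequency in Hz (e.g. from a 1FQ0 matrix)
    tolerance: allowed deviation from index * f0, as a fraction of f0 (assumed)
    """
    problems = []
    seen = set()
    for index, frequency, amplitude, phase in rows:
        i = int(index)
        if i in seen:
            problems.append(f"index {i} appears more than once")
        seen.add(i)
        if abs(frequency - i * f0) > tolerance * f0:
            problems.append(f"partial {i} is far from {i} * f0")
        if not -math.pi <= phase <= math.pi:
            problems.append(f"partial {i} has a phase outside [-pi, pi]")
    return problems
```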


Frame 1PIC PickedPeaks

Status:
standard
Description:
Matrices:


Frame 1TRC SinusoidalTracks

Status:
standard
Description:
Matrices:


Frame 1HRM HarmonicPartials

Status:
standard
Description:
Matrices:


Matrix 1HRE HarmonicityEstimate

Description:
Rows:
Columns:
  1. MeanDeltaFrequency
  2. Harmonicity
  3. WeightedHarmonicity


Frame 1HRE HarmonicityEstimate

Status:
experimental
Description:
Matrices:



Spectral Envelopes, Transfer Functions and Filters

Matrix IENV SpectralEnvelopeInfo

Description:
Spectral envelope information matrix; defines the interpretation of the following 1ENV matrix. The linear+logarithmic scaling is linear below a certain frequency called the break frequency, and logarithmic above it. The exact formula is described in: Diemo Schwarz, Spectral Envelopes in Sound Analysis and Synthesis, Master's thesis, 1998.
Rows: number=1
Columns:
  1. HighestBinFrequency [Hz] ?? frequency of the highest bin of the 1ENV matrix ??
  2. ScaleType 0 for linear, 1 for linear+logarithmic scaling
  3. BreakFrequency [Hz] break frequency when linear+logarithmic scaling


Matrix 1ENV SpectralEnvelope

Description:
Spectral envelope or magnitude transfer function in sampled representation. A spectral envelope is the envelope of the magnitude of a short-time frequency estimate such as the Short Time Fourier Transform. See for instance: Diemo Schwarz, Spectral Envelopes in Sound Analysis and Synthesis, Master's thesis, 1998.
Rows:
A row corresponds to one bin of the spectral envelope
Columns:
  1. Env envelope bin


Frame 1ENV SpectralEnvelope

Status:
proposed
Description:
The reason for the optional 1GAI matrix is that in some cases it is useful to force the desired gain for the synthesised signal.
Matrices:


Matrix ITFC TransferFunctionCoefficientsInfo

Description:
Transfer function coefficients information matrix, defines interpretation of following transfer function coefficients matrix such as 1CEC, 1ARA, 1ARR, 1ARK.
Rows: number=1
Columns:
  1. SamplingRate [Hz] SamplingRate corresponding to the transfer function coefficients.
  2. Order Order of the estimation or number of coefficients of the corresponding filter.


Matrix 1CEC CepstralCoefs

Description:
Cepstral coefficients as in text books
Rows:
As many as Cepstral Coefficients wished.
Columns:
  1. CepstralCoefficients


Frame 1CEC CepstralCoefs

Description:
Matrices:


Matrix 1ARA ARACoefs

Description:
Autoregressive coefficients as in text books
Rows:
The rows contain p + 1 autoregressive coefficients a0 to ap, where p is the order of the autoregressive filter.
Columns:
  1. AutoRegressiveCoefficients


Matrix 1ARK ARKCoefs

Description:
Rows:
The rows contain p reflection coefficients k1 to kp, where p is the order of the lattice filter.
Columns:
  1. ReflectionCoefficients


Matrix 1ARR ARRCoefs

Description:
Rows:
The rows contain p + 1 autocorrelation coefficients r0 to rp.
Columns:
  1. AutoCorrelationCoefficients
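
The row counts of 1ARR, 1ARK and 1ARA (p + 1, p and p + 1) match the textbook Levinson-Durbin recursion, which derives the latter two from the autocorrelation. The sketch below is one common formulation, not necessarily the convention used by any Ircam program: it assumes a0 = 1 and a prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p, and other texts use the opposite sign for the reflection coefficients.

```python
def levinson_durbin(r):
    """Textbook Levinson-Durbin recursion (sign convention is an assumption).

    r: autocorrelation coefficients r0..rp (as in a 1ARR matrix)
    Returns (a, k, err): a0..ap with a0 = 1 (as in 1ARA), reflection
    coefficients k1..kp (as in 1ARK), and the final prediction error.
    """
    p = len(r) - 1
    if r[0] <= 0.0:
        raise ValueError("r0 must be positive")
    a = [1.0] + [0.0] * p
    k = []
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        km = -acc / err
        k.append(km)
        updated = a[:]
        for j in range(1, m):
            updated[j] = a[j] + km * a[m - j]
        updated[m] = km
        a = updated
        err *= 1.0 - km * km
    return a, k, err
```

For the autocorrelation of a first-order process, r = [1, 0.5, 0.25], this yields a1 = -0.5 and a second coefficient of zero, as expected for a model that needs only one pole.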


Frame 1ARA ARACoefs

Description:
Matrices:


Frame 1ARK ARKCoefs

Description:
Matrices:


Frame 1ARR ARRCoefs

Description:
Matrices:



Resonances

Matrix 1FOF Formants

Description:
Formant Waveforms as described in Chant.
Could be changed sometime to use 2RES plus a new 2FOF which would contain the columns Tex, DebAtt and Atten which are what FOF adds to RES.
Rows:
Columns:
  1. Frequency
  2. Amplitude
  3. BandWidth
  4. Tex
  5. DebAtt
  6. Atten
  7. Phase


Matrix 1RES Filters

Description:
Resonances/Exponentially Decaying Sinusoids. Resonances data can describe the characteristics of a resonant system like a bank of second order section-filters, or can specify parameters for a model of sinusoids with fixed frequencies and exponentially decaying amplitudes. (If you put an impulse into such a group of filter banks, the output should be a sum of sinusoids with fixed frequencies and exponentially decaying amplitudes, so these two situations are in a certain sense the same.) The decay curve of a resonance should be the same as that of a two-pole filter with bandwidth equal to decay rate divided by pi. This formula gives the amplitude of each sinusoid over time:
        amp(t) = initial_amp * e ^ (- decay_rate * t)
     
The phase of a resonance specifies the initial phase of each decaying sinusoid. Ircam's programs still use the previous definition, i.e. columns 3 to 5 being:
3. BandWidth
4. Saliance
5. Normalisation of amplitude

Therefore the following columns should be those of 2RES:

Rows:
resonances
Columns:
  1. Frequency
  2. Amplitude
  3. DecayRate
  4. Phase
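
The decay formula and the bandwidth relation given in the description can be sketched numerically as follows. The function names are hypothetical. The sketch assumes that DecayRate is expressed in 1/seconds, that Frequency is in Hz, and that each resonance is rendered with sin(); none of these choices is fixed by the text above.

```python
import math


def resonance_amplitude(initial_amp, decay_rate, t):
    """amp(t) = initial_amp * e ^ (- decay_rate * t), with t in seconds."""
    return initial_amp * math.exp(-decay_rate * t)


def bandwidth_from_decay_rate(decay_rate):
    """Bandwidth of the equivalent two-pole filter: decay rate divided by pi."""
    return decay_rate / math.pi


def synthesize_resonances(rows, sample_rate, n_samples):
    """Sum of exponentially decaying sinusoids.

    rows: (frequency, amplitude, decay_rate, phase) tuples, one per resonance
    """
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        out.append(sum(
            resonance_amplitude(amp, decay, t) * math.sin(2 * math.pi * freq * t + phase)
            for freq, amp, decay, phase in rows
        ))
    return out
```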


Matrix 1DIS NoiseDistribution

Description:
Noise distribution matrix; defines a white noise random signal. The Distribution column is an identifier (0 is uniform, 1 Gaussian...).
Rows:
Columns:
  1. Distribution
  2. Amplitude should be variance or standard deviation


Frame 1NOI

Description:
Matrices:


Frame 1FOB

Description:
Matrices:


Frame 1REB

Description:
Matrices:



Fourier Transform

Matrix ISTF FourierTransformInfo

Description:
See 1STF for details. WindowDuration is the duration of the window, in seconds.
Rows:
Columns:
  1. PeriodOfTheDFT i.e., Sampling Rate in Hertz
  2. WindowDuration in seconds.
  3. FFTSize


Matrix 1STF FourierTransform

Description:
1STF frames represent the data that come out of a discrete short-term time-domain to frequency-domain transform such as an FFT. Here is a precise mathematical definition of this frame type:
Let s(i) be a discrete signal with sampling rate SR Hertz
Let w(m) be a window defined with the support [0, M-1], i.e., w(m)=0 for m<0 and m>=M. M is called the Window Size in Samples. The corresponding Window Duration in seconds is M/SR.
Let N be the size of the transform
We define the input to the transform, x(n), as follows. Note that the windowed signal is 'put' at the beginning of the vector x(n).
Let x(n) =   s(i+n) * w(n)  for  0 <= n <= M-1
    x(n) =   0              for  M <= n <= N-1
     
(This is slightly redundant, since we define w(m)=0 when m>=M.) The 1STF matrix data is the Discrete Fourier Transform (DFT) of size N, i.e. the X(k) as follows. The DFT is a length N vector X, with these elements:
              N-1
       X(k) = sum  x(n) * exp(-j * 2 * pi * k * n/N)
              n=0

       0 <= k <= N-1
     
The time tag in a 1STF frame is the time of the center of the window, i.e., (i + M/2)/SR, not the beginning.
Notes:
This definition corresponds to the output of Matlab's (and UDI's) FFT function. The real and imaginary parts come directly from this formula: therefore, if you compute a phase as atan2(imaginary, real), it is the phase of the corresponding COSINUSOID (and not sinusoid, as we are used to in additive synthesis) at time (i)/SR. Note that the windowed signal is 'put' at the beginning of the vector x(n) (then zero padding follows), and this is crucial for the phase definition. Because of aliasing and foldover above the Nyquist frequency (and below the negative Nyquist frequency), the output of the DFT can be thought of as a periodic function of frequency over the range -infinity to infinity. The period of this function is the range from the negative Nyquist frequency to the positive Nyquist frequency, in other words, the sampling rate of the input signal.
Rows:
Bins
Columns:
  1. Real
  2. Imaginary
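
The definition above can be followed literally in a few lines of Python. This is a direct, unoptimised sketch (an O(N^2) DFT rather than an FFT). The function names are hypothetical, and the symbols s, i, w, N and SR are those of the definition.

```python
import cmath


def stf_frame(s, i, w, N, SR):
    """Compute one 1STF frame for the window starting at sample i.

    s: signal samples, w: window samples (length M), N: transform size,
    SR: sampling rate in Hz. Returns (time_tag, X) with X(0)..X(N-1).
    """
    M = len(w)
    if N < M:
        raise ValueError("transform size N must be at least the window size M")
    # Windowed signal placed at the beginning of x, followed by zero padding.
    x = [s[i + n] * w[n] for n in range(M)] + [0.0] * (N - M)
    X = [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]
    time_tag = (i + M / 2) / SR  # center of the window, not its beginning
    return time_tag, X


def to_rows(X):
    """Lay out the bins as matrix rows with columns Real and Imaginary."""
    return [(value.real, value.imag) for value in X]
```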


Frame ISTF FourierTransformInfo

Description:
Matrices:


Frame 1STF FourierTransform

Description:
Matrices:



Energy

Matrix INRG ScaleAndFactor

Description:
Energy information matrix; defines the interpretation of the following 1NRG matrix.
Rows: number=1
Columns:
  1. Scale
  2. NormalisationFactor


Matrix 1NRG Energy

Description:
Short time energy of the signal, i.e.
                    m+N-1
       E(k) = 1/N *  sum  x(n) * x(n)
                    n=m
     
The time of the frame should be the center of the interval [m, m+N-1].
If a window is applied prior to calculation, it should be taken into account, and an IWIN and an optional 1WIN matrix can be added.
Rows: number=1
Columns:
  1. Energy
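
A minimal sketch of the energy formula and of the frame time, assuming no window is applied. The function names are hypothetical; m and N are the symbols used in the formula above, and the sampling rate is needed only to express the center of the interval in seconds.

```python
def short_time_energy(x, m, N):
    """Mean of x(n) * x(n) over the interval [m, m+N-1]."""
    if m < 0 or m + N > len(x):
        raise ValueError("interval [m, m+N-1] is outside the signal")
    return sum(value * value for value in x[m:m + N]) / N


def energy_frame_time(m, N, sample_rate):
    """Center of the interval [m, m+N-1], in seconds."""
    return (m + (N - 1) / 2) / sample_rate
```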


Frame 1NRG

Description:
Matrices:


Matrix 1BND Bands

Description:
Rows:
Columns:
  1. LowerFrequencyLimit
  2. UpperFrequencyLimit


Frame 1BND Bands

Description:
Matrices:



Time-Domain Samples

Matrix ITDS TimeDomainSamplesInfo

Description:
Time-domain samples information matrix; defines the interpretation of the following 1TDS matrix.
Rows: number=1
Columns:
  1. SamplingRate


Matrix 1TDS TimeDomainSamples

Description:
Restrict this type to linearly quantized samples with no compression. Unlike most other SDIF frame types, a frame of 1TDS data represents an interval of time (equal to the number of rows in the 1TDS matrix divided by the sampling rate) rather than an instant of time. The time tag of a 1TDS frame represents the beginning of this interval. Most SDIF streams containing 1TDS data will consist of a single large frame at time zero with all of the samples for the stream in a single matrix. The same data could be represented equivalently in a series of shorter frames. There is also the possibility of "gaps" in the time axis.
Rows:
Sample frames
Columns:
  1. Amplitudes in each channel. Linear. All but the first are optional.
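
The equivalence between one large frame and a series of shorter frames can be sketched as follows. The function names and the list-based layout (one row per sample frame, one column per channel) are assumptions for illustration.

```python
def frame_duration(rows, sample_rate):
    """Interval covered by a 1TDS matrix: number of rows / sampling rate."""
    return len(rows) / sample_rate


def split_into_tds_frames(samples, sample_rate, rows_per_frame):
    """Split one block of sample frames into shorter (time_tag, rows) frames.

    The time tag of each frame is the beginning of the interval it covers.
    """
    frames = []
    for start in range(0, len(samples), rows_per_frame):
        rows = samples[start:start + rows_per_frame]
        frames.append((start / sample_rate, rows))
    return frames
```

Concatenating the rows of the shorter frames in time order gives back the original matrix, provided there are no gaps in the time axis.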



Time markers

Matrix 1PEM PeriodMarker

Description:
1PEM matrix type stands for PeriodMarker. Period markers are used for pitch-synchronous algorithms such as PSOLA synthesis or pitch-synchronous analysis. The most general definition of period markers is "the distance between two markers is equal to the local fundamental period". Instead of this definition, which only gives a relative positioning of the period markers, an exact positioning is often required. This absolute positioning of the period markers depends on the analysis method used. This is why it is essential to refer to the analysis method used in the matrix. The matrix can contain other columns for the parameters of the method.

Method: 1
Definition: phase +Pi/2 of the 1st partial SINUSOID which represents this partial
Note: To be coherent with additive partials, a partial is represented by a sinusoid. The interesting point is at phase +Pi/2 since it is closer to the maximum of energy needed for PSOLA.
Method: 2
Definition: group delay
Note: In brief, an amplitude-weighted mean group delay provides a delay which, added to the beginning time of the analysis frame, gives the time of the mark
Method: 3
Definition: Glottal closure
Note: A method developed for speech
Method: 4
Definition: Maximum of energy synchronised with F0
Note: Needs a precise definition
Method: 5 ...

Rows:
Columns:
  1. Identifier
  2. Parameter1
  3. Parameter2
  4. Parameter3
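
The general definition quoted above ("the distance between two markers is equal to the local fundamental period") implies that a local fundamental frequency can be read off adjacent marker times. A minimal sketch, with a hypothetical function name and marker times given in seconds:

```python
def local_f0_from_markers(marker_times):
    """Return (time, f0) pairs, one per pair of adjacent period markers.

    The local period is the distance between two successive markers,
    and the local fundamental frequency is its inverse.
    """
    estimates = []
    for t0, t1 in zip(marker_times, marker_times[1:]):
        period = t1 - t0
        if period <= 0:
            raise ValueError("marker times must be strictly increasing")
        estimates.append((t0, 1.0 / period))
    return estimates
```

This uses only the relative positions of the markers. The absolute positioning, which depends on the analysis method, is not modelled here.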


Matrix ITMR TransientMarkerRepresentation

Description:
Rows:
Columns:
  1. Index
  2. Frequency
  3. Amplitude
  4. Phase


Matrix ITMI TransientMarkerIdentifier

Description:
Matrix identifier for a transient in the signal, such as occurs at the beginning of percussive sounds, which needs special care.
Rows:
Columns:
  1. Index Identifier of the method used to detect and estimate this marker



Voiced-Unvoiced Decision

Matrix 1VUN VoicedUnvoicedNorm

Description:
1VUN matrix type stands for Voiced/Unvoiced Normalized. It represents the voicing coefficient (harmonicity or periodicity coefficient in the general case) at a given time. The "voicing" coefficient is a measure of the part of the signal which is produced by the vocal folds (which is harmonic or periodic in the general case) at a given time. Range: [0,1]. 0 means purely unvoiced, 1 means purely voiced.
Rows:
Columns:
  1. VoicingCoefficient Normalized voicing coefficient, range: [0,1]


Matrix 1VUF VoicedUnvoicedFreq

Description:
1VUF matrix type stands for Voiced/Unvoiced Frequency. It is the cutting frequency below which the signal is considered as voiced (harmonic or periodic in the general case) and above which it is considered as unvoiced (non-harmonic, non-periodic or noisy in the general case). Range: [0, 1/2 Nyquist Frequency].
Rows:
Columns:
  1. CuttingFrequency [Hz] Voiced/Unvoiced cutting frequency, range: [0, 1/2 Nyquist Frequency]


Frame 1MRK Marker

Status:
proposed
Description:
1MRK frame type stands for Marker. Usually, analysis methods estimate parameters at a given time (examples: sinusoidal analysis, spectral analysis, ...). By contrast, the result of some analysis methods can be a series of times (examples: detection of the times where abrupt changes occur, detection of the times of glottal closures, ...). The result of such an analysis, which is essentially a series of times, is stored as a series of frames. The time of each of these frames is the result of the analysis, and the matrix inside each frame describes what this frame refers to, or in other words which analysis method has decided to point to this time (example: this frame time has been pointed to by a glottal closure detection algorithm). In some cases, it is also important to add information related to the results (example: coded waveform of a transient).
Matrices:


Frame 1VUV VoicedUnvoiced

Status:
proposed
Description:
1VUV frame type stands for Voiced/Unvoiced. This frame contains information about the voicing (harmonicity or periodicity in the general case) property at the current time.
Matrices:



jMax Physical Model Parameters

Matrix EMPM Tableau

Description:
Rows:
Columns:
  1. Value
  2. Index


Matrix EMJR EndRecording

Description:
Rows:
Columns:
  1. Record


Frame EFPM PhysicalModelParameters

Status:
private
Description:
Matrices:




About This Document
This document types-doc.html generated Tue Aug 22 20:33:24 2000 by xmltohtml.pl from sdiftypes.xml
Generation: schwarz, Tue Aug 22 20:33:24 2000, kethuk.ircam.fr /u/formes/schwarz/src/SDIF/types
Generator: xmltohtml.pl, Tue Aug 22 18:21:48 2000, $Id: types-doc.html,v 1.1 2008/09/10 16:14:49 diemo Exp $
Source file: sdiftypes.xml, Tue Aug 22 20:33:15 2000, $Id: types-doc.html,v 1.1 2008/09/10 16:14:49 diemo Exp $