*SDIF Types CVS Revision: *```
$Id: types-doc.html,v 1.1 2008/09/10 16:14:49 diemo Exp $
```

Supporting Types

*Status:*- standard

*Description:*- Name-value table: a list of keys (names) and their associated values. See the 1NVT frame. Each entry has the form `name\tvalue\n`. A name must contain only alphanumeric characters; a value must not contain `\n` or the zero byte.

*Rows:*- Characters

*Columns:*

**NVTText**
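The entry format described above can be illustrated with a minimal sketch. The function names `format_nvt` and `parse_nvt` are hypothetical and not part of any SDIF library, and reading "alphanumerical" as ASCII letters and digits only is an assumption.

```python
def format_nvt(table):
    """Serialize a dict of name -> value into NVT text ('name\\tvalue\\n' entries)."""
    parts = []
    for name, value in table.items():
        value = str(value)
        # Assumption: "alphanumerical" means ASCII letters and digits only.
        if not (name.isascii() and name.isalnum()):
            raise ValueError(f"name must be alphanumeric: {name!r}")
        if "\n" in value or "\0" in value:
            raise ValueError(f"value must not contain newline or zero byte: {value!r}")
        parts.append(f"{name}\t{value}\n")
    return "".join(parts)


def parse_nvt(text):
    """Parse NVT text back into a dict of name -> value (values stay strings)."""
    table = {}
    for entry in text.split("\n"):
        if not entry:
            continue  # skip the empty piece after the final newline
        name, sep, value = entry.partition("\t")
        if not sep:
            raise ValueError(f"missing tab separator in entry: {entry!r}")
        table[name] = value
    return table
```

Numerical values are written as text here, so a reader has to convert them back itself.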

*Status:*- standard

*Description:*- Description type definition. See the 1TYP frame.

*Rows:*- Characters

*Columns:*

**TYPText**

*Status:*- standard

*Description:*- Stream routing and synthesizer patch information. See the 1IDS frame.

*Rows:*- Characters

*Columns:*

**IDSText**

*Status:*- standard

*Description:*- The 1NVT name-value table frame stores ASCII or numerical header information (value) for arbitrary keys (name). This is the place to store the name and version of the program that wrote the file, the date, the host, the user, the command-line arguments, and other parameters.

*Matrices:*-
- 1NVT NameValueTable (*required*)

*Status:*- standard

*Description:*- Definition of SDIF description types (frame and matrix types) local to the file. The definitions of any non-standard types used on generation of the SDIF file are copied to this frame, so that the file will work at other sites that don't know about the non-standard types.

*Matrices:*-
- 1TYP TypeDefinitions (*required*)

*Status:*- standard

*Description:*- Stream routing and synthesizer patch information for the Chant library, used e.g. in Diphone.

*Matrices:*-
- 1IDS StreamInfo (*required*)

Types used in different contexts

*Description:*- A linear gain factor for the data to which it is attached, such as an autoregressive filter, e.g. 1ARA.

*Rows: number=1*

*Columns:*

**Gain**: linear gain value

*Description:*- See matrix Gain

*Matrices:*-
- 1GAI Gain (*required*): Gain

*Description:*-
Window information matrix.
Windows used e.g. in spectral estimation.
A portion of the signal is selected by
multiplying it by the window function.
The WindowIdentifier is a (usually positive)
number which identifies the type of window as specified in
`window-enumeration.text`

The value -1 denotes a window that is not yet specified, which must then be given explicitly as sampled values in a following 1WIN matrix. In that case, the WindowSize should be the same as the number of rows in the following 1WIN matrix.

*Rows: number=1*

*Columns:*

- **WindowIdentifier**: A number which identifies the type of window.
- **WindowSize**: in samples

*Description:*- Windows used e.g. in spectral estimation. A portion of the signal is selected by multiplying it by the window function, the samples of which appear in the matrix. It has to be preceded by an IWIN matrix.

*Rows:*- As many as samples describing the window function.

*Columns:*

**Samples**: Samples
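The relationship between the IWIN and 1WIN matrices can be checked with a small sketch. The function name and the data layout (an IWIN row as a pair, a 1WIN matrix as a list of one-column rows) are assumptions made for illustration.

```python
def check_window_pair(iwin_row, win_rows=None):
    """Check an IWIN row against an optional following 1WIN matrix.

    iwin_row: (WindowIdentifier, WindowSize)
    win_rows: list of 1WIN rows, or None if no 1WIN matrix follows
    Returns a list of problem descriptions (empty if consistent).
    """
    identifier, size = iwin_row
    problems = []
    if identifier == -1:
        # A window of unspecified type must be given as explicit samples.
        if win_rows is None:
            problems.append("WindowIdentifier is -1 but no 1WIN matrix follows")
        elif len(win_rows) != int(size):
            problems.append(
                f"WindowSize is {int(size)} but 1WIN has {len(win_rows)} rows"
            )
    return problems
```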

*Description:*- See matrix IWIN

*Matrices:*-
- IWIN WindowInfo (*required*)
- 1WIN Window (*optional*)

*Description:*- Gain factors to distribute signals among several channels with corresponding gains (simple panning).

*Rows:*- as many as signals to pan

*Columns:*

- **Channel1**
- **Channel2**
- and so on, optionally
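As a sketch of the simple panning described above, assume one gain row per signal and one column per output channel. Each output channel is then the gain-weighted sum of all signals. The function name and the list-based layout are illustrative assumptions.

```python
def pan(signals, gains):
    """Distribute signals among channels.

    signals: list of sample lists, one per signal to pan (all the same length)
    gains:   one row per signal, one gain per output channel
    Returns one sample list per output channel.
    """
    n_channels = len(gains[0])
    n_samples = len(signals[0])
    out = [[0.0] * n_samples for _ in range(n_channels)]
    for samples, row in zip(signals, gains):
        for channel, gain in enumerate(row):
            for n, x in enumerate(samples):
                out[channel][n] += gain * x
    return out
```

For example, a gain row of `[0.5, 0.5]` places a signal equally in two channels.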

Fundamental Frequency Estimation

*Description:*-
The fundamental frequency *f0* of a signal is the inverse of the most likely (according to the estimation method used) periodicity at which the signal locally (on a short window) repeats nearly identically. For synthesis, the synthesized signal is supposed to have, locally, the given periodicity (fundamental frequency) at the instant of the frame time. Even though f0 is often very close to the pitch at which the sound is perceived, it is a statistical notion, not a psychoacoustic one. The psychoacoustic notion should go into a separate matrix such as 1PCH. Columns 2 to 4 (Confidence, Score, RealAmplitude) are not yet well defined. They are used by the f0 program at Ircam with the '*long output, -L*' option.

*Rows:*

*Columns:*

- **Frequency**
- **Confidence**
- **Score**
- **RealAmplitude**
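To make the notion of "most likely periodicity" concrete, here is a minimal sketch that picks the lag with the largest autocorrelation on a short window and returns its inverse. This is only one possible estimation method, chosen for illustration; it is not the method of the f0 program mentioned above, and the function name and parameters are assumptions.

```python
import math


def estimate_f0(frame, sampling_rate, f_min, f_max):
    """Estimate f0 in Hz as the inverse of the best autocorrelation lag."""
    lag_min = max(1, int(sampling_rate / f_max))
    lag_max = min(int(sampling_rate / f_min), len(frame) - 1)
    if lag_min > lag_max:
        raise ValueError("frame too short for the requested frequency range")
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sampling_rate / best_lag


# A 100 Hz sine sampled at 8000 Hz repeats every 80 samples.
frame = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
f0 = estimate_f0(frame, 8000, 50, 400)
```

The unnormalized autocorrelation favours short lags, so a practical estimator would normally add normalization and a confidence measure.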

*Description:*-
Fundamental frequency *f0* (see matrix 1FQ0)

*Matrices:*-
- 1FQ0 FundamentalFrequencyEstimate (*required*): Fundamental Frequency estimate

Sinusoidal Modeling

*Status:*- standard

*Description:*- These are the peaks in a short-time frequency estimate such as the Short Time Fourier Transform. Frequency, Amplitude and Phase should be those of a constant sinusoid producing such a peak in the frequency estimate. Phase (and Frequency and Amplitude, in the case of nearly linearly time-varying parameters) is relative to the center of the window.

*Rows:*- as many as peaks estimated in the frequency estimate

*Columns:*

- **Frequency**
- **Amplitude**
- **Phase**
- **Confidence**

*Status:*- standard

*Description:*- Sinusoidal tracks are sinusoids with time-varying Frequency, Amplitude and Phase. Obviously, giving these values only at discrete times (frame times) is imprecise unless an interpolation formula is provided. This is generally not done yet, but it can easily be added in the name-value table 1NVT. One track is described by the rows with the same index in successive adjacent frames; that is what the index is for, row numbers being arbitrary. When an index is absent from a frame, the track vanishes at the instant of that frame. What happens after the last or before the first occurrence of an index remains to be specified. The solution adopted at Ircam is to make sure that the first and the last occurrence always have zero amplitude.

*Rows:*- as many as sinusoidal tracks.

*Columns:*

- **Index**
- **Frequency**
- **Amplitude**
- **Phase**
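The way tracks are assembled from rows sharing an index can be sketched as follows. The frame representation (a time plus a list of row tuples) and the function name are assumptions. Treating a reappearing index as a new track is one reading of "the track vanishes", not something the description states.

```python
def collect_tracks(frames):
    """Group 1TRC rows into tracks.

    frames: list of (time, rows), each row being (index, freq, amp, phase)
    Returns a list of (index, breakpoints), where breakpoints is a list of
    (time, freq, amp, phase). A track ends at the first frame in which its
    index is absent.
    """
    finished = []  # completed tracks, in order of termination
    active = {}    # index -> breakpoints of the track still running
    for time, rows in sorted(frames, key=lambda frame: frame[0]):
        seen = set()
        for index, freq, amp, phase in rows:
            index = int(index)
            seen.add(index)
            active.setdefault(index, []).append((time, freq, amp, phase))
        for index in list(active):
            if index not in seen:
                finished.append((index, active.pop(index)))
    finished.extend(sorted(active.items()))
    return finished
```

Row order within a frame plays no role here, which matches the remark that row numbers are arbitrary.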

*Status:*- standard

*Description:*- Harmonic Partials are sinusoids with time-varying Frequency, Amplitude and Phase, such that each one has a frequency which is close, or at least related, to the i-th integer multiple (harmonic) of a common fundamental frequency (see 1FQ0). This integer i, usually called the harmonic number, is named 'Index' here and appears in the first column. In a given 1HRM matrix, an index value should not appear more than once. Some indices may be absent from certain matrices, meaning that the corresponding harmonic partial vanishes or has not been detected. See matrix 1TRC for other properties.

*Rows:*- A row describes one harmonic partial

*Columns:*

- **Index**: harmonic number of the partial
- **Frequency** *[Hz]*: partial frequency
- **Amplitude**: partial amplitude
- **Phase**: partial phase, between -pi and pi
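The constraints on a 1HRM matrix can be sketched as a small consistency check. The function name is hypothetical, and the relative tolerance used for "close to the i-th multiple" is an arbitrary assumption, since the description does not quantify closeness.

```python
import math


def check_harmonics(rows, f0, tolerance=0.05):
    """Check 1HRM rows (index, freq, amp, phase) against a fundamental f0.

    Returns a list of problem descriptions (empty if none were found).
    """
    problems = []
    seen = set()
    for index, freq, amp, phase in rows:
        i = int(index)
        if i in seen:
            problems.append(f"index {i} appears more than once")
        seen.add(i)
        expected = i * f0
        # Assumed closeness criterion: relative deviation from i * f0.
        if expected > 0 and abs(freq / expected - 1.0) > tolerance:
            problems.append(f"partial {i}: {freq} Hz is far from {expected} Hz")
        if not -math.pi <= phase <= math.pi:
            problems.append(f"partial {i}: phase {phase} outside [-pi, pi]")
    return problems
```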

*Status:*- standard

*Description:*

*Matrices:*-
- 1PIC PickedPeaks (*required*): PickedPeaks

*Status:*- standard

*Description:*

*Matrices:*-
- 1TRC SinusoidalTracks (*required*): SinusoidalTracks

*Status:*- standard

*Description:*

*Matrices:*-
- 1HRM HarmonicPartials (*required*)

*Description:*

*Rows:*

*Columns:*

- **MeanDeltaFrequency**
- **Harmonicity**
- **WeightedHarmonicity**

*Status:*- experimental

*Description:*

*Matrices:*-
- 1HRE HarmonicityEstimate (*required*): Harmonicity

Spectral Envelopes, Transfer Functions and Filters

*Description:*-
Spectral envelope information matrix; defines the interpretation of the following 1ENV matrix.
The linear+logarithmic scaling is linear below a certain frequency, called the Break Frequency, and logarithmic above it.
The exact formula is described in:
Diemo Schwarz,
*Spectral Envelopes in Sound Analysis and Synthesis*, Master's thesis, 1998.

*Rows: number=1*

*Columns:*

- **HighestBinFrequency** *[Hz]*: ?? frequency of the highest bin of the 1ENV matrix ??
- **ScaleType**: 0 for linear, 1 for linear+logarithmic scaling
- **BreakFrequency** *[Hz]*: break frequency when linear+logarithmic scaling

*Description:*-
Spectral envelope or magnitude transfer function
in sampled representation.
A spectral envelope is the envelope of the magnitude of
a short-time frequency estimate such
as the Short Time Fourier Transform.
See for instance:
Diemo Schwarz,
*Spectral Envelopes in Sound Analysis and Synthesis*, Master's thesis, 1998.

*Rows:*- A row corresponds to one bin of the spectral envelope

*Columns:*

**Env**: envelope bin

*Status:*- proposed

*Description:*- The reason for the optional 1GAI matrix is that in some cases it is useful to force the desired gain for the synthesised signal.

*Matrices:*-
- IENV SpectralEnvelopeInfo (*optional*)
- 1ENV SpectralEnvelope (*required*)
- 1GAI Gain (*optional*): gain of the original signal = desired gain for the synthesised signal

*Description:*- Transfer function coefficients information matrix, defines interpretation of following transfer function coefficients matrix such as 1CEC, 1ARA, 1ARR, 1ARK.

*Rows: number=1*

*Columns:*

- **SamplingRate** *[Hz]*: SamplingRate corresponding to the transfer function coefficients.
- **Order**: Order of the estimation or number of coefficients of the corresponding filter.

*Description:*- Cepstral coefficients as in text books

*Rows:*- As many as the number of cepstral coefficients desired.

*Columns:*

**CepstralCoefficients**

*Description:*

*Matrices:*-
- 1CEC CepstralCoefs (*required*): CepstralCoefs

*Description:*- Autoregressive coefficients as in text books

*Rows:*-
The rows contain `p + 1` autoregressive coefficients `a_0` to `a_p`, where `p` is the order of the autoregressive filter.

*Columns:*

**AutoRegressiveCoefficients**

*Description:*

*Rows:*-
The rows contain `p` reflection coefficients `k_1` to `k_p`, where `p` is the order of the lattice filter.

*Columns:*

**ReflectionCoefficients**

*Description:*

*Rows:*-
The rows contain `p + 1` autocorrelation coefficients `r_0` to `r_p`.

*Columns:*

**AutoCorrelationCoefficients**
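The three coefficient sets above are related by the textbook Levinson-Durbin recursion, sketched below. The sign convention is an assumption: the prediction-error filter is taken as A(z) = 1 + a_1 z^-1 + ... + a_p z^-p with a_0 = 1. The type descriptions do not state which convention SDIF files use, so treat this as an illustration rather than a reference conversion.

```python
def levinson_durbin(r):
    """Convert autocorrelation r_0..r_p into AR and reflection coefficients.

    Returns (a, k, err): a_0..a_p with a_0 = 1, k_1..k_p, and the final
    prediction error power.
    """
    if r[0] <= 0:
        raise ValueError("r_0 must be positive")
    a = [1.0]
    k = []
    err = float(r[0])
    for m in range(1, len(r)):
        if err <= 0:
            raise ValueError("prediction error vanished; recursion cannot continue")
        # Correlation between the current predictor and lag m.
        acc = sum(a[i] * r[m - i] for i in range(m))
        k_m = -acc / err
        # Order update: a_i <- a_i + k_m * a_(m-i), and a_m = k_m.
        a = [1.0] + [a[i] + k_m * a[m - i] for i in range(1, m)] + [k_m]
        err *= 1.0 - k_m * k_m
        k.append(k_m)
    return a, k, err
```

With this convention, the autocorrelation `[1.0, 0.5, 0.25]` of a first-order process yields a second reflection coefficient of zero.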

*Description:*

*Matrices:*-
- 1GAI Gain (*required*): Gain
- 1ARA ARACoefs (*required*): ARCoefs

*Description:*

*Matrices:*-
- 1GAI Gain (*required*): Gain
- 1ARK ARKCoefs (*required*): ARCoefs

*Description:*

*Matrices:*-
- 1ARR ARRCoefs (*required*): ARCoefs

Resonances

*Description:*-
Formant Waveforms as described in Chant.

Could be changed sometime to use 2RES plus a new 2FOF which would contain the columns Tex, DebAtt and Atten which are what FOF adds to RES.

*Rows:*

*Columns:*

- **Frequency**
- **Amplitude**
- **BandWidth**
- **Tex**
- **DebAtt**
- **Atten**
- **Phase**

*Description:*-
Resonances/Exponentially Decaying Sinusoids.
Resonances data can describe the characteristics of a
resonant system like a
bank of second order section-filters,
or can specify parameters for a model
of sinusoids with fixed frequencies and
exponentially decaying amplitudes. (If you put an impulse
into such a group of filter
banks, the output should be a sum of sinusoids with
fixed frequencies and exponentially
decaying amplitudes, so these two situations are
in a certain sense the same.)
The decay curve of a resonance should be the same as that of
a two-pole filter with
bandwidth equal to decay rate divided by pi.
This formula gives the amplitude of each
sinusoid over time:
amp(t) = initial_amp * e ^ (- decay_rate * t)

The phase of a resonance specifies the initial phase of each decaying sinusoid. Ircam's programs still use the previous definition, i.e. columns 3 to 5 being:

3. BandWidth

4. Saliance

5. Normalisation of amplitude

Therefore the following columns should be those of 2RES:

*Rows:*- resonances

*Columns:*

- **Frequency**
- **Amplitude**
- **DecayRate**
- **Phase**
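The decay formula and the bandwidth relation given in the description can be written out directly. The function names are illustrative, and the units (decay rate in 1/s, bandwidth in Hz) are assumptions.

```python
import math


def resonance_amplitude(initial_amp, decay_rate, t):
    """amp(t) = initial_amp * e ^ (-decay_rate * t), with t in seconds."""
    return initial_amp * math.exp(-decay_rate * t)


def bandwidth_from_decay_rate(decay_rate):
    """Bandwidth of the equivalent two-pole filter: decay rate divided by pi."""
    return decay_rate / math.pi


def decay_rate_from_bandwidth(bandwidth):
    """Inverse relation, e.g. for converting a BandWidth column to DecayRate."""
    return bandwidth * math.pi
```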

*Description:*- Noise Distribution matrix; defines a white noise random signal. The Distribution column is an identifier (0 is uniform, 1 Gaussian...).

*Rows:*

*Columns:*

- **Distribution**
- **Amplitude**: should be variance or standard deviation

*Description:*

*Matrices:*-
- IDIS (*required*): NoiseInfo
- 1DIS NoiseDistribution (*required*): NoiseInfo

*Description:*

*Matrices:*-
- 1FQ0 FundamentalFrequencyEstimate (*required*): PitchModeHit
- 1FOF Formants (*required*): Formants
- 1CHA Channels (*required*): FormantsChannels

*Description:*

*Matrices:*-
- 1RES Filters (*required*): Filters
- 1CHA Channels (*required*): FiltersChannels

Fourier Transform

*Description:*- See 1STF for details. WindowDuration is the duration of the window, in seconds.

*Rows:*

*Columns:*

- **PeriodOfTheDFT**: i.e., Sampling Rate in Hertz
- **WindowDuration**: in seconds.
- **FFTSize**

*Description:*-
1STF frames represent the data that come out of a discrete short-term time-domain to frequency-domain transform such as an FFT.
Here is a precise mathematical definition of this frame type:

Let $s(i)$ be a discrete signal with sampling rate $SR$ Hertz.

Let $w(m)$ be a window defined with the support $[0, M-1]$, i.e., $w(m) = 0$ for $m < 0$ and $m \ge M$. $M$ is called the *Window Size in Samples*. The corresponding *Window Duration* in seconds is $M / SR$.

Let $N$ be the size of the transform.

We define the input to the transform, $x(n)$, as follows. Note that the windowed signal is 'put' at the beginning of the vector $x(n)$.

$$
x(n) =
\begin{cases}
s(i+n)\, w(n) & \text{for } 0 \le n \le M-1 \\
0 & \text{for } M \le n \le N-1
\end{cases}
$$

(This is slightly redundant, since we define $w(m) = 0$ when $m \ge M$.) The 1STF matrix data is the Discrete Fourier Transform (DFT) of size $N$, i.e. the $X(k)$ defined as follows. The DFT is a length-$N$ vector $X$ with these elements:

$$
X(k) = \sum_{n=0}^{N-1} x(n)\, \exp(-j \, 2 \pi k n / N), \qquad 0 \le k \le N-1
$$

The time tag in a 1STF frame is the time of the center of the window, i.e., $(i + M/2)/SR$, not the beginning.

Notes:

- This definition corresponds to the output of Matlab's (and UDI's) FFT function.
- The real and imaginary parts come directly from this formula: therefore, if you compute a phase as atan2(imaginary, real), it is the phase of the corresponding COSINUSOID (and not sinusoid, as we are used to in additive synthesis) at time $i/SR$.
- Note that the windowed signal is 'put' at the beginning of the vector $x(n)$ (then zero padding follows), and this is crucial for the phase definition.
- Because of aliasing and foldover above the Nyquist frequency (and below the negative Nyquist frequency), the output of the DFT can be thought of as a periodic function of frequency over the range -infinity to infinity. The period of this function is the range from the negative Nyquist frequency to the positive Nyquist frequency, in other words, the sampling rate of the input signal.

*Rows:*- Bins

*Columns:*

- **Real**
- **Imaginary**
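The 1STF definition can be followed step by step in a small sketch: window the signal starting at sample i, place the result at the beginning of a length-N vector, zero-pad, and evaluate the DFT sum directly. The function name and argument layout are assumptions. The direct O(N^2) sum is used for clarity, not speed.

```python
import cmath


def stf_frame(signal, i, window, n_fft, sampling_rate):
    """Compute the time tag and DFT bins X(0)..X(N-1) of one 1STF frame."""
    m_size = len(window)  # M, the window size in samples
    if n_fft < m_size:
        raise ValueError("transform size N must be at least the window size M")
    if i < 0 or i + m_size > len(signal):
        raise ValueError("window does not fit inside the signal")
    # x(n) = s(i+n) * w(n) for 0 <= n <= M-1, then zeros up to N-1.
    x = [signal[i + n] * window[n] for n in range(m_size)]
    x += [0.0] * (n_fft - m_size)
    bins = [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(n_fft)
    ]
    # The frame time is the center of the window, not its beginning.
    time_tag = (i + m_size / 2) / sampling_rate
    return time_tag, bins
```

Each returned bin corresponds to one matrix row, with `bins[k].real` and `bins[k].imag` as the two columns. For a cosine that starts at its maximum at sample i, `cmath.phase` of the matching bin is approximately zero, in line with the remark on cosine phase.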

*Description:*

*Matrices:*-
- ISTF FourierTransformInfo (*required*): Info

*Description:*

*Matrices:*-
- ISTF FourierTransformInfo (*required*): Info
- 1STF FourierTransform (*required*): FourierTransform
- 1WIN Window (*optional*): Window applied on the signal prior to Fourier transform.

Energy

*Description:*- Energy information matrix, defines interpretation of following 1NRG matrix

*Rows: number=1*

*Columns:*

- **Scale**
- **NormalisationFactor**

*Description:*-
Short time energy of the signal, i.e.

$$
E(k) = \frac{1}{N} \sum_{n=m}^{m+N-1} x(n)\, x(n)
$$

The time of the frame should be the center of the interval $[m, m+N-1]$.

If a window is applied prior to the calculation, it should be taken into account, and an IWIN and an optional 1WIN matrix can be added to the frame.

*Rows: number=1*

*Columns:*

**Energy**
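The short-time energy and its frame time can be computed as in the sketch below. The function names are illustrative, and expressing the frame time in seconds through a sampling rate is an assumption carried over from the 1STF time tag.

```python
def short_time_energy(x, m, n_size):
    """E = (1/N) * sum of x(n)^2 over n = m .. m+N-1."""
    return sum(x[n] * x[n] for n in range(m, m + n_size)) / n_size


def energy_frame_time(m, n_size, sampling_rate):
    """Center of the sample interval [m, m+N-1], converted to seconds."""
    return (m + (n_size - 1) / 2) / sampling_rate
```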

*Description:*

*Matrices:*-
- INRG ScaleAndFactor (*required*): ScaleAndFactor
- 1NRG Energy (*required*): Energy
- IWIN WindowInfo (*optional*): WindowInfo
- 1WIN Window (*optional*): Window

*Description:*

*Rows:*

*Columns:*

- **LowerFrequencyLimit**
- **UpperFrequencyLimit**

*Description:*

*Matrices:*-
- 1BND Bands (*required*): Bands

Time-Domain Samples

*Description:*- Time-Domain Samples information matrix, defines interpretation of following 1TDS matrix

*Rows: number=1*

*Columns:*

**SamplingRate**

*Description:*- Restrict this type to linearly quantized samples with no compression. Unlike most other SDIF frame types, a frame of 1TDS data represents an interval of time (equal to the number of rows in the 1TDS matrix divided by the sampling rate) rather than an instant of time. The time tag of a 1TDS frame represents the beginning of this interval. Most SDIF streams containing 1TDS data will consist of a single large frame at time zero with all of the samples for the stream in a single matrix. The same data could be represented equivalently in a series of shorter frames. There is also the possibility of "gaps" in the time axis,

*Rows:*- Sample frames

*Columns:*
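The equivalence between one large 1TDS frame and a series of shorter frames can be sketched by computing the time tag and duration of each frame. The function name is hypothetical. The sketch assumes the stream starts at time zero and has no gaps.

```python
def tds_frame_intervals(n_samples, rows_per_frame, sampling_rate):
    """Split n_samples sample frames into 1TDS frames.

    Returns a list of (time_tag, duration, n_rows): the time tag is the
    beginning of the interval, the duration is n_rows / sampling_rate.
    """
    intervals = []
    for start in range(0, n_samples, rows_per_frame):
        n_rows = min(rows_per_frame, n_samples - start)
        intervals.append((start / sampling_rate, n_rows / sampling_rate, n_rows))
    return intervals
```

Passing `rows_per_frame = n_samples` gives the single frame at time zero that the description calls the most common layout.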

Time markers

*Description:*-
**1PEM** matrix type stands for **PeriodMarker**. Period markers are used for pitch-synchronous algorithms such as PSOLA synthesis or pitch-synchronous analysis. The most general definition of period markers is "the distance between two markers is equal to the local fundamental period". Since this definition gives only a relative positioning of the period markers, an exact positioning is often required as well. This absolute positioning depends on the analysis method used, which is why it is essential to refer to the analysis method in the matrix. The matrix can contain other columns for the parameters of the method.

**Method:** 1

**Definition:** phase +Pi/2 of the 1st partial SINUSOID which represents this partial

**Note:** To be coherent with additive partials, a partial is represented by a sinusoid. The interesting point is at phase +Pi/2, since it is closer to the maximum of energy needed for PSOLA.

**Method:** 2

**Definition:** group delay

**Note:** In brief, an amplitude-weighted mean group delay provides a delay which, added to the beginning time of the analysis frame, gives the time of the mark.

**Method:** 3

**Definition:** Glottal closure

**Note:** A method developed for speech

**Method:** 4

**Definition:** Maximum of energy synchronised with F0

**Note:** Needs a precise definition

**Method:** 5 ...

*Rows:*

*Columns:*

- **Identifier**
- **Parameter1**
- **Parameter2**
- **Parameter3**
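The general definition, "the distance between two markers is equal to the local fundamental period", can be sketched as follows. The function name is hypothetical, and the marker times are assumed to be the frame times of successive marker frames, in seconds.

```python
def local_f0_from_markers(marker_times):
    """Return (time, f0) pairs, one per pair of adjacent period markers.

    The local fundamental period is the distance between two markers, so the
    local fundamental frequency is its inverse.
    """
    result = []
    for t0, t1 in zip(marker_times, marker_times[1:]):
        period = t1 - t0
        if period <= 0:
            raise ValueError("marker times must be strictly increasing")
        result.append((t0, 1.0 / period))
    return result
```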

*Description:*

*Rows:*

*Columns:*

- **Index**
- **Frequency**
- **Amplitude**
- **Phase**

*Description:*- Matrix identifier for transients in the signal, such as occur at the beginning of percussive sounds, which need special care.

*Rows:*

*Columns:*

**Index**: Identifier of the method used to detect and estimate this marker

Voiced-Unvoiced Decision

*Description:*-
**1VUN** matrix type stands for **Voiced/Unvoiced Normalized**. It represents the voicing coefficient (harmonicity or periodicity coefficient in the general case) at a given time. The "voicing" coefficient is a measure of the part of the signal which is produced by the vocal folds (which is harmonic or periodic in the general case) at a given time. Range: [0,1]. 0 means purely unvoiced, 1 means purely voiced.

*Rows:*

*Columns:*

**VoicingCoefficient**: Normalized voicing coefficient, range: [0,1]

*Description:*-
**1VUF** matrix type stands for **Voiced/Unvoiced Frequency**. It is the cutting frequency below which the signal is considered as voiced (harmonic or periodic in the general case) and above which it is considered as unvoiced (non-harmonic, non-periodic or noisy in the general case). Range: [0, 1/2 Nyquist Frequency].

*Rows:*

*Columns:*

**CuttingFrequency** *[Hz]*: Voiced/Unvoiced cutting frequency, range: [0, 1/2 Nyquist Frequency]

*Status:*- proposed

*Description:*-
**1MRK** frame type stands for **Marker**. Usually, analysis methods estimate parameters at a given time (examples: sinusoidal analysis, spectral analysis, ...). By contrast, the result of some analysis methods is a series of times (examples: detection of the times where abrupt changes occur, detection of the times of glottal closures, ...). The result of such an analysis, which is essentially a series of times, is stored as a series of frames. The time of each of these frames is the result of the analysis, and the matrix inside each frame describes what this frame refers to, in other words which analysis method has pointed to this time (example: this frame time has been pointed to by a glottal closure detection algorithm). In some cases, it is also important to add information related to the results (example: coded waveform of a transient).

*Matrices:*-
- 1PEM PeriodMarker (*required*): PeriodMarker
- ITMR TransientMarkerRepresentation (*required*): TransientMarkerRepresentation
- ITMI TransientMarkerIdentifier (*required*): TransientMarkerIdentifier

*Status:*- proposed

*Description:*-
**1VUV** frame type stands for **Voiced/Unvoiced**. This frame contains information about the voicing (harmonicity or periodicity in the general case) property at the current time.

*Matrices:*-
- 1VUN VoicedUnvoicedNorm (*required*): VoicedUnvoicedNorm
- 1VUF VoicedUnvoicedFreq (*required*): VoicedUnvoicedFreq

jMax Physical Model Parameters

*Description:*

*Rows:*

*Columns:*

- **Value**
- **Index**

*Description:*

*Rows:*

*Columns:*

**Record**

*Status:*- private

*Description:*

*Matrices:*-
- EMPM Tableau (*required*): Tableau
- EMJR EndRecording (*required*): EndRecording

About This Document

This document types-doc.html generated Tue Aug 22 20:33:24 2000 by xmltohtml.pl from sdiftypes.xml

| Generation | schwarz | Tue Aug 22 20:33:24 2000 | kethuk.ircam.fr /u/formes/schwarz/src/SDIF/types |
|---|---|---|---|
| Generator | xmltohtml.pl | Tue Aug 22 18:21:48 2000 | $Id: types-doc.html,v 1.1 2008/09/10 16:14:49 diemo Exp $ |
| Source file | sdiftypes.xml | Tue Aug 22 20:33:15 2000 | $Id: types-doc.html,v 1.1 2008/09/10 16:14:49 diemo Exp $ |
