Standard SDIF Types

SDIF Types Version: Ircam_1.5

SDIF Types CVS Revision: $Id: types-doc.html,v 1.1 2008/09/10 16:14:49 diemo Exp $

Supporting Types

Matrix 1NVT NameValueTable

Status:: standard

Description:: Name-value table: list of key (name) and associated value. See the 1NVT frame. The format is a list of: name\tvalue\n. Name must only contain alphanumerical characters, value must not contain \n or the zero byte.

Rows:: Characters

Columns:

NVTText
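
As an illustration of the name\tvalue\n text format described above, here is a minimal Python sketch that writes and reads such a table. The function names format_nvt and parse_nvt are hypothetical and not part of any SDIF library; the checks only mirror the constraints stated in the description (alphanumeric names, no newline or zero byte in values).

```python
def format_nvt(pairs):
    # Serialise (name, value) pairs, one "name<TAB>value<NEWLINE>" entry each.
    lines = []
    for name, value in pairs:
        value = str(value)
        # Names must contain only alphanumeric characters (and not be empty).
        if not (name.isascii() and name.isalnum()):
            raise ValueError(f"name must be alphanumeric: {name!r}")
        # Values must not contain a newline or the zero byte.
        if "\n" in value or "\0" in value:
            raise ValueError(f"illegal character in value: {value!r}")
        lines.append(f"{name}\t{value}\n")
    return "".join(lines)


def parse_nvt(text):
    # Parse the text of a 1NVT matrix back into a dict of strings.
    table = {}
    for line in text.split("\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        name, _, value = line.partition("\t")
        table[name] = value
    return table
```

All values come back as strings; converting numerical values is left to the reader of the table.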

Matrix 1TYP TypeDefinitions

Status:: standard

Description:: Definitions of description types (frame and matrix types). See the 1TYP frame.

Rows:: Characters

Columns:

TYPText

Matrix 1IDS StreamInfo

Status:: standard

Description:: Stream routing and synthesizer patch information. See the 1IDS frame.

Rows:: Characters

Columns:

IDSText

Frame 1NVT NameValueTable

Status:: standard

Description:: The 1NVT name-value table frame stores ASCII or numerical header information (value) for arbitrary keys (name). This is the place to store the name and version of the program that wrote the file, the date, the host, the user, the command-line arguments, and other parameters.

Matrices:

1NVT NameValueTable required

Frame 1TYP TypeDefinitions

Status:: standard

Description:: Definition of SDIF description types (frame and matrix types) local to the file. The definitions of any non-standard types used on generation of the SDIF file are copied to this frame, so that the file will work at other sites that don't know about the non-standard types.

Matrices:

1TYP TypeDefinitions required

Frame 1IDS StreamInfo

Status:: standard

Description:: Stream routing and synthesizer patch information for the Chant library, used e.g. in Diphone.

Matrices:

1IDS StreamInfo required

Types used in different contexts

Matrix 1GAI Gain

Description:: A linear gain factor for the data to which it is attached, such as an autoregressive filter (e.g. 1ARA).

Rows: number=1

Columns:

Gain linear gain value

Frame 1GAI Gain

Description:: See matrix Gain

Matrices:

1GAI Gain required Gain

Matrix IWIN WindowInfo

Description:: Window information matrix. Windows are used e.g. in spectral estimation: a portion of the signal is selected by multiplying it by the window function. The WindowIdentifier is a (usually positive) number which identifies the type of window as specified in window-enumeration.text.
The negative number -1 denotes a not-yet-specified window, whose sampled values must then appear explicitly in a following 1WIN matrix. In that case the WindowSize should be the same as the number of rows in the following 1WIN.

Rows: number=1

Columns:

WindowIdentifier A number which identifies the type of window.
WindowSize in samples

Matrix 1WIN Window

Description:: Windows are used e.g. in spectral estimation: a portion of the signal is selected by multiplying it by the window function, the samples of which appear in this matrix. It has to be preceded by an IWIN matrix.

Rows:: As many as samples describing the window function.

Columns:

Samples Samples

Frame 1WIN Window

Description:: See matrix IWIN

Matrices:

IWIN WindowInfo required
1WIN Window optional
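
The relation between IWIN and 1WIN described above can be checked mechanically. The following Python sketch assumes the IWIN row is available as a (WindowIdentifier, WindowSize) pair and the 1WIN matrix as a list of samples; the function name check_window and this in-memory representation are assumptions made for the example only.

```python
def check_window(iwin_row, win_samples=None):
    # iwin_row: (WindowIdentifier, WindowSize) from the single IWIN row.
    # win_samples: rows of the optional 1WIN matrix, or None if absent.
    window_id, window_size = iwin_row
    problems = []
    if window_id == -1:
        # -1 means "not yet specified": the samples must be given in 1WIN.
        if win_samples is None:
            problems.append("identifier -1 requires a following 1WIN matrix")
        elif len(win_samples) != window_size:
            problems.append("WindowSize should equal the number of 1WIN rows")
    return problems
```

An empty list means that no inconsistency was found under these rules.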

Matrix 1CHA Channels

Description:: Gain factors to distribute signals among several channels with corresponding gains (simple panning).

Rows:: as many as signals to pan

Columns:

Channel1
Channel2 and so on optionally...
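
One plausible reading of this matrix is a gain table with one row per signal and one column per output channel. The Python sketch below applies such a table to a set of signals; the function name pan and the list-of-lists layout are assumptions for illustration, not a prescribed implementation.

```python
def pan(signals, gains):
    # signals: one list of samples per signal (all of the same length).
    # gains: one 1CHA row per signal, one gain per output channel.
    n_channels = len(gains[0])
    n_samples = len(signals[0])
    out = [[0.0] * n_samples for _ in range(n_channels)]
    for samples, row in zip(signals, gains):
        for channel, gain in enumerate(row):
            for n, x in enumerate(samples):
                # Each signal is added to every channel, scaled by its gain.
                out[channel][n] += gain * x
    return out
```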

Fundamental Frequency Estimation

Matrix 1FQ0 FundamentalFrequencyEstimate

Description:: The fundamental frequency f0 of a signal is the inverse of the most likely (according to the estimation method used) periodicity at which the signal locally (on a short window) repeats nearly identically. For synthesis, the synthesized signal is supposed to locally have the given periodicity (fundamental frequency) at the instant of the frame time. Even though f0 is often very close to the pitch at which the sound is perceived, this is not a psychoacoustic notion but a statistical one. The psychoacoustic notion should go into a separate matrix, such as a 1PCH matrix. Columns 2 to 4 (Confidence, Score, RealAmplitude) are not yet well defined. They are used by the f0 program at Ircam with the 'long output, -L' option.

Rows:

Columns:

Frequency
Confidence
Score
RealAmplitude

Frame 1FQ0 FundamentalFrequencyEstimate

Description:: Fundamental Frequency f0 (see matrix 1FQ0)

Matrices:

1FQ0 FundamentalFrequencyEstimate required Fundamental Frequency estimate

Sinusoidal Modeling

Matrix 1PIC PickedPeaks

Status:: standard

Description:: These are the peaks in a short-time frequency estimate such as the Short Time Fourier Transform. Frequency, Amplitude and Phase should be that of a constant sinusoid producing such a peak in the frequency estimate. Phase (and Frequency and Amplitude in case of nearly linearly time varying parameters) is relative to the center of the window.

Rows:: as many as peaks estimated in the frequency estimate

Columns:

Frequency
Amplitude
Phase
Confidence

Matrix 1TRC SinusoidalTracks

Status:: standard

Description:: Sinusoidal tracks are sinusoids with time-varying Frequency, Amplitude and Phase. Obviously, giving these values only at discrete times (the frame times) is imprecise unless an interpolation formula is provided. This is generally not done yet but can easily be added in the name-value table 1NVT. One track is described by the rows with the same index in successive adjacent frames; that is what the Index column is for, row numbers being arbitrary. When an index is absent from a frame, the track vanishes at the instant of that frame. What happens after the last or before the first occurrence of an index remains to be specified. The solution adopted at Ircam is to make sure that the first and the last occurrence always have zero amplitude.

Rows:: as many as sinusoidal tracks.

Columns:

Index
Frequency
Amplitude
Phase
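
To make the role of the Index column concrete, the following Python sketch regroups 1TRC rows from successive frames into tracks. It assumes frames are given in time order as (time, rows) pairs with rows of the form (Index, Frequency, Amplitude, Phase); the name collect_tracks and this representation are hypothetical. A track is closed as soon as its index is missing from a frame, so an index that reappears later starts a new track.

```python
def collect_tracks(frames):
    finished = []   # closed tracks as (index, breakpoints)
    active = {}     # index -> breakpoints of the track still running
    for time, rows in frames:
        seen = set()
        for index, freq, amp, phase in rows:
            active.setdefault(index, []).append((time, freq, amp, phase))
            seen.add(index)
        # An index absent from this frame means its track has vanished.
        for index in list(active):
            if index not in seen:
                finished.append((index, active.pop(index)))
    # Tracks still running at the last frame are appended in index order.
    finished.extend(sorted(active.items()))
    return finished
```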

Matrix 1HRM HarmonicPartials

Status:: standard

Description:: Harmonic partials are sinusoids with time-varying Frequency, Amplitude and Phase, such that each one has a frequency which is close, or at least related, to the i-th integer multiple (harmonic) of a common fundamental frequency (see 1FQ0). This integer number i, which is usually called the harmonic number, is named 'Index' here and appears in the first column. In a given 1HRM matrix, an index value should not appear more than once. Some indices may be absent from certain matrices, meaning that the corresponding harmonic partial vanishes or has not been detected. See matrix 1TRC for other properties.

Rows:: A row describes one harmonic partial

Columns:

Index harmonic number of the partial
Frequency [Hz] partial frequency
Amplitude partial amplitude
Phase partial phase, between -pi and pi
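
The constraints stated for 1HRM (unique indices, phase between -pi and pi) and the relation to a fundamental frequency can be sketched in Python as follows. The function names and the tuple layout (Index, Frequency, Amplitude, Phase) are assumptions for this example; the f0 value would typically come from a 1FQ0 matrix.

```python
import math


def check_harmonic_partials(rows):
    # Report violations of the 1HRM constraints stated in the description.
    problems = []
    seen = set()
    for index, freq, amp, phase in rows:
        if index in seen:
            problems.append(f"index {index} appears more than once")
        seen.add(index)
        if not -math.pi <= phase <= math.pi:
            problems.append(f"phase of partial {index} is outside [-pi, pi]")
    return problems


def harmonic_deviation(rows, f0):
    # Difference in Hz between each partial and its ideal harmonic i * f0.
    return {index: freq - index * f0 for index, freq, _amp, _phase in rows}
```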

Frame 1PIC PickedPeaks

Status:: standard

Description:

Matrices:

1PIC PickedPeaks required PickedPeaks

Frame 1TRC SinusoidalTracks

Status:: standard

Description:

Matrices:

1TRC SinusoidalTracks required SinusoidalTracks

Frame 1HRM HarmonicPartials

Status:: standard

Description:

Matrices:

1HRM HarmonicPartials required

Matrix 1HRE HarmonicityEstimate

Description:

Rows:

Columns:

MeanDeltaFrequency
Harmonicity
WeightedHarmonicity

Frame 1HRE HarmonicityEstimate

Status:: experimental

Description:

Matrices:

1HRE HarmonicityEstimate required Harmonicity

Spectral Envelopes, Transfer Functions and Filters

Matrix IENV SpectralEnvelopeInfo

Description:: Spectral envelope information matrix; defines the interpretation of the following 1ENV matrix. The linear+logarithmic scaling is linear below a certain frequency called the break frequency, then logarithmic above it. The exact formula is described in: Diemo Schwarz, Spectral Envelopes in Sound Analysis and Synthesis, Master's thesis, 1998.

Rows: number=1

Columns:

HighestBinFrequency [Hz] ?? frequency of the highest bin of the 1ENV matrix ??
ScaleType 0 for linear, 1 for linear+logarithmic scaling
BreakFrequency [Hz] break frequency when linear+logarithmic scaling

Matrix 1ENV SpectralEnvelope

Description:: Spectral envelope or magnitude transfer function in sampled representation. A spectral envelope is the envelope of the magnitude of a short-time frequency estimate such as the Short Time Fourier Transform. See for instance: Diemo Schwarz, Spectral Envelopes in Sound Analysis and Synthesis, Master's thesis, 1998.

Rows:: A row corresponds to one bin of the spectral envelope

Columns:

Env envelope bin

Frame 1ENV SpectralEnvelope

Status:: proposed

Description:: The reason for the optional 1GAI matrix is that in some cases it is useful to force the desired gain for the synthesised signal.

Matrices:

IENV SpectralEnvelopeInfo optional
1ENV SpectralEnvelope required
1GAI Gain optional gain of the original signal = desired gain for the synthesised signal

Matrix ITFC TransferFunctionCoefficientsInfo

Description:: Transfer function coefficients information matrix, defines interpretation of following transfer function coefficients matrix such as 1CEC, 1ARA, 1ARR, 1ARK.

Rows: number=1

Columns:

SamplingRate [Hz] SamplingRate corresponding to the transfer function coefficients.
Order Order of the estimation or number of coefficients of the corresponding filter.

Matrix 1CEC CepstralCoefs

Description:: Cepstral coefficients as in text books

Rows:: As many as Cepstral Coefficients wished.

Columns:

CepstralCoefficients

Frame 1CEC CepstralCoefs

Description:

Matrices:

1CEC CepstralCoefs required CepstralCoefs

Matrix 1ARA ARACoefs

Description:: Autoregressive coefficients as in text books

Rows:: The rows contain p + 1 autoregressive coefficients a₀ to a_p, where p is the order of the autoregressive filter.

Columns:

AutoRegressiveCoefficients

Matrix 1ARK ARKCoefs

Description:

Rows:: The rows contain p reflection coefficients k₁ to k_p, where p is the order of the lattice filter.

Columns:

ReflectionCoefficients

Matrix 1ARR ARRCoefs

Description:

Rows:: The rows contain p + 1 autocorrelation coefficients r₀ to r_p.

Columns:

AutoCorrelationCoefficients
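
The three coefficient sets above (1ARR, 1ARA, 1ARK) are related by the textbook Levinson-Durbin recursion. The Python sketch below derives the p + 1 autoregressive coefficients and the p reflection coefficients from p + 1 autocorrelation values. It assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p with a_0 = 1; sign conventions differ between text books and tools, and this document does not state which one Ircam's programs use, so treat the signs as an assumption.

```python
def levinson_durbin(r):
    # r: autocorrelation values r_0 .. r_p (r_0 must be non-zero).
    p = len(r) - 1
    a = [1.0]           # prediction-error filter coefficients, a_0 = 1
    k = []              # reflection coefficients k_1 .. k_p
    err = float(r[0])   # prediction error energy
    for m in range(1, p + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        km = -acc / err
        # Order update: a_i <- a_i + k_m * a_(m-i), and a_m = k_m.
        a = [1.0] + [a[i] + km * a[m - i] for i in range(1, m)] + [km]
        k.append(km)
        err *= 1.0 - km * km
    return a, k, err
```

For example, the autocorrelation values 1, 0.5, 0.25 of a first-order process give a second reflection coefficient of zero, so the order-2 filter reduces to the order-1 filter.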

Frame 1ARA ARACoefs

Description:

Matrices:

1GAI Gain required Gain
1ARA ARACoefs required ARCoefs

Frame 1ARK ARKCoefs

Description:

Matrices:

1GAI Gain required Gain
1ARK ARKCoefs required ARCoefs

Frame 1ARR ARRCoefs

Description:

Matrices:

1ARR ARRCoefs required ARCoefs

Resonances

Matrix 1FOF Formants

Description:: Formant Waveforms as described in Chant.
Could be changed sometime to use 2RES plus a new 2FOF which would contain the columns Tex, DebAtt and Atten which are what FOF adds to RES.

Rows:

Columns:

Frequency
Amplitude
BandWidth
Tex
DebAtt
Atten
Phase

Matrix 1RES Filters

Description:

Resonances/Exponentially Decaying Sinusoids. Resonances data can describe the characteristics of a resonant system like a bank of second order section-filters, or can specify parameters for a model of sinusoids with fixed frequencies and exponentially decaying amplitudes. (If you put an impulse into such a group of filter banks, the output should be a sum of sinusoids with fixed frequencies and exponentially decaying amplitudes, so these two situations are in a certain sense the same.) The decay curve of a resonance should be the same as that of a two-pole filter with bandwidth equal to decay rate divided by pi. This formula gives the amplitude of each sinusoid over time:

        amp(t) = initial_amp * e ^ (- decay_rate * t)

The phase of a resonance specifies the initial phase of each decaying sinusoid. Ircam's programs still use the previous definition, i.e. columns 3 to 5 being:
3. BandWidth
4. Salience
5. Normalisation of amplitude

Therefore the following columns should be those of 2RES:

Rows:: resonances

Columns:

Frequency
Amplitude
DecayRate
Phase
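
The decay formula and the bandwidth relation given in the description can be written out as a short Python sketch. The function names are hypothetical, and the use of a sine (rather than a cosine) for the initial phase is an assumption, since the text only speaks of "sinusoids".

```python
import math


def resonance_amplitude(initial_amp, decay_rate, t):
    # amp(t) = initial_amp * e ^ (- decay_rate * t)
    return initial_amp * math.exp(-decay_rate * t)


def bandwidth_from_decay_rate(decay_rate):
    # Bandwidth of the equivalent two-pole filter: decay rate divided by pi.
    return decay_rate / math.pi


def render_resonances(rows, sample_rate, n_samples):
    # rows: (Frequency, Amplitude, DecayRate, Phase) for each resonance.
    out = [0.0] * n_samples
    for freq, amp, decay_rate, phase in rows:
        for n in range(n_samples):
            t = n / sample_rate
            envelope = resonance_amplitude(amp, decay_rate, t)
            # Assumption: sine phase convention for the decaying sinusoid.
            out[n] += envelope * math.sin(2.0 * math.pi * freq * t + phase)
    return out
```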

Matrix 1DIS NoiseDistribution

Description:: Noise distribution matrix; defines a white-noise random signal. The Distribution column is an identifier (0 is uniform, 1 Gaussian...).

Rows:

Columns:

Distribution
Amplitude should be variance or standard deviation

Frame 1NOI

Description:

Matrices:

IDIS required NoiseInfo
1DIS NoiseDistribution required NoiseInfo

Frame 1FOB

Description:

Matrices:

1FQ0 FundamentalFrequencyEstimate required PitchModeHit
1FOF Formants required Formants
1CHA Channels required FormantsChannels

Frame 1REB

Description:

Matrices:

1RES Filters required Filters
1CHA Channels required FiltersChannels

Fourier Transform

Matrix ISTF FourierTransformInfo

Description:: See 1STF for details. WindowDuration is the duration of the window, in seconds.

Rows:

Columns:

PeriodOfTheDFT i.e., Sampling Rate in Hertz
WindowDuration in seconds.
FFTSize

Matrix 1STF FourierTransform

Description:

1STF frames represent the data that come out of a discrete short-term time-domain to frequency-domain transform such as an FFT. Here is a precise mathematical definition of this frame type:
Let s(i) be a discrete signal with sampling rate SR Hertz.
Let w(m) be a window defined with the support [0, M-1], i.e., w(m)=0 for m<0 and m>=M. M is called the Window Size in samples. The corresponding Window Duration in seconds is M/SR.
Let N be the size of the transform.
We define the input to the transform, x(n), as follows. Note that the windowed signal is 'put' at the beginning of the vector x(n).

Let x(n) =   s(i+n) * w(n)  for  0 <= n <= M-1
    x(n) =   0              for  M <= n <= N-1

(This is slightly redundant, since we define w(m)=0 when m>=M.) The 1STF matrix data is the Discrete Fourier Transform (DFT) of size N, i.e. the X(k) as follows. The DFT is a length N vector X, with these elements:

              N-1
       X(k) = sum  x(n) * exp(-j * 2 * pi * k * n/N)
              n=0

       0 <= k <= N-1

The time tag in a 1STF frame is the time of the center of the window, i.e., (i + M/2)/SR, not the beginning.
Notes:
This definition corresponds to the output of Matlab's (and UDI's) FFT function. The real and imaginary parts come directly from this formula; therefore, if you compute a phase as atan2(imaginary, real), it is the phase of the corresponding COSINUSOID (and not sinusoid, as we are used to in additive synthesis) at time i/SR. Note that the windowed signal is 'put' at the beginning of the vector x(n) (then zero padding follows), and this is crucial for the phase definition. Because of aliasing and foldover above the Nyquist frequency (and below the negative Nyquist frequency), the output of the DFT can be thought of as a periodic function of frequency over the range -infinity to infinity. The period of this function is the range from the negative Nyquist frequency to the positive Nyquist frequency, in other words, the sampling rate of the input signal.

Rows:: Bins

Columns:

Real
Imaginary
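
The definition of the 1STF data can be followed step by step in code. The Python sketch below builds x(n) with the windowed signal at the beginning of the vector, zero-pads it to N, evaluates the DFT sum directly, and computes the frame time as the center of the window. The function name stf_frame and the argument layout are assumptions for illustration; the direct O(N^2) sum is used for clarity rather than an FFT.

```python
import cmath


def stf_frame(s, i, w, N, sr):
    # s: signal, i: index of the first windowed sample, w: window of size M,
    # N: transform size (N >= M), sr: sampling rate in Hertz.
    M = len(w)
    if N < M:
        raise ValueError("transform size N must be at least the window size M")
    # Windowed signal at the beginning of x, followed by zero padding.
    x = [s[i + n] * w[n] for n in range(M)] + [0.0] * (N - M)
    rows = []
    for k in range(N):
        X_k = sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        rows.append((X_k.real, X_k.imag))  # columns Real, Imaginary
    # The time tag is the center of the window, not its beginning.
    time_tag = (i + M / 2) / sr
    return time_tag, rows
```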

Frame ISTF FourierTransformInfo

Description:

Matrices:

ISTF FourierTransformInfo required Info

Frame 1STF FourierTransform

Description:

Matrices:

ISTF FourierTransformInfo required Info
1STF FourierTransform required FourierTransform
1WIN Window optional Window applied on the signal prior to Fourier transform.

Energy

Matrix INRG ScaleAndFactor

Description:: Energy information matrix, defines interpretation of following 1NRG matrix

Rows: number=1

Columns:

Scale
NormalisationFactor

Matrix 1NRG Energy

Description:

Short time energy of the signal, i.e.

                    m+N-1
       E(k) = 1/N *  sum  x(n) * x(n)
                    n=m

The time of the frame should be the center of the interval [m, m+N-1].
If a window is applied prior to calculation, it should be taken into account, and an IWIN and an optional 1WIN can be included.

Rows: number=1

Columns:

Energy
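
The short-time energy formula and the frame-time rule above can be expressed as a small Python sketch. The function names are hypothetical, and converting the center of the interval to seconds by dividing by the sampling rate is an assumption, since the text gives the interval in samples.

```python
def short_time_energy(x, m, N):
    # E = 1/N * sum of x(n) * x(n) for n = m .. m+N-1
    segment = x[m:m + N]
    if len(segment) != N:
        raise ValueError("interval [m, m+N-1] exceeds the signal")
    return sum(v * v for v in segment) / N


def energy_frame_time(m, N, sr):
    # Center of the interval [m, m+N-1], converted to seconds (assumption).
    return (m + (N - 1) / 2) / sr
```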

Frame 1NRG

Description:

Matrices:

INRG ScaleAndFactor required ScaleAndFactor
1NRG Energy required Energy
IWIN WindowInfo optional WindowInfo
1WIN Window optional Window

Matrix 1BND Bands

Description:

Rows:

Columns:

LowerFrequencyLimit
UpperFrequencyLimit

Frame 1BND Bands

Description:

Matrices:

1BND Bands required Bands

Time-Domain Samples

Matrix ITDS TimeDomainSamplesInfo

Description:: Time-Domain Samples information matrix, defines interpretation of following 1TDS matrix

Rows: number=1

Columns:

SamplingRate

Matrix 1TDS TimeDomainSamples

Description:: Restrict this type to linearly quantized samples with no compression. Unlike most other SDIF frame types, a frame of 1TDS data represents an interval of time (equal to the number of rows in the 1TDS matrix divided by the sampling rate) rather than an instant of time. The time tag of a 1TDS frame represents the beginning of this interval. Most SDIF streams containing 1TDS data will consist of a single large frame at time zero with all of the samples for the stream in a single matrix. The same data could be represented equivalently in a series of shorter frames. There is also the possibility of "gaps" in the time axis.

Rows:: Sample frames

Columns:

Amplitudes in each channel. Linear. All but the first are optional.
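
The interval covered by a 1TDS frame, and the equivalence between one large frame and a series of shorter ones, can be illustrated with the following Python sketch. The function names and the representation of a frame as a (time tag, rows) pair are assumptions made for the example.

```python
def tds_interval(time_tag, n_rows, sampling_rate):
    # A 1TDS frame covers n_rows / sampling_rate seconds from its time tag.
    return time_tag, time_tag + n_rows / sampling_rate


def split_into_frames(rows, sampling_rate, rows_per_frame, start=0.0):
    # Represent the same samples as a series of shorter frames.
    frames = []
    for first in range(0, len(rows), rows_per_frame):
        time_tag = start + first / sampling_rate
        frames.append((time_tag, rows[first:first + rows_per_frame]))
    return frames
```

Each row holds one sample frame, i.e. one amplitude per channel.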

Time markers

Matrix 1PEM PeriodMarker

Description:: The 1PEM matrix type stands for PeriodMarker. Period markers are used for pitch-synchronous algorithms such as PSOLA synthesis or pitch-synchronous analysis. The most general definition of period markers is "the distance between two markers is equal to the local fundamental period". This definition only gives a relative positioning of the period markers, whereas an exact positioning is often required. The absolute positioning of the period markers depends on the analysis method used, which is why it is essential to refer to the analysis method in the matrix. The matrix can contain other columns for the parameters of the method.
Method: 1
Definition: phase +Pi/2 of the SINUSOID which represents the 1st partial
Note: To be coherent with additive partials, a partial is represented by a sinusoid. The interesting point is at phase +Pi/2 since it is closer to the maximum of energy needed for PSOLA.
Method: 2
Definition: group delay
Note: In brief, an amplitude-weighted mean group delay provides a delay which, added to the beginning time of the analysis frame, gives the time of the mark.
Method: 3
Definition: glottal closure
Note: A method developed for speech.
Method: 4
Definition: maximum of energy synchronised with F0
Note: Needs a precise definition.
Method: 5 ...

Rows:

Columns:

Identifier
Parameter1
Parameter2
Parameter3

Matrix ITMR TransientMarkerRepresentation

Description:

Rows:

Columns:

Index
Frequency
Amplitude
Phase

Matrix ITMI TransientMarkerIdentifier

Description:: Matrix identifier for a transient in the signal, such as occurs at the beginning of percussive sounds, which needs special care.

Rows:

Columns:

Index Identifier of the method used to detect and estimate this marker

Voiced-Unvoiced Decision

Matrix 1VUN VoicedUnvoicedNorm

Description:: 1VUN matrix type stands for Voiced/Unvoiced Normalized. It represents the voicing coefficient (harmonicity or periodicity coefficient in the general case) at a given time. The "voicing" coefficient is a measure of the part of the signal which is produced by the vocal folds (which is harmonic or periodic in the general case) at a given time. Range: [0,1]. 0 means purely unvoiced, 1 means purely voiced.

Rows:

Columns:

VoicingCoefficient Normalized voicing coefficient, range: [0,1]

Matrix 1VUF VoicedUnvoicedFreq

Description:: 1VUF matrix type stands for Voiced/Unvoiced Frequency. It is the cutting frequency below which the signal is considered as voiced (harmonic or periodic in the general case) and above which it is considered as unvoiced (non-harmonic, non-periodic or noisy in the general case). Range: [0, 1/2 Nyquist Frequency].

Rows:

Columns:

CuttingFrequency [Hz] Voiced/Unvoiced cutting frequency, range: [0, 1/2 Nyquist Frequency]

Frame 1MRK Marker

Status:: proposed

Description:: The 1MRK frame type stands for Marker. Usually, analysis methods estimate parameters at a given time (examples: sinusoidal analysis, spectral analysis, ...). In contrast, the result of some analysis methods is a series of times (examples: detection of the times where abrupt changes occur, detection of the times of glottal closures, ...). The result of such an analysis, which is essentially a series of times, is stored as a series of frames. The time of each of these frames is the result of the analysis, and the matrix inside each frame describes what the frame refers to, in other words which analysis method has decided to point to this time (example: this frame time has been pointed to by a glottal closure detection algorithm). In some cases, it is also important to add information related to the results (example: coded waveform of a transient).

Matrices:

1PEM PeriodMarker required PeriodMarker
ITMR TransientMarkerRepresentation required TransientMarkerRepresentation
ITMI TransientMarkerIdentifier required TransientMarkerIdentifier

Frame 1VUV VoicedUnvoiced

Status:: proposed

Description:: The 1VUV frame type stands for Voiced/Unvoiced. This frame contains information about the voicing (harmonicity or periodicity in the general case) property at the current time.

Matrices:

1VUN VoicedUnvoicedNorm required VoicedUnvoicedNorm
1VUF VoicedUnvoicedFreq required VoicedUnvoicedFreq

jMax Physical Model Parameters

Matrix EMPM Tableau

Description:

Rows:

Columns:

Value
Index

Matrix EMJR EndRecording

Description:

Rows:

Columns:

Record

Frame EFPM PhysicalModelParameters

Status:: private

Description:

Matrices:

EMPM Tableau required Tableau
EMJR EndRecording required EndRecording

About This Document
This document types-doc.html generated Tue Aug 22 20:33:24 2000 by xmltohtml.pl from sdiftypes.xml
Generation	schwarz	Tue Aug 22 20:33:24 2000	kethuk.ircam.fr /u/formes/schwarz/src/SDIF/types
Generator	xmltohtml.pl	Tue Aug 22 18:21:48 2000	$Id: types-doc.html,v 1.1 2008/09/10 16:14:49 diemo Exp $
Source file	sdiftypes.xml	Tue Aug 22 20:33:15 2000	$Id: types-doc.html,v 1.1 2008/09/10 16:14:49 diemo Exp $