The Sound Description Interchange Format (SDIF) is an established standard for the well-defined and extensible interchange of a variety of sound descriptions. These include representations of the signal for analysis-synthesis (spectral, sinusoidal, time-domain, or higher-level models), sound descriptors such as loudness or fundamental frequency, markers, labels, and statistical models. SDIF consists of a basic data format framework and an extensible set of standard sound descriptions.
The SDIF standard was created in collaboration by Ircam-Centre Pompidou, Paris, France; CNMAT, University of California, Berkeley, USA; and the Music Technology Group (MTG) of the Universitat Pompeu Fabra, Barcelona, Spain. There are many references on how to use it from C, Matlab, the command line, etc. This is a rapid top-down introduction to the concept for the beginner. It is also useful to refer to the more detailed bottom-up introduction in the SDIF standard definition.
SDIF is a standard format for the storage of sound descriptors, e.g., F0; frequencies, amplitudes, and phases of the partials; spectral envelope; or even time markers and time selections. A simple SDIF file contains a collection of frames, organised into one or more parallel streams. There are a few special frames (file header, information, type definition, etc.) and data frames.
Frames are distinguished by their time position, their stream ID, and their type. The frames in an SDIF file have to be sorted such that the frame time never decreases. Multiple frames may exist for the same time; however, you should never store two frames with the same time, type, and stream ID in the same file.
Streams can be used to group frames at different time instants together. They can also be used to distinguish frames at the same time position. IRCAM programs will often use streams to group the data of the different channels of a multi-channel file. In this case the first channel is usually stored in stream 0. Note, however, that the connection between streams and channels is not defined by the SDIF standard.
For each frame, the data is stored as one or several matrices whose size and type depend on the data stored. The matrix types that are allowed in a frame are defined here. Note that this document reflects IRCAM's notion of the standard, which is a superset of the types agreed upon with other institutions (CNMAT).
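
The frame rules described above (non-decreasing time, and no two frames sharing the same time, type, and stream ID) can be illustrated with a small in-memory model. This is only a sketch: the `Frame` class and the `check_frame_sequence` function are hypothetical names invented for this example. They are not part of the SDIF library, and the code does not read or write real SDIF files.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    """Hypothetical in-memory stand-in for an SDIF data frame."""
    time: float          # time position of the frame
    stream_id: int       # stream the frame belongs to
    signature: str       # 4-character frame type, e.g. "1FQ0"
    # matrix signature -> rows of values (at most one matrix per signature)
    matrices: Dict[str, List[List[float]]] = field(default_factory=dict)


def check_frame_sequence(frames: List[Frame]) -> List[str]:
    """Return a list of rule violations; an empty list means the sequence is fine."""
    problems = []
    seen = set()
    latest_time = float("-inf")
    for index, frame in enumerate(frames):
        # Rule 1: frame time must never decrease.
        if frame.time < latest_time:
            problems.append(
                f"frame {index}: time {frame.time} is earlier than {latest_time}"
            )
        latest_time = max(latest_time, frame.time)
        # Rule 2: (time, type, stream ID) must be unique within a file.
        key = (frame.time, frame.signature, frame.stream_id)
        if key in seen:
            problems.append(f"frame {index}: duplicate time/type/stream {key}")
        seen.add(key)
    return problems


# Two channels stored as streams 0 and 1, both with a frame at time 0.0.
frames = [
    Frame(0.0, 0, "1FQ0", {"1FQ0": [[440.0]]}),
    Frame(0.0, 1, "1FQ0", {"1FQ0": [[220.0]]}),
    Frame(0.01, 0, "1FQ0", {"1FQ0": [[441.0]]}),
]
print(check_frame_sequence(frames))
```

The two frames at time 0.0 are allowed here because they differ in stream ID, which mirrors the way streams are used to separate the channels of a multi-channel file.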
Frame and matrix types are distinguished by a sequence of 4 characters, the type signature. The signature for the frame that contains information about fundamental frequency is 1FQ0. This frame contains 1FQ0 matrices, which in turn describe the fundamental frequency. As this example shows, frame signatures and matrix signatures can be the same.
Each frame type has required and optional matrices. Matrices in turn contain columns of data, and again there are required and optional columns. The 1FQ0 matrix contains one required column for the F0 value and 3 optional columns for additional information.
The IRCAM standard frame types are known to the SDIF library. If you want to store additional matrix or frame types in an SDIF file, the extended frame and matrix definitions have to be added to the SDIF file header. If you define new frame or matrix types, you should start their signatures with an X so that extensions and standard types are easy to distinguish (AudioSculpt/supervp, for example, use the XTRD matrix signature to store an extended transient description).
Note that you are not allowed to store more than one matrix of the same type in a single frame!
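
The signature conventions above can be sketched as a few simple checks. The function names below are hypothetical and are not part of the SDIF library. The code only encodes what this text states: signatures have 4 characters, extension types should start with X, a frame holds at most one matrix per matrix type, and a 1FQ0 matrix has one required column plus up to 3 optional ones.

```python
from typing import List, Tuple

# A matrix is given as (signature, rows), where each row is a list of column values.
Matrix = Tuple[str, List[List[float]]]


def is_valid_signature(signature: str) -> bool:
    """A type signature is a sequence of exactly 4 characters."""
    return len(signature) == 4


def is_extension_signature(signature: str) -> bool:
    """By convention, user-defined types start with 'X' (e.g. 'XTRD')."""
    return is_valid_signature(signature) and signature.startswith("X")


def check_frame_matrices(frame_signature: str, matrices: List[Matrix]) -> List[str]:
    """Return a list of convention violations for one frame."""
    problems = []
    if not is_valid_signature(frame_signature):
        problems.append(f"bad frame signature {frame_signature!r}")
    seen = set()
    for signature, rows in matrices:
        if not is_valid_signature(signature):
            problems.append(f"bad matrix signature {signature!r}")
        # Only one matrix of a given type is allowed per frame.
        if signature in seen:
            problems.append(f"more than one {signature} matrix in the frame")
        seen.add(signature)
        # 1FQ0: one required column (F0) plus up to 3 optional columns.
        if signature == "1FQ0":
            for row in rows:
                if not 1 <= len(row) <= 4:
                    problems.append(f"1FQ0 row has {len(row)} columns, expected 1 to 4")
    return problems


print(is_extension_signature("XTRD"))
print(check_frame_matrices("1FQ0", [("1FQ0", [[440.0]])]))
```

A real file that uses an extension type such as XTRD would additionally need the corresponding type definition in its file header, which this sketch does not model.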