Alfred Lang

University of Bern, Switzerland

Research Project 1988

Timbre Constancy in Space:

Hearing and missing spatially induced sound variation


@Audit @GenPsy @EnvPsy

Last revised 98.11.01

In collaboration with Roland Calmonte

Excerpts in English of the NF-Research-Proposals 1.403-1.86 and 1113-025480

© 1998 by Alfred Lang

Scientific and educational use permitted


1986 Proposal

1988 Follow-up Proposal



This English version renders the essential parts of the original project (March 1986) and of the follow-up proposal (September 1988). Parts in italics are comments or summaries of passages that are not essential to understanding the objectives and the proposed procedures of the project.


1986 Proposal

Abstract: When walking with a speaking person through a series of different rooms (empty corridor, carpeted cabinet, high halls, etc.), one usually hears the person's voice (timbre quality) as remaining much the same, in spite of the fact that the sound reaching the listener's ear undergoes massive changes due to the varying reflecting conditions and natural fluctuations in speaking manner. Of course, there are extreme conditions (e.g. a nearly reflection-free chamber, a reverberating hall, or intentional vocal disguise) where such changes are readily perceived; in addition, specifically attending to the varying timbre qualities (analytic hearing) can make them audible.

This is an example of "timbre constancy", a characteristic of perceptual processing which is structurally similar to size or shape or color constancy in vision (see Bischof 1966, Epstein 1977). Timbre constancy refers to the fact that auditory perception as a rule does not extract information from (i.e. "misses") certain aspects of sound variation and also adds its own characters.

In the examples given above, the auditory system in a way "compensates" or "rectifies" (Bischof 1966) the modifications introduced by the selective absorption and reflections of the sound introduced by room properties. In addition, a number of conditions are possible, where modifications of the sound occurring at or near the source or on the way to the listener are totally or partially missed, or where the perceiving system "reconstructs" (Bischof) or "construes" something which was not given as an actual stimulation. One can also think of distortions of the human voice by permanent or passing handicaps or by inadequate transmission techniques etc. In addition to the spatially induced sound distortions there are superimpositions of intruding sounds from secondary sources, the effects of which are probably only partially explained (away) by spatial selectivity of hearing.

The present project aims at an exploratory inventorisation of timbre constancy; emphasis is placed on the spatially induced variation or invariance (constancy) of timbre. The project therefore attempts a "mapping" of the constancy of some selected sounds (in particular the human voice and musical instruments) under various spatial conditions. "Constancy mapping" refers to finding and describing those variations of the physical characters of a sound which, under given hearing conditions, do not lead to directly corresponding variations in the perception of the given sound. We want to predict and to produce in subjects specific discriminations between and confusions among sounds by analyzing and systematically re-synthesizing given sound patterns. For the moment, the study is of an exploratory nature: we want to find out which sound patterns are "distinct" (i.e. preferably realized, in the sense of Prägnanz) in perception, i.e. which sound characters are conducive to perceptual invariances, or more generally, which (combinations of) sound parameters play which role in timbre constancy. In addition, the hypothesis that all phenomena of timbre constancy in space are nothing but spatial selectivity of binaural hearing or source-congruent "streaming" (in the sense of Bregman) shall eventually be checked and refuted.


2.2.1 Present state of knowledge, relevant literature

Strangely enough, timbre constancy has not been investigated so far; to our knowledge, it has not even been described as a research problem. In this section we therefore deal with some conceptual problems and background research.

The problem of timbre constancy could also be formulated as the question of why the perceptual system is so highly adaptive to spatially induced sound variation, although it is at the same time highly sensitive to invariances of sound variation that arises at the source (high recognition rates despite variation). Timbre constancy means that psycho-physical correspondence is not given between the proximal sound stimulus and the resulting perception. On the other hand, it may be possible to obtain psycho-physical correspondence by taking into account some additional stimulation and/or dispositions in the perceiving system (general "knowledge" about or particular experience with some properties of the world) in such a way that perception finds its explanation on a higher level, so that, as Brunswik maintains, perception attains in a way the "distal" object.

In functional terms, timbre constancy brings a reduction (resp. a modification) of the information transmitted from the sounding event to the hearing phenomenon. This reduction is obviously economically important and probably also ecologically meaningful, because, as a rule, it can be assumed that the "constant" percept screens out irrelevant stimulus or excitation components which are not necessary for recognizing events or for orienting oneself in the world of sound. Probably those enduring characters of the events which are relevant in the long run are also emphasized; or the process might even result in an idealized percept (Koffka) instead of simply being veridical to the factual sound. In this respect, the perceptual constancies are somehow similar, at least functionally, to the phenomena of perceptual categorization.

Here follow some conceptual remarks in view of a holistic psychophysics or a comprehensive understanding of the relationship between stimulus and perception. In auditory perception, the classical psychophysical problems seem to have a longer life than in visual perception, because the physical parameters of the signal (esp. frequency and amplitude) are so prominent and appear to be so directly related to hearing phenomena (such as pitch and loudness). The relativity of the relations between pitch and frequency as well as loudness and amplitude is pointed out.

Considering timbre qualities (Klangfarbe, sound color), it is evident that a psychophysics starting from a selected stimulus quality has never been attempted, because there is simply no such quality which lends itself to being readily quantified. Thus timbre quality as a (group of) perceptual dimension(s) is an obvious candidate for the development of a holistic psychophysics. For more than a century after Helmholtz the view was not called into question that timbre perception can be explained on the basis of the spectral composition of the sound; although this view is not completely wrong, present-day statements are more cautious and also refer - incidentally in agreement with early observations of Stumpf - to things like envelope characteristics and spectral shifts in time (see Moore 1982 and others). Timbre quality then is (at least in tones, i.e. periodic sounds of non-negligible duration) a collective term for all those qualities of an auditory percept which enable a listener to distinguish between two or more sounds of the same pitch and the same loudness (Schouten 1968; Plomp 1970, 1976).

Research on timbre qualities, esp. factor analytic or multidimensional scaling work based on comparison judgments about selected complex sounds (musical instruments, synthesized sounds, human voice), is selectively reviewed. It is emphasized that knowledge of "timbre space" is an important starting point for our research, although there is actually not much convergence in the available studies and their results are all too dependent on the arbitrary selection of the particular sounds to be investigated. In addition, most of this work uses single tones; Grey (1977) has shown that in timbre judgments of tone series or chords attention is given to other characters of the sound than in judgments of single tones.

Reference is also made to the field of applied architectural acoustics (Schroeder 1979) and "subjective musical room acoustics" (Rasch & Plomp 1982). Finally, recent literature on methods using digital analysis and synthesis is mentioned.


2.2.2 Own research

Own research on absolute pitch (Tautenhahn 1976, Hurni-Schlegel & Lang 1978, Hurni-Schlegel 1983, Andres 1985 etc.) and on loudness constancy (Rytz 1977, Calmonte 1987 etc.) is briefly reviewed. The studies on loudness constancy started with (unsuccessful) attempts at replicating Mohrmann (1939) and arrived at the conclusion (which became the foundation of the present proposal) that the psychophysical relation between intensity and loudness cannot be conceived to be unidimensional and also should not be isolated from other parameters of sound and of audition.


2.2.3 Detailed research plan

The studies to be undertaken are planned to demonstrate in exemplary fashion under what conditions and to what extent auditory perception is capable of abstracting exactly the irrelevant spatial information contained in signals that merge properties stemming from the source and from spatial circumstances. We shall make use of a digital analysis-synthesis-procedure on a dedicated lab computer with suitable DA- and AD-converters as well as recording and presentation devices. The accent is on objects that are psychologically attainable, both in the phenomenological domain and in the form of perceptual performances, such as discrimination and scaling judgments or recognitions. At all times we deem it important to specify the correspondence between psychological data and exact physical descriptions of the stimulating sounds.

The research procedure can be described by specifying the following steps:

1) Concrete sounds of at least some seconds duration (human voice speaking, musical sequences) are recorded in a standardized way in a reflection-free chamber.

2) These sounds are played and re-recorded under specified reflective conditions (i.e. in a reflection-free chamber with added reflective areas in graded amounts and systematic placements); thus a first set of sound recordings in digital form is made which represents some systematic variation of spatially induced sound parameters.

3) These sounds are presented to a sample of listeners, and pairwise similarity judgments are obtained; by means of a multidimensional scaling procedure a dimensional system or perceptual-cognitive organization ("timbre space") of these sounds is then calculated. It is intended to present the sounds by real-time DA-conversion directly from their digitized versions stored in the lab computer, thus enabling a flexible pairwise selection of the sounds and economical registering of the subjects' judgments.

4) A detailed physical analysis is then made of those sounds which have been found to be systematically grouped or differentiated in the perceptual analysis (in particular, it is planned to perform Fourier analyses using different window sizes; it may also prove meaningful to perform analyses using - instead of the simple frequency continuum - the critical-band-rate function proposed by Zwicker & Terhardt (1980) and Terhardt et al. (1982), which is physiologically more meaningful). The objective of these analyses is to find and define physical sound qualities or parameters which covary or correlate with the psychological organization of the heard sounds. The analogy in vision would be to first obtain the perceptual order of colors, which is in fact the 3-dimensional color double-tetrahedron, and then to look for those qualities of light spectra that coordinate with any place in the tetrahedron. Metamerism of color stimuli is the well-known state of affairs in vision; what we are looking for here is the analogue of metamerism in audition. It will be indispensable to employ relatively fine-grained graphic presentation and comparison procedures both in the time and the frequency domain of acoustic analysis. Ingenuity of the researcher has to be used in "partnership" with the rich new possibilities of computer-based information processing.

5) If it is possible to find such correlated properties - at first they should be defined on a tentative basis and used as guiding hypotheses - we can proceed to synthesize sounds which articulate and exaggerate exactly these properties, e.g. by increasing variation, by adding extreme values of a given variable, by systematically constructing fields of a multivariate matrix of properties, etc. In addition to re-synthesizing such sounds following manipulations in the frequency domain, it will probably be necessary to also directly manipulate the time domain signals by various digital filter operations. It is imperative to employ real-time processing for digitally recording and replaying the sounds; however, for analyses and syntheses, which will be partially algorithmic and partially intuitive and heuristic, normal computing procedures are sufficient. Beyond graphic representations of intermediate and final results of these computations it will also be necessary to have immediate real-time auditory control.

6) The new sounds which result from step (5) will be presented again to subjects to obtain comparative judgments and also to collect and evaluate their spontaneous impressions, phenomenological descriptions and comparisons etc. In addition, discrimination and recognition experiments (confusion matrices) shall be performed, all aiming at the formulation of an organization of auditory phenomena together with its coordination to the physical variation of the sound; pertinent characters are then known through constructing these sounds rather than by merely describing given sounds. We call this procedure "timbre constancy mapping" and expect some organization of auditory phenomena to occur in analogy to the well-known criteria of "good" Gestalt or "Prägnanz", in such a way that we will be able to point to prototypically distinguished sound characters that are relatively immune to spatial modulations, whereas others can be characterized as intermediate or transitory.

7) Finally it will be interesting to check and falsify the hypothesis that all timbre constancy phenomena could be reduced to spatial selectivity of binaural hearing, i.e. to a form of "streaming" which is based exclusively on the localization of sound components and thus allows all reflection-dependent reverberations to be extracted. In parallel to the procedures used in step (5) above we shall synthesize sounds which vary systematically with respect to intensity and delay of reverberations, or which present original sound and reverberations in varying compounds to the two ears of the listener. Possibly, sound synthesis can be guided by Steeneken & Houtgast's (1980) spatial modulation transfer function. The hypothesis can be refuted if it is possible to demonstrate that the same perceptual invariants occur with sounds containing different or no reverberation components.
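The manipulation named in step 7 can be illustrated with a minimal sketch, assuming the simplest possible stand-in for room reverberation: a single reflection, i.e. one delayed and attenuated copy of the dry sound added to itself. The function names, the 1000 Hz sampling rate and the impulse used as dry sound are illustrative assumptions and not part of the proposed procedure. The sketch shows how delay and intensity of the reflection reshape the spectrum arriving at the ear, which is the kind of proximal change that timbre constancy would have to "miss".

```python
import math


def add_reflection(dry, delay_samples, gain):
    """Return the dry signal plus one delayed, attenuated copy of itself."""
    wet = list(dry) + [0.0] * delay_samples
    for n, x in enumerate(dry):
        wet[n + delay_samples] += gain * x
    return wet


def spectral_magnitude(signal, freq_hz, sample_rate_hz):
    """Magnitude of the discrete-time Fourier transform at one frequency."""
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    re = sum(x * math.cos(step * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(step * n) for n, x in enumerate(signal))
    return math.hypot(re, im)


SAMPLE_RATE_HZ = 1000        # illustrative value only
dry = [1.0] + [0.0] * 99     # unit impulse: flat spectrum of magnitude 1
wet = add_reflection(dry, delay_samples=10, gain=0.5)

# One reflection with a 10-sample delay carves a comb into the flat spectrum:
# a notch at 50 Hz (magnitude 1 - gain) and a peak at 100 Hz (1 + gain).
notch = spectral_magnitude(wet, 50, SAMPLE_RATE_HZ)
peak = spectral_magnitude(wet, 100, SAMPLE_RATE_HZ)
```

Varying `delay_samples` shifts the positions of notches and peaks, and varying `gain` changes their depth; a real room adds many such reflections, so this is only the elementary building block.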


2.2.4 Timing

The present research proposal is presented in essence as an entry project, planned for a period of two years. The first year will mainly be used to build up and test the instrumental setting; in the second year, the systematic studies described under 2.2.3 will be taken up. (Some procedural details follow.)

We expect that in the course of finding the first evidence for timbre constancy, a number of particular questions will come to the fore which we shall approach later in specific studies. It is probable that the investments in apparatus can be used later, although some additional equipment might be necessary. An analysis-synthesis-system, however, is general enough to be also suitable for research into related questions.


2.2.5 Conjectured importance of the proposed research

Although it is generally true that the perceptual dimensions of pitch, loudness, and timbre are related to the physical sound qualities of, respectively, frequency, intensity, and spectral composition, traditional psychophysics in the auditory realm can be judged as at best a coarse approximation or, in principle, as having failed in its original objective. The latter is particularly true for the province of timbre qualities, where a rich variety of perceived qualities (of voices, musical instruments etc.) is probably mappable into rather few dimensions in spite of the fact that they represent an enormously vast or infinite number of possible stimulus configurations (viz. all frequency-specific amplitude changes over time). In parallel to color vision (it is impossible to predict the seen color for any possible light spectrum without prior knowledge of the spectral sensitivity of the cones, of the neural processing at several levels, and of the present state of large parts of the perceptual system!) it might thus be preferable to start the investigation of auditory perception at the hearing phenomena rather than at the stimulus properties, although it is indispensable to keep contact with the relevant stimulus properties at all times. Psychologically (perceptually or phenomenologically) simple entities can correspond to physically complex entities which must eventually be described comprehensively. Thus the task of a "reverse" or holistic psychophysics is posed, i.e. "primary" auditory events, or auditory events tending towards invariance, have to be looked for which can be coordinated to an exactly specifiable variation range of physical parameters within a specific contextual framework.

The classical definition of the problem of perceptual constancies states: how is the perceptual system capable of producing a constant perception in spite of varying stimulus conditions? Usually "constant" is understood to mean an approximation towards "reality", although this is not always the case. We would prefer to define: what are the preconditions in the perceptual system and how do they work so that one and the same ("constant") perception results from a set of stimulus variations (metamerism), and how can such sets of variations be portrayed?

It may then be indispensable to ask together with the constancy question its complement, i.e. the discrimination question: what are the preconditions in the perceptual system and in the stimulus manifold, that is always implied in an actual stimulus configuration, for the possibility of treating two or more things or events differently? Related questions are those for categorization and for order as well as those for recognition and for scaling. On the other hand the ever encumbering distinction between the analytic-distal vs. the "naive"-proximal attitude loses its import, because it is just one of many frame conditions in the perceptual system.

The proposed research into the not yet defined or investigated problem of timbre constancy in space is suitable to serve as an example and vehicle for elaborating such a wider conception of the psychophysical question and thus to deepen our understanding of perception in general. In addition, on the concrete level some indirect contribution towards a better understanding of acoustic architecture can be expected from this project.



1988 Follow-up Proposal

Abstract: Timbre constancy, understood as the systematic yet equivocal relation, i.e. not one-to-one, between physical characteristics of a sound and those qualities of the heard which do not immediately appear as pitch or loudness, is proposed to be empirically investigated for the first time in an attempt at inventorisation and defining the relationship to perceptual concepts such as categorization. The main interest is to clarify the effects of those components of naturally occurring sounds which are added to the signal of a given source by the spatial surrounding and which, according to everyday experience, are mostly missed by the listener, and which, however, under certain circumstances, can also be evaluated for information about the spatial setting. The role of these modifying circumstances, of the sound properties relevant for timbre constancy, and of the resulting auditory dimensions are to be elucidated in several series of experiments which build upon each other.

In the physical realm, the experiments employ an analysis-synthesis-technique on the basis of digitized sounds; sound signals are to be analyzed and modified ("manipulated") in the frequency and in the time domain by means of various algorithmic and graphically supported intuitive procedures with the aim of defining and systematically modifying perceptually relevant sound parameters. For the perceptual side, sound stimuli recorded in concrete settings as well as the above described synthesized sounds are to be presented to listeners in order to obtain comparative perceptual (similarity) judgments which in turn are subjected to multidimensional scaling or factorization procedures in an attempt at revealing the perceptual-cognitive order or internal representation of timbral and spatial characters. The research thus aims at defining timbre constancy, i.e. the specific coordination between the relevant sound properties on the one hand and the perceptual-cognitive auditory organization on the other hand.

This follow-up project proposes to build on the entry project of 1987-89. The equipment now built up has proven to be functional as well as suitable for the perception-psychological objective proposed, although some technical shortcomings have come to the fore which need to be corrected. The proposal will touch in general terms upon some possible improvements of the equipment planned for 1989/90 as well as upon criteria to be considered in a possible reconfiguration of the system; however, details will not be proposed before the respective hardware and software are available on the market. A supplemental report on the results of the first psychological study will be presented in November/December 1988 (see appendix A).

The following research proposal should be read against the background of the project of March 1986 and also in connection with the first interim report of 15.5.88. The general question is the same, viz. the exploratory inventorisation of the so far unresearched phenomenon of timbre constancy, i.e. the equivocal yet systematic coordination of sound and timbre, especially considering the spatially induced sound distortions. The empirical studies aim at elucidating the phenomenon in connection with other general perception problems, in particular the extended psychophysical question (see 2.2.5 of the 1986 proposal); in addition some insights into questions of architectural acoustics are to be expected.


2.2.1 Present state of knowledge, relevant literature

Timbre constancy being a not yet researched topic, it is not surprising that no directly relevant titles could be found in the literature so far. A casual mention of the "missing to hear" aspect of our question in the most recent Annual Review article on auditory psychophysics (Dreschler 1987, p. 188) is symptomatic of the neglect of the perceptual pertinence of timbre constancy: the reviewer speaks of "decoloration" as the "removal of audible effects caused by reflections", and he attributes this removal summarily to binaural hearing. Everyday experience, however, shows this "removal" at least to be selective, in that spatially induced sound distortions are in part also evaluated for information about the spatial setting. Our research project aims at finding out under what conditions the human perceptual system misses which sound qualities, and which other sound qualities it processes into which auditory qualities, timbral and other.

In the psycho-acoustic literature the dominant practice is still to start from a spectro-temporal analysis of sound, although insight is spreading - especially in connection with the equivocal relationship between sound and phoneme in spoken language - that spectro-temporal information is an insufficient, although certainly essential, basis for hearing (see Pisoni 1985 and others). Among the recent psychophysical studies we find two aspects of special interest:

(a) The concept of critical bands is of utmost importance for understanding, at least heuristically, the connection between the different hearing phenomena amongst each other (timbre and pitch, loudness, spatiality etc.), in particular the idea of interactions between different (also remote) bands - in spite of the still prevailing scarcity of facts (de Boer & Dreschler 1987).

(b) In matters of temporal properties of sound, present-day research is almost exclusively concerned with questions of resolution and integration (see Michelson 1985), whilst our interest is directed towards the information contained in temporal intervals and durations between sound events that can be or are related to each other (e.g. reverberations) and which hearing can miss or make use of. Hafter & Buell (1985) were concerned with temporal structures in sound, but in a more limited way.
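The critical-band concept in (a) can be made concrete with the analytical approximation of critical-band rate that Zwicker & Terhardt (1980), cited in the 1986 proposal, published. The following is a minimal sketch of that formula as it is commonly quoted; the function name is an illustrative assumption, and nothing here implies that the project used this particular implementation.

```python
import math


def critical_band_rate_bark(freq_hz):
    """Approximate critical-band rate z (in Bark) for a frequency in Hz.

    Formula as commonly quoted from Zwicker & Terhardt (1980):
        z = 13 * arctan(0.76 * f) + 3.5 * arctan((f / 7.5) ** 2)
    with f in kHz.
    """
    f_khz = freq_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


# Equal steps in Hz are not equal steps on the critical-band-rate scale:
# the step from 100 to 200 Hz spans more Bark than the step from 5000 to 5100 Hz.
low_step = critical_band_rate_bark(200) - critical_band_rate_bark(100)
high_step = critical_band_rate_bark(5100) - critical_band_rate_bark(5000)
```

Re-plotting a spectrum against this scale instead of linear frequency compresses the high-frequency region, which is one reason the scale is regarded as physiologically more meaningful than the plain frequency continuum.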

At this time it is not foreseeable whether or not the concept of profile analysis proposed by Green & Mason (1985) is applicable to reverberating sound, whether or not a relationship with streaming exists (e.g. Bregman & Pinker 1978), and whether or not it can be applied to our type of question.

The literature on architectural acoustics has to our knowledge not revealed new points of view. The physicist and musicologist Dorothea Baumann in Zurich is working on problems similar to ours, although from a practical point of view. She records music in real concert halls under systematic variation of the recording location and has the recordings judged by listeners. However, she employs an exclusively intuitive method, both physically and psychologically; exchange and partial cooperation are under way.

Another very active field is computer music. Equipment and systems for analyzing and synthesizing sound are pouring onto the market in a nearly unsurveyable profusion, mostly in connection with the feasibility and spread of digital processing. As far as we could see, no recent developments seem to be immediately pertinent to our cause, although we hope increasingly to profit from the technical innovations.

Experiences with the analysis-synthesis-technique made in the group of Klaus E. Scherer (Giessen, and now Geneva) are important to us. Although Scherer's field of interest is emotional expression and impression formation in (verbal, paraverbal) communication, his and our procedures have much in common; we have started mutual exchange of knowledge.


2.2.2 Own research

Under this heading reference is first made to 2 finished doctoral dissertations which are not directly pertinent to the project. Furthermore the interim report of 15.5.88 and the request for prolongation of the project of 14.6.88 are mentioned. The text broadly reviews some of the technical problems and accomplishments of the period from April 1987 to September 1988. It suffices here to summarize the most important points, among them our unexpected problems with Hewlett-Packard (see also appendix B).

The present configuration of the system is represented in the schema below:

(a) There were (minor) configuration errors which, after at first misleading suggestions by H-P, could be identified only in March 1988 (due to (b) below).

(b) Software was delivered in usable versions rather late so that some parts of the system could only be tested in June 1988: Basic 5.0 in August 1987, SPAM in January 1988, CAT in March 1988 (software) resp. June (manuals); additional interface card in June 1988.

(c) With the help of a consultant specialized in signal analysis on H-P-Systems (Gremli AG) the system was functional and successfully tested in mid-June 1988.

(d) We digitally recorded ("throughput") 144 sound samples (see appendix A) in the second week of July 1988. It was obvious, from our own listening to the recorded sounds and also from the first physical analyses, that we had collected usable material for our planned procedure. There were sounds with clearly distinguishable spectra which sounded practically the same to a listener. Immediately afterwards we started our first systematic physical and psychological analyses, but then we ran into our next disaster.

(e) Since the 144 one-second sound samples (sampled at 65 kHz) would not fit on the throughput hard disk, we backed up the data on tape and on an additional disk system and copied back to the main disk those files which were to be used for analysis and presentation. Because downloading one sample to the sound production system takes on the order of half a minute, we had to copy the sounds used for presentation to subjects to a (private DAT-)tape deck in the suitable pairwise comparison format.

(f) It then happened that our data gradually became corrupted, as if a computer virus had taken possession of our system. What had happened was that the file copy operation of the (not supported) SPAM software supplied to us by Hewlett-Packard did - without warning on screen or in the manual! - not actually copy or back up the data files but only their headers. So over time we gradually destroyed our data by simply "copying" and "recopying" them!

(g) With the sounds available on the DAT tape we could restore a reasonable selection of items to enable us to proceed with a first cycle of analysis, although the quality loss due to the several successive DA-, AD-, DA-, and AD-conversions between the lab computer and the DAT should not be forgotten. (Preliminary results are given in appendix A.)
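The storage and transfer bottleneck described in (e) can be made concrete with a back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: the word size of 2 bytes (16 bit) per sampled value is not given in the text, "65 kHz" is taken literally as 65,000 values per second, and all constant and function names are illustrative.

```python
N_SOUNDS = 144            # sound samples recorded (from the text)
DURATION_S = 1            # one-second samples (from the text)
SAMPLE_RATE_HZ = 65_000   # "65 kHz" taken literally; exact rate not stated
BYTES_PER_VALUE = 2       # ASSUMPTION: 16-bit words, not stated in the text
DOWNLOAD_S = 30           # "half a minute" per sound, order of magnitude only


def storage_bytes(n_sounds, duration_s, sample_rate_hz, bytes_per_value):
    """Raw size of the sound data, ignoring file headers and file system overhead."""
    return n_sounds * duration_s * sample_rate_hz * bytes_per_value


def total_download_minutes(n_sounds, seconds_per_sound):
    """Time needed to download every sound once to the sound production system."""
    return n_sounds * seconds_per_sound / 60


total_bytes = storage_bytes(N_SOUNDS, DURATION_S, SAMPLE_RATE_HZ, BYTES_PER_VALUE)
total_minutes = total_download_minutes(N_SOUNDS, DOWNLOAD_S)
```

Under these assumptions the raw material amounts to roughly 18.7 million bytes, and a single pass of downloads through all 144 sounds takes about 72 minutes, which makes it plausible that presentation had to be pre-recorded on tape rather than driven directly from the lab computer.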


2.2.3 Research plan and 2.2.4 Timing

For further investigations we will need new data recordings in the reflection-free chamber. At this time (September 1988) it is not possible to give precise details as to the research plan beyond the statements made in the proposal of 1986 (see 2.2.3 above), because such statements should be based on the results of the first studies. Nevertheless, our results so far reasonably confirm the feasibility of our proposed procedure.

The report on our first two studies given in appendix A (December 1988) is presented as a further support of the above statements.

In the course of the 3 years proposed for the follow-up project we should like to summarize the following intermediate objectives and procedures:

1) On the basis of the analyses performed on the remaining sounds and together with new sounds (called starting sounds in the Preliminary Report of December 1988) recorded and analyzed during the winter 1988/89 we shall be able

(a) to describe perceptual dimensions which summarize the results of the perceptual-cognitive analyses, and

(b) to determine by means of some analytical tools in the frequency and time domain (algorithms and graphically supported intuitive procedures) some physical sound features or properties which might be correlated with the perceptual dimensions found.

2) Starting with these findings on the particular (starting) sounds analyzed so far, we shall program procedures which allow us to enhance the salience of such features or properties by operations on the digitized sound signal (purposive sound manipulation).

3) The application of these manipulating procedures to selected (starting) sound signals will lead to the construction of new sound series, about which we are able to formulate hypotheses as to their probable position on the perceptual dimensions of timbre space.

4) These sound series are presented to subjects in the usual pairwise comparison paradigm, and on the basis of their judgments new multidimensional scaling analyses are calculated which allow a quasi-experimental testing of such hypotheses.

As to the (starting) sound material to be used, we plan to construct music-like chord sequences, spoken words and perhaps everyday noises. In any single study aimed at deriving perceptual-cognitive auditory dimensions (timbre space), we typically use subject groups of up to 20 persons, who are presented with some I = 10 to 15 sounds for I²/2 similarity comparison judgments. Multidimensional scaling procedures will typically use INDSCAL algorithms. From a given set of sound recordings it is possible to extract the material for several sets of sounds and to perform several sets of sound manipulations in order to pursue specific topics.
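The size of a pairwise comparison task can be made concrete with a small sketch. It assumes that every unordered pair of distinct sounds is judged exactly once, which gives I(I-1)/2 pairs, slightly fewer than the I²/2 named above; whether identical pairs or both presentation orders were included in the actual studies is not stated here. The function name `comparison_pairs` and the fixed random seed are illustrative only.

```python
from itertools import combinations
import random


def comparison_pairs(n_sounds, seed=0):
    """Return every unordered pair of distinct sound indices in shuffled order.

    Assumption: each pair is judged once, so there are n(n-1)/2 pairs.
    """
    pairs = list(combinations(range(n_sounds), 2))
    rng = random.Random(seed)  # fixed seed so the order is reproducible
    rng.shuffle(pairs)         # randomize the presentation order
    return pairs


for n in (10, 15):
    pairs = comparison_pairs(n)
    print(n, "sounds:", len(pairs), "pairs; I^2/2 =", n * n / 2)
```

Under this assumption, 10 sounds give 45 pairs and 15 sounds give 105 pairs per subject, which is the order of "some hundred sound pairs" per presentation tape.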

It will be desirable to run through the cycle indicated in points 1 to 4 above 2 to 3 times, and to make use of the substantive results of each previous cycle. The details of each study are planned in view of the overall aim of taking inventory of the phenomenon of timbre constancy, with particular emphasis on hearing and missing of spatially induced sound distortions. Whether it is already possible to study explicitly the role of binaural parameters in timbre constancy (see 2.2.3, point 7 of the 1986 proposal), or whether it is advisable to defer this topic to later studies, will depend on the progress of the studies.

It is intended to report regularly to expert conventions and in the specialized literature as soon as valuable results are available.


2.2.5 Conjectured importance

Enough has been said about the significance of the planned research in the 1986 proposal. However, it is perhaps desirable to make explicit some thinking that may already have become too self-evident to the applicant.

The concept of perceptual constancy, in the second third of this century, has been only weakly influenced by the Gestalt theory tradition (esp. Koffka); it has been determined much more strongly, if not almost exclusively, by functionalist traditions (esp. Brunswik, and then Gibson and his followers). In my understanding, the concept in this second tradition has been prejudiced by the veridicality issue, i.e. the concept of attainment of reality or the achievement component which lies in the often implicit assumption that perception is exclusively determined by the real world out there. Since one can know nothing about the real world out there without perception, this constancy concept is circular in principle. A particular theoretical point of view already enters into the description of the phenomenon. We want to break that vicious circle by conceiving perceptual constancy simply as a clustering of and a coordination between stimulus properties and phenomenal qualities. (Actually Shepard (1981) has argued along very similar lines.) In our opinion timbre constancy is an especially good example for that purpose, because the pertinent stimulus properties are too complex, and too different in kind, to nourish any expectation that perceived timbre is some approximation to "timbral" properties that could be thought of as given in the sound itself and measurable as such. Of course, the same question must also be put with reference to size constancy and all other constancies for that matter: in what respect is the seen size of a man or of the moon an approximation to their respective real size?


2.2.6 Technical shortcomings and their eventual elimination

This section mentions 3 shortcomings of the present configuration of our analysis-synthesis-system which need correction, partly because of possible severe data degradation, partly for reasons of economy in data collection and handling. Possible solutions are described, but at the moment any decision is deemed premature because of the quick changes in the marketplace.

1) File management. The problems with the file copy operation should be solved with the help of Hewlett-Packard (see appendix B). Our provisional solution using cascaded analog-to-digital-to-analog conversions is not acceptable in the long run for reasons of data degradation.

2) Digital resolution of the sounds recorded is presently at 12 bits. Our experience demonstrates that this is not sufficient, in that glitches are audible in the output signal which are interpreted to be sampling artifacts. Upgrading the input- and output-systems to at least 16 bits resolution is imperative also for reasons of reducing possible artifacts produced by rounding operations.

3) The output-system used (HP-Multiprogrammer) leads to very long download times (of the order of half a minute), so that it is impossible to present sounds to subjects directly from their digital form for the pairwise presentation. Again we have to admit the data degradation connected with DA-AD-conversion; in addition, the time spent in producing the tapes with some hundred sound pairs is wasted time that costs much more than suitable new equipment.
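As a rough illustration of what the step from 12 to 16 bits in point 2 buys, the following sketch quantizes a near-full-scale sine tone with an idealized uniform quantizer and compares the measured signal-to-noise ratio with the textbook rule of about 6.02 dB per bit plus 1.76 dB. The quantizer model, the test tone and the function names are assumptions made for this illustration; they do not describe the HP converters, and the sketch does not by itself explain the audible glitches mentioned above.

```python
import math


def quantize(x, bits):
    """Round x (roughly within [-1, 1)) to the nearest level of a uniform quantizer."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step


def measured_snr_db(bits, n=48000, freq=997.0, rate=48000.0):
    """Signal-to-quantization-noise ratio in dB for a near-full-scale sine tone."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = 0.999 * math.sin(2.0 * math.pi * freq * i / rate)
        err = quantize(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)


def ideal_snr_db(bits):
    """Textbook approximation for a full-scale sine and an ideal quantizer."""
    return 6.02 * bits + 1.76


for bits in (12, 16):
    print(bits, "bits: measured", round(measured_snr_db(bits), 1),
          "dB, rule of thumb", round(ideal_snr_db(bits), 1), "dB")
```

By the rule of thumb, 12 bits correspond to about 74 dB and 16 bits to about 98 dB, a gain of roughly 24 dB for a single ideal conversion. Each further conversion in a cascade adds its own error on top.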

At this time (September 1988) we see 3 possible solutions:

a) Since the Hewlett-Packard system is of very high quality as concerns the input-subsystem (Paragon) and of reasonably good quality in the central unit (somewhat aged and involved operating system and programming language, expensive mass storage), and since we already have considerable experience with this system, it might be advisable to upgrade the HP-system to 16 bits and to improve on the output subsystem. At this time nothing is known about the availability of 16 bit conversion cards (see Appendix B).

b) Change to a similar system on DOS- or Macintosh-PC basis. 16 bit conversion and processing cards are appearing on the market at the present time; however, it is not yet clear whether software of the same quality as the one used now will also be available in the near future. In addition we are afraid that the input conversion (filtering, anti-aliasing) will not be of the same quality as in the HP-system; this will perhaps matter less once cards with much higher conversion rates become feasible.

c) Shift all DA- and AD-conversion onto commercially available technology used in the CD- and DAT-market. This solution is only feasible when an interface card to a usable computer system becomes available. Such a card is presently announced for the Macintosh NuBus. This technique would solve the input-quality problem present in solution (b) above; however, the software problem of solution (b) would still remain open.

The conclusion at this time is to proceed for some time with the present system, adding DAT-input and output according to the schema on page 6 above, and allowing for the data degradation produced by the several AD- and DA-conversion steps involved. As soon as it becomes possible to make a rational decision on the long-term solutions sketched above, or on some additional solution not yet known, we should be able to improve our system in order to prevent reproaches as to the possibility of artifactual properties in our data which might eventually hide or, worse, fake the sought-after coordination between sound and percept.



(1988 references have year of publication in parentheses after author name)

Andres, K.: Stand in der Erforschung des Absoluten Gehörs - die Funktion eines Langzeitgedächtnisses für Tonhöhen in der Musik. Dissertation, Universität Bern, 1985.

Baumann, D. (1987): Musizieren im Raum - live. Pp. 47-59 in: Baumann, D. & Jecklin, Jürg (Eds.): Mono - Stereo - Quadro: die Aufnahme und Wiedergabe von Musik. Basel, Reinhardt.

Bischof, N.: Psychophysik der Raumwahrnehmung. In Metzger, W. & Erke, H. (Hrsg.): Handbuch der Psychologie. Göttingen: Hogrefe, 1966, 1. Halbband S.307-395.

Bregman, A.S. & Pinker, S.: Auditory Streaming and the Building of Timbre. Canadian Journal of Psychology, 1978, 32(1), 19-31.

Calmonte, Roland (1987): Lautheitskonstanz: ein ganzheitlicher Erklärungsversuch audiovisueller Wahrnehmungsphänomene. Doktordissertation, phil.-hist. Fakultät, Universität Bern, 1987.

Deutsch, W.A.: Musik und Computer. In Bruhn, H., Oerter, R. & Roesing, H. (Hrsg.): Musikpsychologie. München: Urban & Schwarzenberg, 1985.

de Boer E. & Dreschler, W.A. (1987): Auditory psychophysics: spectrotemporal representation of signals. Ann.Rev.Psychol. 38 181-202.

de Bruijn, A.: Timbre-Classification of Complex Tones. Acustica, 1978, 40, 108-114.

Edwards, R.M.: A Subjective Assessment of Concert Hall Acoustics. Acustica, 1974, 30, 183-195.

Epstein, W. (Hrsg.): Stability and Constancy in Visual Perception. New York: Wiley, 1977.

Goldbeck, T.P.; Standke, R. & Scherer, K.R. (1988): Techniken der digitalen Signalverarbeitung in der vokalen Kommunikationsforschung. Psychol.Rundschau 39 191-200.

Green, D.M. & Mason, C. R. (1985): Auditory profile analysis: frequency, phase, and Weber's law. J.Acoust.Soc.Amer. 77 1155-1161.

Grey, J.M. & Moorer, J.A.: Perceptual evaluations of synthesized musical instrument tones. Journal of the Acoustical Society of America, 1977, 62(2), 454-462.

Grey, J.M.: Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 1977, 61(5), 1270-1277.

Hafter, E.R. & Buell, T.N. (1985): The importance of transients for maintaining the separation of signals in space. In Posner & Martin (Eds.): Attention and Performance. Vol. 11, 337-354.

Hawkes, R.J. & Douglas, H.: Subjective Acoustic Experience in Concert Auditoria. Acustica, 1971, 24, 236-250.

Howell, P., Cross, I. & West R. (Hrsg.): Musical structure and cognition. London: Academic Press, 1985.

Hurni-Schlegel, L. & Lang, A.: Verteilung, Korrelate und Veränderbarkeit der Tonhöhen-Identifikation (sog. absolutes Musikgehör). Schweizerische Zeitschrift für Psychologie, 1978, 37(4), 265-292.

Hurni-Schlegel, L.: Das Absolute Musikgehör: Analyse von Gehörstestdaten, Bestandesaufnahme bei Absoluthörern, Lernversuch zum Singen, Stimmen und Hören. Dissertation, Universität Bern, 1983.

Mathews, M.V.: The technology of computer music. Cambridge: MIT Press, 1969.

Michelson, A. (Ed., 1985): Time resolution in auditory systems. Berlin, Springer.

Mohrmann, K.: Lautheitskonstanz im Entfernungswechsel. Zeitschrift f. Psychologie, 1939, 145, 146-199.

Moore, B.C.J.: Psychology of hearing. London: Academic Press, 1982.

Patterson, B.: Musical dynamics. Scientific American, 1974, 231, 78-95.

Pisoni, D.B. (1985): Speech perception: some new directions in research and theory. J.Acoust.Soc.Amer. 78 381-388.

Plomp, R.: Aspects of tone sensation. London: Academic Press, 1976.

Plomp, R.: Timbre as a multidimensional attribute of complex tones. In Plomp, R. & Smoorenburg, G.F. (Hrsg.): Frequency analysis and periodicity detection in hearing. Leiden: Sijthoff, 1970.

Rasch, R.A. & Plomp, R.: The Listener and the Acoustic Environment. In Deutsch, D. (Hrsg.): The psychology of music. London: Academic Press, 1983.

Reichardt, W. & Lehmann, U.: Raumeindruck als Oberbegriff von Räumlichkeit und Halligkeit, Erläuterungen des Raumeindruckmasses R. Acustica, 1978, 40, 277-290.

Risset, J.C. & Wessel, D.L.: Exploration of Timbre by Analysis and Synthesis. In Deutsch, D. (Hrsg.): The psychology of music. London: Academic Press, 1982.

Rytz, W.: Zum Konstanzprinzip in der auditiven Wahrnehmung. Dissertation, Universität Bern, 1977.

Schouten, J.F.: The perception of timbre. In: Reports 6th internat.congr. Acoustics, Tokio, Japan, 1968, Vol.1.

Schroeder, M.R., Gottlob, D. & Siebrasse, K.F.: Comparative Study of European concert halls: correlation of subjective with geometric and acoustic parameters. Journal of the Acoustical Society of America, 1974, 56(4), 1195-1201.

Schroeder, M.R.: Music perception in concert halls. Stockholm: Royal Swedish Academy of Music, 1979.

Shepard, R.N.: Psychophysical complementarity. In: Kubovy, M. & Pomerantz, J.R. (Eds.): Perceptual organization. Hillsdale NJ, Erlbaum, 1981, 279-341.

Shigenaga, S.: The Constancy of Loudness and of Acoustic Distance. In Akishige, Y.: Experimental researches on the structure of the perceptual space. V., Bull. Fac. Lit. Kyushu Univ., 1965, 289-333.

Stadler, Stefanie (1988): Eine entwicklungspsychologische Untersuchung zum Erwerb des Tonsystems bei Kindern zwischen 4 bis 9 Jahren in ihren Vokalisationen. Doktordissertation, phil.-hist. Fakultät, Universität Bern, 1988.

Steeneken, H.J.M. & Houtgast, T.: A physical method for measuring speech-transmission quality. Journal of the Acoustical Society of America, 1980, 67, 318-326.

Tautenhahn, B.: Untersuchung zur Klangfarbenabhängigkeit der Tonhöhenbestimmung bei Personen mit absolutem Gehör. Schweizerische Zeitschrift für Psychologie, 1976, 32(2), 85-98 (based on Masters Thesis Univ. of Bern).

Terhardt, E., Stoll, G. & Seewann, M.: Algorithm for extraction of pitch and pitch salience from complex tonal signals. Journal of the Acoustical Society of America, 1982, 71, 679-688.

Tosi, O., Oyer, H., Lashbrook, W., Pedrey, C., Nicol, J. & Nash, E.: Experiments on Voice Identification. Journal of the Acoustical Society of America, 1972, 51(6), 2030-2043.

Wedin, L. & Goude, G.: Dimension analysis of the perception of instrumental timbre. Scandinavian Journal of Psychology, 1972, 13, 228-240.

Wessel, D.L.: Timbre Space as a Musical Control Structure. Computer Music Journal, 1979, 3(2), 45-52.

Wilkens, H.: Mehrdimensionale Beschreibung subjektiver Beurteilungen der Akustik von Konzertsälen. Acustica, 1977, 38, 10-23.

Zwicker, E. & Terhardt, E.: Analytic expressions for critical-band rate and critical bandwidth as a function of frequency. Journal of the Acoustical Society of America, 1980, 68, 1523-1525.


Appendix (separate file):

On Timbre Constancy in Space: preliminary report on the first two studies, by Alfred Lang, Roland Calmonte, Heinrich Zimmermann (December 1988).
