Voices off: Why do we all hate our voice on WhatsApp?

In voice messages we don't sound like ourselves, which is very annoying. A professor of hearing, experts in sound files, a senior psychoanalyst, and a skilled narrator explain why this happens, and what can be done about it

Adva Kiselstein | 10:18, 28.08.22

Most of us are not used to hearing our voice recorded. If we are not presenters or announcers, singers, actors, or lecturers who often record their lectures, we do not hear ourselves. And when we do get to hear ourselves, it's strange. It was weird when we played with the tape recorder's microphone as kids, it was weird when we recorded our first voice mail message, and it's weird now when we send voice messages on WhatsApp.

Voice messages are pretty annoying, but we're all guilty of sending them because sometimes there's no choice. And then our voice doesn't sound like us. Even in other recordings - in video, for example - we don't hear ourselves the way we hear ourselves inside our heads, but in WhatsApp messages the gap is particularly large, and the feeling of foreignness that our voice evokes in us is stronger.

Mine, for example, comes out really squeaky. On WhatsApp it’s even worse. On other recordings, my voice sounds higher to me than how I hear it, but not to an extreme, and I live with it in peace. In high school, for example, I recorded the summaries for my matriculation exams on a tape recorder and listened to the recordings before the tests. It was a little weird, but not disconcerting. And now, on WhatsApp, I feel really embarrassed listening to my own voice. It's me, but it's not really me. At least not someone that "I" would like to be.

Amir Ascher is a narrator (voice over artist) who encounters this dilemma again and again. "I conduct a workshop that teaches the correct use of the voice, for example to teachers and others who suffer from sore throats or whose voice gets tired at the end of the day because they don't use it correctly," he says. "The first slide in the workshop says 'Why do we sound strange to ourselves?', and my first question to the participants is how many of them sound strange to themselves in their WhatsApp voice messages. 70%-80% of the participants in each workshop say they do, some really shyly." He has a surprising solution to the matter, but before we get to it, it's worth understanding why it happens in the first place: why do we sound so unlike ourselves, and why does it unsettle us?

First of all, because even before the recording we hear ourselves differently from how anyone else hears us "from the outside". Prof. Yael Henkin, one of the leading hearing experts in Israel, directs the focus to our skull.

"When I speak, there are two paths by which I hear myself, from the outside and from the inside," says Henkin, director of the Hearing, Language and Speech Institute and the Communication Disorders Service at Sheba Medical Center, Tel Hashomer, and a senior faculty member in the Department of Communication Disorders at the Faculty of Medicine at Tel Aviv University. "The first path is that of sound traveling through the air, as we hear anything else: the sound waves enter our auditory canal and vibrate the eardrum, which in turn vibrates three tiny bones that are behind it - the smallest bones in the human body. The last bone sits at the opening of the cochlea, and its vibration vibrates the fluid in the cochlea, which in turn stimulates the hair cells. These are small sensors inside the cochlea, which move and bend and activate the auditory nerve. This nerve, which consists of 30,000 fibers, transmits the information to the brain - and the brain decodes the signals."

Lots of motion, lots of work.

"That's right, these are signals that start as sound waves in the air and at the end of the process reach the nerve and the brain, acoustic energy that turns into mechanical energy that turns into electrical energy transmitted by the nerve."

And how is the skull connected to this?

"Because we hear ourselves not only through air conduction but also through bone conduction. When my vocal cords vibrate, they also vibrate the bones of the skull - and this vibrates the fluid inside the cochlea. This way our voice also reaches the cochlea in a direct route, skipping the air, the outer ear and the middle ear.”

Doesn't it confuse the cochlea to hear us both inside and outside?

"The liquid receives the vibration from two places at the same time, and it's not confusing, everything integrates. This is the genius of the hearing system, it's an amazing system. Things happen in units of micro and milliseconds, and everything ticks, everything works."

If the voices come together, why is there a difference between hearing ourselves from outside and hearing ourselves inside?

"Because the information is different. In air conduction there is a wide range of frequencies - low, medium and high. Bone has different conduction properties than air, so we hear more low frequencies and receive less information at high frequencies.

"As a result, when I listen to myself through both air conduction and bone conduction, my voice sounds lower, whereas when I hear myself in a recording, on TV or on WhatsApp, I hear myself only through air conduction, I receive less information at low frequencies and my voice sounds higher."
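Henkin's two-path explanation can be illustrated with a toy calculation. The sketch below is not a model of real hearing: the frequency bands and the relative levels assigned to each path are invented for illustration, assuming only what she describes, namely that air conduction carries a wide range of frequencies while bone conduction favors the low ones.

```python
def weighted_mean_frequency(levels):
    """Level-weighted mean frequency of a {frequency_hz: level} mapping."""
    total = sum(levels.values())
    return sum(freq * level for freq, level in levels.items()) / total


def combine(path_a, path_b):
    """Add the levels of two paths band by band."""
    return {freq: path_a[freq] + path_b[freq] for freq in path_a}


# Hypothetical relative levels per band (assumed values, not measurements).
# Air conduction: a wide, even range of frequencies.
air_path = {125: 1.0, 250: 1.0, 500: 1.0, 1000: 1.0, 2000: 1.0, 4000: 1.0}
# Bone conduction: more low frequencies, less information at high ones.
bone_path = {125: 1.0, 250: 0.8, 500: 0.5, 1000: 0.2, 2000: 0.1, 4000: 0.0}

# While speaking we receive both paths; a recording gives us air only.
speaking_mean = weighted_mean_frequency(combine(air_path, bone_path))
recording_mean = weighted_mean_frequency(air_path)

print(f"while speaking: {speaking_mean:.0f} Hz")
print(f"in a recording: {recording_mean:.0f} Hz")
```

With these assumed numbers the "center of gravity" of the sound sits lower while speaking than in the recording, which matches the direction of the effect she describes.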

And yes, for Henkin herself it is also strange to hear recordings of her voice. It is strange for everyone, she says. "Even when an opera singer hears herself at a concert and then hears a recording of the concert, even if it is of the best quality - it will sound different to her."

The technology: the smartphone likes the high ones

Most of us are not opera singers, and we don't get to hear ourselves on good recordings. So from here everything gets worse and worse. After the gap between the conduction of bone and the conduction of air comes a more severe matter - the fault of technology. "Even in high-quality recordings, I'm still surprised to hear my voice," admits Dr. Felix Flomen, an expert in technological speech processing, and his colleague Yossi Zeda says: "I'm not a child, I'm 60 years old, and I understand all the technology behind our voice recordings — and I still don't want to hear myself recorded, it annoys me."

Flomen is the CTO of Media and Zeda is the VP of management solutions in communication systems at the long-standing public company Audiocodes, which develops products for the transmission of voice and data in communication networks and is traded at a value of approximately $730 million. With decades of cumulative experience in handling sound files, such as those that go through WhatsApp, Flomen and Zeda know how to explain how many tractors run over our voice on the way to the ear that hears the message. "In studio recordings with quality equipment, for example, there should be no distortion of the voice," Flomen says. "The greater the bandwidth of the microphones, the more accurate they are, and as the bandwidth decreases, the further away you are from the source - that is, in cheap devices, both the transmission of the speech and the audio are less good, and the voice is distorted."

Then you have to transfer this voice, which no longer sounds like our voice anyway, as a file. And this file needs to go very quickly, over a basic communication network, with limited bandwidth, sometimes via WiFi. A lot of demands, and to meet them you have to keep sacrificing quality. Just as a picture sent on WhatsApp arrives at a much lower resolution than the original, so the quality of the sound file also decreases - it goes through compressors that shrink and distort it. The compressors have no choice, because along the way they have to overcome quite a few difficulties: loss of information units, delays, changes in bandwidth that can occur right during the recording. A lot of technology is trying to deliver these annoying voice messages to us, and we still have complaints.

"The compressors aim to convey speech," explains Zeda, "not Puccini's aria," - it turns out that operas are a necessary point of reference for discussing the issue - "therefore in advance they do not convey all the frequencies, they convey our voice in a limited frequency range."

Is this narrowing of the frequency range what makes me sound squeaky?

"Yes, the recording you do on WhatsApp is a fast file that is compressed more, and the more you compress, the more frequencies you give up - and everything sounds more squeaky."

Reducing the frequency range eliminates both low and high frequencies, but the high frequencies are still more dominant - because of the device itself. "A smartphone speaker can't play low frequencies, simply because it's small," Flomen says, citing as an example how large the speakers that play bass at concerts need to be. "Our speech begins in the 50 Hz range, but even if the microphone in the device were to pick up from 50 Hz - in the loudspeaker you might hear from 150 Hz. So some of the low frequencies in the speech are lost, and thus the high frequencies are more emphasized."
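The effect of a small speaker that only starts playing at about 150 Hz can be sketched in a few lines. This is a rough illustration, not a description of how any phone or codec actually works: the voice is an invented series of harmonics on a hypothetical 100 Hz fundamental, and only the 150 Hz cutoff comes from the interview.

```python
def spectral_centroid(components):
    """Amplitude-weighted mean frequency of (frequency_hz, amplitude) pairs."""
    total = sum(amplitude for _, amplitude in components)
    return sum(freq * amplitude for freq, amplitude in components) / total


def drop_below(components, cutoff_hz):
    """Keep only the components at or above the cutoff frequency."""
    return [(freq, amp) for freq, amp in components if freq >= cutoff_hz]


# Hypothetical voice: 30 harmonics of a 100 Hz fundamental, each one
# weaker than the last (amplitude 1/n). These values are assumptions.
FUNDAMENTAL_HZ = 100.0
voice = [(FUNDAMENTAL_HZ * n, 1.0 / n) for n in range(1, 31)]

# Cutoff taken from the interview: the loudspeaker plays from about 150 Hz.
SPEAKER_CUTOFF_HZ = 150.0
played = drop_below(voice, SPEAKER_CUTOFF_HZ)

original_centroid = spectral_centroid(voice)
played_centroid = spectral_centroid(played)

print(f"components lost: {len(voice) - len(played)}")
print(f"centroid before: {original_centroid:.0f} Hz")
print(f"centroid after:  {played_centroid:.0f} Hz")
```

In this toy voice only the lowest component is lost, yet the centroid still moves upward, because nothing is left to balance the higher harmonics.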

This does not happen when we listen to music through the smartphone.

"It's true, because music goes through a different path than that of a phone call or voice messages. The processing of the music on the device is different, and tuned in advance for better quality. It sounds better than our voice."

Low tones that disappear and high tones that grow stronger play a significant role in creating the psychological effect we experience when we face our voice messages. And this, of course, begins with birth, if not before.

"Sound is something very strong, which is associated with early childhood and even earlier with infancy," says Dr. Shlomit Yadlin-Gadot, psychoanalyst and teacher at the Tel Aviv Institute for Contemporary Psychoanalysis and in the psychotherapy program at Tel Aviv University, who, among other things, examines the place of the body and its experiences in our emotional system. "Babies don't have a semantic dimension, so they remember the sensory things first: the smell, the touch - and the sound. They connect to the dimension of the sound, not the words, and babies are put to sleep with songs, with the noises, and the low tones are the ones that calm us. It can be thought that it comes from the womb - the low frequencies are the ones that are heard underwater, and are linked to the sounds of the mother's body and to some kind of relaxation."

Then we hear messages on WhatsApp and are flooded precisely with the high frequencies.

"True, and when this happens, our voice is alien to ourselves. In general, when we speak we focus on the words we say, on the semantic content, and we are less aware of the sound we make, the rhythm or the intonation. When we hear our voice in recordings, in messages for example, the opposite happens: suddenly we don't pay attention to the words, and focus on the sound. It's not under our control, and it hits us from the outside. We move into the material dimension, and it throws us into the early stages, where the sound had a strong effect on us—Lacan talks about the sound actually being registered in the body—and when we're thrown there, we prepare to meet the low tones that calm us, that calmed us back then, and instead we meet the higher ones, which are more shrill, which are experienced as penetrating the body's boundaries."

And that's only part of the problem. The small part of the problem, actually. To understand the big part, we need Freud and one of his basic concepts, the "uncanny".

"Freud refers to phenomena that arouse in us a certain and unique type of anxiety. These are situations where on the one hand there is something very familiar to us, even if we don't know what, and at the same time there is an element in it that we experience as completely foreign," explains Yadlin-Gadot. "This mix of strangeness and familiarity results in panic and apprehension, which are different from normal anxieties and fears. This is the feeling we can have if, for example, we enter our bedroom and discover that someone has changed the position of the bed. We enter the most familiar place, but it is different. At first you don't perceive what has changed, only a closed effect of foreignness is created in the most familiar place, and it causes anxiety. And this is the feeling that also arises in other cases that mix past and present, the inside and the outside, reality and fantasy."

And why is this triggered by the voice messages on WhatsApp?

"All the voices you hear come from outside. You never really hear your own voice, because when you use your voice you concentrate on producing it, not hearing it. So you know that a voice always comes from the outside, you place it in agents that are external to you. That is, something that is always internal to you is now external. You are split from yourself, and immediately an experience of foreignness from yourself is created."

Like the moments when we don't recognize ourselves in the mirror or in photos.

"True, but we see visual reflections all the time, in mirrors, in shop windows, in lakes. We do not encounter reflections of our own voice, on the other hand. If we encounter a visual reflection where we do not expect it, it is very possible that we will not recognize ourselves, and when we understand that it is us - we will be startled. In a recording of our sound, since we are not used to encountering it, it happens every time, it is always experienced as if we are external and foreign to ourselves, and it is always startling."

Why is this so frightening?

"Because the most familiar thing, me, has become a stranger. The self is no longer me, and I am split from myself.”

Can't this split work in our favor? Can our voice in the recording sound better to us than how we hear ourselves?

"It can, but it's still a split, and a destabilizing split. And there's also the question of in how many cases the recorded voice will actually sound better to us. After all, even when you take a selfie, you delete 17 photos you didn't like and are left with one."

And there is one more point that Yadlin-Gadot dwells on (yes, she doesn't like hearing herself recorded either, "The Zoom lectures were a shocking experience, I looked and listened and said to myself: 'Who is this?'"). And this point is that sound is something almost violent. "There is a very domineering element in the sound - you can't block it, it's not like closing your eyes. You can't run away from it. And suddenly this outside voice, of the other, that you can't escape from, that appears as a sound and not as words, is our voice. Of course it's unsettling."

The solution: make Hebrew a primary language

Back to Amir Ascher, the only interviewee in the article for whom it is not strange to hear his own voice messages. "I'm so used to hearing myself, I know my voice before and after a technician's work, with distortions and without, so I don't have a problem with the voice messages," he says. "But when I first arrived at Army Radio (Galei Tzahal) I didn't want to be an announcer at all but a music editor, and when I was assigned to the announcers department I was surrounded by people with voices that made me ashamed of my own. So I worked on it a lot, and for years. I worked with speech therapists, with teachers for voice development and singing, and with the help of countless exercises. To this day, I do voice exercises every morning."

So, you’re saying there is hope here? You believe we can actually change the way we sound?

"Yes, and easily and quickly."

Which, as it turns out, Ascher helps people with. In addition to his work as an announcer (among other things, he also presents some of the recordings of Calcalist’s supplement articles), he is also the owner of On Air, a school for narration, radio and podcasts, and in various courses and workshops he teaches not only how to use the voice correctly, but also how to play with it more.

What does it mean? What do you teach?

"First of all, I tell people to come to terms with how they sound. This is who you are, stop being ashamed. But you can also work on it, with voice development and pronunciation exercises. Hebrew is a problematic language for pronunciation, it is difficult to produce, it is a 'backward', guttural language (where the tendency is to produce the words from the throat). But when you learn proper voice production, you learn to produce Hebrew from the front of the mouth. As soon as we switch to a frontal voice, we immediately produce it with less effort, and it sounds more pleasant, softer, what is called 'radiophonic'. And when we invest less effort in its production - you can play with it more, tune it to be more pleasant. It's very easy to learn to switch from a back voice to a front voice, thus changing the way we sound."

But it still doesn't change the pitch. It still won't give people with squeaky voices a very sexy bass sound.

"There is the physiological framework with which we were born, but it is possible to expand it. Voice development exercises can work on our vocal cords so that we can produce, relatively easily, higher or lower sounds, and so that we feel more comfortable playing with the voice in a way that sounds natural. This combination - of playing with the pitch of the voice, and front pronunciation without effort, along with recognizing that this is who I am and this is my voice - can change the way we sound to others, and also to ourselves."

So there is a chance, if we choose to work on it. And we can, of course, choose to avoid it, stop leaving voice messages, and continue living in a dream about ourselves and about how others perceive us.
