One of the biggest issues with voice call quality testing is the availability of an objective metric to measure the performance of lines and call quality.  When holding suppliers to an SLA, it is imperative to provide an objective score of the line quality to resolve issues, rather than just saying the quality of the calls is bad.

To overcome this, many companies utilize the Mean Opinion Score (MOS).  However, while MOS scores have their place, when it comes to voice call testing they tend to be more subjective and can be misleading. Spearline recommends that the objective Perceptual Evaluation of Speech Quality (PESQ) score, or its successor, the Perceptual Objective Listening Quality Analysis (POLQA) score, be used instead.

To highlight the differences between the options available, and their respective strengths and weaknesses, it’s important to understand how each is calculated.  This in turn will give you a more objective understanding of which score is more suitable for your individual needs. Each of these VoIP quality scores is covered in detail below.

How is MOS measured?

The MOS provides a score between 1.0 and 5.0 that indicates the perceived quality of a voice or video session by a human user after it is transmitted and compressed using relevant codecs.

Originally, MOS scores were provided by expert human listeners who would rate the quality they actually heard.  This was costly and time-consuming, especially after the International Telecommunication Union (ITU) introduced the ITU-T P.800 methods.  ITU-T P.800 specified certain requirements, such as:

  • The talker should sit in a quiet room with:
    • a volume between 30 and 120 m³; and
    • a reverberation time of less than 500 ms (preferably in the range 200–300 ms).
  • The room noise level must be below 30 dBA, with no dominant peaks in the spectrum.

MOS was also calculated using surveys from tests obtained from human subjects watching a video or listening to audio samples.

Today, however, the MOS used in call number testing is most often calculated using the ‘objective measurement method’. This method monitors certain factors, uses them to approximate the quality of the experience for the human user, and creates a ranking based on this approximation.

What factors are monitored when using the objective measurement method to calculate MOS scores for voice call testing?

Voice quality is affected by many factors which all need to be measured and rated.  The following network parameters are the ones most commonly used to calculate MOS scores in voice call quality testing:

  • bandwidth;
  • loss of packets;
  • latency; and
  • jitter.

Instead of being focused on the audio characteristics, MOS scores mainly consider factors that are network-related.
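As a rough illustration of how network measurements can be folded into a single rating (the R factor discussed in the next section), here is a minimal Python sketch. It uses a simplified heuristic that circulates in network-monitoring write-ups, not the full ITU-T E-model. The function name `estimate_r_factor` and every constant in it are assumptions made for illustration and do not come from this article; bandwidth is not modelled at all.

```python
def estimate_r_factor(latency_ms: float, jitter_ms: float, packet_loss_pct: float) -> float:
    """Rough R-factor estimate from network measurements (illustrative heuristic only)."""
    # Assumed weighting: jitter counts double, plus a nominal 10 ms of processing delay.
    effective_latency = latency_ms + 2.0 * jitter_ms + 10.0

    # Assumed penalty curve: delay hurts mildly below 160 ms and more steeply above it.
    if effective_latency < 160.0:
        r = 93.2 - effective_latency / 40.0
    else:
        r = 93.2 - (effective_latency - 120.0) / 10.0

    # Assumed penalty of 2.5 rating points per percent of packet loss.
    r -= 2.5 * packet_loss_pct

    # Keep the result on the 0-100 rating scale.
    return max(0.0, min(100.0, r))


if __name__ == "__main__":
    print(estimate_r_factor(latency_ms=20, jitter_ms=5, packet_loss_pct=0))
    print(estimate_r_factor(latency_ms=200, jitter_ms=30, packet_loss_pct=2))
```

Note that nothing in this calculation ever listens to the audio itself, which is the limitation this section describes.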

What formula is used to calculate MOS scores for call number testing?

The ratings obtained for the individual elements tracked are combined to produce an R factor, which is then used in a MOS formula such as the one below:

MOS = 1 + 0.035 * R + 0.000007 * R * (R - 60) * (100 - R)
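The formula above can be sketched in a few lines of Python. The function name `r_to_mos` is an assumption for illustration. The handling of R values at or beyond the ends of the 0 to 100 range (returning 1.0 and 4.5) follows the usual convention for this conversion, and the final clamp is a choice made here because the cubic dips marginally below 1 for very small R.

```python
def r_to_mos(r: float) -> float:
    """Convert an R factor to an estimated MOS using the formula above."""
    if r <= 0:
        return 1.0   # bottom of the scale
    if r >= 100:
        return 4.5   # the formula evaluates to 4.5 at R = 100
    mos = 1.0 + 0.035 * r + 0.000007 * r * (r - 60.0) * (100.0 - r)
    # For R below roughly 6.5 the cubic falls slightly under 1, so clamp it.
    return max(1.0, mos)


if __name__ == "__main__":
    for r in (50, 70, 80, 93.2):
        print(r, round(r_to_mos(r), 2))
```

For example, an R factor of 93.2 gives a MOS of about 4.41, and an R factor of 50 gives about 2.58.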

What is a good MOS score for voice call testing?

MOS is calculated on a scale between 1 and 5, with 1 being poor quality and 5 being considered as clear as if you were communicating face-to-face.

The following guideline is commonly used to interpret a VoIP MOS score and give an indication of the voice quality of a VoIP line.

Rating                                      MOS
Maximum for G.711 (non-compressed) codec    4.4
Very satisfied                              4.3-5.0
Some users satisfied                        3.6-4.0
Many users dissatisfied                     3.1-3.6
Nearly all users dissatisfied               2.6-3.1
Not recommended                             1.0-2.6

How Is PESQ measured?

Adopted in 2001, PESQ is an International Telecommunication Union (ITU) industry standard designed to provide an objective score of audio quality. 

It is standardized under ITU-T P.862 as ‘an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs’.  This standard has been described as the MOS adapted specifically for VoIP.

What factors are measured to provide PESQ scores for voice call quality testing?

As previously mentioned, audio quality is affected by many factors, which all need to be measured and rated individually.  The following characteristics are the ones most commonly taken into consideration in PESQ scores:

  • Audio sharpness.
  • Call volume.
  • Background noise.
  • Variable latency or lag in the audio.
  • Clipping.
  • Audio interference.

Unlike MOS scores, which are focused more on the network, PESQ is focused more on the audio characteristics.

What formula is used to calculate PESQ scores?

The PESQ test from Spearline utilizes true voice samples as test signals using a ‘full reference’ (FR) algorithm.  This means that it compares the audio from the talker (the reference) to the signal received by the listener (the test/sample) to produce an objective score of the difference between the two.  The resulting score is considered more accurate than methods that predict audio quality based on network performance – like the MOS.
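To make the ‘full reference’ idea concrete, the sketch below compares a reference signal with a degraded copy and reports a single number. It is deliberately not PESQ: it computes a plain signal-to-noise ratio in decibels, with none of the perceptual modelling that P.862 defines. The function name `snr_db` and the synthetic test tone are assumptions for illustration only.

```python
import math


def snr_db(reference, degraded):
    """Toy full-reference metric: signal-to-noise ratio (dB) between two equal-length signals."""
    if len(reference) != len(degraded):
        raise ValueError("signals must have the same number of samples")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, degraded))
    if noise_power == 0:
        return math.inf    # the received signal is identical to the reference
    if signal_power == 0:
        return -math.inf   # silent reference but a non-silent received signal
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hypothetical reference: 0.1 s of a 440 Hz tone sampled at 8 kHz.
    reference = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
    # Hypothetical degradation: the far end receives the tone 10% quieter.
    degraded = [0.9 * x for x in reference]
    print(round(snr_db(reference, degraded), 1))
```

The point is the shape of the measurement: both ends of the call are known, so the score reflects what actually happened to the audio rather than a prediction from network statistics.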

PESQ can be used to assess audio quality on all types of phone calls, such as VoIP, PSTN, toll, and toll-free numbers, regardless of whether they operate on mobile or fixed-line networks.

What is a Good PESQ Score For Voice Call Quality Testing?

PESQ returns a score from -0.5 to 4.5, although in most cases the result falls between 1.0 and 4.5, with higher scores indicating better quality.  These scores are generally grouped into six bands, as described below:

Bands for Voice Call Quality Testing                         Score
Complete relaxation possible; no effort required             3.8-4.50
Attention is necessary; no appreciable effort is required    3.4-3.79
Attention is necessary; a small amount of effort required    2.8-3.29
Moderate effort required                                     2.4-2.79
Considerable effort required                                 2.0-2.39
No meaning is understood with any feasible effort            1.0-1.99
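The bands above lend themselves to a simple lookup, sketched here in Python. The names `PESQ_BANDS` and `pesq_band` are assumptions for illustration. The ranges are copied as listed, so a score that falls outside every listed range (for example between 3.29 and 3.4, or below 1.0) returns `None` rather than being assigned to a band by guesswork.

```python
# (low, high, description) for each band, exactly as listed in the table above.
PESQ_BANDS = [
    (3.8, 4.5, "Complete relaxation possible; no effort required"),
    (3.4, 3.79, "Attention is necessary; no appreciable effort is required"),
    (2.8, 3.29, "Attention is necessary; a small amount of effort required"),
    (2.4, 2.79, "Moderate effort required"),
    (2.0, 2.39, "Considerable effort required"),
    (1.0, 1.99, "No meaning is understood with any feasible effort"),
]


def pesq_band(score: float):
    """Return the band description for a PESQ score, or None if no listed band covers it."""
    rounded = round(score, 2)  # the table is stated to two decimal places
    for low, high, description in PESQ_BANDS:
        if low <= rounded <= high:
            return description
    return None


if __name__ == "__main__":
    for score in (4.1, 3.5, 2.5, 3.35):
        print(score, "->", pesq_band(score))
```

A lookup like this can be handy when turning raw test results into a report line for a supplier.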

How is POLQA measured?

POLQA, the successor to PESQ, works in a similar way and is standardized under ITU-T P.863.  Like PESQ, POLQA provides a ‘full reference’ (FR) algorithm that compares the output of a test with an original reference signal to produce an objective measure of the difference between the two.

The main difference between PESQ and POLQA is that POLQA can handle higher bandwidth audio signals including super-wideband (HD) and full-band voice signals, as well as the most recent voice coding and VoIP/VoLTE transmission technologies. 

Effects introduced by newer voice services, such as stretching and compression of speech signals in the time domain, can be handled by POLQA. Quality prediction for both new and old codecs is improved, which allows the direct comparison of codecs such as AMR and EVRC. POLQA combines a psychoacoustic and cognitive model with a new time alignment algorithm designed to handle varying delays.
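Time alignment matters because a full-reference comparison only makes sense once the received audio has been lined up with the reference. The Python sketch below illustrates the simplest version of that idea: finding one fixed delay by brute-force cross-correlation. It is not POLQA's alignment algorithm, which copes with delays that vary during the call. The function name `estimate_delay` and the sample values are assumptions for illustration.

```python
def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples, 0..max_lag) at which degraded best matches reference."""
    best_lag = 0
    best_score = float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference against the degraded signal shifted by `lag` samples.
        score = sum(r * d for r, d in zip(reference, degraded[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    # Hypothetical reference samples and a copy delayed by three samples of silence.
    reference = [0.0, 1.0, -0.5, 0.25, 2.0, -1.0, 0.5, 0.0]
    degraded = [0.0, 0.0, 0.0] + reference
    lag = estimate_delay(reference, degraded, max_lag=5)
    aligned = degraded[lag:]
    print(lag, aligned == reference)
```

Once the lag is known, the delayed signal can be trimmed and compared sample by sample with the reference.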


POLQA is the easiest way to assess audio quality when using wideband or HD voice. It tests a real audio sample, allows for end-to-end evaluation, and generates an analytical metric, similar to PESQ, but with the additional versatility of working with HD and wideband frequencies. It’s a measure Spearline is beginning to introduce as we find more of our customers using wideband and HD voice, but there’s no significant advantage to using it otherwise. If you’re using standard narrowband voice, then PESQ is still your best option.