vlrMemos, Audio Memos, Voice Quality and 3D Positional Sound

by GEORGES SAMAKE


This is not a live project. This is a draft shared by GEORGES SAMAKE for feedback.


€0.00 pledged of €20,000 goal

About

Version Française    French Version

vlrMemos is an app for recording voice or audio memos and measuring voice quality. The app will calculate and display, in real time, acoustic parameters such as the LTAS (Long-Term Average Spectrum) and the HPR (High-Frequency Power Ratio).

During playback, an advanced parameter will also be calculated and displayed: the CPPS (Smoothed Cepstral Peak Prominence), a reliable measure of dysphonia.

During sound rendering, before decompression, personalized FIR (Finite Impulse Response) filters generated from normalized or non-normalized audiograms will be applied to the channels in the Fourier domain (fast convolution), for highly optimized and tailored audio output.

The app will be available for computers, tablets, smartphones, smartwatches and connected objects.

It can use an audio codec (a method for compressing and decompressing audio). This very fast, high-quality codec is based on the FFT (Fast Fourier Transform).

The codec is quasi-lossless in energy: the energy of a compressed frame is almost the same as the energy of the uncompressed frame.

The codec can deliver 3D audio. During sound rendering, before decompression, generic or personalized HRTF (Head-Related Transfer Function) filters are applied to the channels in the Fourier domain (very fast operations), for high-quality 3D positional audio output.

The app will be compatible with body sounds, physiological signals and variability data.

Using this app, one can:

- Detect anomalies in the voice.

- Monitor the effectiveness of a voice treatment.

- Monitor progress during voice training or speech therapy.

- Record and analyze heartbeat sounds and lung sounds.

- Record and analyze physiological signals.

- Optionally, sonify the physiological signals and the variability data.

- Optionally, send the average parameter values, coded as intensity and/or color (light notifications), to connected bulbs or to connected-bulb bridges.

The Parameters:  

- The LTAS: Long-Term Average Spectrum. This parameter measures the quality of the voice. It provides an objective measure of a quality that is usually judged by ear.

- The HPR: High-Frequency Power Ratio. This parameter detects breathy voices. It compares the acoustic energy in the high frequencies with the energy in the low frequencies.

- The CPPS: Smoothed Cepstral Peak Prominence. This parameter estimates the severity of dysphonia. It is a good predictor and a reliable measure of dysphonia.
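
As an illustration, the LTAS and the HPR can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the vlrMemos implementation: the function names, the 1024-sample Hann-windowed frames with 50 % overlap and the 4 kHz boundary between low and high frequencies are all assumptions chosen for the example.

```python
import numpy as np

def ltas_db(signal, sample_rate, frame_len=1024):
    """Long-term average spectrum: power spectrum averaged over all frames, in dB."""
    window = np.hanning(frame_len)
    hop = frame_len // 2                      # 50 % overlap (assumption)
    n_frames = 1 + (len(signal) - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    power /= n_frames
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, 10.0 * np.log10(power + 1e-12)

def hpr_db(freqs, ltas, cutoff_hz=4000.0):
    """High-frequency power ratio: energy above the cutoff over energy below it, in dB."""
    power = 10.0 ** (ltas / 10.0)
    high = power[freqs >= cutoff_hz].sum()
    low = power[freqs < cutoff_hz].sum()
    return 10.0 * np.log10(high / low)

# Synthetic test signal: a strong 500 Hz tone plus a weak 6 kHz tone.
sample_rate = 48000
t = np.arange(sample_rate) / sample_rate
test_signal = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 6000 * t)
freqs, ltas = ltas_db(test_signal, sample_rate)
print(f"HPR = {hpr_db(freqs, ltas):.1f} dB")
```

A higher HPR means relatively more energy in the high band, which is how the text above relates this parameter to breathy voices.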

The File Format

Data will be saved in memory or on disk in the classic WAVE format (compressed or uncompressed). The default will be the VLC HQ 48 codec with the compressed WAVE format.

The Audio Codecs:

- VLC HQ 48 Codec:

A very fast, high-quality audio codec based on the FFT. Recordings will be in compressed WAVE format and directly contain the coded frequencies (positions), magnitudes and phases. Because the codec works in the frequency domain, playback of a compressed WAVE file does not require a new FFT to recalculate the acoustic parameters. With the uncompressed WAVE format, the FFT must be computed again.

Note that the current version of the codec is quasi-lossless in energy: the energy of a compressed frame is almost the same as the energy of the uncompressed frame. The codec uses no psychoacoustic model, so all points can be taken into account. It also has no notion of similar frames, a concept that is useful for communications.
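
The energy property can be checked by comparing the sum of squared samples of a frame before and after a codec round trip. The sketch below only shows that check. The `toy_codec_roundtrip` function is a hypothetical stand-in that keeps the strongest FFT bins; it is not the VLC HQ 48 algorithm, and the frame size and number of kept bins are assumptions.

```python
import numpy as np

def frame_energy(frame):
    """Energy of a frame: sum of squared samples."""
    return float(np.sum(np.asarray(frame, dtype=float) ** 2))

def toy_codec_roundtrip(frame, keep=32):
    """Hypothetical stand-in codec: keep only the `keep` strongest FFT bins."""
    spectrum = np.fft.rfft(frame)
    kept = np.zeros_like(spectrum)
    strongest = np.argsort(np.abs(spectrum))[-keep:]
    kept[strongest] = spectrum[strongest]
    return np.fft.irfft(kept, n=len(frame))

def energy_ratio(original, decoded):
    """Ratio close to 1.0 means the round trip is quasi-lossless in energy."""
    return frame_energy(decoded) / frame_energy(original)

# Test frame: two tones plus a little noise.
rng = np.random.default_rng(0)
n = np.arange(1024)
frame = (np.sin(2 * np.pi * 10 * n / 1024)
         + 0.5 * np.sin(2 * np.pi * 37 * n / 1024)
         + 0.01 * rng.standard_normal(1024))
print(f"energy ratio = {energy_ratio(frame, toy_codec_roundtrip(frame)):.4f}")
```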

Note also that compression requires less memory, limits the amount of data to transfer and saves storage space. Without compression, with one channel, 16 bits per sample and a 48 kHz sampling rate, one second of voice occupies 0.768 Mbits (megabits), 30 seconds occupy 23.040 Mbits, one minute occupies 46.080 Mbits and 5 minutes occupy 230.400 Mbits. With compression by the VLC HQ 48 codec at 64000 bps, and without additional lossless compression, one second of voice occupies 0.064 Mbits, 30 seconds occupy 1.92 Mbits, one minute occupies 3.84 Mbits and 5 minutes occupy 19.2 Mbits. This codec will support multichannel audio (as an option).
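
The figures above follow directly from bit rate multiplied by duration. A short Python check, with helper names chosen for this example only:

```python
def megabits(bits_per_second, seconds):
    """Size in megabits (1 Mbit = 1,000,000 bits) of a stream of given duration."""
    return bits_per_second * seconds / 1_000_000

PCM_BPS = 1 * 16 * 48_000    # channels * bits per sample * sampling rate
CODEC_BPS = 64_000           # VLC HQ 48 bit rate quoted above

for seconds in (1, 30, 60, 300):
    print(f"{seconds:>3} s: uncompressed {megabits(PCM_BPS, seconds):.3f} Mbits, "
          f"compressed {megabits(CODEC_BPS, seconds):.3f} Mbits")

print("compression ratio:", PCM_BPS / CODEC_BPS)
```

At these settings the compressed stream is 12 times smaller than the uncompressed one.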

More information about this codec is available at the following addresses:

Algorithms

VLB

- VLC HQ 16 Codec:

To handle body sounds (very low frequencies) and very long recordings, a lower sampling rate (16 kHz or less instead of 48 kHz) will be used.

The VLC HQ 16 codec will also support multichannel (as an option), for the transmission of data such as the ECG (ElectroCardioGram). Data such as the EEG (ElectroEncephaloGram) and the EMG (ElectroMyoGram) will be supported, as will ABP (Arterial Blood Pressure) waveforms and plethysmographic waveforms (from pulse oximetry). Lastly, blood glucose waveforms will be supported. Multichannel will be compatible with the USB 2.0 Audio Interface.

The frame rate is about 31.25 frames per second for audio. It will be around 0.5 to 2.0 frames per second for physiological signals. More information about the inclusion of ECG data and the use of this codec for telemonitoring is available at the following address:

Telemonitoring

- VLC 3D 48 and VLC HQ 3D 48 Codecs

These codecs will be compatible with 3D positional audio. Customizable HRTF (Head-Related Transfer Function) filters will be applied to mono, stereo or multichannel outputs. Custom HRTF filters are useful not only for 3D audio effects, but also as hearing aids for the hard of hearing.

Note an interesting property that no non-FFT audio codec offers: because compressed frames are already in the Fourier domain, no FFT is needed to apply the HRTF filters.
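
The property described above can be sketched as follows: once a frame is available as a spectrum, each ear's signal costs one multiplication by the HRTF and one inverse FFT. This is a minimal sketch under stated assumptions. The function name, the FFT size and the toy HRIRs (a pure delay and attenuation for the far ear) are hypothetical, and the spectrum is computed here with an FFT only because no decoded VLC frame is available.

```python
import numpy as np

FFT_SIZE = 2048   # assumed; must be >= frame length + HRIR length - 1

def render_binaural(mono_spectrum, hrtf_left, hrtf_right, fft_size=FFT_SIZE):
    """Apply one HRTF per ear in the Fourier domain, then return to the time domain."""
    left = np.fft.irfft(mono_spectrum * hrtf_left, n=fft_size)
    right = np.fft.irfft(mono_spectrum * hrtf_right, n=fft_size)
    return left, right

# Toy HRIRs for a source on the listener's left: the right ear hears the
# sound 30 samples later and at half the amplitude (illustrative values).
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[30] = 0.5
hrtf_left = np.fft.rfft(hrir_left, n=FFT_SIZE)
hrtf_right = np.fft.rfft(hrir_right, n=FFT_SIZE)

rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)
spectrum = np.fft.rfft(frame, n=FFT_SIZE)   # stands in for a decoded frame
left, right = render_binaural(spectrum, hrtf_left, hrtf_right)
```

Zero-padding to `FFT_SIZE` keeps the product equivalent to an ordinary (linear) convolution rather than a circular one.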

Custom FIR Filters:  

Custom FIR (Finite Impulse Response) filters can be loaded for all the codecs and all the sampling rates. This is useful for personalized audio output and hearing correction. The filters are generated from text files containing the relative sensitivity of each ear at different frequencies (such as audiogram data). The FIR filters may be up to 1536 samples long for a single channel at a 48 kHz sampling frequency. They are applied in the Fourier domain (fast convolution).
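
One common way to turn audiogram-like data into an FIR filter is frequency sampling followed by windowing. The sketch below uses that approach as an assumption, since the source does not describe how vlrMemos generates its filters. The function names, the 513-tap length, the 4096-point design grid and the example gains are hypothetical.

```python
import numpy as np

def fir_from_audiogram(freqs_hz, gains_db, sample_rate=48000, num_taps=513, n_fft=4096):
    """Design a linear-phase FIR whose magnitude follows the given gains (dB)."""
    grid = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    gain_db = np.interp(grid, freqs_hz, gains_db)    # end values held constant
    magnitude = 10.0 ** (gain_db / 20.0)
    impulse = np.fft.irfft(magnitude, n=n_fft)       # zero-phase, symmetric
    centered = np.roll(impulse, num_taps // 2)[:num_taps]
    return centered * np.hamming(num_taps)           # taper to limit ripple

def fast_convolve(signal, taps):
    """Linear convolution computed in the Fourier domain."""
    n = len(signal) + len(taps) - 1
    n_fft = 1 << (n - 1).bit_length()                # next power of two
    product = np.fft.rfft(signal, n_fft) * np.fft.rfft(taps, n_fft)
    return np.fft.irfft(product, n_fft)[:n]

# Hypothetical correction: boost the high frequencies for one ear.
audiogram_hz = [250, 500, 1000, 2000, 4000, 8000]
correction_db = [0, 0, 0, 10, 20, 20]
taps = fir_from_audiogram(audiogram_hz, correction_db)

rng = np.random.default_rng(1)
audio = rng.standard_normal(4800)
corrected = fast_convolve(audio, taps)
```

The same `fast_convolve` result could be obtained with `np.convolve`, but the Fourier-domain version scales much better for long filters such as the 1536-sample maximum mentioned above.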

Variability Data:

We are interested in data such as changes in heart rate or in systolic blood pressure as a function of time. These data are used to calculate heart rate variability or blood pressure variability. There are typically 60 to 100 samples per minute, so 5 minutes of data occupy a buffer of 300 to 500 samples. Frames of 1024 samples will be issued. Other types of data can be considered.

The input data consist of lines in the text CSV format (time,data). If there are N channels, the lines will be in the form:

           - (time1,data1,time2,data2,...,timeN,dataN).

The (minimum) sampling rate after interpolation will be:

           - sampling rate = (total samples / total time).

The (minimum) number of frames per second will be:

           - frames per second = (sampling rate / 1024).
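
The input format and the two formulas above can be sketched as follows. This is an illustration only: the function names are hypothetical, "total time" is taken to be the span between the first and last timestamps, and linear interpolation is an assumption, since the source does not name the interpolation method.

```python
import csv
import io
import numpy as np

FRAME_SIZE = 1024   # samples per frame, as stated above

def load_channel(csv_text, channel=0):
    """Read (time, data) pairs for one channel from time1,data1,...,timeN,dataN lines."""
    times, values = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        times.append(float(row[2 * channel]))
        values.append(float(row[2 * channel + 1]))
    return np.array(times), np.array(values)

def resample_uniform(times, values):
    """Return (sampling rate, values interpolated onto a uniform time grid)."""
    total_time = times[-1] - times[0]
    rate = len(times) / total_time                 # total samples / total time
    grid = times[0] + np.arange(len(times)) / rate
    return rate, np.interp(grid, times, values)

def frames_per_second(rate):
    return rate / FRAME_SIZE

# Five made-up RR-interval readings (time in seconds, interval in ms).
text = "0.0,800\n1.0,810\n2.0,790\n3.0,805\n4.0,800\n"
times, values = load_channel(text)
rate, uniform = resample_uniform(times, values)
print(rate, frames_per_second(rate))
```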

Very low or very high sampling rates are not a problem for our codecs. The recordings will be compressed or uncompressed WAVE files, depending on the save option. Instead of the LTAS or HPR values, we will display the spectral energy at low frequencies (LF), the spectral energy at high frequencies (HF) and the LF/HF ratio.
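
The LF and HF energies and their ratio can be sketched from the spectrum of a uniformly sampled series. The band limits used below (0.04 to 0.15 Hz for LF, 0.15 to 0.40 Hz for HF) are the usual convention in heart rate variability analysis and are an assumption here, since the source does not state which limits vlrMemos uses. The function names and the Hann window are also choices made for this example.

```python
import numpy as np

def band_energy(values, rate, f_lo, f_hi):
    """Spectral energy of a uniformly sampled series between f_lo and f_hi (Hz)."""
    centered = (values - np.mean(values)) * np.hanning(len(values))
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(len(values), d=1.0 / rate)
    return float(power[(freqs >= f_lo) & (freqs < f_hi)].sum())

def lf_hf(values, rate):
    """Return (LF energy, HF energy, LF/HF ratio) using the assumed HRV bands."""
    lf = band_energy(values, rate, 0.04, 0.15)
    hf = band_energy(values, rate, 0.15, 0.40)
    return lf, hf, lf / hf

# Synthetic series: one frame of 1024 samples at 4 Hz, with a slow (0.10 Hz)
# oscillation twice as large as a faster (0.25 Hz) one.
rate = 4.0
t = np.arange(1024) / rate
series = 800 + 2.0 * np.sin(2 * np.pi * 0.10 * t) + 1.0 * np.sin(2 * np.pi * 0.25 * t)
lf, hf, ratio = lf_hf(series, rate)
print(f"LF/HF = {ratio:.2f}")
```

Because energy grows with the square of the amplitude, an oscillation twice as large in the LF band gives an LF/HF ratio close to 4.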

Sonification:

Sonification applies to the physiological signals and the variability data. By default, during recording or playback, there is no sound for these signals or data, only the display of the parameter values for one channel or for the average of the channels.

Optionally, we will generate one sound per channel (multichannel will be possible) using a good-quality sonification algorithm: spectral mapping sonification, which makes it possible to monitor all the frequencies or a specific frequency band.

Recent studies have shown, for example, that the difference between a normal and an abnormal heart rate can be heard through the sonification of ECG signals.

More information on data sonification with vlrMemos is available at the following address:

Data Sonification with vlrMemos

Send and Share:

There are no immediate plans for sending and sharing features for the files created by vlrMemos. Messaging apps that can send files (WhatsApp, Skype, ...) can be used instead.

vlrMemos will be able to play WAVE files (uncompressed, or compressed with vlrMemos) from readable directories. During playback, custom FIR and HRTF filters can be used. With the VLC codecs, these filters can be applied more efficiently, because the data are already in the Fourier domain.

Operating Systems:

We will consider the following operating systems:

 - Windows.

 - Android and Android Wear.

 - iOS (iPhone, iPad) and watchOS (Apple Watch).

vlrMemos will include some proprietary parts, chiefly the graphical interface. The PJSIP library, all the VLC codecs and other libraries are Open Source. The Open Source libraries will be statically or dynamically linked to the proprietary modules. The source code of all the Open Source libraries will be public.

More information about the PJSIP library is available at the following address:

PJSIP

Usefulness:

The calculated and displayed acoustic parameters will measure the quality of the voice in order to:

 - detect anomalies in the voice;

 - monitor the effectiveness of a voice treatment;

 - monitor progress during voice training or speech therapy.

Values below the thresholds that can be considered pathological will be shown in red.

For some professions (such as speakers, coaches, teachers, presenters and singers), voice quality is fundamental.

For smokers, detecting a persistent abnormality of the voice can enable the early detection of a serious illness such as lung cancer.

The value of heart rate variability (HRV) has been demonstrated in the analysis of athletes' recovery. HRV is an excellent indicator of general health and a predictor of hypertension. A decrease in spectral energy is a sign of risk of cardiac events.

Measuring dysphonia enables the detection and effective monitoring of Parkinson's disease.

Alzheimer's disease is characterized, among other things, by a slowing of the EEG (electroencephalogram), that is, a rise in power at the lower frequencies. Computed from the EEG signals, the power ratio quantifies this anomaly.

Finally, power spectral analysis of EEG signals is the most common tool used in sleep research.

More information about the vlrMemos project is available at the following addresses:

vlrPhone / vlrMemos

vlrMemos

Listening Page

3D Positional Audio

Final Notes:  

- Next Versions:

This description covers the first version (V1) of vlrMemos (vlrMemos Light V1 and vlrMemos Full V1). Many other features are planned for later versions. See the vlrMemos project home page for more information.

Planned features (for vlrMemos Light and Full) include:

        - GPU (Graphics Processing Unit) support: our codecs are based on the FFT, so they can be accelerated on the GPU, for very low battery consumption.

        - W64: we will offer the W64 format (Sony Pictures Digital Wave 64), with compressed or uncompressed samples. This format supports files larger than 4 GB.

        - Other parameters (jitter, shimmer, HNR, ...).

Later versions will be free for backers.

- Options:

Multichannel, sonification and notifications to connected bulbs will be implemented only in vlrMemos Full V1. Note that stereo mode (two channels) will be included in vlrMemos Light V1.

- Use of the Funds

A portion of the funds raised will be used to buy and ship the rewards promised to backers. Another part will be used to buy test hardware.

- Patents

The methods used by our codecs are patented in France (INPI), and a patent application was filed in the U.S. (USPTO). More precisely, the U.S. application is currently in a state of unintentional abandonment. We will try to revive it after a successful fundraising.

- App Interfaces

Play, Record, Parameters, Advanced Parameters and Information buttons.
Pause and Stop buttons.
Choice of Parameters.

Rewards:

- Earbuds and Headphones

Good quality earbuds.
Good Quality Headphones.

These are not revolutionary or unique earbuds or headphones, but they are customizable and of good quality and, above all, vlrMemos will include their complete and detailed frequency responses by default. When used with vlrMemos, their perceived frequency responses will be very flat.

- vlrMemos Software and Apps

Windows Software. 

Android Apps (Android, Android Wear). 

iOS Apps (iPhone, iPad) and watchOS Apps (Apple Watch). 

The Software and Apps (Light V1) will be finished before June 30, 2016.

The Software and Apps (Full V1) will be finished before September 30, 2016.

- Earbuds

Good quality earbuds, perceived flat frequency responses with vlrMemos. Free shipping. 

Included: Software and Apps (Light V1).

- Contributor:

Quality of Contributor. Links to the contributors websites. Credits and links in a page linked to the app web page, in the contributors section. Give a name and a web page address. 

Included: Software and Apps (Light V1).

Included: Earbuds and Headphones. Free Shipping.

- Options

vlrMemos with the following options:  

        - Multichannel. 

        - Sonification of the physiological signals and the variability data. 

        - Notifications to connected bulbs. 

Included: Software and Apps (Light V1).

Included: Earbuds and Headphones. Free Shipping.

Included: Quality of Contributor (links). 

- Sponsor I

Quality of Sponsor I. Links to the sponsors websites. Logos, credits and links in a page linked to the app web page, in the sponsors section. Give a logo, a name and a web page address. 

Included: Software and Apps (Light V1).

Included: Options (Full V1).

Included: Earbuds and Headphones. Free Shipping.

- Sponsor II

Quality of Sponsor II. Logos, credits and links in the apps (when available), in the sponsors section. Links to the sponsors websites. Logos, credits and links in a page linked to the app web page, in the sponsors section. Give a logo, a name and a web page address. 

Included: Software and Apps (Light V1).

Included: Options (Full V1).

Included: Earbuds and Headphones. Free Shipping.

- Sponsor III

Quality of Sponsor III. Logos, credits and links in the apps (when available), in the sponsors section. Links to the sponsors websites. Logos, credits and links in a page linked to the app web page, in the sponsors section. Give a logo, a name and a web page address. 

Rights to use all the audio codecs included in vlrMemos. These rights are limited to one company and one product, or one company and one software application, or one company and one service.

Included: Software and Apps (Light V1).

Included: Options (Full V1).

Included: Earbuds and Headphones. Free Shipping.

Risks and challenges

There is little risk apart from a small delay in development.
The codecs to be included in the vlrMemos app have been tested. The source code is available with vlrPhone (Windows version). Their quality can be assessed with the utilities provided with vlrPhone (test.exe and play.exe) or, more quickly, by visiting the listening page and the 3D positional audio page.
The quasi-lossless (in energy) property of the VLC HQ 48 codec can be checked, and example results can be viewed, on the vlrMemos app home page.


Support

  1. Pledge €5 or more

    vlrMemos Software and Apps (Light V1):
    Windows Software.
    Android Apps (Android, Android Wear).
    iOS Apps (iPhone, iPad) and watchOS Apps (Apple Watch).
    The Software and Apps (Light V1) will be finished before June 30, 2016.

  2. Pledge €25 or more

    Earbuds:
    Good quality earbuds, perceived flat frequency responses with vlrMemos.
    Free Shipping.
    Included: Software and Apps (Light V1).

    Ships anywhere in the world.

  3. Pledge €100 or more

    Contributor:
    Quality of Contributor.
    Links to the contributors websites. Credits and links in a page linked to the app web page, in the contributors section.
    Give a name and a web page address.
    Included: Software and Apps (Light V1).
    Included: Earbuds and Headphones. Free Shipping.

    Ships anywhere in the world.

  4. Pledge €200 or more

    Options:
    vlrMemos with the following options:
    - Multichannel.
    - Sonification of the physiological signals and the variability data.
    - Notifications to connected bulbs.
    Included: Software and Apps (Light V1).
    Included: Earbuds and Headphones. Free Shipping.
    Included: Quality of Contributor (links).

    Ships anywhere in the world.

  5. Pledge €500 or more

    Sponsor I:
    Quality of Sponsor I.
    Links to the sponsors websites. Logos, credits and links in a page linked to the app web page, in the sponsors section.
    Give a logo, a name and a web page address.
    Included: Software and Apps (Light V1).
    Included: Options (Full V1).
    Included: Earbuds and Headphones. Free Shipping.

    Ships anywhere in the world.

  6. Pledge €1,000 or more

    Sponsor II:
    Quality of Sponsor II.
    Logos, credits and links in the apps (when available), in the sponsors section.
    Links to the sponsors websites. Logos, credits and links in a page linked to the app web page, in the sponsors section.
    Give a logo, a name and a web page address.
    Included: Software and Apps (Light V1).
    Included: Options (Full V1).
    Included: Earbuds and Headphones. Free Shipping.

    Ships anywhere in the world.

  7. Pledge €7,000 or more

    Sponsor III:
    Quality of Sponsor III.
    Logos, credits and links in the apps (when available), in the sponsors section.
    Links to the sponsors websites. Logos, credits and links in a page linked to the app web page, in the sponsors section.
    Give a logo, a name and a web page address.
    Rights to use all the audio codecs included in vlrMemos.
    These rights are limited to one company and one product, or one company and one software application, or one company and one service.
    Included: Software and Apps (Light V1).
    Included: Options (Full V1).
    Included: Earbuds and Headphones. Free Shipping.

    Ships anywhere in the world.