Creating MP3 audio files for learning CW based on a dictionary

10 June 2021 15:54

The goal is to create MP3 files for practicing Morse code (CW) reception by ear, built from Russian words that play in random order without repeating.

The stretch goal is to generate an unlimited number of MP3 files to listen to on a pocket MP3/FM player.

1. Creation of a dictionary of the most commonly used Russian words

I found a frequency dictionary of the Russian language on the Internet and selected the most frequently used words from it.
freqrnc2011.ods.zip

Using the standard filter in LibreOffice Calc, I removed proper names and hyphenated words from the table and split the remaining words into files by word length: 2 letters, 3 letters, and so on, up to 18 letters.
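The split by length can also be scripted instead of being done by hand in Calc. The following is a minimal sketch, not the method used above: it assumes a hypothetical file words.txt with one word per line, skips hyphenated words (it does not detect proper names), and appends each remaining word to topN.txt, where N is its length. The function name split_by_length is made up for this example, and counting Cyrillic letters correctly assumes a UTF-8 locale and a shell such as Bash that counts characters rather than bytes.

```shell
# Sketch: split a one-word-per-line list into top2.txt, top3.txt, ...
# by word length. split_by_length and words.txt are illustrative names.
split_by_length() {
    infile=$1
    # The "|| [ -n ... ]" part also handles a last line without a newline
    while IFS= read -r word || [ -n "$word" ]; do
        # Skip empty lines and hyphenated words
        case $word in
            ''|*-*) continue ;;
        esac
        # ${#word} is the word length (characters in Bash under a UTF-8 locale)
        printf '%s\n' "$word" >> "top${#word}.txt"
    done < "$infile"
}
```

Usage: split_by_length words.txt. Because the function appends, any topN.txt files left over from an earlier run should be deleted first.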

Download files with Russian words in the archive:

top_rus.zip

Proper names in the archive:
names_rus.zip

Example of 4-letter words (English glosses of the Russian originals):
torso
horde
myrtle
brig
shame
yoke
iron
Chud
wax
imam
antennae
oils
internet
Eric
minced meat
fuchs
treasure
carcass
great
icon case
bedbug
khaki

2. Shuffling the words into random order

I composed the command:

cat top4.txt | tr ' ' '\n' | uniq | sort -R

where:

  • top4.txt is the input file name.
  • tr replaces spaces with newlines, so that each word is on its own line.
  • uniq removes repeated adjacent lines.
  • sort -R shuffles the lines into random order (the opposite of sorting).

The output goes to the standard output stream (stdout), so this command can be followed by others that read their input from standard input (stdin).
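One caveat: uniq only drops duplicates that sit on adjacent lines, so a word that occurs twice in different places of an unsorted file can survive this pipeline. A hedged alternative, assuming GNU coreutils (which provide shuf) are installed, is to remove all duplicates with sort -u first and then shuffle with shuf. The function name shuffle_unique is illustrative and not part of the original command.

```shell
# Sketch: print every distinct word of a file once, in random order.
# Assumes GNU coreutils (sort, shuf); shuffle_unique is an illustrative name.
shuffle_unique() {
    tr ' ' '\n' < "$1" |   # put each word on its own line
        sed '/^$/d' |      # drop empty lines left by repeated spaces
        sort -u |          # remove all duplicates, not just adjacent ones
        shuf               # shuffle the remaining lines
}
```

Usage: shuffle_unique top4.txt. The result is written to stdout, so it can be piped into other commands in the same way.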

3. Generating an MP3 file - using the ebook2cw command

sudo apt-get install ebook2cw

The program is launched using the ebook2cw command, specifying the input file and parameters, something like this:

ebook2cw top3.txt -w 20 -e 20 -f 800 -b 64 -o 20wpm -d 60 -u -t 20wpm

ebook2cw is a command-line program that converts a plain text ebook to Morse code audio files. It works on several platforms, including Windows and Linux.

where:

-w 20 - character speed in WPM (words per minute)
-e 20 - effective speed in WPM; a value lower than -w lengthens the pauses between characters (Farnsworth spacing)
-f 800 - tone frequency of 800 Hz
-b 64 - bitrate of the resulting MP3 file in kbps
-q 8 - MP3 quality from 1 to 9 (1 is the best quality, 9 is the worst); optional, not used in the command above
-o 20wpm - output file name prefix "20wpm"
-d 60 - split the output into chapters with a time limit of 60 seconds each
-u - read the input as UTF-8 (needed for Russian text)
-t 20wpm - title written to the ID3 tag of the output file
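To make the speed values more concrete: under the common PARIS convention, one dot lasts 1200 / WPM milliseconds, so 20 WPM corresponds to a 60 ms dot. The snippet below is only a worked calculation of that rule in shell arithmetic; the function name dot_ms is made up here, it does not call ebook2cw, and integer division rounds the result down.

```shell
# Worked example: dot length in milliseconds for a given speed,
# using the PARIS convention (dot = 1200 / WPM ms). Integer arithmetic.
dot_ms() {
    echo $(( 1200 / $1 ))
}

dot_ms 20   # prints 60
dot_ms 18   # prints 66 (66.67 rounded down)
```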

As a result, an MP3 file with Morse code is created very quickly (roughly 300 times faster than real time):

ebook2cw 0.8.2 - (c) 2013 by Fabian Kurz, DJ1YFK

Reading configuration file: /home/vladimir/.ebook2cw/ebook2cw.conf

Speed: 20wpm, Freq: 800Hz, Chapter: >CHAPTER<, Encoding: UTF-8

Effective speed: 20wpm, Extra word spaces: 0.0, QRQ: 0min, reset QRQ: yes

Chapter limit: 60 seconds, Encoder: MP3

Starting 20wpm0000.mp3

Warning: don't know CW for unicode &#1019;

words: 405, time: 14:30

Finishing 20wpm0000.mp3

Total words: 405, total time: 14:30

Conversion time: 2s (Speedup: 290.0x)

4. Putting it all together

The command to run in the terminal took the following form:

cat top2.txt | tr ' ' '\n' | uniq | sort -R | ebook2cw -w 20 -e 20 -f 800 -b 64 -o 20wpm-2- -d 60 -u -t 20wpm

where top2.txt is the name of the input file and 20wpm-2- is the prefix of the MP3 output files.
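For repeated use, the pipeline can be wrapped in a small shell function. This is a sketch only: the function name make_cw and its three arguments are made up for this example, the ebook2cw options are the ones already used above, and the two checks simply fail early if the input file is unreadable or ebook2cw is not installed.

```shell
# Sketch: wrap the pipeline in a function.
# make_cw INFILE WPM PREFIX  (name and argument order are illustrative)
make_cw() {
    infile=$1
    wpm=$2
    prefix=$3
    if [ ! -r "$infile" ]; then
        echo "make_cw: cannot read '$infile'" >&2
        return 1
    fi
    if ! command -v ebook2cw >/dev/null 2>&1; then
        echo "make_cw: ebook2cw is not installed" >&2
        return 1
    fi
    # Same stages as the command above, with speed and prefix as parameters
    tr ' ' '\n' < "$infile" | uniq | sort -R |
        ebook2cw -w "$wpm" -e "$wpm" -f 800 -b 64 -o "$prefix" -d 60 -u -t "${wpm}wpm"
}
```

Usage: make_cw top2.txt 20 20wpm-2-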

An example of the command's output
(for a set of 4-letter words at a CW speed of 18 WPM)

Original words:
1.txt.zip

Command:

cat 1.txt | tr ' ' '\n' | uniq | sort -R | ebook2cw -w 18 -e 18 -f 800 -b 64 -o 18wpm-1- -d 60 -u -t 18wpm

Result of the first run (-o 18wpm-1-):

Result of the second run (-o 18wpm-2-):

Result of the third run (-o 18wpm-3-):

5. Run the command multiple times to create different MP3 files

The command can be run multiple times. If the -o option stays the same, each run creates a file with the same name but different content, so the previous file is overwritten.

I set myself the task of changing the file suffix (-01, -02, -03, etc.) on each run of the command.
I solved the problem the next day by writing a Bash script:

#!/bin/bash
#
# Creates a number of MP3 files with Morse code.
# Usage: bash likecw.sh some.txt 5 20
#   some.txt - input text file
#   5        - number of files
#   20       - speed in WPM

txt=$1
max=$2
speed=$3

# Ask for any parameter that was not given on the command line
if [ -z "$txt" ]; then
    echo "Input text file:"
    read -r txt
fi

if [ -z "$max" ]; then
    echo "Number of MP3 files to create:"
    read -r max
fi

if [ -z "$speed" ]; then
    echo "CW speed in WPM:"
    read -r speed
fi

echo "max = $max"
echo "speed = $speed"

delim="-"
ext=".mp3"

for (( cnt=1; cnt<=max; cnt++ )); do
    # Build a name such as top7.txt-20wpm-001.mp3
    name1=$(printf "%02dwpm" "$speed")
    name2=$(printf "%03d" "$cnt")
    title=$txt$delim$name1$delim$name2
    name=$title$ext

    # Shuffle the words and convert them to Morse code audio.
    # No -o option is given, so the output file is Chapter0000.mp3.
    cat "$txt" | tr ' ' '\n' | uniq | sort -R | ebook2cw -u -f 800 -b 64 -w "$speed" -e "$speed" -t "$title"

    # Rename the output so that the next pass does not overwrite it
    mv -v "Chapter0000.mp3" "$name"
done

exit 0


Using likecw.sh:

Run:

bash likecw.sh 1st_parameter 2nd_parameter 3rd_parameter

1st_parameter - input file name, for example top7.txt
2nd_parameter - number of files to generate
3rd_parameter - WPM speed

Example:

bash likecw.sh top7.txt 5 20
  • creates 5 MP3 files from top7.txt at a CW speed of 20 WPM.

    likecw.sh.zip
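The output file names follow from the variables in the script: the input file name, the speed padded to two digits plus "wpm", and a three-digit counter, joined by hyphens. The following sketch only prints the names that a run would produce, without calling ebook2cw; the function name planned_names is made up for this illustration.

```shell
# Sketch: list the MP3 names that "bash likecw.sh FILE COUNT WPM" would create.
# Mirrors the naming used in the script; planned_names is an illustrative name.
planned_names() {
    txt=$1
    max=$2
    speed=$3
    cnt=1
    while [ "$cnt" -le "$max" ]; do
        printf '%s-%02dwpm-%03d.mp3\n' "$txt" "$speed" "$cnt"
        cnt=$(( cnt + 1 ))
    done
}

planned_names top7.txt 5 20
# top7.txt-20wpm-001.mp3
# ...
# top7.txt-20wpm-005.mp3
```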

6. How to process all TXT files with one command

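To create 5 MP3 variants at 20 WPM for each of the files from top2.txt to top18.txt, call the script once per file in a loop. Passing top{2..18}.txt directly would not work: the shell expands it to all 17 file names on one command line, and the script would then take the second and third names as the file count and the speed.

for f in top{2..18}.txt; do bash likecw.sh "$f" 5 20; done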

