Developing SIP Phone with GUI on STM32 MCU

When should you use an MCU? The answer is obvious when you want a cheaper, more reliable, or less power-hungry solution. It may seem, though, that you then have to cut functionality drastically, because MCUs have at most a couple of megabytes of internal memory and certainly do not run at gigahertz clock rates. We in the Embox project do not agree that rich functionality is available only on large platforms: a few hundred kilobytes of memory are enough for many complex tasks if you use suitable tools.

We have an STM32F769I-Disco board. To our mind, it is almost a smartphone: it has an 800x480 touchscreen, an audio interface, and a network interface (albeit a wired one). The MCU has 2 MB of flash and 512 KB of RAM on chip, and the board adds 64 MB of QSPI flash and 16 MB of SDRAM. So we decided to find out how long it would take us to develop a SIP phone with a GUI using Embox. This article has two parts. First, we show how to fit a VoIP phone into the on-chip memory of the STM32F769I MCU. In the second part, we describe how to design the SIP phone's GUI quickly on a host and then run it on the MCU.

Embox is a configurable OS for embedded systems. Its distinctive feature is that it lets you use Linux software without source changes on resource-constrained systems, including MCUs.

One of the most popular VoIP phone projects is PJSIP. We will use it for our purpose.

Build PJSIP on Linux

First of all, we need to download, build, and run the main component: PJSIP, an open-source SIP stack. Download the latest version; at the time of writing, this is version 2.10.

Then build the project. This is easy to do for your host OS; in my case, it is Linux.

I did not specify any options except ‘--prefix’, which sets the path where the compiled libraries and header files will be installed.

Let’s build

Launch PJSIP on Linux

PJSIP ships with several demo applications. Let’s start with something simple. Since we need a two-way call, we take ‘pjsip-apps/src/samples/simple_pjsua.c’, a simple application that answers calls automatically. Edit ‘simple_pjsua.c’ to specify your SIP account. The following lines are responsible for this:
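In the upstream sample these are three preprocessor macros near the top of the file. The sketch below is minimal and assumes the placeholder values shipped with PJSIP (they are not a real account). The two helper functions are hypothetical and exist only to show how the sample joins the macros into the account and registrar URIs by string-literal concatenation:

```c
#include <assert.h>
#include <string.h>

/* Placeholder credentials: replace them with your own SIP account. */
#define SIP_DOMAIN "example.com"
#define SIP_USER   "alice"
#define SIP_PASSWD "secret"

/* Hypothetical helper: adjacent string literals are merged at compile time,
 * so the account URI is built without any runtime formatting. */
const char *sip_account_id(void)
{
    assert(strlen(SIP_USER) > 0 && strlen(SIP_DOMAIN) > 0);
    return "sip:" SIP_USER "@" SIP_DOMAIN;
}

/* Hypothetical helper: the registrar URI only needs the domain. */
const char *sip_registrar_uri(void)
{
    return "sip:" SIP_DOMAIN;
}
```

With the placeholders above, the account URI is "sip:alice@example.com"; changing the three macros is all that is needed to register under another account.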

Rebuild the PJSIP and launch the demo:

You should see something like this:

Now you can receive incoming calls.

Build PJSIP in Embox

Let’s do the same on Embox. To avoid worrying about memory at first, we will do it on the QEMU emulator.

Embox has a mechanism for using external projects. It lets you set a download link, apply patches if required, and define rules for three stages: configure, build, and install.

To use this mechanism, it is enough to specify ‘script = $(EXTERNAL_MAKE)’ in the ‘@Build’ annotation:

The following Makefile is used to build PJSIP for Embox:

As you can see, these are the same ‘configure’, ‘make dep’, and ‘make’ steps as on Linux. Of course, when configuring, we specify the cross-compilation settings (‘--host’, ‘--target’, ‘CC’, ‘CXX’) for the target platform.

There is one more difference: we pass ‘--with-external-pa’, which tells PJSIP to use external audio drivers (from Embox). The audio drivers in Embox provide a ‘portaudio’ interface, which is also available on Linux.

We also modify the code slightly: the SIP account settings are extracted from the C file into a ‘simple_pjsua_sip_account.inc’ file placed in the configuration folder (‘conf/’). To build the application with a different SIP account, you only need to change this file. Its content stays the same.

We just need to create a Mybuild file with a new Embox command:

Here in the Mybuild file we list the source files (the code is not modified). After that, to include our utility in the final image, it is enough to add the module to the Embox mods.conf:

Run ‘simple_pjsua’ as before on Linux. It can now accept incoming calls.

Launch PJSIP on STM32F769I-Discovery

It remains to switch the Embox configuration from the one for QEMU to the one for the target board, the STM32F769I-Discovery. An Embox configuration consists of several components:

  • A file with compiler flags (build.conf)
  • A linker file that describes the memory segments (lds.conf)
  • A file with the module list, that is, the system description (mods.conf)
  • The ‘simple_pjsua_sip_account.inc’ file with the SIP account settings

The first two items are usually easy to figure out. These are compiler and linker options, and they rarely change for a given board, except perhaps for optimization flags. Most of the work of setting up the final system is done in the third and fourth items.

First, let’s take a look at the Embox configuration. How does it differ from running on Linux? On Linux we had almost unlimited memory and did not care about the number of tasks, memory allocations, and so on. Now we have only 2 MB of ROM and 512 KB of RAM, not counting external memory. Accordingly, we must state how many resources the system needs for our purposes.

For example, PJSIP runs in its own thread. Each new call gets another thread, and one more thread is needed for audio. Thus, even with a single call, we need at least 3 threads. We also want DHCP, which adds one more thread, for a total of 4. This has to be specified in the configuration (mods.conf):

We use fixed-size stacks here, but threads with different stack sizes are also possible; it depends on what each task requires.

Next, we specify the number of required network frames:

Set the heap size (the memory that malloc() allocates from):

The configuration related to the PJSIP remains the same as on QEMU.

Finding out the heap size

The main question when drawing up a configuration is how to choose the parameters. For example, why is the heap 0x3C000 bytes, the number of network packets 28, and the stack size 12 KB? I often use the following approach. The first step is to find out the stack size and the system heap size. The heap size can be explored on Linux with Valgrind's Massif profiler. It takes “snapshots” at certain points in time and shows how much memory was requested by which function.

Launch Valgrind with our application:

After the application finishes, we visualize the data using massif-visualizer:

It can be seen that memory is spent not only on PJSIP but also on the standard library and on ‘libasound’ (ALSA, the host sound system). PJSIP is shown in the bottom red subplot. At the peak it uses 600 KB, and about 320 KB during a call. Our target board has only 512 KB of on-chip RAM, so PJSIP must be configured to reduce its memory consumption.

I prepared the following configuration:

We copied it into the ‘pjlib/include/pj/config_site.h’ file in PJSIP, rebuilt the project, and ran it again. The result:

Now it uses about 300 KB, so it can fit on the board.

I also set the heap size in Embox to 300 KB (the final heap size was later reduced to 240 KB). I made the PJSIP pools debuggable to see whether anything overflows. Pool debugging is enabled in the same ‘pjlib/include/pj/config_site.h’ with the option “#define PJ_POOL_DEBUG 1”.
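For orientation, a ‘config_site.h’ fragment of this kind might look as follows. Only PJ_POOL_DEBUG comes from this article. The other macros are standard PJSIP compile-time settings, and the values chosen for them are assumptions for a phone with one account and one call at a time, not the configuration actually used here:

```c
/* pjlib/include/pj/config_site.h (illustrative fragment, not the original) */

/* From the article: enable debugging of PJSIP memory pools. */
#define PJ_POOL_DEBUG 1

/* Assumed values: a single SIP account and a single simultaneous call. */
#define PJSUA_MAX_ACC   1
#define PJSUA_MAX_CALLS 1

/* Assumed value: compile out log statements above level 3. */
#define PJ_LOG_MAX_LEVEL 3
```

Lowering limits like these shrinks the statically sized tables inside PJSIP; the right values for a given build are best confirmed by profiling again with Massif.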

We also need to configure the thread stacks and the number of network packets (the so-called “skbuf”). It is important to distribute the remaining resources correctly. For example, if there are too few network packets, the sound will “choke”, that is, you will hear only scraps of the incoming audio. If you allocate too many packets, there will not be enough memory for the stacks. The stacks are certainly more important: if a stack overflows, everything is lost.

Therefore, we start with the maximum possible stack size and then reduce it until either the software breaks or the stacks become small enough. If we detect stack corruption, we stop. In addition, we use a separate stack for interrupt handling.
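The article does not say how stack corruption is detected. One common technique, shown here as a sketch and not necessarily what Embox does, is stack painting: fill the stack with a known pattern before the thread starts, then count how many bytes are still untouched. The function names and the fill value are hypothetical, and the code assumes a stack that grows towards lower addresses, as on Cortex-M:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_FILL 0xA5u /* hypothetical fill pattern */

/* Paint the whole stack area before the thread starts running. */
void stack_paint(unsigned char *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL, size);
}

/* Count the bytes at the low end that still hold the pattern.
 * A result of 0 means the stack was used completely and has probably
 * overflowed. */
size_t stack_unused(const unsigned char *stack, size_t size)
{
    size_t n = 0;

    assert(stack != NULL);
    while (n < size && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Running the application through a call and then reading stack_unused() for every thread shows how much each stack can be shrunk while keeping a safety margin.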

After that, we give the remaining resources to the network packets. As mentioned above, this leaves room for 28 packets.

This is enough for ‘simple_pjsua’ to work successfully in 512 KB of internal memory.
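The figures quoted in this section can be cross-checked with a little arithmetic. The sketch below uses only numbers from the text (512 KB of on-chip RAM, a 0x3C000-byte heap, 4 threads with 12 KB stacks); the function names are hypothetical, and the size of a network packet is not given in the article, so it is left out:

```c
#include <assert.h>

#define KIB          1024ul
#define ONCHIP_RAM   (512ul * KIB)  /* internal RAM of the MCU */
#define HEAP_SIZE    0x3C000ul      /* heap size from mods.conf */
#define THREAD_COUNT 4ul            /* PJSIP, call, audio, DHCP */
#define THREAD_STACK (12ul * KIB)   /* fixed stack size per thread */

/* Memory taken by the heap and the thread stacks together. */
unsigned long fixed_budget(void)
{
    return HEAP_SIZE + THREAD_COUNT * THREAD_STACK;
}

/* What is left for network packets, the interrupt stack, and the
 * kernel's and application's static data. */
unsigned long remaining_budget(void)
{
    assert(fixed_budget() <= ONCHIP_RAM);
    return ONCHIP_RAM - fixed_budget();
}
```

The heap is 240 KB and the four stacks add 48 KB, so 288 KB are fixed and 224 KB remain for everything else, including the 28 network packets.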

Adding GUI

After successfully launching the console version, we need to add the GUI. For simplicity, we assume it works as follows. When the application starts, the screen shows an introductory text, for example “PJSIP DEMO”. On an incoming call, the screen displays who is calling, and two buttons with icons appear: “Accept” and “Decline”. If the call is accepted, the conversation starts, the caller's contact information is displayed, and one button remains on the screen: “Hang”. If the call is declined, we simply return to the initial screen with “PJSIP DEMO”.
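The behaviour described above is a small state machine, which can be sketched as follows. All names are hypothetical and independent of PJSIP and Nuklear. The remote-hangup event and the return to the start screen after “Hang” are assumptions, since the text only spells out the decline path:

```c
#include <assert.h>

enum phone_state { PHONE_IDLE, PHONE_INCOMING, PHONE_ACTIVE };

enum phone_event {
    EV_INCOMING_CALL, /* reported by the SIP stack */
    EV_ACCEPT,        /* "Accept" button pressed */
    EV_DECLINE,       /* "Decline" button pressed */
    EV_HANG,          /* "Hang" button pressed */
    EV_REMOTE_HANGUP  /* assumption: the other side ended the call */
};

/* Return the screen state that follows state s after event ev.
 * Events that make no sense in the current state are ignored. */
enum phone_state phone_next(enum phone_state s, enum phone_event ev)
{
    switch (s) {
    case PHONE_IDLE:
        return ev == EV_INCOMING_CALL ? PHONE_INCOMING : PHONE_IDLE;
    case PHONE_INCOMING:
        if (ev == EV_ACCEPT)
            return PHONE_ACTIVE;
        if (ev == EV_DECLINE || ev == EV_REMOTE_HANGUP)
            return PHONE_IDLE;
        return PHONE_INCOMING;
    case PHONE_ACTIVE:
        if (ev == EV_HANG || ev == EV_REMOTE_HANGUP)
            return PHONE_IDLE;
        return PHONE_ACTIVE;
    }
    assert(0 && "unknown phone state");
    return PHONE_IDLE;
}
```

Keeping the transitions in one pure function makes it easy to exercise the same logic on the host and on the board: the SIP callbacks and the button handlers only have to post events.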

Design GUI on Linux

Since Embox already supported Nuklear (a lightweight GUI framework), I decided to use it. Although we already have the console version of the phone running on the microcontroller, it is much easier to develop the UI on Linux, just as we did with the PJSIP settings above.

To do this, let’s take two examples: ‘simple_pjsua’ from PJSIP and ‘demo/x11_rawfb/’ from Nuklear. We need to make them work together under Linux.

The first thing I did was replace the automatic answer with an external event (such as a button press). Then I wrote the UI logic in Nuklear.

During development, it turned out that icons were not drawn inside the buttons. In the picture below you can see the phone icons inside the green and red buttons, but this is the corrected version; initially only white squares were drawn. The problem turned out to be in the ‘rawfb’ plugin implementation. To fix it, I added code that copies the image contents to the correct Nuklear memory region.

By the end of the working day, I had the following:

Since the STM32F769I-Discovery has an 800x480 screen, I set that resolution in Nuklear. The resulting code is as follows:

Launch on the target board (STM32)

It remains to port our application to Embox. To do this, just create a Mybuild file in Embox:

The sources are listed. Icons and fonts are placed in an internal file system and are available as regular read-only files at runtime. We also added dependencies on the pjsip and nuklear libraries.

After running the application on the board, I noticed that the default Nuklear font looks terrible on the board's screen. Some of the letters were simply lost; for example, “1” looked like “|” and “m” looked like “n”. I had to load a font from a TTF file, ‘Roboto-Regular.ttf’. This font takes up about 150 KB of flash memory, but the text is readable.

After checking on Linux how it looks, I tried using two font sizes, 32 and 38, but got a segfault. In the end, I gave up the idea of loading multiple font sizes from the file, loaded only the size-32 font, and scaled it.

Specifics of launch on the board

Let’s return to running on the board. Graphics obviously require video memory. The device that contains the video memory and drives the screen is called a framebuffer. The board has an 800x480 screen, so even with an 8-bit-per-pixel palette we need 800 * 480 * 1 = 384000 bytes (375 KB) of RAM. That is too much to place in internal memory next to PJSIP. But the board has 16 MB of external SDRAM, so let’s use it. With that much memory available, we can use a 32-bit (4 bytes) per pixel format, which requires 800 * 480 * 4 = 1536000 bytes, or about 1.5 MB of RAM.
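The same calculation can be written as a small helper, which is handy when trying other resolutions or pixel formats. This is only a worked example of the arithmetic above; the function name is hypothetical and it assumes the pixel depth is a whole number of bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one full-screen frame. */
size_t framebuffer_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    assert(bits_per_pixel > 0 && bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}
```

For the 800x480 screen this gives 384000 bytes at 8 bits per pixel and 1536000 bytes at 32 bits per pixel, matching the figures in the text.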

We set the framebuffer address to the beginning of SDRAM (0x60000000):

I have already described the flickering that occurs with a single buffer in another article. Therefore, the system uses double buffering, which needs an additional 1.5 MB of memory. In addition, the fonts require another 256 KB. That is about 1.75 MB in total, so we increase the heap by 2 MB. We also place it in external memory:
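The idea behind double buffering can be sketched as follows: the GUI always draws into a back buffer that is not on screen, and the two buffers change roles only once the frame is complete. All names are hypothetical. How the new front buffer is handed to the display controller is specific to Embox and the hardware, and is not shown:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct double_fb {
    uint32_t *front;  /* frame currently shown on the screen */
    uint32_t *back;   /* frame the GUI is drawing into */
    size_t    pixels; /* number of 32-bit pixels in each buffer */
};

void dfb_init(struct double_fb *fb, uint32_t *first, uint32_t *second,
              size_t pixels)
{
    assert(fb != NULL && first != NULL && second != NULL && first != second);
    fb->front = first;
    fb->back = second;
    fb->pixels = pixels;
}

/* Draw a frame: here simply a solid colour, written to the back buffer only. */
void dfb_fill_back(struct double_fb *fb, uint32_t color)
{
    for (size_t i = 0; i < fb->pixels; i++) {
        fb->back[i] = color;
    }
}

/* Present the finished frame by exchanging the two buffers. */
void dfb_swap(struct double_fb *fb)
{
    uint32_t *tmp = fb->front;

    fb->front = fb->back;
    fb->back = tmp;
}
```

Because the visible buffer is never written while it is displayed, a half-drawn frame cannot appear on the screen, at the cost of a second full-size buffer.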

There are now two heaps in the system: one in internal memory for PJSIP and another in SDRAM for graphics. This helps us: if we merged these heaps into one in SDRAM, performance would drop. SDRAM works very well with sequential accesses, but when the framebuffer is being accessed in parallel with continuous audio processing, the SDRAM speed may be insufficient.

Also, add an input device (the touchscreen):

The touchscreen will be available as ‘/dev/stm32-ts’ in the ‘devfs’ filesystem in Embox, so it can be accessed through the usual open()/read()/… calls.

For good performance, we also needed to enable the caches. We described how caches work in a separate article.

The network packet descriptors and data, as well as the audio buffers that DMA operates on, are located in a special memory section marked as non-cacheable in the MPU. This keeps the objects in this memory coherent between the CPU and DMA.
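Reading from such a device file uses the same POSIX calls as reading any other file. The sketch below is minimal; the function name is hypothetical, and the layout of the touchscreen events is Embox-specific and not covered in this article, so the data is returned as raw bytes:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Open the device (or any file), read up to len bytes, and close it.
 * Returns the number of bytes read, or -1 if the file cannot be opened
 * or read. */
ssize_t read_input_bytes(const char *path, void *buf, size_t len)
{
    int fd;
    ssize_t n;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    n = read(fd, buf, len);
    close(fd);
    return n;
}
```

On the board the path would be ‘/dev/stm32-ts’; on the host, any regular file can stand in for the device while the GUI code is being developed. A real application would keep the descriptor open and read events in a loop.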

The memory distribution looks like this:

As a result, we get a SIP phone with a GUI that works quite well.

Development process

My development process can be represented as follows.

And it took very little time: one day for the application on Linux and one more day to polish it on the target platform. Yes, Embox already had display, network, and audio drivers for this board. But developing those also takes little time, no more than a week per driver. Developing such features directly on the board takes much longer. In our case, most of the functionality was developed in the convenient host environment, and that is what allowed us to reduce the development time significantly.

You can see the results in this short video