Arduino Due: S1V30120 text to speech

Making your Arduino talk? It can be done with true text-to-speech using the Epson S1V30120 IC, mounted here on a text-to-speech click board from MikroElektronika. This implementation uses SPI communication plus several control pins: a reset pin, a mute pin that silences the output amplifier, and a data-ready pin that the S1V30120 uses to signal that it has data to send to the host.

It might look simple, but it's actually quite complicated, as the S1V30120 uses a proprietary messaging protocol over SPI. To make matters worse, the S1V30120 must be initialized after every reset, which includes uploading a large firmware file. The current firmware file is just under 32 KB, which limits the text-to-speech board to Arduino boards with enough memory to store it, such as the Arduino Due and the Arduino M0.

In this post I used a Flip & Click board, a derivative of the Arduino Due with four mikroBUS sockets matching the click board format. The text-to-speech click is placed in socket #1.

At the moment there isn't any Arduino library for the S1V30120, so I had to start everything from scratch. All I had was the code example provided by MikroElektronika, but porting that code from MikroC to Arduino proved to be a real pain in the well-you-know. In the end I went for my own implementation, based on the S1V30120 Message Protocol Specification.

The result is just a code example implementing basic text-to-speech functionality. There's no ADPCM support, and the messaging protocol is only partially implemented. Even with these limitations, it can speak. Only the English male voice is used, but the code can easily be altered for other voices. The speaking rate is also fixed; a function to change it will be provided in a future version of the code. I also plan to turn this code into a library, but there's a lot of work to do before that happens…

S1V30120: boot mode

Once the S1V30120 is reset, it enters boot mode. Only a limited command set is available in this mode, and the firmware file must be uploaded before proceeding. The workflow in this mode is as follows:

  • The host issues an ISC_VERSION_REQ message
  • The S1V30120 responds with ISC_VERSION_RESP
  • The firmware file is loaded in chunks of data, each at most 2048 bytes long including the header. Each chunk is sent in an ISC_BOOT_LOAD_REQ message.
  • After receiving each chunk of data, the S1V30120 responds with an ISC_BOOT_LOAD_RESP message
  • After the whole firmware file is loaded, the host sends the ISC_BOOT_RUN_REQ message
  • The S1V30120 responds with ISC_BOOT_RUN_RESP
  • If no error codes are received, we wait about 150 ms for the firmware to start executing
  • Now we can send the ISC_VERSION_REQ message again
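The chunking arithmetic of the boot-load step is easy to check off-target. Below is a minimal host-side C++ sketch, assuming the 4-byte header described in this post (message length, then the ISC_BOOT_LOAD_REQ code 0x1000, both LSB first) and a 2048-byte limit per message. The names `ChunkPlan`, `plan_chunks` and `boot_load_header` are hypothetical helpers for illustration, not part of any Epson or Arduino API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Limits taken from the boot-mode description.
constexpr std::size_t kMaxMessage = 2048;                      // header included
constexpr std::size_t kHeaderLen = 4;                          // length (2) + code (2)
constexpr std::size_t kMaxPayload = kMaxMessage - kHeaderLen;  // 2044 image bytes
constexpr std::uint16_t kIscBootLoadReq = 0x1000;              // ISC_BOOT_LOAD_REQ

// Hypothetical helper type: how an image splits into boot-load chunks.
struct ChunkPlan {
    std::size_t full_chunks;  // chunks carrying exactly kMaxPayload bytes
    std::size_t remaining;    // size of the final, shorter chunk (0 = none needed)
};

ChunkPlan plan_chunks(std::size_t image_len) {
    return ChunkPlan{image_len / kMaxPayload, image_len % kMaxPayload};
}

// Hypothetical helper: the 4-byte header that precedes one chunk.
// The length field counts the header itself; both fields go LSB first.
std::vector<std::uint8_t> boot_load_header(std::size_t payload_len) {
    assert(payload_len <= kMaxPayload);
    const std::size_t msg_len = payload_len + kHeaderLen;
    return {
        static_cast<std::uint8_t>(msg_len & 0xFF),
        static_cast<std::uint8_t>((msg_len >> 8) & 0xFF),
        static_cast<std::uint8_t>(kIscBootLoadReq & 0xFF),
        static_cast<std::uint8_t>((kIscBootLoadReq >> 8) & 0xFF),
    };
}
```

For the 31208-byte image mentioned in the listing's comments, `plan_chunks(31208)` gives 15 full chunks (15 × 2044 = 30660 bytes) plus a final chunk of 548 bytes, and every full chunk starts with the header bytes 0x00 0x08 0x00 0x10 (length 0x0800 = 2048).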

Each message requires padding to flush the receive and send channels. Normally this means sending 16 padding bytes, while the ISC_BOOT_RUN_RESP message requires only 8. The S1V30120 runs in full-duplex mode, so it can send data to the host while still receiving padding bytes. Since most messages issued by the S1V30120 are 6 bytes long and have to be followed by another 16 padding bytes, I made a simplifying assumption: I send the padding only after received messages.
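To make the receive-then-pad sequence concrete, here is a small host-side C++ sketch that replaces the SPI bus with a mock. It skips incoming bytes until the 0xAA start-of-message marker, reads a fixed-length message, then clocks out the padding bytes. `MockSpi` and `receive_with_padding` are hypothetical names, and treating an idle line as 0x00 is an assumption of the mock, not a statement about the chip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for the SPI bus: each transfer() clocks one byte out
// from the host and returns the byte clocked in from the S1V30120.
class MockSpi {
public:
    explicit MockSpi(std::vector<std::uint8_t> incoming)
        : incoming_(incoming.begin(), incoming.end()) {}

    std::uint8_t transfer(std::uint8_t out) {
        sent_.push_back(out);
        if (incoming_.empty()) return 0x00;  // idle line reads as zero (mock assumption)
        std::uint8_t in = incoming_.front();
        incoming_.pop_front();
        return in;
    }

    std::size_t bytes_clocked() const { return sent_.size(); }

private:
    std::deque<std::uint8_t> incoming_;
    std::vector<std::uint8_t> sent_;
};

// Receive one message: wait for 0xAA, read msg_len bytes, then send
// padding_bytes zeros (16 normally, 8 after ISC_BOOT_RUN_RESP).
template <typename Spi>
std::vector<std::uint8_t> receive_with_padding(Spi& spi, std::size_t msg_len,
                                               std::size_t padding_bytes) {
    while (spi.transfer(0x00) != 0xAA) {
        // discard filler bytes until the start-of-message marker
    }
    std::vector<std::uint8_t> msg(msg_len);
    for (auto& b : msg) b = spi.transfer(0x00);
    for (std::size_t i = 0; i < padding_bytes; ++i) spi.transfer(0x00);
    return msg;
}
```

Like the listing, this loops forever if 0xAA never arrives; a real driver would probably add a timeout.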

Each message consists of a start-of-message command (0xAA sent over the SPI bus), followed by a four-byte header and the message payload. The first two bytes of the header hold the message length (header plus payload), and the last two bytes hold the message code; both fields are sent LSB first.
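The framing rules above fit in a few lines. This is a minimal host-side C++ sketch, assuming exactly the layout described here: 0xAA, then a little-endian length that counts the 4 header bytes plus the payload (but not the 0xAA), then a little-endian message code. `frame_request`, `Response` and `decode_response` are hypothetical helpers, not an Epson API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kStartMessage = 0xAA;

// Hypothetical helper: the bytes clocked onto MOSI for one request.
// Layout: 0xAA, length (LSB, MSB), code (LSB, MSB), payload.
std::vector<std::uint8_t> frame_request(std::uint16_t code,
                                        const std::vector<std::uint8_t>& payload) {
    const std::uint16_t len = static_cast<std::uint16_t>(4 + payload.size());
    std::vector<std::uint8_t> out = {
        kStartMessage,
        static_cast<std::uint8_t>(len & 0xFF), static_cast<std::uint8_t>(len >> 8),
        static_cast<std::uint8_t>(code & 0xFF), static_cast<std::uint8_t>(code >> 8)};
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Hypothetical decoded view of a typical 6-byte response.
struct Response {
    std::uint16_t length;
    std::uint16_t code;
    std::uint16_t result;
};

// bytes[] excludes the leading 0xAA, like rcvd_msg in the listing.
Response decode_response(const std::uint8_t bytes[6]) {
    auto le16 = [&](int i) {
        return static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
    };
    return {le16(0), le16(2), le16(4)};
}
```

For example, `frame_request(0x0005, {})` yields 0xAA 0x04 0x00 0x05 0x00, the same ISC_VERSION_REQ bytes the listing sends, and a speak request for "Hi" with flush byte 0 and a terminating null has length 8, matching the `txt_len + 6` rule in `S1V30120_speech()`.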

Main mode

Once the boot image is running, we are in main mode. The S1V30120 must be initialized before we can get any speech:

  • The S1V30120 must be registered with the host. To do this, the host sends an ISC_TEST_REQ message, and the S1V30120 responds with ISC_TEST_RESP.
  • We send the ISC_VERSION_REQ message once again
  • The S1V30120 responds with ISC_VERSION_RESP, which reports the current firmware version and firmware features
  • Audio output settings are configured via an ISC_AUDIO_CONFIG_REQ message. The response is ISC_AUDIO_CONFIG_RESP
  • Audio volume is configured using ISC_AUDIO_VOLUME_REQ, the response being ISC_AUDIO_VOLUME_RESP
  • Text-To-Speech parameters are set via ISC_TTS_CONFIG_REQ, with the response being ISC_TTS_CONFIG_RESP
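The listing implements each of these steps as its own function. As an alternative design (not the one used in the listing), the sequence can be described as a table and run by one loop that stops at the first unexpected reply. This is a host-side C++ sketch: `InitStep`, `Reply`, `Transport` and `run_init` are hypothetical names, the codes come from the header file in this post, and 0x000B for ISC_AUDIO_VOLUME_RESP is an assumption based on the "request code + 1" pattern the other pairs follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// One main-mode initialization step: request sent and reply expected.
struct InitStep {
    const char* name;
    std::uint16_t request;
    std::uint16_t expected_response;
    bool check_result;  // ISC_VERSION_RESP carries version data, not a result code
    std::uint16_t expected_result;
};

// 0x000B for ISC_AUDIO_VOLUME_RESP is assumed (request code + 1).
const std::vector<InitStep> kMainModeInit = {
    {"ISC_TEST_REQ", 0x0003, 0x0004, true, 0x0000},
    {"ISC_VERSION_REQ", 0x0005, 0x0006, false, 0x0000},
    {"ISC_AUDIO_CONFIG_REQ", 0x0008, 0x0009, true, 0x0000},
    {"ISC_AUDIO_VOLUME_REQ", 0x000A, 0x000B, true, 0x0000},
    {"ISC_TTS_CONFIG_REQ", 0x0012, 0x0013, true, 0x0000},
};

// Decoded reply as returned by a (hypothetical) transport function.
struct Reply {
    std::uint16_t code;
    std::uint16_t result;
};
using Transport = std::function<Reply(std::uint16_t request)>;

// Runs the steps in order; returns -1 on success, else the failing step index.
int run_init(const std::vector<InitStep>& steps, const Transport& send) {
    for (std::size_t i = 0; i < steps.size(); ++i) {
        const InitStep& s = steps[i];
        const Reply r = send(s.request);
        if (r.code != s.expected_response ||
            (s.check_result && r.result != s.expected_result)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

On the Arduino, the transport would wrap the send-message and parse-response routines; on a PC it can be a lambda, which makes it easy to check that, for example, an ISC_MSG_BLOCKED_RESP (0x0007) to the audio configuration request is reported as a failure at step 2.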

We also unmute the audio amplifier by setting the S1V30120_MUTE line low (on the Flip & Click this is pin 54).

Now we are ready to do some speaking. To speak, we send the text as a null-terminated string in an ISC_TTS_SPEAK_REQ message, which the chip acknowledges with an ISC_TTS_SPEAK_RESP message. Besides this acknowledgment, the text-to-speech click also sends indication messages while it's speaking. In particular, the code waits for an ISC_TTS_FINISHED_IND indication, which means that speech synthesis has finished and the chip is ready to receive new text. In my implementation this waiting loop is blocking.
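If the blocking wait becomes a problem, one option (a swapped-in design, not what the listing does) is a small state machine that is fed each decoded message as it arrives, so that loop() can keep running while the chip speaks. This is a host-side C++ sketch; `SpeakTracker` is a hypothetical name, and the codes are the ISC_TTS_SPEAK_RESP, ISC_TTS_FINISHED_IND and ISC_ERROR_IND values from the header file in this post.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t kTtsSpeakResp = 0x0015;    // ISC_TTS_SPEAK_RESP
constexpr std::uint16_t kTtsFinishedInd = 0x0021;  // ISC_TTS_FINISHED_IND
constexpr std::uint16_t kErrorInd = 0x0000;        // ISC_ERROR_IND

// Hypothetical non-blocking tracker for one ISC_TTS_SPEAK_REQ.
class SpeakTracker {
public:
    enum class State { WaitingAck, Speaking, Finished, Failed };

    State state() const { return state_; }

    // Feed every decoded message (code, result) received after the request.
    void on_message(std::uint16_t code, std::uint16_t result) {
        switch (state_) {
        case State::WaitingAck:
            if (code == kTtsSpeakResp) {
                state_ = (result == 0x0000) ? State::Speaking : State::Failed;
            } else if (code == kErrorInd) {
                state_ = State::Failed;
            }
            break;
        case State::Speaking:
            if (code == kTtsFinishedInd) {
                state_ = State::Finished;  // ready for new text
            } else if (code == kErrorInd) {
                state_ = State::Failed;
            }
            // any other indication is ignored
            break;
        default:
            break;  // Finished and Failed are terminal
        }
    }

private:
    State state_ = State::WaitingAck;
};
```

A rejected ISC_TTS_SPEAK_RESP moves the tracker straight to Failed, so the code never waits for an ISC_TTS_FINISHED_IND that will not come.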

Text to speech Arduino code

And now we come to the code listing:

#include <SPI.h>
#include <string.h>
#include "S1V30120_defines.h"
#include "text_to_speech_img.h"

#define S1V30120_RST    33
#define S1V30120_RDY    26
#define S1V30120_CS     77
#define S1V30120_MUTE   54

String mytext = "Success! Look at me, I can speak. I'm the best!";

// Variables
// Most received messages are 6 bytes
unsigned char rcvd_msg[20] = {0};

// Educated guess
// “Over the whole document, make the average sentence length
// 15-20 words, 25-33 syllables and 75-100 characters.”
static volatile char send_msg[200] = {0};
static volatile unsigned short msg_len;
static volatile unsigned short txt_len;

unsigned short tmp;
bool success;

// Index into the boot image data, advanced by S1V30120_load_chunk()
// as each byte is sent.
// Note: unsigned short is max 65535, while our image data is 31208 in length;
// change this to unsigned long if future image data becomes larger
static volatile unsigned short TTS_DATA_IDX;

void setup() {
  //Pin settings
  pinMode(S1V30120_RST, OUTPUT);
  pinMode(S1V30120_RDY, INPUT);
  pinMode(S1V30120_CS, OUTPUT);
  pinMode(S1V30120_MUTE, OUTPUT);

  // Unmute
  digitalWrite(S1V30120_MUTE, LOW);

  // for debugging
  Serial.begin(9600);

  SPI.begin();
  S1V30120_reset();

  tmp = S1V30120_get_version();
  if (tmp == 0x0402)
  {
    Serial.println("S1V30120 found. Downloading boot image!");
  }
  success = S1V30120_download();
  Serial.print("Boot image download: ");
  show_response(success);
  success = S1V30120_boot_run();
  Serial.print("Boot image run: ");
  show_response(success);
  delay(150); // Wait for the boot image to execute
  Serial.print("Registering: ");
  success = S1V30120_registration();
  show_response(success);
  // Once again print version information
  S1V30120_get_version();
  success = S1V30120_configure_audio();
  Serial.print("Configuring audio: ");
  show_response(success);
  success = S1V30120_set_volume();
  Serial.print("Setting volume: ");
  show_response(success);

  success = S1V30120_configure_tts();
  Serial.print("Configure TTS: ");
  show_response(success);

  success = S1V30120_speech(mytext, 0);
  Serial.print("Speaking1: ");
  show_response(success);

  success = S1V30120_speech("test", 0);
  Serial.print("Speaking2: ");
  show_response(success);

  success = S1V30120_speech("2", 0);
  Serial.print("Speaking3: ");
  show_response(success);
}

void loop() {
  // put your main code here, to run repeatedly:
}

// This function resets the S1V30120 chip
void S1V30120_reset(void)
{
  digitalWrite(S1V30120_CS, HIGH); // S1V30120 not selected
  digitalWrite(S1V30120_RST, LOW);
  // send one dummy byte, this will leave the clock line high
  SPI.beginTransaction(SPISettings(750000, MSBFIRST, SPI_MODE3));
  SPI.transfer(0x00);
  SPI.endTransaction();
  delay(5);                        // keep reset asserted briefly
  digitalWrite(S1V30120_RST, HIGH);
  delay(150);                      // give the chip time to enter boot mode
}

unsigned short S1V30120_get_version(void)
{
    // Query version
    unsigned short S1V30120_version = 0;
    // ISC_VERSION_REQ: length 0x0004, code 0x0005, both LSB first
    char msg_ver[] = {0x04, 0x00, 0x05, 0x00};
    S1V30120_send_message(msg_ver, 0x04);

    //wait for ready signal
    while(digitalRead(S1V30120_RDY) == 0);

    // receive 20 bytes
    SPI.beginTransaction(SPISettings(750000, MSBFIRST, SPI_MODE3));
    digitalWrite(S1V30120_CS, LOW);
    // wait for message start
    while(SPI.transfer(0x00) != 0xAA);
    for (int i = 0; i < 20; i++)
    {
      rcvd_msg[i] = SPI.transfer(0x00);
    }
    // Send 16 bytes padding
    S1V30120_send_padding(16);
    digitalWrite(S1V30120_CS, HIGH);
    SPI.endTransaction();
    S1V30120_version = rcvd_msg[4] << 8 | rcvd_msg[5];
    Serial.print("HW version ");
    Serial.print(rcvd_msg[4], HEX);
    Serial.print(".");
    Serial.println(rcvd_msg[5], HEX);
    Serial.print("Firmware version ");
    Serial.print(rcvd_msg[6], HEX);
    Serial.print(".");
    Serial.println(rcvd_msg[7], HEX);
    Serial.print("Firmware features ");
    Serial.println(((unsigned long)rcvd_msg[11] << 24) | ((unsigned long)rcvd_msg[10] << 16) | (rcvd_msg[9] << 8) | rcvd_msg[8], HEX);
    Serial.print("Firmware extended features ");
    Serial.println(((unsigned long)rcvd_msg[15] << 24) | ((unsigned long)rcvd_msg[14] << 16) | (rcvd_msg[13] << 8) | rcvd_msg[12], HEX);
    return S1V30120_version;
}

bool S1V30120_download(void)
{
   // TTS_INIT_DATA is of unsigned char type (one byte)
   unsigned short len = sizeof (TTS_INIT_DATA);
   unsigned short fullchunks;
   unsigned short remaining;
   bool chunk_result;
   Serial.print("TTS_INIT_DATA length is ");
   Serial.println(len);
   // We are loading chunks of data
   // Each chunk, including header must be of maximum 2048 bytes
   // as the header is 4 bytes, this leaves 2044 bytes to load each time
   // Computing number of chunks
   fullchunks = len / 2044;
   remaining = len - fullchunks * 2044;
   Serial.print("Full chunks to load: ");
   Serial.println(fullchunks);
   Serial.print("Remaining bytes: ");
   Serial.println(remaining);
   // Start from the beginning of the image
   TTS_DATA_IDX = 0;
   // Load the full chunks of data
   for (int num_chunks = 0; num_chunks < fullchunks; num_chunks++)
   {
     chunk_result = S1V30120_load_chunk(2044);
     if (!chunk_result)
     {
       Serial.print("Failed at chunk ");
       Serial.println(num_chunks);
       return 0;
     }
   }
   // Now load the last, partial chunk of data (if any)
   if (remaining > 0)
   {
     chunk_result = S1V30120_load_chunk(remaining);
     if (!chunk_result)
     {
       Serial.println("Failed at last chunk ");
       return 0;
     }
   }
   // All was OK, returning 1
   return 1;
}

bool S1V30120_load_chunk(unsigned short chunk_len)
{
  // Message length = chunk data + 4 header bytes
  char len_msb = ((chunk_len + 4) & 0xFF00) >> 8;
  char len_lsb = (chunk_len + 4) & 0xFF;
  SPI.beginTransaction(SPISettings(750000, MSBFIRST, SPI_MODE3));
  digitalWrite(S1V30120_CS, LOW);
  SPI.transfer(0xAA);     // Start Message Command
  SPI.transfer(len_lsb);  // Message length, LSB first
  SPI.transfer(len_msb);  // (0x0800 = 2048 bytes for a full chunk)
  SPI.transfer(0x00);     // ISC_BOOT_LOAD_REQ (0x1000), LSB first
  SPI.transfer(0x10);
  for (int chunk_idx = 0; chunk_idx < chunk_len; chunk_idx++)
  {
    SPI.transfer(TTS_INIT_DATA[TTS_DATA_IDX]);
    TTS_DATA_IDX++;
  }
  digitalWrite(S1V30120_CS, HIGH);
  SPI.endTransaction();
  return S1V30120_parse_response(ISC_BOOT_LOAD_RESP, 0x0001, 16);
}

bool S1V30120_boot_run(void)
{
    // ISC_BOOT_RUN_REQ: length 0x0004, code 0x1002, both LSB first
    char boot_run_msg[] = {0x04, 0x00, 0x02, 0x10};
    S1V30120_send_message(boot_run_msg, 0x04);
    return S1V30120_parse_response(ISC_BOOT_RUN_RESP, 0x0001, 8);
}

void show_response(bool response)
{
  if (response)
  {
    Serial.println("OK!");
  }
  else
  {
    Serial.println("Failed. System halted!");
    while (true);
  }
}

bool S1V30120_registration(void)
{
  // ISC_TEST_REQ: length 0x000C, code 0x0003, followed by an 8-byte payload
  char reg_code[] = {0x0C, 0x00, 0x03, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
  S1V30120_send_message(reg_code, 0x0C);
  return S1V30120_parse_response(ISC_TEST_RESP, 0x0000, 16);
}

// Message parser
// This function receives as parameters the expected response code and result
// and returns 1 if the expected result is received, 0 otherwise.
// As an observation, most messages are 6 bytes in length
// (2 bytes length + 2 bytes response code + 2 bytes response)
bool S1V30120_parse_response(unsigned short expected_message, unsigned short expected_result, unsigned short padding_bytes)
{
    unsigned short rcvd_tmp;

    //wait for ready signal
    while(digitalRead(S1V30120_RDY) == 0);

    // receive 6 bytes
    SPI.beginTransaction(SPISettings(750000, MSBFIRST, SPI_MODE3));
    digitalWrite(S1V30120_CS, LOW);
    // wait for message start
    while(SPI.transfer(0x00) != 0xAA);
    for (int i = 0; i < 6; i++)
    {
      rcvd_msg[i] = SPI.transfer(0x00);
    }
    // padding bytes
    S1V30120_send_padding(padding_bytes);
    digitalWrite(S1V30120_CS, HIGH);
    SPI.endTransaction();
    // Are we successful? We shall check
    rcvd_tmp = rcvd_msg[3] << 8 | rcvd_msg[2];
    if (rcvd_tmp == expected_message) // Have we received the expected message?
    {
       // We check the response
       rcvd_tmp = rcvd_msg[5] << 8 | rcvd_msg[4];
       return (rcvd_tmp == expected_result); // 1 on success
    }
    // We received something else
    return 0;
}

// Padding function
// Sends num_padding_bytes zero bytes over the SPI bus
// (call it inside an open SPI transaction)
void S1V30120_send_padding(unsigned short num_padding_bytes)
{
  for (int i = 0; i < num_padding_bytes; i++)
  {
    SPI.transfer(0x00);
  }
}

// Functions that run in normal mode

void S1V30120_send_message(volatile char message[], unsigned short message_length)
{
  // Check to see if there's an incoming response or indication
  while(digitalRead(S1V30120_RDY) == 1);  // blocking
  // OK, we can proceed
  SPI.beginTransaction(SPISettings(750000, MSBFIRST, SPI_MODE3));
  digitalWrite(S1V30120_CS, LOW);
  SPI.transfer(0xAA);  // Start Message Command
  for (int i = 0; i < message_length; i++)
  {
    SPI.transfer(message[i]);
  }
  digitalWrite(S1V30120_CS, HIGH);
  SPI.endTransaction();
}

bool S1V30120_configure_audio(void)
{
  msg_len = 0x0C;
  send_msg[0] = msg_len & 0xFF;          // LSB of msg len
  send_msg[1] = (msg_len & 0xFF00) >> 8; // MSB of msg len
  send_msg[2] = ISC_AUDIO_CONFIG_REQ & 0xFF;
  send_msg[3] = (ISC_AUDIO_CONFIG_REQ & 0xFF00) >> 8;
  send_msg[4] = TTS_AUDIO_CONF_AS;
  send_msg[5] = TTS_AUDIO_CONF_AG;
  send_msg[6] = TTS_AUDIO_CONF_AMP;
  send_msg[7] = TTS_AUDIO_CONF_ASR;
  send_msg[8] = TTS_AUDIO_CONF_AR;
  send_msg[9] = TTS_AUDIO_CONF_ATC;
  send_msg[10] = TTS_AUDIO_CONF_ACS;
  send_msg[11] = TTS_AUDIO_CONF_DC;
  S1V30120_send_message(send_msg, msg_len);
  return S1V30120_parse_response(ISC_AUDIO_CONFIG_RESP, 0x0000, 16);
}

// set gain to 0 dB
bool S1V30120_set_volume(void)
{
  // ISC_AUDIO_VOLUME_REQ: length 0x0006, code 0x000A, volume payload 0x0000
  char setvol_code[] = {0x06, 0x00, 0x0A, 0x00, 0x00, 0x00};
  S1V30120_send_message(setvol_code, 0x06);
  return S1V30120_parse_response(ISC_AUDIO_VOLUME_RESP, 0x0000, 16);
}

bool S1V30120_configure_tts(void)
{
  msg_len = 0x0C;
  send_msg[0] = msg_len & 0xFF;          // LSB of msg len
  send_msg[1] = (msg_len & 0xFF00) >> 8; // MSB of msg len
  send_msg[2] = ISC_TTS_CONFIG_REQ & 0xFF;
  send_msg[3] = (ISC_TTS_CONFIG_REQ & 0xFF00) >> 8;
  send_msg[4] = ISC_TTS_SAMPLE_RATE;
  send_msg[5] = ISC_TTS_VOICE;
  send_msg[6] = ISC_TTS_EPSON_PARSE;
  send_msg[7] = ISC_TTS_LANGUAGE;
  send_msg[8] = ISC_TTS_SPEAK_RATE_LSB;
  send_msg[9] = ISC_TTS_SPEAK_RATE_MSB;
  send_msg[10] = ISC_TTS_DATASOURCE;
  send_msg[11] = 0x00;
  S1V30120_send_message(send_msg, msg_len);
  return S1V30120_parse_response(ISC_TTS_CONFIG_RESP, 0x0000, 16);
}

bool S1V30120_speech(String text_to_speech, unsigned char flush_enable)
{
  bool response;
  txt_len = text_to_speech.length();
  // 4 header bytes + 1 flush byte + text + terminating null
  msg_len = txt_len + 6;
  if (msg_len > sizeof(send_msg))
  {
    return 0; // text too long for the send buffer
  }
  send_msg[0] = msg_len & 0xFF;          // LSB of msg len
  send_msg[1] = (msg_len & 0xFF00) >> 8; // MSB of msg len
  send_msg[2] = ISC_TTS_SPEAK_REQ & 0xFF;
  send_msg[3] = (ISC_TTS_SPEAK_REQ & 0xFF00) >> 8;
  send_msg[4] = flush_enable; // flush control
  for (int i = 0; i < txt_len; i++)
  {
     send_msg[i+5] = text_to_speech[i];
  }
  send_msg[msg_len-1] = '\0'; // null character
  S1V30120_send_message(send_msg, msg_len);
  response = S1V30120_parse_response(ISC_TTS_SPEAK_RESP, 0x0000, 16);
  if (response)
  {
    // wait until speech synthesis is finished (blocking)
    while (!S1V30120_parse_response(ISC_TTS_FINISHED_IND, 0x0000, 16));
  }
  return response;
}

To improve the readability of the code, I moved some #defines into a separate header file, S1V30120_defines.h:

// Defines parameters for S1V30120

// Commands
// Boot mode
#define ISC_VERSION_REQ			0x0005
#define ISC_BOOT_LOAD_REQ		0x1000
#define ISC_BOOT_RUN_REQ 		0x1002
#define ISC_TEST_REQ 			0x0003

// Normal (run) mode

#define ISC_AUDIO_CONFIG_REQ 	0x0008
#define ISC_AUDIO_VOLUME_REQ	0x000A
#define ISC_AUDIO_MUTE_REQ		0x000C

#define ISC_TTS_CONFIG_REQ		0x0012
//11 kHz
#define ISC_TTS_SAMPLE_RATE		0x01
#define ISC_TTS_VOICE			0x00
#define ISC_TTS_EPSON_PARSE  	0x01
#define ISC_TTS_LANGUAGE  		0x00
// 200 words/min
#define ISC_TTS_SPEAK_RATE_LSB	0xC8
#define ISC_TTS_SPEAK_RATE_MSB	0x00
#define ISC_TTS_DATASOURCE		0x00

#define ISC_TTS_SPEAK_REQ 		0x0014

// Response messages
// Boot mode
#define ISC_VERSION_RESP		0x0006
#define ISC_BOOT_LOAD_RESP		0x1001
#define ISC_BOOT_RUN_RESP 		0x1003
#define ISC_TEST_RESP 			0x0004

// Normal (run) mode

#define ISC_AUDIO_CONFIG_RESP   0x0009
#define ISC_AUDIO_VOLUME_RESP	0x000B
#define ISC_AUDIO_MUTE_RESP		0x000D

#define ISC_TTS_CONFIG_RESP		0x0013

#define ISC_TTS_SPEAK_RESP 		0x0015

// Fatal error indication
#define ISC_ERROR_IND			0x0000

// Request blocked
#define ISC_MSG_BLOCKED_RESP	0x0007

#define ISC_TTS_FINISHED_IND	0x0021

// Parameters

// Audio config
// See page 42 in S1V30120 Message Protocol Specification

// MONO = 0x00, all other values = reserved
#define TTS_AUDIO_CONF_AS 	0x00

// Audio gain = +18 dB
#define TTS_AUDIO_CONF_AG 	0x43

// Audio amp not selected
#define TTS_AUDIO_CONF_AMP	0x00

// Sample rate 11kHz
#define TTS_AUDIO_CONF_ASR 	0x01

// Audio routing: application to DAC
#define TTS_AUDIO_CONF_AR 	0x00

// Audio tone control: deprecated, set to 0
#define TTS_AUDIO_CONF_ATC 	0x00

// Audio click source: internal, set to 0
#define TTS_AUDIO_CONF_ACS 	0x00

// DAC is on only while speech decoder
// or TTS synthesis is outputting audio
#define TTS_AUDIO_CONF_DC 	0x00

// TTS Config

Finally, the firmware image file (text_to_speech_img.h) is the one provided in the MikroElektronika example. The complete code can be downloaded here.

A few final thoughts

This is a work in progress! I would be very happy to hear about any problems you encounter with this code so I can fix them.

Turning this code into a library? Perhaps… Once all the code issues are solved, I will do this. Until then it's just plain code (but at least it should work).

Using an Arduino Uno? The only issue is the big firmware file. I'm thinking of using an SD shield and putting that file on an SD card…

Wishlist? Unimplemented features? Your opinion counts! Don’t be afraid to use the comments section…
