Topic: Suggested refinements « The Mind of Bill Porter

This topic has 9 replies, 2 voices, and was last updated 11 years, 1 month ago by Michael P. Flaga.

Viewing 10 posts - 1 through 10 (of 10 total)

Author

Posts
June 15, 2013 at 8:40 pm #2648

Anonymous
Inactive

I’ve just started looking through this library and have some comments / suggestions regarding SFEMP3Shield::refill():

1- this method assumes that all files are always a multiple of 32 bytes thus potentially truncating up to 31 bytes of a file. Probably doesn’t really matter but not strictly correct;

2- I was wondering whether it might be slightly more efficient to read more than 32 bytes from the file each time; let’s say 128 bytes. Thus each time the DAC needs another 32 bytes we first check our local cache and only when it’s empty do we grab another 128 bytes from file. It might free up a little CPU time;

3- do we need to call SPI.setClockDivider(spiRate) every time we send data? (Called from SFEMP3Shield::dcs_low());

4- the VS1053 has a 2kbyte buffer. I therefore do not think that it is necessary to respond to every request for 32 bytes. Maybe it would be best to ignore a few interrupts and only transfer say, 128 byte chunks. Not sure of the best way to do that. Probably need to set a timer every time an interrupt signal is received. When that timer expires we would then send 128 bytes though for this to work we’d need to know the playback rate for the track. Maybe this suggestion would be worse as it would mean that when we did transfer data the CPU would take much longer transferring that lump of data. Maybe several small jobs would be better than fewer big jobs. It all depends on the overheads of the current version of refill();

5- the Arduino Due can transfer more than one byte at a time over SPI. We should definitely make use of that capability.

I make the above comments simply because I believe that the refill method needs to be as fast as possible so that the processor can be available to do other tasks if required.
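Suggestion 2 could be sketched roughly as below. This is only an illustration, not code from the library: `ChunkCache` and the `ReadFn` callback (a stand-in for the file read) are hypothetical names. It also returns a short final slice instead of dropping it, which touches on point 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>

constexpr std::size_t kSliceSize = 32;   // bytes handed to the decoder per request
constexpr std::size_t kCacheSize = 128;  // bytes fetched from the file per read

// Hypothetical helper, not part of SFEMP3Shield.
class ChunkCache {
public:
    // Stand-in for the file read: fills dst with up to n bytes and
    // returns how many were actually read (0 at end of file).
    using ReadFn = std::function<std::size_t(uint8_t* dst, std::size_t n)>;

    explicit ChunkCache(ReadFn readFn) : read_(std::move(readFn)) {}

    // Copies the next slice (at most kSliceSize bytes) into out and
    // returns its length; 0 means the file is exhausted.
    std::size_t next(uint8_t* out) {
        if (pos_ == fill_) {                 // cache empty: refill from the file
            fill_ = read_(buf_, kCacheSize);
            pos_ = 0;
        }
        std::size_t n = fill_ - pos_;
        if (n > kSliceSize) n = kSliceSize;
        std::memcpy(out, buf_ + pos_, n);
        pos_ += n;
        return n;
    }

private:
    ReadFn read_;
    uint8_t buf_[kCacheSize];
    std::size_t fill_ = 0;
    std::size_t pos_ = 0;
};
```

With a 150-byte file this performs one 128-byte read, one 22-byte read and one final read that reports end of file, while the caller still receives slices of 32, 32, 32, 32 and 22 bytes.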

June 17, 2013 at 10:39 am #2649

Michael P. Flaga
Member

Thanks for the feedback; your observations are very astute.

The call to SPI.setClockDivider(spiRate) was implemented as a safeguard. As libraries (aka drivers) are implemented for different external resources, there is often overlap. This is the case with the SPI. Bill initially implemented the SPI settings in MP3player.resumeDataStream() to allow the use of other devices on the SPI, such as temperature sensors; resumeDataStream() restored the configuration from whatever changes the other SPI devices had made. That said, the SdCard is an SPI device and does not necessarily (or even typically) have the same clock speed as the Vdsp (the VS1053). It is faster, which led to unreliability in reading back data, though miraculously it worked in sending. I rediscovered this while attempting to expand the library and later found it stated on VLSI’s forum.

So yes, reconfiguring the SPI with each block transfer is more overhead, but it is insignificant. It is a necessary requirement for a robust and interoperable library/driver. I actually see that SdFat does this too. Never assume shared resources are in the state you expect.

As you have noted, the Vdsp has a 2K buffer, while the driver transfers 32 bytes at a time. Yes, it would appear to be a good idea to transfer larger chunks, but there are more subtle issues at hand. First, the Vdsp has a 32-byte pre-buffer that does not require overfill checking. Hence this driver, as recommended by VLSI, sends 32 bytes at a time and then checks DREQ. While the DREQ check is not required after each byte within the 32 bytes, it is still required between blocks to allow proper time for the 32 bytes to be transferred internally. Sending without waiting may cause skips.
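The send-32-then-check pattern described above can be sketched as follows. It is a simplified illustration, not the library’s refill(): `feedDecoder` is a hypothetical name, and the two callbacks stand in for reading the DREQ pin and for a one-byte SPI write.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr std::size_t kBlock = 32;  // bytes that may be sent per DREQ check

// Hypothetical helper: sends data in blocks of up to 32 bytes and checks
// DREQ before each block, never inside one. Stops as soon as DREQ is low
// and returns the number of bytes actually sent.
std::size_t feedDecoder(const uint8_t* data, std::size_t len,
                        const std::function<bool()>& dreqHigh,
                        const std::function<void(uint8_t)>& sendByte) {
    std::size_t sent = 0;
    while (sent < len && dreqHigh()) {
        std::size_t n = len - sent;
        if (n > kBlock) n = kBlock;          // a short last block is still sent
        for (std::size_t i = 0; i < n; ++i) {
            sendByte(data[sent + i]);
        }
        sent += n;
    }
    return sent;
}
```

The caller keeps the unsent remainder and tries again on the next interrupt, timer tick or poll, whichever refill method is configured.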

It is very insightful to watch a logic analyzer capture of the transfers. It reveals that software or timer polling appears more efficient. Note that the library supports several methods of refilling the buffer; please review http://mpflaga.github.io/Sparkfun-MP3-Player-Shield-Arduino-Library/_s_f_e_m_p3_shield_config_8h.html#a4c60fb7c286789d19f9ed13a19891653 . The timer method initially appears to transfer blocks, whereas the interrupt method continuously feeds the buffer as it needs it. Either way it is cosmetic, in that it takes the same amount of time to transfer the data. Looking closer reveals that any saving from larger blocks over smaller ones is offset by the need to check and wait for the DREQ internal transfer every 32 bytes.

The Due’s Extended SPI is attractive, in that it manages the switching of SPI configurations between assigned chip selects (aka channels) to avoid the above-mentioned conflicts. However, from reading the Due’s SPI.cpp, it really is only software. The SPI_CONTINUE option is simply for CS management, as each transfer is still done individually. I don’t see it transferring frames; rather, I read the SAM’s SPI.cpp as transferring ONE byte at a time, not using the SAM’s internal DMA. Please point me to an example of it sending frames, as opposed to channels.

I would recommend reading the following thread: http://www.billporter.info/forum/topic/sfemp3shield-works-with-arduino-due/ as I came to the above conclusion while attempting to implement such improvements and found that they did not actually provide the expected benefits.

June 17, 2013 at 3:52 pm #2651

Anonymous
Inactive

Thank you for your very thorough reply.

Although the actual transfer speeds may be the same, I’d still have thought that the time spent by the CPU would overall be less if transferring, say, 64 bytes instead of 2 x 32 bytes. Although the difference may be small, there are the setup times for a transfer from the SD card to the CPU and then from the CPU to the DAC chip. Just the act of calling any method (e.g. track.read() or dcs_low()) takes CPU cycles to push / pull parameters from the stack and make the jump. When using the Due-only SPI extensions, that difference in CPU time should be greater, though probably not by an awful lot.

This is probably all nit-picking. I’m not sure what percentage difference it would make.

I’ll be using the Sparkfun MP3 shield so I’m not sure that the channel selection capability of the Due helps. If I understand correctly, the shield uses specific output pins of the Due which can’t be changed without soldering a new connection. However, what does help is the ability of the Due to specify that a transfer will comprise several bytes through the use of the SPI_CONTINUE parameter, e.g.:
It’s possible to send more than one byte in a transaction by telling the transfer command to not deselect the SPI device after the transfer:

void loop(){
  //transfer 0xF0 to the device on pin 10, keep the chip selected
  SPI.transfer(10, 0xF0, SPI_CONTINUE);
  //transfer 0x00 to the device on pin 10, keep the chip selected
  SPI.transfer(10, 0x00, SPI_CONTINUE);
  //transfer 0x00 to the device on pin 10, store byte received in response1, keep the chip selected
  byte response1 = SPI.transfer(10, 0x00, SPI_CONTINUE);
  //transfer 0x00 to the device on pin 10, store byte received in response2, deselect the chip
  byte response2 = SPI.transfer(10, 0x00);
}

The parameter SPI_CONTINUE ensures that chip selection is kept active between transfers. On the last transfer SPI_CONTINUE is not specified as it’s the last byte transferred.

The above excerpt was taken from: http://arduino.cc/en/Reference/DueExtendedSPI

In my case I will be running a stepper motor concurrently with playing music. Since the VLSI chip has a 2kbyte buffer, it is not a real-time device and can be handled with a lower priority than the motor which must receive a pulse at a specific time so as to maintain a specific velocity or acceleration. So for my specific case, what I am planning is to use a timer for generating the next motor pulse. I would then have a simple loop that polled the VLSI chip to see if I needed to send more data. Since the motor would be driven by a timer-based interrupt, it would have top priority by interrupting any step involving music playback.

What I’m not sure about is the best method of checking when to send more data to the VLSI chip. Constantly calling digitalRead(MP3_DREQ) would seem to be the least efficient. Testing the specific bit on the port would seem to be faster, but what I might do is have a timer-driven interrupt that simply sets a flag. My main loop would then simply need to check the status of that flag. Although the triggering of that timer would interrupt any currently executing step involving the motor, the time taken by a method that only sets one flag should be insignificantly small. Using a timer would also give me the flexibility of less frequent transfers if I so wanted. However, there is a downside of increased complexity with having too many timers. Maybe I should just keep it simple by testing elapsed time instead of using a timer.
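For what it’s worth, the elapsed-time option could look something like the sketch below. `ElapsedPoller` is a hypothetical name, and the caller is assumed to pass in a millisecond counter such as the value returned by millis() on Arduino.

```cpp
#include <cstdint>

// Hypothetical helper: reports when at least intervalMs milliseconds
// have passed since the last time it fired.
class ElapsedPoller {
public:
    explicit ElapsedPoller(uint32_t intervalMs, uint32_t startMs = 0)
        : interval_(intervalMs), last_(startMs) {}

    // nowMs is the current millisecond count. The unsigned subtraction
    // stays correct when the counter wraps around to zero.
    bool due(uint32_t nowMs) {
        if (static_cast<uint32_t>(nowMs - last_) < interval_) {
            return false;
        }
        last_ = nowMs;
        return true;
    }

private:
    uint32_t interval_;
    uint32_t last_;
};
```

The main loop would call due() on every pass and only look at DREQ when it returns true, so the motor timer interrupt remains the only interrupt in the system.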

Your views would be very welcome.

Thanks for the link to the forum discussion re. the Ethernet and MP3 shields connected to a Due. If I have understood correctly, you say that the use of the new SPI_CONTINUE parameter has no real world benefit. Is that correct? According to the Arduino page I listed above:

The chip selection is handled automatically by the SPI controller, the transfer command implies the following:

Select device by setting pin 4 to LOW
Send 0xFF through the SPI bus and return the byte received
Deselect device by setting pin 4 to HIGH

So from reading the other forum I think you drew your conclusion from reading the source code, whereas the above implies that the chip is selected / deselected automatically, which may not be evident from reading the Arduino source code.

June 17, 2013 at 4:56 pm #2653

Michael P. Flaga
Member

YES, what you want to do is actually already built into this library as a configurable option. As mentioned above, SFEMP3ShieldConfig.h can be configured to implement the refilling method as either a timer or a soft poll. This works out well with other implementations I have done that require non-interfering interrupts. Simply setting USE_MP3_REFILL_MEANS to USE_MP3_Polled allows the use of the MP3player.available(); command in the main loop to fill the buffers, outside of interrupts (aka real-time). Alternatively, setting it to USE_MP3_SimpleTimer, with the prescribed optional library installed, causes the refill to run only on a scheduled period. The only drawback is that the pause and resume commands have issues in these modes. But it will do exactly what you described.

Note that this library depends on SdFat, which is not developed here. SdFat caches roughly 500 bytes (one 512-byte sector). This is good and bad. Cache is good until you need to fill it. It is this filling that, if done while in an interrupt, causes the loss of real-time behavior. So by not using the USE_MP3_INTx refill, real-time behavior is spared.

If you are really concerned about the precision of the stepper motor pulses, I would recommend using the OutputCompare functions to pre-program the time of the next transition and then update it later, before the next toggle. It is similar to the timer, but it toggles the pin at that exact time regardless of any other code or interrupt, then raises an interrupt for servicing.

As for the efficiency of the ExtendedSPI features, it is purely software, no hardware. If you read into the code of the Due’s (aka SAM for ARM) SPI.cpp, it is simply keeping track of the CS for you, in code, with no hardware DMA. It still has the same overhead. One’s application still has to send the data one byte at a time and still check the DREQ status every 32 bytes. The real potential benefit here would be to make the SPI transfer non-blocking for a FRAME, either in software or with DMA. But then one needs to add FINISH interrupts and more complexity, especially on receiving, as each receive is linked to a specific transmit in the case of SPI, unlike a serial UART, where rx and tx are decoupled. As a result there are trade-offs. In this case it becomes more confusing to have the library do the same thing two different ways for no real benefit.

June 18, 2013 at 10:54 am #2655

Anonymous
Inactive

I’m not sure I agree with what you say about the usefulness of the SPI_CONTINUE flag. Here’s the source code for SPIClass::transfer():

byte SPIClass::transfer(byte _pin, uint8_t _data, SPITransferMode _mode) {
  uint32_t ch = BOARD_PIN_TO_SPI_CHANNEL(_pin);
  // Reverse bit order
  if (bitOrder[ch] == LSBFIRST)
    _data = __REV(__RBIT(_data));
  uint32_t d = _data | SPI_PCS(ch);
  if (_mode == SPI_LAST)
    d |= SPI_TDR_LASTXFER;

  // SPI_Write(spi, _channel, _data);
  while ((spi->SPI_SR & SPI_SR_TDRE) == 0)
    ;
  spi->SPI_TDR = d;

  // return SPI_Read(spi);
  while ((spi->SPI_SR & SPI_SR_RDRF) == 0)
    ;
  d = spi->SPI_RDR;
  // Reverse bit order
  if (bitOrder[ch] == LSBFIRST)
    d = __REV(__RBIT(d));
  return d & 0xFF;
}

Note that if SPI_LAST is passed as a flag, d is OR’d with SPI_TDR_LASTXFER and d (a 32-bit word) is written to the SPI port. So passing the SPI_CONTINUE flag does not simply affect how the Arduino code keeps track of the CS.
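To make the packing of that 32-bit word concrete, here is a small standalone sketch. `packTdr` is a hypothetical helper, and the bit positions are assumptions based on my reading of the SAM3X headers (LASTXFER at bit 24, an inverted one-hot chip-select field at bits 16 to 19), so please check them against the headers shipped with the Arduino core.

```cpp
#include <cstdint>

// Assumed to mirror SPI_TDR_LASTXFER from the SAM3X headers.
constexpr uint32_t kLastXfer = 1u << 24;

// Assumed to mirror SPI_PCS(ch): the chip-select field is active low,
// so channel 0 becomes 0b1110, channel 1 becomes 0b1101, and so on.
constexpr uint32_t pcs(uint32_t ch) {
    return (~(1u << ch) & 0xFu) << 16;
}

// Hypothetical helper: builds the word that transfer() writes to SPI_TDR.
constexpr uint32_t packTdr(uint8_t data, uint32_t ch, bool last) {
    return static_cast<uint32_t>(data) | pcs(ch) | (last ? kLastXfer : 0u);
}
```

Under those assumptions, packTdr(0xF0, 0, true) gives 0x010E00F0: the data byte in the low 8 bits, the channel mask above it, and the flag that tells the SPI hardware to release the chip select after this byte.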

Thanks for the tip re. OutputCompare. I’ll try to find more information about it.

June 18, 2013 at 12:42 pm #2656

Michael P. Flaga
Member

Yes, the hardware is actually controlling the chip select, if it is one of the prescribed allowed pins. The code builds (aka packs) a 32-bit word that contains the mask of chip selects, the LASTXFER bit/flag, and the data value to send, and writes it into the SPI_TDR register. All of this is necessary to transmit the byte, as opposed to the ATmega328, which has separate SPDR and SPCR registers. Still, the code is simply writing the TDR as opposed to writing the digital output directly, so it is basically the same effect and the same cycles.

The big advantage of having hardware-queued control of the CS would be in using the DMA to move data without blocking: one only needs to start the transfer, go and do something else, and come back and check whether it is done. This is where hardware CS support is useful. Here, however, SPI.transfer blocks for each byte: before, until the register is ready, and after, until the byte is sent.

I just remembered while looking at it again: only pins D10, D4, D52 and D78 of the Due can be used with this queue-controlled method for chip selects, whereas the SFE MP3 shield uses D6, D7 and D9 as chip selects. So it becomes academic without adding jumpers.

June 18, 2013 at 5:30 pm #2657

Anonymous
Inactive

That’s a pain.

Just for completeness, in case this subject comes up again, according to this site, http://forum.arduino.cc/index.php?topic=132130.0 and http://arduino.cc/en/Reference/DueExtendedSPI the Due has three pins that could be used for chip select: D10 (same as D77), D4 (same as D87) or D52.

BTW, does SPIClass::transfer() really need to read back from SPI every time we write to the MP3 shield? I’m wondering why there aren’t separate read() and write() methods instead, and whether just writing would be faster when writing audio data.

June 18, 2013 at 8:02 pm #2658

Michael P. Flaga
Member

That is the nature of SPI, being master / slave. To get data from a slave the master must wait for the response.

That said, there are often cases where peripherals need only be told and don’t respond. In those cases one can blindly send, but one then needs to ensure that the next send does not stomp on prior sends. So it is typically simpler just to wait until the current transfer is done.

June 19, 2013 at 9:15 am #2659

Anonymous
Inactive

I don’t believe that there is a problem with the rate of transmission / ingestion. SPI sends the data in serial form. The rate at which the data is sent can be set up to a maximum rate defined by the hardware specification. That defined rate should be sufficient. If there were ever a concern otherwise, then a lower clock rate should be set. After all, why pause only by checking for a response from the slave every 8 bits? We could have stomped on data before then.

Furthermore, there is already an inherent delay between sending bytes. The time taken to call the routine SPIClass::transfer(), the initial steps of that routine prior to sending data and the steps taken by the calling routine before every call already constitute an appreciable delay.

My view is that SPIClass::transfer() has been written for generic devices. Some may respond after a byte has been sent with eg an error message, or the slave may send an unsolicited message and if it weren’t read quickly it could be lost. But with the VLSI chip there is a separate line used for handshaking (DREQ) and AFAIK, there are no unsolicited messages to expect.

So in the specific case of sending data to this VLSI chip I do not see a need to read after every byte has been sent, only to check DREQ after every 32 bytes.

June 19, 2013 at 1:16 pm #2660

Michael P. Flaga
Member

Yes, when streaming data to the Vdsp there is no need to read back the transfer. However, the Due’s AT91SAM’s SPI_TDR is only double buffered, and it appears the UNO ATmega’s SPDR is not even that. Effectively this means it is possible to overrun the transmission’s DR (buffer). Hence the need to wait, either before or after writing the next byte, until the previous one has finished sending, so as not to drop data.

As I recall, the Vdsp with the implemented crystal is specified for a maximum SPI clock speed of approximately 6 MHz, whereas the SdCard default is 8 MHz. This overspeed was found to be tolerated when sending stream data, but to be a problem when getting a response, causing unpredictable bit-shifted, delayed responses. Hence the need for the Vdsp and the SdCard to play together nicely, along with other future devices.

Yes, it would be possible to write one’s own SPI-equivalent driver; I believe SdFat did so. Since the objectives were met using the standard libraries, we have not bothered to create exotic resources that need maintaining. I had thought about using the SdFat SPI calls; however, this was avoided so as not to create dependencies.

It is always a series of give and take with best practices versus best design aka… “Technical Debt versus Technical Cost along with Minimalistic Functional Requirements”