Saturday, May 23, 2015

nRF24l01 control with 2 MCU pins using time-division duplexed SPI

Doing more with pin-limited MCUs seems to be a popular challenge, as my post nrf24l01+ control with 3 ATtiny85 pins is by far the most popular on my blog.  A couple months ago I had an idea of how to multiplex the MOSI and MISO pins, and got around to working on it over the past couple weeks.  The result is that I was able to control a nRF24l01+ module using just two pins on an ATtiny13a.  I also simplified my design for multiplexing the SCK and CSN lines so it uses just a resistor and capacitor.  Here's the circuit:

Starting with the top of the circuit, MOMI represents the AVR pin used for both input and output.  The circuit is simply a direct connection to the slave's MOSI (data in) pin, and a resistor to the MISO.  Since this is not a standard SPI configuration, I've written some bit-bang SPI code that works with the multiplexing circuit.  To read the data, the MOMI pin is simply set to input.  Before bringing SCK high, MOMI is set to output and the pin is set high or low according to the current data bit.  The 4.7k resistor keeps the slave from shorting out the output from the AVR if the AVR outputs high, or vice-verse.

Looking at the SCK/CSN multiplexing part of the circuit, I've removed the diode that was in the original version.  The purpose of the diode was to discharge the capacitor during the low portion of the SCK clock cycles, so the voltage on the CSN pin wouldn't move up in accordance with the typical 50% duty cycle of the SPI clock.  My bit-bang duplex SPI code is written so the clock duty cycle is less than 25%, keeping CSN from going high while data is being transmitted.  The values for C1 and R1 are not critical and are just based on what was within reach when I built the circuit; in fact I'd recommend lower values.  470Ohms * .22uF gives an RC time constant of 103uS, meaning SCK needs to be held low for >103uS for C1 to discharge enough for CSN to go low.  Something like a 220Ohm resistor and .1uF capacitor would reduce the delay required for CSN to go low to around 25uS.

The R2 is far more important.  The first value I tried was 1.5K, and after fixing a couple minor software bugs, it seemed to be working OK.  When I looked at the signals on my scope, I saw a problem:

The yellow trace shows the voltage level detected on the MOMI pin at the AVR.  Each successive high bit was a bit lower voltage, so after more than a few bytes of data, all the bits would likely be read as zero.  I suspect this has something to do with the internal capacitance of the output drivers on the nRF module, as well as it's somewhat weak drive strength, documented in the datasheet at table 13.  A 4.7K resistor seems to be optimal, though anything from 3.3K to 6.8K should work.


Here is the AVR code for the time-division duplexed SPI:
uint8_t spi_byte(uint8_t dataout)
    uint8_t datain, bits = 8;

        datain <<= 1;
        if(SPI_PIN & (1<<SPI_MOMI)) datain++;

        sbi (SPI_DDR, SPI_MOMI);        // output mode
        if (dataout & 0x80) sbi (SPI_PORT, SPI_MOMI);
        SPI_PIN = (1<<SPI_SCK);
        cbi (SPI_DDR, SPI_MOMI);        // input mode
        SPI_PIN = (1<<SPI_SCK);         // toggle SCK

        cbi (SPI_PORT, SPI_MOMI);
        dataout <<= 1;


    return datain;

I also wrote unidirectional spi_in and spi_out functions that work with the multiplexed MOSI/MISO.  Besides being faster than spi_byte, these functions work with the SE8R01 modules that have inconsistent drive strength on their MISO line.

The functions are in halfduplexspi.h, and I also wrote spitest.c, which will print the value of registers 15 through 0.  Here's a screen capture of the output from spitest.c:

Monday, April 20, 2015

Adapting an ESP-01 module for breadboard use

While esp8266 ESP-01 modules can easily be programmed after hooking up some dupont jumpers to a USB-TTL module, using them on a breadboard without an adapter or modification is not possible.  The obvious method of making an adapter with a 2x4-pin female header, some stripboard, and 2 1x4 male headers.  I thought of an even simpler way of adapting the ESP-01 for a breadboard without using any extra parts.

Of the 8 pins on the ESP-01, the CH_PD pin should be permanently tied to Vcc, so only 3 of the four middle pins are needed.  If you use my zero-wire reset solution, only the GPIO0 and GPIO2 pins are needed.  To modify the ESP-01 for breadboard use, heat up the CH_PD pin with a soldering iron, then pull it out with a pair of needle-nose pliers after the solder is liquid.  Then solder a short wire from Vcc to the CH_PD pad.  Next heat up the remaining 3 middle pins, and push them until they stick up out of the PCB.  To do this I put my needle-nose pliars under the pins, then pushed down on the module.  If you go too fast and get lumps of solder on the pin, add some flux and heat up the solder to level out the solder so jumper wires can smoothly plug into the pins.  If you're not using my reset solution, I still recommend a capacitor on RST as it will reduce or eliminate spurious resets.  The RST line on the esp8266 is very sensitive, at least compared to RST on 8-bit AVR MCUs.  When you are done you your module should look like one below, and can easily plug into a breadboard.

Wednesday, April 15, 2015

Continuous Integration: the wonderful world of free build servers

While contributing to the esp8266/Arduino project, Ivan posted a link to a test build using Appveyor.  After a bit of research, I learned that there is a whole slew of companies offering cloud-based build servers in this space called continuous integration.  More impressive is that it is free for open-source projects with most companies in this space.

When I first started writing software it was in basic and assembler on a Commodore 64.  When writing small programs on fixed-configuration systems like that, the development cycle was reasonably quick, with even "large" assembler programs taking seconds to build.  Deployment testing was simple as well; if it worked on my C64 it would work on everyone else's.  As computers got bigger and more complex, so did the development cycle.  While working on large projects at places like Nortel, full system builds could take several hours or even days.  Being able get quick feedback on small code changes is very important to software development productivity.  The availability of low-cost, on-demand services like Amazon Web Services has enabled companies in the CI space to offer build services with minimal infrastructure investment.

Who's Who

The esp8266/Arduino project uses Appveyor for Windows builds and Travis for Linux builds.  Other CI companies offering Linux-based build services include, Snap CI, and my favorite, Codeship.  All of these companies offer some level of service for free to open-source developers, so I decided to try all four of the Linux-based CI services.

For my work with embedded systems, I have been writing build scripts for avr-gcc, which I intend to extend to building a gcc cross-compiler for the xtensa lx106 CPU on the esp8266.  Full builds of binutils, avr-gcc, and avr-libc take a few hours on an Intel Core i5, so getting a working build was a slow process.  Having a large build like this also turned out to be helpful differentiating between the different CI services.

One thing all the CI services have in common is they make it easy to set up an account and try their service if you already have a github account.  With Google Code shutting down, everyone in open-source development should have a github account already.  While the CI services support a number of different languages, I was only concerned with C++ using the gcc compiler.  All the servers had gcc >= 4.4 and typical tools like autoconf, flex, and bison.


Travis was the first CI service I tried, and it turned out to be one of the more complicated.  In order to get Travis to set up for your build (set environment variables, download dependencies), you need to make a .travis.yml file in the root of your repository.  The format is similar to a shell script, so it wasn't too hard to figure out.  After a bit of experimenting I was able to get a build started.

From some of the posts I read online, I was concerned whether the build would complete in the allowed time of 50 minutes.  The problem I ended up having was not build time but build output.  After 4MB of log output, Travis terminates the build.  If my build failed I wanted to see where in the build process the problem occurred.  So I turned off minor log output from things like tar extracting dependency libraries, but I still hit the 4MB limit.

Another problem you might have with large amounts of log output relates to how your browser handles it.  Firefox started freezing on me when I tried to view a 4MB log file, but Chrome was OK.

Drone's service was easier to setup, allowing a shell script to be written in their web interface, which would be run to start your build.  Drone has a limit of only 15 minutes on free builds, which turned out to be the showstopper with their service.


I almost missed the boat on Codeship since they don't even list C/C++ in their supported languages.  I guess a gcc installation is taken for granted for Linux-based CI.  Codeship, like Drone, allows you to write a build script in their web interface.  Unlike Travis and Drone, no sudo access is availabe on the build servers, so installing updated packages is not possible.  Since the servers have a reasonably recent version of gcc (4.8.2-19ubuntu1) and gnu tools, this was not a problem for me.  Their build servers (running on AWS) are nice and fast, with a full build, using make -j4, taking about 13 minutes.

Codeship doesn't seem to have any limit on build time, though they do have a 10MB log output limit.  Fortunately that is just a limit of what is displayed in the web inteface, and the build does not stop.   The most impressive thing about Codeship's service is that they give you ssh access to a build server instance for debugging!  Clicking on Open SSH Debug Session gives you the IP and port to ssh into, assuming you've already updated your account with your ssh public key.

On the debug server, your code is already copied to the "clone" directory.  The servers are running Ubuntu 14.04.2 LTS, The servers seem to have un-throttled GigE ports, as a download of gcc 4.9.2 clocks in at 24MB/s or 200mbps.  With the debug server I was able to manually run my build, review config.log files, and copy files using scp to my computer for later review.

Snap CI

One way Snap CI is different than the other services, is that their servers run CentOS instead of Ubuntu.  I started using RedHat before Debian and Ubuntu existed, and never had a good reason to leave rpm-based distributions, so I like the CentOS support.  The version of gcc on their servers (4.4.7) is rather old, but they enable sudo so you can upgrade that with a newer RPM.  They also have a limited shell interface called snap-shell.  It's not full ssh access like Codeship, but it does make it easy to check the environment by running things like "gcc --version".

Snap CI also uses AWS servers, and build times were very similar to Codeship.  If your builds require downloading a lot of prerequisite files, Snap may take a bit longer than Codeship though, as gcc 4.9.2 took about twice as long to download on Snap.


CI services eliminate the time and cost of setting up and maintaining build servers.  They simplify software testing by having a clean server instantiated for each build.  No more broken or incompatible builds because someone installed a custom library version on the build server that normal users don't have on their machine.  I closed my account, and will probably close my Travis account too.  I'll keep using both Codeship and Snap, to be sure the software I'm working on can build on both Ubunto and Centos.  If I continue to support programs like picoboot-avrdude that builds under Linux and MinGW, I'll also try out Appveyor.

Thursday, April 9, 2015

Building avr-gcc from source

Although 8-bit AVR MCUs are widely used, it is hard to find recent builds of avr-gcc.  The latest releases of gcc are 4.9.2 and 4.8.4, yet the latest release of the Atmel AVR Toolchain only includes gcc 4.8.1.  For CentOS 6, the most recent RPM I could find is 4.7.1.  To take advantage of the latest improvements to compiler features like link-time optimization, it is often necessary to build gcc from source.  When I first attempted to build gcc for AVR targets, I quickly discovered it's not as simple as  downloading the source then running "configure; make install".  Picking away at it over the course of several months, I've figured out how to do it.  This method should work for avr targets, and with a small change to the build options, for other targets like Arm and Mips.

Although both the avr-libc and gcc sites have some information on building gcc, I found both fell short of being concise and unambiguous.  The biggest source of problems I encountered was other libraries that gcc requires for building.  GCC's prerequisites indicates, "Several support libraries are necessary to build GCC, some are required, others optional."  I (mis)interpreted the list of libraries starting with GNU Multiple Precision Library as being required only if those features were enabled in the compiler.  In the end I was only able to build avr-gcc 4.9.2 when I included GMP, MPFR, and MPC.  The ISL library was not required.

Required source files

The first thing to download before GCC is gnu binutils, which includes utilities like objdump for disassembling files.  If you already have an earlier version of binutils, it is not necessary to build a new version.  For example, on my server I have Atmel AVR Toolchain 3.4.4, which includes avr-gcc 4.8.1 and binutils 2.24.  In order to build avr-gcc 4.9.2 I don't need to make a new build of binutils.  I did try building binutils 2.25 (the latest), but instead of debugging a compile error I decided to stick with 2.24.  For building binutils, the following configure options were sufficient (though perhaps not necessary):
-v --target=avr --quiet --enable-install-libbfd --with-dwarf2 --disable-werror CFLAGS="-Wno-format-security"

The next thing to download is GCC, followed by GMP, MPFR, and MPC.  I used GMP 5.1.3, MPFR 3.1.2, and MPC 1.0.3.  After extracting all the packages, links in the GCC source directory need to be made, named gmp, mpfr, and mpc respectively, linking to their source trees.  Then gcc can be configured with the following options before running make:
-v --target=avr --disable-nls --enable-languages="c,c++" --disable-libssp --with-dwarf2

Build script

Rather than downloading, extracting, and building gcc manually, I started with a build script made by Rod Moffitt and a couple other contributors.  To use it, first run which will download the files, then  After a long build process, the binaries will be in /usr/local/avr/bin/, which you should then add to your shell PATH variable.


You can download my build for Linux x86_64.  It's dynamically linked, and built on CentOS 6, so you may have to add some symlinks in /lib64 for other Linux distributions.

Monday, April 6, 2015

Zero-wire auto-reset for esp8266/Arduino

A little over a year ago I developed a zero-wire auto-reset solution for Arduino.  After I started using Arduino for the esp8266, I realized I could do the same thing with the ESP-01.

Flashing the esp8266

In order to download code to the esp8266 after reset, GPIO0 and GPIO15 must be low, and GPIO2 must be high.  The ESP-01 has GPIO15 grounded, and GPIO2 is set high after reset.  GPIO0 is pulled up to Vcc after reset, so in order to download code to the flash, this must be pulled low.  Although esptool-ck supports using RTS and DTR for flashing the esp8266, many cheap USB-TTL modules don't break out those lines.  With USB-TTL modules that break out DTR, the DTR line should be connected to GPIO0 in order to pull the line low after reset.  Otherwise DTR needs to be grounded with a jumper or by connecting a push-button switch to ground.

The circuit

The auto-reset circuit I used on the esp8266 is a simplified version of the circuit I used with the pro mini.  It consists of just a 7.5K resistor between Rx and RST, and a 4.7uF capacitor between RST and Vcc.  The values are not critical, as long as the RC constant is between 10ms and 100ms, so if what you have on hand is a 15K resistor and 1uF capacitor, that should work fine.  A serial break signal is 250ms long, which is why I suggest a an RC constant of less than 100ms to allow the capacitor to discharge and trigger a reset before the break signal ends.  If the RC constant is less than 10ms, a sequence of zero bytes transmitted to the esp8266 could unintentionally trigger a reset.  At 9600bps, each bit is 104.2us long, so 8 zero bits plus the start bit would last 938us.  Several zero-bytes in a row, even with the high voltage of the stop bit in between, could trigger a reset.  The esptool default upload speed is 115.2kbps, so unwanted resets are quite unlikely.

The  auto-reset circuit has an added benefit of improving the stability of the esp8266 module.  The RST pin on the esp8266 is extremely sensitive.  Before I added the auto-reset circuit, simply touching a probe from my multimeter to the RST pin would usually reset the module, even when I tried adding a 15K pullup resistor to Vcc.  I would also get intermittent "espcomm_sync failed" messages when trying to upload code.  Since adding the auto-reset circuit, I can probe the RST pin without triggering a reset, and my uploads have been error-free.

Getting the updated esptool-ck

By the time you are reading this, Ivan may have already integrated my patch for esptool-ck.  If the issue is still open, then you can download the updated esptool-ck.  Extract the esptool.exe into hardware\tools\esp8266.  This version also includes support for 921.6kbps uploads, which can be enabled by modifying putting esp01.upload.speed=921600 in hardware\esp8266com\esp8266\boards.txt.

Wednesday, April 1, 2015

A 4mbps shiftOut for esp8266/Arduino

Since I finished writing the fastest possible bit-banged SPI for AVR, I wanted to see how fast the ESP8266 is at bit-banging SPI.  The NodeMCU eLua interpreter I initially tested out on my ESP-01 has little hope of high-performance since it is at best byte-code compiled.  For a simple way to develop C programs for the ESP8266, I decided to use ESP8266/Arduino, using Jeroen's installer for my existing Arduino 1.6.1 installation.  Starting with a basic shiftOut function that worked at around 640kbps, I was able to write an optimized version that is six times faster at almost 4mbps.

I modified the spi_byte AVR C code to use digitalWrite(), and call it twice in loop():
void shiftOut(byte dataPin, byte clkPin, byte data)
  byte i = 8;
      digitalWrite(clkPin, LOW);
      digitalWrite(dataPin, LOW);
      if(data & 0x80) digitalWrite(dataPin, HIGH);
      digitalWrite(clkPin, HIGH);
      data <<= 1;

void loop() {
  shiftOut(DATA, CLOCK, 'h'); 
  shiftOut(DATA, CLOCK, 'i'); 

Since I don't have a datasheet for the esp8266 that provides instruction timing, and am just starting to learn the lx106 assembler code, I used my oscilloscope to measure the timing of the data line:

The time to shift out 8 bits of data is around 12.5us, for a speed of 640kbps.  Looking at the signal in more detail I could see that the time between digitalWrite(dataPin, LOW) and digitalWrite(dataPin, HIGH) was 425ns.  Rather than setting the data pin low, then setting it high if the bit to shift out was a 1, I changed the code to do a single digitalWrite based on the bit being a 0 or a 1:
void shiftOut(byte dataPin, byte clkPin, byte data)
  byte i = 8;
      digitalWrite(clkPin, LOW);
      if(data & 0x80) digitalWrite(dataPin, HIGH);
      else digitalWrite(dataPin, LOW);
      digitalWrite(clkPin, HIGH);
      data <<= 1;

This change increased the speed slightly to 770kbps.  Suspecting the overhead of calling digitalWrite as being a large part of the performance limitations, I looked at the source for the digitalWrite function.  If I could get the compiler to inline the digitalWrite function, I figured it would provide a significant speedup.  From my previous investigation of the performance of digitalWrite, I knew gcc's link-time optimization could do this kind of global inlining.  I enabled lto by adding -flto to the compiler options in platform.txt.  Unfortunately, the xtensa-lx106-elf build of gcc 4.8.2 does not yet support lto.

After looking at the source for the digitalWrite function, I could see that I could replace the digitalWrite with a call to a esp8266 library function GPIO_REG_WRITE:
void shiftOutFast(byte data)
  byte i = 8;
      if(data & 0x80)
      data <<= 1;

This modified version was much faster - the oscilloscope screen shot at the beginning of this article shows the performance of shiftOutFast.  One bit time is 262.5ns, for a speed of 3.81mbps.  This would be quite adequate for driving a Nokia 5110 black and white LCD which has a maximum speed of 4mbps.


While 4mbps is fast enough for a low-resolution LCD display or some LEDs controlled by a shift register like the 74595, it's quite slow compared to the 80Mhz clock speed of the esp8266.  Each bit, at 262.5ns is taking 21 clock cycles.  I doubt the esp8266 supports modifying an I/O register in a single cyle like the AVR does, but it should be able to do it in two or three cycles.  While I don't have a proper datasheet for the esp8266, the Xtensa LX data book is a good start.  Combined with disassembling the compiled C, I should be able to further optimize the code, and maybe even figure out how to write the code in lx106 assembler.

Monday, March 30, 2015

ESP8266 SPI flash performance

Despite the popularity of the ESP8266, I have yet to see a detailed datasheet published.  Nava Whiteford, on his blog, has links to a summary datasheet and a Cadence tensilica core that the chip is based on.  None of this provides any details on how the memory controller pages in data from the SPI flash, nor the speed of the communications.  About all that is clear from the datasheet and chip markings is that it uses a quad-SPI serial flash chip.

I decided to find out the performance of the SPI flash, as well as get an idea of what the cache line fill size of the chip is.  By looking at the pin-out of the flash chip, I determined that pin 6 is the clock.  After some probing and playing with the settings on my scope, I captured the clock burst shown above.

Based on the 500ns horizontal scale, the clock burst lasts a little more than 2uS.  Zooming in shows that the clock is exactly 40Mhz, or half of the ESP8266 80Mhz clock and have of the maximum 80Mhz speed rating of the SPI flash.  Given the burst lasted a little more than 2uS, the total number of clock pulses is in the range of 85-90.  Accounting for the overhead of commands to enable quad SPI mode and address setup, it seems the burst corresponds to reading 32 bytes from the flash, and therefore the cache line size is likely 32 bytes.


The clock signal is clean, and with a rise + fall time of 11.1ns, could be increased to 90Mhz without significant distortion or attenuation.  With documentation on the registers to change the clock speed to 160Mhz, the ESP8266 can be run at double speed without overclocking the SPI flash.