Setback: A $58,000 lesson in mid-volume manufacturing
This week we learned our manufacturer SparkFun had shipped as many as 1,934 Microview units without a bootloader. This renders the unit effectively broken. :-( We suspected something was up when we saw an elevated number of support inquiries from backers having trouble uploading a sketch to their Microviews. Our CTO JP and I then immediately travelled around Sydney and the Blue Mountains (in Australia) capturing faulty units.
To the folks that are receiving broken Microviews over the next few days and weeks: We are really sorry to have messed up your first impressions of the Microview. SparkFun are going to make it right and will be shipping a replacement Microview for every defective unit that was shipped.
It's frustrating on many levels, especially because we've been so proud of working with SparkFun to ship rewards early. You know what they say... pride comes before a fall.
You will be contacted by September 12th to let you know if you were or were not shipped a defective unit. SparkFun are still trying to establish how many units are affected. The worst case scenario is that 1,934 units need to be replaced outright. If you are part of the defective batch you will receive two units: one that has a broken bootloader (now and in the coming days) and one that works (by the beginning of November). If you’re willing to try, this is the perfect opportunity to learn some new skills: soldering and bootloader programming. Success means you’ll score a second working Microview, free of charge.
To the folks that received Microviews prior to July 18th and all backers of the Learning Kit tier, enjoy them. Your MicroViews should be fully functional.
How can I tell if I have a defective unit?
If you’ve got a Microview, follow all the steps to load a test sketch:
- Install the FTDI drivers if necessary
- Attach a Microview to the programmer
- Select Arduino Uno from the Arduino IDE drop-down menu
- Open the Examples->Blink sketch (or any sketch)
If you follow all these steps and get the error:
avrdude: stk500_recv(): programmer is not responding
avrdude: ser_recv(): programmer is not responding
avrdude: stk500_getsync() attempt 10 of 10: not in sync: resp=0x00
Then you know you’ve got a Microview that is missing its bootloader.
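If you would rather check an upload log than read it by eye, here is a minimal Python sketch that looks for the error text quoted above. The function name is hypothetical, and the exact avrdude wording may vary between versions. Keep in mind that the same message can also show up for other reasons (wrong port or board selected), so treat a match as a strong hint rather than proof.

```python
import re

# Patterns taken from the error text quoted above.
# Assumption: other avrdude versions may word these messages differently.
SYNC_ERROR = re.compile(r"stk500_getsync\(\).*not in sync")
NOT_RESPONDING = "programmer is not responding"


def looks_like_missing_bootloader(avrdude_output: str) -> bool:
    """Return True if the upload log contains the sync failure quoted above."""
    if SYNC_ERROR.search(avrdude_output):
        return True
    return NOT_RESPONDING in avrdude_output


log = "avrdude: stk500_getsync() attempt 10 of 10: not in sync: resp=0x00"
print(looks_like_missing_bootloader(log))
```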
Who do I contact to get a replacement?
You don’t need to contact us, we will contact you if we find you’re in the defective batch. If you’ve got questions, concerns, comments, please send them to email@example.com or firstname.lastname@example.org. We want to hear from you, whatever it is, but please give us at least a few days to respond.
When can I expect my replacement unit?
SparkFun have ramped up to fix all the Microview units they have in the building, build new Microviews, and get you a replacement as quickly as they can. We’re aiming for the end of October or early November. Again, we’re really sorry to have messed up. Please sit tight while we build more Microviews.
Can I fix the bad unit once I get it?
Yes, you can, but it’s not easy. In the next few weeks SparkFun will post a full tutorial to show you how to reprogram and recover a defective Microview unit. In the meantime, here’s a breakdown of what is required.
First you need a programmer capable of programming an ATmega. If you have an Arduino, you can use it as a programmer. If you want a cheap general-purpose programmer, the Pocket Programmer and the Tiny AVR Programmer are great options.
Next, six connections are required for in-circuit serial programming (ISP). Three of the connections are located on external pins (easy to connect), and three of the connections are small vias on the internal PCB. This means you need to open the Microview unit and attach (by soldering or holding) three wires to vias on the board, as well as attaching three wires to the exposed pins.
Next you need to run avrdude with this specific HEX file to burn the new firmware (that includes a bootloader!) onto the ATmega328. Once the firmware is loaded make sure you can upload sketches.
If everything is good, disconnect the ISP connections and carefully repack the Microview into its housing and snap the lens back into place.
Do a little dance, take a photo and tweet it to @sparkfun and @geekammo because you are so awesome. We will high-five you back.
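For the avrdude step above, here is a rough sketch of what the command might look like, written as a small Python helper that only builds the argument list. The HEX file name is a placeholder (use the file SparkFun provides), and the programmer name "usbtiny" is an assumption that fits the Pocket Programmer and Tiny AVR Programmer; an Arduino used as a programmer needs different settings. Fuse settings are not covered in this update, so none are included here.

```python
import shlex


def avrdude_flash_command(hex_path, programmer="usbtiny", part="atmega328p"):
    """Build the avrdude argument list for writing hex_path to flash over ISP.

    Assumptions: "usbtiny" matches your programmer, and the chip is an
    ATmega328P. Adjust both to suit your setup.
    """
    return [
        "avrdude",
        "-c", programmer,                 # programmer type
        "-p", part,                       # target AVR part
        "-U", f"flash:w:{hex_path}:i",    # write an Intel HEX file to flash
    ]


# "microview_combined.hex" is a hypothetical file name.
cmd = avrdude_flash_command("microview_combined.hex")
print(shlex.join(cmd))
```

The helper only prints the command so you can review it before running it yourself in a terminal.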
How did this happen?
Over the past few months SparkFun have been building approximately 8,000 units for this Kickstarter. On July 18th, 2014, a new production firmware was created to better test the units. Unfortunately this new test firmware was defective and didn’t include the STK500 (aka Optiboot) bootloader.
Q. What is the bootloader?
A. The bootloader is a small piece of software that lets you load Arduino sketches onto the ATmega chip without requiring additional hardware (such as an in-system programmer or parallel programmer).
The test procedure correctly tested the Microview’s functionality (display graphics, toggled GPIOs, etc) but did not test the upload functionality (minor detail...). There is a reason for this: enumerating a COM port and uploading a sketch is much slower than pushing production firmware over SPI. Since 2011 SparkFun have been streamlining production by combining HEX files. This means they combine the HEX of the bootloader and the HEX of the production test code into one HEX file that gets programmed onto the final unit. SparkFun build nearly 90,000 products a month. This approach using combined HEX files has worked swimmingly for years. But you can see the Achilles' heel - if the HEX gets incorrectly formed it can be difficult to detect. On July 18th SparkFun started programming units with defective firmware and didn’t know it until August 17th, when customers started contacting us about the problem.
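To illustrate how a badly formed combined HEX could be caught before it reaches the line, here is a minimal Python sketch that parses Intel HEX text and checks whether any real code lands in the bootloader region. This is not SparkFun's tooling. The region 0x7E00 to 0x7FFF is an assumption based on a 512-byte bootloader at the top of the ATmega328's 32 KB flash, and all function names and sample records are made up for the example. The sketch stops reading at the first end-of-file record.

```python
# Assumed bootloader region: 512 bytes at the top of 32 KB flash.
BOOT_START, BOOT_END = 0x7E00, 0x8000


def make_record(addr, rtype, data):
    """Build one Intel HEX record (used here only to create sample input)."""
    body = bytes([len(data), addr >> 8, addr & 0xFF, rtype]) + bytes(data)
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def parse_ihex(text):
    """Yield (address, data) for each data record, verifying checksums."""
    base = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"not an Intel HEX record: {line!r}")
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"bad checksum: {line!r}")
        count, addr, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:      # data record
            yield base + addr, data
        elif rtype == 0x01:    # end-of-file record: stop reading
            return
        elif rtype == 0x02:    # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:    # extended linear address
            base = int.from_bytes(data, "big") << 16


def has_code_in(text, start, end):
    """True if any byte other than 0xFF (erased flash) falls in [start, end)."""
    for addr, data in parse_ihex(text):
        for offset, byte in enumerate(data):
            if start <= addr + offset < end and byte != 0xFF:
                return True
    return False


eof = make_record(0, 0x01, [])
test_only = "\n".join([make_record(0x0000, 0x00, [0x0C, 0x94]), eof])
combined = "\n".join([
    make_record(0x0000, 0x00, [0x0C, 0x94]),   # made-up test code bytes
    make_record(0x7E00, 0x00, [0x11, 0x24]),   # made-up bootloader bytes
    eof,
])

print(has_code_in(test_only, BOOT_START, BOOT_END))
print(has_code_in(combined, BOOT_START, BOOT_END))
```

A check like this runs in a fraction of a second on the build machine, so it would not slow down programming on the production line.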
Why did SparkFun change the test procedure?
SparkFun modified the original test code because they were seeing larger-than-expected current draw while in test mode. The issue was that the Microviews were still in an active serial state, so TX was sitting high, which affected the testing net that TX (DIO_1) is connected to. The only modification made to the test code was adding a Serial.end() call after the board receives the serial CMD from the test bed to execute the self-test.
There are (worst case) 1,934 units that got programmed with test firmware but lack the STK500 (aka Optiboot) bootloader. The units are fully functional and should display the test sketch just fine. The problem is that there is no bootloader so you can’t upload a new sketch.
Am I in the bad batch?
SparkFun build Microviews in batches of 128 pieces, so there’s actually a large number of batch identifiers. They're still trying to pin down the exact batch numbers (there are a lot of them) and how many units are still in the building. SparkFun have been doing a great job of building and shipping units ahead of schedule, but we have a mixture of batches still in the building. Once SparkFun get everything figured out we will notify the backers that are affected. You can easily test your unit by following the description above.
What did we learn?
1) No matter how much it costs, make it right with your backers. 1,934 units * $30 for the replacement unit and shipping worldwide = $58,020. This sucks, and we screwed up, and we’re going to do everything we can to make it right. Sorry for messing up that moment of joy when you get your Microview. We're working with SparkFun to get a replacement to you as quickly as we can.
2) Don’t change production firmware mid-run. SparkFun have built hundreds of thousands of products, but in small batches. This is the first time that the stakes were so high. In the future, if we have to change the firmware we’ll send the new firmware through beta testers to make sure.
3) Test the bootloader. In the past SparkFun have not tested the bootloader because it required enumerating a COM port (which can take as long as 30 seconds) and sending a test sketch (another 5-10 seconds). After the Microview error was discovered SparkFun quickly agreed that they should be testing to see if a bootloader is present. This test can be done by resetting the ATmega, sending the bytes 0x30 0x20, and waiting for a 0x14 0x10 response. It can be run from a second ATmega on the test jig itself (no computer required) and adds very little time after the programming step.
4) Moving from low volume to mid-volume production requires a very different approach. SparkFun have made this type of mistake before (faulty firmware on a device) but it was on a smaller scale and they were agile enough to fix the problem before it became too large. As SparkFun started building very large production runs they did not realize quality control and testing would need very different thinking. This was a painful lesson to learn but these checks and balances are needed. If it didn’t happen on Microview it would have happened on a larger production run someday in the future.
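The bootloader check described in lesson 3 can be sketched in a few lines. This Python version is only an illustration of the byte exchange (the real jig would use a second ATmega, not a computer). It assumes the chip has just been reset and that `port` is any object with `write` and `read` methods, such as a serial port opened with a read timeout. `FakePort` is a hypothetical stand-in so the sketch runs without hardware.

```python
# Byte values from the exchange described above (STK500 sync request/reply).
STK_GET_SYNC = 0x30
CRC_EOP = 0x20
STK_INSYNC = 0x14
STK_OK = 0x10


def bootloader_present(port, attempts=3):
    """Send the sync request and check for the expected two-byte reply.

    Assumption: the ATmega was reset just before this call, so a
    bootloader (if there is one) is still listening.
    """
    for _ in range(attempts):
        port.write(bytes([STK_GET_SYNC, CRC_EOP]))
        if port.read(2) == bytes([STK_INSYNC, STK_OK]):
            return True
    return False


class FakePort:
    """Hypothetical stand-in for a serial port, for trying the sketch."""

    def __init__(self, reply):
        self.reply = reply
        self.sent = b""

    def write(self, data):
        self.sent += data
        return len(data)

    def read(self, n):
        return self.reply[:n]


good_unit = FakePort(b"\x14\x10")   # bootloader answers the sync request
bad_unit = FakePort(b"")            # nothing answers: read times out empty
print(bootloader_present(good_unit))
print(bootloader_present(bad_unit))
```

Retrying a few times is a design choice for this sketch, since the first request after a reset can be missed.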
As the CEO of Geek Ammo, the buck stops with me. I’m personally sorry for the inconvenience this has caused you.
As always, please don't hesitate to contact me if you have any questions or queries whatsoever!