VMX Project: Computer Vision for Everyone
Webapp for real-time visual object recognition and an API for building vision-aware apps. Let our vision empower your vision.
About this project
(Update: January 5th 2014) In response to backer (and potential backer) demand, we will offer a single machine license and linux-binary for a local VMX installation. Local installation will not consume VMX Compute Hours nor require internet access. Optionally, VMX Developers will be able to download a fully-configured virtual machine running VMX on Linux for use inside VirtualBox. More here: Update #5
What if your computer was just a little bit smarter? What if your devices could understand what is going on in their surroundings merely by looking at the world through a camera? VMX gives your computer the ability to see. VMX could be used to make games more engaging, enhance your interactions with your devices, and allow your computer to automate many of your daily chores and responsibilities.
The VMX project was designed to bring cutting-edge computer vision technology to a very broad audience: hobbyists, researchers, artists, students, roboticists, engineers, businesses, and entrepreneurs. The VMX project will give you all the tools you need to bring your own creative computer vision projects to life.
You + VMX = A computer vision jedi
VMX allows for a variety of different input formats. Whether it’s a webcam, a YouTube video, or a map-flyover, if you can render it on your screen, VMX can use it. This means that whether your ideal application involves processing previously recorded videos, learning from Google Image search, or having a camera watch your refrigerator, the possibilities are endless.
Here is a video showing real-time face parts and hand-gesture recognition. It's easy to add your own gestures and customize their performance directly in our GUI. You'll be amazed at how much time you'll save by using our GUI.
Here is a video showing South Park characters being recognized inside the browser. VMX works on videos, cartoons, and all sorts of different video streams.
Here is a video showing VMX detecting objects in a Google Earth map fly-over. Want to count cars from overhead imagery? Or perhaps count how many pools are in your neighborhood?
VMX in the browser
In order to make the barrier to entry to computer vision as low as possible, we built VMX directly in the browser and made sure that it requires no extra hardware. All you need is a laptop with a webcam and an internet connection. Because browsers such as Chrome and Firefox can read video directly from a webcam, you most likely have all of the required software and hardware. The only thing missing is VMX.
You won't need a PhD to use Visual AI: Why you’ll love VMX
VMX gives you our very own open-source vision apps as well as all you need to effortlessly build your own computer vision apps. Our technology is built on top of 10+ years of computer vision research experience acquired from CMU, MIT, and Google (see About The Founders section below). By leaving the hard stuff to us, you will be able to focus on creative uses of computer vision without the headaches of mastering machine learning algorithms or managing expensive computations. You won’t need to be a C++ guru or know anything about statistical machine learning algorithms to start using laboratory-grade computer vision tools for your own creative uses.
Training at your fingertips
The “killer feature” of VMX is an effortless method for training your own object detectors, directly in the browser. We talked to many aspiring developers and quickly realized that many people’s crazy ideas involve the ability to recognize different (and sometimes quite personal) objects. By waving an object in front of your laptop’s screen, you will be able to train your own object detector in a matter of minutes.
Creating a new object detector requires drawing a few selection boxes directly over the input video stream and then spending some time in “learning mode.” While you are in learning mode, the detector continues to run in real-time while learning about the object, making it ready for your application in a matter of minutes. You can then save a detector, or “object model,” for later use.
Running Multiple Object Detectors
You will be able to train multiple detectors for the different objects you care about. With VMX you can load, save, and manage all of your object models. You can run multiple detectors in real-time, use the GUI to make them faster or more robust, and most importantly, you can always improve your object detector later by enabling “learning-mode.” Here is an example of the model library which lets you select pre-trained object models.
To help you train an object detector in a very difficult scenario (such as one South Park cartoon character versus another), we built an advanced model editor which lets you visually tweak the learned model. The Model Editor GUI inside VMX lets you move examples from the positive side to the negative side, and vice-versa. All you need to know about machine learning is that a “positive” example is what VMX thinks is the object and a “negative” example is what VMX thinks is not the object.
If the project is successfully funded, backers will get access to the following VMX Apps (as featured in our video):
VMX Tweet & Greet App: If X is detected, send a Tweet to Y. If you want to know when your dog has jumped on the sofa, then this app is for you.
VMX Pong App: Use real-world objects as controllers. Challenge your friends. If you’re serious about emerging technologies for gaming, then this app is for you.
Below is a video of an early VMX prototype used to play Pong.
VMX Counter App: Do you need to know how many bottles are on the counter? Or how many cars are parked in your company’s parking lot? Then this app is for you.
Below are a few other VMX apps we are working on. We’ll let you know when they are ready!
VMX HandPlay App: Tired of looking for your TV remote? Do you want to use hand gestures to control media playback? This app will show you how to pimp out your living room with VMX technology.
VMX API and vision-as-a-service
By running the most expensive computations on our servers, you will be able to use VMX technology on a variety of different computers and hand-held devices, as long as each device has a camera and an internet connection. NOTE: As of January 5th, 2014 we also added the ability for $100 backers to download a standalone executable which will run locally on their computers.
There are two levels of VMX APIs
A VMX Cookbook will be provided to developers that documents the APIs, discusses the motivation behind the API design, and gives you code example “recipes“ so you can get the most done with the least amount of effort.
We believe in the power of decentralized systems and feel that Kickstarter is an ideal platform to deliver our technology to a broad base of early adopters and technology enthusiasts. We are developers, have always been developers, and always will be developers. We have witnessed amazing things built with computer vision technology and want to let the whole world experience computer vision without the typical pains of using laboratory-grade technology. By supporting this Kickstarter campaign, you are receiving more than just an awesome product and service. You are backing our dream of building a foundation for the masses to unleash their creativity. In order to deliver a high-quality ad-free product, we need to fully focus on VMX development without working on side projects to account for the cloud computing bills inherent in this project.
Our Kickstarter Rewards
Our main reward is Kickstarter-exclusive early access to the VMX webapp. If we reach our Kickstarter goal of $100K, we plan on a public release in the Summer of 2014. Each “VMX Developer” will receive early access to VMX technology and each backer will receive a heavy discount on the VMX service, as measured in terms of VMX Compute Hours (see note below).
If funding is successful, VMX Compute Hours will cost $1/hr when we launch. By contributing now, you not only receive significant discounts on your VMX Compute Hours, but there is no cost associated with VMX Developer access (at the $100+ tiers), which allows you to list your apps in our directory, among other benefits.
(Update: January 14th 2014) We will give 10% of Kickstarter generated funds to high school students and clubs in the form of software licenses. More here: Update #8
Early-Access VMX Developer Program
By contributing to this Kickstarter campaign, you could be among the first to use our technology. There will be no open offerings to use VMX outside of this Kickstarter until we launch VMX to the world. VMX early access is Kickstarter-exclusive.
We will be taking on new users incrementally. We want to be sure we can give the best experience possible to the most dedicated users. To that end, we will open VMX access in a series of phases, giving us time to scale out what is an extremely computationally expensive learning problem and let the earliest users tell us what they really want to see next. We have a list of not-yet-implemented features we personally want to see added, but are dedicated to really hearing what you want and prioritizing from that perspective.
What is a VMX Compute Hour?
Because VMX is provided as a service, we measure computation on our servers in compute hours. One compute hour allows one object detector to run continuously on a video stream for one hour. Unless you plan on sitting in front of your laptop and playing with VMX all day long (we suggest you take occasional breaks), your applications will typically utilize credits at a much slower rate. On one extreme, a 30 minute two-player game of VMX Pong (which uses two detectors for the two controllers) will cost one compute hour with your detectors running at maximum performance (which won’t be strictly necessary to have a very enjoyable game of pong), and on the other extreme, a VMX Tweet & Greet App may run an entire month using a single compute hour (assuming that your special person only occasionally enters through the door).
Essentially, some applications require a much less aggressive frame-rate than others; and you are using your compute hours only while actively asking the server for updates.
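As a rough illustration of the arithmetic above, here is a small Python sketch that estimates compute-hour usage. The function name `estimate_compute_hours` and the `duty_cycle` parameter are hypothetical and not part of VMX. The linear scaling with the fraction of time spent asking the server for updates is our assumption for illustration only, not a statement of how usage is actually metered.

```python
def estimate_compute_hours(detectors: int, wall_clock_hours: float,
                           duty_cycle: float = 1.0) -> float:
    """Estimate compute hours used by an app (illustrative model only).

    detectors:        number of object detectors running at once
    wall_clock_hours: how long the app runs, in hours
    duty_cycle:       ASSUMED fraction of that time the app is actively
                      asking the server for updates (1.0 = maximum rate)
    """
    if detectors < 0 or wall_clock_hours < 0:
        raise ValueError("detectors and wall_clock_hours must be non-negative")
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    # One compute hour = one detector running continuously for one hour.
    return detectors * wall_clock_hours * duty_cycle


# VMX Pong: two detectors for 30 minutes at maximum performance.
pong_hours = estimate_compute_hours(detectors=2, wall_clock_hours=0.5)

# Tweet & Greet: one detector for a 30-day month on a single compute hour
# implies, under this model, a duty cycle of roughly 1/720.
month_hours = 30 * 24
implied_duty_cycle = 1.0 / month_hours
greet_hours = estimate_compute_hours(1, month_hours, implied_duty_cycle)

print(f"Pong: {pong_hours} compute hour(s)")
print(f"Tweet & Greet implied duty cycle: {implied_duty_cycle:.4f}")
```

Under these assumptions the Pong game comes out to one compute hour, matching the example in the text, and the month-long Tweet & Greet app only needs to be querying the server a small fraction of a percent of the time.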
Running VMX Locally
(Update: January 5th 2014) Additionally, VMX Compute Hours can be traded in for a license and linux binary or virtual machine to run locally. Local installations of VMX will not consume VMX Compute Hours, and can be a good low-latency solution for demanding tasks, or situations where broadband is not reliable/available. For details click here.
What if I’m not a developer?
A long answer to a simple question is this: our goal is not primarily to give developers the tools to use computer vision in their applications. It is a means of getting more seamless user experiences in everyone’s hands. In talking about how we can best get this cutting edge technology out of the confines of a handful of researchers, considering the constraints on, and the complexity of, the underlying learning problem, we realized the only way to bring the technology to the world was to allow people to solve their own vision problems.
Building VMX powered vision apps will require basic programming skills or better, and using VMX apps in your own environment will often require customizing models to your specific environment. Importantly, you will be able to train, use, and customize your own models in any VMX app without any programming experience.
You will have access to the initial VMX apps for smart media playback, vision-pong, email/sms/twitter alerts as well as any apps that other developers choose to share. That is, you will be able to enjoy and use the VMX ecosystem without ever looking at a line of code. That being said, we are certainly focusing our efforts on encouraging and lowering the barrier to entry for developers across the world to create new and exciting apps. If you contribute as a non-developer you will still get the bonus rewards for VMX credits, you will still have early access to the ecosystem, and you will be able to help test and validate the VMX experience both from the ecosystem perspective and from the individual apps that we, and others, release.
If you want to be one of the first in the world to experience the -- we hope -- incoming tsunami of vision-aware apps, we highly encourage you to contribute!
What vision-aware app will you build?
User privacy is very important to us. We guarantee that we will not share, publish, look at, or sell any of your images or videos which VMX might see. In addition, we will not perform any additional data-mining on your data and promise not to share any individual or aggregated user data with a third-party. By supporting this kickstarter campaign, your contributions will allow us to work on what is important: giving you the computer vision experience of a lifetime. No ads. No unsolicited emails.
About The Founders
Geoff and Tom met in 2001, as first year undergraduate students in Rensselaer Polytechnic Institute's Computer Science program. RPI's motto has always been, "Why not change the world?" but little did they know that life would take each of them on separate journeys only to re-unite 12 years later and begin work on the VMX project, under their new company’s motto, “Let our vision empower your vision.”
Tom Malisiewicz (Dr. Tomasz Malisiewicz, Carnegie Mellon University Robotics PhD 2011) has been an active Computer Vision researcher for the past 10 years. His doctoral thesis focused on large-scale object recognition systems which utilize clusters of computers for solving the underlying machine learning problems. He spent two summers at Google Research, and spent the last two years as a postdoctoral scholar at MIT in the Computer Science and Artificial Intelligence Laboratory. Tom has always wanted to make computers more intelligent, and quickly realized that empowering computers with the sense of sight would be necessary if we want to one day build robots. In his research career, he released many of his projects under the MIT open source license and learned a lot about the ups and downs of getting image recognition technology up and running. Tom believes that in order to bring computer vision to the masses, we must engineer high-tech tools for the masses.
Geoff Golder started programming as a teenager, initially making nefarious little “proggies” for AOL. After dropping out of RPI’s undergrad program, he commenced a several-year-long soul-searching journey, only to end up (unexpected to his previous self) building a career around building web applications. His philosophy has been to build software with scaling in mind since day one. His innovative way of integrating front-end software with RESTful back-end services, on top of a rock solid software engineering workflow, allows for rapid development and testing of a multi-layered stack of technology. He hopes his broad base of experience in working with developers of all skill levels will contribute to a super slick VMX development experience.
Movie Production: Kevin Hurley, Polaris Media Works.
Risks and challenges
As avid software engineers, our test-driven software engineering workflow allows us to isolate and fix bugs in a mature way. Tomasz Malisiewicz released his doctoral dissertation codebase on GitHub in 2011 under the most permissive open-source license, the MIT license. See https://github.com/quantombone/exemplarsvm
"Releasing my scientific code to the world under the permissive MIT open source license not only gave my work the credibility that it deserved, but it let others build on top of that foundation. By interacting with users of my code (mostly PhD students across the globe) I learned how others wanted to use the object recognition software, what problems they had getting this technology up and running, as well as what skills they needed to have before they could modify the software in a meaningful way." -- Tomasz Malisiewicz
VMX is a significantly more advanced and more mature version of the kind of software produced in academia. Most importantly, it is designed to be used by a broad audience -- an audience not trained in advanced machine learning nor ready to master building advanced C++ software such as OpenCV.
Scaling to thousands of users:
The key challenge in our project is allowing users to share their models and custom apps with the rest of the world, while protecting their privacy.
Yes. VMX runs entirely in the browser, and since there are compatible browsers for Windows, OS X, and Linux, it is truly platform agnostic.
Additionally, you have the option of using the REST interface so you can use VMX with anything that speaks HTTP, which should be just about every device you might want to use.
If you choose to back our project at the $100 level and above, you'll be eligible for a single-machine license and a VMX binary download. The VMX webapp will still run inside the browser, but it will run locally. The webapp component will communicate with an "object detection server" running locally on your machine. This software will be directly installable on the Linux operating system, and for those not running Linux, we will provide a Linux Virtual Image for download which will contain a pre-installed and fully configured instance of VMX.
It works well for most applications.
The thing to consider is how quickly your application needs to react. If it needs to react in less time than the latency of your connection to VMX, it probably won't work well. However, for most things this isn't really an issue.
Consider the Tweet & Greet app: it probably won't make much difference to you if you receive the tweet 400ms later than you would have received it on a fiber connection, so a high-latency connection will be fine.
On the other hand, a robot reacting to a fast-changing environment that is relying on VMX for its awareness may have issues if reaction times are being measured in milliseconds.
Generally speaking, if your "time to action" has to be less than the latency, then high latency is going to hurt you; otherwise it shouldn't really matter at all.
If you think latency is going to be an issue, then you will have to download a version of the VMX object detection server which will run locally. This will cost $100 in July, or you'll be able to trade in 100 Compute Hours for a single-machine license and download.
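The rule of thumb above can be written as a one-line check. This is only a sketch: the function name `remote_detection_is_viable` and the example time budgets (5 seconds for a tweet alert, 50ms for a robot) are hypothetical values chosen for illustration, and you would need to measure your own connection's latency to VMX.

```python
def remote_detection_is_viable(time_to_action_ms: float,
                               latency_ms: float) -> bool:
    """Return True when the app can tolerate the connection latency.

    time_to_action_ms: how quickly the app must react to a detection
    latency_ms:        measured latency of the connection to the server
    """
    # High latency only hurts when the required reaction time is
    # shorter than the latency itself.
    return time_to_action_ms >= latency_ms


# A tweet alert that may arrive a few seconds late (hypothetical budget).
tweet_ok = remote_detection_is_viable(time_to_action_ms=5000, latency_ms=400)

# A robot that must react within 50ms (hypothetical budget).
robot_ok = remote_detection_is_viable(time_to_action_ms=50, latency_ms=400)

print("Tweet alert over the network:", "fine" if tweet_ok else "run locally")
print("Fast robot over the network:", "fine" if robot_ok else "run locally")
```

When the check fails, as it does for the robot case here, a local installation is the better fit.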
Our team is using a powerful, yet proprietary, algorithm for the machine learning component of the project. The most important part of the algorithm is an on-line and real-time method for training object detectors and being able to run them in real time, even across an internet connection.
There are many ideas in the VMX project which come from a research project I open sourced several years ago, after publication at the International Conference on Computer Vision in 2011. The name of the published method is "An Ensemble of Exemplar-SVMs" and you can find the research paper and more details here: http://www.cs.cmu.edu/~tmalisie/projects/iccv11/ Be warned that to use the code you'll need a Matlab license, a large cluster of computers, knowledge of how to distribute the computation across your nodes, and several days of waiting time. This Kickstarter revisits some of these bottlenecks and makes it much easier for people to get their own models trained.
You could say that I have been working on my own computer vision library over the years, but what you're seeing in the video is not open source. Over the years I've witnessed many students (even PhD students) struggle with getting my software, as well as the software of my colleagues (http://www.cs.berkeley.edu/~rbg/latent/index.html) up and running, even though the software was intended to be "easy-to-use." Typical problems ranged from "I can't get your code to compile on my computer" to "I don't know how to use a cluster and/or mapreduce to parallelize the learning problem." Most importantly, I've seen very bright students take days just to train a new object detector using open source libraries. Open-source libraries are just tools, and unless you know how to take the correct training images, label them properly using the right annotation interfaces, format the resulting files, and choose the appropriate learning algorithm, training an object detector could take a few days. With our VMX project, that is all about to change. You'll be able to train a new object detector in a matter of minutes, something not possible with the open-source tools out there.
I've been working in this area for quite some time now, so feel free to ask me any questions and I'll answer with as much detail as we can legally disclose.
Dr. Tomasz Malisiewicz
Once the early-access period (March 2014 - June 2014) is over, backers of our Kickstarter campaign at the $100 level and above (which we call "VMX Developers") will have the option to receive a single-machine VMX license and install VMX on their own computer.
If the Kickstarter reaches the funding goal, we anticipate a single-machine VMX license will be available to VMX Developers for $100 (or more) during our summer 2014 public launch. This price will be frozen for Kickstarter backers -- you will be able to simply trade-in 100 of your VMX Compute Hours to obtain one single-machine license and download the software to use locally.
If you want to run VMX locally, back us at the $100 level, as the $25 level gives you at most 75 Compute Hours.