Vision-Based Gesture Controlled Drone with EdgeImpulse

Added to IoTplaybook or last updated on: 07/19/2021



The term “drone” usually refers to any unpiloted aircraft. Sometimes referred to as “Unmanned Aerial Vehicles” (UAVs), these craft can carry out an impressive range of tasks, from military operations to package delivery. Drones can be as large as a full-size aircraft or as small as the palm of your hand. Originally developed for the military and aerospace industries, drones have found their way into the mainstream because of the enhanced levels of safety and efficiency they bring. These robotic UAVs operate without a pilot on board and with different levels of autonomy. A drone’s autonomy level can range from remotely piloted (a human controls its movements) to advanced autonomy, in which it relies on a system of sensors and detectors to calculate its movement.

Because drones can be controlled remotely and can be flown at varying distances and heights, they make the perfect candidates for taking on some of the toughest jobs in the world. They can be found assisting in a search for survivors after a hurricane, giving law enforcement and the military an eye-in-the-sky during terrorist situations, and advancing scientific research in some of the most extreme climates on the planet. Drones have even made their way into our homes and serve as entertainment for hobbyists and a vital tool for photographers.

Drones are used for various purposes:

  • Military
  • Delivery
  • Emergency Rescue
  • Outer Space
  • Wildlife and Historical Conservation
  • Medicine
  • Photography etc.

Things used in this project

Software apps and online services

Edge Impulse Studio

Edge Impulse


The main motivation behind this project is my curiosity to explore the various control schemes for small-scale drones. The paper “Design and Development of Voice Control System for Micro Unmanned Aerial Vehicles” discusses various drone control methodologies such as Radio, GCS, Gesture, Voice, Joystick, PC, FPV, and Autonomous. The paper “Design and Development of an Android Application for Voice Control of Micro Unmanned Aerial Vehicles” observes that situational awareness is at a medium level for the Radio and Gesture control methods, whereas it is high for the Voice control method. In this project, we will work on vision-based gesture control; later we will move on to voice control and other advanced control methods.

The motivation for this project also arose from the need to implement these different control methods on a low-cost, portable, and scalable embedded platform with computation at the edge, without relying on external resources.


DJI Tello Drone

The DJI Tello is a small drone that combines technology from DJI and Intel in a very compact package. It is lightweight, fun, and easy to use, which makes it a good tool for learning drone piloting before investing in a more expensive option. Tello boasts a 14-core processor from Intel that includes an onboard Movidius Myriad 2 VPU (Vision Processing Unit) for advanced imaging and vision processing. It is equipped with a high-quality image processor for shooting photos and videos; the camera captures 5MP (2592x1936) photos and HD 720p video. The drone has a maximum flight time of 13 minutes. It fits in your palm and weighs approximately 80 g (propellers and battery included). You can control Tello directly via the Tello app or with a supported Bluetooth remote controller connected to the Tello app. The drone is programmable via Python, C++, Scratch, and DroneBlocks.

DJI Ryze Tello



  • Weight: Approximately 80 g (with propellers and battery)
  • Dimensions: 98mm*92.5mm*41mm
  • Propeller: 3 inch
  • Built-In Functions: Range Finder, Barometer, LED, Vision System, WIFI 802.11n 2.4G, 720P Live View
  • Port: Micro USB Charging Port
  • Max Flight Distance: 100m
  • Max Speed: 8m/s
  • Max Flight Time: 13min
  • Detachable Battery: 1.1Ah/3.8V
  • Photo: 5MP (2592×1936)
  • FOV: 82.6°
  • Video: HD720P30
  • Format: JPG(Photo); MP4(Video)
  • Electronic Image Stabilization: Yes

Preparing Tello Drone for the project

The Tello SDK documentation explains how to program the drone with Tello commands, although the command set is somewhat limited in features. The SDK connects to the aircraft through a Wi-Fi UDP port, allowing users to control the aircraft with text commands. We use Wi-Fi to establish a connection between the Tello and the controlling device (a laptop in this project). Once powered on, Tello acts as a soft AP (Wi-Fi access point) and accepts commands via UDP port 8889.

The Tello SDK includes three basic command types.

Control Commands (xxx)

Returns “ok” if the command was successful.

Returns “error” or an informational result code if the command failed.

Set Command (xxx a) to set new sub-parameter values

Returns “ok” if the command was successful.

Returns “error” or an informational result code if the command failed.

Read Commands (xxx?)

Returns the current value of the sub-parameters.
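As a rough illustration of the three command types, the following Python sketch classifies a command string by its form and interprets the reply text. The helper names `command_type` and `parse_reply` are hypothetical and are not part of the Tello SDK; the short list of set commands is an assumption and is not exhaustive.

```python
# Assumption: a few set commands known from the SDK document; not exhaustive.
SET_COMMANDS = {"speed", "rc", "wifi"}


def command_type(command: str) -> str:
    """Classify an SDK text command as 'read', 'set', or 'control'."""
    name = command.strip().split(" ")[0]
    if name.endswith("?"):
        return "read"      # e.g. "battery?"
    if name in SET_COMMANDS:
        return "set"       # e.g. "speed 50"
    return "control"       # e.g. "takeoff", "up 50"


def parse_reply(raw: bytes):
    """Turn a raw reply into (kind, text), kind being 'ok', 'error' or 'value'."""
    text = raw.decode("utf-8", errors="replace").strip()
    if text == "ok":
        return ("ok", None)
    if text.lower().startswith("error"):
        return ("error", text)
    # Anything else is treated as the value returned by a read command.
    return ("value", text)
```

Keeping reply parsing in one place makes it easier to react to a failed command, for example by repeating it or by landing.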

Even though Tello is pretty maneuverable, with a number of different axes on which we can control the drone, in this project, we will use the following commands.

  • takeoff : Auto takeoff.
  • land : Auto landing.
  • up x : Ascend to “x” cm.
  • down x : Descend to “x” cm.
  • left x : Fly left for “x” cm.
  • right x : Fly right for “x” cm.
  • forward x : Fly forward for “x” cm.
  • back x : Fly backward for “x” cm.

Please refer to the SDK for a full set of commands.
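A minimal Python sketch of how such text commands could be built and sent over UDP is shown below. It is not the code used in this project (which uses a C++ API). The drone's IP address is left as a parameter because it is not stated here, and the 20 to 500 cm distance limits are an assumption that should be checked against the SDK document. `build_move` and `send_command` are hypothetical helper names.

```python
import socket

TELLO_PORT = 8889          # command port named in the text
MIN_CM, MAX_CM = 20, 500   # assumed distance limits; verify in the SDK document
MOVES = ("up", "down", "left", "right", "forward", "back")


def build_move(direction: str, distance_cm: int) -> str:
    """Build a movement command such as 'forward 50', clamping the distance."""
    if direction not in MOVES:
        raise ValueError(f"unknown direction: {direction}")
    clamped = max(MIN_CM, min(MAX_CM, int(distance_cm)))
    return f"{direction} {clamped}"


def send_command(sock: socket.socket, drone_ip: str, command: str,
                 timeout_s: float = 5.0) -> str:
    """Send one text command and return the reply text.

    Raises socket.timeout if the drone does not answer in time.
    """
    sock.settimeout(timeout_s)
    sock.sendto(command.encode("ascii"), (drone_ip, TELLO_PORT))
    reply, _ = sock.recvfrom(1024)
    return reply.decode("utf-8", errors="replace").strip()


# Example use (requires a connection to the drone's Wi-Fi network):
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   send_command(sock, drone_ip, "command")   # enter SDK mode first
#   send_command(sock, drone_ip, "takeoff")
#   send_command(sock, drone_ip, build_move("forward", 50))
#   send_command(sock, drone_ip, "land")
```

Clamping the distance before sending avoids an "error" reply for out-of-range values, at the cost of silently changing the request.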

As a safety feature, if there is no command for 15 seconds, the Tello will land automatically.
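One way to avoid an unintended automatic landing while the user holds the idle gesture is to track the time since the last command and send a harmless read command before the 15 seconds run out. The sketch below shows only the timing logic; the class name `CommandWatchdog` and the 5-second safety margin are assumptions made for illustration.

```python
import time


class CommandWatchdog:
    """Tracks how long ago the last command was sent to the drone."""

    def __init__(self, timeout_s=15.0, margin_s=5.0, clock=time.monotonic):
        self.timeout_s = timeout_s    # auto-land timeout stated in the text
        self.margin_s = margin_s      # assumed safety margin
        self._clock = clock
        self._last_sent = clock()

    def note_command_sent(self):
        """Call this right after every command sent to the drone."""
        self._last_sent = self._clock()

    def needs_keepalive(self) -> bool:
        """True when a keep-alive should be sent to stay within the timeout."""
        elapsed = self._clock() - self._last_sent
        return elapsed >= self.timeout_s - self.margin_s
```

In a control loop, `needs_keepalive()` would be checked on every iteration, and a read command such as `battery?` sent (followed by `note_command_sent()`) whenever it returns true.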

Tello API

We will use ctello, a custom C++ API that allows us to communicate with the DJI Tello drone via UDP.

Vision-based Gesture Control Method

Gesture Commands

To control our Tello drone with visual gestures, we will use gesture detection. Six basic gesture classes are considered for the control (idle, takeoff/land, forward, back, left, right).

A Takeoff command is issued by using a thumbs-up gesture.


A Land command is issued by using a thumbs-down gesture.


A Forward command is issued by using an open palm gesture.


A Backward command is issued by using a closed fist gesture.


A Left command is issued by using a thumbs-left gesture.


A Right command is issued by using a thumbs-right gesture.
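The gesture-to-command mapping above can be sketched as a simple lookup table. The label names follow the ones used in this tutorial; the 30 cm step per gesture and the helper name `command_for` are hypothetical choices, not values taken from the project code.

```python
STEP_CM = 30  # hypothetical distance flown per recognized gesture

GESTURE_TO_COMMAND = {
    "takeoff": "takeoff",             # thumbs-up
    "land": "land",                   # thumbs-down
    "forward": f"forward {STEP_CM}",  # open palm
    "back": f"back {STEP_CM}",        # closed fist
    "left": f"left {STEP_CM}",        # thumbs-left
    "right": f"right {STEP_CM}",      # thumbs-right
}


def command_for(label: str):
    """Return the Tello command for a gesture label, or None for idle/unknown."""
    return GESTURE_TO_COMMAND.get(label)
```

Returning `None` for the idle class (and for any unexpected label) means the drone simply holds its position when no command gesture is shown.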


Vision-based Gesture Recognition using Edge Impulse

We will use machine learning to build a gesture recognition system that runs on a microcontroller, with the help of Edge Impulse Studio.

Preparing Edge Impulse Studio for the project

  • Give the project a name and click Create.

  • Head over to the "Devices" tab from the left menu and choose "Connect a new device".

  • You will be greeted with a variety of device options.

  • To keep things simple, let's connect our smartphone. Since all modern smartphones have an onboard camera, this is straightforward.
  • Next, you will be given a QR code and a link to allow the collection of data from your smartphone.

  • Scan this QR code or open the link via your smartphone device.

  • Once the link is opened via your smartphone, the smartphone will show up in the "Devices" section.

Data Collection

To collect data for our machine learning model, we will use the camera on board our smartphone. For the model to learn to see, it is important to capture many example images of each gesture. During training, these example images let the model learn to distinguish between the gestures.

  • Once the smartphone is connected to Edge Impulse, head over to the "Data Acquisition" tab.
  • On the phone, select Collecting images and give access to the camera.
  • On the next screen, enter the label name (e.g. takeoff), show the hand gesture in front of the camera, and click Capture to begin sampling.

  • Once the device completes each sample, it uploads the file back to Edge Impulse, and the data appears under Data Acquisition.
  • You see a new line appear under 'Collected data' in the studio.
  • When you click it you now see the raw image.

  • Repeat this process to collect as many samples as you can.
  • Repeat for each label: takeoff, land, forward, back, left, and right.
  • Also include some noise images.
  • Make sure to vary the gestures, e.g. slightly change the orientation of the hand. You never know how a user will present the gesture.
  • Once sufficient data is collected, it will be shown under the same tab.

  • Click on each data row to view their raw image.







  • Now that we have sufficient data, we need to split the data into a training dataset and a test dataset.
  • Don't worry. The Edge Impulse Studio makes that easy for us too.
  • Head over to the "Dashboard section" and scroll down to the "Danger Zone".
  • Click "Rebalance dataset" to automatically split the dataset into training and test sets with a ratio of 80/20.

  • We have now acquired and set up our training data for further processing, and we have a well-balanced dataset in our Edge Impulse project.
  • We can switch between the training and testing data with the two buttons above the Data collected widget.
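To make the 80/20 split concrete, here is a small Python sketch that splits labeled samples per class so that each gesture is represented in both sets. This only illustrates the idea; it is not how Edge Impulse Studio implements the split, and `split_dataset` is a hypothetical helper.

```python
import random
from collections import defaultdict


def split_dataset(samples, test_ratio=0.2, seed=0):
    """Split (filename, label) pairs into train and test lists, per label."""
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)

    rng = random.Random(seed)   # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_test = round(len(items) * test_ratio)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

Splitting per label rather than over the whole dataset keeps rare gestures from ending up entirely in one of the two sets.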

Gesture Model Training

Now that we have acquired all the data, it's time to train a gesture model on the dataset. Edge Impulse makes it very easy to generate a model without writing a single line of code.

With the training set in place, we can design an impulse. An impulse takes the raw image, uses a pre-processing block to manipulate the image, and then uses a learning block to classify new data. Pre-processing blocks always return the same values for the same input, while learning blocks learn from past experiences.

  • Head over to the "Impulse Design" tab.
  • We will already have the Image data section populated for us.
  • Select an image size of 48x48 and set Resize mode to Squash.
  • Now click Add a processing block and select Image.
  • This block takes in the color image, optionally makes the image grayscale, and then turns the data into a features array.
  • The parameters will be auto-populated for us.
  • Now click Add a learning block and select Transfer Learning (Images).
  • The parameters will be auto-populated for us.
  • This block takes the features array and learns to distinguish between the six (idle, takeoff, forward, back, left, right) classes.
  • The Output features block will have all the labels that we have acquired.

  • Now click on Save Impulse to save the configuration.
  • Head over to the Image tab.
  • This will show you the raw data on top of the screen (you can select other files via the drop-down menu), and the results of the pre-processing step on the right.
  • You can use the options to switch between 'RGB' and 'Grayscale' mode, but for now, leave the color depth on 'RGB'.

  • Click Save parameters. This will send you to the feature generation screen.
  • Here we will resize all the data, apply the processing block on all this data and create a 3D visualization of the complete dataset.
  • Click Generate features.
  • The Feature explorer will load. This is a plot of all the data in our dataset.

  • Because images have a lot of dimensions (here: 48x48x3=6912 features) we run a process called 'dimensionality reduction' on the dataset before visualizing this.
  • Here the 6912 features are compressed down to just 3 and then clustered based on similarity.
  • Even though we have little data you can already see some clusters forming, and can click on the dots to see which image belongs to which dot.

  • For our dataset, the feature data are more or less separated which is a good sign. In case your features are overlapping, it is better to acquire more data.
  • The page also shows the expected on-device performance with processing time and peak RAM usage for calculating features.
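The resize and feature-array steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Edge Impulse's implementation: "squash" is modeled as a nearest-neighbor resize that ignores the aspect ratio, and scaling pixel values to the range 0 to 1 is an assumed convention. All function names are hypothetical.

```python
def squash_resize(pixels, out_w, out_h):
    """Resize by nearest-neighbor sampling, ignoring the aspect ratio.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    """
    in_h, in_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_features(pixels):
    """Flatten an RGB image into one list of values scaled to [0, 1]."""
    return [channel / 255.0 for row in pixels for px in row for channel in px]


def feature_count(width, height, channels):
    """Number of raw features for an image of the given shape."""
    return width * height * channels


# A 48x48 RGB image gives 48 * 48 * 3 = 6912 features;
# the same image in grayscale would give 48 * 48 * 1 = 2304.
```

The feature count shows why the Grayscale option matters on small devices: it cuts the input to the learning block to a third.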

With all data processed, it's time to start training a neural network. Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. The network that we're training here will take the image features as an input and try to map them to one of the gesture classes.

So how does a neural network know what to predict? A neural network consists of layers of neurons, all interconnected, and each connection has a weight. One such neuron in the input layer would be a single value of the image features array (from the processing block), and one such neuron in the output layer would be takeoff (one of the classes). When defining the neural network, all these connections are initialized randomly, and thus the neural network will make random predictions. During training, we then take all the raw data, ask the network to make a prediction, and then make tiny alterations to the weights depending on the outcome (this is why labeling raw data is important).

This way, after a lot of iterations, the neural network learns; and will eventually become much better at predicting new data.
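The training idea described above (random initial weights, a prediction, then a small weight correction) can be shown with a deliberately tiny example. The sketch below trains a single-layer softmax classifier with gradient descent in plain Python. It is far simpler than the transfer-learning network that Edge Impulse trains, and the function names, learning rate, and epoch count are illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, biases, features):
    """Compute class probabilities for one feature vector."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train(samples, n_classes, epochs=200, lr=0.5, seed=0):
    """samples: list of (features, class_index) pairs."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    # Connections start out random, so early predictions are random too.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for features, target in samples:
            probs = forward(weights, biases, features)
            for k in range(n_classes):
                # Difference between the prediction and the true label.
                error = probs[k] - (1.0 if k == target else 0.0)
                for j in range(n_features):
                    weights[k][j] -= lr * error * features[j]
                biases[k] -= lr * error
    return weights, biases


def predict(weights, biases, features):
    """Return (predicted class index, class probabilities)."""
    probs = forward(weights, biases, features)
    return probs.index(max(probs)), probs
```

Each pass nudges the weights in the direction that reduces the error on the labeled example, which is why correct labels are essential.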

  • Head over to the Transfer learning tab.
  • Set the Number of training cycles to 20, the Learning rate to 0.0005, and the Minimum confidence rating to 0.60. You can play around with these values to adjust the accuracy of the trained model.

  • Leave the other parameters default for now and click Start training.
  • Now the Training Output section gets populated.

  • It displays the accuracy of the network and a confusion matrix. This matrix shows when the network made correct and incorrect decisions.
  • It also shows the expected On-device performance for this model.
  • Now that we have generated the model, we need to test it.
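The Minimum confidence rating can be understood as a threshold on the model's output probabilities: a prediction below it is reported as uncertain instead of being forced into a class. A minimal sketch, assuming the classifier output is available as a dictionary of label to probability (the function name `classify` and the label `"uncertain"` are illustrative):

```python
def classify(probabilities, min_confidence=0.60):
    """Return the most likely label, or 'uncertain' below the threshold."""
    label = max(probabilities, key=probabilities.get)
    if probabilities[label] >= min_confidence:
        return label
    return "uncertain"
```

For drone control, a higher threshold trades responsiveness for fewer unintended commands.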

Gesture Model Testing

  • Head over to the Model Testing tab.
  • We can see our test dataset here. Click Classify all.

  • This will generate the model validation outcome using the test data, which was unknown to the model during training. We can see that our trained model was able to classify with an accuracy of 100%, which is very good considering the small amount of training data fed to the model.
  • It also shows which labels were incorrectly predicted.
  • By checking these results in Feature explorer, we can understand if any labels were misclassified and use more training data to re-train our model for better classification of those data.
  • You can also do a live classification of data from the smartphone from the Live classification tab. Your device should show as online under Classify new data. Set the Sensor as Camera, click Start sampling, and start capturing the sample.

  • Afterward, you'll get a full report on what the network thought that you did.
  • Now that we have trained and tested our model, let's deploy it.
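The accuracy figure and the confusion matrix reported by the testing step can be reproduced from a list of actual and predicted labels. The sketch below is a generic illustration, not Edge Impulse code; predictions that match no known label are counted in an `"uncertain"` column, which is an assumption of this example.

```python
def confusion_matrix(actual, predicted, labels):
    """Count predictions per actual label; rows are actual, columns predicted."""
    columns = list(labels) + ["uncertain"]
    matrix = {a: {p: 0 for p in columns} for a in labels}
    for a, p in zip(actual, predicted):
        column = p if p in matrix[a] else "uncertain"
        matrix[a][column] += 1
    return matrix


def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)
```

Off-diagonal entries show which pairs of gestures the model confuses, which indicates where more training images are needed.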

Gesture Model Deployment

With the impulse designed, trained, and verified, you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse (including the signal processing code, neural network weights, and classification code) into a single C++ library that you can include in your embedded software.

  • Head over to the Deployment tab.
  • Select C++ library.
  • If you need the build for specific hardware supported by Edge Impulse, select your development board under Build firmware.
  • Click Build. This will export the impulse, and build a library that will run on the development board in a single step.
  • We will see a pop-up with text and video instructions on how to deploy the model to our device.

  • After the build is completed, you'll be prompted to download the library zip file.
  • Save the zip file to our project directory.


Now that we have prepared our drone and the gesture model, let's interface everything together in code.

The complete interfacing code is provided in the Code section of this project tutorial.
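As a rough outline of one thing such interfacing code has to handle, and not the authors' implementation, the sketch below debounces per-frame predictions: a command is issued only after the same gesture has been seen in several consecutive frames, and only once per held gesture. The class name `GestureDebouncer`, the frame count of 3, and the ignored labels are assumptions.

```python
class GestureDebouncer:
    """Emit a gesture label once it is seen in `required` consecutive frames."""

    def __init__(self, required=3, ignored=("idle", "uncertain")):
        self.required = required
        self.ignored = set(ignored)
        self._current = None
        self._count = 0

    def update(self, label):
        """Feed one per-frame prediction; return a label to act on, or None."""
        if label == self._current:
            self._count += 1
        else:
            self._current = label
            self._count = 1
        # Fire exactly once, at the moment the streak reaches the threshold.
        if self._count == self.required and label not in self.ignored:
            return label
        return None
```

In a control loop, each classified camera frame would be passed to `update()`, and a drone command sent only when it returns a label. This reduces the effect of single-frame misclassifications.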

Get the code here:

Run the following:

$ git clone
$ cd Tello_EI_vision_gesture_control
$ APP_CAMERA=1 make -j4

Now connect your laptop to Tello Wifi and execute the binary to start the control.

$ ./build/Tello_EI_vision_gesture_control 0


Let us now test the gesture control and see how well it works.

The following is a quick test for takeoff and land vision gestures.


Although the inference engine could not distinguish some gestures accurately and a few gesture commands were misclassified, the overall performance was satisfactory.

We believe these issues can be addressed by adding more training data and making the model more flexible.

The model is already optimized for processing on a low-power, resource-constrained module such as the ESP32-CAM.

What's next

  • Port the model to a low-power module such as the ESP32-CAM (in progress).
  • Train the model with more training data for more accurate classification.
  • Use more diverse gestures.




Block diagram for vision-based gesture control of DJI Tello drone using Edge Impulse

Code - Vision-based Gesture Controlled Drone

Codebase for vision-based gesture control of DJI Tello drone via Edge Impulse


Cris Thomas


Electronics and Aerospace engineer with a dedicated history in Research and Development.

Jiss Joseph Thomas


M.Tech student in Artificial Intelligence and Data Science at Amrita School of Engineering, Amrita Vishwa Vidyapeetham University

This content is provided by our content partner, an Avnet developer community for learning, programming, and building hardware. Visit them online for more great content like this.

This article was originally published at It was added to IoTplaybook or last modified on 07/19/2021.