Posted by Nari Yoon, Hee Jung, DevRel Community Manager / Soonson Kwon, DevRel Program Manager
Let’s explore highlights and accomplishments of the vast Google Machine Learning communities over the first quarter of the year! We are enthusiastic about and grateful for all the activities that the communities across the globe do. Here are the highlights!
ML Ecosystem Campaign Highlights
ML Olympiad is a series of associated Kaggle Community Competitions hosted by Machine Learning Google Developers Experts (ML GDEs) or TensorFlow User Groups (TFUGs) and sponsored by Google. The first round was hosted from January to March and invited participants to solve critical problems of our time. Competition highlights include Autism Prediction Challenge, Arabic_Poems, Hausa Sentiment Analysis, Quality Education, Good Health and Well Being. Thank you TFUG Saudi, New York, Guatemala, São Paulo, Pune, Mysuru, Chennai, Bauchi, Casablanca, Agadir, Ibadan, Abidjan, Malaysia and ML GDE Ruqiya Bin Safi, Vinicius Fernandes Caridá, Yogesh Kulkarni, Mohammed buallay, Sayed Ali Alkamel, Yannick Serge Obam, Elyes Manai, Thierno Ibrahima DIOP, Poo Kuan Hoong for hosting ML Olympiad!
Highlights and Achievements of ML Communities
TFUG organizer Ali Mustufa Shaikh (TFUG Mumbai) and Rishit Dagli won the TensorFlow Community Spotlight award (paper and code). This project was supported with Google Cloud credits.
ML GDE Ngoc Ba (Vietnam) posted MTet: Multi-domain Translation for English and Vietnamese. This project is about how to collect high-quality data and train a state-of-the-art neural machine translation model for Vietnamese. It utilized Cloud TPU, Cloud Storage, and related GCP products for faster training.
ML GDE Margaret Maynard-Reid (USA), Nived P A, and Joel Shor posted Our Summer of Code Project on TF-GAN. This article describes enhancements made to the TensorFlow GAN library (TF-GAN) last summer.
ML GDE Aakash Nain (India) released a series of tutorials about building models in JAX. In the second tutorial, Aakash uses one of the most famous and widely used high-level libraries for JAX to build a classifier. In the notebook, you will also take a deep dive into Flax.
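For readers who have not seen Flax before, defining a classifier with it looks roughly like the minimal sketch below (illustrative only, not code from the tutorial; the input shape and layer sizes are arbitrary assumptions):

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Classifier(nn.Module):
    num_classes: int = 10

    @nn.compact
    def __call__(self, x):
        x = x.reshape((x.shape[0], -1))            # flatten each image to a vector
        x = nn.Dense(features=128)(x)
        x = nn.relu(x)
        x = nn.Dense(features=self.num_classes)(x)
        return x                                   # logits

# Initialise parameters with an explicit PRNG key and a dummy batch, then run a forward pass.
model = Classifier()
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 28, 28)))
logits = model.apply(params, jnp.ones((4, 28, 28)))
print(logits.shape)  # (4, 10)
```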
ML GDE Bhavesh Bhatt (India) built a braille-to-audio model with 95% accuracy. The model translates braille to text and audio, lending a helping hand to people with visual disabilities.
Posted by Ankita Tripathi, Community Manager (Dev Library)
Witnessing a plethora of open-source enthusiasts in the developer ecosystem in recent years gave birth to the idea of Google’s Dev Library. The platform launched in June 2021 with the sole objective of giving visibility to developers who have been relentlessly creating and building projects using Google technologies. But why the Dev Library?
Why Dev Library?
Open-source communities are currently booming. The past 3 years have seen a surge of folks constantly building in public, talking about open-source contributions, digging into opportunities, and carving out a valuable portfolio for themselves. The idea behind the Dev Library was also to capture these open-source projects and leverage them for the benefit of other developers.
This platform acted as a gold mine for projects created using Google technologies (Android, Angular, Flutter, Firebase, Machine Learning, Google Assistant, Google Cloud).
With the platform, we also addressed a burning issue – the need for a central place for the huge number of projects and articles scattered across various platforms. The Dev Library therefore became a one-stop platform for open source projects and articles built with Google technologies.
How can you use the Dev Library?
“It is a library full of quality projects and articles.”
The Dev Library may not be the first platform external developers think of for publishing blog posts or projects, but the vision is bigger than being a mere platform for displaying content: it envisages the growth of developers along with tech content creation. The uniqueness of the platform lies in the curation of its submissions. Unlike other platforms, your submitted work does not appear on the site just because you clicked ‘Submit’. Behind the scenes, Dev Library has internal Google engineers for each product area who:
thoroughly assess each submission,
check for relevancy, freshness, and quality,
approve the ones that pass the check, and reject the others with a note.
It is a painstaking process, and Dev Library requires a 4-6 week turnaround time to complete the entire curation procedure and get your work on the site.
What we aim to do with the platform:
Provide visibility: Developers create open-source projects and write articles on platforms to bring visibility to their work and attract more contributions. Dev Library’s intention is to continue to provide this amplification for the efforts and time spent by external contributors.
Kickstart a beginner’s open-source contribution journey: The biggest challenge for beginners who want to apply their learnings to building Android or Flutter applications is ‘Where do I start contributing?’ While open-source banners are unfurled everywhere, beginners still struggle to find their place. With the Dev Library, you get a stack of quality projects hand-picked for you, keeping the freshness of the tech and the content quality intact. For example, Tomas Trajan, a Dev Library contributor, created an Angular Material starter project with ‘good first issues’ to start your contributions with.
Recognition: Having your content selected for the Dev Library acts as recognition of the long hours you’ve put in to build a working open-source project and explain it well. Dev Library also features hero content in its monthly newsletter, highlights top contributors, and is in the process of gamifying developer efforts. As an example, one of our contributors created a weather application using Android and added a ‘Part of Dev Library’ badge.
With your contributions in one place on your Author page, you can use it as a portfolio for your work while simultaneously increasing your chances of becoming the next Google Developer Expert (GDE).
Features on the platform
Keeping developers in mind, we’ve updated features on the platform as follows:
Added a new product category, Google Assistant – all Google Assistant and Smart Home projects now have a designated category on the Dev Library.
Integrated a new way to make submissions across product areas via the Advocu form.
Introduced a special section to submit Cloud Champion articles on Google Cloud.
Included displays on each Author page indicating the expertise of individual contributors.
Upcoming: An expertise filter to help you segment out content based on Beginner, Intermediate, or Expert levels.
To submit your ideas or suggestions, put them down in this form.
Contributor Love
The Dev Library as a platform is more about the contributors, who sit at the cusp of creating and consuming the available content. Here are some contributors who have made the platform their own, and how the Dev Library has helped them along their journey:
Roaa Khaddam: Roaa is a Senior Flutter Mobile Developer and Co-Founder at MultiCaret Inc.
How has the Dev Library helped you?
“It gave me the opportunity to share what I created with an incredible community and look at the projects my fellow Flutter mates have created. It acts as a great learning resource.”
Somkiat Khitwongwattana: Somkiat is an Android GDE and a consistent user of Android technology from Thailand.
How has the Dev Library helped you?
“I used to discover new open source libraries and helpful articles for Android development in many places and it took me longer than necessary. But the Dev Library allows me to explore these useful resources in one place.”
Kevin Kreuzer: Kevin is an Angular developer and contributes to the community in various ways.
How has the Dev Library helped you?
“Dev Library is a great tool to find excellent Angular articles or open source projects. Dev Library offers a great filtering function and therefore makes it much easier to find the right open source library for your use case.”
What started as a platform to highlight and showcase some open-source projects has grown into a product where developers can share their learnings, inspire others, and contribute to the ecosystem at large.
Do you have an Open Source learning or project in the form of a blog or GitHub repo you’d like to share? Please submit it to the Dev Library platform. We’d love to add you to our ever growing list of developer contributors!
Posted by Hyunkil Kim, Software Quality Engineer at Line Corp.
This article is written by Hyunkil Kim, who participated in the Machine Learning Bootcamp, a machine learning training program conducted in Korea to nurture next-generation ML engineers and help them find jobs.
As a developer, I had developed a certain level of curiosity about machine learning. I had also heard that many former developers were switching their specialization over to machine learning. Thus, I signed up for the <Google Machine Learning Bootcamp>, thinking it would be a good chance to get my feet wet.
I was a bit nervous and excited at the same time after getting the acceptance notification. Wondering if I should go over my Python skills one more time in preparation, I installed the newest version of TensorFlow on my machine. I also skimmed through documents on the basics of machine learning. That all turned out to be unnecessary. To put it bluntly, I had to relearn everything from scratch over the course of the bootcamp. It was quite challenging to be introduced to concepts I wasn’t familiar with, such as the functional API and the concept of functional programming in general, various visualization libraries, and data processing frameworks and services that were new to me. I worked very hard with the mindset of starting fresh.
Journey to Becoming a Machine Learning Engineer
There were three main objectives for the participants: completing the Deep Learning Specialization on Coursera, which is based on TensorFlow; acquiring an ML certification (the TensorFlow certificate or the Google Cloud ML (or Data Science) Engineer certification); and participating in Kaggle competitions. The Google Developers team provided the Coursera course fee and the certification fee and offered many benefits to those who completed the course. You could really make it worth your while as long as you took the initiative and applied your passion.
<Coursera Deep Learning Specialization>
The Coursera class is based on TensorFlow 2.x and requires watching a set amount of instructor Andrew Ng’s lectures on AI every week, with screenshots submitted as proof. It was pretty tough at first as the lectures were not in Korean. However, because the class is so famous, I was able to find posts on the internet that broke down the lectures and made them easier to understand. The class also provided reference links, so you could study more on your own once you got used to it.
While this is not really related to the Coursera class, I also participated in online coding meetups held by the bootcamp participants in between classes, as in the picture below, and it was a memorable experience. In normal times, these are sessions held in coffee shops or study rooms where people get together and work individually on their own coding projects. Because of the pandemic we obviously could not meet in person, so we used Google Meet or Gather Town and left our cameras on as we coded. It felt like I was studying with other people, and I liked the solidarity of relating to others.
<Machine Learning Certifications>
You were required to acquire at least one certification during the bootcamp. I chose to work on the GCP ML Engineer certification. As a Google Cloud user, I had wondered how ML services could be used on the cloud. Coursera happened to have a specialization program for the GCP ML certification, so I took it, too. However, in the end, Google’s own pages on GCP AI operations and use cases helped me more with the certification than the Coursera course.
<Kaggle Competition>
I didn’t get to spend as much time on Kaggle as I would have liked. I didn’t see any current competitions that interested me, so I tried the TPS to review what I had learned so far. TPS stands for Tabular Playground Series, a beginner-to-intermediate level competition for new-ish Kagglers who are just getting the hang of it. You’re required to predict the value of the target from the provided tabular data. It is slightly more difficult than Titanic Survival Predictions, which is a beginner competition. I chose this competition because I figured it would be good practice for the things I had learned so far, like data analysis, feature engineering, and hyperparameter tuning.
This was the part where I personally felt I could have done better. I had many ideas for improving the model or enhancing its performance, but applying and experimenting with them took way more time than I had expected. If I had known that model training would take this much time, I would have started working on Coursera, the certification, and the Kaggle competition all at once from the beginning. Maybe I was too nervous about entering a Kaggle competition and put it off until the end. I should have just tried without getting so nervous. I hesitated too long and ended up regretting it a little too late.
<Tech Talk and Career Talk>
The bootcamp also included many other activities, including a weekly Tech Talk on specific themes and recruiting sessions with potential employers. Companies looking for ML talent were invited and had a chance to introduce themselves, explain the available positions, and take questions about joining their workforce. Some companies sent their current machine learning engineers to explain which models and what kind of data they used to solve business problems. Some companies focused more on describing, in detail, the type of people they were looking for. I didn’t know at the time, but I heard that some of the speakers were big names in the industry. Personally, I found these talks very helpful in terms of both finding employment and familiarizing myself with industry trends. The sessions were very inspiring; new ideas kept flowing as I heard about applications of technologies I only knew in theory and thought about what kinds of investments in AI would be promising.
Besides the Tech Talks, there were also more relaxed sessions for things like career consultation and resume/CV reviews. There were even sessions by Googlers, where they personally answered participants’ questions and offered advice. As I attended various sessions, I noticed that the bootcamp crew and many Tech Talk speakers from hiring companies offered authentic and valuable advice and were very eager to help the bootcamp participants. Nobody talked about the cold reality of the world out there. Knowing how rare it is to find mentors who offer genuinely constructive feedback and guidance, I was personally very touched by and grateful for that.
Concluding the Machine Learning Bootcamp
The Google Machine Learning Bootcamp captured the essence of what it would be like to work for Google. I felt like they expected you to take your own initiative to do what you wanted. They showed that they were willing and able to help you grow as much as possible as long as you did your best. For example, one of the world’s most famous programmers, Jeff Dean, was at the kickoff session, and there was even an AMA session with Laurence Moroney, who developed the training course for TensorFlow. They also allowed maximum freedom in finding teammates for the Kaggle competitions so that you didn’t have to worry about having to carry your team. Things covered in the Tech Talks or recruitment sessions were not included in assignments. They let the participants do their thing freely while promising the best support possible in the industry if needed. I could see how some people would find it too lax that Google lets you study on your own at your own pace.
I think this was a rare chance to meet people from various backgrounds with the common goal of becoming machine learning engineers or developers. It was a unique experience where I got to talk and study with good people and even do something strange like the online coding meetup. There were also times when I was vainly taking pride in what little knowledge I had, but I ended up putting a lot of work into the bootcamp, wanting to make the most of it and to come ahead of others.
In the end, the take-home message is to “try anything.”
Personally, I was very happy with the experience. I got to be a little more comfortable with machine learning. As a result, I’m able to pay more attention to details related to machine learning at my new job. The challenge of facing something new is a constant of a developer’s life. Still, participating in this bootcamp felt especially meaningful to me, and I enjoyed it thoroughly.
While the bootcamp is over, I heard that some participants are still continuing with their study groups or projects. Wanting to study as a group myself, I also had asked around and volunteered to join a study group, but I ended up studying alone because none of the groups covered the area I was interested in. Even so, many people sharing useful information on Slack helped me as I studied alone, and they are still helping me even after the bootcamp.
At any rate, I keep coming up with various ideas that I want to try in my current job or as a personal project. It feels like I found a new toy that I can have fun with for a while without getting tired of it. I think I’ll start slowly with a small toy project.
Posted by HyeJung Lee, DevRel Community Manager and Soonson Kwon, DevRel Program Manager
Let’s explore highlights and achievements of the vast Google Machine Learning communities over the last quarter of last year! We are excited about and grateful for all the activities that the communities across the globe do.
On Dec 12th, ML GDE Paolo Galeone started solving the Advent of Code puzzles using “pure TensorFlow” (without any other library). His solutions have been published as a 12-part series on his blog, where he explains how he designed and implemented them and – when needed – focuses on TensorFlow features that are not widely used. (Day 1, Day 2, Day 3, Day 4, Day 5, Day 6, Day 7, Day 8, Day 9, Day 10, Day 11, Day 12, Wrap up)
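To give a flavour of what “pure TensorFlow” puzzle solving looks like, here is an illustrative snippet in the same spirit (ours, not Paolo’s actual code): counting how often a value in a sequence increases, using nothing but TensorFlow ops.

```python
import tensorflow as tf

# Sample input: a sequence of measurements.
depths = tf.constant([199, 200, 208, 210, 200, 207, 240, 269, 260, 263], dtype=tf.int32)

# Count how many measurements are larger than the previous one, with pure TF ops.
increases = tf.reduce_sum(tf.cast(depths[1:] > depths[:-1], tf.int32))
tf.print("increases:", increases)  # prints 7 for this sample input
```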
TFUG organizer Ali Mustufa Shaikh and Rishit Dagli released “CPPE-5: Medical Personal Protective Equipment Dataset” (paper, code). This paper got featured on Google Research TRC’s publication section on January 5, 2022.
ML GDE Elyes Manai from Tunisia wrote an article “The ability to change people’s lives and leave one’s mark“. Are you facing difficulties growing in constrained environments? And do you think you’re not a first-class student and you don’t have connections in the industry? Then, check out Elyes’s story. He shared how Google helped him accelerate his impact.
Annotated Research Papers by ML GDE Aakash Kumar Nain (India) is an effort to make papers more accessible to a wider community. It is also available as a web version and includes papers from Google Research and elsewhere. The repository is popular enough to have earned more than 2,000 stars and 200 forks.
ML DevFest 2021 by GDG Cloud San Francisco featured 5 sessions that walk you through framing ML problems, researching ML, building proofs of concept using existing ML APIs and models, building ML pipelines, and more. ML GDE Vikram Tiwari (USA) presented on Vertex AI, MLOps, and GCP.
Krupal Modi (India)’s blog article and #IamaGDE video shares how he’s been leading the machine learning initiatives at Haptik, a conversational AI platform, and how the team paired with the Indian Government and WhatsApp to build a COVID-19 helpline.
Leigh Johnson from the USA is the founder of Print Nanny, an automated failure detection and monitoring system for 3D printers. Meet Leigh in this blog and video!
We are happy to announce the ML Olympiad, a series of associated Kaggle Community Competitions hosted by Machine Learning Google Developer Experts (ML GDEs) and TensorFlow User Groups (TFUGs).
Kaggle recently announced “Community Competitions” allowing anyone to create and host a competition at no cost. And our proud members of ML communities decided to dive in and take advantage of the feature to solve critical issues of our time, providing opportunities to train developers.
Why the ML Olympiad?
To provide ML training for developers by leveraging Kaggle’s Community Competitions. This is an opportunity for participants to practice ML. It is also the first 2022 global campaign of the ML Ecosystem team, and it helps build stronger communities.
This competition focuses on data from the Enem (National High School Examination). Competitors have to create models to predict student scores across multiple tests.
Hosts: Vinicius Fernandes Caridá (ML GDE), Pedro Gengo, Alex Fernandes Mansano / Tensorflow User Group São Paulo
The aim of this competition is to build a multi-class classification model capable of accurately predicting the most suitable driver for one or several given orders based on the destination of the order and the paths covered by the deliverers.
Google Developers supports the ML Olympiad by providing swag for the top 3 winners of each competition. Find your interest among the competitions, join or share them, and win your share of the swag for competition winners!
Posted by Álvaro Lamas, Héctor Parra, Jaime Martínez, Julia Hernández, Miguel Fernandes, Pablo Gil
Acquiring high-value customers using predicted Lifetime Value, taking specific actions on users with a high propensity to churn, generating and activating audiences based on machine learning processed signals… all of those marketing scenarios require analyzing first-party data, performing predictions on the data, and activating the results in the different marketing platforms, like Google Ads, as frequently as possible to keep the data fresh.
Feeding marketing platforms like Google Ads on a regular and frequent basis requires a robust, report-oriented, and cost-efficient ETL & prediction pipeline. These pipelines are very similar regardless of the use case, and it’s very easy to fall into reinventing the wheel every time or manually copying & pasting structural code, increasing the risk of introducing errors.
Wouldn’t it be great to have a common reusable structure and just add the specific code for each of the stages?
Here is where Prediction Framework plays a key role in helping you implement and accelerate your first-party data prediction projects by providing the backbone elements of the predictive process.
Prediction Framework is a fully customizable pipeline that allows you to simplify the implementation of prediction projects. You only need to have the input data source, the logic to extract and process the data and a Vertex AutoML model ready to use along with the right feature list, and the framework will be in charge of creating and deploying the required artifacts. With a simple configuration, all the common artifacts of the different stages of this type of projects will be created and deployed for you: data extraction, data preparation (aka feature engineering), filtering, prediction and post-processing, in addition to some other operational functionality including backfilling, throttling (for API limits), synchronization, storage and reporting.
The Prediction Framework was built to be hosted on Google Cloud Platform and makes use of Cloud Functions to do all the data processing (extraction, preparation, filtering and post-prediction processing), Firestore, Pub/Sub, and Schedulers for the throttling system and to coordinate the different phases of the predictive process, Vertex AutoML to host your machine learning model, and BigQuery as the final storage of your predictions.
Prediction Framework Architecture
To get started with the Prediction Framework, a configuration file needs to be prepared with some environment variables about the Google Cloud project to be used, the data sources, the ML model to make the predictions, and the scheduler for the throttling system. In addition, custom queries for the data extraction, preparation, filtering and post-processing need to be added when customizing the deployment files. Then, the deployment is done automatically using a deployment script provided by the tool.
Once deployed, all the stages will be executed one after the other, storing the intermediate and final data in the BigQuery tables:
Extract: on a timely basis, this step will query the transactions from the data source corresponding to the run date (scheduler or backfill run date) and will store them in a new table in the local project’s BigQuery.
Prepare: immediately after the extract of the transactions for one specific date is available, the data will be picked up from the local BigQuery and processed according to the specs of the model. Once the data is processed, it will be stored in a new table in the local project’s BigQuery.
Filter: this step will query the data stored by the prepare process, filter the required data, and store it in the local project’s BigQuery (i.e., only taking new customers’ transactions into consideration; what counts as a new customer is up to the instantiation of the framework for the specific use case and will be covered later).
Predict: once the new customers are stored, this step will read them from BigQuery and request predictions through the Vertex AI API. Once the results are ready, they will be stored in BigQuery within the target project.
Post_process: A formula could be applied to the AutoML batch results to tune the value or to apply thresholds. Once the data is ready, it will be stored in BigQuery within the target project.
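To make the staged pattern above concrete, here is a highly simplified, hypothetical sketch of the extract/prepare/filter chain expressed as BigQuery jobs. The real framework implements each stage as a Cloud Function chained via Pub/Sub and Cloud Scheduler; all project, dataset, table, and column names below are placeholders we invented for illustration.

```python
from google.cloud import bigquery

client = bigquery.Client()

def run_stage(sql: str):
    """Run one stage as a BigQuery job and wait for it to finish."""
    client.query(sql).result()

def run_pipeline(run_date: str):
    suffix = run_date.replace("-", "")  # e.g. "2022-01-15" -> "20220115" for table names

    # Extract: land the run date's transactions in a staging table.
    run_stage(f"""
        CREATE OR REPLACE TABLE `my-project.pipeline.extracted_{suffix}` AS
        SELECT * FROM `my-project.source.transactions`
        WHERE DATE(transaction_ts) = '{run_date}'""")

    # Prepare: feature engineering according to the model's specs.
    run_stage(f"""
        CREATE OR REPLACE TABLE `my-project.pipeline.prepared_{suffix}` AS
        SELECT customer_id, SUM(amount) AS total_spend, COUNT(*) AS n_orders
        FROM `my-project.pipeline.extracted_{suffix}`
        GROUP BY customer_id""")

    # Filter: keep only the rows that should be scored (e.g. new customers).
    run_stage(f"""
        CREATE OR REPLACE TABLE `my-project.pipeline.filtered_{suffix}` AS
        SELECT * FROM `my-project.pipeline.prepared_{suffix}`
        WHERE customer_id NOT IN (
            SELECT customer_id FROM `my-project.pipeline.known_customers`)""")

    # Predict and post-process would follow: request a Vertex AI batch prediction on the
    # filtered table, optionally apply a formula or threshold, and write to the target project.
```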
One of the powerful features of the Prediction Framework is that it allows backfilling directly from the BigQuery user interface, so if you need to reprocess a whole period of time, it can be done in literally 4 clicks.
In summary: Prediction Framework simplifies the implementation of first-party data prediction projects, saving time and minimizing errors of manual deployments of recurrent architectures.
For additional information and to start experimenting, you can visit the Prediction Framework repository on Github.
Posted by HyeJung Lee, DevRel Community Manager and Soonson Kwon, DevRel Program Manager
Let’s explore highlights and achievements of the vast Google Machine Learning communities by region for the last quarter. Activities of experts (GDE, professional individuals), communities (TFUG, TensorFlow user groups), students (GDSC, student clubs), and developer groups (GDG) are presented here.
Key highlights
30 Days of ML with Kaggle is designed to help beginners study ML using Kaggle Learn courses, along with a competition created specifically for the participants of this program. We collaborated with the Kaggle team so that more than 30 ML GDEs and TFUG organizers participated as volunteer online mentors and speakers for this initiative.
GDE Minori MATSUDA (Japan)’s project for Coca-Cola Bottlers Japan was published on the Google Cloud Japan blog, covering how an ML pipeline was created and deployed into a real business within 2 months using Vertex AI. It was also published in English on the GCP blog.
GDE Chansung Park (Korea) and Sayak Paul (India) published many articles on the GCP blog. First, “Image search with natural language queries” explained how to build a simple image parser from natural language inputs using OpenAI’s CLIP model. Second, from the “Model training as a CI/CD system” posts (Part I, Part II), you can learn why having a resilient CI/CD system for your ML application is crucial for success. Last, “Dual deployments on Vertex AI” talks about an end-to-end workflow using Vertex AI, TFX and Kubeflow.
In China, GDE Junpeng Ye used TensorFlow 2.x to significantly reduce the codebase (15k → 2k) of WeChat Finder, a TikTok alternative within WeChat. GDE Dan lee wrote an article series on Understanding TensorFlow: Part 1, Part 2, Part 3-1, Part 3-2, Part 4.
GDE Matthew Kelcey from Australia gave a talk on JAX at the PyConAU event. Mat gave an overview of the fundamentals of JAX and an intro to some of the libraries being developed on top of it.
In Singapore, TFUG Singapore dived back into some of the latest papers, techniques, and fields of research that are delivering state-of-the-art results in a number of fields. GDE Martin Andrews included a brief walkthrough of the released Perceiver IO code, highlighting what JAX looks like, how Haiku relates to Sonnet, and also the data loading, which is done via tf.data.
GDE Aakash Nain has published the TF-JAX tutorial series from Part 4 to Part 8. Part 4 gives a brief introduction to JAX (what and why) and DeviceArray. Part 5 covers why pure functions are good and why JAX prefers them. Part 6 focuses on pseudo-random number generation (PRNG) in NumPy and JAX. Part 7 focuses on just-in-time compilation (JIT) in JAX. And Part 8 covers vmap and pmap.
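For readers new to these topics, the tiny sketch below (ours, not from the series) shows the kind of building blocks those tutorials cover: explicit PRNG keys, pure functions, jit, and vmap.

```python
import jax
import jax.numpy as jnp

# PRNG in JAX is explicit: keys are split and passed around, never mutated in place.
key = jax.random.PRNGKey(42)
key, subkey = jax.random.split(key)
w = jax.random.normal(subkey, (3,))

# A pure function: the output depends only on its inputs, with no hidden state.
def predict(w, x):
    return jnp.dot(w, x)

# jit compiles the function with XLA; vmap maps it over a batch without writing a loop.
batched_predict = jax.jit(jax.vmap(predict, in_axes=(None, 0)))

xs = jnp.ones((8, 3))          # a batch of 8 inputs
print(batched_predict(w, xs))  # 8 predictions
```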
GDE Sayak Paul and Soumik Rakshit shared a new NLP dataset for multi-label text classification. The dataset consists of paper titles, abstracts, and term categories scraped from arXiv.
North America
During GSoC (Google Summer of Code), some GDEs mentored or co-mentored students. GDE Margaret Maynard-Reid (USA) mentored students working on TF-GAN, Model Garden, TF Hub and TFLite. You can get some of her experience and tips from the GDE Blog. And you can find GDE Sayak Paul (India) and Googler Morgan Roff’s GSoC experience in (co-)mentoring TensorFlow and TF Hub as well.
On the other side of the world, in Brazil, GDE Hugo Zanini Gomes wrote an article, “Custom object detection in the browser using TensorFlow.js”, using the TensorFlow 2 Object Detection API and Colab, which was posted on the TensorFlow blog.
GDE Nathaly Alarcon Torrico from Bolivia gave a talk on Data Pipelines for ML, explaining all the phases involved in the creation of ML and Data Science products, from data collection, transformation, and storage through to the creation of ML models.
The TechTalk “Machine Learning Competitivo: Top 1% en Kaggle” (video) was hosted by TFUG Santiago (Chile). In this talk the speaker gave a tour of the steps to follow to generate a model capable of reaching the top 1% of the Kaggle leaderboard. The focus was on the libraries and “tricks” used to test many ideas quickly, both in implementation and in execution, and how to use them in production environments.
Please note that the information, uses, and applications expressed in the below post are solely those of our guest author, Neurons Lab, and not necessarily those of Google.
How the idea emerged
With the advancement of technology, drones have not only become smaller, but also gained more compute. There are many examples of iPhone-sized quadcopters in the consumer drone market with the computing power to do live tracking while recording 4K video. However, the most important element has not changed much – the controller. It is still bulky and not intuitive for beginners to use. There is the option of a smartphone with on-display control; however, the control principle is still the same.
That is how the idea for this project emerged: a more personalised approach to controlling the drone using gestures. ML Engineer Nikita Kiselov (me), together with consultation from my colleagues at Neurons Lab, undertook this project.
Figure 1: [GIF] Demonstration of drone flight control via gestures using MediaPipe Hands
Why use gesture recognition?
Gestures are the most natural way for people to express information non-verbally. Gesture control is an entire topic in computer science that aims to interpret human gestures using algorithms. Users can simply control devices or interact with them without physically touching them. Nowadays, such types of control can be found everywhere from smart TVs to surgical robots, and UAVs are no exception.
Although gesture control for drones has not been widely explored lately, the approach has some advantages:
No additional equipment needed.
More human-friendly controls.
All you need is a camera that is already on all drones.
With all these features, such a control method has many applications.
Flying action camera. In extreme sports, drones are a trendy video recording tool. However, they tend to have a very cumbersome control panel. The ability to use basic gestures to control the drone (while in action) without reaching for the remote control would make it easier to use the drone as a selfie camera. And the ability to customise gestures would completely cover all the necessary actions.
As an alternative, this type of control would be helpful in an industrial environment such as a construction site, where there may be several drone operators (a gesture can be used as a stop signal in case the primary source of control is lost).
The Emergencies and Rescue Services could use this system for mini-drones indoors or in hard-to-reach places where one of the hands is busy. Together with the obstacle avoidance system, this would make the drone fully autonomous, but still manageable when needed without additional equipment.
Another area of application is FPV (first-person view) drones. Here the camera on the headset could be used instead of the one on the drone to recognise gestures. Because hand movement can be impressively precise, this type of control, together with hand position in space, can simplify the FPV drone control principles for new users.
However, all these applications need a reliable and fast (really fast) recognition system. Existing gesture recognition systems can be fundamentally divided into two main categories: first, those where special physical devices are used, such as smart gloves or other on-body sensors; second, visual recognition using various types of cameras. Most of those solutions need additional hardware or rely on classical computer vision techniques. Hence, while such approaches are fast, it is pretty hard to add custom gestures or even motion-based ones. The answer we found is MediaPipe Hands, which was used for this project.
Overall project structure
To create the proof of concept for the stated idea, a Ryze Tello quadcopter was used as a UAV. This drone has an open Python SDK, which greatly simplified the development of the program. However, it also has technical limitations that do not allow it to run gesture recognition on the drone itself (yet). For this purpose a regular PC or Mac was used. The video stream from the drone and commands to the drone are transmitted via regular WiFi, so no additional equipment was needed.
To make the program structure as plain as possible and add the opportunity for easily adding gestures, the program architecture is modular, with a control module and a gesture recognition module.
Figure 2: Scheme that shows overall project structure and how videostream data from the drone is processed
The application is divided into two main parts: gesture recognition and drone controller. Those are independent instances that can be easily modified. For example, to add new gestures or change the movement speed of the drone.
The video stream is passed to the main program, which is a simple script with module initialisation, connections, and the while-true cycle typical for this kind of hardware. Each frame of the video stream is passed to the gesture recognition module. After getting the ID of the recognised gesture, it is passed to the control module, where the command is sent to the UAV. Alternatively, the user can control the drone from the keyboard in a more classical manner.
The gesture recognition module itself is divided into a keypoint detector and a gesture classifier. It is exactly this combination of the MediaPipe keypoint detector with a custom gesture classification model that distinguishes this gesture recognition system from most others.
Gesture recognition with MediaPipe
Utilizing MediaPipe Hands is a winning strategy not only in terms of speed, but also in flexibility. MediaPipe already has a simple gesture recognition calculator that can be inserted into the pipeline. However, we needed a more powerful solution with the ability to quickly change the structure and behaviour of the recognizer. To do so and classify gestures, a custom neural network was created with 4 fully connected layers and 1 softmax layer for classification.
Figure 3: Scheme that shows the structure of classification neural network
This simple structure gets a vector of 2D coordinates as an input and gives the ID of the classified gesture.
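A minimal sketch of such a classifier in Keras could look like the following; the layer widths and the number of gesture classes here are our own assumptions for illustration, not the exact values used in the project.

```python
import tensorflow as tf

NUM_KEYPOINTS = 21   # MediaPipe Hands returns 21 landmarks per hand
NUM_CLASSES = 8      # number of gestures to recognise (illustrative assumption)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_KEYPOINTS * 2,)),           # flattened (x, y) coordinates
    tf.keras.layers.Dense(64, activation="relu"),                 # 4 fully connected layers...
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),     # ...plus a softmax output
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```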
Instead of using cumbersome segmentation models with a more algorithmic recognition process, a simple neural network can easily handle such tasks. Recognising gestures from keypoints, which form a simple vector of 21 points’ coordinates, takes much less data and time. What is more critical, new gestures can be easily added because model retraining takes much less time than the algorithmic approach.
To train the classification model, a dataset with the keypoints’ normalised coordinates and the ID of each gesture was used. The numerical characteristics of the dataset were:
3 gestures with 300+ examples (basic gestures)
5 gestures with 40–150 examples
All data are vectors of x, y coordinates that capture small tilts and different hand shapes during data collection.
Figure 4: Confusion matrix and classification report for classification
We can see from the classification report that the precision of the model on the test dataset (30% of all data) was almost error-free for most classes, with precision > 97% for every class. Due to the simple structure of the model, excellent accuracy can be obtained with a small number of training examples per class. After conducting several experiments, it turned out that fewer than 100 new examples were enough for good recognition of new gestures. What is more important, we don’t need to retrain the model for each motion in different illumination because MediaPipe takes over all the detection work.
Figure 5: [GIF] Test that demonstrates how fast classification network can distinguish newly trained gestures using the information from MediaPipe hand detector
From gestures to movements
To control a drone, each gesture should represent a command for the drone. The most excellent part about Tello is that it has a ready-made Python API to help us do that without explicitly controlling the motor hardware. We just need to map each gesture ID to a command.
Figure 6: Command-gesture pairs representation
Each gesture sets the speed for one of the axes; that’s why the drone’s movement is smooth, without jitter. To remove unnecessary movements due to false detection, even with such a precise model, a special buffer was created which saves the last N gestures. This helps to remove glitches or inconsistent recognition.
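A minimal sketch of such a smoothing buffer (the names and the majority threshold are illustrative, not the project’s actual code):

```python
from collections import Counter, deque

BUFFER_SIZE = 10                       # N: how many recent frames to consider
buffer = deque(maxlen=BUFFER_SIZE)

def filtered_gesture(gesture_id):
    """Add the latest prediction and return a gesture only when it is stable."""
    buffer.append(gesture_id)
    gesture, count = Counter(buffer).most_common(1)[0]
    if count >= int(0.7 * BUFFER_SIZE):  # require a clear majority before acting
        return gesture
    return None                           # otherwise ignore this frame
```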
The fundamental goal of this project is to demonstrate the superiority of the keypoint-based gesture recognition approach compared to classical methods. To demonstrate all the potential of this recognition model and its flexibility, there is the ability to create the dataset on the fly … during the drone’s flight! You can create your own combinations of gestures or rewrite an existing one without collecting massive datasets or manually designing a recognition algorithm. By pressing a button and an ID key, the vector of detected points is instantly saved to the overall dataset. This new dataset can be used to retrain the classification network and add new gestures for detection. For now, there is a notebook that can be run on Google Colab or locally. Retraining the network classifier takes about 1-2 minutes on a standard CPU instance. The new binary file of the model can simply be used instead of the old one. It is as simple as that. For the future, though, there is a plan to do retraining right on the mobile device or even on the drone.
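Conceptually, the on-the-fly collection step only needs to append the current landmark vector and the chosen gesture ID to the dataset; a hypothetical sketch (not the project’s code, and the file name is our own) could be as small as this:

```python
import csv

def log_sample(gesture_id, landmarks, path="gesture_dataset.csv"):
    """landmarks: flat list of normalised (x, y) coordinates for the 21 hand keypoints."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([gesture_id, *landmarks])
```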
Figure 7: Notebook for model retraining in action
Summary
This project was created to push forward the area of gesture-controlled drones. The novelty of the approach lies in the ability to add new gestures or change old ones quickly. This is made possible thanks to MediaPipe Hands. It works incredibly fast and reliably, and is ready out of the box, making gesture recognition very fast and flexible to change. Our Neurons Lab team is excited about the demonstrated results and is going to try other incredible solutions that MediaPipe provides.
We will also keep track of MediaPipe updates, especially those adding more flexibility in creating custom calculators for our own models and reducing the barriers to entry when creating them. Since at the moment our classifier model sits outside the graph, such improvements would make it possible to quickly turn a custom calculator with our model into reality.
Another highly anticipated feature is Flutter support (especially for iOS). In the original plans, the inference and visualisation were supposed to run on a smartphone with NPU/GPU utilisation, but at the moment the support quality does not satisfy our requests. Flutter is a very powerful tool for rapid prototyping and concept checking. It allows us to throw together and test an idea cross-platform without involving a dedicated mobile developer, so such support is in high demand.
Nevertheless, the development of this demo project continues with the available functionality, and there are already several plans for the future, like using MediaPipe Holistic for face recognition and subsequent authorisation. The drone would be able to authorise the operator and give permission for gesture control. This also opens the way to personalisation. Since the classifier network is straightforward, each user will be able to customise gestures for themselves (simply by using another version of the classifier model). Depending on the authorised user, one or another saved model will be applied. There are also plans to add usage of the Z-axis, for example, tilting the palm of your hand to control the speed of movement or height more precisely. We encourage developers to innovate responsibly in this area, and to consider responsible AI practices such as testing for unfair biases and designing with safety and privacy in mind.
We strongly believe that this project will motivate even small teams to do projects in the field of ML computer vision for UAVs, and that MediaPipe will help to cope with the limitations and difficulties on their way (such as scalability, cross-platform support and GPU inference).
If you want to contribute, have ideas or comments about this project, please reach out to [email protected], or visit the GitHub page of the project.
This blog post is curated by Igor Kibalchich, ML Research Product Manager at Google AI.
Posted by HyeJung Lee, MJ You, ML Ecosystem Community Managers
Google Developers Experts (GDE) is a community of passionate developers who love to share their knowledge with others. Many of them specialize in Machine Learning (ML).
Here are some highlights showcasing the ML GDEs’ achievements from last quarter, which contributed to the global ML ecosystem. If you are interested in becoming an ML GDE, please scroll down to see how you can apply!
Leigh Johnson (USA) wrote an article titled Soft-launching an AI/ML Product as a Solo Founder, covering GCP AutoML Vision, GCP IoT Core, TensorFlow Model Garden, and TensorFlow.js. The article details the journey of a solo founder developing an ML product for detecting printing failure for 3D printers (more on this story is coming up soon, so stay tuned!)
Aqsa Kausar (Pakistan) gave a talk about Explainable AI in Google Cloud at the International Women’s Day Philippines event. She explained why it is important and where and how it is applied in ML workflows.
Finally, ML Lab by Robert John from Nigeria introduces the ML landscape on GCP, covering everything from BigQuery ML through AutoML to TensorFlow and AI Platform.
Greece-based George Soloupis wrote a tutorial, “Fine-tune a BERT model with the use of Colab TPU”, on how to fine-tune a BERT model that was trained specifically on the Greek language to perform the downstream task of text classification, using Colab’s TPU (v2-8).
JAX
India-based Aakash Nain has published the TF-JAX tutorial series (Part 1, Part 2, Part 3, Part 4), aiming to teach everyone the building blocks of the TensorFlow and JAX frameworks.
Online Meetup TensorFlow and JAX by Tzer-jen Wei from Taiwan covered JAX intro and use cases. It also touched upon different ways of writing TensorFlow models and training loops.
The notebook “Simple Bayesian Ridge with Sentence Embeddings” by Ertuğrul Demir (Turkey) tackles a natural language processing task using BERT fine-tuning followed by simple linear regression on top of sentence embeddings generated by transformers.
Advances in machine learning and deep learning research are changing our technology, and many ML GDEs are interested and contributing.
Karim Beguir (UK) co-authored a paper with the DeepMind team covering a novel compositional approach using Deep Reinforcement Learning to solve robotics manipulation tasks. The paper was accepted in the NeurIPS workshop.
Finally, Sayak Paul from India, together with Pin-Yu Chen, published a research paper, “Vision Transformers are Robust Learners,” covering the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
If you want to know more about the Google Experts community and their global open-source ML contributions, please check the GDE Program website, visit the GDE Directory and connect with GDEs on Twitter and LinkedIn. You can also meet them virtually on the ML GDEs’ YouTube channel!
We’re always excited to share updates to our Coral platform for building edge ML applications. In this post, we have some interesting demos, interfaces, and tutorials to share, and we’ll start by pointing you to an important software update for the Coral Dev Board.
Important update for the Dev Board / SoM
If you have a Coral Dev Board or Coral SoM, please install our latest Mendel update as soon as possible to receive a critical fix to part of the SoC power configuration. To get it, just log onto your board and install the update as follows:
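The update is delivered through Mendel’s standard Debian package channels, so fetching it generally looks like the usual update commands:

```
sudo apt-get update
sudo apt-get dist-upgrade
```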
This will install a patch from NXP for the Dev Board / SoM’s SoC, without which it’s possible the SoC will overstress and the lifetime of the device could be reduced. If you recently flashed your board with the latest system image, you might already have this fix (we also updated the flashable image today), but it never hurts to fetch all updates, as shown above.
Note: This update does not apply to the Dev Board Mini.
Manufacturing demo
We recently published the Coral Manufacturing Demo, which demonstrates how to use a single Coral Edge TPU to simultaneously accomplish two common manufacturing use-cases: worker safety and visual inspection.
The demo ships with two specific videos and tasks (worker keepout detection and apple quality grading), but it is designed to be easily customized with different inputs and tasks. The demo, written in C++, requires OpenGL and is primarily targeted at x86 systems, which are prevalent in manufacturing gateways, although ARM Cortex-A systems, like the Coral Dev Board, are also supported.
Web Coral
We’ve been working hard to make ML acceleration with the Coral Edge TPU available for most popular systems. So we’re proud to announce support for WebUSB, allowing you to use the Coral USB Accelerator directly from Chrome. To get started, check out our WebCoral demo, which builds a webpage where you can select a model and run an inference accelerated by the Edge TPU.
New models repository
We recently released a new models repository that makes it easier to explore the various trained models available for the Coral platform, including image classification, object detection, semantic segmentation, pose estimation, and speech recognition. Each family page lists the various models, including details about training dataset, input size, latency, accuracy, model size, and other parameters, making it easier to select the best fit for the application at hand. Lastly, each family page includes links to training scripts and example code to help you get started. Or for an overview of all our models, you can see them all on one page.
Transfer learning tutorials
Even with our collection of pre-trained models, it can sometimes be tricky to create a task-specific model that’s compatible with our Edge TPU accelerator. To make this easier, we’ve released some new Google Colab tutorials that allow you to perform transfer learning for object detection, using MobileDet and EfficientDet-Lite models. You can find these and other Colabs in our GitHub Tutorials repo.
We are excited to share all that Coral has to offer as we continue to evolve our platform. Keep an eye out for more software and platform related news coming this summer. To discover more about our edge ML platform, please visit Coral.ai and share your feedback at [email protected].
Posted by HyeJung Lee and MJ You, Google ML Ecosystem Community Managers. Reviewed by Soonson Kwon, Developer Relations Program Manager.
Google Developers Experts is a community of passionate developers who love to share their knowledge with others. Many of them specialize in Machine Learning (ML). Despite many unexpected changes over the last months and reduced opportunities for various in-person activities during the ongoing pandemic, their enthusiasm did not stop.
Here are some highlights of the ML GDE’s hard work during the Q1 2021 which contributed to the global ML ecosystem.
ML GDE YouTube channel
With the initiative and lead of US-based GDE Margaret Maynard-Reid, we launched the ML GDEs YouTube channel. It is a great way for GDEs to reach global audiences, collaborate as a community, create unique content and promote each other’s work. It will contain all kinds of ML-related topics: talks on technical topics, tutorials, interviews with another (ML) GDE, a Googler, or anyone in the ML community, etc. Many videos have already been uploaded, including ML GDEs’ intros from all over the world, tips for the TensorFlow & GCP certifications, and how to use Google Cloud Platform. Subscribe to the channel now!
TensorFlow Everywhere
17 ML GDEs presented at TensorFlow Everywhere (a global community-led event series for TensorFlow and Machine Learning enthusiasts and developers around the world) hosted by local TensorFlow user groups. You can watch the recorded sessions in the TensorFlow Everywhere playlist on the ML GDE YouTube channel. Most of the sessions cover new features in TensorFlow.
ML GDEs are also very active in mentoring community developers, students in the Google Developer Student Clubs and startups in the Google for Startups Accelerator program. Among many, GDE Arnaldo Gualberto (Brazil) conducted mentorship sessions for startups in the Google Fast Track program, discussing how to solve challenges using Machine Learning/Deep Learning with TensorFlow.
TensorFlow
Meanwhile in Europe, GDEs Alexia Audevart (based in France) and Luca Massaron (based in Italy) released “Machine Learning using TensorFlow Cookbook”. It provides simple and effective ideas to successfully use TensorFlow 2.x in computer vision, NLP and tabular data projects. Additionally, Luca published the second edition of the Machine Learning For Dummies book, first published in 2015. The latest edition is enhanced with product updates, and a larger share of its pages is devoted to discussion of Deep Learning and TensorFlow/Keras usage.
On top of her women-in-tech related activities, Ruqiya Bin Safi is also running a “Welcome to Deep Learning Course and Orientation” monthly workshop throughout 2021. The course aims to help participants gain foundational knowledge of deep learning algorithms and get practical experience in building neural networks in TensorFlow.
On the other side of the world, in Canada, GDE Tanmay Bakshi presented the talk “Machine Learning-powered Pipelines to Augment Human Specialists” during TensorFlow Everywhere NA. It covered the world of NLP through Deep Learning, how it’s historically been done, the Transformer revolution, and how to use TensorFlow & Keras to implement use cases ranging from small-scale name generation to large-scale Amazon review quality ranking.
Last but not least, GDE Gad Benram, based in Israel, wrote an article on “Seven Tips for Forecasting Cloud Costs”, where he explains how to build and deploy ML models for time series forecasting with Google Cloud Run. It is linked with his solution for building a cloud-spend control system that helps users more easily analyze their cloud costs.
If you want to know more about the Google Experts community and all their global open-source ML contributions, visit the GDE Directory and connect with GDEs on Twitter and LinkedIn. You can also meet them virtually on the ML GDE’s YouTube Channel!
Posted by Kenny Sulaimon, Product Manager, ML Kit; Chengji Yan, Suril Shah, Buck Bourdon, Software Engineers, ML Kit; Shiyu Hu, Technical Lead, ML Kit; Dong Chen, Technical Lead, ML Kit
At the end of 2020, we introduced the Entity Extraction API to our ML Kit SDK, making it even easier to detect and perform actions on text within mobile apps. Since then, we’ve been hard at work updating our existing APIs with new functionality and also fine tuning Selfie Segmentation with the help of our partners in the ML Kit early access program.
Today we are excited to officially add Selfie Segmentation to the ML Kit lineup, introduce a few enhancements we’ve made to our popular Pose Detection API and announce that ML Kit has graduated to general availability!
General Availability
(ML Kit is now in General Availability)
We launched ML Kit back in 2018 in order to make it easy for developers on Android and iOS to use machine learning within their apps. Over the last two years we have rapidly expanded our set of APIs to help with both vision and natural language processing based use cases.
Thanks to the overwhelmingly positive response, developer feedback and a ton of adoption across both Android and iOS, we are ready to drop the beta label and officially announce the general availability of ML Kit’s APIs. All of our APIs (except Selfie Segmentation, Pose Detection, and Entity Extraction) are now in general availability!
Selfie Segmentation
(Example of ML Kit Selfie Segmentation)
With the increased usage of selfie cameras and webcams in today’s world, being able to quickly and easily add effects to camera experiences has become a necessity for many app developers.
ML Kit’s Selfie Segmentation API allows developers to easily separate the background from users within a scene and focus on what matters. Adding cool effects to selfies or inserting your users into interesting background environments has never been easier. The model works on live and still images, and both half and full body subjects.
Under The Hood
(Diagram of Selfie Segmentation API)
The Selfie Segmentation API takes an input image and produces an output mask. Each pixel of the mask is assigned a float number in the range [0.0, 1.0]. The closer the number is to 1.0, the higher the confidence that the pixel represents a person, and vice versa.
The API works with static images and live video use cases. During live video (stream_mode), the API will leverage output from previous frames to return smoother segmentation results.
We’ve also implemented a “RAW_SIZE_MASK” option to give developers more options for mask output. By default, the mask produced will be the same size as the input image. If the RAW_SIZE_MASK option is enabled, the mask will instead be the size of the model output (256×256). This option makes it easier to apply customized rescaling logic or to reduce latency if rescaling to the input image size is not needed for your use case.
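ML Kit itself is consumed from Android and iOS apps, but the mask semantics are easy to illustrate in a framework-agnostic way. The NumPy sketch below (not ML Kit code) shows how such a confidence mask can be used to composite a person onto a new background:

```python
import numpy as np

def composite(frame, background, mask):
    """frame, background: HxWx3 uint8 images; mask: HxW floats in [0.0, 1.0],
    where values near 1.0 mean 'person'. Returns the person blended onto the background."""
    alpha = mask[..., np.newaxis]  # HxWx1 so it broadcasts over the RGB channels
    out = alpha * frame.astype(np.float32) + (1.0 - alpha) * background.astype(np.float32)
    return out.astype(np.uint8)
```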
Pose Detection Update
(Example of updated Pose Detection API; colors represent the Z value)
Last month, we updated our state-of-the-art Pose Detection API with a new model and new features. A quick summary of the enhancements is listed below:
More poses added: The API now recognizes more poses, targeting fitness and yoga use cases, especially when a user is directly facing the camera.
50% size reduction: The base and accurate pose models are now significantly smaller. This change does not impact the quality of the models.
Z coordinate for depth analysis: The API now outputs a depth coordinate Z to help determine whether parts of the user’s body are in front of or behind the user’s hips.
Z Coordinate
The Z Coordinate is an experimental feature that is calculated for every point (excluding the face). The estimate is provided using synthetic data, obtained via the GHUM model (articulated 3D human shape model).
It is measured in “image pixels” like the X and Y coordinates. The Z axis is perpendicular to the camera and passes between a subject’s hips. The origin of the Z axis is approximately the center point between the hips (left/right and front/back relative to the camera). Negative Z values are towards the camera; positive values are away from it. The Z coordinate does not have an upper or lower bound.
For more information on the Pose Detection changes, please see our API documentation.
Pose Classification
After the release of Pose Detection, we’ve received quite a few requests from developers asking for help with classifying specific poses within their apps. To help tackle this problem, we partnered with the MediaPipe team to release a pose classification tutorial and Google Colab. In the classification tutorial, we demonstrate how to build and run a custom pose classifier within the ML Kit Android sample app and also demo a simple push-up and squat rep counter using the classifier.
(Example of Pose classification and repetition counting with MLKit Pose)
For a deep dive into building your own pose classifier with different camera angles, environment conditions, body shapes etc, please see the pose classification tutorial.
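The tutorial’s general idea is to classify a pose by comparing its landmarks against labelled samples. A stripped-down, illustrative sketch of that kind of nearest-neighbour classification (not the tutorial’s actual code; array shapes and names are our assumptions) looks like this:

```python
import numpy as np

def classify_pose(landmarks, samples, labels, k=10):
    """landmarks: (33, 2) array of normalised pose keypoints for the current frame.
    samples: (N, 33, 2) labelled reference poses; labels: list of N class names."""
    dists = np.linalg.norm(samples - landmarks, axis=(1, 2))  # distance to every sample
    nearest = np.argsort(dists)[:k]                           # k closest reference poses
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)                   # majority vote wins
```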
For more general classification tips, please see our Pose Classification Options page on the ML Kit website.
Beyond General Availability
It has been an exciting two years getting ML Kit to general availability and we couldn’t have gotten here without your help and feedback. As we continue to introduce new APIs such as Selfie Segmentation and Pose Detection, your feedback is more important than ever. Please continue to share your enhancement requests and questions with our development team or reach out through our community channels. Let’s build a smarter future together.