Taking the leap to pursue a passion in Machine Learning with Leigh Johnson #IamaGDE

Welcome to #IamaGDE - a series of spotlights presenting Google Developer Experts (GDEs) from across the globe. Discover their stories, passions, and highlights of their community work.

Leigh Johnson turned her childhood love of Geocities and Neopets into a web development career, and then trained her focus on Machine Learning. Now, she’s a staff software engineer at Slack, a Google Developer Expert in Web and Machine Learning, and founder of Print Nanny, an automated failure detection system and monitoring system for 3D printers.

Meet Leigh Johnson, Google Developer Expert in Web and Machine Learning.

Image shows GDE Leigh Johnson, smiling at the camera and holding a circuit board of some kind

GDE Leigh Johnson

The early days

Leigh Johnson grew up in the Bronx, NY, and got an early start in web development when she became captivated by Geocities and Neopets in elementary school.

“I loved the power of being able to put something online that other people could see, using just HTML and CSS,” she says.

She started college and studied Latin, but it wasn’t the right fit for her, so she dropped out and launched her own business building WordPress sites for small businesses, like local restaurants putting their menus online for the first time or taking orders through a form.

“I was 18, running around a data center trying to rack servers and teaching myself DNS to serve my customer base, which was small business owners,” she says. “I ran my business for five years, until companies like Squarespace and Wix started to edge me out of the market a little bit.”

Leigh went on to chase her dream of working in the video game industry, where she got exposed to low-level C++ programming, graphics engines, and basic statistics, which led her to machine learning.

Image shows GDE Leigh Johnson, smiling at the camera and standing in front of a presentation screen at SFPython

Machine learning

At the video game studio where she worked, Leigh got into Bayesian inference.

“It’s old school machine learning, where you try to predict things based on the probability of previous events,” she explains. “You look at past events and try to predict the probability of future events, and I did this for marketing offers—what’s the likelihood you’d purchase a yellow hat to match your yellow pants?”

In the first month or two of trying email offers, the company made more small dollar sales than they typically made in a year.

“I realized, this is powerful dark magic; I must learn more,” Leigh says.

She continued working for tech startups like Ansible, which was acquired by Red Hat, and Dave.com, doing heavy data lifting.

“Everything about machine learning is powered by being able to manipulate and get data from point A to point B,” she says.

Today, Leigh works on machine learning and infrastructure at Slack and is a Google Developer Expert in machine learning. She also runs a side project: Print Nanny.

Image shows circuit board with fan next to image of its schematics

Print Nanny: Monitoring 3D printers

When Leigh got into 3D printing as a hobby during the COVID-19 shutdown, she discovered that 3D printers can be unreliable and lack sophisticated monitoring programs.

“When I assembled my 3D printer myself, I realized that over time, the calibration is going to change,” she says. “It's a very finicky process, and it didn't necessarily guarantee the quality of these traditional large batch manufacturing processes.”

She installed a nanny cam to watch her 3D printer and researched solutions, knowing from her machine learning experience that because 3D printers build a print up layer by layer, there’s no one point of failure—failure happens layer by layer, over time. So she wrote that algorithm.

“I saw an opportunity to take some of the traditional machine intelligence strategies used by large manufacturers to ensure there’s a certain consistency and quality to the things they produce, and I made Print Nanny,” she says. “It uses a Raspberry Pi, a credit card-sized computer that costs $30. You can stick a computer vision model on one and do offline inference, which are basically predictions about what the camera sees. You can make predictions about whether a print will fail, help score calculations, and attenuate the print.”

Leigh used Google Cloud Platform AutoML Vision, Google Cloud Platform IoT Core, TensorFlow Model Garden, and TensorFlow.js to build Print Nanny. Using GCP credits provided by Google, she improved and developed Print Nanny with TensorFlow and Google Cloud Platform products.

When Print Nanny detects that a print is failing, the user receives a notification and can remotely pause or stop the printer.

“Print Nanny is an automated failure detection system and monitoring system for 3D printers, which uses computer vision to detect defects and alert you to potential quality or safety hazards,” Leigh says.

Leigh has hired team members who are interested in machine learning to help her with the technical aspects of Print Nanny. Print Nanny currently has 2100 users signed up for a closed beta, with 200 people actively using the beta version. Of that group, 80% are hobbyists and 20% are small business owners. Print Nanny is 100% open source.

Image shows a collection of 3D-Printed parts

Becoming a GDE

Leigh got involved with the GDE program about four years ago, when she began putting machine learning models on Raspberry Pis and building robots. She began writing tutorials about what she was learning.

“The things I was doing were quite hard: for TensorFlow Lite, the mobile-device version of TensorFlow, there was a missing documentation opportunity, and my target platform, the Raspberry Pi, is a hobbyist platform, so there was a little bit of missing documentation there,” Leigh says. “For a hobbyist who wanted to pick up a Raspberry Pi and do a computer vision project for the first time, there was a missing tutorial there, so I started writing about what I was doing, and the response was tremendous.”

Leigh’s work caught the eye of Google staff research engineer Pete Warden, the technical lead of the TensorFlow Mobile team, who encouraged her, and she leveraged the GDE program to connect to Google experts on TensorFlow and machine learning. Google provides a machine learning course for developers and supports TensorFlow, in addition to its many AI products.

“I had no knowledge of graph programming or what it meant to adapt the low-level kernel operations that would run on a Raspberry Pi, or compiling software, and I learned all that through the GDE program,” Leigh says. “This program changed my life.”

Image shows 1 man and three women smiling at the camera. Leigh is taking the photo selfie-style

Leigh’s favorite part of the GDE program is going to events like TensorFlow World, which she last attended in 2019, and GDE summits. She hadn’t travelled internationally until she was in her 20s, so the GDE program has connected her to the international community.

“It’s been life-changing,” she says. “I never would have had access to that many perspectives. It’s changed the way I view the world, my life, and myself. It’s very powerful.”

Leigh smiles at the camera in front of a sign that reads TensorFlow for mobile and edge devices

Leigh’s advice to future developers

Leigh recommends that people find the best environment for themselves and adopt a growth mindset.

“The best advice that I can give is to find your motivation and find the environment where you can be successful,” she says. “Surround yourself with people who are lifelong learners. When you cultivate an environment of learning around you, it's this positive, self-perpetuating process.”

#IamaGDE: Diana Rodríguez Manrique

#IamaGDE series presents: Google Maps Platform

Welcome to #IamaGDE - a series of spotlights presenting Google Developer Experts (GDEs) from across the globe. Discover their stories, passions, and highlights of their community work.

Today, meet Diana Rodríguez— Maps, Web, Cloud, and Firebase GDE.

Google Developer Expert, Diana Rodríguez

Diana Rodríguez’s 20 years in the tech industry have been focused on community and making accessible content. She is a full-stack developer with experience in backend infrastructure, automation, and a passion for Python. A self-taught programmer, Diana also learned programming skills from attending meetups and being an active member of her local developer community. She is the first female Venezuelan GDE.

“I put a lot of myself into public speaking, workshops, and articles,” says Diana. “I want to make everything I do as open and transparent as possible.”

Diana’s first foray into working with Google Maps was in 2016, when she built an app that helped record institutional violence against women in Argentina. As a freelance developer, she uses the Google Maps Platform for her delivery services clients.

“I have plenty of clients who need not only location tracking for their delivery fleet, but also to provide specific routes,” says Diana.

“The level of interaction that’s been added to Maps has made it easier for me as a developer to work with direct clients,” says Diana, who uses the Plus Codes feature to help delivery drivers find precise locations on a map. “I’m a heavy user of plus codes. They give people in remote areas and underserved communities the chance to have location services, including emergency and delivery services.”

Getting involved in the developer community

Diana first became involved in the developer community 20 years ago, in 1999, beginning with a university user group. She attended her first DevFest in Bangkok in 2010 and has worked in multiple developer communities since then. She was a co-organizer of GDG Triangle and is now an organizer of GDG Durham in North Carolina. In 2020, she gave virtual talks to global audiences.

“It’s been great to get to know other communities and reach the far corners of the Earth,” she says.

Image of Diana Rodriguez

Favorite Google Maps Platform features and current projects

Diana is excited about the Places API and the Maps team’s continuous improvements. She says the Maps team keeps the GDEs up to date on all the latest news and takes their feedback very seriously.

“Shoutout to Claire, Alex, and Angela, who are in direct contact with us, and everyone who works with them; they have been amazing,” she says. “I look forward to showcasing more upcoming changes. What comes next will be mind-blowing, immersing people into location in a different way that is more interactive.”

Of the new features released in June 2020, which include Cloud-based maps styling and Local Context, Diana says, “Having the freedom to customize the experience a lot more is amazing.”

As a Maps GDE in 2021, Diana plans to continue working on open source tech projects that benefit the greater good, like her recently completed app for people with diabetes, ScoutX, which notifies emergency contacts when a user’s blood glucose values are too high or too low, in case they need immediate help.

She envisions an app that expands connectivity and geolocation tracking for hikers in remote areas, using LoRaWAN technologies that can withstand harsh temperatures and conditions.

“Imagine you go to Yellowstone and get lost, with no GPS signal or phone signal, but there’s a tracking device connected to a LoRaWAN network sending your location,” Diana says. “It’s much easier for rescue services to find you. RAK Wireless is working on providing satellite access, as well, and having precise latitude and longitude makes mapping simple.”

In the future, Diana sees herself managing a team that makes groundbreaking discoveries and puts technologies to use to help other people.

Follow Diana on Twitter at @cotufa82

Check out Diana’s projects on GitHub

For more information on Google Maps Platform, visit our website.

For more information on Google Developer Experts, visit our website.

#IamaGDE: Josue Gutierrez

Posted by Alicja Heisig

#IamaGDE series presents: Google Maps

The Google Developers Experts program is a global network of highly experienced technology experts, influencers, and thought leaders who actively support developers, companies, and tech communities by speaking at events and publishing content.

Meet Josue Gutierrez — Maps, Web, Identity and Angular Google Developer Expert.

Josue currently works at the German company Boehringer Ingelheim and lives near Frankfurt. Before moving to Germany, Josue was working as a software engineer in Mexico, and before that, he spent almost a year in San Francisco as a senior front-end developer at Sutter Health.

Image of Josue Gutierrez

Josue Gutierrez

Josue studied computer science and engineering as an undergraduate and learned algorithms and programming. His first language was C++, and he learned C and Python, but was drawn to web technologies.

“When I saw a web browser for the first time, it stuck with me,” he says. “It was changing in real time as you’re developing. That feeling is really cool. That’s why I went into frontend development.”

Josue has worked on multiple ecommerce projects focused on improving customers’ trade experience. He sees his role as creating something from scratch to help people improve lives.

“These opportunities we have as developers are great — to travel, work for many verticals, and learn many businesses,” he says. “In my previous job, I developed tech-oriented trade tools for research companies, to manipulate strings or formulas. I was on the team involved in writing these kinds of tools, so it was more about the trade experience for doctors.”

Getting involved in the developer community

Josue’s first trip outside Mexico, to San Francisco, exposed him to the many developer communities in the area, and he appreciated the supportive communities of people trying to learn together. Several of the people he met suggested he start his own meetup in Mexico City, to get more involved in Google technologies, so he launched an Angular community there. As he hunted for speakers to come to his Angular meetup, Josue found himself giving talks, too.

Then, the GDG Mexico leader invited Josue to give talks on Google for startups.

“That helped me get involved in the ecosystem,” Josue says. “I met a lot of people, and now many of them are good friends. It’s really exciting because you get connected with people with the same interests as you, and you all learn together.”

“I’m really happy to be part of the Google Maps ecosystem,” Josue says. “It’s super connected, with kind people, and now I know more colleagues in my area, who work for different companies and have different challenges. Seeing how they solve them is a good part of being connected to the product. I try to share my knowledge with other people and exchange points of view.”

Josue says 2020 provided interesting opportunities.

“This year was weird, but we also discovered more tools that are evolving with us, more functionalities in Hangouts and Meetup,” Josue says. “It’s interesting how people are curious to get connected. If I speak from Germany, I get comments from countries like Bolivia and Argentina. We are disconnected but increasing the number of people we engage with.”

He notes that the one missing piece is the face-to-face, spontaneous interactions of in-person workshops, but that there are still positives to video workshops.

“I think as communities, we are always trying to get information to our members, and having videos is also cool for posterity,” he says.

He is starting a Maps developer community in Germany.

“I have colleagues interested in trying to get a community here with a solid foundation,” he says. “We hope we can engage people to get connected in the same place, if all goes well.”

Favorite Maps features and current projects

As a frontend developer, Josue regards Google Maps Platform as an indispensable tool for brands, ecommerce companies, and even trucking companies.

“Once you start learning how to plant coordinates inside a map, how to convert information and utilize it inside a map, it’s easy to implement,” he says.

In 2021, Josue is working on some experiments with Maps, trying to achieve more real-time updates using currently available tools.

“Many of the projects I’ve been working on aren’t connected with ecommerce,” he says. “Many customers want to see products inside a map, like trucking products. I’ve been working in directories, where you can see the places related to categories — like food in Mexico. You can use Google Maps functionalities and extend the diversification of maps and map whatever you want.”

“Submission ID is really cool,” he adds. “You can do it reading the documentation, a key part of the product, with examples, references, and a live demo in the browser.”

Future plans

Josue says his goal going forward is to be as successful as he can at his current role.

“Also, sharing is super important,” he says. “My company encourages developer communities. It’s important to work in a place that matches your interests.”

Image of Josue Gutierrez

Follow Josue on Twitter at @eusoj | Check out Josue’s projects on GitHub.

For more information on Google Maps Platform, visit our website or learn more about our GDE program.

MediaPipe KNIFT: Template-based Feature Matching

Posted by Zhicheng Wang and Genzhi Ye, MediaPipe team

Image Feature Correspondence with KNIFT

In many computer vision applications, a crucial building block is to establish reliable correspondences between different views of an object or scene, forming the foundation for approaches like template matching, image retrieval and structure from motion. Correspondences are usually computed by extracting distinctive view-invariant features such as SIFT or ORB from images. The ability to reliably establish such correspondences enables applications like image stitching to create panoramas or template matching for object recognition in videos (see Figure 1).

Today, we are announcing KNIFT (Keypoint Neural Invariant Feature Transform), a general purpose local feature descriptor similar to SIFT or ORB. Like them, KNIFT is a compact vector representation of local image patches that is invariant to uniform scaling, orientation, and illumination changes. However, unlike SIFT or ORB, which were engineered with heuristics, KNIFT is an embedding learned directly from a large number of corresponding local patches extracted from nearby video frames. This data-driven approach implicitly encodes complex, real-world spatial transformations and lighting changes in the embedding. As a result, the KNIFT feature descriptor appears to be more robust, not only to affine distortions, but to some degree of perspective distortion as well. We are releasing an implementation of KNIFT in MediaPipe and a KNIFT-based template matching demo in the next section to get you started.

Figure 1: Matching a real Stop Sign with a Stop Sign template using KNIFT.

Training Method

In machine learning, loosely speaking, training an embedding means finding a mapping that can translate a high dimensional vector, such as an image patch, to a relatively lower dimensional vector, such as a feature descriptor. Ideally, this mapping should have the following property: image patches around a real-world point should have the same or very similar descriptors across different views or illumination changes. We have found real-world videos to be a good source of such corresponding image patches as training data (see Figures 3 and 4), and we use the well-established Triplet Loss (see Figure 2) to train such an embedding. Each triplet consists of an anchor (denoted by a), a positive (p), and a negative (n) feature vector extracted from the corresponding image patches, and d() denotes the Euclidean distance in the feature space.

Figure 2: Triplet Loss Function.
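As a rough illustration of the loss named in Figure 2, the sketch below computes a margin-based triplet loss for a single triplet in plain Python. The figure itself is not reproduced in this text, so the exact form used here (unsquared Euclidean distance, a hinge at zero, and a margin of 0.2), along with the function names, are assumptions for illustration rather than the KNIFT training code.

```python
import math

def euclidean(u, v):
    """Euclidean distance d(u, v) between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_loss(a, p, n, margin=0.2):
    """Hinge-style triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    The margin value is a placeholder, not a number taken from the post.
    """
    return max(euclidean(a, p) - euclidean(a, n) + margin, 0.0)

# Toy 2-d feature vectors stand in for real descriptors.
anchor = [0.0, 0.0]
positive = [0.0, 0.1]        # d(a, p) = 0.1
far_negative = [3.0, 4.0]    # d(a, n) = 5.0
near_negative = [0.0, 0.2]   # d(a, n) = 0.2

# A negative that is already far away contributes no loss.
easy = triplet_loss(anchor, positive, far_negative)
# A negative within the margin of the positive still produces a penalty.
hard = triplet_loss(anchor, positive, near_negative)
```

Minimizing this quantity pulls the anchor and positive together while pushing the negative at least `margin` farther away, which is the property the embedding is meant to have.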

Training Data

The training triplets are extracted from all ~1500 video clips in the publicly available YouTube UGC Dataset. We first use an existing heuristically-engineered local feature detector to detect keypoints and compute the affine transform between two frames with a high accuracy (see Figure 4). Then we use this correspondence to find keypoint pairs and extract the patches around these keypoints. Note that the newly identified keypoints may include those that were detected but rejected by geometric verification in the first step. For each pair of matched patches, we randomly apply some form of data augmentation (e.g. random rotation or brightness adjustment) to construct the anchor-positive pair. Finally, we randomly pick an arbitrary patch from another video as the negative to finish the construction of this triplet (see Figure 5).

Figure 3: An example video clip from which we extract training triplets.

Figure 4: Finding frame correspondence using existing local features.

Figure 5: (Top to bottom) Anchor, positive and negative patches.

Hard-negative Triplet Mining

To improve model quality, we use the same hard-negative triplet mining method used by FaceNet training. We first train a base model with randomly selected triplets. Then we implement a pipeline that uses the base model to find semi-hard-negative samples (d(a,p) < d(a,n) < d(a,p)+margin) for each anchor-positive pair (Figure 6). After mixing the randomly selected triplets and hard-negative triplets, we re-train the model with this improved data.

Figure 6: (Top to bottom) Anchor, positive and semi-hard negative patches.
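The semi-hard condition d(a,p) < d(a,n) < d(a,p)+margin can be sketched as a simple filter over candidate negatives. This is only an illustration of the inequality: the helper names, the margin of 0.2, and the toy 2-d vectors are assumptions, and the actual mining pipeline (which scores candidates with the trained base model) is not described in enough detail here to reproduce.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def semi_hard_negatives(anchor, positive, candidates, margin=0.2):
    """Keep candidates n that satisfy d(a, p) < d(a, n) < d(a, p) + margin."""
    d_ap = euclidean(anchor, positive)
    return [n for n in candidates
            if d_ap < euclidean(anchor, n) < d_ap + margin]

anchor = [0.0, 0.0]
positive = [0.0, 0.5]   # d(a, p) = 0.5
candidates = [
    [0.0, 0.3],  # closer than the positive: hard, but not semi-hard
    [0.6, 0.0],  # between 0.5 and 0.7: semi-hard, so it is kept
    [0.0, 2.0],  # well beyond the margin: an easy negative
]

selected = semi_hard_negatives(anchor, positive, candidates)
```

Semi-hard negatives are farther from the anchor than the positive but still inside the margin, so they yield a non-zero loss without being closer to the anchor than the positive itself.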

Model Architecture

From model architecture exploration, we have found that a relatively small architecture is sufficient to achieve decent quality, so we use a lightweight version of the Inception architecture as the KNIFT model backbone. The resulting KNIFT descriptor is a 40-dimensional float vector. For more model details, please refer to the KNIFT model card.

Benchmark

We benchmark the KNIFT model inference speed on various devices (computing 200 features) and list them in Table 1.

Table 1: KNIFT performance benchmark.

Quality-wise, we compare the average number of keypoints matched by KNIFT and by ORB (OpenCV implementation) respectively on an in-house benchmark (Table 2). There are many publicly available image matching benchmarks, e.g. 2020 Image Matching Benchmark, but most of them focus on matching landmarks across large perspective changes in relatively high resolution images, and the tasks often require computing thousands of keypoints. In contrast, since we designed KNIFT for matching objects in large-scale (i.e. billions of images) online image retrieval tasks, we devised our benchmark to focus on low-cost, high-precision use cases, i.e. 100-200 keypoints computed per image and only ~10 matching keypoints needed for reliably determining a match. In addition, to illustrate the fine-grained performance characteristics of a feature descriptor, we divide and categorize the benchmark set by object types (e.g. 2D planar surface) and image pair relations (e.g. large size difference). In Table 2, we compare the average number of keypoints matched by KNIFT and by ORB respectively in each category, based on the same 200 keypoint locations detected in each image by the oFast detector that comes with the ORB implementation in OpenCV.

Table 2: KNIFT vs ORB average number of matched keypoints.

From Table 2, we can see that KNIFT consistently matches more keypoints than ORB by a large margin in every category. Here we acknowledge the fact that KNIFT (40-d float) is considerably larger than ORB (32-d char) and this can have an effect on matching quality. Nevertheless, most local feature benchmarks do not take descriptor size into account, so we will follow the convention here.
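To put the descriptor size difference in concrete terms, the short calculation below compares the storage footprint of the two descriptors. It assumes that "float" means a 4-byte single-precision value and "char" a 1-byte value, which the post does not state explicitly, and it uses the 200 keypoints per image from the benchmark setup.

```python
import struct

KNIFT_DIMS = 40          # 40-d float descriptor
ORB_DIMS = 32            # 32-d char descriptor
KEYPOINTS_PER_IMAGE = 200

# Assumption: 4-byte float32 and 1-byte unsigned char elements.
knift_bytes = KNIFT_DIMS * struct.calcsize("<f")   # 160 bytes per descriptor
orb_bytes = ORB_DIMS * struct.calcsize("<B")       # 32 bytes per descriptor

size_ratio = knift_bytes / orb_bytes               # 5.0

knift_per_image = KEYPOINTS_PER_IMAGE * knift_bytes  # 32000 bytes
orb_per_image = KEYPOINTS_PER_IMAGE * orb_bytes      # 6400 bytes
```

Under these assumptions a KNIFT descriptor takes five times the storage of an ORB descriptor, which is the size gap acknowledged above.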

To make it easy for developers to try KNIFT in MediaPipe, we have built a local-feature-based template matching solution (see implementation details using MediaPipe in the next section). As a side effect, we can demonstrate the matching quality between KNIFT and ORB visually in side-by-side comparisons like Figures 7 and 9.

Figure 7: Example of “matching 2D planar surface”. (Left) KNIFT 183/240, (Right) ORB 133/240.

In Figure 7, we choose a typical U.S. Stop Sign image from Google Image Search as the template and attempt to match it with the Stop Sign in this video. This example falls into the “matching 2D planar surface” category in Table 2. Using the same 200 keypoint locations detected by oFast and the same RANSAC setting, we show that KNIFT is successful at matching the Stop Sign in 183 frames out of a total of 240 frames. In comparison, ORB matches 133 frames.

Figure 8: Example of “matching 3D untextured object”. Two template images from different views.

Figure 9: Example of “matching 3D untextured object”. (Left) KNIFT 89/150, (Right) ORB 37/150.

Figure 9 shows another matching performance comparison on an example from the “matching 3D untextured object” category in Table 2. Since this example involves large perspective changes of untextured surfaces, which is known to be challenging for local feature descriptors, we use template images from two different views (shown in Figure 8) to improve the matching performance. Again, using the same keypoint locations and RANSAC setting, we show that KNIFT is successful at matching 89 frames out of a total of 150 frames while ORB matches 37 frames.
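For easier comparison, the frame counts reported for Figures 7 and 9 can be converted into match rates. The snippet below uses only the numbers quoted above; the dictionary layout and names are illustrative.

```python
# (matched frames, total frames) as reported for Figures 7 and 9.
results = {
    "2D planar surface (Figure 7)": {"KNIFT": (183, 240), "ORB": (133, 240)},
    "3D untextured object (Figure 9)": {"KNIFT": (89, 150), "ORB": (37, 150)},
}

def match_rate(matched, total):
    """Fraction of frames in which the template was matched."""
    return matched / total

rates = {
    example: {name: match_rate(*counts) for name, counts in by_descriptor.items()}
    for example, by_descriptor in results.items()
}

for example, by_descriptor in rates.items():
    for name, rate in by_descriptor.items():
        print(f"{example}: {name} matched {rate:.1%} of frames")
```

In these two examples KNIFT matches roughly three quarters of the Stop Sign frames against a little over half for ORB, and the gap widens on the untextured 3D object.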

KNIFT-based Template Matching in MediaPipe

We are releasing the aforementioned template matching solution based on KNIFT in MediaPipe, which is capable of identifying pre-defined image templates and precisely localizing recognized templates on the camera image. There are 3 major components in the template-matching MediaPipe graph shown below:

  • FeatureDetectorCalculator: a calculator that consumes image frames, runs the OpenCV oFast detector on the input image, and outputs keypoint locations. This calculator is also responsible for cropping patches around each keypoint with rotation and scale info and stacking them into a vector for the downstream calculator to process.
  • TfLiteInferenceCalculator with KNIFT model: a calculator that loads the KNIFT tflite model and performs model inference. The input tensor shape is (200, 32, 32, 1), indicating 200 32x32 local patches. The output tensor shape is (200, 40), indicating 200 40-dimensional feature descriptors. By default, the calculator runs the TFLite XNNPACK delegate, but users have the option to select the regular CPU delegate to run at a reduced speed.
  • BoxDetectorCalculator: a calculator that takes pre-computed keypoint locations and KNIFT descriptors and performs feature matching between the current frame and multiple template images. The output of this calculator is a list of TimedBoxProto, which contains the unique id and location of each box as a quadrilateral on the image. Aside from the classic homography RANSAC algorithm, we also apply a perspective transform verification step to ensure that the output quadrilateral does not result in too much skew or a weird shape.

Figure 10: MediaPipe graph of the demo
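The feature matching step between frame descriptors and template descriptors can be pictured with a brute-force nearest-neighbor search. The sketch below is not the BoxDetectorCalculator implementation: the ratio test, the 0.8 threshold, the function names, and the toy 3-d descriptors (standing in for 40-d KNIFT descriptors) are all assumptions chosen to show the general idea, and the RANSAC homography and perspective verification steps are omitted.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def match_descriptors(frame_desc, template_desc, ratio=0.8):
    """Return (frame_index, template_index) pairs that pass a ratio test.

    A frame descriptor is matched to its nearest template descriptor only
    if that distance is clearly smaller than the second-nearest distance.
    """
    matches = []
    for i, f in enumerate(frame_desc):
        ranked = sorted((euclidean(f, t), j) for j, t in enumerate(template_desc))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

template = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame = [
    [0.9, 0.1, 0.0],   # clearly closest to template descriptor 0
    [0.5, 0.5, 0.0],   # equally close to 0 and 1: ambiguous, rejected
    [0.0, 0.1, 0.95],  # clearly closest to template descriptor 2
]

matches = match_descriptors(frame, template)
```

Rejecting ambiguous matches before geometric verification is a common design choice, because a robust estimator such as RANSAC tends to behave better with fewer outlier correspondences.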

Demo

In this demo, we chose three different denominations ($1, $5, $20) of U.S. dollar bills as templates and attempted to match them to various real world dollar bills in videos. We resized each input frame to 640x480 pixels, ran the oFast detector to detect 200 keypoints, and used KNIFT to extract feature descriptors from each 32x32 local image patch surrounding these keypoints. We then performed template matching between these video frames and the KNIFT features extracted from the dollar bill templates. This demo runs at 20 FPS on a Pixel 2 Phone CPU with XNNPACK.

Figure 11: Matching different U.S. dollar bills using KNIFT.
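The patch extraction step in the demo (32x32 patches around keypoints in a 640x480 frame) can be sketched in a simplified form. This version crops axis-aligned patches and clamps them to the frame border; it ignores the rotation and scale information that the FeatureDetectorCalculator uses, and the synthetic frame, keypoint coordinates, and function name are hypothetical.

```python
PATCH = 32
WIDTH, HEIGHT = 640, 480

def crop_patch(frame, cx, cy, size=PATCH):
    """Crop a size x size axis-aligned patch around (cx, cy).

    The window is shifted inward where needed so that it always lies
    fully inside the frame.
    """
    half = size // 2
    x0 = min(max(cx - half, 0), len(frame[0]) - size)
    y0 = min(max(cy - half, 0), len(frame) - size)
    return [row[x0:x0 + size] for row in frame[y0:y0 + size]]

# Synthetic grayscale frame stored as rows of pixel values (0-255).
frame = [[(x + y) % 256 for x in range(WIDTH)] for y in range(HEIGHT)]

# Three hypothetical keypoints: one central, two near the corners.
keypoints = [(320, 240), (5, 5), (639, 479)]
patches = [crop_patch(frame, x, y) for x, y in keypoints]

# With 200 keypoints this stack would have 200 entries of 32x32 pixels.
shape = (len(patches), len(patches[0]), len(patches[0][0]))
```

Stacking the patches this way mirrors the (200, 32, 32, 1) input tensor described for the KNIFT model, apart from the trailing channel dimension.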

Build Your Own Templates

We have provided a set of built-in planar templates in our demo. To make it easy for users to try their own templates, we also provide a tool to build such an index with user generated templates. index_building.pbtxt is a MediaPipe graph that accepts as its input a directory path containing a set of template images. Users can use this graph to compute KNIFT descriptors for all template images (which will be stored in a single file) by 1) replacing the index_proto_filename field in the main graph and the BUILD file and 2) rebuilding the APK file. For step-by-step instructions on how we created the dollar bill demo shown above, please refer to this documentation.

Acknowledgements

We would like to thank Jiuqiang Tang, Chuo-Ling Chang, Dan Gnanapragasam‎, Howard Zhou, Jianing Wei and Ming Guang Yong for contributing to this blog post.