Upload an animal video and the automatic Foley machine returns matching sounds!
This project uses YOLOv5 to detect animals in a silent video and then plays matching sounds, fetched via the freesound.org API, whenever the detected animals appear in the video. To create an interactive audio-visual experience, the sounds can be adjusted with knobs that apply effects such as reverb, filtering, and pitch shifting.
The website is served by a Node.js web server, which also temporarily stores the uploaded video on disk. The file can be uploaded via a dropzone. After receiving the file, the web server sends a POST request to the Python server, which runs inference with a pre-trained YOLOv5 object-detection model.
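
A minimal sketch of this upload-and-forward step is shown below. It assumes an Express/Multer setup on Node 18+ (global `fetch`, `FormData` and `Blob`); the route names, port numbers and field names are placeholders, not the project's actual configuration.

```js
// Sketch of the upload route: store the video temporarily, then forward it
// to the Python inference server. Routes, ports and field names are assumptions.
const express = require('express');
const multer = require('multer');
const fs = require('fs');

const app = express();
const upload = multer({ dest: 'uploads/' }); // temporary storage on disk

app.post('/upload', upload.single('video'), async (req, res) => {
  // Forward the stored video file to the Python server.
  const form = new FormData();
  form.append(
    'video',
    new Blob([fs.readFileSync(req.file.path)]),
    req.file.originalname
  );

  const response = await fetch('http://localhost:5000/detect', {
    method: 'POST',
    body: form,
  });

  // Detection results come back as JSON, e.g. [{ frame: 12, label: 'dog' }, ...]
  res.json(await response.json());
});

app.listen(3000);
```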
After processing the individual frames, the Python server sends a JSON object back to the Node.js web server containing all detected animals and the frame numbers in which they appear.
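
The shape of that JSON object might look roughly like the following; the field names are illustrative, not the project's actual schema.

```js
// Illustrative example of the detection result sent back to the web server;
// the exact field names and structure are assumptions.
const detectionResult = [
  { label: 'dog', frame: 12 },
  { label: 'dog', frame: 13 },
  { label: 'cat', frame: 87 },
];

// The web server can derive one search keyword per unique animal:
const keywords = [...new Set(detectionResult.map((d) => d.label))]; // ['dog', 'cat']
```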
With this information, the web server makes multiple requests to the freesound.org API. Each response contains a list of sounds matching the searched keyword, along with links to the corresponding audio files. These links are used to create the required audio elements on the website, which stream the sound directly from the freesound servers.
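
A sketch of one such request, assuming token-based authentication against the freesound APIv2 text search endpoint; the token, the requested fields and the response handling are placeholders (see https://freesound.org/docs/api/ for the exact format).

```js
// Look up a streamable preview URL for one detected animal (e.g. 'dog').
// FREESOUND_TOKEN and the chosen fields are placeholders.
async function findSoundUrl(keyword, token) {
  const url =
    'https://freesound.org/apiv2/search/text/' +
    `?query=${encodeURIComponent(keyword)}` +
    '&fields=id,name,previews' +
    `&token=${token}`;

  const data = await (await fetch(url)).json();
  // Take the first result's MP3 preview, which can be streamed
  // directly from the freesound servers.
  return data.results[0]?.previews['preview-hq-mp3'];
}

// In the browser, each returned URL becomes an audio element:
// const audio = new Audio(previewUrl);
```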
During playback, the audio can then be manipulated with the on-screen controls, which are built on the Web Audio API.
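
A minimal sketch of such an effect chain, assuming each sound is played through an HTMLAudioElement and that `previewUrl` comes from the freesound step above; the parameter values are placeholders and the knob wiring is omitted.

```js
// Route one streamed sound through a filter and a convolver reverb.
const ctx = new AudioContext();
const audioEl = new Audio(previewUrl); // previewUrl from the freesound response
audioEl.crossOrigin = 'anonymous';     // needed to route remote audio through Web Audio

const source = ctx.createMediaElementSource(audioEl);

// Filtering: a low-pass filter whose cutoff would be bound to a knob.
const filter = ctx.createBiquadFilter();
filter.type = 'lowpass';
filter.frequency.value = 8000;

// Reverb: a convolver fed with a short generated noise impulse response.
const convolver = ctx.createConvolver();
const length = Math.floor(ctx.sampleRate * 1.5);
const impulse = ctx.createBuffer(2, length, ctx.sampleRate);
for (let ch = 0; ch < impulse.numberOfChannels; ch++) {
  const data = impulse.getChannelData(ch);
  for (let i = 0; i < length; i++) {
    data[i] = (Math.random() * 2 - 1) * Math.pow(1 - i / length, 3); // decaying noise
  }
}
convolver.buffer = impulse;

source.connect(filter).connect(convolver).connect(ctx.destination);

// Pitch: approximated here by changing the element's playback rate,
// which shifts pitch and speed together.
audioEl.playbackRate = 1.25;
audioEl.play();
```

In practice the convolver is usually blended with the dry signal through two GainNodes (a wet/dry mix) rather than placed fully in series, and a dedicated pitch shifter would keep pitch and playback speed independent.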