We can modify the /predict endpoint so that when a video comes in, it is also converted to audio and text, as the user requested. This would involve thinking about the front-end UI first: what is the most intuitive way to present these options to the investigator before tackling the back-end? This issue deals with the front-end.
In the back-end, this would be done by adding a clause to the prediction pipeline along the lines of:
    if file type is video and audio models are selected:
        1. convert the file to audio
        2. run predictions on the audio models
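The clause above could be sketched as follows. This is a minimal illustration only: the names convert_video_to_audio, the model dicts with a "type" key, and the stub bodies are assumptions, not the project's actual API (only create_new_prediction comes from the existing code, and its real signature may differ).

```python
def convert_video_to_audio(video_path: str) -> str:
    """Stub: extract the audio track and return the new file's path (assumed helper)."""
    return video_path.rsplit(".", 1)[0] + ".wav"

def create_new_prediction(file_path: str, model: dict) -> dict:
    """Stub standing in for the existing prediction call; real signature may differ."""
    return {"model": model["name"], "input": file_path}

def predict(file_path: str, file_type: str, selected_models: list) -> list:
    """Run predictions, converting video to audio when audio models are selected."""
    results = []
    audio_models = [m for m in selected_models if m["type"] == "audio"]
    # New clause: video input plus selected audio models -> convert, then predict.
    if file_type == "video" and audio_models:
        audio_path = convert_video_to_audio(file_path)
        results += [create_new_prediction(audio_path, m) for m in audio_models]
    # Video models still run on the original file as before.
    video_models = [m for m in selected_models if m["type"] == "video"]
    results += [create_new_prediction(file_path, m) for m in video_models]
    return results
```

Note that the sketch runs video models on the original file in the same pass, which covers the case where the user selects both model types.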
Things to think about:
- For step 2, the same create_new_prediction function can be called.
- Would recommend abstracting this clause into a helper function outside of create_new_prediction, for readability.
- There is a model_type parameter in the create_new_prediction function. Is this still necessary?
- The user could select both video and audio models; in that case, the video prediction should still run as well.
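The helper-function abstraction recommended above could look something like the sketch below. All names here (convert_video_to_audio, run_model, predict_audio_models_on_video) are hypothetical stand-ins for illustration, not the project's real functions.

```python
def convert_video_to_audio(video_path: str) -> str:
    """Stub conversion: in practice this would extract the audio track."""
    return video_path.rsplit(".", 1)[0] + ".wav"

def run_model(file_path: str, model_name: str) -> dict:
    """Stub prediction call standing in for the existing pipeline."""
    return {"model": model_name, "input": file_path}

def predict_audio_models_on_video(video_path: str, audio_model_names: list) -> list:
    """Isolates the video-to-audio clause so create_new_prediction stays readable."""
    audio_path = convert_video_to_audio(video_path)
    return [run_model(audio_path, name) for name in audio_model_names]
```

Keeping the conversion logic in its own function also makes it straightforward to unit-test the video-to-audio branch independently of the rest of the pipeline.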