zamba predict job pipeline
chunk input
Zamba does not scale out on its own. We can, however, reduce turnaround time by splitting the input videos into chunks and submitting one job per chunk.
bash chunk-input.sh \
/data/GROUP/videos \
/data/GROUP/videos-chunked-10 \
10
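For orientation, the chunking step could look roughly like the sketch below. This is not the actual chunk-input.sh: the function name chunk_videos, the chunk-NNNN directory naming, the use of symlinks, and the reading of the third argument as "videos per chunk" are all assumptions made for illustration.

```shell
# Hypothetical sketch, NOT the real chunk-input.sh.
# Assumption: the third argument is the number of videos per chunk.
# Runs in a subshell (parentheses) so its variables do not leak.
chunk_videos() (
    src=$(cd "$1" && pwd) || exit 1   # absolute path, so symlinks stay valid
    dst="$2"
    size="$3"
    i=0
    mkdir -p "$dst"
    # The glob expands in sorted order, so the assignment is reproducible.
    for file in "$src"/*; do
        [ -f "$file" ] || continue    # skip subdirectories and empty globs
        chunk_dir=$(printf '%s/chunk-%04d' "$dst" $(( i / size )))
        mkdir -p "$chunk_dir"
        # Symlink instead of copying, so no video data is duplicated.
        ln -s "$file" "$chunk_dir/"
        i=$(( i + 1 ))
    done
)
```

Under these assumptions, calling chunk_videos /data/GROUP/videos /data/GROUP/videos-chunked-10 10 would create one subdirectory per group of ten videos.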
the pipeline
The zamba predict pipeline consists of two jobs:
- an array job that processes the chunks, one task per chunk
- a follow-up job that aggregates the individual outputs of the array job tasks
The following command will submit the entire pipeline:
bash pipeline.sh /data/GROUP/videos-chunked-10
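The two-job structure can be sketched as follows. This README does not name the scheduler, so the sketch assumes Slurm and its sbatch command; the function name submit_pipeline and the job script names predict-chunk.sh and aggregate.sh are made up for illustration, as is the layout of one subdirectory per chunk. The real pipeline.sh may work differently.

```shell
# Hypothetical sketch, NOT the real pipeline.sh.
# Assumptions: Slurm is the scheduler; predict-chunk.sh and aggregate.sh
# are invented job script names; each chunk is a subdirectory of $1.
submit_pipeline() (
    chunk_root="$1"

    # Count the chunk directories to size the array job.
    n=0
    for d in "$chunk_root"/*/; do
        [ -d "$d" ] || continue
        n=$(( n + 1 ))
    done
    [ "$n" -gt 0 ] || { echo "no chunks in $chunk_root" >&2; exit 1; }

    # --parsable prints the job ID; strip a ";cluster" suffix if present.
    array_id=$(sbatch --parsable --array="0-$(( n - 1 ))" \
        predict-chunk.sh "$chunk_root") || exit 1
    array_id=${array_id%%;*}

    # afterok on an array job ID waits until every task has succeeded.
    sbatch --parsable --dependency="afterok:$array_id" aggregate.sh
)
```

The dependency makes the aggregation start only after all array tasks have finished successfully, so a single submission covers the whole pipeline.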
The pipeline writes its outputs to the following locations:
- logs:
/work/$USER/zamba/logs
- per-chunk CSVs:
/work/$USER/zamba/csvs-DATE
- aggregated CSV:
/work/$USER/zamba/output-DATE.csv
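Merging the per-chunk CSVs into a single file can be sketched as below. The function name aggregate_csvs is hypothetical, and the sketch assumes that every per-chunk CSV starts with the same header row and ends with a newline; the actual aggregation job may do more, such as sorting or validation.

```shell
# Hypothetical sketch, NOT the real aggregation job.
# Assumption: all per-chunk CSVs share one identical header row.
aggregate_csvs() (
    csv_dir="$1"
    out="$2"
    # Keep line 1 (the header) of the first file only;
    # skip line 1 of every later file and print everything else.
    awk 'FNR == 1 && NR != 1 { next } { print }' "$csv_dir"/*.csv > "$out"
)
```

A call such as aggregate_csvs /work/$USER/zamba/csvs-DATE /work/$USER/zamba/output-DATE.csv would then write one CSV with a single header. The output file should live outside the input directory so that it is not picked up by the glob.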