The Best Way to Use FFmpeg
FFmpeg is one of the most powerful tools out there, but it can be daunting to use — until now.
Introduction
As I was working on a previous fun little demo of mine that used ffmpeg, I was reminded why, despite how powerful a tool it is, I always find myself wincing at the thought of having to use it. It seems to be a universally shared experience among developers: once in a blue moon you find yourself needing to convert a video to a GIF, or downsample the resolution of an image, or perform any of the other tasks that ffmpeg can handle. You already have it installed from the last time you needed it, but yet again you've unlearned everything about it that you learned last time. So inevitably you start Googling again and see those purple links serving as a faint reminder of all you've forgotten.
You don't have to take just my word for it either. In a recent video, YouTuber ThePrimeagen made similar comments regarding the use of ffmpeg. Now, with the advent of AI and LLM tools, it has become much easier to use command-line tools, and ffmpeg is certainly one of the biggest beneficiaries of this. Still, the experience remains far from perfect. Now your new ffmpeg workflow might look like:
It seemed to me that there should be a more integrated solution. iTerm2's new LLM integration features (shown off in the above linked video) are a step in the right direction, but ffmpeg provides a certain amount of feedback on failure which shouldn't have to be manually fed back to the LLM and iterated on by hand.
So, let's go ahead and fix that! I'll show you how to make an interactive multimedia editor powered by AI and ffmpeg that handles those aforementioned concerns for you.
You can give this a star on GitHub here: https://github.com/acalejos/CinEx
Here's a demo of the final product in case you're curious:
Install Dependencies
Let's start by discussing the dependencies we'll use here. There are two main dependencies and two less important ones. I'll go ahead and describe each in detail:
- `Kino` - This is the library that provides the capacity to make interactive experiences in `Livebook`, which is the platform I'll be using. You can read more about `Livebook` here.
- `Instructor` - Coerces responses from LLMs into JSON, where we can also provide a schema and a set of validation functions that the responses must conform to. This makes it easy to use responses from LLMs within a data pipeline.
  - You will need to provide a configuration to tell `Instructor` which LLM you are using by supplying an `adapter`. For this, I will be using the `OpenAI` adapter and an environment variable to store my API key. Do note that although the code refers to the environment variable as `LB_OPENAI_TOKEN`, Livebook itself will actually prepend `LB_` to your created tokens, so when you make a token in the left sidebar you only need to call it `OPENAI_TOKEN`.
- `erlexec` - Provides a more powerful way (over the standard library's `System` module) to run executables from Elixir (I should note that it is actually an Erlang library, but you can call Erlang directly from Elixir). This is how we will capture `stdout` and `stderr` separately (whereas `System.cmd` at best will merge `stderr` into `stdout`).
- `Exterval` - Supports writing real-valued intervals using a `~i` sigil. These intervals implement the `Enumerable` protocol, and we will use them for a validation with `Instructor`. You could easily drop this library altogether, but I wrote it and know that it's just a single file, so I'm comfortable leaving it in.
Mix.install(
[
{:kino, "~> 0.12.3"},
{:instructor, "~> 0.0.5"},
{:erlexec, "~> 2.0"},
{:exterval, "~> 0.2.0"}
],
config: [
instructor: [
adapter: Instructor.Adapters.OpenAI,
openai: [api_key: System.fetch_env!("LB_OPENAI_TOKEN")]
]
]
)
Upload Struct
First, we'll start by making the `Upload` module, which is in charge of operations related to the uploaded / generated media.
This has helper functions and guards to determine allowed file types, and includes the function to turn an `Upload` into a renderable `Kino`.
defmodule Upload do
defstruct [:filename, :path]
@video_types [:mp4, :ogg, :avi, :wmv, :mov]
@audio_types [:wav, :mp3, :mpeg]
@image_types [:jpeg, :jpg, :png, :gif, :svg, :pixel]
defguard is_audio(ext) when ext in @audio_types
defguard is_video(ext) when ext in @video_types
defguard is_image(ext) when ext in @image_types
defguard is_valid_upload(ext) when is_audio(ext) or is_video(ext) or is_image(ext)
def accepted_types, do: @audio_types ++ @video_types ++ @image_types
defp to_existing_atom(str) do
try do
{:ok, String.to_existing_atom(str)}
rescue
_ in ArgumentError ->
{:error, "#{inspect(str)} is not an existing atom"}
_e ->
        {:error, "Unknown Error occurred in `String.to_existing_atom/1`"}
end
end
def ext_type(filename) do
with <<"."::utf8, rest::binary>> <- Path.extname(filename),
{:ok, ext} <- to_existing_atom(rest) do
ext
end
end
  def to_kino(%__MODULE__{path: path}) do
    content = File.read!(path)
case ext_type(path) do
ext when is_audio(ext) ->
Kino.Audio.new(content, ext)
ext when is_video(ext) ->
Kino.Video.new(content, ext)
ext when is_image(ext) ->
Kino.Image.new(content, ext)
end
end
def new(filename, path) do
%__MODULE__{filename: filename, path: path}
end
def generate_temp_filename(extension \\ "mp4") do
random_string = :crypto.strong_rand_bytes(8) |> Base.encode16()
temp_dir = System.tmp_dir!()
Path.join(temp_dir, "temp_#{random_string}.#{extension}")
end
end
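As a quick sanity check, here's how these helpers behave (a sketch, assuming the `Upload` module above is compiled):

```elixir
import Upload

# Path.extname/1 yields ".mp4", and :mp4 already exists as an atom thanks to
# the module attributes above, so this returns :mp4
ext = Upload.ext_type("clip.mp4")

# defguard-defined guards are also usable as plain boolean expressions
true = is_video(ext)

# Random output path under the system temp directory, e.g. ".../temp_<hex>.gif"
path = Upload.generate_temp_filename("gif")
true = String.ends_with?(path, ".gif")
```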
Setup State Management
Next, we'll set up some simple state-management Agents.
We will make agents to track the form state for the UI and the history of videos, so we can undo, reset, and track previous prompts.
We need to track input values using the `FormState` Agent since we are not using a `Kino.Control.Form`, which means `Kino.Input.read/1` will not work for our use case of repeatedly reading changing input states. So instead we just listen for change events and store the values as state.
The `EditHistory` Agent is just a simple queue to track history, which we essentially use as a stack. We store 2-tuples of `{upload :: %Upload{}, prompt :: String.t()}`, but currently only the uploads are actually used downstream. The original, unmodified media is the first element in the queue, and its prompt is `nil` since no prompt was used to generate it.
defmodule FormState do
use Agent
def start_link(_init) do
Agent.start_link(fn -> %{prompt: "", retries: 2, debug: false, explain_outputs: true} end,
name: __MODULE__
)
end
def update(key, value) do
Agent.update(__MODULE__, fn state -> Map.put(state, key, value) end)
end
def get(key) do
Agent.get(__MODULE__, fn state -> Map.get(state, key) end)
end
end
defmodule EditHistory do
use Agent
def start_link(_init) do
Agent.start_link(fn -> :queue.new() end, name: __MODULE__)
end
def push(%Upload{} = upload, prompt \\ nil) do
Agent.update(__MODULE__, fn history ->
:queue.snoc(history, {upload, prompt})
end)
end
def undo_edit do
Agent.get_and_update(__MODULE__, fn history ->
popped = :queue.liat(history)
{:queue.last(popped), popped}
end)
end
def current do
Agent.get(__MODULE__, fn history ->
:queue.last(history)
end)
end
def original do
Agent.get(__MODULE__, fn history ->
:queue.head(history)
end)
end
  def previous_edit do
    Agent.get(__MODULE__, fn history ->
      popped = :queue.liat(history)

      if :queue.is_empty(popped), do: nil, else: :queue.last(popped)
    end)
  end
def reset do
Agent.get_and_update(__MODULE__, fn history ->
original = :queue.head(history)
{original, :queue.from_list([original])}
end)
end
end
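Once started, the two agents are used like this (illustrative only; in the notebook they are started under Kino's supervisor in the next cell rather than by hand):

```elixir
{:ok, _} = FormState.start_link(nil)
{:ok, _} = EditHistory.start_link(nil)

# Form state is a simple key/value store updated by change events
FormState.update(:prompt, "trim the first 5 seconds")
"trim the first 5 seconds" = FormState.get(:prompt)

# The first push is the original media; its prompt defaults to nil
EditHistory.push(%Upload{filename: "in.mp4", path: "/tmp/in.mp4"})
{%Upload{filename: "in.mp4"}, nil} = EditHistory.current()
```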
Now we add the agents to our supervision tree using `Kino.start_child!/1`, so that they are supervised in a way that ties their state to the evaluation state of the notebook. `Kino`'s supervision tree is special in that way, since it's meant to work within Livebook.
Enum.each([EditHistory, FormState], &Kino.start_child!/1)
Setup Boilerplate
This module is just in charge of storing templates that we will use to render logs (with levels) or the program output (from `stdout` and `stderr`). This uses `EEx`, a templating library that is part of Elixir's standard library.
Every cell in a `Livebook` notebook is rendered within its own `iframe`, so you really have a ton of flexibility in what you can output.
defmodule Boilerplate do
def placeholder,
do: """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Video Preview Placeholder with Spinner</title>
<style>
.video-preview-placeholder {
width: 100%;
max-width: 640px;
height: 0;
padding-bottom: 56.25%; /* 16:9 aspect ratio */
border: 2px dashed #ccc;
display: flex;
align-items: center;
justify-content: center;
background-color: #f9f9f9;
color: #666;
font-size: 20px;
text-align: center;
position: relative;
box-sizing: border-box;
margin: auto;
}
.spinner-container {
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
position: absolute;
top: 50%;
left: 50%;
transform: translate(-50%, -50%);
}
.spinner {
border: 4px solid #f3f3f3;
border-top: 4px solid #3498db;
border-radius: 50%;
width: 40px;
height: 40px;
animation: spin 2s linear infinite;
margin-bottom: 10px;
}
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
.message {
font-size: 16px;
color: #666;
}
</style>
</head>
<body>
<div class="video-preview-placeholder">
<div class="spinner-container">
<%= if show_spinner do %>
<div class="spinner"></div>
<% end %>
<div class="message"><%= message %></div>
</div>
</div>
</body>
</html>
"""
def log_template,
do: """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Log Level Message Box</title>
<style>
.message-box {
width: 100%;
border: 2px solid;
padding: 20px;
box-sizing: border-box;
margin: 20px 0;
border-radius: 5px;
font-size: 18px;
text-align: left;
}
.message-box.error {
border-color: #f44336;
background-color: #fdecea;
color: #f44336;
}
.message-box.success {
border-color: #4caf50;
background-color: #e8f5e9;
color: #4caf50;
}
.message-box.info {
border-color: #2196f3;
background-color: #e3f2fd;
color: #2196f3;
}
</style>
</head>
<body>
<div class="message-box <%= level %>">
<%= message %>
</div>
</body>
</html>
"""
def stdout_template,
do: """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title><%= device %></title>
<style>
body {
background-color: #1e1e1e;
color: #c5c8c6;
font-family: "Courier New", Courier, monospace;
margin: 0;
padding: 20px 20px 20px 5px;
}
.container {
border: 1px solid #444;
border-radius: 5px;
overflow: hidden;
}
.header {
background-color: #444;
color: #c5c8c6;
padding: 10px;
font-weight: bold;
text-transform: uppercase;
}
.output {
background-color: #1d1f21;
border-left: 4px solid <%= border_color %>;
padding: 12px 12px 12px 5px;
font-size: 16px;
color: #c5c8c6;
white-space: pre-wrap;
word-break: break-all;
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<%= device %>
</div>
<div class="output">
<%= output %>
</div>
</div>
</body>
</html>
"""
def make_stdout(output, device, border_color \\ "gray") do
Kino.HTML.new(EEx.eval_string(stdout_template(), binding()))
end
def make_log(message, level) do
Kino.HTML.new(EEx.eval_string(log_template(), binding()))
end
end
Setup Form
Now we set up the form using built-in `Kino`s. A `Frame` in `Kino` is really just a placeholder where we can render future `Kino`s. So here we set up corresponding frames for each thing we want to display over the lifespan of the application. If we don't want something to display at the start, we don't render anything into its frame. You will notice the pattern here is to create an empty frame and then create a corresponding `Kino` widget, which will be rendered into its frame at some point during the lifespan of the application.
There are a few operations on frames you should know about, since you'll see them used throughout this code:

- `Kino.Frame.render/2` - Replaces all of the contents of the frame with the new content you specify.
- `Kino.Frame.append/2` - Appends the given content to the end of the existing frame. You will notice this is how things such as the logs and output are rendered.
- `Kino.Frame.clear/1` - Clears out the content of the frame. You will mostly see this called either at the beginning or end of event listeners throughout this code.
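The three operations look like this in practice (an illustrative snippet, not part of the app below):

```elixir
frame = Kino.Frame.new()

# render/2 replaces everything currently in the frame
Kino.Frame.render(frame, Kino.Markdown.new("**current state**"))

# append/2 adds below whatever is already rendered
Kino.Frame.append(frame, Kino.Markdown.new("a log line"))

# clear/1 empties the frame again
Kino.Frame.clear(frame)
```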
original = Kino.Frame.new()
prompt = Kino.Input.textarea("Prompt")
upload = Kino.Input.file("Upload", accept: Upload.accepted_types())
errors = Kino.Frame.new(placeholder: false)
submit_button = Kino.Control.button("Run!")
submit_frame = Kino.Frame.new(placeholder: false)
undo_frame = Kino.Frame.new(placeholder: false)
reset_frame = Kino.Frame.new(placeholder: false)
undo_button = Kino.Control.button("Undo")
reset_button = Kino.Control.button("Reset")
output = Kino.Frame.new(placeholder: false)
logs = Kino.Frame.new(placeholder: false)
debug_checkbox = Kino.Input.checkbox("Verbose Mode")
debug_frame = Kino.Frame.new(placeholder: false)
explain_checkbox = Kino.Input.checkbox("Explain Outputs", default: true)
explain_frame = Kino.Frame.new(placeholder: false)
retries = Kino.Input.number("# Retries", default: 2)
Kino.Frame.render(
original,
Kino.HTML.new(
EEx.eval_string(Boilerplate.placeholder(),
message: "Upload Media to Get Started",
show_spinner: false
)
)
)
inputs = Kino.Layout.grid([prompt, retries], columns: 2, gap: 10)
buttons =
Kino.Layout.grid([submit_frame, undo_frame, reset_frame, explain_frame, debug_frame],
columns: 7,
gap: 1
)
Kino.Layout.grid([original, inputs, upload, buttons, output, logs])
FFMPEG Instructions
Now we will implement the two modules that will be used to interact with the LLM using `Instructor`.

Remember how I mentioned that oftentimes `ffmpeg` will return large chunks of text as output, and it can be a bit difficult to parse through and interpret which part of it is relevant to what you want? Well, this first module, which I've appropriately named `Alfred`, is in charge of helping you interpret those results (if you so choose).

`Alfred` will be called to help explain the contents of `stdout` and `stderr` whenever the resulting command (either `ffmpeg` or `ffprobe`) writes to them. It will pass along the relevant context, including the original task that you asked to be done, as well as the command that was called that produced those outputs.

`Alfred` also provides you with a confidence metric, which tells you how confident it is in the provided explanation. It ranges from `0` to `10` in increments of `0.5`, but of course you can tune that as you wish.

`Alfred` will only be called upon if the `explain_outputs` form toggle is enabled, AND if the resulting command actually wrote to either `stdout` or `stderr`. If it wrote to both, it will receive both and give an explanation that incorporates all of the information.
The main components needed here for `Instructor` are the `embedded_schema`, which defines the structure that must be returned; the `validate_changeset` function, which defines additional validations that will be performed on the resulting structured response; and the set of prompts (`Instructor` even uses the `@doc` field defined for the `embedded_schema`).
defmodule Alfred do
use Ecto.Schema
use Instructor.Validator
import Ecto.Changeset
import Exterval
@confidence_interval ~i<[0,10]//0.5>
@system_prompt """
You are the companion Agent to another Agent whose job is to produce execve-styled arguments
for programs given a specific prompt. Your job is to interpret and explain the output
of the command after it has been run. You will be given the prompt / task that originally
generated the command, then you will be given the command that was run, along with the
output that was generated. You do not need to re-explain what the task was or regurgitate
what the command was. You only need to explain what the output means within the context
of the task. If the task / prompt was a question, you should determine whether the provided
output directly answers the question and if it does not you should answer it based on the
output. If the output is not relevant to the prompt this should also be noted.
You will also provide a confidence score about how confident you are about the above explanation.
The confidence score is separate from the explanation.
"""
@primary_key false
@doc """
## Field Descriptions:
- explanation: Explanation of the output given the context of the task and command that was run
- confidence: Rating from 0 to 10 in increments of 0.5 of how confident you are in your answer,
with higher scores being more confident.
"""
embedded_schema do
field(:explanation, :string)
field(:confidence, :float)
end
@impl true
def validate_changeset(changeset) do
changeset
|> validate_inclusion(:confidence, @confidence_interval)
end
def execute(prompt, command, retries, outputs \\ [stdout: nil, stderr: nil]) do
Instructor.chat_completion(
model: "gpt-4o",
response_model: __MODULE__,
max_retries: retries,
messages:
[
%{
role: "system",
content: @system_prompt
},
%{
role: "user",
content: """
Here's the prompt that generated the command: #{inspect(prompt)}
"""
},
%{
role: "user",
content: """
Here's the command: #{inspect(command)}
"""
},
Keyword.get(outputs, :stdout) &&
%{
role: "user",
content: """
stdout: #{inspect(Keyword.fetch!(outputs, :stdout))}
"""
},
Keyword.get(outputs, :stderr) &&
%{
role: "user",
content: """
stderr: #{inspect(Keyword.fetch!(outputs, :stderr))}
"""
}
]
|> Enum.filter(& &1)
)
end
end
Now we define the module that is the star of the show. The `AutoFfmpeg` agent will receive a task (prompt) from the user, as well as the input type (e.g. `mp4`, `mp3`, `png`, etc.), and has to decide three things:

1. What program to use to accomplish the given task (between `ffmpeg` and `ffprobe`)
2. The set of arguments, formatted as a list of strings, to use to accomplish the task
3. The output file type, given the list of allowed types (or `null` if the task doesn't write to a new file)

You will also notice the `field(:output_path, :string, virtual: true)`, which is a field the LLM is not required to output, but which sets aside a slot for us to use later on. We will use this to store information once the successful response is generated.
Most of the important work here actually happens within the `validate_changeset` callback. This callback is invoked by `Instructor` after a response is received and coerced into the schema. Any errors found during validation are stored in the changeset and are used to retry the request to the LLM, providing the error to steer the LLM towards our desired result. This is the feedback loop I mentioned above that is normally lacking.

Whereas most implementations of `validate_changeset` would use functions provided by `Ecto.Changeset`, we need to implement a very custom validation, since our validation includes actually trying to run the generated `ffmpeg`/`ffprobe` command.

Here is a diagram of how the calls to the LLM using `Instructor` work with the validation function.
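The retry flow can also be sketched in code. This is a self-contained illustration of the feedback loop, not Instructor's actual implementation; `call_llm` and `validate` are hypothetical stand-ins for the chat completion and `validate_changeset/2`:

```elixir
defmodule RetryLoop do
  # On each failure, the validation error is appended to the conversation so
  # the model can self-correct on the next attempt.
  def run(messages, call_llm, validate, retries_left) do
    response = call_llm.(messages)

    case validate.(response) do
      :ok ->
        {:ok, response}

      {:error, reason} when retries_left > 0 ->
        (messages ++ [%{role: :user, content: "Validation failed: #{reason}"}])
        |> run(call_llm, validate, retries_left - 1)

      {:error, _} = error ->
        error
    end
  end
end
```

With `AutoFfmpeg`, the `validate` step is where the generated command is actually executed, so a failed ffmpeg run becomes the error message fed back to the model.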
Hopefully now you can see the advantages of this approach over just using ChatGPT or manually performing this workflow. The self-correcting capability gained from using `Instructor` dramatically increases the speed at which you can go from prompt to desired result.
defmodule AutoFfmpeg do
use Ecto.Schema
use Instructor.Validator
@system_prompt """
You are a multimedia editor and your job is to receive tasks for multimedia editing and use
the programs available to you (and only those) to complete the tasks. You will return arguments
to be passed to the program assuming that the input file(s) has already been passed. You do not need to
call the binary itself, you are only in charge of generating all subsequent
arguments after inputs have been passed. Assume the output file path will be appended
after the arguments you provide.
You have access to the following programs: ffmpeg and ffprobe
So assume the command already is composed of something like
`ffmpeg -i input_file_path [..., args, ...] output_file_path` and you then pass arguments
to complete the given task. You will also be provided the input file for context, but you
should not include inputs in your arguments. Use the given file extension to determine how
to form your arguments. You will also provide the output file
extension / file type, since depending on the task it could differ from the input type. If the
given task does not result in an operation that writes to a file, (eg. asking for timestamps
where it is silent would result in writing to stdout), the extension would be `null`.
If the command is such that it will output to stdout, you should output as JSON when
possible.
"""
@doc """
## Field Descriptions:
- program: the executable program to call
- arguments: execve-formatted arguments for the command
- output_ext: The extension (filetype) of the outputted file
"""
@primary_key false
embedded_schema do
field(:program, Ecto.Enum, values: [:ffmpeg, :ffprobe])
field(:arguments, {:array, :string})
field(:output_ext, Ecto.Enum,
values: [
:mp4,
:ogg,
:avi,
:wmv,
:mov,
:wav,
:mp3,
:mpeg,
:jpeg,
:jpg,
:png,
:gif,
:svg,
:pixel,
:null
]
)
field(:output_path, :string, virtual: true)
end
@impl true
def validate_changeset(
changeset,
%{
upload_path: upload_path,
debug: debug,
debug_frame: debug_frame,
output_frame: output_frame,
prompt: prompt,
explain: explain,
retries: retries
}
) do
program = Ecto.Changeset.get_field(changeset, :program)
program_args = Ecto.Changeset.get_field(changeset, :arguments)
input_args = ["-i", upload_path]
output_ext = Ecto.Changeset.get_field(changeset, :output_ext)
output_args =
cond do
program == :ffprobe ->
[]
output_ext == :null ->
["-f", "null", "-"]
true ->
[Upload.generate_temp_filename(Atom.to_string(output_ext))]
end
command =
Enum.join([Atom.to_string(program) | input_args ++ program_args ++ output_args], " ")
if debug do
message = """
<strong>Prompt:</strong> <em>#{prompt}</em><br><br>
<strong>Command:</strong> <code>#{command}</code>
"""
Kino.Frame.append(
debug_frame,
Boilerplate.make_log(
message,
:info
)
)
end
case :exec.run(command, [
:sync,
:stdout,
:stderr
]) do
{:ok, result} when is_list(result) ->
outputs =
[:stdout, :stderr]
|> Enum.map(fn device ->
if Keyword.has_key?(result, device) do
output = Enum.join(Keyword.fetch!(result, device), "")
Kino.Frame.append(output_frame, Boilerplate.make_stdout(output, device))
{device, output}
else
{device, nil}
end
end)
if explain do
case Alfred.execute(prompt, command, retries, outputs) do
{:ok, %Alfred{explanation: explanation, confidence: confidence}} ->
Kino.Frame.append(
output_frame,
Boilerplate.make_stdout(
"<strong>Explanation:</strong> #{explanation}\n\n<strong>Confidence:</strong> #{confidence}",
:alfred,
"green"
)
)
{:error,
%Ecto.Changeset{
errors: [
explanation: {error, _extras}
],
valid?: false
}} ->
Kino.Frame.append(
debug_frame,
Boilerplate.make_log("Trouble providing explanation: #{inspect(error)}", :error)
)
end
end
if program == :ffmpeg && output_ext != :null do
[output_path] = output_args
Ecto.Changeset.put_change(changeset, :output_path, output_path)
else
changeset
end
{:error, result} when is_list(result) ->
debug &&
Kino.Frame.append(
debug_frame,
Boilerplate.make_log("Something Went Wrong! Retrying...", :error)
)
error =
cond do
Keyword.has_key?(result, :stderr) ->
Keyword.fetch!(result, :stderr) |> Enum.join("")
Keyword.has_key?(result, :stdout) ->
Keyword.fetch!(result, :stdout) |> Enum.join("")
Keyword.has_key?(result, :exit_signal) ->
"Error resulted in exit code #{Keyword.fetch!(result, :exit_signal)}"
true ->
"Unexpected error occurred!"
end
Ecto.Changeset.add_error(
changeset,
:arguments,
error,
status: Keyword.get(result, :exit_status)
)
end
end
def execute(prompt, %{upload_path: upload_path} = context, retries) do
Instructor.chat_completion(
model: "gpt-4o",
validation_context: Map.put(context, :prompt, prompt) |> Map.put(:retries, retries),
response_model: __MODULE__,
max_retries: retries,
messages: [
%{
role: "system",
content: @system_prompt
},
%{
role: "user",
content: """
Here's the editing task: #{inspect(prompt)}
"""
},
%{
role: "user",
content: """
Here's input file type: #{inspect(Upload.ext_type(upload_path))}
"""
}
]
)
end
end
Listeners
The last thing to do is to set up the actual application lifecycle. We will simply pass all possible input `Kino`s into a `Kino.Control.tagged_stream`, which lets us listen for events and match them according to which input emitted them. Then we perform all operations for each event. Let's break down how we handle each one:
- `:explain` - The `explain_outputs` toggle was changed, so we just update the state of that form field.
- `:retries` - The `retries` number input was changed, so we just update the state of that form field.
- `:debug` - The `debug_checkbox` toggle was changed, so we just update the state of that form field.
- `:prompt` - The `prompt` textarea input was changed, so we just update the state of that form field.
- `:upload` - Accepts a file upload, verifies that its extension / file type is supported, pushes it to the history state, and renders the media into the `original` frame, which is just the frame showing the current media.
- `:submit` - Gets the prompt from the state and passes it to `AutoFfmpeg` to get the commands from the LLM to complete the task. Checks the return value (which is the result of all retries upon failure), and on success pushes the new output to the history state (if there is a new output, since some tasks only write to `stdout`/`stderr`) and renders it. On failure, renders an error to the `logs` frame.
- `:reset` - Sets the history state to only the head of the history, which is the original video, and renders it. This effectively undoes any edits that were applied.
- `:undo` - Pops the most recent entry of the history, reverting to the previous version of the media, and renders it.
import Upload
[
upload: upload,
submit: submit_button,
reset: reset_button,
undo: undo_button,
prompt: prompt,
debug: debug_checkbox,
retries: retries,
explain: explain_checkbox
]
|> Kino.Control.tagged_stream()
|> Kino.listen(fn
{:explain, %{type: :change, value: value}} ->
FormState.update(:explain_outputs, value)
{:retries, %{type: :change, value: value}} ->
FormState.update(:retries, value)
{:debug, %{type: :change, value: value}} ->
FormState.update(:debug, value)
{:prompt, %{type: :change, value: prompt}} ->
FormState.update(:prompt, prompt)
{:upload,
%{
type: :change,
value: %{
file_ref: file_ref,
client_name: filename
}
}} ->
Kino.Frame.clear(logs)
Kino.Frame.clear(output)
ext_type = Upload.ext_type(filename)
unless is_valid_upload(ext_type) do
Kino.Frame.render(
logs,
Boilerplate.make_log(
"File must be of one of the following types: #{inspect(Upload.accepted_types())}",
:error
)
)
else
file_path =
file_ref
|> Kino.Input.file_path()
tmp_path = Upload.generate_temp_filename(ext_type)
_bytes_copied = File.copy!(file_path, tmp_path)
upload = Upload.new(filename, tmp_path)
Upload.to_kino(upload) |> then(&Kino.Frame.render(original, &1))
EditHistory.push(upload)
Kino.Frame.render(debug_frame, debug_checkbox)
Kino.Frame.render(explain_frame, explain_checkbox)
Kino.Frame.render(submit_frame, submit_button)
Kino.Frame.clear(undo_frame)
Kino.Frame.clear(reset_frame)
end
{:submit, %{type: :click}} ->
Kino.Frame.clear(logs)
Kino.Frame.clear(output)
prompt = FormState.get(:prompt) |> String.trim()
if prompt == "" do
Kino.Frame.append(logs, Boilerplate.make_log("Prompt cannot be empty!", :error))
else
Kino.Frame.render(
original,
Kino.HTML.new(
EEx.eval_string(Boilerplate.placeholder(), message: "Working...", show_spinner: true)
)
)
{%Upload{} =
current_upload, _old_prompt} = EditHistory.current()
num_retries = FormState.get(:retries)
case AutoFfmpeg.execute(
prompt,
%{
upload_path: current_upload.path,
debug: FormState.get(:debug),
debug_frame: logs,
output_frame: output,
explain: FormState.get(:explain_outputs)
},
num_retries
) do
{:ok, %AutoFfmpeg{output_path: output_path}} ->
FormState.get(:debug) &&
Kino.Frame.append(logs, Boilerplate.make_log("Success!", :success))
unless is_nil(output_path) do
new_upload = Upload.new(current_upload.filename, output_path)
EditHistory.push(new_upload, prompt)
Upload.to_kino(new_upload) |> then(&Kino.Frame.render(original, &1))
else
Upload.to_kino(current_upload) |> then(&Kino.Frame.render(original, &1))
end
Kino.Frame.render(undo_frame, undo_button)
Kino.Frame.render(reset_frame, reset_button)
{:error,
%Ecto.Changeset{
changes: %{
arguments: _arguments,
output_ext: _output_ext
},
errors: [
arguments: {error, [status: _status]}
],
valid?: false
}} ->
Upload.to_kino(current_upload) |> then(&Kino.Frame.render(original, &1))
Kino.Frame.append(
logs,
Boilerplate.make_log("Failed after #{num_retries} attempts!", :error)
)
Kino.Frame.append(logs, Boilerplate.make_log(error, :error))
{:error, <<"LLM Adapter Error: ", error::binary>>} ->
Upload.to_kino(current_upload) |> then(&Kino.Frame.render(original, &1))
{error, _binding} = error |> Code.eval_string()
Kino.Frame.append(
logs,
Boilerplate.make_log("Error! Reference the error below for details", :error)
)
Kino.Frame.append(logs, Kino.Tree.new(error))
{:error, <<"Invalid JSON returned from LLM: ", error::binary>>} ->
Upload.to_kino(current_upload) |> then(&Kino.Frame.render(original, &1))
Kino.Frame.append(logs, Boilerplate.make_log(error, :error))
end
end
{:reset, %{type: :click}} ->
{%Upload{} = original_upload, nil} = EditHistory.reset()
Upload.to_kino(original_upload) |> then(&Kino.Frame.render(original, &1))
Kino.Frame.clear(logs)
Kino.Frame.clear(output)
Kino.Frame.clear(reset_frame)
Kino.Frame.clear(undo_frame)
{:undo, %{type: :click}} ->
Kino.Frame.clear(logs)
Kino.Frame.clear(output)
case EditHistory.undo_edit() do
nil ->
Kino.Frame.append(logs, Kino.Text.new("Error! Cannot `Undo`. No previous edit."))
{%Upload{} = previous_upload, _previous_prompt} ->
Upload.to_kino(previous_upload) |> then(&Kino.Frame.render(original, &1))
Kino.Frame.clear(logs)
if EditHistory.previous_edit() == nil do
Kino.Frame.clear(reset_frame)
Kino.Frame.clear(undo_frame)
end
end
end)
And that's all there is to it! Now you can either interact with the application after manually running all cells in the Livebook, or you can deploy it as an application (from the left sidebar) and run it that way. If you choose to deploy it, make sure to check the "Only render rich outputs" box in the deployment configuration to make it more like a standalone application.
Conclusion
Now you have a working AI-powered `ffmpeg` tool to quickly iterate over `ffmpeg` commands using natural language. I've found myself using this quite a lot now, and with `debug` mode turned on you can see the generated `ffmpeg` commands and learn a bit while you're at it.
I want to draw special attention to how much tools like Livebook and `Kino` allow you to deploy usable applications extremely quickly. They take care of many of the concerns you might not want to focus on when deploying an initial version of an application, or when just iterating on an idea.
You could certainly turn this into a full-fledged application if you wanted to, but at least for a non-web-developer like myself, these tools let me create good, user-friendly tools at a fast pace.
You could realistically get about 85% of this solution using only the `Instructor` and `Kino` libraries, with `erlexec` adding a bit by separating `stdout` from `stderr`.
This still has several shortcomings and thus will not be the best fit for all tasks. For those simple tasks mentioned at the top of this post, though, I think it can be a great tool in your kit.
Of course, you can also tune the prompts, the chosen models (right now it's using OpenAI's `gpt-4o`, but some benchmarks show `gpt4-preview` to outperform `gpt-4o` on coding tasks -- although this might not classify as a coding task), or how you compose the arguments.
Also, as it stands right now, this can only handle one input and one output at a time, but it could be altered to handle multiple of each. As of the time of this writing, `Kino.Input.file` only allows one upload at a time, which is the main reason only one input is supported. The main reason only one output is supported is that I wanted this to automatically render the output, and didn't want to worry about how to display multiple.
If you want to see these features, feel free to request them as Issues on the repo or make PRs!
That's all I have for now. If you enjoyed this article, please consider subscribing or follow me to see more!