You are right, not many people are speaking, even though it is such a busy community according to the download numbers. I must tell you that I was using your OneSql model, but Microsoft Phi-4 seems to do just fine. It gives me good results, at least by my own judgment, so I don't switch to OneSql very often to try it out.
There are no fixes for a model, but there is fine-tuning, and you are right: free software can be taken by anybody and used however they wish.
I think SQL responses could be integrated better into the workflow. For example, they could be built directly into the command-line tool psql.
Here are some examples of what I did today.
RCD Notes: Daily Average Spoken Pages
\pset border 2
\pset linestyle unicode
WITH word_count AS (
    SELECT
        speech_datecreated::date AS date,
        SUM((LENGTH(speech_text) - LENGTH(REPLACE(speech_text, ' ', '')) + 1)) AS total_words
    FROM
        public.speech
    GROUP BY
        speech_datecreated::date
)
SELECT
    date,
    total_words,
    ROUND((total_words::numeric / 500), 2) AS average_spoken_pages
FROM
    word_count
ORDER BY
    date;
Output
┌────────────┬─────────────┬──────────────────────┐
│    date    │ total_words │ average_spoken_pages │
├────────────┼─────────────┼──────────────────────┤
│ 2025-03-05 │        2011 │                 4.02 │
│ 2025-03-06 │        2391 │                 4.78 │
│ 2025-03-07 │         834 │                 1.67 │
│ 2025-03-08 │         286 │                 0.57 │
│ 2025-03-09 │        1142 │                 2.28 │
│ 2025-03-10 │        1680 │                 3.36 │
│ 2025-03-11 │         526 │                 1.05 │
│ 2025-03-12 │         814 │                 1.63 │
│ 2025-03-13 │        1453 │                 2.91 │
│ 2025-03-16 │         291 │                 0.58 │
│ 2025-03-17 │         969 │                 1.94 │
│ 2025-03-18 │         713 │                 1.43 │
│ 2025-03-19 │        1891 │                 3.78 │
│ 2025-03-20 │        1564 │                 3.13 │
│ 2025-03-21 │        1363 │                 2.73 │
│ 2025-03-22 │        3214 │                 6.43 │
│ 2025-03-23 │        5671 │                11.34 │
│ 2025-03-24 │        2005 │                 4.01 │
│ 2025-03-25 │        3357 │                 6.71 │
│ 2025-03-26 │        2165 │                 4.33 │
│ 2025-03-27 │        9052 │                18.10 │
│ 2025-03-28 │       12408 │                24.82 │
└────────────┴─────────────┴──────────────────────┘
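The arithmetic behind that query can be checked outside the database. The following is only a minimal bash sketch of the same idea (words counted as spaces plus one, a page assumed to be 500 words); the function names `count_words` and `spoken_pages` are hypothetical and not part of any existing script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mirror the SQL arithmetic above.

# Words = number of spaces + 1, the same formula as
# LENGTH(text) - LENGTH(REPLACE(text, ' ', '')) + 1.
count_words() {
    local text=$1
    local stripped=${text// /}          # remove every space
    echo $(( ${#text} - ${#stripped} + 1 ))
}

# Pages = words / 500, printed with two decimals.
spoken_pages() {
    awk -v w="$1" 'BEGIN { printf "%.2f\n", w / 500 }'
}

count_words "Aria, please stop recording"   # three spaces, so 4 words
spoken_pages 2011                           # first row of the table: 4.02
```

Note that this formula counts every space, so doubled spaces inflate the count and an empty text still counts as one word. For rough daily statistics that is usually acceptable.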
RCD Notes: Speech size per day
\pset border 2
\pset linestyle unicode
SELECT
    speech_datecreated::date AS date,
    SUM(length(speech_text)) AS total_text_size_in_chars,
    ROUND(SUM(length(speech_text))::numeric / 1024, 2) AS total_text_size_in_kb
FROM
    public.speech
GROUP BY
    speech_datecreated::date
ORDER BY
    speech_datecreated::date;
Output
┌────────────┬──────────────────────────┬───────────────────────┐
│    date    │ total_text_size_in_chars │ total_text_size_in_kb │
├────────────┼──────────────────────────┼───────────────────────┤
│ 2025-03-05 │                    10103 │                  9.87 │
│ 2025-03-06 │                    12568 │                 12.27 │
│ 2025-03-07 │                     4355 │                  4.25 │
│ 2025-03-08 │                     1474 │                  1.44 │
│ 2025-03-09 │                     5847 │                  5.71 │
│ 2025-03-10 │                     8917 │                  8.71 │
│ 2025-03-11 │                     2811 │                  2.75 │
│ 2025-03-12 │                     4117 │                  4.02 │
│ 2025-03-13 │                     7455 │                  7.28 │
│ 2025-03-16 │                     1554 │                  1.52 │
│ 2025-03-17 │                     5036 │                  4.92 │
│ 2025-03-18 │                     3917 │                  3.83 │
│ 2025-03-19 │                     9420 │                  9.20 │
│ 2025-03-20 │                     8183 │                  7.99 │
│ 2025-03-21 │                     6819 │                  6.66 │
│ 2025-03-22 │                    16401 │                 16.02 │
│ 2025-03-23 │                    28514 │                 27.85 │
│ 2025-03-24 │                     9844 │                  9.61 │
│ 2025-03-25 │                    16094 │                 15.72 │
│ 2025-03-26 │                    10238 │                 10.00 │
│ 2025-03-27 │                    44336 │                 43.30 │
│ 2025-03-28 │                    63773 │                 62.28 │
└────────────┴──────────────────────────┴───────────────────────┘
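The KB column is simply the character count divided by 1024. As a small sketch in bash (the helper name `chars_to_kb` is hypothetical), the conversion looks like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a character count to KB (count / 1024),
# printed with two decimals like the total_text_size_in_kb column.
chars_to_kb() {
    awk -v c="$1" 'BEGIN { printf "%.2f\n", c / 1024 }'
}

chars_to_kb 10103   # first row of the table: 9.87
```

Keep in mind that PostgreSQL's `length()` on text counts characters, not bytes, so the figure only equals the size in kilobytes when the text is plain ASCII. For the byte size, `octet_length()` would be the function to use.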
Shell snippet to check spoken commands by embeddings
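# Read the transcript from the temporary file.
transcript=$(<"${temp_file}")

# Generate an embedding for the transcript.
embedding=$(printf '%s\n' "$transcript" | rcd-llm-get-embeddings.sh)
if [ -z "$embedding" ]; then
    log "ERROR" "Failed to generate embedding"
    exit 1
fi

# Dollar-quote the embedding so it can be passed as an SQL string literal.
escaped_embedding="\$VECTOR\$${embedding}\$VECTOR\$"

# Find the closest stored speech of type 13 within distance 0.3.
# <=> is the cosine distance operator if speech_embeddings is a pgvector column.
speech_id=$(psql -t <<EOF
SELECT speech_id
  FROM speech
 WHERE speech_embeddings IS NOT NULL
   AND speech_speechtypes = 13
   AND speech_embeddings <=> ${escaped_embedding}::vector < 0.3
 ORDER BY speech_embeddings <=> ${escaped_embedding}::vector
 LIMIT 1
EOF
)

# Strip whitespace from the psql output and accept only a numeric ID.
speech_id=$(printf '%s' "$speech_id" | tr -d '[:space:]')
if [[ -n "$speech_id" && "$speech_id" =~ ^[0-9]+$ ]]; then
    log "INFO" "Match found (speech_id: $speech_id). Aborting."
    exit 1
fi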
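To make the 0.3 threshold more concrete: assuming `speech_embeddings` is a pgvector column, `<=>` returns the cosine distance, which is 1 minus the cosine similarity of the two vectors. The sketch below computes the same quantity in bash with awk on toy vectors. The function names `cosine_distance` and `is_match` are hypothetical, and real embeddings have hundreds of dimensions rather than two.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating cosine distance on toy vectors.

# cosine_distance "1,1" "1,0" prints 1 - (a.b / (|a| * |b|)).
cosine_distance() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ","); m = split(b, y, ",")
        if (n != m || n == 0) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
        for (i = 1; i <= n; i++) {
            dot += x[i] * y[i]      # dot product
            na  += x[i] * x[i]      # squared norm of a
            nb  += y[i] * y[i]      # squared norm of b
        }
        if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
        printf "%.4f\n", 1 - dot / (sqrt(na) * sqrt(nb))
    }'
}

# Succeeds when the distance is below the threshold (default 0.3).
is_match() {
    awk -v d="$1" -v t="${2:-0.3}" 'BEGIN { exit !(d < t) }'
}

d=$(cosine_distance "1,1" "1,0")    # vectors 45 degrees apart
if is_match "$d"; then
    echo "match (distance $d)"
fi
```

Identical directions give a distance of 0 and orthogonal vectors give 1, so a threshold of 0.3 accepts only transcripts whose embedding points in nearly the same direction as a stored command.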
Use case
When speaking, I sometimes say something incoherent and then wish to stop the process. In my case I just say "Aria, please stop recording", and even if I said other things along with it, the command is recognized and the speech transcription stops.
To understand visually how speech and transcription work, see this video:
https://www.youtube.com/watch?v=51jEUtjrARo
I serve many people in an executive role, so you might consider that SQL snippets like these are beneficial to people worldwide. Thank you once again!