Add Deepgram Streaming Diarization #74
base: main
Conversation
e4b7d44 to 62870c1
EduardoPach left a comment
Looks good, just a few nits and good to merge
| elif "model_timestamps_confirmed" in output and output["model_timestamps_confirmed"]: | ||
| # Fallback to regular transcription without speaker | ||
| for timestamp_group in output["model_timestamps_confirmed"]: | ||
| for word_info in timestamp_group: | ||
| if "word" in word_info: | ||
| words.append( | ||
| Word( | ||
| word=word_info.get("word", ""), | ||
| start=word_info.get("start"), | ||
| end=word_info.get("end"), | ||
| speaker=None, | ||
| ) | ||
| ) |
Setting the speaker to None will likely cause an error downstream. I'd suggest raising an error in this case, since speaker labels are the core information for Orchestration.
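A minimal sketch of that suggestion, assuming the dict shape from the diff above; the helper name, exception type, and message are illustrative, not taken from the PR:

```python
# Sketch of the reviewer's suggestion, not the PR's code: once the diarized
# branch has been ruled out, refuse to emit speaker-less words instead of
# appending Word(..., speaker=None).
def handle_missing_speakers(output: dict) -> None:
    """Called only after the diarized branch did not match (see diff above)."""
    if "model_timestamps_confirmed" in output and output["model_timestamps_confirmed"]:
        raise ValueError(
            "model_timestamps_confirmed arrived without speaker labels; "
            "speaker attribution is required downstream for orchestration."
        )
```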
This PR adds the streaming diarization pipeline used for benchmarking. It currently supports the Deepgram Streaming API, with additional pipelines to be integrated in the future.
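For readers unfamiliar with the Deepgram side, here is a minimal sketch of what a streaming diarization client can look like. It assumes the raw `wss://api.deepgram.com/v1/listen` endpoint with `diarize=true` and the `websockets` package (whose header keyword is `extra_headers` in older releases and `additional_headers` in newer ones); none of the names below are taken from the PR's actual pipeline.

```python
# Sketch only, not the PR's implementation. Assumes the `websockets` package
# and Deepgram's raw streaming endpoint with diarization enabled.
import asyncio
import json
import os

import websockets

DEEPGRAM_URL = (
    "wss://api.deepgram.com/v1/listen"
    "?diarize=true&punctuate=true&encoding=linear16&sample_rate=16000"
)


async def stream_file(path: str) -> None:
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    # `extra_headers` applies to older websockets releases; newer ones use
    # `additional_headers`.
    async with websockets.connect(DEEPGRAM_URL, extra_headers=headers) as ws:

        async def sender() -> None:
            # Feed raw 16 kHz 16-bit mono PCM in small chunks to simulate a live stream.
            with open(path, "rb") as f:
                while chunk := f.read(3200):  # roughly 100 ms of audio
                    await ws.send(chunk)
                    await asyncio.sleep(0.1)
            await ws.send(json.dumps({"type": "CloseStream"}))

        async def receiver() -> None:
            async for message in ws:
                data = json.loads(message)
                alternatives = data.get("channel", {}).get("alternatives", [])
                if not alternatives:
                    continue  # metadata messages carry no transcript
                for w in alternatives[0].get("words", []):
                    # With diarize=true each word carries an integer speaker id.
                    print(w["start"], w["end"], w.get("speaker"), w["word"])

        await asyncio.gather(sender(), receiver())


if __name__ == "__main__":
    asyncio.run(stream_file("sample_16k_mono.raw"))  # hypothetical audio file
```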