Perch-Hoplite embedding database integration #1178

sammlapp · 2025-10-30T20:33:10Z

No description provided.

sammlapp · 2025-10-30T20:38:27Z

Still need to implement training shallow classifiers on hoplite embedding db (now done)

sammlapp · 2025-11-21T02:43:27Z

method for predict_from_hoplite_db()

This is a pretty big refactor aimed at letting classes like the BMZ Perch2/Birdnet/TF models and ONNXModel implement only batch_forward() and otherwise reuse SpectrogramPreprocessor methods like .predict() and .embed(). The strategy is that you implement batch_forward() as appropriate for the class, and it returns a dictionary of outputs. Then predict() and embed() just need to make the dataloader with self.predict_dataloader(), iterate over the dataloader to get batches, call batch_forward() on each batch, and aggregate the results across batches. Some things in the tutorials are likely broken after this refactor; I haven't run, tested, or checked them.
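A minimal sketch of the pattern described above, in plain Python rather than the actual classes in this PR. `BatchModel`, `ToyModel`, `_run`, and the output keys `"logits"` and `"embeddings"` are hypothetical names chosen for illustration; only `batch_forward()`, `predict()`, `embed()`, and `predict_dataloader()` come from the description.

```python
class BatchModel:
    """Hypothetical base class: subclasses only implement batch_forward()."""

    batch_size = 2

    def predict_dataloader(self, samples):
        # Stand-in for a real DataLoader: yield consecutive batches of samples.
        for i in range(0, len(samples), self.batch_size):
            yield samples[i : i + self.batch_size]

    def batch_forward(self, batch):
        # Must return a dict mapping output name -> list with one item per sample.
        raise NotImplementedError

    def _run(self, samples):
        # Iterate batches, call batch_forward(), aggregate outputs across batches.
        aggregated = {}
        for batch in self.predict_dataloader(samples):
            for name, values in self.batch_forward(batch).items():
                aggregated.setdefault(name, []).extend(values)
        return aggregated

    def predict(self, samples):
        return self._run(samples)["logits"]

    def embed(self, samples):
        return self._run(samples)["embeddings"]


class ToyModel(BatchModel):
    """Toy model: the only model-specific code is batch_forward()."""

    def batch_forward(self, batch):
        return {
            "logits": [sum(x) for x in batch],
            "embeddings": [[x[0], x[-1]] for x in batch],
        }


model = ToyModel()
samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
scores = model.predict(samples)
embeddings = model.embed(samples)
```

Here the three samples are split into batches of two and one, and the per-batch dictionaries are concatenated key by key, so a new model type only has to supply `batch_forward()`.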

The DataLoader needs to return lists of AudioSample anyway for other parts of the code to work, so after getting a batch of AudioSamples from a dataloader we can explicitly call collate_audio_samples to get the sample.data and sample.labels of each sample. This reduces complexity and confusion.
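To illustrate the idea, here is a rough sketch where the loader hands back a plain list of sample objects and collation happens inside batch_forward. `Sample`, `identity_collate`, and `collate_samples` are hypothetical stand-ins (not the real AudioSample or collate_audio_samples), and the "model" is just a mean over the data.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for AudioSample with only the fields used here."""

    data: list
    labels: list
    source: str


def identity_collate(samples):
    # Stand-in for a DataLoader collate function that keeps the batch
    # as a list of sample objects instead of stacking it.
    return list(samples)


def collate_samples(samples):
    # Hypothetical analogue of collate_audio_samples: pull out data and labels.
    return [s.data for s in samples], [s.labels for s in samples]


def batch_forward(batch):
    # Collation is delayed until here, so the sample objects (and their
    # metadata, such as the source file) are still available.
    data, labels = collate_samples(batch)
    scores = [sum(x) / len(x) for x in data]  # toy "model": mean of the data
    return {
        "scores": scores,
        "labels": labels,
        "sources": [s.source for s in batch],
    }


batch = identity_collate(
    [Sample([0.0, 1.0], [1], "a.wav"), Sample([2.0, 4.0], [0], "b.wav")]
)
out = batch_forward(batch)
```

Because the batch is still a list of sample objects when it reaches `batch_forward`, per-sample metadata can be carried into the outputs without a second lookup.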

Instead, retain the start time and end time listed in the original input, since this is what is loaded from the audio, and we want the preprocessed/output start and end times to match the input (e.g. to know that the processed sample matches the input). This resolves issues with existing samples not "matching" embedded samples and being re-embedded.

sammlapp added 6 commits August 7, 2025 14:46

implement embedding and querying hoplite db

318116c

Merge branch 'issue_1143_mplclassifier' into feat_hoplite

d419e39

clip embeddings to float16 range before casting for hoplite entry

a5b58ec

This doesn't seem ideal, but we can't add "inf" values to the hoplite db, and we don't want to rescale or normalize because the hoplite db doesn't store any additional scaling metadata.
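For illustration, a dependency-free sketch of clipping to the finite float16 range before casting. This is not the code in this commit; the function names are hypothetical, and with NumPy the same idea would usually be written with `np.clip` and `np.finfo(np.float16).max`. NaN handling is left out.

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite value representable in float16


def clip_to_float16_range(values):
    # Clamp each value into [-FLOAT16_MAX, FLOAT16_MAX] so that nothing
    # overflows or stays infinite when it is stored as float16.
    return [min(max(v, -FLOAT16_MAX), FLOAT16_MAX) for v in values]


def to_float16_bytes(values):
    # Pack as little-endian half-precision floats (struct format "e").
    clipped = clip_to_float16_range(values)
    return struct.pack(f"<{len(clipped)}e", *clipped)


def from_float16_bytes(buf):
    # Each float16 occupies 2 bytes.
    return list(struct.unpack(f"<{len(buf) // 2}e", buf))


embedding = [0.5, -1.25, 1e6, float("-inf")]
restored = from_float16_bytes(to_float16_bytes(embedding))
```

Values already inside the range pass through unchanged (up to float16 precision), while out-of-range and infinite values saturate at ±65504, which is the information loss the note above is uneasy about.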

implement BCELossWeakNegatives (untested)

4aeef79

untested! need to check implementation and see how it works

expose args in MLPClassifier.fit()

cf34588

Merge branch 'develop' into feat_hoplite

c9ed617

sammlapp added 3 commits October 30, 2025 17:51

allow creating/loading db in cnn.embed_to_hoplite_db

8888b9d

hoplite training!

b3227b3

use weaknegativesloss by default for shallow classifier training

dc25338

sammlapp added 19 commits December 1, 2025 10:46

update gitignore

055d666

refactor hoplite integration for hoplite v1.0

1725cab

add missing import

2b77452

Merge branch 'develop' into feat_hoplite

bc75289

remove outdated ref to _invalid_samples

d7a881f

add file_to_datetime arg for hoplite embedding

43f6ff0

efficiency updates for hoplite embedding

01dd39c

Merge branch 'develop' into feat_hoplite

e4c1459

Merge branch 'develop' into feat_hoplite

8e6e1a6

format

9572a2a

fix bug where Audio.noise returns one too few samples

bfda729

add Audio.pad and Audio.pad_to methods

b61f406

updates to match hoplite 1.0.0.dev8 api

4674587

delay batch collation to within batch_forward

6ce65d5

batch_forward is what we want to implement per model anyway, so it is natural to define collation within batch_forward.

switch default out of bounds mode to ignore in Audio loading

00c5d21

switch to step-based training for MLPClassifier

c588f91

improve robustness for matching existing windows

2fb6ba1

Still getting surprising mismatches in floating-point comparisons of offsets, so established a default rounding precision of just 3 decimals.
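A small sketch of why rounding helps when checking whether a window is already in the database. `window_key` and the tuple layout are hypothetical, not the PR's implementation; only the default of 3 decimals comes from the commit message.

```python
def window_key(path, start, end, decimals=3):
    # Round offsets before comparing so tiny floating-point differences
    # do not make an already-embedded window look new.
    return (path, round(start, decimals), round(end, decimals))


# A window stored earlier, with offsets as written in the original input.
existing_exact = {("a.wav", 0.3, 3.3)}
existing_rounded = {window_key("a.wav", 0.3, 3.3)}

# The same window, but with the start offset computed arithmetically.
start = 0.1 + 0.2  # slightly larger than 0.3 in binary floating point
end = start + 3.0

exact_match = ("a.wav", start, end) in existing_exact
rounded_match = window_key("a.wav", start, end) in existing_rounded
```

The exact comparison misses the window while the rounded key finds it. Rounding can still split two nearly equal offsets that fall on either side of a rounding boundary, so this reduces mismatches rather than ruling them out.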
