Sylber: Syllabic Embedding Representation of Speech from Raw Audio


Anonymous team


Resynthesis Results

Sample Number | Ground Truth | HuBERT: 50, 100, 200 | SD-HuBERT: 5K, 10K, 20K, $\infty$ | Sylber: 5K, 10K, 20K, $\infty$

1

2

3

4

5

6

7

8

9

10

11

Articulatory Interpolation Simulation Used in the Categorical Perception Experiment

Rhyming Word Pair $\alpha = 0.0$ $\alpha = 0.1$ $\alpha = 0.2$ $\alpha = 0.3$ $\alpha = 0.4$ $\alpha = 0.5$ $\alpha = 0.6$ $\alpha = 0.7$ $\alpha = 0.8$ $\alpha = 0.9$ $\alpha = 1.0$

down, town

zip, sip

ball, mall

lest, rest

thin, thing
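Each row above is a continuum of 11 stimuli, with $\alpha$ stepping from 0.0 to 1.0 in increments of 0.1. This page does not spell out the interpolation scheme. The following is a minimal sketch under three assumptions: $\alpha$ is a linear mixing weight, the two words' articulatory trajectories are already time-aligned arrays of shape (frames, channels), and $\alpha = 0.0$ corresponds to the first word of the pair. The names `alpha_grid`, `interpolate_trajectories`, and `continuum` are hypothetical and are not part of any released code.

```python
import numpy as np


def alpha_grid(step: float = 0.1) -> np.ndarray:
    """Return the weights 0.0, 0.1, ..., 1.0 that match the table columns."""
    n = int(round(1.0 / step))
    return np.linspace(0.0, 1.0, n + 1)


def interpolate_trajectories(traj_a: np.ndarray, traj_b: np.ndarray, alpha: float) -> np.ndarray:
    """Linearly blend two trajectories: (1 - alpha) * A + alpha * B.

    ASSUMPTION: both inputs are time-aligned with identical (frames, channels)
    shapes. The actual experiment may use a different scheme.
    """
    if traj_a.shape != traj_b.shape:
        raise ValueError("trajectories must be time-aligned with identical shapes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * traj_a + alpha * traj_b


def continuum(traj_a: np.ndarray, traj_b: np.ndarray, step: float = 0.1) -> list:
    """Build the full stimulus continuum, one blended trajectory per alpha."""
    return [interpolate_trajectories(traj_a, traj_b, a) for a in alpha_grid(step)]


if __name__ == "__main__":
    # Toy stand-ins for two words' articulatory features (hypothetical data).
    word_a = np.zeros((100, 12))
    word_b = np.ones((100, 12))
    stimuli = continuum(word_a, word_b)
    print(len(stimuli), stimuli[5].mean())
```

In this sketch each blended trajectory would then be passed to an articulatory synthesizer to produce audio. The synthesis step is left out here because the page does not identify the synthesizer.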