Stunning audio results

HuggingGPT has now added a Gradio demo.

Without further ado, here is an example AudioGPT conversation covering input in three modalities: text, image, and speech. AudioGPT can understand input in each of these modalities, and it can present its results to the user as a combination of text, images, and audio.

Speech

Q1: "Generate a speech with text 'here we go'."
A1: [audio]

Q2: "Transcribe this speech."
A2: Here we go.

Q3: "Separate each speech from the speech mixture."
A3: [audio]

Music

Q4: "Please generate a piece of singing voice. Text sequence is 小酒窝长睫毛 AP 是你最美的记号. Note sequence is C#4/Db4 | F#4/Gb4 | G#4/Ab4 | A#4/Bb4 F#4/Gb4 | F#4/Gb4 C#4/Db4 | C#4/Db4 | rest | C#4/Db4 | A#4/Bb4 | G#4/Ab4 | A#4/Bb4 | G#4/Ab4 | F4 | C#4/Db4. Note duration sequence is 0.407140 | 0.376190 | 0.242180 | 0.509550 0.183420 | 0.315400 0.235020 | 0.361660 | 0.223070 | 0.377270 | 0.340550 | 0.299620 | 0.344510 | 0.283770 | 0.323390 | 0.360340."
A4: [audio]

Sound effects

Q5: "Generate an audio of a piano playing."
A5: [audio]

Q6: "Give me the description of this audio."
A6: The audio is recording of a goat bleating nearby several times.

3D talking head

Q7: "Generate a talking human portrait video."
A7: [video]