Description
import soundfile as sf
from voxcpm import VoxCPM

model = VoxCPM.from_pretrained("./pretrained_weights/")

# Non-streaming
wav = model.generate(
    text="VoxCPM is an innovative end-to-end TTS model from ModelBest, designed to generate highly expressive speech.",
    prompt_wav_path="/root/andy/overall_process/VoxCPM_TTS/example.wav",  # optional: path to a prompt speech for voice cloning
    prompt_text="123",  # optional: reference text
    cfg_value=2.0,  # LM guidance on LocDiT; higher values adhere more closely to the prompt but may degrade quality
    inference_timesteps=10,  # LocDiT inference timesteps; higher gives better results, lower runs faster
    normalize=False,  # enable the external TN tool (disables native raw text support)
    denoise=False,  # enable the external denoise tool (may cause distortion and restricts the sampling rate to 16 kHz)
    retry_badcase=True,  # enable retry mode for some bad cases (unstoppable)
    retry_badcase_max_times=3,  # maximum number of retries
    retry_badcase_ratio_threshold=6.0,  # maximum length ratio for bad-case detection (simple but effective); can be adjusted for slow-paced speech
)
sf.write("output.wav", wav, model.tts_model.sample_rate)
print("saved: output.wav")
After adding the reference audio file, every run ends in a segmentation fault:
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-12-21 21:53:57.579946: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/iic/speech_zipenhancer_ans_multiloss_16k_base
2025-12-21 21:54:02,662 - modelscope - INFO - initiate model from /root/.cache/modelscope/hub/models/iic/speech_zipenhancer_ans_multiloss_16k_base
2025-12-21 21:54:02,663 - modelscope - INFO - initiate model from location /root/.cache/modelscope/hub/models/iic/speech_zipenhancer_ans_multiloss_16k_base.
2025-12-21 21:54:02,664 - modelscope - INFO - initialize model from /root/.cache/modelscope/hub/models/iic/speech_zipenhancer_ans_multiloss_16k_base
2025-12-21 21:54:03,360 - modelscope - WARNING - No preprocessor field found in cfg.
2025-12-21 21:54:03,360 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2025-12-21 21:54:03,360 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/root/.cache/modelscope/hub/models/iic/speech_zipenhancer_ans_multiloss_16k_base'}. trying to build by task and model information.
2025-12-21 21:54:03,360 - modelscope - INFO - No preprocessor key ('speech_zipenhancer_ans_multiloss_16k_base', 'acoustic-noise-suppression') found in PREPROCESSOR_MAP, skip building preprocessor. If the pipeline runs normally, please ignore this log.
Warm up VoxCPMModel...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:18<00:00, 1.87s/it]
Segmentation fault
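
The log above stops right after warm-up without a Python traceback, so it does not show which call crashed. One way to narrow this down might be Python's standard `faulthandler` module, which dumps the Python stack of each thread when the process receives a fatal signal such as SIGSEGV. The following is only a diagnostic sketch: the helper name `enable_crash_traces` is hypothetical, and it makes no assumption about where in VoxCPM or its dependencies the fault occurs.

```python
import faulthandler
import sys


def enable_crash_traces(stream=sys.stderr):
    """Ask the interpreter to dump every thread's Python stack on a fatal signal.

    `stream` must be a real file object (it needs a file descriptor) and has to
    stay open for as long as the handler is installed.
    Returns True once the fault handler is active.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Usage (hypothetical): call this at the very top of the reproduction script,
# before VoxCPM.from_pretrained(...) and model.generate(...):
#
#     enable_crash_traces()
```

Running the unmodified script with `python -X faulthandler` should have the same effect. The resulting stack only points at the last Python frame before the crash; the fault itself may sit in native code underneath it.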