Hyperopt sampling#2371

Open
tgiani wants to merge 27 commits into master from sampling

Conversation

@tgiani
Contributor

@tgiani tgiani commented Sep 23, 2025

this PR should enable n3fit to produce a fit using different hyperparameters for each replica, taking as input the results of a hyperopt run. I think we need to

  1. write a script implementing the sampling of the hyperopt trials. This should produce a file containing the hyperparameter settings for each replica
  2. enable n3fit to read the settings for each replica from this file
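The two steps above can be sketched as follows. This is a toy illustration, not the actual script: the trial layout (a list of dicts with a `loss` key) and the function name `sample_trials` are assumptions, not the n3fit schema.

```python
import json
import random

def sample_trials(trials, n_replicas, n_best=10, thermalization=0, seed=0):
    """Keep the n_best lowest-loss trials (after dropping the first
    `thermalization` trials of the scan) and assign one of them to each
    replica at random. Trial layout here is illustrative."""
    rng = random.Random(seed)
    best = sorted(trials[thermalization:], key=lambda t: t["loss"])[:n_best]
    # one (possibly repeated) best trial per replica, keyed by replica index
    return {rep: rng.choice(best) for rep in range(1, n_replicas + 1)}

# toy scan: 10 trials with decreasing loss
fake_trials = [{"loss": float(10 - i), "learning_rate": 1e-3 * (i + 1)} for i in range(10)]
per_replica = sample_trials(fake_trials, n_replicas=4, n_best=3, thermalization=2)
# this per-replica mapping is what the file read back by n3fit would contain
print(json.dumps(per_replica, indent=2))
```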

@tgiani tgiani changed the title reading different hyperparameters for each replica Hyperopt sampling Sep 23, 2025
```python
# different samples
else:
    with open(params['hyperopt_res'], 'r') as file:
        hyperopt_params = json.load(file)
```
Member

Suggested change

```diff
-        hyperopt_params = json.load(file)
+else:
+    hyperopt_params = [params]
```

@tgiani
Contributor Author

tgiani commented Feb 26, 2026

This is a possible way to select and use the best trials: one should add to the runcard

```yaml
trial_specs:
  hyperscan: 260204-jcm-hyperopt
  thermalization: 400
  number_of_trials: 10
```

which will download the full 260204-jcm-hyperopt hyperscan and select the 10 best trials, after dropping the first 400 for thermalization.
Using this info, the production rule produce_trials will create a dictionary containing the settings of the best trials, which will then be used in the fit. The results discussed so far for the nnpdf4.1 test have been produced using thermalization: 400 and number_of_trials: 10, which should then be used in the baseline runcard.

If trial_specs is not given in the runcard, the fit will use the settings specified in the runcard under parameters.

@tgiani tgiani marked this pull request as ready for review February 26, 2026 12:19
@scarlehoff
Member

I like this approach a lot. I had a quick look and seems fine to me. I'll try to have a deeper look later and then we can merge.

@scarlehoff
Member

We should add a test, at least to the regressions, for this. Same runcard, two replicas. Something like that.

And perhaps it would be wise to append the hyperparameters of the fit to the .json file at the end, so we always know the parameters of each replica.

```python
n_best = trial_specs['number_of_trials']
best = hyperopt_dataframe[n_termalization:].sort_values('loss')[:n_best].to_dict(orient='list')
best['number_of_trials'] = n_best
return best
```
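The selection above can be exercised on a toy dataframe. The columns besides `loss` are illustrative, not the actual hyperopt scan schema:

```python
import pandas as pd

# toy stand-in for the hyperopt dataframe: one row per trial
hyperopt_dataframe = pd.DataFrame(
    {
        "loss": [3.2, 1.1, 2.5, 0.9, 4.0, 1.7],
        "learning_rate": [1e-2, 1e-3, 5e-3, 1e-3, 2e-2, 3e-3],
    }
)

n_thermalization = 2  # drop the first trials of the scan
n_best = 2            # keep only the lowest-loss trials

# positional slice drops the thermalization trials, then sort by loss
best = (
    hyperopt_dataframe[n_thermalization:]
    .sort_values("loss")[:n_best]
    .to_dict(orient="list")
)
best["number_of_trials"] = n_best
# best now holds column-wise lists for the two lowest-loss surviving trials
```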
Member

Since this is part of the config, it should use the loader (or the fallbackloader)

I'm unsure whether this should be here or in the validphys config though.

The other problem I see is that this is not seen by setupfit so if you send many jobs in parallel all of them will try to download the same thing to the same place, which can lead to a very bad crash (just happened to me in the cluster 😅)
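One standard way to guard against that race (a sketch only, not the validphys loader machinery; `fetch` is an assumed callable that writes the target file) is an atomic lock file: the first job to create the lock performs the download, and the others wait for the result to appear.

```python
import os
import time

def download_once(target_path, fetch, poll=0.1):
    """Ensure only one of several parallel jobs downloads `target_path`.

    Hypothetical helper: os.O_CREAT | os.O_EXCL makes lock creation
    atomic, so exactly one process wins and calls `fetch`; the rest
    poll until the downloaded file shows up."""
    lock = target_path + ".lock"
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # another job holds the lock; wait for the result instead
        while not os.path.exists(target_path):
            time.sleep(poll)
        return target_path
    try:
        if not os.path.exists(target_path):
            fetch(target_path)
    finally:
        os.close(fd)
        os.remove(lock)
    return target_path
```

Calling it twice with the same path runs the download only once; the second call sees the existing file and returns immediately.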

Contributor Author

@scarlehoff does the last commit solve this issue?

Contributor Author

@tgiani tgiani Mar 4, 2026

Another, maybe better, option could be to do everything in the vp config: create the best trials there and save a json file with them in the table folder or something. n3fit would then read it, just like we do with stuff like the thcovmat, I guess. But how do I access the output folders from the config to save the json file...?

Member

I guess since you went for the option of downloading a scan, I think it is fine not to save the json file (since the fit would be in the server anyway).
But then you need to make sure the parameters are in the json of each replica.
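A minimal sketch of recording the hyperparameters in each replica's json, assuming a hypothetical `fit_info.json` inside each replica folder (the filename, the `hyperparameters` key, and the `nnfit/replica_<n>` layout are illustrative, not the actual n3fit output schema):

```python
import json
from pathlib import Path

def record_replica_params(fit_folder, replica_index, hyperparams):
    """Write the hyperparameters actually used by a replica into that
    replica's json output, so the settings of every replica are always
    recoverable from the fit folder itself."""
    replica_dir = Path(fit_folder) / "nnfit" / f"replica_{replica_index}"
    replica_dir.mkdir(parents=True, exist_ok=True)
    json_file = replica_dir / "fit_info.json"  # illustrative filename
    # merge into the existing per-replica json rather than overwrite it
    info = json.loads(json_file.read_text()) if json_file.exists() else {}
    info["hyperparameters"] = hyperparams
    json_file.write_text(json.dumps(info, indent=2))
    return json_file
```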

@tgiani
Contributor Author

tgiani commented Mar 3, 2026

@scarlehoff thank you, I will have a go at your comments this afternoon

@tgiani tgiani added the redo-regressions Recompute the regression data label Mar 5, 2026
