Skip to content

Conversation

@pgierz
Copy link
Member

@pgierz pgierz commented Feb 16, 2025

Using the Artificial Intelligence to rewrite some of our more horrible insides. Part 1: The Calendar.

This pull request includes several changes aimed at improving the codebase and adding new functionality. The most important changes involve updating dependencies, refactoring code for better readability, and adding new tests.

Rewrite:

  • esm_calendar.py: Completely rewritten by CoPilot.

Dependency updates:

  • setup.py: Added the deprecated library to the dependencies.

Code refactoring:

Functionality improvements:

  • src/esm_runscripts/batch_system.py, src/esm_runscripts/oasis.py, src/esm_runscripts/prepare.py, src/esm_runscripts/prev_run.py, src/esm_runscripts/resubmit.py, src/esm_runscripts/yac.py: Updated the format method calls to use print_hours, print_minutes, and print_seconds instead of givenph, givenpm, and givenps for better clarity. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

New tests:

@pgierz
Copy link
Member Author

pgierz commented Feb 16, 2025

This is as much a proof of concept as anything else, but I put esm_calendar through the AI to see if it could handle refactoring in a way that might be useful.

First the plus points:

  • The resulting code ismuch cleaner than it would have been by hand
  • Automatically generated unit tests cover the most important behaviour

Minus points:

  • Some fixes "by hand" still need to be applied
  • The resulting differences are, obviously, very large. It becomes difficult to guarantee the same behaviour before and after this refactoring, which is (I guess) a lesson in technical debt and that we should have had unit tests before.

I would like to try similar AI refactors of some of the other modules, if we find this to be useful.

@pgierz
Copy link
Member Author

pgierz commented Feb 16, 2025

It's almost there, something is still wrong with the delta dates. I need to restore the old behaviour and write some target unit tests to see how it should look.

@pgierz
Copy link
Member Author

pgierz commented Feb 16, 2025

OK, the old behaviour doesn't seem to make sense to me at all:

>>> d1 = Date("2024-02-16T12:30:00")
>>> d2 = Date("2024-02-16T11:30:00")
>>> d1 - d2
[0, 0, 0, 0, 0, 0]
>>> d1
Date(2024-02-16T12:30:00)
>>> d1.hour
12
>>> d1.year
2024
>>> d1.time_between(d2)
0
>>> d1.time_between(d2, "days")
0
>>> d1 = Date("2001-01-01T00:00:00")
>>> d1
Date(2001-01-01T00:00:00)
>>> d2 = Date("2000-01-01T00:00:00")
>>> d1 - d2
[1, 12, 366, 8784, 527040, 31622400]
>>> d1.time_between(d2)
31622400

Subtraction gives back 1 year in all possible combinations? How is that sensible?

@pgierz
Copy link
Member Author

pgierz commented Feb 16, 2025

Even worse:

>>> d1 = Date("2024-02-16T12:30:00")
>>> d2 = Date("2024-02-16T11:30:00")
>>> d1.hour
12
>>> d2.hour
11
>>> d1 - d2
[0, 0, 0, 0, 0, 0]

So dates that are obviously 1 hour apart are calculated to be 0 apart.

@pgierz
Copy link
Member Author

pgierz commented Feb 16, 2025

In the new implementation, this makes much more sense:

python
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from esm_calendar import Date
>>> d1 = Date("2024-02-16T12:30:00")
>>> d2 = Date("2024-02-16T11:30:00")
>>> d1 - d2
[0, 0, 0, 1, 0, 0]
>>> d1.time_between(d2)
3600
>>> d1.time_between(d2, unit="hours")
1
>>> d1.time_between(d2, unit="days")
0
>>> d1.time_between(d2, unit="years")
0
>>> d1.time_between(d2, unit="seconds")
3600
>>> d1 = Date("2001-01-01T00:00:00")
>>> d2 = Date("2000-01-01T00:00:00")
>>> d1 - d2
[1, 0, 0, 0, 0, 0]

@pgierz
Copy link
Member Author

pgierz commented Feb 16, 2025

This seems to be by design:

def sub_date(self, other):
d2 = copy.deepcopy(
[self.year, self.month, self.day, self.hour, self.minute, self.second]
)
d1 = copy.deepcopy(
[other.year, other.month, other.day, other.hour, other.minute, other.second]
)
diff = [0, 0, 0, 0, 0, 0]
while d1[1] > 1:
diff[1] -= 1
d1[1] -= 1
diff[2] -= self._calendar.day_in_month(d1[0], d1[1])
while d2[1] > 1:
diff[1] += 1
d2[1] -= 1
diff[2] += self._calendar.day_in_month(d2[0], d2[1])
while d1[2] > 1:
diff[2] -= 1
d1[2] -= 1
while d2[2] > 1:
diff[2] += 1
d2[2] -= 1
if diff[1] < 0:
diff[0] = diff[0] - 1
while d2[0] > d1[0]:
diff[0] += 1
diff[1] += 12
diff[2] += self._calendar.day_in_year(d1[0])
d1[0] += 1
diff[3] += diff[2] * 24
if diff[3] < 0:
diff[3] += 24
diff[4] += diff[3] * 60
if diff[4] < 0:
diff[4] += 60
diff[5] += diff[4] * 60
if diff[5] < 0:
diff[5] += 60
return diff
.

Since that is the case, maybe the refactored version also should take this into account, however, I think it would make more sense to name this something more intuitive, or provide a specific DeltaESMDate, similar to how Python's built in timedelta works.

@pgierz
Copy link
Member Author

pgierz commented Feb 17, 2025

Differences in esm-tests are now only due to a different repr: esm-calendar object ... vs proper class names Calendar object with....

@pgierz pgierz marked this pull request as ready for review February 17, 2025 16:10
@pgierz pgierz mentioned this pull request Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants