-
Notifications
You must be signed in to change notification settings - Fork 366
Test libraries for storage operations #3513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Test libraries for storage operations #3513
Conversation
fb6999e to
7d138bf
Compare
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your diligent porting work, @pingtimeout !
|
An easy fix here to trim the size of this PR would be to split it into one per container rather than moving all of them at once. Just a guide for the future. |
| @@ -0,0 +1,23 @@ | |||
| # | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this all new? Just noting it has the Apache license and not the Dremio one
| public class S3Resource { | ||
| @Inject ObjectStorageMock mockServer; | ||
|
|
||
| private static final Owner TEST_OWNER = Owner.of(42, "nessie-iceberg-s3-mock"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be confusing later, not a big deal for now though
|
i believe we are reinventing the wheel here, why not use something like S3Mock ? can we do more thorough analysis on why we need this custom code and not use something like libraries like S3Mock |
|
I'm fine with adding this for now, since it's just a test dependency, if we want to use S3 mock or something like that in the future it's still available. |
In short, the technical capabilities of these testing utilities are different from S3Mock, which is why they were developed under Project Nessie in the first place. One such aspect is As @RussellSpitzer noted, it is certainly possible to make adjustments later if developers find more efficient ways to work with and test the code in #3256. |
|
Thank you for sharing this @dimas-b ! I believe thats precisely i was looking to get answer for ! I wonder if can't we adjust the tests to work with s3mocks rather than writing whole req / responses even introducing things like Owner. my understanding was that iceberg uses s3 mock pointer, even trino uses s3 mock Trino so i was confused on what exactly are we testing thats not possible with s3mock as we see other projects which do similar stuff is capable of. I believe we just want to check it in and use it for tests (#3256) since its tests we are fine, without changing we have and think about it later ? |
|
I think @singhpk234 has a good point. Given the size of this change, around 5600 LOC(79 new files), I think it is worth pausing and discussing this on the dev mailing list. Reusing an existing, well maintained library where possible is generally preferable, and we should be careful about reinventing the wheel unless there is a clear technical gap that existing solutions cannot cover. It would be good to have a broader discussion on dev ML about: If we agree that custom code is still the right direction after that discussion, at least we will have alignment and a documented rationale. Happy to help start a dev list thread if that sounds reasonable. |
|
I agree for a discussion on the dev mailing list. Happy to drive it if needed. |
|
If it's a concern about the size of this PR, I wonder if it's not better to be back on the initial proposal: using nessie test dependecy as a library and create an issue to find alternative.
|
+1 to this from my POV. I believe it is the easiest option both in terms of license (Nessie is ALv2) and maintenance (the code is covered by active CI under Project Nessie). |
|
Just to clarify my position, I am not against adding dependencies or accepting donations. My concern is more about locking ourselves into a single option too early. I think it would be good to explore a bit more, look at multiple libraries or approaches, and understand what concrete requirements we actually have before converging. Is that allowed by the community? From that angle, a discussion on the dev mailing list feels reasonable, both to validate the direction and to surface any other concerns the wider community may have. Happy to proceed once we have that shared context and alignment. |
As part of #3256 , the following test libraries were added from the Nessie project:
objet-storage-mock- Allows for mocking the object storage layer and simulate any Iceberg table, including tables with extremely large number of filesazurite-testcontainer- Allows for simulating Azure Storage with testcontainers in Polaris codebasegcs-testcontainer- Allows for simulating GCS with testcontainers in Polaris codebaseThere was an explicit request to pull the code into Polaris, instead of depending on that code as a library (#3256 (comment), #3256 (review)). This is what this PR does.
This PR introduces three test-only libraries in the
toolsdirectory so that they can be used by #3256.In case it can be useful, here is a bash script that generates
diffcommands for all files added in this PR against a Nessie folder. It can be used to verify that the code that comes from Nessie has not been modified, apart from package names, code-copied-to-polaris marker and other minor tweaks.I have also tested the current PR by integrating it with #3256 and verifying that all the tests from #3256 still pass.
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)