Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix bug in the reading from oracle clob #1183

Closed

Conversation

avpalienko
Copy link

OCILobGetLength and OCILobRead for character LOBs (CLOBs, NCLOBs) uses the number of characters, not bytes. So, there is a bug with buffer array sizing

Copy link
Member

@vadz vadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the switch to using a callback really needed? If the problem is really just a wrong byte count, couldn't we multiply (divide?) the length by whatever to fix it?

Also, it would be tremendously helpful if you could please add a unit test checking for this problem, i.e. failing now and passing with this fix. Could you please do this? TIA!

@avpalienko
Copy link
Author

Is the switch to using a callback really needed? If the problem is really just a wrong byte count, couldn't we multiply (divide?) the length by whatever to fix it?

If we should avoid the callback we can try OCILobRead2
How to know the multiplier ?

Also, it would be tremendously helpful if you could please add a unit test checking for this problem, i.e. failing now and passing with this fix. Could you please do this? TIA!

Ok

Copy link
Member

@vadz vadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this is changing faster than I can review it, so I'll stop for now, but let me publish some comments I had so far.

}

value.assign(buf.begin(), buf.end());
value = std::move ( accumTo );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this makes no sense, we should just pass value to the callback instead of accumTo directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not very nice to mark a comment as resolved when it wasn't addressed at all.

if (len != 0)
ub4 amtp = 0;
ub4 offset = 1;
char buf[10240] = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use both a buffer and a callback? Shouldn't we not need the buffer any longer if we use the callback?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oci read into the buffer and callback after

TEST_CASE ( "Oracle clob", "[oracle][clob]" )
{
soci::session sql ( backEnd, connectString );
std::string testPhrase{ "SystГЁme est ouvert aux requГЄtes d'information" };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally misencoded?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, first of all, this doesn't really work even with this PR apparently due to this.

Second, any I really don't want to be mean, but just I don't understand how can a so obviously questionable line be added without any comment explaining it. You should know that something so clearly unusual and apparently wrong would raise questions from whoever is reading it, and I can't fathom why you wouldn't want to explain that the string must be wrongly encoded to trigger the bug (if this is how it really is).

@vadz
Copy link
Member

vadz commented Nov 18, 2024

Using just the function-based part of the test, it's clear that we have an encoding problem here, i.e. at the first glance this has nothing to do with the byte/char mismatch.

Moreover, we don't need to use some weird bytes in the string to see it, the same problem happens with just

    std::string testPhrase{ "Système est ouvert aux requêtes d'information" };

too. I.e. it looks like the real problem is that Oracle/OCI doesn't use the correct encoding for the string.

@vadz
Copy link
Member

vadz commented Nov 18, 2024

In fact, the same problem happens with a plain VARCHAR2() too, not just CLOB, so even if there is a problem with count mismatch for CLOBs, it's clearly not the only problem here.

I don't know anything about charsets in Oracle, but it looks like we need to specify that we're using UTF-8 somehow. It's pretty weird that this doesn't work out of the box, even though NLS_CHARACTERSET database parameter is set to AL32UTF8 and it would be nice to understand what's going on here, but I'm pretty sure that this PR is not the right solution for the problem, whatever it is.

@avpalienko
Copy link
Author

In fact, the same problem happens with a plain VARCHAR2() too, not just CLOB, so even if there is a problem with count mismatch for CLOBs, it's clearly not the only problem here.

There are no problems with varchar2, export NLS_LANG is required on client session

I don't know anything about charsets in Oracle, but it looks like we need to specify that we're using UTF-8 somehow. It's pretty weird that this doesn't work out of the box, even though NLS_CHARACTERSET database parameter is set to AL32UTF8 and it would be nice to understand what's going on here, but I'm pretty sure that this PR is not the right solution for the problem, whatever it is.

I have changed oracle.sh to use utf-8 encoding

@vadz
Copy link
Member

vadz commented Nov 19, 2024

Thanks, I forgot about NLS_LANG weirdness. However I still don't think this has anything to do with using callback — does the (simpler) solution in #1184 work for you or do you still see any problems with it and, if so, how can they be reproduced?

@avpalienko
Copy link
Author

Thanks, I forgot about NLS_LANG weirdness. However I still don't think this has anything to do with using callback — does the (simpler) solution in #1184 work for you or do you still see any problems with it and, if so, how can they be reproduced?

What is the advantage of using a self-made loop over an oci internal loop with the callback?
However, I won't insist on my solution. What matters to me is resolving the bug

@avpalienko avpalienko closed this Nov 20, 2024
@vadz
Copy link
Member

vadz commented Nov 20, 2024

What is the advantage of using a self-made loop over an oci internal loop with the callback?

I think using both the buffer and the callback is confusing, if the OCI really doesn't allow to do without the buffer in this case, I'd rather avoid defining the callback.

I don't know if the behaviour is the same with OCILobRead2(), maybe we could make things simpler using it, but I didn't have time to try it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants