string to varchar with length #517

Closed
anilpanicker opened this issue Sep 22, 2022 · 7 comments
Labels
accepted Accepted for implementation enhancement New feature or request

Comments

@anilpanicker

Background [Optional]

A clear explanation of the reason for raising the question.
This gives us a better understanding of your use cases and how we might accommodate them.

Question

We want to write the DataFrame to SQL Server. The DataFrame has string columns whose type we want to change to varchar with the correct length. Is there a way to get the field name, data type, and length from the copybook?

@anilpanicker anilpanicker added the question Further information is requested label Sep 22, 2022
@yruslan
Collaborator

yruslan commented Sep 23, 2022

Hi, you can get lengths and other parameters from an AST generated by parsing a copybook using CopybookParser.parseSimple(copyBookContents).

Example: https://github.com/AbsaOSS/cobrix#spark-sql-schema-extraction

When you invoke parseSimple(), you get an AST that you can traverse to read field lengths and other field properties.
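
For illustration, here is a minimal sketch of such a traversal. CopybookParser.parseSimple() is from Cobrix (see the README link above); the AST node and property names used here (Group, Primitive, children, binaryProperties.dataSize) are assumptions about the parser's AST and should be checked against the Cobrix version you use.

import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.parser.ast.{Group, Primitive, Statement}

val copyBookContents: String =
  """        01  RECORD.
    |            05  PRODUCT-ID    PIC 9(8).
    |            05  PRODUCT-NAME  PIC X(100).
    |""".stripMargin

val copybook = CopybookParser.parseSimple(copyBookContents)

// Recursively walk the AST and print each primitive field's name,
// PIC-derived data type, and size in bytes within the record layout.
def printFields(node: Statement): Unit = node match {
  case group: Group     => group.children.foreach(printFields)
  case field: Primitive =>
    println(s"${field.name}: ${field.dataType}, length = ${field.binaryProperties.dataSize}")
}

printFields(copybook.ast)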

@anilpanicker
Author

ok, thanks let me try

@yruslan yruslan added enhancement New feature or request accepted Accepted for implementation labels Sep 26, 2022
@yruslan
Collaborator

yruslan commented Sep 26, 2022

I'm also thinking of adding a metadata field to the generated Spark schema that will contain the maximum lengths of string fields, so I'm converting this question into a feature request.

@yruslan yruslan removed the question Further information is requested label Sep 26, 2022
@anilpanicker
Author

Thanks, Ruslan, the same idea came to my mind as well. Our use case is to load the data into an RDBMS; currently, all strings default to the maximum length (nvarchar). If we have the lengths available, we can add an option like this:
df.write.format("jdbc").option("createTableColumnTypes", "ProductID Int, ProductName nvarchar(100)")

@yruslan
Collaborator

yruslan commented Oct 10, 2022

The new metadata field ('maxLength') for each Spark schema column is now available in the 'master' branch.
Here are details on this: https://github.com/AbsaOSS/cobrix#spark-schema-metadata
You can try it out by cloning master and building from source, or wait for the Cobrix 2.6.0 release, which should be out soon.
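
As a follow-up sketch (not from the thread), this is one way the new metadata could feed the JDBC option discussed above, assuming the 'maxLength' key is attached to string columns as described in the linked README; the connection URL and table name are placeholders.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StringType

// Collect "name VARCHAR(n)" for every string column that carries the
// 'maxLength' metadata, and join them into a single option value.
def varcharColumnTypes(df: DataFrame): String =
  df.schema.fields.collect {
    case f if f.dataType == StringType && f.metadata.contains("maxLength") =>
      s"${f.name} VARCHAR(${f.metadata.getLong("maxLength")})"
  }.mkString(", ")

// Usage with the JDBC writer (URL and table are placeholders):
// df.write
//   .format("jdbc")
//   .option("url", "jdbc:sqlserver://host;databaseName=mydb")
//   .option("dbtable", "Products")
//   .option("createTableColumnTypes", varcharColumnTypes(df))
//   .save()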

@anilpanicker
Author

anilpanicker commented Oct 10, 2022 via email

@anilpanicker
Author

anilpanicker commented Oct 14, 2022 via email

@yruslan yruslan closed this as completed Nov 16, 2022