string to varchar with length #517

Closed
anilpanicker opened this issue Sep 22, 2022 · 7 comments
Labels
accepted Accepted for implementation enhancement New feature or request

Comments

@anilpanicker

Background [Optional]

A clear explanation of the reason for raising the question.
This gives us a better understanding of your use cases and how we might accommodate them.

Question

We want to write the DataFrame to SQL Server. The DataFrame has string columns whose type we want to change to varchar with the correct length. Is there a way to get the field name, data type, and length from the copybook?

@anilpanicker anilpanicker added the question Further information is requested label Sep 22, 2022
@yruslan
Collaborator

yruslan commented Sep 23, 2022

Hi, you can get lengths and other parameters from an AST generated by parsing a copybook using CopybookParser.parseSimple(copyBookContents).

Example: https://github.com/AbsaOSS/cobrix#spark-sql-schema-extraction

When you invoke parseSimple(), you get an AST that you can traverse to read field lengths and other field properties.
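
For illustration, here is a minimal sketch of such a traversal. CopybookParser.parseSimple() is from Cobrix (see the README link above); the AST node and property names used here (Group, Primitive, children, binaryProperties.dataSize) are assumptions about the parser's AST and should be checked against the Cobrix version you use.

import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.parser.ast.{Group, Primitive, Statement}

val copyBookContents: String =
  """        01  RECORD.
    |            05  PRODUCT-ID    PIC 9(8).
    |            05  PRODUCT-NAME  PIC X(100).
    |""".stripMargin

val copybook = CopybookParser.parseSimple(copyBookContents)

// Recursively walk the AST and print each primitive field's name,
// PIC-derived data type, and size in bytes within the record layout.
def printFields(node: Statement): Unit = node match {
  case group: Group     => group.children.foreach(printFields)
  case field: Primitive =>
    println(s"${field.name}: ${field.dataType}, length = ${field.binaryProperties.dataSize}")
}

printFields(copybook.ast)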

@anilpanicker
Author

ok, thanks let me try

@yruslan yruslan added enhancement New feature or request accepted Accepted for implementation labels Sep 26, 2022
@yruslan
Collaborator

yruslan commented Sep 26, 2022

I'm also thinking of adding a metadata field to the generated Spark schema that will contain the maximum lengths of string fields, so I'm converting this question into a feature request.

@yruslan yruslan removed the question Further information is requested label Sep 26, 2022
@anilpanicker
Author

Thanks, Ruslan, the same idea came to my mind as well. Our use case is to load the data into an RDBMS; currently, all strings default to the maximum length (nvarchar). If we have the lengths available, we can add an option like this:
df.write.format("jdbc").option("createTableColumnTypes", "ProductID Int, ProductName nvarchar(100)")

@yruslan
Collaborator

yruslan commented Oct 10, 2022

The new metadata field ('maxLength') for each Spark schema column is now available in the 'master' branch.
Here are details on this: https://github.com/AbsaOSS/cobrix#spark-schema-metadata
You can try it out by cloning master and building from source, or wait for the Cobrix 2.6.0 release, which should be out soon.
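
As a follow-up sketch (not from the thread), this is one way the new metadata could feed the JDBC option discussed above, assuming the 'maxLength' key is attached to string columns as described in the linked README; the connection URL and table name are placeholders.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StringType

// Collect "name VARCHAR(n)" for every string column that carries the
// 'maxLength' metadata, and join them into a single option value.
def varcharColumnTypes(df: DataFrame): String =
  df.schema.fields.collect {
    case f if f.dataType == StringType && f.metadata.contains("maxLength") =>
      s"${f.name} VARCHAR(${f.metadata.getLong("maxLength")})"
  }.mkString(", ")

// Usage with the JDBC writer (URL and table are placeholders):
// df.write
//   .format("jdbc")
//   .option("url", "jdbc:sqlserver://host;databaseName=mydb")
//   .option("dbtable", "Products")
//   .option("createTableColumnTypes", varcharColumnTypes(df))
//   .save()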

@anilpanicker
Author

anilpanicker commented Oct 10, 2022 via email

@anilpanicker
Author

anilpanicker commented Oct 14, 2022 via email

@yruslan yruslan closed this as completed Nov 16, 2022