copybook meta data for RDBMS #634

Closed
sree018 opened this issue Jul 27, 2023 · 5 comments
Labels
enhancement New feature or request


sree018 commented Jul 27, 2023

Background

Currently, copybook metadata is exposed as a Spark schema. We need an RDBMS-level schema that also carries field lengths.

Example [Optional]

```
01 MASTER-RECORD.
   02 RDT-TLF-MTHD-NM PIC X(08).
   02 RDT-ADJ-ORGN-TRAN-DT PIC 9(06).
   02 FILLER PIC X(03).
   02 RDT-ADDL-DATA-GROUP.
      05 RDT-ADDL-DATA OCCURS 0 TO 2 TIMES
         DEPENDING ON RDT-ADDL-SEGS-NO.
         10 RDT-ADDL-SEG-KEY.
            15 RDT-ADDL-SEG-KEY-PROD PIC X(02).
            15 RDT-ADDL-SEG-KEY-TYPE PIC S9(15)V99 COMP-3.
```
Current schema:
root
 |-- RDT-TLF-MTHD-NM String
 |-- RDT-ADJ-ORGN-TRAN-DT integer
 |-- RDT-ADDL-DATA-GROUP
 |    |-- RDT-ADDL-SEG-KEY
 |    |    |-- RDT-ADDL-SEG-KEY-PROD String
 |    |    |-- RDT-ADDL-SEG-KEY-TYPE DECIMAL (15,2)

Expected output:
 |-- RDT-TLF-MTHD-NM VARCHAR(08)
 |-- RDT-ADJ-ORGN-TRAN-DT integer (06)
 |-- RDT-ADDL-DATA-GROUP
 |    |-- RDT-ADDL-SEG-KEY
 |    |    |-- RDT-ADDL-SEG-KEY-PROD VARCHAR(02)
 |    |    |-- RDT-ADDL-SEG-KEY-TYPE DECIMAL (15,2)

We are able to get element lengths only for parent-level fields, and only before flattening:

df.schema.fields(0).metadata.getLong("maxLength")

Is there any option to get the expected schema?
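The one-liner above reads the length of the first field only. A minimal sketch of collecting the same value for every top-level field follows. It assumes `maxLength` is the metadata key, as used in the snippet above; the helper name `maxLengths` is hypothetical and not part of any library.

```scala
import org.apache.spark.sql.types._

// Collects the "maxLength" metadata value for every top-level field that has one.
// Fields that do not carry the key are skipped.
def maxLengths(schema: StructType): Map[String, Long] =
  schema.fields.collect {
    case f if f.metadata.contains("maxLength") =>
      f.name -> f.metadata.getLong("maxLength")
  }.toMap
```

Usage would be `maxLengths(df.schema)`. Nested fields are not visited here, which is why flattening matters for the rest of this thread.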

sree018 added the enhancement (New feature or request) label Jul 27, 2023
Collaborator

yruslan commented Aug 1, 2023

Spark does not have varchar() or integer(6) data types, only string and integer, so the expected output you specified is not possible.

However, it could be possible to retain metadata after schema flattening. How do you flatten the schema?

Author

sree018 commented Aug 1, 2023

SparkUtils.flattenSchema(df, useShortFieldNames = false)

Collaborator

yruslan commented Aug 3, 2023

I've tested if retaining the metadata is possible, and it is.

This PR makes SparkUtils.flattenSchema() retain metadata: #635

It is already merged into master. Please test it if you can and let me know if it works for you.
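With metadata retained after flattening, the RDBMS-style column definitions asked for at the top of this thread could be derived outside Spark. The following is a minimal sketch, not part of Cobrix. It assumes `maxLength` is the metadata key (as used earlier in the thread). The helpers `toSqlType` and `toColumnDefs` and the type mapping are hypothetical and would need adjusting for the target database.

```scala
import org.apache.spark.sql.types._

// Hypothetical mapping from a Spark field to an RDBMS-style column type.
// Strings use the "maxLength" metadata when present; decimals use their
// precision and scale; everything else falls back to Spark's SQL type name.
def toSqlType(field: StructField): String = field.dataType match {
  case StringType if field.metadata.contains("maxLength") =>
    val len = field.metadata.getLong("maxLength")
    s"VARCHAR($len)"
  case d: DecimalType =>
    s"DECIMAL(${d.precision},${d.scale})"
  case other =>
    other.sql
}

// Builds one "name type" definition per field of an already flattened schema.
def toColumnDefs(schema: StructType): Seq[String] =
  schema.fields.toSeq.map(f => s"${f.name} ${toSqlType(f)}")
```

Usage would be `toColumnDefs(flatDf.schema)`, where `flatDf` is the result of `SparkUtils.flattenSchema()`. Integer fields keep Spark's own type name in this sketch, since it assumes no digit-count metadata is available for them.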

Author

sree018 commented Aug 7, 2023

@yruslan

The new feature is working.

Thanks for the feature.

Collaborator

yruslan commented Aug 8, 2023

Awesome! Thanks for letting me know.
