add column replacer #48

Merged: 2 commits, merged on Sep 27, 2022
69 changes: 61 additions & 8 deletions README.md
@@ -1,10 +1,18 @@
# S3S
# s3s

S3S is a go binary instead of [vast-engineering/s3select](https://github.com/vast-engineering/s3select).
**s3s** is a Go binary alternative to [vast-engineering/s3select](https://github.com/vast-engineering/s3select).

## Feature
## Features

s3s queries all files under the given S3 prefix.

Supported input and output formats:

- Input JSON to Output JSON
- Input CSV to Output JSON
- Input Application Load Balancer Logs to Output JSON
- Input CloudFront Logs to Output JSON

- [x] Input JSON to Output JSON
## Usage

```console
@@ -37,6 +45,8 @@ GLOBAL OPTIONS:
--where value, -w value WHERE part of the query
```

By default, s3s executes S3 Select with JSON input and JSON output.

```console
$ s3s s3://bucket/prefix
{"time":1654848930,"type":"speak"}
@@ -53,21 +63,64 @@ $ s3s -q 'SELECT * FROM S3Object s WHERE s.type = "speak"' s3://bucket/prefix
// $ s3s -w 's.type = "speak"' s3://bucket/prefix
```

### CSV support

`--csv` option is no header csv only
When the `--csv` option is enabled, s3s executes S3 Select with CSV input and JSON output.

```console
// 122, hello
$ s3s --csv s3://bucket/prefix
{"_1":122,"_2":"hello"}
```
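The positional keys in the CSV example (`_1`, `_2`, ...) follow the column order of each headerless row. The Go sketch below illustrates only that naming convention. `positionalRecord` is a hypothetical helper, not part of s3s, and it keeps every value as a string rather than reproducing how S3 Select handles value types.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// positionalRecord (hypothetical) converts one headerless CSV line into a
// map keyed by positional names: the first field becomes "_1", the second "_2", ...
// Every value is kept as a string.
func positionalRecord(line string) (map[string]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.TrimLeadingSpace = true // "122, hello" -> ["122", "hello"]
	fields, err := r.Read()
	if err != nil {
		return nil, err
	}
	rec := make(map[string]string, len(fields))
	for i, f := range fields {
		rec["_"+strconv.Itoa(i+1)] = f
	}
	return rec, nil
}

func main() {
	rec, err := positionalRecord("122, hello")
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(rec) // map keys are emitted in sorted order
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"_1":"122","_2":"hello"}
}
```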

`--alb-logs` or `--cf-logs` option is tagging available instead of _1, _2, etc
### ALB and CF logs support

With the `--alb-logs` or `--cf-logs` option, columns can be referenced by field name instead of `_1`, `_2`, etc.

- [Application Load Balancer Format](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html)
- [CloudFront Format](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html)

In addition, field names used in `--where` are replaced with the corresponding column numbers.
`--query` is executed as a raw query, so no replacement is applied to it.

```console
// equivalent to: $ s3s --alb-logs --query="SELECT * FROM S3Object s WHERE s._2 = '2022-09-01T00:00:00.000000Z'" s3://bucket/prefix
$ s3s --alb-logs --where="s.time = '2022-09-01T00:00:00.000000Z'" s3://bucket/prefix
```
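The `--where` replacement described above can be sketched as a regular-expression rewrite. This is a minimal sketch modeled on the pattern shape used in `cmd/s3s/query.go`: a field name surrounded by spaces, optionally prefixed with `s.` and optionally wrapped in backquotes. `replaceColumns` and `albSubset` are hypothetical names, and the map holds only three ALB fields. The sketch escapes each name with `regexp.QuoteMeta` so that names containing regex metacharacters are matched literally.

```go
package main

import (
	"fmt"
	"regexp"
)

// albSubset is a hypothetical subset of the ALB field-to-column mapping.
var albSubset = map[string]string{
	"time":            "_2",
	"client:port":     "_4",
	"elb_status_code": "_9",
}

// replaceColumns (hypothetical) rewrites " name ", " s.name ", " `name` "
// or " s.`name` " to " s._N " for every field in the map.
func replaceColumns(query string, fields map[string]string) string {
	for name, col := range fields {
		re := regexp.MustCompile(" (s\\.)?`?" + regexp.QuoteMeta(name) + "`? ")
		query = re.ReplaceAllString(query, " s."+col+" ")
	}
	return query
}

func main() {
	fmt.Println(replaceColumns("SELECT * FROM S3Object s WHERE s.`time` > '2022-09-01' AND elb_status_code = '503'", albSubset))
	// SELECT * FROM S3Object s WHERE s._2 > '2022-09-01' AND s._9 = '503'
	fmt.Println(replaceColumns("SELECT * FROM S3Object s WHERE `client:port` = '192.0.2.1:443'", albSubset))
	// SELECT * FROM S3Object s WHERE s._4 = '192.0.2.1:443'
}
```

Each replacement writes back the surrounding spaces and produces `s._N`, which never matches a field name, so for these fields the result does not depend on Go's map iteration order.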

|index|ALB|CF|
|-|-|-|
|_1|type|date|
|_2|time|time|
|_3|elb|x-edge-location|
|_4|client:port|sc-bytes|
|_5|target:port|c-ip|
|_6|request_processing_time|cs-method|
|_7|target_processing_time|cs(Host)|
|_8|response_processing_time|cs-uri-stem|
|_9|elb_status_code|sc-status|
|_10|target_status_code|cs(Referer)|
|_11|received_bytes|cs(User-Agent)|
|_12|sent_bytes|cs-uri-query|
|_13|request|cs(Cookie)|
|_14|user_agent|x-edge-result-type|
|_15|ssl_cipher|x-edge-request-id|
|_16|ssl_protocol|x-host-header|
|_17|target_group_arn|cs-protocol|
|_18|trace_id|cs-bytes|
|_19|domain_name|time-taken|
|_20|chosen_cert_arn|x-forwarded-for|
|_21|matched_rule_priority|ssl-protocol|
|_22|request_creation_time|ssl-cipher|
|_23|actions_executed|x-edge-response-result-type|
|_24|redirect_url|cs-protocol-version|
|_25|error_reason|fle-status|
|_26|target:port_list|fle-encrypted-fields|
|_27|target_status_code_list|c-port|
|_28|classification|time-to-first-byte|
|_29|classification_reason|x-edge-detailed-result-type|
|_30||sc-content-type|
|_31||sc-content-len|
|_32||sc-range-start|
|_33||sc-range-end|
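Because the column numbers are purely positional, one way to keep a name-to-column table consistent is to derive it from an ordered field list instead of numbering entries by hand. The sketch below assumes a hypothetical `cfFields` slice that follows the field order of `cfLogsWhereMap` in `cmd/s3s/query.go`, and a hypothetical `positionalColumns` helper that computes each `_N` from the field's index.

```go
package main

import (
	"fmt"
	"strconv"
)

// cfFields (hypothetical) lists CloudFront standard log fields in file order,
// following the order used by cfLogsWhereMap.
var cfFields = []string{
	"date", "time", "x-edge-location", "sc-bytes", "c-ip", "cs-method",
	"cs(Host)", "cs-uri-stem", "sc-status", "cs(Referer)", "cs(User-Agent)",
	"cs-uri-query", "cs(Cookie)", "x-edge-result-type", "x-edge-request-id",
	"x-host-header", "cs-protocol", "cs-bytes", "time-taken", "x-forwarded-for",
	"ssl-protocol", "ssl-cipher", "x-edge-response-result-type",
	"cs-protocol-version", "fle-status", "fle-encrypted-fields", "c-port",
	"time-to-first-byte", "x-edge-detailed-result-type", "sc-content-type",
	"sc-content-len", "sc-range-start", "sc-range-end",
}

// positionalColumns maps each field name to its positional column name,
// so the index always matches the field's position in the slice.
func positionalColumns(fields []string) map[string]string {
	m := make(map[string]string, len(fields))
	for i, name := range fields {
		m[name] = "_" + strconv.Itoa(i+1)
	}
	return m
}

func main() {
	cf := positionalColumns(cfFields)
	fmt.Println(len(cf), cf["sc-content-len"], cf["sc-range-end"]) // 33 _31 _33
}
```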

### `-delve`: move into a directory (like `cd`) before querying

search from prefix
2 changes: 1 addition & 1 deletion cmd/s3s/main.go
@@ -183,7 +183,7 @@ func cmd(ctx context.Context, paths []string) error {

	// Execution
	if queryStr == "" {
		queryStr = buildQuery(where, limit, isCount)
		queryStr = buildQuery(where, limit, isCount, isALBLogs, isCFLogs)
	}
	queryInfo := &s3s.QueryInfo{
		IsCountMode: isCount,
88 changes: 86 additions & 2 deletions cmd/s3s/query.go
@@ -1,8 +1,80 @@
package main

import "strconv"
import (
	"regexp"
	"strconv"
)

func buildQuery(where string, limit int, isCount bool) string {
var (
	albLogsWhereMap = map[string]string{
		"type":                     "_1",
		"time":                     "_2",
		"elb":                      "_3",
		"client:port":              "_4",
		"target:port":              "_5",
		"request_processing_time":  "_6",
		"target_processing_time":   "_7",
		"response_processing_time": "_8",
		"elb_status_code":          "_9",
		"target_status_code":       "_10",
		"received_bytes":           "_11",
		"sent_bytes":               "_12",
		"request":                  "_13",
		"user_agent":               "_14",
		"ssl_cipher":               "_15",
		"ssl_protocol":             "_16",
		"target_group_arn":         "_17",
		"trace_id":                 "_18",
		"domain_name":              "_19",
		"chosen_cert_arn":          "_20",
		"matched_rule_priority":    "_21",
		"request_creation_time":    "_22",
		"actions_executed":         "_23",
		"redirect_url":             "_24",
		"error_reason":             "_25",
		"target:port_list":         "_26",
		"target_status_code_list":  "_27",
		"classification":           "_28",
		"classification_reason":    "_29",
	}
	cfLogsWhereMap = map[string]string{
		"date":                        "_1",
		"time":                        "_2",
		"x-edge-location":             "_3",
		"sc-bytes":                    "_4",
		"c-ip":                        "_5",
		"cs-method":                   "_6",
		"cs(Host)":                    "_7",
		"cs-uri-stem":                 "_8",
		"sc-status":                   "_9",
		"cs(Referer)":                 "_10",
		"cs(User-Agent)":              "_11",
		"cs-uri-query":                "_12",
		"cs(Cookie)":                  "_13",
		"x-edge-result-type":          "_14",
		"x-edge-request-id":           "_15",
		"x-host-header":               "_16",
		"cs-protocol":                 "_17",
		"cs-bytes":                    "_18",
		"time-taken":                  "_19",
		"x-forwarded-for":             "_20",
		"ssl-protocol":                "_21",
		"ssl-cipher":                  "_22",
		"x-edge-response-result-type": "_23",
		"cs-protocol-version":         "_24",
		"fle-status":                  "_25",
		"fle-encrypted-fields":        "_26",
		"c-port":                      "_27",
		"time-to-first-byte":          "_28",
		"x-edge-detailed-result-type": "_29",
		"sc-content-type":             "_30",
		"sc-content-len":              "_31",
		"sc-range-start":              "_32",
		"sc-range-end":                "_33",
	}
)

func buildQuery(where string, limit int, isCount bool, isALBLogs bool, isCFLogs bool) string {
	if where == "" && limit == 0 && !isCount {
		return DEFAULT_QUERY
	}
@@ -20,5 +92,17 @@ func buildQuery(where string, limit int, isCount bool) string {
	if limit != 0 {
		query += " LIMIT " + strconv.Itoa(limit)
	}
	if isALBLogs {
		for k, v := range albLogsWhereMap {
			rep := regexp.MustCompile(` (s\.)?` + "`?" + regexp.QuoteMeta(k) + "`" + `? `)
			query = rep.ReplaceAllString(query, " s."+v+" ")
		}
	} else if isCFLogs {
		for k, v := range cfLogsWhereMap {
			rep := regexp.MustCompile(` (s\.)?` + "`?" + regexp.QuoteMeta(k) + "`" + `? `)
			query = rep.ReplaceAllString(query, " s."+v+" ")
		}
	}

	return query
}
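Field names are interpolated into a regular expression, so any regex metacharacters in a name are treated as syntax unless they are escaped. Several CloudFront field names contain parentheses (`cs(Host)`, `cs(Referer)`, `cs(User-Agent)`, `cs(Cookie)`). The standalone sketch below uses a hypothetical `fieldPattern` helper that builds the same shape of pattern as `buildQuery`, and contrasts an unescaped pattern with one escaped by `regexp.QuoteMeta`.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldPattern (hypothetical) builds: a space, optional "s.", optional
// backquote, the field name, optional backquote, and a space.
// When quote is true, the name is escaped with regexp.QuoteMeta.
func fieldPattern(name string, quote bool) *regexp.Regexp {
	if quote {
		name = regexp.QuoteMeta(name)
	}
	return regexp.MustCompile(" (s\\.)?`?" + name + "`? ")
}

func main() {
	where := " s.cs(Host) = 'example.com' "

	// Unescaped, "(Host)" is a capture group, so the pattern looks for "csHost".
	fmt.Println(fieldPattern("cs(Host)", false).MatchString(where)) // false

	// Escaped, the parentheses are matched literally.
	fmt.Println(fieldPattern("cs(Host)", true).MatchString(where)) // true
	fmt.Printf("%q\n", fieldPattern("cs(Host)", true).ReplaceAllString(where, " s._7 "))
	// " s._7 = 'example.com' "
}
```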
135 changes: 135 additions & 0 deletions cmd/s3s/query_test.go
@@ -0,0 +1,135 @@
package main

import "testing"

func TestBuildQuery(t *testing.T) {
	cases := []struct {
		name      string
		where     string
		limit     int
		isCount   bool
		isALBLogs bool
		isCFLogs  bool
		want      string
	}{
		{
			name:      "default",
			where:     "",
			limit:     0,
			isCount:   false,
			isALBLogs: false,
			isCFLogs:  false,
			want:      "SELECT * FROM S3Object s",
		},
		{
			name:      "where",
			where:     "s.time > '2022-09-26 00:00:00'",
			limit:     0,
			isCount:   false,
			isALBLogs: false,
			isCFLogs:  false,
			want:      "SELECT * FROM S3Object s WHERE s.time > '2022-09-26 00:00:00'",
		},
		{
			name:      "limit",
			where:     "",
			limit:     1,
			isCount:   false,
			isALBLogs: false,
			isCFLogs:  false,
			want:      "SELECT * FROM S3Object s LIMIT 1",
		},
		{
			name:      "count",
			where:     "",
			limit:     0,
			isCount:   true,
			isALBLogs: false,
			isCFLogs:  false,
			want:      "SELECT COUNT(*) FROM S3Object s",
		},
		{
			name:      "where as alb-logs",
			where:     "s.time > '2022-09-26 00:00:00'",
			limit:     0,
			isCount:   false,
			isALBLogs: true,
			isCFLogs:  false,
			want:      "SELECT * FROM S3Object s WHERE s._2 > '2022-09-26 00:00:00'",
		},
		{
			name:      "where as alb-logs without s.",
			where:     "time > '2022-09-26 00:00:00'",
			limit:     0,
			isCount:   false,
			isALBLogs: true,
			isCFLogs:  false,
			want:      "SELECT * FROM S3Object s WHERE s._2 > '2022-09-26 00:00:00'",
		},
		{
			name:      "where as alb-logs using backquote",
			where:     "s.`time` > '2022-09-26 00:00:00'",
			limit:     0,
			isCount:   false,
			isALBLogs: true,
			isCFLogs:  false,
			want:      "SELECT * FROM S3Object s WHERE s._2 > '2022-09-26 00:00:00'",
		},
		{
			name:      "where as alb-logs using backquote without s.",
			where:     "`time` > '2022-09-26 00:00:00'",
			limit:     0,
			isCount:   false,
			isALBLogs: true,
			isCFLogs:  false,
			want:      "SELECT * FROM S3Object s WHERE s._2 > '2022-09-26 00:00:00'",
		},
		{
			name:      "where as cf-logs",
			where:     "s.date > '2022-09-26'",
			limit:     0,
			isCount:   false,
			isALBLogs: false,
			isCFLogs:  true,
			want:      "SELECT * FROM S3Object s WHERE s._1 > '2022-09-26'",
		},
		{
			name:      "where as cf-logs without s",
			where:     "date > '2022-09-26'",
			limit:     0,
			isCount:   false,
			isALBLogs: false,
			isCFLogs:  true,
			want:      "SELECT * FROM S3Object s WHERE s._1 > '2022-09-26'",
		},
		{
			name:      "where as cf-logs using backquote",
			where:     "s.`date` > '2022-09-26'",
			limit:     0,
			isCount:   false,
			isALBLogs: false,
			isCFLogs:  true,
			want:      "SELECT * FROM S3Object s WHERE s._1 > '2022-09-26'",
		},
		{
			name:      "where as cf-logs using backquote without s",
			where:     "`date` > '2022-09-26'",
			limit:     0,
			isCount:   false,
			isALBLogs: false,
			isCFLogs:  true,
			want:      "SELECT * FROM S3Object s WHERE s._1 > '2022-09-26'",
		},
	}

	for _, tt := range cases {
		tt := tt
		t.Run(tt.name, func(t *testing.T) {
			t.Parallel()
			got := buildQuery(tt.where, tt.limit, tt.isCount, tt.isALBLogs, tt.isCFLogs)
			if got != tt.want {
				t.Errorf("want = %s,\nbut got = %s", tt.want, got)
			}
		})
	}
}