We're trying to convert JSON to Parquet with compression for one of our requirements, and we found ChoETL to be very useful. We have a question regarding CompressionMethod. We took the Sample52.json message from the repo as an example to see whether it meets our requirement. The compression method we're looking at is Gzip.
Here is what we found: when we removed CompressionMethod from the ParquetWriter entirely, the output was around 5.7 MB, but with CompressionMethod set to Gzip it was around 6.9 MB.
We tried adding a compression level too.
With a compression level of 8, the output was around 6.6 MB. We understand that is 0.3 MB less, but we were expecting a far smaller file once the message was compressed.
We're wondering whether we're using the component the way it should be used, or whether this is the best it can offer as it stands.
The other thing we don't understand is why the output was smaller without CompressionMethod than with it.
using (var r = ChoJSONReader.LoadText(requestBody)
    .UseJsonSerialization()
    .JsonSerializationSettings(s => s.DateParseHandling = DateParseHandling.None)
    .JsonSerializationSettings(s => s.NullValueHandling = NullValueHandling.Include)
    )
{
    // In-memory target for the Parquet output
    using var parquetStream = new MemoryStream();
    {
        using (var w = new ChoParquetWriter(parquetStream)
            .Configure(c => c.CompressionMethod = Parquet.CompressionMethod.Gzip)
            .ThrowAndStopOnMissingField(false)
            )
        {
            w.Write(r);
        }
    }
}
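For anyone who wants to reproduce the comparison, below is a minimal sketch of how the sizes could be measured side by side. It reuses only the calls from the snippet above. The `ParquetSize` helper and the `Sample52.json` file path are hypothetical, and it assumes that `None` and `Snappy` exist on `Parquet.CompressionMethod` in the installed Parquet.Net version.

```csharp
using System;
using System.IO;
using ChoETL;
using Newtonsoft.Json;

// Assumption: Sample52.json from the repo has been copied next to the executable.
var json = File.ReadAllText("Sample52.json");

// Assumption: these three members exist in the installed Parquet.Net version.
var methods = new[]
{
    Parquet.CompressionMethod.None,
    Parquet.CompressionMethod.Snappy,
    Parquet.CompressionMethod.Gzip,
};

foreach (var method in methods)
{
    Console.WriteLine($"{method}: {ParquetSize(json, method)} bytes");
}

// Hypothetical helper: converts the JSON text to Parquet in memory with the
// given compression method and returns the number of bytes written.
static long ParquetSize(string json, Parquet.CompressionMethod method)
{
    var stream = new MemoryStream();

    // A fresh reader is created per run because the reader is consumed by Write.
    using (var r = ChoJSONReader.LoadText(json)
        .UseJsonSerialization()
        .JsonSerializationSettings(s => s.DateParseHandling = DateParseHandling.None)
        .JsonSerializationSettings(s => s.NullValueHandling = NullValueHandling.Include))
    using (var w = new ChoParquetWriter(stream)
        .Configure(c => c.CompressionMethod = method)
        .ThrowAndStopOnMissingField(false))
    {
        w.Write(r);
    }

    // ToArray still works if the writer closed the stream on dispose.
    return stream.ToArray().Length;
}
```

Setting `CompressionMethod.None` explicitly, instead of leaving the property unset, would show whether the 5.7 MB figure comes from truly uncompressed output or from whatever default the writer applies.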
Thanks in advance.