Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update MeterW1therm.cpp to catch erroneous temperature values #610

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

h-eff
Copy link

@h-eff h-eff commented Nov 8, 2023

As announced here: https://www.photovoltaikforum.com/thread/199267-blitzeinschlag-in-der-nachbarschaft-hat-mir-den-vz-lahmgelegt/?postID=3459011#post3459011 , I've written some code to catch

  1. the erroneous 0°C temperature reading,
  2. the 85°C error reading and
  3. the 127.9375°C error reading
    before they end up in the database. This is based upon work from Chris Petrich: https://github.com/cpetrich/counterfeit_DS18B20

A similar check for conversion success was added to w1_therm kernel module in 2020: https://github.com/torvalds/linux/blame/305230142ae0637213bf6e04f6d9f10bbcb74af8/drivers/w1/slaves/w1_therm.c#L1186-L1197

So when do these readings occur?

  1. can happen when the bus is unstable due to bad design (some sensors in star topology, long wiring, too many sensors, pullup inadequate, EMI etc.). The busmaster will receive only zeroes when it reads the 8 byte scratchpad from a DS18B20. Unfortunately, this will result in a correct CRC checksum (which is also zero). Fortunately, according to the DS18B20 datasheet, it is impossible for a scratchpad to contain only zeroes (i.e. byte 6 won't be zero if byte 0 is zero, the alarm values at byte 2&3 won't be both zero etc.). So this reading can be safely discarded as wrong.
  2. happens when a genuine DS18B20 has to report a temperature before it could finish the temperature conversion. Or, if the first conversion command was never received by the DS18B20 due to an unstable bus. Luckily, scratchpad byte 6 will tell the difference (at least for genuine DS18B20). For counterfeit sensors, I disabled this check, as they mostly do not handle byte 6 as genuine DS18B20 do (see cpetrich). Sensor discrimination is done via sensor address (see cpetrich).
    This discrimination will work until the serial numbers of genuine DS18B20 reach 0x0001xxxxxxxx. Currently, we are at 0x00000fxxxxxx. I estimate this won't be a problem before 2050.
  3. was never encountered by myself. According to cpetrich, it can happen when a parasitic circuit with some DS18B20 happens to have at least one DS18B20 with Vcc left floating (and not tied to ground as the datasheet demands). Also, temperature readings above 125°C are outside sensor specification and probably always fishy.

In my opinion, all of these readings should not end up as spikes in the temperature database, but preferable as debug error in the vzlogger logfile.

As of yet, I still have little experience writing C++, so probably I did not write the best possible code. Please comment on what could be better.

I did not yet test this code on a RasPi. I plan to set up a test system in the next two weeks.

Added some code to catch 
1) the erroneous 0°C temperature reading,
2) the 85°C error reading and
3) the 127.9375°C error reading 
before they end up in the database. This is based upon work from Chris Petrich: 
https://github.com/cpetrich/counterfeit_DS18B20

A similar check for conversion success was added to w1_therm kernel module in 2020:
https://github.com/torvalds/linux/blame/305230142ae0637213bf6e04f6d9f10bbcb74af8/drivers/w1/slaves/w1_therm.c#L1186-L1197

I did not test this code on a RasPi yet.
@h-eff h-eff changed the title Update MeterW1therm.cpp Update MeterW1therm.cpp to catch erroneous temperature values Nov 8, 2023
if (fgets(buffer, 100, fp)) { // e.g. 07 01 55 00 7f ff 0c 10 18 t=16437

///// Start of new code: Catch DS18B20-specific errors. /////
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe put this into a function to isolate it, instead of surrounding with comments?
private check_quirks(device,buffer){...
if (!check_quirks(device,buffer)) return false;

@@ -83,8 +83,50 @@ bool MeterW1therm::W1sysHWif::readTemp(const std::string &device, double &value)
print(log_debug, "CRC not ok from %s (%s)", "w1t", dev.c_str(), buffer);
} else {
// crc ok
// now parse t=<value>
// now check for other errors:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is misplaced...
the result of fgets() is not only for error-checking, it's also what the reading is parsed from.

@r00t-
Copy link
Contributor

r00t- commented Nov 9, 2023

thanks for your contribution, good stuff!
i think the code could be a little cleaner, can assist if you like.
also looking forward to test results, might do some myself.

@r00t-
Copy link
Contributor

r00t- commented Nov 14, 2023

@h-eff:
also, care to make the formatting check happy, or should we assist?
best way is to install clang-format and then run check-formatting.sh locally.

@r00t-
Copy link
Contributor

r00t- commented Jun 24, 2024

i suggest we merge this, even with @h-eff not responding anymore, as it's really useful.

can anybody else provide some review?

will try to test also, but it's not possible to test those corner-cases in hardware. (we probably should have automated tests for this.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants