[Enhancement]: Review Storage mechanics PLUS! #10
Hi @KeithHanson, It's way past my bedtime where I am (2 AM), but I wanted to touch base before I sleep. Thanks for the kind words, btw :) I must admit this was a DIY project, but I'd welcome any help in making it more accessible. On to the issue. My initial suspicion is the way it stores the clip metadata. It uses SQLite, which I'm not overly a fan of - I wonder if a different storage mechanism could be used? To stop multiple writes happening at once, all writes are queued (FIFO), and the Commit() function checks the queue on an interval (every 10s). It's based on setTimeout, and if a write happens to error out, execution may never reach the call that schedules setTimeout for the next 10s - meaning there is plenty in the queue, but the Commit function is no longer being executed, as it crashed before arming the next 10s timer. The crash may have been due to an SQLite lock issue or a file IO error, which is why I am not overly happy with my choice of SQLite. Another potential culprit is the IO buffer on the pipes between the FFMPEG process and NVRJS itself; if that gets full, NVRJS will stop receiving FFMPEG output - but I'm fairly sure I ignore stdout... |
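For illustration, a minimal sketch of the failure mode being described - not the actual NVRJS code; `statement.run()` just stands in for whatever SQLite write Commit() performs:

```js
// Illustrative only: a FIFO write queue flushed by Commit() every 10 seconds.
const queue = [];

function Commit() {
	while (queue.length > 0) {
		const statement = queue.shift();
		statement.run(); // an SQLite lock or file IO error here throws...
	}
	setTimeout(Commit, 10000); // ...so this re-arm is never reached,
	                           // and the queue keeps growing forever.
}

setTimeout(Commit, 10000);
```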
Thank you for responding so quickly! DIY is my approach to handling a fleet of pole cameras for our municipality - we have a decentralized, edge-based, DIY setup. So far, our Real Time Crime Center analysts like the interface :) And it fits my approach (if it can run decentralized straight from the pole, that's the best outcome). Your response above all makes sense and matches what I was suspecting - likely some weird failure on the HD that gets taken care of by our other services on the Pi (which decrypt and re-mount the volume). I will make a branch littered with debug statements to try to catch this (though, as you know, we will need to wait some time for whatever failure to occur). Have you considered ditching the SQLite DB entirely? This is almost a perfect use case for just looking at the file system and drawing the timeline from it, since I only care about showing the user what's on disk. You may have explored this already, though, in which case I'd be heading down the wrong track. If not, I may take a crack at a lightweight solution. It would be great if there were one less moving part to worry about (and no SQLite DB means no importing gaps in the data, easy recovery, and other icing on the cake, so to speak :) |
Also, one last thought for now :P Converting from SQLite to something like PostgreSQL or MySQL might be pretty straightforward. I'm trying to keep as few moving parts on my setups as possible, though (since there are already so many), and to keep processor utilization down as much as possible, so I liked the SQLite DB at first sight. I think I will spend some time on the error not re-queueing the timeout. I'm not the best Node.js dev, but I'm sure we're not the only ones who have had this issue with critical functions not being called / queued back up. |
To draw clips on the timeline, one could look to use ffinfo, along with the timestamp of the file being created/last modified, to accurately place them in time. The missing piece would be associating any event data with the segments and storing that event data somewhere (other than SQLite :)) |
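A minimal sketch of that idea (assumptions: ffprobe is the tool meant by "ffinfo", and the file's mtime marks the end of the segment; all names here are illustrative):

```js
const { execFileSync } = require('child_process');
const fs = require('fs');

// Place a segment on the timeline from the file alone: duration from ffprobe,
// end time from the file's last-modified timestamp.
function segmentInfo(filePath) {
	const out = execFileSync('ffprobe', [
		'-v', 'error',
		'-show_entries', 'format=duration',
		'-of', 'default=noprint_wrappers=1:nokey=1',
		filePath
	]).toString();
	const duration = parseFloat(out);                      // seconds
	const endTime = fs.statSync(filePath).mtimeMs / 1000;  // unix seconds
	return { file: filePath, startTime: endTime - duration, endTime };
}
```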
Makes perfect sense at this stage - might be a simple oversight on my part 👍 |
Yes - that is pretty much the implementation I was thinking of. Ah - true, true. I forgot about the events timeline (which was one of the reasons I chose this over others - that is very interesting to me down the road, and a simple API for it makes a lot of sense). Perhaps it might be worth exploring an option for that. Since the timeline of segments is critical for our uses, it would make sense to store event data in an optional database: if you're not using that feature (like us), you can disable it, and if you DO want it, event data is likely non-critical compared to the video on disk itself. So using a more "fail-proof" mechanism ("this is what is on disk and that's all I know") makes sense to me if it does to you. |
Quick research suggests the "retry" module might be the ticket to patch it quickly :) Will give it a go in a branch. I also think I will hit this issue again pretty quickly - I'm testing this on several Pi's on actual poles right now so... one of them is bound to trip up (something is causing problems regularly on our side - which is good for you and I :D ) |
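For reference, a hedged sketch of what wrapping the DB write in the retry npm module could look like (assuming a node-sqlite3 style db.run callback; this is not NVRJS's actual code):

```js
const retry = require('retry');

// Retry a single DB write with exponential backoff before giving up.
function commitWithRetry(db, sql, params) {
	const operation = retry.operation({ retries: 5, factor: 2, minTimeout: 1000 });
	operation.attempt(() => {
		db.run(sql, params, (err) => {
			if (operation.retry(err)) return; // schedules the next attempt
			if (err) console.error('Write failed after retries:', operation.mainError());
		});
	});
}
```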
Progress: I've got debug statements running on a camera system right now via this fork/branch: Working on setting up a way to cause failure for the SQLite DB and test the retry module in another branch. |
Ok! I found it. Some error catching and logging went a long way. I am sorry I had to dirty up your clean code :D But the problem, I think, is more basic than I initially thought. Since the FIFO dequeue wasn't checking for an error on the run, it just dropped those segments. I added a re-enqueue and reset the timeout. I tested this by simply… I'll submit a pull request once I do more live testing on our setup. I've pushed my changes here and re-deployed on two test and live systems. Will report back with any findings :) This is the deployed codebase: |
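The fix being described boils down to two things: never silently drop a failed write, and always re-arm the timer. A hedged sketch reusing the shape of the earlier illustration (not the actual patch):

```js
function Commit() {
	try {
		while (queue.length > 0) {
			const item = queue.shift();
			try {
				item.run();
			} catch (err) {
				console.error('Write failed, re-enqueueing:', err);
				queue.push(item); // keep the segment metadata for the next pass
				break;
			}
		}
	} finally {
		setTimeout(Commit, 10000); // the 10s timer is always rescheduled
	}
}
```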
Wow! You don't waste time 😅 Would you like to test MKV? I did take a small look at the other PR - but your environment might be a more substantial test.
All whilst using MKV - just swap out the mp4 extension. Here (line 449 in ab9e402) and here (line 508 in ab9e402).
|
Certainly! That one is very selfishly interesting to me, ha! One adjustment I'll need to make to this issue's branch prior to that is to either revert to console.log or dig into making the debug output go to stdout instead of stderr - PM2 thought it was simply erroring out due to this, but the logs themselves seem to indicate everything is fine. Happy to say that with 3-minute chunks on 3 cameras connected to the Pi, it didn't hiccup once with the patch. Oddly, at about 6:30am my time though (CST here), the process died in an errored state and didn't restart because of it, so I'll dig into that today. Once I verify we are good to go with logging, I'll push the MKV patch too. The MKV format is interesting to me because our analysts typically want to roll back the footage within a very short window (they pull footage and monitor on every 911 call). And we've seen all kinds of things go wrong on a pole (power sag/power loss, cabling coming loose inside our box, disk space, a mounted drive becoming unmounted, etc). So, lots of opportunity for an improperly closed MP4. MKV will allow us to view partially saved footage. I've gotta go wear the suit and tie for a little while, but once I can get some time to code I'll knock those two adjustments out (stdout and MKV testing). |
Ok - you have twisted my arm 😅 I have created a branch (2.1.0) - I will use it to merge any enhancements being worked on, so address any PRs to this branch for the time being. I think we can use the video file(s) themselves to provide the timeline content, and do away with any DB-based index of segments. I'll wait to hear back from the MKV tests before I venture down that road, but if the MKV files prove to work in the browser, this should be a smooth change - even event metadata could be made extremely simple: a flat JSON file per event (that contains the linked segment file name) 😎 Quick example (bf9d2a78-7425-4249-a559-39bc9ff85f6c-1660052517.json):

```json
{
  "event": "Building Car Park Gates Opened",
  "cameraId": "52e5b562-1a1c-4ae8-88a9-c92dc94b497b",
  "linkedSegment": "yyyy-mm-dd-hh:mm.mkv",
  "sensorId": "bf9d2a78-7425-4249-a559-39bc9ff85f6c",
  "timestamp": 1660052517
}
```
|
Awesome! I love the idea of a flat-file metadata setup. One thing I will need is to checksum the video upon creation, for example, so as to provide proof that the file was not tampered with (if used in court in the U.S., chain of custody is a big deal when you can't prove authenticity of a file; with a checksum taken at creation, none of that matters much). So that change could be interesting. Perhaps a video file plus a metadata/events file alongside each file on disk? i.e., any events coming in would get appended to a file named the same as the video, but with a .json extension. That way, handing over all details related to a clip is just two file downloads. Re: 2.1.0 branch - will do! |
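A minimal sketch of the checksum-on-creation idea using Node's built-in crypto. The sidecar layout (meta.segment.checksum) follows the format proposed in the next comment; everything else here is an assumption:

```js
const crypto = require('crypto');
const fs = require('fs');

// Stream the finished segment through SHA-256 and record the digest in its
// sidecar JSON, so later tampering can be detected.
function checksumSegment(videoPath, metaPath) {
	return new Promise((resolve, reject) => {
		const hash = crypto.createHash('sha256');
		fs.createReadStream(videoPath)
			.on('data', (chunk) => hash.update(chunk))
			.on('error', reject)
			.on('end', () => {
				const digest = 'sha256:' + hash.digest('hex');
				const meta = JSON.parse(fs.readFileSync(metaPath, 'utf8'));
				meta.segment.checksum = digest;
				fs.writeFileSync(metaPath, JSON.stringify(meta, null, 2));
				resolve(digest);
			});
	});
}
```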
That's the ticket! A metadata file per segment. 52e5b562-1a1c-4ae8-88a9-c92dc94b497b.1660054102.json:

```json
{
  "segment": {
    "cameraId": "52e5b562-1a1c-4ae8-88a9-c92dc94b497b",
    "fileName": "52e5b562-1a1c-4ae8-88a9-c92dc94b497b.1660054102.mkv",
    "startTime": 1660054102,
    "endTime": 1660098736,
    "checksum": "sha256:xxxxxxxxxxxxxx"
  },
  "events": [
    {
      "event": "Building Car Park Gates Opened",
      "sensorId": "bf9d2a78-7425-4249-a559-39bc9ff85f6c",
      "timestamp": 1660052517
    },
    {
      "event": "Building Car Park Gates Opened",
      "sensorId": "bf9d2a78-7425-4249-a559-39bc9ff85f6c",
      "timestamp": 1660052517
    }
  ]
}
```

These files then drive the timeline UI. |
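To make the "drive the timeline from the files" idea concrete, a hedged sketch of reading every sidecar in a camera folder and turning it into timeline items (field names follow the example above; the directory layout and function names are illustrative):

```js
const fs = require('fs');
const path = require('path');

// Build timeline entries for one camera purely from the .json sidecars on disk.
function loadTimeline(cameraDir, fromTs, toTs) {
	return fs.readdirSync(cameraDir)
		.filter((f) => f.endsWith('.json'))
		.map((f) => JSON.parse(fs.readFileSync(path.join(cameraDir, f), 'utf8')))
		.filter((m) => m.segment.endTime >= fromTs && m.segment.startTime <= toTs)
		.map((m) => ({
			file: m.segment.fileName,
			start: m.segment.startTime,
			end: m.segment.endTime,
			events: m.events
		}));
}
```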
PERFECT (for my use cases at least)! I am happy to report that everything has been working perfectly with my branch's patch that re-enqueues the writes to the SQL DB - for over 12 hours now. I'll let it keep running for another 12 before I update that box to use MKVs. I'm going to deploy this to another system now after making the MKV adjustment. Question about my current branch, though - I don't submit pull requests often (ever), so I'm not all that versed in how best to handle this. I have a pending branch that re-queues the SQL DB write, and I actually need this in production atm, so I'm curious if you would:
Thanks for the guidance! |
Hey @KeithHanson, believe it or not - I have made massive progress in removing the need for SQLite altogether (I also don't waste time 😄). This is what will be coming (I have written the logic for it)...
File name examples:
These files will be used to drive the UI. This should remove a truckload of surface area for problems to occur.
Once I am happy, you can migrate to the latest version. I have not touched this project in a few weeks, so thanks for getting me excited about it again 😄 Importantly, there is no difference in the UI - it's all backend changes - but the UI should benefit by being able to view recorded footage that is still being written to (well, on the basis that MKV works 😅). |
Absolutely! And I'm excited you're excited :) Agreed about the PR! Sounds like my changes won't be worth much since this will now be flat-file based. Patching in and deploying the MKV now - will let you know after a couple hours of footage :) |
By all means, once the new (better) enhancements are in place, feel free to rebrand it - it's OSS after all 😉 Renamed to [Enhancement]: Review Storage mechanics PLUS! 😅 |
I do think making that configurable would be a great update :) But I also don't mind proudly using open-source and the branding that comes with :) So! I just patched in MKV files, deployed them to another system, and watched two segments hit the timeline.
A quick Google seems to indicate that MKV support is just not in Firefox and, from what I can see, not coming :/ Thoughts? This isn't a dealbreaker for us here since we use Chrome anyway. |
That makes sense, since live streaming uses the MP4 container (well, fMP4 - with trickery using websockets). See, I split the one input (camera) into 2 outputs [Live, File], using one FFMPEG instance per camera. IMO, the benefit to be had with MKV outweighs the use of Firefox, so at this point I can look past that 😅 I could test fragmented MKV - but I'd likely be pushing my luck 😄 The live stream is passed via websockets to a custom media file handler - the browser thinks it's a single file (but really it's live); it uses the… |
My side of the world will be sleeping soon. |
Ok. I can say that it works reliably on both test machines. BUT... for whatever reason, loading MP4 segments is significantly faster than loading MKV segments. I have both in my timeline and recorded a test for you to see the dramatic difference. Basically, MKV took about 4-5 seconds to load PER SEGMENT (with 3-minute segments - very small compared to regular operation with 15-minute segments), and MP4 took about half a second, reliably. I have tested it during multiple periods today, and the behavior has been the same: MKV is significantly slower to load than MP4. I have no idea why, though :/ |
Ok, I have identified why MKV is slower. As part of the recording chain, I move each file's internal metadata to the start of the file (it's usually at the end by default). This allows quicker loading of the file in the browser (it only needs to load a small portion to get to the metadata and can start playing while the rest is still downloading). And guess what - MKV does not allow this, so even though I add the flag to the ffmpeg chain, it has no effect on MKV files. That means the browser has to grab the entire file (to reach the meta info at the end) before it can start playing. It also turns out that MKV is really only supported by Chrome. I'm going to opt out of using MKV, but if one wanted to, there is a value you can change in code. I'm almost done with using flat files - so far it's working great! |
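The flag in question is presumably MP4's faststart option, which relocates the index ("moov" atom) to the front of the file; Matroska has no equivalent, so the same flag is a no-op for .mkv output. An illustrative invocation (not the project's actual ffmpeg chain; the stream URL and file names are placeholders):

```js
const { spawn } = require('child_process');

// MP4: +faststart moves the moov atom to the front when the file is finalised,
// so the browser can start playback while the rest is still downloading.
const mp4Args = ['-i', 'rtsp://camera/stream', '-c', 'copy',
                 '-movflags', '+faststart', 'segment.mp4'];

// MKV: there is no faststart equivalent, so the flag is ignored and the
// browser effectively has to fetch the whole file before playback begins.
const mkvArgs = ['-i', 'rtsp://camera/stream', '-c', 'copy', 'segment.mkv'];

spawn('ffmpeg', mp4Args, { stdio: 'ignore' });
```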
Ahhh, that makes sense! I think I am with you - snappier is better for realtime/in-the-moment lookups, and I set the segments small enough that I don't think we need to worry about that. That is looking really good! I'm excited to begin testing it 😁 One thing I've noticed on our machines: PM2 registers NVRJS.js as errored after some time. When I check pm2 logs NVRJS, I don't see anything that would indicate it errored. I'll try to dig in when I can today, but if you have some insight there as well, we'd appreciate it 😁 I can also open another issue to track it if you'd like. But that is really the only problem I'm bumping into with my requeue patch in place while waiting for the flat-file updates 😁 |
Mmmm - I'm not sure how PM2 identifies something as errored; it might be picking up some FFMPEG debug output and freaking out, maybe. I don't use PM2 myself these days (well... I did when I created this, hence the example start-up 😅). One thing to note about the upcoming version (3.0.0): it won't show previously recorded segments in the timeline, unless you manually create the JSON files for each segment 😬 I will also push it to the branch I created (2.1.0 - which I'll rename to 3.0.0 once uploaded). 3.0.0 also allows you to add events to the live view. |
Overhaul complete. It's not published to NPM yet, but you should be able to pull down the branch and use it. |
Going to test this immediately :) THANK YOU! :) I will keep digging into the PM2 issue. I love it due to PM2.io's interface, custom actions, metrics, etc. It's pretty nice when managing our 20+ camera systems that will eventually be 100+. One question:
Our storage is pretty much temp storage anyway - the important parts are the live monitoring and clip retrieval in the moment. So not having previous footage show up in the timeline isn't a huge dealbreaker, though I may just blow it away and start fresh. Not a huge deal, but it could become important in the future to have some sort of "catch up" script. The most important thing is that the video is on disk and files get deleted over time when space or retention dictates. |
Ah, that makes a lot of sense why that's not as straightforward as I was hoping. Yes - that would be fine - as long as we're able to recover automatically we are happy :) |
Just an update - I am moving the Meta Creation Logic over to listen for FFMPEG directly after all. |
Roger that! Thanks for the update! |
@marcus-j-davies Any updates? :) :) :) The current metadata bug is affecting us - I'm solving it with a cron-based restart of the service for now. If you think it will be more than a few more days (no judgement! promise! just planning next steps), I'll go ahead and knock out the backfill task for us, run it periodically, and reboot the service every hour along with it - I don't think we'll see any further metadata writing issues then. If I can help any further than that, please let me know! I don't want to step on your toes, though, as I'm certain you'll produce a better outcome than I could :P Again, many thanks for such a great tool and all the collaboration with us! |
Hi @KeithHanson , Only a couple of lines left really. Will ping you shortly |
The patch is ready to test. To disable login:

```js
module.exports = {
	/* System Settings */
	system: {
		/* Disable Security - Know what you're doing before changing this! */
		disableUISecurity: true,
		.....
	}
}
```

Changes:
- {timestamp}_placeholder.json
- {timestamp}.json

Metafile creation is now tied to FFMPEG activity and is no longer 'watching' for new files, which I still believe had problems recovering after IO errors. |
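A heavily hedged sketch of the placeholder flow being described - the real trigger and fields live in NVRJS.js. The assumption here is that a `{timestamp}_placeholder.json` is written when FFMPEG starts a segment and replaced by `{timestamp}.json` once FFMPEG reports the segment finished:

```js
const fs = require('fs');
const path = require('path');

// Written as soon as FFMPEG starts a new segment (assumed trigger).
function onSegmentStart(dir, ts, cameraId) {
	const placeholder = path.join(dir, `${ts}_placeholder.json`);
	fs.writeFileSync(placeholder, JSON.stringify({ cameraId, startTime: ts, events: [] }));
	return placeholder;
}

// Finalised when FFMPEG signals the segment is complete (assumed trigger).
function onSegmentEnd(placeholder, endTime) {
	const meta = JSON.parse(fs.readFileSync(placeholder, 'utf8'));
	meta.endTime = endTime;
	fs.writeFileSync(placeholder.replace('_placeholder.json', '.json'),
		JSON.stringify(meta, null, 2));
	fs.unlinkSync(placeholder);
}
```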
THANK YOOOOOU!!!! Will deploy tonight! |
Got excited. Deployed to one of our systems for testing and so far things are working :P I've got about 30+ systems I'll deploy to tonight to verify. |
Ok - I've got this running on 10 of our live deployments as of about 20 minutes ago :) Going to let this run for 24 hours or so and report back :) |
Oooh, as a bonus - I moved the files from one folder to the renamed camera folders and bam - everything was picked up. I did have to restart the service for it to show in the timeline, but suuuuper convenient! |
Ok, I have deployed to all systems we have under control - ~30. Something else I thought of that is a win because of this: the way we want to back up our data. I'm planning on having a second NVRJS running on 81, but it will pull the low-quality streams to disk. We're designing an algorithm that will pick the most appropriate "buddy" to back up to and from, and kick off some kind of copy process. Because of these changes, we can easily obtain at least low-quality footage by simply using the interface and updating the config to display the backed-up folder <3 This lets us focus on the hard/critical part for us (choosing the right pole to back up to, weighing a variety of factors) while getting a UI for free! :) :) :) |
You are excited! 😆 There are APIs built in to fetch segments (metadata), current system utilisation, and camera details, if you wanted to create a dashboard to monitor all instances (not footage, more health). I have not spent a great deal of time documenting them, but they do exist. Also, if I am reading your comments correctly - NVRJS has not been tested running 80+ cameras (i.e. pole 81) - just one to bear in mind. Let me know if the recent patch fixes the missing meta files (I'm hoping it has). |
30 total systems - We'll get to 80 before end of year :) Excellent - we will definitely tap into that - we have a heartbeat service that checks all kinds of things relevant to us (disks, encryption status, temps, software versions, etc). |
We did hit an issue on two systems. If the metadata file is corrupted for any reason, parsing it fails and NVRJS goes into a boot loop.
I am fairly sure we just need a try-catch here: https://github.com/marcus-j-davies/nvr-js/blob/v3.0.0/NVRJS.js#L494 I patched in a try/catch and things seemed to work:
|
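Roughly what the local patch amounts to (illustrative; the real location is the NVRJS.js line linked above, and the function name here is hypothetical): tolerate a sidecar that fails to parse instead of letting the exception boot-loop the process.

```js
const fs = require('fs');

function readSegmentMeta(metaFilePath) {
	try {
		return JSON.parse(fs.readFileSync(metaFilePath, 'utf8'));
	} catch (err) {
		console.error('Skipping corrupt metadata file:', metaFilePath, err.message);
		return null; // caller treats this segment as having no metadata
	}
}
```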
Yeah - a try/catch should be added in a couple of places, really. What did 1660650120.json & 1660650121.json look like? I just want to make sure I am doing nothing silly, and that it's more IO/disk errors when writing files. |
Almost undoubtedly disk error - we have to deal with it on our crappy power grid and recover automatically (sad, I know - drink one for me in commiseration :P). Both files were empty. |
If you want I can add in this try catch for now and submit a pull request. I've patched the two systems that failed manually for now. If you're already off to the races on the patch I can hang back ofc :) |
Cool - well not, but you know 😅 I'll add a try catch to any reading of a file (and writing for good measure). EDIT:
Already on it |
Awesome :) And that is fine - I'm not sure if there is any magical code that could solve that lol. But that's also a reason I reduce things to 3 minute chunks (faster streaming, faster downloading, less risk of missing important things because of gremlins/dragons like this). |
Just did a count of all the systems. Deployed the most recent commit to 41 Raspberry Pis with 2TB and 4TB drives attached :) I tested it on the previously failing system, and everything went smoothly. Thank you! :) I'll report in if I find anything. |
Rock solid :D |
Nice! This is with the recent patch to stop reading corrupted JSON files? The changes made here. |
Correct! I deployed it after I saw the update. I see some gaps here and there, but that's from whatever failures happened. But the only problem I've had is HDD space filling up at this point :D Just means I need to tune the retention. I've spot checked about 10 of the 41 systems and those that didn't fill up their drive have a full 2 days of history! Brilliant! :) EDIT: they all have little gaps here and there (3-6 minutes), but it's obvious our code is recovering from the issue, and your code is handling the recovery gracefully :) |
Assuming all is well? |
I think this is a great project, and it gives me a good idea for making an NVR system. Thank you very much.
I think when I scroll the mouse wheel, the… |
Hi @trc-turing, Version 3 has removed SQLite entirely; it's now based on a JSON file per segment, which seems to have removed a lot of problems. v3 is currently being used in a very large installation and seems to be quite stable. See the v3 change log. As for nvr-js/web/static/js/scripts.js (line 78 in 5639de6):
In other words, the scroll/zoom must be static for 500ms - try messing with this value. |
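In effect, the timeline fetch is debounced. A hedged sketch of that behaviour (the real value sits at scripts.js line 78; fetchSegments here is a hypothetical stand-in for whatever scripts.js actually calls):

```js
// Hypothetical stand-in for the real segment-loading call in scripts.js.
function fetchSegments(start, end) {
	console.log('loading segments between', start, 'and', end);
}

let debounceTimer = null;

// Only query for segments once the scroll/zoom has been idle for 500 ms.
function onTimelineRangeChanged(start, end) {
	clearTimeout(debounceTimer);
	debounceTimer = setTimeout(() => fetchSegments(start, end), 500);
}
```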
Thanks for your reply. |
Yup, the timeline needs to load data based on the time span in view ± 2 hours. As the NVR can be running for weeks/months/years at a time, I don't load everything, as the browser could be overloaded with data - can you imagine 1-minute segments over a week? That's 10,080 segments in the timeline! But then zooming out would do the same 😅 I therefore need to cap it based on the current view: I load whatever timespan is in view (plus 2 hours). You can override the 2-hour buffer by changing SearchTimeBufferHours in the scripts file. At the moment, I don't have a method (or time) to stop the unnecessary loading. I can probably improve it, but I need to find the time to do so, and it's currently not a priority of mine. I welcome PRs if you want to contribute 😇 |
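A small sketch of that windowing rule - SearchTimeBufferHours is the real knob mentioned above; the rest is illustrative:

```js
const SearchTimeBufferHours = 2;

// Only request segments for the visible span, padded by the buffer on each side.
function queryWindow(visibleStartMs, visibleEndMs) {
	const buffer = SearchTimeBufferHours * 60 * 60 * 1000;
	return { from: visibleStartMs - buffer, to: visibleEndMs + buffer };
}
```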
I'll wait for it to be published to npm.
Hello!
Firstly, I would like to commend you on the work you've done here. It's precisely what I need for a very important project for my city as well.
I am creating a fork of the repository now to begin diving into the codebase and attempt to determine what may be happening on our end, so please know I plan on rolling up my sleeves and helping contribute to a solution, NOT just log the bug :P
Anyhow, please see below:
ISSUE: Many hours after starting the PM2 service for NVRJS, the timeline no longer shows new video segments.
Context:
We are looking to use NVRJS for the camera systems we've built utilizing Raspberry Pis, PoE cams + a switch, and USB hard drives.
We are so very close thanks to your work here. But after testing for roughly a week, we see timeline issues.
We DO see that it is properly restarting the ffmpeg processes and that the files are being written to disk.
So everything seems to be working (ffmpeg, UI), except that for some reason the segments stored to disk don't appear in the timeline.
Also, NVRJS runs rock solid (I haven't seen it rebooting over and over or anything, for days at a time).
Once I DO end up restarting NVRJS, the timeline begins working normally, though it is missing files that are definitely on disk.
I'll log here if I make progress on it!
Thank you for any insight you can help with though :)