Hi,
for the past few months I've been trying to find a stable VDR+DXR3 setup. So far the most stable set of software has been the vdr-dxr3-0-2 CVS-branch, big thanks to the developers for that. However, there is still one major problem: every now and then, quite randomly, the picture freezes and the whole system jams. SSH connection doesn't work, but the puter seems to answer to ping. I'm not alone with this problem, I know of at least two other people having the same problem. Those people have quite different hardware, so it's most likely not a hardware issue. VDR 1.3.12 and the pre1 DXR3-drivers don't seem to have this problem, but have lots of other issues. If you have any suggestions, need more details, anything, this is a desperate man writing, please reply.
On Tuesday 19 April 2005 19:00, Sami Hakkarainen wrote:
issues. If you have any suggestions, need more details, anything, this is a desperate man writing, please reply.
If the whole system hangs that means there's a problem with some driver in the kernel or the hardware, not vdr. But then you say it still responds to ping, so there's no telling whether it's just some runaway program spawning processes as fast as it can, making the system too busy to respond before the timeout, or whether some part of the kernel actually decided to stop working. (It's perhaps a good idea to set the sshd process to -20 priority to make sure it can always do what it needs to do.)
"It just hangs" is not very much to go on but I guess the basic things to check are still worth mentioning, again.
First disable the ACPI and APIC (and MSI if you were crazy enough to compile it in) irq nonsense (i.e. boot with pci=noacpi noapic). Most (all?) distributions enable these two troublemakers by default. I honestly have no idea why. Then (after you have disabled the acpi pci irq routing thing, very important) check for irq conflicts in lspci. Change cards around in the pci slots until you find a non-conflicting setup. Sure, they should happily share irqs, but try telling that to my computers; they're not listening to the theory at all. Yours probably isn't either.
If you're using an AMD K7 cpu you probably want to disable athlon power saving with athcool in case your board (like both my boards (epox boards btw)) enables it by default. If you have this you've probably noticed it already. For me it causes constant dropouts in video and audio. Took a while to figure out what was going on. Sometimes the box just locks hard with this thing enabled. I didn't think it possible that some manufacturer would be stupid enough to enable this nonsense by default. Just leave it to epox, they'll manage to get it wrong.
Make sure you're using the latest em8300 from cvs. Disable NPTL if your system was compiled with it. Try different kernel versions, not just different vdr versions. Vanilla kernels are probably the best place to start testing. On my vdr box, kernel 2.6.9 with cvs lirc, cvs em8300 and cvs dvb-kernel appears to be a very stable combination. With anything newer than that, it takes about five days until the kernel decides to disable the dvb card's irq.
It would also be very interesting to see if there are any clues printed on the console or what kind of processes are running on the system when this "crash" occurs. You'll probably need a monitor and keyboard connected to the system to get this though.
Jukka Tastula wrote:
If you're using an AMD K7 cpu you probably want to disable athlon power saving with athcool in case your board (like both my boards (epox boards btw)) enables it by default.
OTOH I specifically *enabled* it with "athcool on" (which implies ACPI is on) and it never gave me a problem. And know what? I also have (apparently) local APIC irq routing enabled:
$ cat /proc/interrupts
           CPU0
  0:    9421499    IO-APIC-edge  timer
  2:          0          XT-PIC  cascade
  4:       8179    IO-APIC-edge  lirc_serial
  7:         93    IO-APIC-edge  parport0
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 15:        369    IO-APIC-edge  ide1
 18:     783785   IO-APIC-level  em8300
 19:    1919263   IO-APIC-level  eth0, Skystar2
 20:          0   IO-APIC-level  ohci_hcd
 21:          0   IO-APIC-level  ohci_hcd
NMI:          0
LOC:    9417805
ERR:          0
MIS:          0
I have the usual dxr3 problems but *never* a complete lockup (crossing fingers). Usually I can either restart vdr by blindly pressing the remote keys, or wait for the vdr timeout to kick in. Very rarely I have to login to the vdr machine and issue a "killall -KILL vdr", but even in those cases vdr happily restarts.
Bye
Sami Hakkarainen sami.hakkarainen@gmail.com writes:
However, there is still one major problem: every now and then, quite randomly, the picture freezes and the whole system jams. SSH connection doesn't work, but the puter seems to answer to ping.
My freezes come in various types. Sometimes the picture freezes, sometimes it goes black. Sometimes only vdr is in a strange state, and restarting it over ssh helps. Sometimes the whole system dies: no ssh, no ping, not even a working reset button.
Now I have an FF card and am experimenting with it at my office. Last night the whole system crashed: keyboard not working, no ping.
The last messages included one from saa7110: ARM Crashed, and reloading of the DVB modules. So it does not seem any better with the FF card.
Jukka Tastula jukka.tastula@kotinet.com writes:
If the whole system hangs that means there's a problem with some driver in the kernel or the hardware, not vdr.
Yes, most likely. However, you could get similar results from a process with root privileges messing up the whole system, though this is not the most likely reason.
But then you say it still responds to ping so there's no telling if it's just some runaway program spawning processes as fast as it can making the system too busy
It can be something else as well. The kernel responds to ping even when it is halted. Some firewalls are even run with the kernel halted. Or the system can just be in a strange 'state'.
I'm currently having the same kind of problems. I'm using VDR 1.3.23 with the newest em8300 drivers and the dxr3 plugin from cachalot.mine.nu. My symptom is a frozen picture. It also seems to be related to a bad (weak) signal from the antenna: if the signal is poor, the computer seems to halt much more often. The system crashes maybe once a day (on good days) and several times a day if it's in a bad mood.
I'll see if it writes any logs during a crash. I'll let you know.
-Jere-
Markku Tavasti tavasti@iki.fi writes:
Now I have an FF card and am experimenting with it at my office. Last night the whole system crashed: keyboard not working, no ping.
The last messages included one from saa7110: ARM Crashed, and reloading of the DVB modules. So it does not seem any better with the FF card.
Hmm, now the same messages but no total crash, just a vdr restart.
On Wed, 2005-04-20 at 08:21 +0300, Jere Malila wrote:
I'm currently having same kind of problems.
I used to have the same problems, but my system has been running stable for months now. I'm on cable, so it's probably not related to signal quality here. In my case, I believe the problem was solved by these operations (or a subset of them):
- Upgraded to kernel 2.6.10 as distributed in Fedora Core 3. 2.6.11 works well too.
- Upgraded to latest em8300 CVS HEAD.
- Moved the DXR3 into a PCI slot where it does not share an IRQ with any other device.
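To make the "does not share an IRQ" check less of an eyeball job, here is a minimal Python sketch that lists the IRQ lines with more than one device on them. It assumes the 2.6-era /proc/interrupts layout quoted earlier in the thread (one count column per CPU, one interrupt-controller column, then a comma-separated device list); newer kernels format this file differently. The function name shared_irqs and the embedded SAMPLE text are only for illustration.

```python
# Sample taken from the /proc/interrupts listing quoted earlier in the thread.
SAMPLE = """\
           CPU0
  0:    9421499    IO-APIC-edge  timer
  2:          0          XT-PIC  cascade
 18:     783785   IO-APIC-level  em8300
 19:    1919263   IO-APIC-level  eth0, Skystar2
 20:          0   IO-APIC-level  ohci_hcd
NMI:          0
LOC:    9417805
"""


def shared_irqs(interrupts_text):
    """Return {irq_number: [devices]} for IRQ lines listing more than one device.

    Assumption: each numbered row is "<irq>: <count per CPU>... <controller> <devices>".
    """
    shared = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip the CPU header and the NMI/LOC/ERR/MIS summary rows
        fields = rest.split()
        # Skip the per-CPU interrupt counters at the start of the row.
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        # fields[i] is the controller type (e.g. IO-APIC-level); the rest are devices.
        device_part = " ".join(fields[i + 1:])
        devices = [d.strip() for d in device_part.split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in sorted(shared_irqs(SAMPLE).items()):
        print(f"IRQ {irq} is shared by: {', '.join(devices)}")
```

On a live box you would pass the contents of /proc/interrupts instead of SAMPLE. In the listing above the em8300 has IRQ 18 to itself, while eth0 and the Skystar2 share IRQ 19.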
On Wednesday 20 April 2005 07:21, Markku Tavasti wrote:
My freeze can be various different type. Sometimes picture freezes, sometimes goes black. Sometimes only vdr is in strange state, and restarting it with ssh helps. Sometimes whole system dies, no ssh, no ping, not even working reset button.
I actually planned way ahead with vdr hanging. I stopped trying to make vdr so stable it never needs maintenance and included a key on my remote to run killall -KILL vdr with irexec :)
I have to use it maybe two or three times a week.
The hang appears to happen pretty much the same way every time. When no one has been changing channels for a while (many many hours, never had it happen while I was actually watching something) cpu load drops to 0 and it just starts throwing up stuff like this in the logs
vdr[24832]: buffer usage: 70% (tid=6029322)
vdr[24832]: buffer usage: 80% (tid=6029322)
vdr[24832]: buffer usage: 90% (tid=6029322)
vdr[24832]: buffer usage: 100% (tid=6029322)
vdr[24832]: ERROR: 1 ring buffer overflow (177 bytes dropped)
vdr[24832]: ERROR: 14383 ring buffer overflows (2704004 bytes dropped)
vdr[24832]: ERROR: 14766 ring buffer overflows (2776008 bytes dropped)
vdr[24832]: ERROR: 14682 ring buffer overflows (2760216 bytes dropped)
It'll keep doing this forever. Sometimes it hangs so bad you have to kill it and sometimes simply switching channels makes it go again.
This is really the only type of crash/hang I get. The rest of the system is always perfectly fine no matter what vdr does.
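Since this hang leaves such a recognizable trail in the logs, the remote-key kill could in principle be automated. The following is only a rough Python sketch of the detection half: it scans log lines for the "ring buffer overflow" errors shown above and flags a vdr process once it has reported several of them. The threshold min_reports and the function names are made up for illustration, and the actual restart (for example the killall -KILL vdr mentioned above) is deliberately left out.

```python
import re

# Matches lines such as:
#   vdr[24832]: ERROR: 14383 ring buffer overflows (2704004 bytes dropped)
OVERFLOW_RE = re.compile(
    r"vdr\[(\d+)\]: ERROR: (\d+) ring buffer overflows? \((\d+) bytes dropped\)"
)

SAMPLE_LOG = [
    "vdr[24832]: buffer usage: 70% (tid=6029322)",
    "vdr[24832]: buffer usage: 80% (tid=6029322)",
    "vdr[24832]: buffer usage: 90% (tid=6029322)",
    "vdr[24832]: buffer usage: 100% (tid=6029322)",
    "vdr[24832]: ERROR: 1 ring buffer overflow (177 bytes dropped)",
    "vdr[24832]: ERROR: 14383 ring buffer overflows (2704004 bytes dropped)",
    "vdr[24832]: ERROR: 14766 ring buffer overflows (2776008 bytes dropped)",
    "vdr[24832]: ERROR: 14682 ring buffer overflows (2760216 bytes dropped)",
]


def overflow_reports(log_lines):
    """Yield (pid, overflow_count, bytes_dropped) for each overflow error line."""
    for line in log_lines:
        match = OVERFLOW_RE.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2)), int(match.group(3))


def looks_stuck(log_lines, min_reports=3):
    """Heuristic: True if one vdr pid logged at least min_reports overflow errors.

    min_reports is an arbitrary illustrative threshold, not a tested value.
    """
    per_pid = {}
    for pid, _count, _dropped in overflow_reports(log_lines):
        per_pid[pid] = per_pid.get(pid, 0) + 1
    return any(n >= min_reports for n in per_pid.values())


if __name__ == "__main__":
    print("vdr looks stuck:", looks_stuck(SAMPLE_LOG))
```

A single overflow report is treated as harmless here; only a run of them from the same process counts, which matches the "it'll keep doing this forever" pattern described above.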
What seems to be the weirdest thing about this is that VDR 1.3.12 + DXR3 0.23-pre1 doesn't have this jamming problem. It does crash a lot, especially when viewing recordings, but it never takes the whole system down like the newer versions do. What is done differently in the newer versions that could cause this kind of behaviour? I think I'll soon be desperate enough to go through the old CVS versions one by one to find out when the changes that cause this have been made. It can't be a hardware issue if the older versions work fine, can it?
vdr mailing list vdr@linuxtv.org http://www.linuxtv.org/cgi-bin/mailman/listinfo/vdr
On Thursday 21 April 2005 15:54, Sami Hakkarainen wrote:
by one to find out when the changes that cause this have been made. It can't be a hardware issue if the older versions work fine, can it?
That's what too many people who overclock their cpus and memory to the limit and then try to compile something say. "But my windows runs just fine, never had a crash!".
Hardware problems can be elusive but it doesn't mean they don't exist. I guess the older version of the program just doesn't poke the hardware where it's broken -> no crash/hang/jam. That is if it is broken to begin with, we're still not sure about that.
Quoting Jukka Tastula jukka.tastula@kotinet.com:
On Thursday 21 April 2005 15:54, Sami Hakkarainen wrote:
by one to find out when the changes that cause this have been made. It can't be a hardware issue if the older versions work fine, can it?
That's what too many people who overclock their cpus and memory to the limit and then try compile something say. "But my windows runs just fine, never had a crash!".
Hardware problems can be elusive but it doesn't mean they don't exist. I guess the older version of the program just doesn't poke the hardware where it's broken -> no crash/hang/jam. That is if it is broken to begin with, we're still not sure about that.
cpuburn is your friend for testing the stability of a new or changed system. It finds mmx/memory-transfer errors on broken mainboards and cpus. It's designed to produce maximum heat and load on each cpu and finds errors by comparing calculated and expected results. Example: I found that my old mainboard (labeled FSB133) did not work reliably at 133 MHz FSB although the right memory was used (tested with 3 different RAM chips); at 100 MHz it was (almost) stable.
Typical user software might not be suitable for that job because:
1. It doesn't stress the system enough (look at the cpu temperature when cpuburn runs: it jumps by almost 10°C).
2. Most of the time the errors go undetected, because the software just keeps running and produces wrong results, video glitches etc., but usually does not crash.
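The "compare calculated and expected results" idea can be shown with a toy Python sketch. This is not cpuburn and will not produce anything like its heat or load; it only illustrates why a self-checking workload notices silent errors that ordinary software would just turn into glitches. All names and parameters here are made up, and note the obvious limitation that the reference value is computed on the same machine under test.

```python
import hashlib


def checksum_round(seed, iterations):
    """Run a deterministic chain of SHA-256 hashes and return the final hex digest."""
    digest = seed
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def count_mismatches(rounds=5, iterations=20000, seed=b"stability-check"):
    """Repeat the same calculation and count rounds that differ from the first result.

    On healthy hardware every round must give the identical digest, so any
    mismatch points at a computation or memory error rather than a software bug.
    """
    expected = checksum_round(seed, iterations)
    mismatches = 0
    for _ in range(rounds):
        if checksum_round(seed, iterations) != expected:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatching rounds:", count_mismatches())
```

A real burn-in tool works on the same principle but is written to keep the cpu and memory bus saturated, which a short interpreted loop like this does not do.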
Sami Hakkarainen wrote:
However, there is still one major problem: every now and then, quite randomly, the picture freezes and the whole system jams. SSH connection doesn't work, but the puter seems to answer to ping. I'm
Could you try the latest dxr3 driver from:
http://dxr3.sourceforge.net/download/em8300-0.15.0.rc3.tar.gz
This has a tweak to part of the driver which handles the wait queues and i've not had any hard lockups on my machine since making this change.
Jon
Unfortunately this doesn't seem to help. Thanks for trying though.