
Connection reset by peer while writing to hdfs #31

@StrongestNumber9

Description


Describe the bug

From the cfe_39 logs:

java[17409]: java.io.IOException: Connection reset by peer
java[17409]:         at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_412]
java[17409]:         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_412]
java[17409]:         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_412]
java[17409]:         at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[?:1.8.0_412]
java[17409]:         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[?:1.8.0_412]
java[17409]:         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:141) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) ~[cfe_39.jar:0.2.0]
java[17409]:         at java.io.FilterInputStream.read(FilterInputStream.java:83) ~[?:1.8.0_412]
java[17409]:         at java.io.FilterInputStream.read(FilterInputStream.java:83) ~[?:1.8.0_412]
java[17409]:         at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:519) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1811) [cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1728) [cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:713) [cfe_39.jar:0.2.0]
java[17409]: 17:19:18.487 [Thread-7] WARN  org.apache.hadoop.hdfs.DataStreamer - Abandoning BP-1857759457-XXXXX-1708423446635:blk_1075894953_2154131
java[17409]: 17:19:18.492 [Thread-7] WARN  org.apache.hadoop.hdfs.DataStreamer - Excluding datanode DatanodeInfoWithStorage[XXXXXX:9004,DS-dd4c87a8-8cd5-4c39-a777-b5e459e20f23,DISK]
java[17409]: 17:19:18.501 [Thread-7] WARN  org.apache.hadoop.hdfs.DataStreamer - Exception in createBlockOutputStream blk_1075894954_2154132

Also, from the same logs:

java[17409]: org.apache.hadoop.ipc.RemoteException: File /tmp/test/example/20.10369256 could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
java[17409]:         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
java[17409]:         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
java[17409]:         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
java[17409]:         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
java[17409]:         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
java[17409]:         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
java[17409]:         at java.security.AccessController.doPrivileged(Native Method)
java[17409]:         at javax.security.auth.Subject.doAs(Subject.java:422)
java[17409]:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
java[17409]:         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
java[17409]:         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.ipc.Client.call(Client.java:1513) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.ipc.Client.call(Client.java:1410) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) ~[cfe_39.jar:0.2.0]
java[17409]:         at com.sun.proxy.$Proxy26.addBlock(Unknown Source) ~[?:?]
java[17409]:         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531) ~[cfe_39.jar:0.2.0]
java[17409]:         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_412]
java[17409]:         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_412]
java[17409]:         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_412]
java[17409]:         at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_412]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) ~[cfe_39.jar:0.2.0]
java[17409]:         at com.sun.proxy.$Proxy27.addBlock(Unknown Source) ~[?:?]
java[17409]:         at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1915) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1717) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:713) [cfe_39.jar:0.2.0]
java[17409]: Exception in thread "example1" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/test/example/20.10369256 could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
java[17409]:         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
java[17409]:         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
java[17409]:         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
java[17409]:         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
java[17409]:         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
java[17409]:         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
java[17409]:         at java.security.AccessController.doPrivileged(Native Method)
java[17409]:         at javax.security.auth.Subject.doAs(Subject.java:422)
java[17409]:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
java[17409]:         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.HDFSWrite.commit(HDFSWrite.java:182)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.DatabaseOutput.accept(DatabaseOutput.java:333)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.DatabaseOutput.accept(DatabaseOutput.java:71)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.KafkaReader.read(KafkaReader.java:95)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.ReadCoordinator.run(ReadCoordinator.java:133)
java[17409]:         at java.lang.Thread.run(Thread.java:750)

From the datanode logs:

hdfs[2572896]: 2024-07-08 17:19:18,482 INFO datanode.DataNode: Failed to read expected SASL data transfer protection handshake from client at /XXXXX:46818. Perhaps the client is running an older version of Hadoop which does not support SASL data transfer protection
hdfs[2572896]: org.apache.hadoop.hdfs.protocol.datatransfer.sasl.InvalidMagicNumberException: Received 1c508e instead of deadbeef from client.
hdfs[2572896]:         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:374)
hdfs[2572896]:         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:308)
hdfs[2572896]:         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:135)
hdfs[2572896]:         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:236)
hdfs[2572896]:         at java.lang.Thread.run(Thread.java:750)
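The InvalidMagicNumberException above indicates the datanode expected a SASL data transfer protection handshake that the client never performed. A likely cause (an assumption, not confirmed in this issue) is that dfs.data.transfer.protection is set in the cluster's hdfs-site.xml but absent from the configuration the cfe_39 client loads, so the client falls back to an unprotected data transfer that the datanode rejects. A minimal sketch of the client-side property (the value shown is a placeholder; it must match whatever the cluster enforces):

```xml
<!-- hdfs-site.xml as seen by the client; hypothetical value, must match the cluster -->
<property>
  <name>dfs.data.transfer.protection</name>
  <!-- one of: authentication, integrity, privacy -->
  <value>privacy</value>
</property>
```

With mismatched or missing protection settings, each datanode in the pipeline fails the handshake and gets excluded, which would also explain the "3 node(s) are excluded" RemoteException above.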

Expected behavior

Writes to HDFS complete successfully.

How to reproduce

QA environment with kerberized HDFS.

Software version

0.2.0 beta

Labels

bug (Something isn't working)
