使用 CodeQL 分析 Dubbo RCE

使用 CodeQL 分析 Dubbo RCE
2022-8-16 09:11:15 Author: 编码安全研究(查看原文) 阅读量:19 收藏

参考文章地址：

https://securitylab.github.com/research/apache-dubbo/

我只是学习一下pwntester的博客，写下自己的学习笔记，以加深自己对CodeQL的使用与理解，同时学一下pwntester的挖掘思路；

本次分析的Dubbo版本为：Dubbo-2.7.8

作者博客中延伸的一些其他资料：

作者之前的一些Dubbo RCE分析：https://securitylab.github.com/advisories/GHSL-2021-034_043-apache-dubbo/
Dubbo 2.7支持的协议：https://dubbo.apache.org/en/docs/v2.7/user/references/protocol/
CVE-2021-25641分析文章：https://checkmarx.com/blog/the-0xdabb-of-doom-cve-2021-25641/

这里借用LGTM进行数据库生成（因为项目太大了，构建数据库需要很久），具体做法为：

克隆Dubbo到本地（主要是为了本地代码分析）；
切换版本到Dubbo v2.7.8，然后复制到另一个文件夹，push到自己的github；
从LGTM导入刚才创建的github仓库，使用LGTM进行数据库生成；
从LGTM下载数据库到本地进行分析；

首先使用git查看dubbo-2.7.8的commit id：

然后checkout到这个commit id对应的版本：

首先需要保证本地能正常使用Maven进行编译（本地能编译通过了，那LGTM肯定也可以编译通过）。然后复制整个项目文件夹，使用git上传到github，然后LGTM导入这个仓库，会自动进行数据库生成：

数据库生成成功以后，直接下载数据库文件即可进行本地分析（70多m）：

个人认为CodeQL虽然很灵活，但是也只能作为辅助工具，要想靠它挖洞，还是得熟悉被挖的产品。只有熟悉了产品后，才能准确构建基于CodeQL的漏洞模型；所以pwntester第一步也是熟悉产品；Dubbo 2.7的说明文档为：

https://dubbo.apache.org/en/docs/v2.7/user/preface/architecture/

首先熟悉Dubbo的基本架构：

主要是如下几个节点角色：

Registry：注册中心，负责服务发现和配置；
Consumer：消费者，用于调用远程服务；
Provider：提供者，负责提供（暴露）远程服务；
Monitor：监视器，负责统计服务调用次数和耗时；
Container：容器，负责管理服务的生命周期；

各个节点的调用关系如下：

0：Container负责启动、加载和运行Provider；
1：Provider在启动时将其服务注册到Register；
2：Consumer在启动时从Register订阅它需要的服务；
3：Register将Providers列表返回给Consumer，当它发生变化时，Register会通过长连接将变化后的数据推送给Consumer；
4：Consumer根据软负载均衡算法选择其中一个Provider并执行调用，如果失败则选择另一个Provider；
5：Consumer 和Provider都会统计内存中的服务调用次数和耗时，每分钟将统计信息发送给 Monitor；

同时查看官方文档，发现Dubbo支持多种协议来进行服务暴露，官方文档为：https://dubbo.apache.org/en/docs/v2.7/user/references/protocol/

默认使用的是Dubbo协议，关于Dubbo协议的详细介绍为：https://dubbo.apache.org/en/docs/v2.7/user/references/protocol/dubbo/

关于provider和consumer之间的通信，一个模型图如下：

具体的各个角色的介绍如下：

Transporter：通信框架，用于将网络数据传输到Server处理器，可以看到，Transporter会对网络数据进行解码等操作，然后传递给Server处理器；Transporter可以使用mina, netty, grizzy，默认使用的是netty；
Serialization：dubbo, hessian2, java, json；
Dispatcher：all, direct, message, execution, connection；
ThreadPool：fixed, cached；

最后需要重点熟悉的就是Dubbo的调用链以及一些实现细节：

Dubbo调用链详细说明：https://dubbo.apache.org/zh/docs/v2.7/dev/design/#调用链
Dubbo实现细节说明：https://dubbo.apache.org/zh/docs/v2.7/dev/implementation/#fn:5

如下是Dubbo的调用链：

我个人对Dubbo调用链的一些简单理解如下：

（1）consumer消费服务：

首先看到consumer消费一个服务的过程：

首先 ReferenceConfig 类的 init 方法调用 Protocol 的 refer 方法生成 Invoker 实例(如上图中的红色部分)，这是服务消费的关键。接下来把 Invoker 转换为客户端需要的接口(如：HelloWorld)；

这里选择不同的 Protocal 会生成不同的 Invoker 实例；

（2）调用provider服务：

通过上面的步骤，我们拿到了一个客户端接口（其实是一个Invoker的proxy对象），通过这个proxy调用所代理的Invoker对服务端的exporter进行通信，服务端的exporter再调用其具体的Invoker，最终调用具体的服务实现类：

其实整个过程，有点类似 RMI 中的 stub 和 skeleton 的通信过程；那么类似 RMI ，consumer 端的 Invoker 和 provider 端的 Exporter 之间的通信，也就是网络层的通信，很有可能发生反序列化漏洞；

总结一下，Dubbo的consumer和provider整个通信过程中，比较重要的几个角色（按照provider端进行服务调用的先后顺序排列）：

Transporter：网络传输层，Dubbo默认使用Netty来传输数据；
Codec：编解码器，对网络数据进行编解码；
Serialization：序列化器，对网络数据进行序列化或者反序列化；

数据从Transporter到Seriallization的过程中，很有可能发生反序列化漏洞的；

“
Authentication and authorization and also every piece of code an attacker can interact with is commonly referred to as the attack surface of the application. 身份验证和授权、以及攻击者可以与之交互的每一段代码，通常被称为应用程序的攻击面。
”

作者首先是对应用程序（Dubbo）的攻击面进行一个基本的认识。如果应用程序不是很大，可以人工查看源码，但是像Dubbo这种大型软件，人工太耗时，所以此时可以采用CodeQL这种自动化工具来帮助认识应用程序的攻击面。

CodeQL中有一个自带的类型：RemoteFlowSource ，这个类型用于表示常见的外部数据源，可以通过这个类型查看Dubbo的所有外部数据源：

from DataFlow::Node source
where
    source instanceof RemoteFlowSource
    // 不处理test代码
    and not source.getLocation().getFile().getRelativePath().matches("%/src/test/%")
select
    source,
    // 因为有的source的DeclaringType可能是内部类，所以最好也查看一下source所在的java文件
    source.getLocation().getFile().getRelativePath().regexpCapture("(.*?/)+(\\S+\\.java)", 2) as javaFile,
    source.getEnclosingCallable().getDeclaringType() as sourceDeclaringType,
    source.(RemoteFlowSource).getSourceType() as sourceType

扫描结果如下：

通过这个扫描结果，可以知道，Dubbo的外部交互点（攻击面）主要是如下几个点：

reverse DNS lookup（第1、2行）；
HTTP client responses（第3行）；
基于Hessian、HTTP、XmlRpc协议的HTTP Servelet Request；

此时问题来了，这几个攻击面，都和Dubbo协议不相关，前面我们通过阅读官方文档知道Provider可以通过Dubbo协议暴露服务，但是这里却没有扫描到Dubbo协议相关的RemoteFlowSource，很显然，扫描结果是不完整的；

最主要的原因是RemoteFlowSource这个模型并不包括Dubbo协议的网络数据流；所以我们需要使用CodeQL创建一个新的模型；前面我们阅读官方文档，可以知道Dubbo默认使用的Transporter是Netty，Transporter负责处理网络层，也就是Consumer和Provider之间通信的协议处理：

因为Dubbo默认使用Netty作为通信框架，所以我们需要对Netty进行CodeQL建模才能扫描到类似Dubbo协议的外部交互点（当然，如果采用其他通信框架，则需要对其他通信框架做CodeQL建模）；

Netty建模

建模之前，我们需要掌握Netty的基本原理，可以查看这几篇文章结合官方文档进行快速学习：

Dubbo中使用Netty：https://blog.csdn.net/weixin_38308374/article/details/106630211
Netty4的ChannelInboundHandler：https://blog.csdn.net/qq_26323323/article/details/84226845
Netty中Pipline和ChannelHandler讲解：https://www.cnblogs.com/luoxn28/p/11963736.html

此处注意，本次分析的Dubbo 2.7.8版本中，存在两个版本的Netty（分别是Netty3和Netty4）：

我们先直接看到 org.apache.dubbo.remoting.transport.netty4.NettyServer#doOpen ，通过上面的学习，知道这是一个标准的Netty启动流程，需要关注的主要是添加的如下几个 ChanelHander ；我们关注的是攻击面，所以需要特别关注 ChanelHandler 的一个子接口 ChannelInboundHandler ，这个子接口会处理消息读取，很显然是我们关注的攻击面。此外， ByteToMessageDecoder 作为Netty的自带解码器，也是我们需要关注的，它有两个方法：decode 和 decodeLast ，这两个方法用于处理消息解码。

查看 ChannelInboundHandler 接口的官方api文档，可以知道，当Netty接收到客户端的消息后，会调用 io.netty.channel.ChannelInboundHandler#channelRead 方法接收外部传来的参数，这也是把它作为攻击面的原因。然后那个 userEventTriggered 方法会在事件触发时调用，常用于心跳捕获机制，且携带一个Object类型的参数作为事件实体，但是事件的发送者是内部，攻击者不可控，所以我们无需关心这个方法了：

同时，通过上面的学习，我们也能知道在Netty环境下，我们一般不是直接实现 ChannelInboundHandler 接口，而是通过继承 ChannelInboundHandlerAdapter 类、 ChannelDuplexHandler 类，或者通过继承 SimpleChannelInboundHandler 类来操作（这三个类实现了大部分接口方法，通过继承这三个类就无需全接口实现）。如果使用的是 SimpleChannelInboundHandler ，只需实现其 channelRead0 方法。如果使用的是其他的两个类，则需要实现 channelRead 方法；

同时通过阅读Netty的文档，可以发现Netty3还存在一个 ChannelUpstreamHandler 类，它的 handleUpstream 可以做一些消息，所以这个类也是我们需要关注的攻击面，类似的，我们一般是使用这个接口的一个子类 SimpleChannelUpstreamHandler ，而不是直接使用接口，使用这个子类的话，只需要重写其 messageReceived 方法接口进行消息事件的处理；

总结一下，现在Netty的攻击面可以概括为如下几个：

ChannelInboundHandler 接口的 channelRead 方法；
SimpleChannelInboundHandler 抽象类（实现了 ChannelInboundHandler 接口）的 channelRead0 方法（这个方法内部调用 channelRead 方法）；
ChannelUpstreamHandler 接口的 handleUpstream 方法；
SimpleChannelUpstreamHandler 类（实现了ChannelUpstreamHandler 接口）的 messageReceived 方法（这个方法内部调用 handleUpstream 方法）；
ByteToMessageDecoder 类的 decode 和 decodeLast 方法；

所以，我们现在可以构造如下Netty入站模型：

import java
import semmle.code.java.dataflow.FlowSources
import semmle.code.java.security.PathCreation
import DataFlow::PathGraph// 定义 ChannelInboundHandler 类
class ChannelInboundHandlerClass extends Class{
    ChannelInboundHandlerClass(){
        this.getASourceSupertype*().hasQualifiedName("io.netty.channel", "ChannelInboundHandler")
    }
}
// 定义 ChannelInboundHandler#channelRead(ChannelHandlerContext var1, Object var2) 方法
class ChannelReadMethod extends Method{
    ChannelReadMethod(){
        this.getName() = ["channelRead","channelRead0"]
        and this.getDeclaringType() instanceof ChannelInboundHandlerClass
    }
}
// 定义位于 ChannelInboundHandler#channelRead() 方法的外部source
class ChannelReadSource extends RemoteFlowSource{
    ChannelReadSource(){
        exists(ChannelReadMethod m |m.getParameter(1) = this.asParameter())
    }
    override string getSourceType() { result = "ChannelReadSource" }
}
// ==============================================================
// 定义 ChannelUpstreamHandler 类
class ChannelUpstreamHandlerClass extends Class{
    ChannelUpstreamHandlerClass(){
        this.getASourceSupertype*().hasQualifiedName("org.jboss.netty.channel", "ChannelUpstreamHandler")
    }
}
// 定义 ChannelUpstreamHandler#handleUpstream 方法
class HandleUpstreamMethod extends Method{
    HandleUpstreamMethod(){
        this.getName() = ["handleUpstream","messageReceived"]
        and this.getDeclaringType() instanceof ChannelUpstreamHandlerClass
    }
}
// 定义位于 HandleUpstreamMethod#HandleUpstream 方法的外部source
class HandleUpstreamSource extends RemoteFlowSource{
    HandleUpstreamSource(){
        exists( HandleUpstreamMethod m| m.getParameter(1) = this.asParameter())
    }
    override string getSourceType() { result = "HandleUpstreamSource" }
}
// ==============================================================
// 定义 ByteToMessageDecoder 类
class ByteToMessageDecoderClass extends Class{
    ByteToMessageDecoderClass(){
        this.getASourceSupertype*().hasQualifiedName("io.netty.handler.codec", "ByteToMessageDecoder")
    }
}
// 定义 ByteToMessageDecoder#decode 方法
class DecodeMethod extends Method{
    DecodeMethod(){
        this.getName() = ["
decode","decodeLast"]
        and this.getDeclaringType() instanceof ByteToMessageDecoderClass
    }
}// 定义位于 ByteToMessageDecoder#decode 方法的外部source
class ByteToMessageDecodeSource extends RemoteFlowSource{
    ByteToMessageDecodeSource(){
        exists( DecodeMethod m| m.getParameter(1) = this.asParameter())
    }
    override string getSourceType() { result = "ByteToMessageDecodeSource" }
}// ======================查询基于Netty的所有攻击面======================
from RemoteFlowSource source
where 
    (
        source instanceof ChannelReadSource
        or source instanceof HandleUpstreamSource
        or source instanceof ByteToMessageDecodeSource
    )
    and not source.getEnclosingCallable().getFile().getRelativePath().matches("%/src/test/%")
select
    source,
    // 因为有的source的DeclaringType可能是内部类，所以最好也查看一下source所在的java文件
    source.getLocation().getFile().getRelativePath().regexpCapture("(.*?/)+(\\S+\\.java)", 2) as javaFile,
    source.getEnclosingCallable().getDeclaringType() as sourceDeclaringType,
    source.(RemoteFlowSource).getSourceType() as sourceType

扫描结果如下，这个结果比之前扫描的结果更加完整，通过下面的扫描结果，可以知道Dubbo大概的攻击面（攻击者可控的入参点），结果并不多，可以一个个大概看下、了解下：

至于为啥会比pwntester多2个结果，主要是pwntester写错了：

经过我的修正后，可以这么写（上面已给出）：

Pre-auth RCE in the RPC invocation decoder

通过前面对Dubbo网络通信的学习，我们知道消息传递到provider，首先需要进行解码，这部分主要是 InternalDecoder （暂且称之为编解码处理器）负责，Netty Server注册编解码处理器的代码如下：

// 代码位置：org.apache.dubbo.remoting.transport.netty4.NettyServer#doOpen
bootstrap.group(bossGroup, workerGroup)
          .channel(NettyEventLoopFactory.serverSocketChannelClass())
          .option(ChannelOption.SO_REUSEADDR, Boolean.TRUE)
          .childOption(ChannelOption.TCP_NODELAY, Boolean.TRUE)
          .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
          .childHandler(new ChannelInitializer<SocketChannel>() {
              @Override
              protected void initChannel(SocketChannel ch) throws Exception {
                  // FIXME: should we use getTimeout()?
                  int idleTimeout = UrlUtils.getIdleTimeout(getUrl());
                  NettyCodecAdapter adapter = new NettyCodecAdapter(getCodec(), getUrl(), NettyServer.this);
                  if (getUrl().getParameter(SSL_ENABLED_KEY, false)) {
                      ch.pipeline().addLast("negotiation",
                              SslHandlerInitializer.sslServerHandler(getUrl(), nettyServerHandler));
                  }
                  ch.pipeline()
                          .addLast("decoder", adapter.getDecoder()) // 此处添加解码器
                          .addLast("encoder", adapter.getEncoder())
                          .addLast("server-idle-handler", new IdleStateHandler(0, 0, idleTimeout, MILLISECONDS))
                          .addLast("handler", nettyServerHandler);
              }
          });

这个外部交互点在前面通过CodeQL其实也查出来过：

编解码器是一个 Codec2 接口，它有如下实现类：

这里每一个 Codec2 实现类，都是一个外部可控的 source ，且可以由攻击者选择使用哪个具体的 Codec2 。这里pwntester选择 DubboCodec 作为分析的入口；DubboCodec 类继承于 ExchangeCodec ，DubboCodec 没重写 decode 方法，是直接使用父类的方法；而 ExchangeCode#decode() 方法会根据请求的魔数选择进行不同的处理：

这里pwntester选择第二个流程作为分析入口；也就是 org.apache.dubbo.rpc.protocol.dubbo.DubboCodec#decodeBody 作为source：

可以看到这里是一系列的数据解析操作，通过最开始学习的Dubbo调用链图中，可以知道，provider解析数据的流程如下（从下往上）：

现在我们分析的流程已经进入了 Codec ，接下来就是使用 Serialization 进行反序列化，看到 org.apache.dubbo.common.serialize.Serialization#deserialize ：

/**
 * Get a deserialization implementation instance
 *
 * @param url URL address for the remote service
 * @param input the underlying input stream
 * @return deserializer
 * @throws IOException
 */
@Adaptive
ObjectInput deserialize(URL url, InputStream input) throws IOException;

这个接口方法会返回一个反序列化器的实现实例，是一个 ObjectInput 类型的对象，这个 ObjectInput 也是一个接口，看到他的接口方法：

/**
 * Object input interface.
 */
public interface ObjectInput extends DataInput {    /**
     * Consider use {@link #readObject(Class)} or {@link #readObject(Class, Type)} where possible
     *
     * @return object
     * @throws IOException if an I/O error occurs
     * @throws ClassNotFoundException if an ClassNotFoundException occurs
     */
    @Deprecated
    Object readObject() throws IOException, ClassNotFoundException;
    /**
     * read object
     *
     * @param cls object class
     * @return object
     * @throws IOException if an I/O error occurs
     * @throws ClassNotFoundException if an ClassNotFoundException occurs
     */
    <T> T readObject(Class<T> cls) throws IOException, ClassNotFoundException;
    /**
     * read object
     *
     * @param cls object class
     * @param type object type
     * @return object
     * @throws IOException if an I/O error occurs
     * @throws ClassNotFoundException if an ClassNotFoundException occurs
     */
    <T> T readObject(Class<T> cls, Type type) throws IOException, ClassNotFoundException;
    /**
     * The following methods are customized for the requirement of Dubbo's RPC protocol implementation. Legacy protocol
     * implementation will try to write Map, Throwable and Null value directly to the stream, which does not meet the
     * restrictions of all serialization protocols.
     *
     * <p>
     * See how ProtobufSerialization, KryoSerialization implemented these methods for more details.
     * <p>
     * <p>
     * The binding of RPC protocol and biz serialization protocol is not a good practice. Encoding of RPC protocol
     * should be highly independent and portable, easy to cross platforms and languages, for example, like the http headers,
     * restricting the content of headers / attachments to Ascii strings and uses ISO_8859_1 to encode them.
     * https://tools.ietf.org/html/rfc7540#section-8.1.2
     */
    default Throwable readThrowable() throws IOException, ClassNotFoundException {
        Object obj = readObject();
        if (!(obj instanceof Throwable)) {
            throw new IOException("Response data error, expect Throwable, but get " + obj);
        }
        return (Throwable) obj;
    }
    default Object readEvent() throws IOException, ClassNotFoundException {
        return readObject();
    }
    default Map<String, Object> readAttachments() throws IOException, ClassNotFoundException {
        return readObject(Map.class);
    }
}

所有方法都是 read 开头的，且都是返回一个反序列化对象，那么所有的 ObjectInput#read* 方法都应该被认为是 sink ；总结一下，现在CodeQL的污点数据流模型应该如下设计：

org.apache.dubbo.rpc.protocol.dubbo.DubboCodec#decodeBody 作为 source（这里pwntester先以这个为入口进行分析，当然其实可以选择其他的Codec作为入口）；
org.apache.dubbo.common.serialize.ObjectInput#read* 作为 sink ；

此时可以很容易的就编写如下CodeQL查询：

/**
 * @kind path-problem
 */
import java
import semmle.code.java.dataflow.TaintTracking
import DataFlow::PathGraphclass InsecureConfig extends TaintTracking::Configuration {
  InsecureConfig() { this = "InsecureConfig" }
  override predicate isSource(DataFlow::Node source) {
    exists(Method m |
      // org.apache.dubbo.rpc.protocol.dubbo.DubboCodec#decodeBody 作为source
      m.getParameter(1) = source.asParameter()
      and m.hasQualifiedName("org.apache.dubbo.rpc.protocol.dubbo", "DubboCodec", "decodeBody")
    )
  }
  override predicate isSink(DataFlow::Node sink) {
      exists(Call call |
        call.getCallee().getName().matches("read%")
        and call.getCallee().getDeclaringType().getASourceSupertype*().hasQualifiedName("org.apache.dubbo.common.serialize", "ObjectInput")
        and sink.asExpr() = call.getQualifier()
      )
  }
}
from InsecureConfig conf, DataFlow::PathNode source, DataFlow::PathNode sink
where conf.hasFlowPath(source, sink)
select sink, source, sink, "unsafe deserialization"

但是很遗憾，查询结果为空：

这种情况，一般都是因为污点数据流被断开了，其实这也是CodeQL的一个边界问题，CodeQL只会分析用户自己的代码，如果用户代码引用了第三方依赖，数据流到了第三方依赖，因为CodeQL不分析第三方依赖，所以此时对于CodeQL来讲，这个调用是一个黑盒，于是数据流就断开了；

最终确认是如下方法调用导致数据流断开（大约是因为这个接口是其他maven模块的，也就是外部依赖，所以导致数据流断开）：

org.apache.dubbo.common.serialize.Serialization#deserialize

使用如下谓词将断开的数据流连接起来，即可查出结果了：

override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) {
    exists(MethodAccess ma,Method m |
      m = ma.getMethod()
      and ma.getArgument(1) = n1.asExpr()
      and m.getDeclaringType().getASourceSupertype*().hasQualifiedName("org.apache.dubbo.common.serialize", "Serialization")
      and m.hasName("deserialize")
      and n2.asExpr() = ma
    )
  }

这些结果都是可能RCE的污点数据流。

因为时间关系，其实没写完，但是其实看到这里跟着pwntester的文章，后面的也能自己完成了；
CodeQL只能协助挖洞，不能拿着无脑的去扫描；
CodeQL也有边界，需要自己排坑；

如有侵权，请联系删除

推荐阅读

【入门教程】常见的Web漏洞--XSS

【入门教程】常见的Web漏洞--SQL注入

sql注入--入门到进阶

短信验证码安全常见逻辑漏洞

最全常见Web安全漏洞总结及推荐解决方案

常见的Web应用的漏洞总结（原理、危害、防御）

学习更多技术，关注我：

觉得文章不错给点个‘再看’吧

文章来源: http://mp.weixin.qq.com/s?__biz=Mzg2NDY1MDc2Mg==&mid=2247493331&idx=2&sn=81f11b3e92007a5757cfb8f8ba4704c7&chksm=ce64b5b6f9133ca0c5019996785dff63fe97b3ea74fbb94c44c2e3b9eeaa036ba8746a7bb055#rd
如有侵权请联系:admin#unsafe.sh