Pre
前置
org.apache.catalina.manager.HTMLManagerServlet
org.apache.catalina.manager.HTMLManagerServlet#upload
org.apache.catalina.core.ApplicationPart#getSubmittedFileName
|
"
(前提条件是里面有\
字符),那么就会去掉跳过从第二个字符开始,并且末尾也会往前移动一位,同时会忽略字符\
,师傅只提到了类似test.\war
这样的例子。filename=""y\4.\w\arK" 。
深入
org.apache.catalina.core.ApplicationPart#getSubmittedFileName
当中,一看到这个将字符串转换成map的操作总觉得里面会有更骚的东西(这里先是解析传入的参数再获取,如果解析过程有利用点那么也会影响到后面参数获取),不扯远继续回到正题Content-Disposition
当中的值,如果以form-data
或者attachment
开头就会进行我们的解析操作,跟进去一看果不其然,看到RFC2231Utility
瞬间不困了*
Asterisks ("*") are reused to provide the indicator that language and character set 2 information is present and encoding is being used. A single quote ("'") is used to delimit the character set and language information at the beginning of the parameter value. Percent signs ("%") are used as the encoding flag, which agrees with RFC 2047.
Specifically, an asterisk at the end of a parameter name acts as an indicator that character set and language information may appear at the beginning of the parameter value. A single quote is used to separate the character set, language, and actual value information in the parameter value string, and an percent sign is used to flag octets encoded in hexadecimal. For example:
Content-Type: application/x-stuff;
title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A
public static String decodeText(final String encodedText) throws UnsupportedEncodingException {
final int langDelimitStart = encodedText.indexOf('\'');
if (langDelimitStart == -1) {
// missing charset
return encodedText;
}
final String mimeCharset = encodedText.substring(0, langDelimitStart);
final int langDelimitEnd = encodedText.indexOf('\'', langDelimitStart + 1);
if (langDelimitEnd == -1) {
// missing language
return encodedText;
}
final byte[] bytes = fromHex(encodedText.substring(langDelimitEnd + 1));
return new String(bytes, getJavaCharset(mimeCharset));
}
@param encodedText - Text to be decoded has a format of {@code <charset>'<language>'<encoded_value>}
,分别是编码,语言和待解码的字符串,同时这里还适配了对url编码的解码,也就是fromHex
函数,具体代码如下,其实就是url解码private static byte[] fromHex(final String text) {
final int shift = 4;
final ByteArrayOutputStream out = new ByteArrayOutputStream(text.length());
for (int i = 0; i < text.length();) {
final char c = text.charAt(i++);
if (c == '%') {
if (i > text.length() - 2) {
break; // unterminated sequence
}
final byte b1 = HEX_DECODE[text.charAt(i++) & MASK];
final byte b2 = HEX_DECODE[text.charAt(i++) & MASK];
out.write((b1 << shift) | b2);
} else {
out.write((byte) c);
}
}
return out.toByteArray();
}
支持编码的解码 值当中可以进行url编码 @code<charset>'<language>'<encoded_value> 中间这位language可以随便写,代码里没有用到这个的处理
|
//res
{"Big5","Big5-HKSCS","CESU-8","EUC-JP","EUC-KR","GB18030","GB2312","GBK","IBM-Thai","IBM00858","IBM01140","IBM01141","IBM01142","IBM01143","IBM01144","IBM01145","IBM01146","IBM01147","IBM01148","IBM01149","IBM037","IBM1026","IBM1047","IBM273","IBM277","IBM278","IBM280","IBM284","IBM285","IBM290","IBM297","IBM420","IBM424","IBM437","IBM500","IBM775","IBM850","IBM852","IBM855","IBM857","IBM860","IBM861","IBM862","IBM863","IBM864","IBM865","IBM866","IBM868","IBM869","IBM870","IBM871","IBM918","ISO-2022-CN","ISO-2022-JP","ISO-2022-JP-2","ISO-2022-KR","ISO-8859-1","ISO-8859-13","ISO-8859-15","ISO-8859-2","ISO-8859-3","ISO-8859-4","ISO-8859-5","ISO-8859-6","ISO-8859-7","ISO-8859-8","ISO-8859-9","JIS_X0201","JIS_X0212-1990","KOI8-R","KOI8-U","Shift_JIS","TIS-620","US-ASCII","UTF-16","UTF-16BE","UTF-16LE","UTF-32","UTF-32BE","UTF-32LE","UTF-8","windows-1250","windows-1251","windows-1252","windows-1253","windows-1254","windows-1255","windows-1256","windows-1257","windows-1258","windows-31j","x-Big5-HKSCS-2001","x-Big5-Solaris","x-COMPOUND_TEXT","x-euc-jp-linux","x-EUC-TW","x-eucJP-Open","x-IBM1006","x-IBM1025","x-IBM1046","x-IBM1097","x-IBM1098","x-IBM1112","x-IBM1122","x-IBM1123","x-IBM1124","x-IBM1166","x-IBM1364","x-IBM1381","x-IBM1383","x-IBM300","x-IBM33722","x-IBM737","x-IBM833","x-IBM834","x-IBM856","x-IBM874","x-IBM875","x-IBM921","x-IBM922","x-IBM930","x-IBM933","x-IBM935","x-IBM937","x-IBM939","x-IBM942","x-IBM942C","x-IBM943","x-IBM943C","x-IBM948","x-IBM949","x-IBM949C","x-IBM950","x-IBM964","x-IBM970","x-ISCII91","x-ISO-2022-CN-CNS","x-ISO-2022-CN-GB","x-iso-8859-11","x-JIS0208","x-JISAutoDetect","x-Johab","x-MacArabic","x-MacCentralEurope","x-MacCroatian","x-MacCyrillic","x-MacDingbat","x-MacGreek","x-MacHebrew","x-MacIceland","x-MacRoman","x-MacRomania","x-MacSymbol","x-MacThai","x-MacTurkish","x-MacUkraine","x-MS932_0213","x-MS950-HKSCS","x-MS950-HKSCS-XP","x-mswin-936","x-PCK","x-SJIS_0213","x-UTF-16LE-BOM","X-UTF-32BE-BOM","X-UTF-32LE-BOM","x-windows-50220","x-windows-50221","x-windows-874","x-windows-949","x-windows-950","x-windows-iso2022jp"}
UTF-16BE
filename=""y\4.\w\arK"
改成filename="UTF-16BE'Y4tacker'%00%22%00y%00%5C%004%00.%00%5C%00w%00%5C%00a%00r%00K"
------WebKitFormBoundaryQKTY1MomsixvN8vX
Content-Disposition: form-data*;;;;;;;;;;name*="UTF-16BE'Y4tacker'%00d%00e%00p%00l%00o%00y%00W%00a%00r";;;;;;;;filename*="UTF-16BE'Y4tacker'%00%22%00y%00%5C%004%00.%00%5C%00w%00%5C%00a%00r%00K"
Content-Type: application/octet-stream
123
------WebKitFormBoundaryQKTY1MomsixvN8vX--
变形 更新2022-06-20
org.apache.tomcat.util.http.fileupload.ParameterParser#parse(char[], int, int, char)
函数进行深入分析paramValue = parseQuotedToken(new char[] {separator });
,其实是按照分隔符;
分割,因此我们不难想到前面的东西其实可以不用"
进行包裹,在parseQuotedToken最后返回调用的是return getToken(true);
,这个函数也很简单就不必多解释private String getToken(final boolean quoted) {
// Trim leading white spaces
while ((i1 < i2) && (Character.isWhitespace(chars[i1]))) {
i1++;
}
// Trim trailing white spaces
while ((i2 > i1) && (Character.isWhitespace(chars[i2 - 1]))) {
i2--;
}
// Strip away quotation marks if necessary
if (quoted
&& ((i2 - i1) >= 2)
&& (chars[i1] == '"')
&& (chars[i2 - 1] == '"')) {
i1++;
i2--;
}
String result = null;
if (i2 > i1) {
result = new String(chars, i1, i2 - i1);
}
return result;
}
parse
解析参数时可以不被包裹,结合getToken函数我们可以知道在最后一个参数其实就不必要加;
了,并且解析完通过params.get("filename")
获取到参数后还会调用到org.apache.tomcat.util.http.parser.HttpParser#unquote
那也可以基于此再次变形扩大利用面
javax.servlet.http.HttpServletRequest#getParts
即可,简化了我们文件上传的代码负担(如果我是开发人员,我肯定首选也会使用,谁不想当懒狗呢)
|
更新Spring 2022-06-20
public interface MultipartFile extends InputStreamSource {
String getName(); //获取参数名
@Nullable
String getOriginalFilename();//原始的文件名
@Nullable
String getContentType();//内容类型
boolean isEmpty();
long getSize(); //大小
byte[] getBytes() throws IOException;// 获取字节数组
InputStream getInputStream() throws IOException;//以流方式进行读取
default Resource getResource() {
return new MultipartFileResource(this);
}
// 将上传的文件写入文件系统
void transferTo(File var1) throws IOException, IllegalStateException;
// 写入指定path
default void transferTo(Path dest) throws IOException, IllegalStateException {
FileCopyUtils.copy(this.getInputStream(), Files.newOutputStream(dest));
}
}
org.springframework.web.multipart.support.StandardMultipartHttpServletRequest#parseRequest
,抄个文件上传demo来进行测试分析Spring4
springboot1.5.20.RELEASE
内置Spring4.3.23
,具体小版本之间是否有差异这里就不再探究org.springframework.web.multipart.support.StandardMultipartHttpServletRequest#parseRequest
的调用也有些不同private void parseRequest(HttpServletRequest request) {
try {
Collection<Part> parts = request.getParts();
this.multipartParameterNames = new LinkedHashSet(parts.size());
MultiValueMap<String, MultipartFile> files = new LinkedMultiValueMap(parts.size());
Iterator var4 = parts.iterator();
while(var4.hasNext()) {
Part part = (Part)var4.next();
String disposition = part.getHeader("content-disposition");
String filename = this.extractFilename(disposition);
if (filename == null) {
filename = this.extractFilenameWithCharset(disposition);
}
if (filename != null) {
files.add(part.getName(), new StandardMultipartHttpServletRequest.StandardMultipartFile(part, filename));
} else {
this.multipartParameterNames.add(part.getName());
}
}
this.setMultipartFiles(files);
} catch (Throwable var8) {
throw new MultipartException("Could not parse multipart servlet request", var8);
}
}
filename*
格式的private String extractFilename(String contentDisposition, String key) {
if (contentDisposition == null) {
return null;
} else {
int startIndex = contentDisposition.indexOf(key);
if (startIndex == -1) {
return null;
} else {
//截取filename=后面的内容
String filename = contentDisposition.substring(startIndex + key.length());
int endIndex;
//如果后面开头是“则截取”“之间的内容
if (filename.startsWith("\"")) {
endIndex = filename.indexOf("\"", 1);
if (endIndex != -1) {
return filename.substring(1, endIndex);
}
} else {
//可以看到如果没有“”包裹其实也可以,这和当时陈师分享的其中一个trick是符合的
endIndex = filename.indexOf(";");
if (endIndex != -1) {
return filename.substring(0, endIndex);
}
}
return filename;
}
}
}
Spring5
parseRequest
private void parseRequest(HttpServletRequest request) {
try {
Collection<Part> parts = request.getParts();
this.multipartParameterNames = new LinkedHashSet(parts.size());
MultiValueMap<String, MultipartFile> files = new LinkedMultiValueMap(parts.size());
Iterator var4 = parts.iterator();
while(var4.hasNext()) {
Part part = (Part)var4.next();
String headerValue = part.getHeader("Content-Disposition");
ContentDisposition disposition = ContentDisposition.parse(headerValue);
String filename = disposition.getFilename();
if (filename != null) {
if (filename.startsWith("=?") && filename.endsWith("?=")) {
filename = StandardMultipartHttpServletRequest.MimeDelegate.decode(filename);
}
files.add(part.getName(), new StandardMultipartHttpServletRequest.StandardMultipartFile(part, filename));
} else {
this.multipartParameterNames.add(part.getName());
}
}
this.setMultipartFiles(files);
} catch (Throwable var9) {
this.handleParseFailure(var9);
}
}
filename.startsWith("=?") && filename.endsWith("?=")
,可以看出Spring对文件名也是支持QP编码org.springframework.http.ContentDisposition#parse
filename*
,同样获取值是截取"
之间的或者没找到就直接截取=
后面的部分filename*
后面的处理逻辑就是else分之,可以看出和我们上面分析spring4还是有点区别就是这里只支持UTF-8/ISO-8859-1/US_ASCII
,编码受限制int idx1 = value.indexOf(39);
int idx2 = value.indexOf(39, idx1 + 1);
if (idx1 != -1 && idx2 != -1) {
charset = Charset.forName(value.substring(0, idx1).trim());
Assert.isTrue(StandardCharsets.UTF_8.equals(charset) || StandardCharsets.ISO_8859_1.equals(charset), "Charset should be UTF-8 or ISO-8859-1");
filename = decodeFilename(value.substring(idx2 + 1), charset);
} else {
filename = decodeFilename(value, StandardCharsets.US_ASCII);
}
decodeFilename
RFC 5987
文档规定的Header字符就直接调用baos.write写入
|
%
然后16进制解码后两位,其实就是url解码,简单测试即可参考文章
https://www.ch1ng.com/blog/264.htmlhttps://datatracker.ietf.org/doc/html/rfc6266#section-4.3https://datatracker.ietf.org/doc/html/rfc2231https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.1https://y4tacker.github.io/2022/02/25/year/2022/2/Java%E6%96%87%E4%BB%B6%E4%B8%8A%E4%BC%A0%E5%A4%A7%E6%9D%80%E5%99%A8-%E7%BB%95waf(%E9%92%88%E5%AF%B9commons-fileupload%E7%BB%84%E4%BB%B6)/https://docs.oracle.com/javaee/7/api/javax/servlet/http/Part.html#getSubmittedFileName--http://t.zoukankan.com/summerday152-p-13969452.html#%E4%BA%8C%E3%80%81%E5%A4%84%E7%90%86%E4%B8%8A%E4%BC%A0%E6%96%87%E4%BB%B6multipartfile%E6%8E%A5%E5%8F%A3