[Go] A pitfall to watch for when using bufio.Scanner

I recently ran into the following bug while writing a script in Go, so I am writing it down here.


I had a file like this:

{"title": "Eloquent JavaScript, Second Edition","description": "JavaScript lies at the heart of almost every modern web application, from social apps to the newest browser-based games. Though simple for beginners to pick up and play with, JavaScript is a flexible, complex language that you can use to build full-scale applications."}
{"title": "Learning JavaScript Design Patterns","description": "With Learning JavaScript Design Patterns, you'll learn how to write beautiful, structured, and maintainable JavaScript by applying classical and modern design patterns to the language. If you want to keep your code efficient, more manageable, and up-to-date with the latest best practices, this book is for you."}
{"title": "Speaking JavaScript","description": "Like it or not, JavaScript is everywhere these days-from browser to server to mobile-and now you, too, need to learn the language or dive deeper than you have. This concise book guides you into and through JavaScript, written by a veteran programmer who once found himself in the same position."}
{"title": "Programming JavaScript Applications","description": "Take advantage of JavaScript's power to build robust web-scale or enterprise applications that are easy to extend and maintain. By applying the design patterns outlined in this practical book, experienced JavaScript developers will learn how to write flexible and resilient code that's easier-yes, easier-to work with as your code base grows."}
{"title": "Understanding ECMAScript 6","description": "ECMAScript 6 represents the biggest update to the core of JavaScript in the history of the language. In Understanding ECMAScript 6, expert developer Nicholas C. Zakas provides a complete guide to the object types, syntax, and other exciting changes that ECMAScript 6 brings to JavaScript."}
{"title": "You Don't Know JS","description": "No matter how much experience you have with JavaScript, odds are you don’t fully understand the language. As part of the "You Don’t Know JS" series, this compact guide focuses on new features available in ECMAScript 6 (ES6), the latest version of the standard upon which JavaScript is built."}
{"title": "Git Pocket Guide","description": "This pocket guide is the perfect on-the-job companion to Git, the distributed version control system. It provides a compact, readable introduction to Git for new users, as well as a reference to common commands and procedures for those of you with Git experience."}
{"title": "Designing Evolvable Web APIs with ASP.NET","description": "Design and build Web APIs for a broad range of clients—including browsers and mobile devices—that can adapt to change over time. This practical, hands-on guide takes you through the theory and tools you need to build evolvable HTTP services with Microsoft’s ASP.NET Web API framework. In the process, you’ll learn how design and implement a real-world Web API."}

(The sample data was borrowed from here.)

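At first glance it looks like JSON, but the objects are not wrapped in [] and there are no commas at the ends of the lines, so the file as a whole cannot be parsed as a single JSON document as-is.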
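As a small illustration of that point, here is a sketch that checks a two-line input of the same shape, first as one document and then line by line, using json.Valid from encoding/json. The function name checkJSONLines and the shortened sample data are assumptions made up for this example; they are not part of the original script.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// checkJSONLines reports whether data is valid as one JSON document (whole)
// and whether every non-empty line is valid JSON on its own (perLine).
// The name is hypothetical, chosen only for this sketch.
func checkJSONLines(data string) (whole, perLine bool) {
	whole = json.Valid([]byte(data))
	perLine = true
	for _, line := range strings.Split(data, "\n") {
		if line == "" {
			continue // skip the empty string after the final newline
		}
		if !json.Valid([]byte(line)) {
			perLine = false
		}
	}
	return whole, perLine
}

func main() {
	// Two objects, one per line, with no enclosing [] and no commas.
	data := "{\"title\": \"A\", \"description\": \"first\"}\n" +
		"{\"title\": \"B\", \"description\": \"second\"}\n"

	whole, perLine := checkJSONLines(data)
	fmt.Println("valid as one document:", whole)
	fmt.Println("valid line by line:", perLine)
}
```

The input fails validation as a single document because a second value follows the first top-level value, while each line passes on its own. That is why the script below processes the file one line at a time.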

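I wanted to do two things here: (1) decode each line into a struct, and (2) keep the raw bytes of the line in that struct. So I reached for bufio.Scanner and tried to process the file with a script like the following.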

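package main

import (
    "bufio"
    "encoding/json"
    "log"
    "os"
)

// Book holds one decoded line of the input file plus the raw bytes of that line.
type Book struct {
    Title       string `json:"title"`
    Description string `json:"description"`
    Raw         []byte
}

func main() {
    f, err := os.Open("./sample.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    var books []Book
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        var b Book
        bytes := scanner.Bytes()
        b.Raw = bytes

        // The decode error is ignored to keep the example short.
        json.Unmarshal(bytes, &b)

        books = append(books, b)
    }

    for _, b := range books {
        log.Println(string(b.Raw))
    }
}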

What do you think this prints?

Here is the output.

$ go run main.go 
2019/02/23 12:59:45 {"title": "Designing Evolvable Web APIs with ASP.NET","description": "Design and build Web APIs for a broad range of clients—including browsers and mobile devices—that can adapt to change over time. This practical, hands-on guide takes you through the theory and tools you need to build evolvable HTTP services with Microsoft’s 
2019/02/23 12:59:45 SP.NET Web API framework. In the process, you’ll learn how design and implement a real-world Web API."} you'll learn how to write beautiful, structured, and maintainable JavaScript by applying classical and modern design patterns to the language. If you want to keep your code efficient, more manageable, and up-to-date with the latest best practices, this book is for you."}
2019/02/23 12:59:45 {"title": "Speaking JavaScript","description": "Like it or not, JavaScript is everywhere these days-from browser to server to mobile-and now you, too, need to learn the language or dive deeper than you have. This concise book guides you into and through JavaScript, written by a veteran programmer who once found himself in the same position."}
2019/02/23 12:59:45 {"title": "Programming JavaScript Applications","description": "Take advantage of JavaScript's power to build robust web-scale or enterprise applications that are easy to extend and maintain. By applying the design patterns outlined in this practical book, experienced JavaScript developers will learn how to write flexible and resilient code that's easier-yes, easier-to work with as your code base grows."}
2019/02/23 12:59:45 {"title": "Understanding ECMAScript 6","description": "ECMAScript 6 represents the biggest update to the core of JavaScript in the history of the language. In Understanding ECMAScript 6, expert developer Nicholas C. Zakas provides a complete guide to the object types, syntax, and other exciting changes that ECMAScript 6 brings to JavaScript."}
2019/02/23 12:59:45 {"title": "You Don't Know JS","description": "No matter how much experience you have with JavaScript, odds are you don’t fully understand the language. As part of the "You Don’t Know JS" series, this compact guide focuses on new features available in ECMAScript 6 (ES6), the latest version of the standard upon which JavaScript is built."}
2019/02/23 12:59:45 {"title": "Git Pocket Guide","description": "This pocket guide is the perfect on-the-job companion to Git, the distributed version control system. It provides a compact, readable introduction to Git for new users, as well as a reference to common commands and procedures for those of you with Git experience."}
2019/02/23 12:59:45 {"title": "Designing Evolvable Web APIs with ASP.NET","description": "Design and build Web APIs for a broad range of clients—including browsers and mobile devices—that can adapt to change over time. This practical, hands-on guide takes you through the theory and tools you need to build evolvable HTTP services with Microsoft’s ASP.NET Web API framework. In the process, you’ll learn how design and implement a real-world Web API."}

Look closely and you will see that the first and second lines are garbled.

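It looks like b.Raw is being overwritten. To check, I added the following line inside the loop to print the address of the first byte of the slice returned by scanner.Bytes().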

log.Printf("&bytes[0] = %x\n", &bytes[0])

$ go run main.go 
2019/02/23 12:59:45 &bytes[0] = c00008e000
2019/02/23 12:59:45 &bytes[0] = c00008e14f
2019/02/23 12:59:45 &bytes[0] = c00008e2c9
2019/02/23 12:59:45 &bytes[0] = c00008e422
2019/02/23 12:59:45 &bytes[0] = c00008e5bb
2019/02/23 12:59:45 &bytes[0] = c00008e715
2019/02/23 12:59:45 &bytes[0] = c00008e86d
2019/02/23 12:59:45 &bytes[0] = c00008e000

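The 1st and 8th lines point to the same address, which shows that the scanner.Scan() call in the 8th iteration overwrote the contents read in the first iteration. This matches the documentation for Bytes, which warns that the underlying array may point to data that will be overwritten by a subsequent call to Scan.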
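The same effect can be reproduced without a file. The following is a minimal sketch, not the original script: it feeds three short lines through a Scanner whose buffer is deliberately limited to 8 bytes via scanner.Buffer, so the buffer has to be reused almost immediately. The function name scanBoth, the input string, and the buffer size are all assumptions chosen for this demonstration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanBoth scans input line by line through a buffer of bufSize bytes.
// It returns each line twice: as the slice handed out by scanner.Bytes()
// (retained, which aliases the Scanner's internal buffer) and as the string
// from scanner.Text() (texts, which is a freshly allocated copy).
// The name and signature are hypothetical, made up for this sketch.
func scanBoth(input string, bufSize int) (retained [][]byte, texts []string) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Force a tiny buffer so that it must be reused between lines.
	scanner.Buffer(make([]byte, bufSize), bufSize)

	for scanner.Scan() {
		retained = append(retained, scanner.Bytes()) // keeps a view into the buffer
		texts = append(texts, scanner.Text())        // keeps an independent copy
	}
	return retained, texts
}

func main() {
	retained, texts := scanBoth("aaaa\nbbbb\ncccc\n", 8)
	for i := range texts {
		fmt.Printf("line %d: text=%q retained=%q\n", i+1, texts[i], retained[i])
	}
}
```

The text column always shows the three original lines. The retained column is expected to show bytes from later lines in place of the earlier ones, because those slices still point into the reused buffer. Exactly which bytes appear there is an implementation detail of the Scanner and should not be relied on.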

So here is the code that behaves correctly.

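func main() {
    f, err := os.Open("./sample.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    var books []Book
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        var b Book
        bytes := scanner.Bytes()
        // Copy the line into a slice of its own, because the Scanner may
        // overwrite the memory behind bytes on a later call to Scan.
        b.Raw = make([]byte, len(bytes))
        copy(b.Raw, bytes)

        // The decode error is ignored to keep the example short.
        json.Unmarshal(bytes, &b)

        books = append(books, b)
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }

    for _, b := range books {
        log.Println(string(b.Raw))
    }
}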

b.Raw is assigned a new slice allocated with make, and the value returned by scanner.Bytes() is copied into it, which prevents the overwrite. With this change I got the correct result shown below.

$ go run main.go 
2019/02/23 13:01:51 {"title": "Eloquent JavaScript, Second Edition","description": "JavaScript lies at the heart of almost every modern web application, from social apps to the newest browser-based games. Though simple for beginners to pick up and play with, JavaScript is a flexible, complex language that you can use to build full-scale applications."}
2019/02/23 13:01:51 {"title": "Learning JavaScript Design Patterns","description": "With Learning JavaScript Design Patterns, you'll learn how to write beautiful, structured, and maintainable JavaScript by applying classical and modern design patterns to the language. If you want to keep your code efficient, more manageable, and up-to-date with the latest best practices, this book is for you."}
2019/02/23 13:01:51 {"title": "Speaking JavaScript","description": "Like it or not, JavaScript is everywhere these days-from browser to server to mobile-and now you, too, need to learn the language or dive deeper than you have. This concise book guides you into and through JavaScript, written by a veteran programmer who once found himself in the same position."}
2019/02/23 13:01:51 {"title": "Programming JavaScript Applications","description": "Take advantage of JavaScript's power to build robust web-scale or enterprise applications that are easy to extend and maintain. By applying the design patterns outlined in this practical book, experienced JavaScript developers will learn how to write flexible and resilient code that's easier-yes, easier-to work with as your code base grows."}
2019/02/23 13:01:51 {"title": "Understanding ECMAScript 6","description": "ECMAScript 6 represents the biggest update to the core of JavaScript in the history of the language. In Understanding ECMAScript 6, expert developer Nicholas C. Zakas provides a complete guide to the object types, syntax, and other exciting changes that ECMAScript 6 brings to JavaScript."}
2019/02/23 13:01:51 {"title": "You Don't Know JS","description": "No matter how much experience you have with JavaScript, odds are you don’t fully understand the language. As part of the "You Don’t Know JS" series, this compact guide focuses on new features available in ECMAScript 6 (ES6), the latest version of the standard upon which JavaScript is built."}
2019/02/23 13:01:51 {"title": "Git Pocket Guide","description": "This pocket guide is the perfect on-the-job companion to Git, the distributed version control system. It provides a compact, readable introduction to Git for new users, as well as a reference to common commands and procedures for those of you with Git experience."}
2019/02/23 13:01:51 {"title": "Designing Evolvable Web APIs with ASP.NET","description": "Design and build Web APIs for a broad range of clients—including browsers and mobile devices—that can adapt to change over time. This practical, hands-on guide takes you through the theory and tools you need to build evolvable HTTP services with Microsoft’s ASP.NET Web API framework. In the process, you’ll learn how design and implement a real-world Web API."}
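The make-and-copy step can also be written as a single append onto a nil slice, which allocates new backing storage in the same way. The sketch below wraps that in a small helper; the name cloneLines, the in-memory input, and the 8-byte buffer limit are assumptions for illustration, chosen so that the Scanner is forced to reuse its buffer.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cloneLines scans input line by line through a buffer of bufSize bytes and
// returns an independent copy of every line, along with any scan error.
// The name and signature are hypothetical, made up for this sketch.
func cloneLines(input string, bufSize int) ([][]byte, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	// A tiny buffer makes the Scanner reuse its memory between lines.
	scanner.Buffer(make([]byte, bufSize), bufSize)

	var lines [][]byte
	for scanner.Scan() {
		// Appending to a nil slice allocates fresh storage, so the result
		// no longer shares memory with the Scanner's internal buffer.
		line := append([]byte(nil), scanner.Bytes()...)
		lines = append(lines, line)
	}
	return lines, scanner.Err()
}

func main() {
	lines, err := cloneLines("aaaa\nbbbb\ncccc\n", 8)
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, l := range lines {
		fmt.Printf("%q\n", l)
	}
}
```

On Go 1.20 or later, bytes.Clone performs the same copy. If a string is enough, scanner.Text() returns a newly allocated string and avoids the problem as well.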

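The moral of the story: be careful with bufio.Scanner. If you need to keep the slice returned by scanner.Bytes() past the next call to Scan, copy it first.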